PhenoFeatureFinder.utils

Functions

median_of_ratios_normalisation(_data) → pandas.DataFrame

Normalize a dataframe with the median of ratios method

calculate_percentile(df[, my_percentile])

Compute the q-th percentile of data.

compute_metrics_classification(y_predictions, y_trues, ...)

Compute a series of metrics for classification tasks

plot_confusion_matrix(y_predictions, y_trues)

Plot confusion matrix

extract_samples_to_condition(df[, name_grouping_var, ...])

A utility function to extract the grouping factor (e.g. 'genotype') from sample names.

Module Contents

PhenoFeatureFinder.utils.median_of_ratios_normalisation(_data: pandas.DataFrame) → pandas.DataFrame

Normalize a dataframe with the median of ratios method from DESeq2.

Input data (as a pandas dataframe), e.g.:

             sample1    sample2    sample3
    gene1    0.00000    10.0000    4.00000
    gene2    2.00000    6.00000    12.0000
    gene3    33.5000    55.0000    200.000

Normalized output:

             sample1    sample2    sample3
    gene1    0.00000    10.6444    1.57882
    gene2    4.76032    6.38664    4.73646
    gene3    78.5453    58.5442    78.9410
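The method can be sketched as follows. This is a minimal illustration of the usual median-of-ratios recipe (geometric mean per feature, then the median ratio per sample), not the package's own code. The name median_of_ratios_sketch is hypothetical and the handling of zero values is an assumption, so the sketch may not reproduce the example values above digit for digit.

```python
import numpy as np
import pandas as pd


def median_of_ratios_sketch(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows are features, columns are samples."""
    # Assumption: features containing a zero are ignored when estimating
    # size factors, because their geometric mean would be zero.
    positive = counts[(counts > 0).all(axis=1)]
    log_counts = np.log(positive)
    # Log of each feature's geometric mean across samples.
    log_geometric_means = log_counts.mean(axis=1)
    # Size factor per sample: median ratio of its values to the geometric means.
    log_ratios = log_counts.sub(log_geometric_means, axis=0)
    size_factors = np.exp(log_ratios.median(axis=0))
    # Every feature (zeros included) is divided by its sample's size factor.
    return counts.div(size_factors, axis=1)


# Toy data (made up for illustration): sample_b is exactly twice sample_a.
counts = pd.DataFrame(
    {"sample_a": [10.0, 20.0, 30.0, 0.0], "sample_b": [20.0, 40.0, 60.0, 0.0]},
    index=["gene1", "gene2", "gene3", "gene4"],
)
normalised = median_of_ratios_sketch(counts)
print(normalised)
```

Because sample_b is exactly twice sample_a in this toy input, the two columns should coincide after normalisation.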

References

StatQuest: https://www.youtube.com/watch?v=UFB993xufUU

HBC Harvard: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html

PhenoFeatureFinder.utils.calculate_percentile(df, my_percentile=50)

Compute the q-th percentile of the data, where q is given by my_percentile.

Parameters:

my_percentile (float, optional) – Percentile to compute, between 0 and 100 (default is 50).

See also

numpy.percentile

https://numpy.org/doc/stable/reference/generated/numpy.percentile.html
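As a point of reference, the snippet below calls numpy.percentile directly on a small dataframe. It is only a sketch: how calculate_percentile itself treats the dataframe (flattened, per column, or otherwise) is not stated here, so both variants are shown, and the toy data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy dataframe (illustrative values only).
df = pd.DataFrame({"sample1": [1.0, 2.0, 3.0], "sample2": [4.0, 5.0, 6.0]})

# Percentile over all values of the dataframe, flattened into one array.
overall_median = np.percentile(df.to_numpy(), 50)

# Percentile computed separately for each column (axis=0 runs down the rows).
per_sample_q90 = np.percentile(df.to_numpy(), 90, axis=0)

print(overall_median)
print(per_sample_q90)
```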

PhenoFeatureFinder.utils.compute_metrics_classification(y_predictions, y_trues, positive_class)

Compute a series of metrics for classification tasks

Utility function designed to run after the search for the best model. It computes the following metrics:

  • balanced accuracy

  • precision

  • recall

  • f1 score

Parameters:
  • y_predictions (list) – List of class predictions.

  • y_trues (list) – List of the true values (from the test set)

  • positive_class (str) – The name of the positive class for calculation of true positives, true negatives, etc.

Returns:

model_metrics_df – Dataframe with the balanced accuracy, precision, recall and f1 score calculated.

Return type:

pandas.core.frame.DataFrame

See also

https://scikit-learn.org/stable/modules/model_evaluation.html
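The four metrics listed above can be computed with scikit-learn roughly as follows. This is a sketch under assumptions rather than the function's actual body: the column names of the resulting dataframe and the toy labels ("resistant", "susceptible") are hypothetical.

```python
import pandas as pd
from sklearn.metrics import (
    balanced_accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels (made up for illustration).
y_trues = ["resistant"] * 4 + ["susceptible"] * 4
y_predictions = [
    "resistant", "resistant", "resistant", "susceptible",
    "resistant", "resistant", "susceptible", "susceptible",
]
positive_class = "resistant"

# One-row dataframe; the column names are an assumption, not the package's.
model_metrics_df = pd.DataFrame(
    {
        "balanced_accuracy": [balanced_accuracy_score(y_trues, y_predictions)],
        "precision": [
            precision_score(y_trues, y_predictions, pos_label=positive_class)
        ],
        "recall": [recall_score(y_trues, y_predictions, pos_label=positive_class)],
        "f1_score": [f1_score(y_trues, y_predictions, pos_label=positive_class)],
    }
)
print(model_metrics_df)
```

Passing pos_label tells scikit-learn which class counts as "positive" when the labels are strings, which is the role positive_class plays in the signature above.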

PhenoFeatureFinder.utils.plot_confusion_matrix(y_predictions, y_trues)

Plot confusion matrix

Parameters:
  • y_predictions (list) – List of class predictions.

  • y_trues (list) – List of the true values (from the test set)

  • positive_class (str) – The name of the positive class for calculation of true positives, true negatives, etc.

Returns:

model_metrics_df – Dataframe with the balanced accuracy, precision, recall and f1 score calculated.

Return type:

pandas.core.frame.DataFrame

See also

https://scikit-learn.org/stable/modules/model_evaluation.html
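For orientation, the counts behind a confusion matrix can be obtained with scikit-learn as sketched below. How plot_confusion_matrix draws the figure is not described here, so the snippet stops at the labelled table of counts; the toy labels and the row and column names are assumptions.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Toy labels (made up for illustration).
y_trues = ["resistant"] * 4 + ["susceptible"] * 4
y_predictions = [
    "resistant", "resistant", "resistant", "susceptible",
    "resistant", "resistant", "susceptible", "susceptible",
]
labels = ["resistant", "susceptible"]

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_trues, y_predictions, labels=labels)
cm_df = pd.DataFrame(
    cm,
    index=[f"true_{label}" for label in labels],
    columns=[f"predicted_{label}" for label in labels],
)
print(cm_df)
```

With matplotlib installed, sklearn.metrics.ConfusionMatrixDisplay(cm, display_labels=labels).plot() is one way to turn these counts into a figure.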

PhenoFeatureFinder.utils.extract_samples_to_condition(df, name_grouping_var='genotype', separator_replicates='_')

A utility function to extract the grouping factor (e.g. ‘genotype’) from sample names.

Melts the dataframe from wide to long format, then splits the grouping variable from the biological replicate in each sample name using the specified separator.

Parameters:
  • df (pandas.core.frame.DataFrame) – Wide dataframe with features as rows and samples as columns.

  • name_grouping_var (str, optional) – Name of the variable used as grouping variable (default is ‘genotype’).

  • separator_replicates (str, optional) – The separator between the grouping variable and the biological replicates (default is underscore ‘_’).

Return type:

A dataframe with the correspondence between samples and experimental condition (grouping variable).

Notes

Input dataframe

| feature_id  | genotypeA_rep1 | genotypeA_rep2 | genotypeA_rep3 | genotypeA_rep4 |
|-------------|----------------|----------------|----------------|----------------|
| metabolite1 | 1246           | 1245           | 12345          | 12458          |
| metabolite2 | 0              | 0              | 0              | 0              |
| metabolite3 | 10             | 0              | 0              | 154            |

Output dataframe

| sample         | genotype  | replicate |
|----------------|-----------|-----------|
| genotypeA_rep1 | genotypeA | rep1      |
| genotypeA_rep2 | genotypeA | rep2      |
| genotypeA_rep3 | genotypeA | rep3      |
| genotypeA_rep4 | genotypeA | rep4      |
| etc.           |           |           |
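The transformation described in the notes can be approximated with plain pandas as below. This is a sketch under assumptions, not the function's own code: the intermediate names (melted, parts, "abundance") are hypothetical, and the sample names are assumed to contain the separator exactly once.

```python
import pandas as pd

name_grouping_var = "genotype"
separator_replicates = "_"

# Wide input: features as rows, one column per sample (values from the notes).
df = pd.DataFrame(
    {
        "genotypeA_rep1": [1246, 0, 10],
        "genotypeA_rep2": [1245, 0, 0],
        "genotypeA_rep3": [12345, 0, 0],
        "genotypeA_rep4": [12458, 0, 154],
    },
    index=pd.Index(["metabolite1", "metabolite2", "metabolite3"], name="feature_id"),
)

# Wide to long: one row per (feature, sample) pair.
melted = df.reset_index().melt(
    id_vars="feature_id", var_name="sample", value_name="abundance"
)

# One row per sample, then split the name into grouping variable and replicate.
samples_to_condition = pd.DataFrame({"sample": melted["sample"].unique()})
parts = samples_to_condition["sample"].str.split(
    pat=separator_replicates, n=1, expand=True
)
samples_to_condition[name_grouping_var] = parts[0]
samples_to_condition["replicate"] = parts[1]
print(samples_to_condition)
```

Splitting with n=1 keeps everything after the first separator in the replicate column, which is a design choice of this sketch rather than documented behaviour.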