PhenoFeatureFinder.utils
========================

.. py:module:: PhenoFeatureFinder.utils


Functions
---------

.. autoapisummary::

   PhenoFeatureFinder.utils.median_of_ratios_normalisation
   PhenoFeatureFinder.utils.calculate_percentile
   PhenoFeatureFinder.utils.compute_metrics_classification
   PhenoFeatureFinder.utils.plot_confusion_matrix
   PhenoFeatureFinder.utils.extract_samples_to_condition


Module Contents
---------------

.. py:function:: median_of_ratios_normalisation(_data: pandas.DataFrame) -> pandas.DataFrame

   Normalize a dataframe with the median-of-ratios method from DESeq2.

   Input data (as a pandas DataFrame), e.g.::

              sample1  sample2  sample3
      gene1   0.00000  10.0000  4.00000
      gene2   2.00000  6.00000  12.0000
      gene3   33.5000  55.0000  200.000

   Normalized output::

              sample1  sample2  sample3
      gene1   0.00000  10.6444  1.57882
      gene2   4.76032  6.38664  4.73646
      gene3   78.5453  58.5442  78.9410

   .. rubric:: References

   - StatQuest: https://www.youtube.com/watch?v=UFB993xufUU
   - HBC Harvard: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html


.. py:function:: calculate_percentile(df, my_percentile=50)

   Compute the q-th percentile of the data, where q is ``my_percentile``.

   Returns the q-th percentile of the array elements.

   :param df: Input data.
   :param my_percentile: Percentile to compute, which must be between 0 and 100 inclusive (default is 50).
   :type my_percentile: float, optional

   .. seealso::

      :obj:`numpy.percentile` (https://numpy.org/doc/stable/reference/generated/numpy.percentile.html)


.. py:function:: compute_metrics_classification(y_predictions, y_trues, positive_class)

   Compute a series of metrics for classification tasks.

   Utility function designed to work downstream of the search for the best model.
   It computes the following metrics:

   - balanced accuracy
   - precision
   - recall
   - F1 score

   :param y_predictions: List of class predictions.
   :type y_predictions: list
   :param y_trues: List of the true values (from the test set).
   :type y_trues: list
   :param positive_class: The name of the positive class, used to count true positives, true negatives, etc.
   :type positive_class: str
   :returns: **model_metrics_df** -- DataFrame with the balanced accuracy, precision, recall and F1 score.
   :rtype: `pandas.core.frame.DataFrame`

   .. seealso::

      https://scikit-learn.org/stable/modules/model_evaluation.html


.. py:function:: plot_confusion_matrix(y_predictions, y_trues)

   Plot a confusion matrix.

   :param y_predictions: List of class predictions.
   :type y_predictions: list
   :param y_trues: List of the true values (from the test set).
   :type y_trues: list

   .. seealso::

      https://scikit-learn.org/stable/modules/model_evaluation.html


.. py:function:: extract_samples_to_condition(df, name_grouping_var='genotype', separator_replicates='_')

   A utility function to extract the grouping factor (e.g. 'genotype') from sample names.

   The dataframe is melted (wide to long), and each sample name is split into the grouping
   variable and the biological replicate at the specified separator.

   :param df: Input dataframe with features as rows and samples as columns (see Notes).
   :type df: pandas.DataFrame
   :param name_grouping_var: Name of the variable used as grouping variable (default is 'genotype').
   :type name_grouping_var: str, optional
   :param separator_replicates: The separator between the grouping variable and the biological replicates (default is underscore '_').
   :type separator_replicates: str, optional
   :returns: A dataframe with the correspondence between samples and experimental condition (grouping variable).
   :rtype: pandas.DataFrame

   .. rubric:: Notes

   Input dataframe:

   ===========  ==============  ==============  ==============  ==============
   feature_id   genotypeA_rep1  genotypeA_rep2  genotypeA_rep3  genotypeA_rep4
   ===========  ==============  ==============  ==============  ==============
   metabolite1  1246            1245            12345           12458
   metabolite2  0               0               0               0
   metabolite3  10              0               0               154
   ===========  ==============  ==============  ==============  ==============

   Output dataframe:

   ==============  =========  =========
   sample          genotype   replicate
   ==============  =========  =========
   genotypeA_rep1  genotypeA  rep1
   genotypeA_rep2  genotypeA  rep2
   genotypeA_rep3  genotypeA  rep3
   genotypeA_rep4  genotypeA  rep4
   etc.
   ==============  =========  =========
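The melt-and-split step described for ``extract_samples_to_condition`` can be sketched as follows. This assumes that sample names follow the ``<condition><separator><replicate>`` pattern shown in the Notes. ``extract_samples_to_condition_sketch`` is a hypothetical name and is not the PhenoFeatureFinder implementation. Splitting at the *last* separator (``rsplit``) is also a choice made here, so that condition names may themselves contain the separator.

```python
import pandas as pd


def extract_samples_to_condition_sketch(df, name_grouping_var="genotype",
                                        separator_replicates="_"):
    """Hypothetical sketch (not the package code): map samples to condition and replicate."""
    # Melt wide -> long so every sample column name becomes a value of 'sample'.
    long_df = df.melt(var_name="sample", value_name="value")
    # Keep one row per sample, preserving the original column order.
    samples = long_df[["sample"]].drop_duplicates().reset_index(drop=True)
    # Assumption: split at the last separator into condition and replicate.
    parts = samples["sample"].str.rsplit(separator_replicates, n=1, expand=True)
    samples[name_grouping_var] = parts[0]
    samples["replicate"] = parts[1]
    return samples
```

Applied to the input dataframe from the Notes, this returns one row per sample with ``sample``, ``genotype`` and ``replicate`` columns.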
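The median-of-ratios method behind ``median_of_ratios_normalisation`` can be sketched as follows. The sketch assumes the usual DESeq2 recipe:

1. Compute each feature's geometric mean across samples.
2. Skip features that have a zero in any sample.
3. Take the median of each sample's count-to-geometric-mean ratios as that sample's size factor.
4. Divide the sample's counts by its size factor.

``median_of_ratios_sketch`` is a hypothetical name. The package's own handling of zeros may differ, so this sketch need not reproduce the example output above exactly.

```python
import numpy as np
import pandas as pd


def median_of_ratios_sketch(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical DESeq2-style median-of-ratios scaling (not the package code).

    Rows are features (e.g. genes), columns are samples.
    """
    # Zero counts become -inf in log space; silence the divide-by-zero warning.
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Log of each feature's geometric mean (-inf if the feature has any zero).
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    # Log ratio of each count to its feature's geometric mean, usable features only.
    log_ratios = log_counts.loc[usable].sub(log_geo_means[usable], axis=0)
    # One size factor per sample: the median ratio across usable features.
    size_factors = np.exp(log_ratios.median(axis=0))
    # Divide every column (sample) by its own size factor.
    return counts.div(size_factors, axis=1)
```

Working in log space keeps the geometric means numerically stable. It also turns zero counts into ``-inf``, which makes the features to exclude easy to detect.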
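The four metrics listed for ``compute_metrics_classification`` can be computed directly from the predictions. The sketch below rests on these assumptions:

- Labels are compared to ``positive_class`` by equality.
- Balanced accuracy is the mean per-class recall, as in scikit-learn's ``balanced_accuracy_score``.
- A zero denominator yields 0.0.

The function name and the column names of the returned single-row DataFrame are hypothetical. In practice, ``sklearn.metrics`` provides ``balanced_accuracy_score``, ``precision_score``, ``recall_score`` and ``f1_score``; the last three accept a ``pos_label`` argument.

```python
import pandas as pd


def _safe_div(numerator, denominator):
    # Return 0.0 instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def compute_metrics_sketch(y_predictions, y_trues, positive_class):
    """Hypothetical sketch (not the package code): balanced accuracy, precision, recall, F1."""
    pairs = list(zip(y_trues, y_predictions))
    # Confusion counts with respect to the chosen positive class.
    tp = sum(t == positive_class and p == positive_class for t, p in pairs)
    fp = sum(t != positive_class and p == positive_class for t, p in pairs)
    fn = sum(t == positive_class and p != positive_class for t, p in pairs)
    # Balanced accuracy: mean recall over every class present in y_trues.
    classes = list(dict.fromkeys(y_trues))
    per_class_recall = [
        _safe_div(sum(t == c and p == c for t, p in pairs),
                  sum(t == c for t, _ in pairs))
        for c in classes
    ]
    balanced_accuracy = sum(per_class_recall) / len(per_class_recall)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    # Assumed layout: one row, one column per metric.
    return pd.DataFrame(
        {
            "balanced_accuracy": [balanced_accuracy],
            "precision": [precision],
            "recall": [recall],
            "f1_score": [f1],
        }
    )
```

For a binary problem, the per-class mean reduces to the average of sensitivity and specificity. This is why balanced accuracy is less sensitive to class imbalance than plain accuracy.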