PhenoFeatureFinder.utils
========================

.. py:module:: PhenoFeatureFinder.utils


Functions
---------

.. autoapisummary::

   PhenoFeatureFinder.utils.median_of_ratios_normalisation
   PhenoFeatureFinder.utils.calculate_percentile
   PhenoFeatureFinder.utils.compute_metrics_classification
   PhenoFeatureFinder.utils.plot_confusion_matrix
   PhenoFeatureFinder.utils.extract_samples_to_condition


Module Contents
---------------

.. py:function:: median_of_ratios_normalisation(_data: pandas.DataFrame) -> pandas.DataFrame

   Normalize a dataframe with the median of ratios method
   from DESeq2.


   Input data (as a pandas dataframe), e.g.::

              sample1    sample2    sample3
      gene1   0.00000    10.0000    4.00000
      gene2   2.00000    6.00000    12.0000
      gene3   33.5000    55.0000    200.000

   Normalized output::

              sample1    sample2    sample3
      gene1   0.00000    10.6444    1.57882
      gene2   4.76032    6.38664    4.73646
      gene3   78.5453    58.5442    78.9410

   .. rubric:: References

   - StatQuest: https://www.youtube.com/watch?v=UFB993xufUU
   - HBC Harvard: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html
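
As a rough illustration of the median of ratios method behind ``median_of_ratios_normalisation``, the sketch below log-transforms the data, computes a per-feature geometric mean, takes the per-sample median of the ratios as a size factor, and divides each sample by it. The helper name ``median_of_ratios_sketch`` is hypothetical, and leaving features that contain a zero out of the size-factor estimate is an assumption taken from the usual description of the method. The function's actual implementation may differ, so the values produced here need not match the table above.

```python
import numpy as np
import pandas as pd


def median_of_ratios_sketch(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: scale each sample (column) by its size factor."""
    # Natural log of every value; zeros become -inf (warning silenced).
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Row-wise mean of the logs = log of the per-feature geometric mean.
    log_geo_means = log_counts.mean(axis=1)
    # Assumption: features with a zero in any sample are left out of the
    # size-factor estimate (their log geometric mean is -inf).
    keep = np.isfinite(log_geo_means)
    log_ratios = log_counts.loc[keep].sub(log_geo_means[keep], axis=0)
    # Size factor per sample: median ratio, back-transformed from log scale.
    size_factors = np.exp(log_ratios.median(axis=0))
    # Divide every column by its own size factor.
    return counts.div(size_factors, axis=1)


# Same input layout as the example above: features as rows, samples as columns.
counts = pd.DataFrame(
    {
        "sample1": [0.0, 2.0, 33.5],
        "sample2": [10.0, 6.0, 55.0],
        "sample3": [4.0, 12.0, 200.0],
    },
    index=["gene1", "gene2", "gene3"],
)
normalised = median_of_ratios_sketch(counts)
```

Working on the log scale keeps the geometric mean numerically stable and makes zero-containing features easy to detect, since their mean log is ``-inf``.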


.. py:function:: calculate_percentile(df, my_percentile=50)

   Compute the percentile given by ``my_percentile`` for the data in ``df``.

   :param df: Input data.
   :param my_percentile: Percentile to compute, which must be between 0 and 100 (default is 50).
   :type my_percentile: float, optional

   .. seealso::

      :obj:`numpy.percentile`

      https://numpy.org/doc/stable/reference/generated/numpy.percentile.html
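
The documentation of ``calculate_percentile`` does not state whether the percentile is taken over the whole dataframe or per column, so the sketch below calls ``numpy.percentile`` directly and shows both variants. The dataframe and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two samples, two features.
df = pd.DataFrame({"sample1": [1.0, 3.0], "sample2": [2.0, 4.0]})

# Percentile over all values in the dataframe (the array is flattened).
overall_median = np.percentile(df.to_numpy(), 50)

# Percentile computed separately for each column (sample).
per_sample_q75 = np.percentile(df.to_numpy(), 75, axis=0)
```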


.. py:function:: compute_metrics_classification(y_predictions, y_trues, positive_class)

   Compute a series of metrics for classification tasks.

   Utility function designed to work downstream of the search for the best model.
   It computes the following metrics:

   - balanced accuracy
   - precision
   - recall
   - F1 score

   :param y_predictions: List of class predictions.
   :type y_predictions: list
   :param y_trues: List of the true values (from the test set).
   :type y_trues: list
   :param positive_class: The name of the positive class for calculation of true positives, true negatives, etc.
   :type positive_class: str

   :returns: **model_metrics_df** -- Dataframe with the balanced accuracy, precision, recall and f1 score calculated.
   :rtype: `pandas.core.frame.DataFrame`

   .. seealso::

      https://scikit-learn.org/stable/modules/model_evaluation.html
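
The four metrics listed for ``compute_metrics_classification`` can be sketched with the corresponding scikit-learn scorers. This is an illustration under assumptions, not the function's implementation: the class labels ``resistant`` and ``sensitive`` and the column names of the resulting dataframe are hypothetical.

```python
import pandas as pd
from sklearn.metrics import (
    balanced_accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels for seven test samples.
y_trues = ["resistant", "resistant", "sensitive", "sensitive",
           "sensitive", "resistant", "sensitive"]
y_predictions = ["resistant", "sensitive", "sensitive", "sensitive",
                 "resistant", "resistant", "sensitive"]
positive_class = "resistant"

# pos_label tells scikit-learn which class counts as "positive" when
# computing precision, recall and F1; balanced accuracy does not need it.
model_metrics_df = pd.DataFrame(
    {
        "balanced_accuracy": [balanced_accuracy_score(y_trues, y_predictions)],
        "precision": [precision_score(y_trues, y_predictions, pos_label=positive_class)],
        "recall": [recall_score(y_trues, y_predictions, pos_label=positive_class)],
        "f1_score": [f1_score(y_trues, y_predictions, pos_label=positive_class)],
    }
)
```

Swapping ``positive_class`` to the other label changes precision, recall and F1, while balanced accuracy stays the same because it averages recall over both classes.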


.. py:function:: plot_confusion_matrix(y_predictions, y_trues)

   Plot a confusion matrix from class predictions and true values.

   :param y_predictions: List of class predictions.
   :type y_predictions: list
   :param y_trues: List of the true values (from the test set).
   :type y_trues: list

   :returns: **model_metrics_df** -- Dataframe with the balanced accuracy, precision, recall and f1 score calculated.
   :rtype: `pandas.core.frame.DataFrame`

   .. seealso::

      https://scikit-learn.org/stable/modules/model_evaluation.html
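
The counts that a confusion matrix plot displays can be tabulated from the same two lists that ``plot_confusion_matrix`` takes. The sketch below uses ``pandas.crosstab`` for this. It is not how the function itself is implemented, and the class labels are hypothetical.

```python
import pandas as pd

# Hypothetical labels for seven test samples.
y_trues = ["resistant", "resistant", "sensitive", "sensitive",
           "sensitive", "resistant", "sensitive"]
y_predictions = ["resistant", "sensitive", "sensitive", "sensitive",
                 "resistant", "resistant", "sensitive"]

# Rows are the true classes, columns are the predicted classes.
confusion = pd.crosstab(
    pd.Series(y_trues, name="true"),
    pd.Series(y_predictions, name="predicted"),
)
```

The diagonal holds the correctly classified samples and the off-diagonal cells hold the misclassifications.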


.. py:function:: extract_samples_to_condition(df, name_grouping_var='genotype', separator_replicates='_')

   A utility function to extract the grouping factor (e.g. 'genotype') from sample names.

   Melts the dataframe from wide to long format, then splits the grouping variable from the biological replicates using the specified separator.

   :param df: Input dataframe in wide format, with one column per sample (see Notes).
   :type df: pandas.DataFrame
   :param name_grouping_var: Name of the variable used as grouping variable (default is 'genotype').
   :type name_grouping_var: str, optional
   :param separator_replicates: The separator between the grouping variable and the biological replicates (default is underscore '_').
   :type separator_replicates: str, optional

   :returns: A dataframe with the correspondence between samples and experimental condition (grouping variable).
   :rtype: pandas.DataFrame

   .. rubric:: Notes

   Input dataframe

   ===========  ==============  ==============  ==============  ==============
   feature_id   genotypeA_rep1  genotypeA_rep2  genotypeA_rep3  genotypeA_rep4
   ===========  ==============  ==============  ==============  ==============
   metabolite1  1246            1245            12345           12458
   metabolite2  0               0               0               0
   metabolite3  10              0               0               154
   ===========  ==============  ==============  ==============  ==============

   Output dataframe

   ==============  =========  =========
   sample          genotype   replicate
   ==============  =========  =========
   genotypeA_rep1  genotypeA  rep1
   genotypeA_rep2  genotypeA  rep2
   genotypeA_rep3  genotypeA  rep3
   genotypeA_rep4  genotypeA  rep4
   etc.
   ==============  =========  =========
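
The mapping from sample names to grouping variable and replicate, as described for ``extract_samples_to_condition``, can be sketched with pandas string splitting. This is a simplified illustration: it splits the column names directly rather than melting the dataframe first, and the variable names are hypothetical.

```python
import pandas as pd

# Wide-format input: features as rows, one column per sample.
df = pd.DataFrame(
    {
        "genotypeA_rep1": [1246, 0, 10],
        "genotypeA_rep2": [1245, 0, 0],
        "genotypeA_rep3": [12345, 0, 0],
        "genotypeA_rep4": [12458, 0, 154],
    },
    index=pd.Index(["metabolite1", "metabolite2", "metabolite3"], name="feature_id"),
)

name_grouping_var = "genotype"
separator_replicates = "_"

# One row per sample name.
samples = pd.Series(df.columns, name="sample")

# Split each name once on the separator: grouping variable, then replicate.
parts = samples.str.split(pat=separator_replicates, n=1, expand=True)
parts.columns = [name_grouping_var, "replicate"]

samples_to_condition = pd.concat([samples, parts], axis=1)
```

Splitting only once (``n=1``) is a choice made for this sketch. It keeps any further separators inside the replicate part, which may or may not match the behaviour of the actual function.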