For more than 10 years, my research has focused on metabolomics data analysis and systems biology.

Metabolomics (the untargeted analysis of the small molecules involved in biochemical reactions) is of major interest for phenotype characterization and biomarker discovery. High-resolution mass spectrometry (HRMS) is a technology of choice for metabolomics owing to its sensitivity and resolution.

Analysis of metabolomics data (i.e., computational metabolomics) is challenging for three reasons. First, the signal generated by mass spectrometry instruments has specific characteristics (a retention time dimension, heteroscedastic noise, analytical drift). Second, as in other omics, the number of detected features is larger than the number of samples, and many variables are correlated. Third, structural characterization of the metabolites from their mass and fragmentation pattern is often only partial because of the chemical diversity of the metabolome. Computational metabolomics therefore requires developments in signal processing, statistics, and chemoinformatics.

Computational metabolomics

I first implemented the Orthogonal Partial Least-Squares (OPLS) approach for regression and classification from Trygg and Wold (2002) as an R package named ropls (Thévenot et al., 2015). The OPLS algorithm is a variant of PLS that models the orthogonal variation (i.e., uncorrelated with the response) separately from the predictive variation (i.e., correlated with the response), which facilitates model interpretation.
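The separation between predictive and orthogonal variation can be illustrated with a minimal NumPy sketch of single-response OPLS (one predictive component plus `n_orth` orthogonal components), written from our reading of the NIPALS-style steps in Trygg and Wold (2002). It is not the ropls code, which is written in R: the function names `opls_fit` and `opls_predict`, the synthetic data, and the absence of scaling and cross-validation are assumptions made for illustration only.

```python
import numpy as np


def opls_fit(X, y, n_orth=1):
    """Hypothetical single-response OPLS sketch (no scaling, no cross-validation).

    X: (n_samples, n_features); y: (n_samples,). Returns one predictive
    component and n_orth components whose scores are orthogonal to y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean  # column-centred data
    yc = y - y_mean

    # Predictive weight vector: direction of maximal covariance with y.
    # It is unchanged by the filtering below because each removed score
    # t_o is orthogonal to y, so Xc.T @ yc keeps the same value.
    w = Xc.T @ yc
    w /= np.linalg.norm(w)

    W_o, P_o, T_o = [], [], []
    for _ in range(n_orth):
        t = Xc @ w
        loading = Xc.T @ t / (t @ t)
        # Part of the loading orthogonal to w: variation unrelated to y
        w_o = loading - (w @ loading) * w
        w_o /= np.linalg.norm(w_o)
        t_o = Xc @ w_o
        p_o = Xc.T @ t_o / (t_o @ t_o)
        Xc = Xc - np.outer(t_o, p_o)  # filter out the orthogonal component
        W_o.append(w_o)
        P_o.append(p_o)
        T_o.append(t_o)

    # Predictive score and regression coefficient on the filtered data
    t = Xc @ w
    c = (t @ yc) / (t @ t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "c": c,
            "t": t, "W_o": W_o, "P_o": P_o, "T_o": T_o}


def opls_predict(model, X_new):
    """Apply the same orthogonal filtering to new samples, then predict y."""
    Xc = np.asarray(X_new, dtype=float) - model["x_mean"]
    for w_o, p_o in zip(model["W_o"], model["P_o"]):
        t_o = Xc @ w_o
        Xc = Xc - np.outer(t_o, p_o)
    return model["y_mean"] + model["c"] * (Xc @ model["w"])


# Synthetic data: one y-related direction plus strong y-unrelated structure
rng = np.random.default_rng(0)
n_samples, n_features = 40, 10
y = rng.normal(size=n_samples)
structured = rng.normal(size=n_samples)
X = (np.outer(y, rng.normal(size=n_features))
     + 3.0 * np.outer(structured, rng.normal(size=n_features))
     + 0.1 * rng.normal(size=(n_samples, n_features)))

model = opls_fit(X, y, n_orth=1)
y_hat = opls_predict(model, X)
```

In this toy example the orthogonal score `T_o[0]` is expected to absorb most of the `structured` variation, which is unrelated to `y`, while the single predictive score `t` carries the response-related variation. By construction the two scores are orthogonal to each other, which is what makes the predictive part easier to interpret.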

ropls

Philippe Rinaudo and I then developed a new wrapper method for feature selection, which assesses the significance of each feature for model performance (biosigner R package). Wrapping three classifiers (PLS-DA, Random Forest, and Support Vector Machine) with this method yielded stable signatures of restricted size when applied to real metabolomics and transcriptomics datasets (Rinaudo et al., 2016).
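The wrapper idea can be sketched under simplifying assumptions: a feature is kept if randomly permuting its values in out-of-bag samples degrades the prediction accuracy, and the selection is repeated on the retained features until the set stops changing. This is not the biosigner implementation and its exact criterion may differ. The nearest-centroid classifier stands in for PLS-DA, Random Forest, and SVM, a fixed `threshold` on the mean accuracy drop replaces a proper significance test, and all function names (`permutation_drop`, `wrapper_select`, etc.) are hypothetical.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Toy classifier (an assumption, not one of biosigner's three)."""
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def permutation_drop(X, y, features, rng, n_boot=20, n_perm=5):
    """Mean out-of-bag accuracy drop when each feature is permuted."""
    n = len(y)
    Xs = X[:, features]
    drops = np.zeros(len(features))
    n_valid = 0
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)        # bootstrap sample
        test = np.setdiff1d(np.arange(n), train)  # out-of-bag samples
        if len(test) == 0 or len(np.unique(y[train])) < 2:
            continue
        n_valid += 1
        model = nearest_centroid_fit(Xs[train], y[train])
        base = np.mean(nearest_centroid_predict(model, Xs[test]) == y[test])
        for j in range(len(features)):
            for _ in range(n_perm):
                Xp = Xs[test].copy()
                Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
                acc = np.mean(nearest_centroid_predict(model, Xp) == y[test])
                drops[j] += (base - acc) / n_perm
    return drops / max(n_valid, 1)


def wrapper_select(X, y, threshold=0.02, seed=0, max_iter=10):
    """Keep features whose permutation hurts accuracy; repeat until stable."""
    rng = np.random.default_rng(seed)
    features = np.arange(X.shape[1])
    for _ in range(max_iter):
        drops = permutation_drop(X, y, features, rng)
        keep = features[drops > threshold]
        if len(keep) == 0 or len(keep) == len(features):
            return keep
        features = keep
    return features


# Synthetic two-class data: only features 0 and 1 carry class information
data_rng = np.random.default_rng(1)
n_samples, n_features = 60, 20
y = np.repeat([0, 1], n_samples // 2)
X = data_rng.normal(size=(n_samples, n_features))
X[y == 1, :2] += 4.0
selected = wrapper_select(X, y)
```

Iterating on the retained features is what drives the signature toward a small, stable set: once uninformative features are removed, the remaining ones are re-evaluated in a simpler model. A statistical comparison of permuted and unpermuted accuracies across the bootstrap replicates would be closer to the notion of significance described above than the fixed threshold used here.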

biosigner

All algorithms are available both as R packages (Bioconductor repository) and as Galaxy modules (on the Workflow4Metabolomics online infrastructure, jointly developed and maintained by the French Institute of Bioinformatics and MetaboHUB).

W4M