Top 2 papers published by David Enot from Institut Gustave Roussy in 2006

Journal Article

Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals

David Enot¹, Manfred Beckmann, David P. Overy, John Draper

Aberystwyth University¹

03 Oct 2006-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A pair-wise comparison of related plant genotypes with strong phenotypic differences demonstrated that robust models are not only reproducible but also logically structured, highlighting correlated m/z derived from just a small number of explanatory metabolites reflecting the biological differences between sample classes.

Abstract: Powerful algorithms are required to deal with the dimensionality of metabolomics data. Although many achieve high classification accuracy, the models they generate have limited value unless it can be demonstrated that they are reproducible and statistically relevant to the biological problem under investigation. Random forest (RF) generates models, without any requirement for dimensionality reduction or feature selection, in which individual variables are ranked for significance and displayed in an explicit manner. In metabolome fingerprinting by mass spectrometry, each metabolite can be represented by signals at several m/z. Exploiting a prior understanding of expected biochemical differences between sample classes, we aimed to develop meaningful metrics relevant to the significance both of the overall RF model and individual, potentially explanatory, signals. Pair-wise comparison of related plant genotypes with strong phenotypic differences demonstrated that robust models are not only reproducible but also logically structured, highlighting correlated m/z derived from just a small number of explanatory metabolites reflecting the biological differences between sample classes. RF models were also generated by using groupings of samples known to be increasingly phenotypically similar. Although classification accuracy was often reasonable, we demonstrated reproducibly in both Arabidopsis and potato a performance threshold based on margin statistics beyond which such models showed little structure indicative of either generalizability or further biological interpretability. In a multiclass problem using 25 Arabidopsis genotypes, despite the complicating effects of ecotype background and secondary metabolome perturbations common to several mutations, the ranking of metabolome signals by RF provided scope for deeper interpretability.
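
The abstract refers to a performance threshold based on margin statistics but does not define them here. As a rough illustration only, the sketch below uses the standard random forest margin (the share of tree votes for the true class minus the largest share for any other class). The vote lists, class labels, and function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean


def vote_margin(votes, true_label):
    """Margin of one sample: share of trees voting for the true class
    minus the largest share given to any other class (range -1 to 1)."""
    counts = Counter(votes)
    n_trees = len(votes)
    true_share = counts.get(true_label, 0) / n_trees
    other_shares = [c / n_trees for label, c in counts.items()
                    if label != true_label]
    # If no tree voted for another class, the competing share is 0.
    return true_share - max(other_shares, default=0.0)


def mean_margin(all_votes, true_labels):
    """Average margin over samples, a simple summary of model confidence."""
    return mean(vote_margin(v, y) for v, y in zip(all_votes, true_labels))


# Hypothetical per-tree votes from a 10-tree forest for three samples.
votes = [
    ["mutant"] * 9 + ["wild"] * 1,   # confidently correct
    ["mutant"] * 6 + ["wild"] * 4,   # correct, but barely
    ["mutant"] * 3 + ["wild"] * 7,   # true class is "wild"
]
labels = ["mutant", "mutant", "wild"]

margins = [vote_margin(v, y) for v, y in zip(votes, labels)]
average = mean_margin(votes, labels)
```

All three samples are classified correctly, yet their margins differ widely (roughly 0.8, 0.2 and 0.4). This is the sense in which accuracy can look reasonable while the margins show how confidently the forest separates the classes. In practice the votes would normally come from out-of-bag predictions.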

44 citations

Book Chapter

On the interpretation of high throughput MS based metabolomics fingerprints with random forest

David Enot¹, Manfred Beckmann¹, John Draper¹

Aberystwyth University¹

27 Sep 2006

TL;DR: In this article, the importance of RF margins and variable significance as well as prediction accuracy is discussed to provide insight into model generalisability and explanatory power for the extraction of relevant biological knowledge from metabolomics fingerprinting experiments.

Abstract: We discuss application of a machine learning method, Random Forest (RF), for the extraction of relevant biological knowledge from metabolomics fingerprinting experiments. The importance of RF margins and variable significance as well as prediction accuracy is discussed to provide insight into model generalisability and explanatory power. A method is described for detection of relevant features while conserving the redundant structure of the fingerprint data. The methodology is illustrated using two datasets from electrospray ionisation mass spectrometry from 27 Arabidopsis genotypes and a set of transgenic potato lines.

6 citations
