Journal ArticleDOI

Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis.

29 Feb 2012-Current Bioinformatics (Bentham Science Publishers)-Vol. 7, Iss: 1, pp 96-108
TL;DR: A state-of-the-art overview of the data processing tools available is provided, with their advantages and disadvantages, and comparisons are made to guide the reader.
Abstract: Biological systems are increasingly being studied in a holistic manner, using omics approaches, to provide quantitative and qualitative descriptions of the diverse collection of cellular components. Among the omics approaches, metabolomics, which deals with the quantitative global profiling of small molecules or metabolites, is being used extensively to explore the dynamic response of living systems, such as organelles, cells, tissues, organs and whole organisms, under diverse physiological and pathological conditions. This technology is now used routinely in a number of applications, including basic and clinical research, agriculture, microbiology, food science, nutrition, pharmaceutical research, environmental science and the development of biofuels. Of the multiple analytical platforms available to perform such analyses, nuclear magnetic resonance and mass spectrometry have come to dominate, owing to the high resolution and large datasets that can be generated with these techniques. The large multidimensional datasets that result from such studies must be processed and analyzed to render this data meaningful. Thus, bioinformatics tools are essential for the efficient processing of huge datasets, the characterization of the detected signals, and to align multiple datasets and their features. This paper provides a state-of-the-art overview of the data processing tools available, and reviews a collection of recent reports on the topic. Data conversion, pre-processing, alignment, normalization and statistical analysis are introduced, with their advantages and disadvantages, and comparisons are made to guide the reader.


Citations
Journal ArticleDOI
TL;DR: This tutorial review aims to provide an introductory overview to several straightforward statistical methods such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA.

606 citations


Cites background from "Bioinformatics Tools for Mass Spect..."

  • ...Since metabolomics deals with large and highly complex datasets [19] here, we deliver what we would like to consider an introductory and essential explanation targeted, in some degree, toward researchers working in the exciting field of metabolomics, as well as others working with large and highly complex datasets (e....

    [...]

Journal ArticleDOI
TL;DR: Key recommendations made during the workshop included more coordination of efforts; development of new databases, software tools, and chemical libraries for the food metabolome; and shared repositories of metabolomic data.

402 citations


Cites methods from "Bioinformatics Tools for Mass Spect..."

  • ...The metabolic profile of raw data generated by the spectrometric analysis of biological samples can be analyzed in several steps (119, 120)....

    [...]

  • ...There are a large number of supervised methods (120, 127), with the most commonly used analysis for comparing 2 groups being partial least-squares discriminant analysis (PLS-DA) (132) or one of its several variants....

    [...]

Journal ArticleDOI
Patrycja Nowak-Sliwinska1, Kari Alitalo2, Elizabeth Allen3, Andrey Anisimov2, Alfred C. Aplin4, Robert Auerbach5, Hellmut G. Augustin6,7, David O. Bates8, Judy R. van Beijnum9, R. Hugh F. Bender10, Gabriele Bergers3,11, Andreas Bikfalvi12, Joyce Bischoff13, Barbara C. Böck6,7, Peter C. Brooks14, Federico Bussolino15, Bertan Cakir13, Peter Carmeliet3, Daniel Castranova16, Anca Maria Cimpean, Ondine Cleaver17, George Coukos18, George E. Davis19, Michele De Palma20, Anna Dimberg21, Ruud P.M. Dings22, Valentin Djonov23, Andrew C. Dudley24, Neil Dufton25, Sarah-Maria Fendt3, Napoleone Ferrara26, Marcus Fruttiger27, Dai Fukumura13, Bart Ghesquière3,28, Yan Gong13, Robert J. Griffin22, Adrian L. Harris29, Christopher C.W. Hughes10, Nan W. Hultgren10, M. Luisa Iruela-Arispe30, Melita Irving18, Rakesh K. Jain13, Raghu Kalluri31, Joanna Kalucka3, Robert S. Kerbel32, Jan Kitajewski33, Ingeborg Klaassen34, Hynda K. Kleinmann35, Pieter Koolwijk18, Elisabeth Kuczynski32, Brenda R. Kwak1, Koen Marien, Juan M. Melero-Martin13, Lance L. Munn13, Roberto F. Nicosia4, Agnès Noël36, Jussi Nurro37, Anna-Karin Olsson21, Tatiana V. Petrova38, Kristian Pietras, Roberto Pili39, Jeffrey W. Pollard40, Mark J. Post41, Paul H.A. Quax42, Gabriel A. Rabinovich43, Marius Raica, Anna M. Randi25, Domenico Ribatti44, Curzio Rüegg45, Reinier O. Schlingemann18,34, Stefan Schulte-Merker, Lois E.H. Smith13, Jonathan W. Song46, Steven A. Stacker47, Jimmy Stalin, Amber N. Stratman16, Maureen Van de Velde36, Victor W.M. van Hinsbergh18, Peter B. Vermeulen48, Johannes Waltenberger49, Brant M. Weinstein16, Hong Xin26, Bahar Yetkin-Arik34, Seppo Ylä-Herttuala37, Mervin C. Yoder39, Arjan W. Griffioen9
University of Geneva1, University of Helsinki2, Katholieke Universiteit Leuven3, University of Washington4, University of Wisconsin-Madison5, Heidelberg University6, German Cancer Research Center7, University of Nottingham8, VU University Amsterdam9, University of California, Irvine10, University of California, San Francisco11, French Institute of Health and Medical Research12, Harvard University13, Maine Medical Center14, University of Turin15, National Institutes of Health16, University of Texas Southwestern Medical Center17, University of Lausanne18, University of Missouri19, École Polytechnique Fédérale de Lausanne20, Uppsala University21, University of Arkansas for Medical Sciences22, University of Bern23, University of Virginia24, Imperial College London25, University of California, San Diego26, University College London27, Flanders Institute for Biotechnology28, University of Oxford29, University of California, Los Angeles30, University of Texas MD Anderson Cancer Center31, University of Toronto32, University of Illinois at Chicago33, University of Amsterdam34, George Washington University35, University of Liège36, University of Eastern Finland37, Ludwig Institute for Cancer Research38, Indiana University39, University of Edinburgh40, Maastricht University41, Loyola University Medical Center42, National Scientific and Technical Research Council43, University of Bari44, University of Fribourg45, Ohio State University46, University of Melbourne47, University of Antwerp48, University of Münster49
TL;DR: In vivo, ex vivo, and in vitro bioassays that are available for the evaluation of angiogenesis are described and critical aspects that are relevant for their execution and proper interpretation are highlighted.
Abstract: The formation of new blood vessels, or angiogenesis, is a complex process that plays important roles in growth and development, tissue and organ regeneration, as well as numerous pathological conditions. Angiogenesis undergoes multiple discrete steps that can be individually evaluated and quantified by a large number of bioassays. These independent assessments hold advantages but also have limitations. This article describes in vivo, ex vivo, and in vitro bioassays that are available for the evaluation of angiogenesis and highlights critical aspects that are relevant for their execution and proper interpretation. As such, this collaborative work is the first edition of consensus guidelines on angiogenesis bioassays to serve for current and future reference.

397 citations

Journal ArticleDOI
TL;DR: The recent advances in metabolomics technologies have enabled a deeper investigation into the metabolism of cancer and a better understanding of how cancer cells use glycolysis, known as the “Warburg effect,” advantageously to produce the amino acids, nucleotides and lipids necessary for tumor proliferation and vascularization as discussed by the authors.
Abstract: Cancer is a devastating disease that alters the metabolism of a cell and the surrounding milieu. Metabolomics is a growing and powerful technology capable of detecting hundreds to thousands of metabolites in tissues and biofluids. The recent advances in metabolomics technologies have enabled a deeper investigation into the metabolism of cancer and a better understanding of how cancer cells use glycolysis, known as the “Warburg effect,” advantageously to produce the amino acids, nucleotides and lipids necessary for tumor proliferation and vascularization. Currently, metabolomics research is being used to discover diagnostic cancer biomarkers in the clinic, to better understand its complex heterogeneous nature, to discover pathways involved in cancer that could be used for new targets and to monitor metabolic biomarkers during therapeutic intervention. These metabolomics approaches may also provide clues to personalized cancer treatments by providing useful information to the clinician about the cancer patient’s response to medical interventions.

217 citations

References
Journal ArticleDOI
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate, which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations


"Bioinformatics Tools for Mass Spect..." refers methods in this paper

  • ...The FDR method [122] is commonly used in gene expression analyses, and is now also used in metabolomic studies [11], where a large number of variables are analyzed simultaneously, and thus multiple comparisons are conducted....

    [...]
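
The "simple sequential Bonferroni-type procedure" mentioned in the abstract above is usually presented as a step-up rule: sort the m p-values, find the largest rank k with p_(k) <= (k/m)·q, and reject the k hypotheses with the smallest p-values. The following is a minimal sketch of that rule, not code from either paper; the function name `benjamini_hochberg` and the default level `q=0.05` are illustrative assumptions, and, as the abstract notes, the FDR guarantee is proved for independent test statistics.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Illustrative sketch of the step-up rule; the name and the default
    level q are assumptions, not taken from the source.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per metabolite feature.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold; this is where the gain in power over FWER control comes from.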

Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Journal ArticleDOI
TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.

17,017 citations

Journal ArticleDOI
TL;DR: Survival analyses on a subcohort of patients with locally advanced breast cancer uniformly treated in a prospective study showed significantly different outcomes for the patients belonging to the various groups, including a poor prognosis for the basal-like subtype and a significant difference in outcome for the two estrogen receptor-positive groups.
Abstract: The purpose of this study was to classify breast carcinomas based on variations in gene expression patterns derived from cDNA microarrays and to correlate tumor characteristics to clinical outcome. A total of 85 cDNA microarray experiments representing 78 cancers, three fibroadenomas, and four normal breast tissues were analyzed by hierarchical clustering. As reported previously, the cancers could be classified into a basal epithelial-like group, an ERBB2-overexpressing group and a normal breast-like group based on variations in gene expression. A novel finding was that the previously characterized luminal epithelial/estrogen receptor-positive group could be divided into at least two subgroups, each with a distinctive expression profile. These subtypes proved to be reasonably robust by clustering using two different gene sets: first, a set of 456 cDNA clones previously selected to reflect intrinsic properties of the tumors and, second, a gene set that highly correlated with patient outcome. Survival analyses on a subcohort of patients with locally advanced breast cancer uniformly treated in a prospective study showed significantly different outcomes for the patients belonging to the various groups, including a poor prognosis for the basal-like subtype and a significant difference in outcome for the two estrogen receptor-positive groups.

10,791 citations


"Bioinformatics Tools for Mass Spect..." refers background in this paper

  • ...Although this example was not a metabolomics application, a particularly successful example of HCL involved the clustering of gene expression in breast cancer, which suggested the existence of a new subtype of breast cancer in addition to the known classes [150]....

    [...]

Journal ArticleDOI
TL;DR: An LC/MS-based data analysis approach, XCMS, which incorporates novel nonlinear retention time alignment, matched filtration, peak detection, and peak matching, and is demonstrated using data sets from a previously reported enzyme knockout study and a large-scale study of plasma samples.
Abstract: Metabolite profiling in biomarker discovery, enzyme substrate assignment, drug activity/specificity determination, and basic metabolic research requires new data preprocessing approaches to correlate specific metabolites to their biological origin. Here we introduce an LC/MS-based data analysis approach, XCMS, which incorporates novel nonlinear retention time alignment, matched filtration, peak detection, and peak matching. Without using internal standards, the method dynamically identifies hundreds of endogenous metabolites for use as standards, calculating a nonlinear retention time correction profile for each sample. Following retention time correction, the relative metabolite ion intensities are directly compared to identify changes in specific endogenous metabolites, such as potential biomarkers. The software is demonstrated using data sets from a previously reported enzyme knockout study and a large-scale study of plasma samples. XCMS is freely available under an open-source license at http://metlin...

3,963 citations


"Bioinformatics Tools for Mass Spect..." refers background or methods in this paper

  • ...A comparison of peak detection algorithms of LC-MS data using centWave [68], matched filter implemented in XCMS [53] and MZmine [56] showed that there was only a partial overlap in the results obtained with these methods, and a number of peaks were only detected by one software (not overlapped) [68]....

    [...]

  • ...[158] Benton HP, Wong DM, Trauger SA, Siuzdak G. XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization....

    [...]

  • ...Evaluation of the alignment of LC-MS data using six freely available software packages, including XCMS [53], MZmine [56], msInspect [103] and OpenMS [55], concluded that no single software perfectly aligned the datasets [104]....

    [...]

  • ...detection and peak alignment [53] Free R language *)...

    [...]

  • ...Typical data processing flow for MS data has been previously reviewed by Katajamaa and Orešič [34], and is now implemented in a variety of software packages [52-57]....

    [...]