Journal Article

MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data

TL;DR: MZmine 2, a new generation of a popular open-source data processing toolbox, is introduced; it is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.
Abstract: Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2. A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MSn data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms. MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/ . The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.


Citations
Journal Article
TL;DR: An overview of the main functional modules and the general workflow of MetaboAnalyst 4.0 is provided, followed by 12 detailed protocols.
Abstract: MetaboAnalyst (https://www.metaboanalyst.ca) is an easy-to-use web-based tool suite for comprehensive metabolomic data analysis, interpretation, and integration with other omics data. Since its first release in 2009, MetaboAnalyst has evolved significantly to meet the ever-expanding bioinformatics demands from the rapidly growing metabolomics community. In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst supports a wide array of functions for statistical, functional, as well as data visualization tasks. Some of the most widely used approaches include PCA (principal component analysis), PLS-DA (partial least squares discriminant analysis), clustering analysis and visualization, MSEA (metabolite set enrichment analysis), MetPA (metabolic pathway analysis), biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time series and power analysis. The current version of MetaboAnalyst (4.0) features a complete overhaul of the user interface and significantly expanded underlying knowledge bases (compound database, pathway libraries, and metabolite sets). Three new modules have been added to support pathway activity prediction directly from mass peaks, biomarker meta-analysis, and network-based multi-omics data integration. To enable more transparent and reproducible analysis of metabolomic data, we have released a companion R package (MetaboAnalystR) to complement the web-based application. This article provides an overview of the main functional modules and the general workflow of MetaboAnalyst 4.0, followed by 12 detailed protocols:
Basic Protocol 1: Data uploading, processing, and normalization
Basic Protocol 2: Identification of significant variables
Basic Protocol 3: Multivariate exploratory data analysis
Basic Protocol 4: Functional interpretation of metabolomic data
Basic Protocol 5: Biomarker analysis based on receiver operating characteristic (ROC) curves
Basic Protocol 6: Time-series and two-factor data analysis
Basic Protocol 7: Sample size estimation and power analysis
Basic Protocol 8: Joint pathway analysis
Basic Protocol 9: MS peaks to pathway activities
Basic Protocol 10: Biomarker meta-analysis
Basic Protocol 11: Knowledge-based network exploration of multi-omics data
Basic Protocol 12: MetaboAnalystR introduction
© 2019 by John Wiley & Sons, Inc.

1,522 citations


Cites methods from "MZmine 2: Modular framework for pro..."

  • ...MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data....

    [...]

  • ...A number of locally installable, freely available MS spectral preprocessing tools are available, including MetAlign (Lommen & Kools, 2012), OpenMS (Röst et al., 2016), MZmine (Pluskal, Castillo, Villar-Briones, & Oresic, 2010), XCMS (Smith, Want, O’Maille, Abagyan, & Siuzdak, 2006), mzMatch (Scheltema, Jankevics, Jansen, Swertz, & Breitling, 2011), and MS-DIAL (Tsugawa et al., 2015)....

    [...]

  • ...Many bioinformatics tools have been developed to support spectral processing, such as XCMS (Smith et al., 2006), MZmine (Pluskal et al., 2010), OpenMS (Röst et al., 2016), and MS-DIAL (Tsugawa et al., 2015)....

    [...]

Journal Article
TL;DR: Because of the inherent sensitivity of metabolomics, subtle alterations in biological pathways can be detected to provide insight into the mechanisms that underlie various physiological conditions and aberrant processes, including diseases.
Abstract: Metabolomics, which is the profiling of metabolites in biofluids, cells and tissues, is routinely applied as a tool for biomarker discovery. Owing to innovative developments in informatics and analytical technologies, and the integration of orthogonal biological approaches, it is now possible to expand metabolomic analyses to understand the systems-level effects of metabolites. Moreover, because of the inherent sensitivity of metabolomics, subtle alterations in biological pathways can be detected to provide insight into the mechanisms that underlie various physiological conditions and aberrant processes, including diseases.

1,440 citations

Journal Article
21 Jul 2016-Nature
TL;DR: It is shown how the human gut microbiome impacts the serum metabolome and associates with insulin resistance in 277 non-diabetic Danish individuals and suggested that microbial targets may have the potential to diminish insulin resistance and reduce the incidence of common metabolic and cardiovascular disorders.
Abstract: Insulin resistance is a forerunner state of ischaemic cardiovascular disease and type 2 diabetes. Here we show how the human gut microbiome impacts the serum metabolome and associates with insulin resistance in 277 non-diabetic Danish individuals. The serum metabolome of insulin-resistant individuals is characterized by increased levels of branched-chain amino acids (BCAAs), which correlate with a gut microbiome that has an enriched biosynthetic potential for BCAAs and is deprived of genes encoding bacterial inward transporters for these amino acids. Prevotella copri and Bacteroides vulgatus are identified as the main species driving the association between biosynthesis of BCAAs and insulin resistance, and in mice we demonstrate that P. copri can induce insulin resistance, aggravate glucose intolerance and augment circulating levels of BCAAs. Our findings suggest that microbial targets may have the potential to diminish insulin resistance and reduce the incidence of common metabolic and cardiovascular disorders.

1,309 citations

Journal Article
TL;DR: XCMS Online provides a solution for the complete untargeted metabolomic workflow including feature detection, retention time correction, alignment, annotation, statistical analysis, and data visualization.
Abstract: Recently, interest in untargeted metabolomics has become prevalent in the general scientific community among an increasing number of investigators. The majority of these investigators, however, do not have the bioinformatic expertise that has been required to process metabolomic data by using command-line driven software programs. Here we introduce a novel platform to process untargeted metabolomic data that uses an intuitive graphical interface and does not require installation or technical expertise. This platform, called XCMS Online, is a web-based version of the widely used XCMS software that allows users to easily upload and process liquid chromatography/mass spectrometry data with only a few mouse clicks. XCMS Online provides a solution for the complete untargeted metabolomic workflow including feature detection, retention time correction, alignment, annotation, statistical analysis, and data visualization. Results can be browsed online in an interactive, customizable table showing statistics, chrom...

1,045 citations

Journal Article
TL;DR: It is shown that pre-programmed developmental processes in plants result in consistent patterns in the chemical composition of root exudates, which provides a mechanistic underpinning for the process of rhizosphere microbial community assembly and an attractive direction for the manipulation of the rhizosphere microbiome for beneficial outcomes.
Abstract: Like all higher organisms, plants have evolved in the context of a microbial world, shaping both their evolution and their contemporary ecology. Interactions between plant roots and soil microorganisms are critical for plant fitness in natural environments. Given this co-evolution and the pivotal importance of plant-microbial interactions, it has been hypothesized, and a growing body of literature suggests, that plants may regulate the composition of their rhizosphere to promote the growth of microorganisms that improve plant fitness in a given ecosystem. Here, using a combination of comparative genomics and exometabolomics, we show that pre-programmed developmental processes in plants (Avena barbata) result in consistent patterns in the chemical composition of root exudates. This chemical succession in the rhizosphere interacts with microbial metabolite substrate preferences that are predictable from genome sequences. Specifically, we observed a preference by rhizosphere bacteria for consumption of aromatic organic acids exuded by plants (nicotinic, shikimic, salicylic, cinnamic and indole-3-acetic). The combination of these plant exudation traits and microbial substrate uptake traits interact to yield the patterns of microbial community assembly observed in the rhizosphere of an annual grass. This discovery provides a mechanistic underpinning for the process of rhizosphere microbial community assembly and provides an attractive direction for the manipulation of the rhizosphere microbiome for beneficial outcomes.

1,020 citations

References
Journal Article
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Abstract: Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/).

24,024 citations

Journal Article
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing

23,396 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...The RANSAC algorithm [20] is a non-deterministic iterative algorithm that estimates parameters of a mathematical model from a set of observed data, which may include outliers....
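The excerpt above describes RANSAC only in general terms. As a minimal sketch of that idea, assuming a simple straight-line model y = a*x + b, the loop below repeatedly fits the model to a random minimal sample and keeps the candidate supported by the most points. The function name `ransac_line`, the parameters `n_iter`, `tol`, and `seed`, and the sample points are all hypothetical; this is not the MZmine 2 alignment code.

```python
import random

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Estimate y = a*x + b from points that may contain outliers."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    best = (0.0, 0.0, [])
    for _ in range(n_iter):
        # Two points are the minimal sample that defines a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Points within `tol` of the candidate line count as inliers.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

# Made-up data: ten points on y = x + 0.3 plus three gross outliers.
points = [(float(x), x + 0.3) for x in range(1, 11)]
points += [(2.0, 9.0), (5.0, 0.5), (8.0, 14.0)]
a, b, inliers = ransac_line(points)
```

Because the samples are drawn at random, the result is non-deterministic unless the random generator is seeded, which matches the "non-deterministic iterative" description in the excerpt.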

    [...]

Journal Article
TL;DR: A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...The flexibility of MZmine 2, however, allows for easy expansion to other dataset types such as gas chromatography-MS, as well as interoperation with popular proteomics search engines such as MASCOT....

    [...]

  • ...For proteomic applications, a module allowing identification of peptide peaks using the MASCOT [19] search engine and MS/MS spectra is under development....

    [...]

Journal Article
TL;DR: Locally weighted regression is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series.
Abstract: Locally weighted regression, or loess, is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series. With local fitting we can estimate a much wider class of regression surfaces than with the usual classes of parametric functions, such as polynomials. The goal of this article is to show, through applications, how loess can be used for three purposes: data exploration, diagnostic checking of parametric models, and providing a nonparametric regression surface. Along the way, the following methodology is introduced: (a) a multivariate smoothing procedure that is an extension of univariate locally weighted regression; (b) statistical procedures that are analogous to those used in the least-squares fitting of parametric functions; (c) several graphical methods that are useful tools for understanding loess estimates and checking the a

5,188 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...Step 3: Apply the locally-weighted scatterplot smoothing (LOESS) method for regression [21] on all points in the model obtained with RANSAC....
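To illustrate what a locally weighted regression step of this kind does, here is a minimal sketch assuming a local linear fit with tricube weights, a common LOESS formulation. The function name `loess_at`, the `frac` parameter, and the data are hypothetical, and the excerpt does not say which LOESS variant or settings MZmine 2 uses.

```python
def loess_at(xs, ys, x0, frac=0.5):
    """Locally weighted linear estimate of y at x0 (tricube weights)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours in the window
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest neighbour, slightly inflated
    # so that neighbour keeps a nonzero weight (an assumption of this sketch).
    h = max(dists[k - 1], 1e-12) * 1.001
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Made-up, exactly linear data: a local linear fit should reproduce the line.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
smoothed = loess_at(xs, ys, 4.5)
```

In an alignment setting, `xs` and `ys` would be the paired values kept by the preceding RANSAC step, and the fitted curve would be evaluated at each point of interest.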

    [...]

Journal Article
TL;DR: METLIN includes an annotated list of known metabolite structural information that is easily cross-correlated with its catalogue of high-resolution Fourier transform mass spectrometry (FTMS) spectra, tandem mass spectrometry (MS/MS) spectra, and LC/MS data.
Abstract: Endogenous metabolites have gained increasing interest over the past 5 years largely for their implications in diagnostic and pharmaceutical biomarker discovery. METLIN (http://metlin.scripps.edu), a freely accessible web-based data repository, has been developed to assist in a broad array of metabolite research and to facilitate metabolite identification through mass analysis. METLIN includes an annotated list of known metabolite structural information that is easily cross-correlated with its catalogue of high-resolution Fourier transform mass spectrometry (FTMS) spectra, tandem mass spectrometry (MS/MS) spectra, and LC/MS data.

1,953 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...In MZmine 2, identification of peaks can be performed either by searching a custom database of m/z values and retention times, or by connecting to an online resource such as PubChem [15], KEGG [16], METLIN [17], or HMDB [18] directly from the MZmine 2 interface (Figure 4)....

    [...]

  • ...Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, Custodio DE, Abagyan R, Siuzdak G: METLIN: a metabolite mass spectral database....
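The first excerpt above mentions identifying peaks by searching a custom database of m/z values and retention times. A minimal sketch of such a lookup follows, assuming a relative m/z tolerance in ppm and an absolute retention-time tolerance. The function name `match_peak`, both tolerance parameters, and the database entries are hypothetical placeholders, not MZmine 2's interface or real reference values.

```python
def match_peak(mz, rt, database, ppm=10.0, rt_tol=0.5):
    """Return names of database entries matching a peak's m/z and RT."""
    hits = []
    for name, db_mz, db_rt in database:
        mz_error_ppm = abs(mz - db_mz) / db_mz * 1e6
        if mz_error_ppm <= ppm and abs(rt - db_rt) <= rt_tol:
            hits.append(name)
    return hits

# Placeholder entries: (name, m/z, retention time in minutes), all made up.
database = [
    ("compound A", 180.0634, 5.2),
    ("compound B", 180.0700, 5.3),
    ("compound C", 255.2324, 12.1),
]
hits = match_peak(180.0640, 5.0, database)
```

A relative (ppm) tolerance is used for m/z because the absolute measurement error of a mass analyzer typically scales with the measured mass.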

    [...]
