Journal Article

MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data

TL;DR: MZmine 2, a new generation of a popular open-source data processing toolbox, is introduced; it is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.
Abstract: Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2. A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MSn data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms. MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/ . The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.
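The separation of core functionality from data processing modules described in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `ProcessingModule`, `Core`, `preview`, the tuple peak representation, and the two toy modules are illustrative names, not MZmine 2's actual (Java) API. The core only sequences modules, each module owns its parameters, and a preview runs a single module with tentative settings without altering the batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical peak representation: (m/z, retention time in min, intensity).
Peak = Tuple[float, float, float]
Params = Dict[str, float]


@dataclass
class ProcessingModule:
    """A self-contained processing step: a name, its parameters, and a function on peak lists."""
    name: str
    params: Params
    run: Callable[[List[Peak], Params], List[Peak]]


@dataclass
class Core:
    """Holds the module batch and executes it; knows nothing about module internals."""
    modules: List[ProcessingModule] = field(default_factory=list)

    def register(self, module: ProcessingModule) -> None:
        self.modules.append(module)

    def preview(self, module: ProcessingModule, peaks: List[Peak], **overrides: float) -> List[Peak]:
        # Run one module with tentative parameter values; the stored parameters stay unchanged.
        return module.run(peaks, {**module.params, **overrides})

    def run_batch(self, peaks: List[Peak]) -> List[Peak]:
        # Apply every registered module in order, feeding each output to the next module.
        for module in self.modules:
            peaks = module.run(peaks, module.params)
        return peaks


def noise_filter(peaks: List[Peak], p: Params) -> List[Peak]:
    return [pk for pk in peaks if pk[2] >= p["min_intensity"]]


def rt_window(peaks: List[Peak], p: Params) -> List[Peak]:
    return [pk for pk in peaks if p["rt_min"] <= pk[1] <= p["rt_max"]]


core = Core()
noise = ProcessingModule("noise filter", {"min_intensity": 1000.0}, noise_filter)
core.register(noise)
core.register(ProcessingModule("RT window", {"rt_min": 1.0, "rt_max": 10.0}, rt_window))

peaks = [(180.06, 2.5, 5000.0), (255.23, 12.0, 8000.0), (301.14, 4.0, 200.0)]
print(core.run_batch(peaks))                                 # [(180.06, 2.5, 5000.0)]
print(len(core.preview(noise, peaks, min_intensity=100.0)))  # 3
```

Because the core never inspects what a module does, new modules can be added without touching the core, which is the property the abstract emphasizes.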


Citations
Posted Content
10 Feb 2020-bioRxiv
TL;DR: This work computes pairwise Spearman and Pearson correlation coefficients from the LC-MS profiles, derives correlation networks by retaining only correlations above a threshold, and suggests a new approach for classifying cocoa samples with the nested attributes of processing stage (sample type) and country of origin.
Abstract: To implement quality control measures and create fine flavor products, an important objective in the cocoa processing industry is to establish standards for characterizing cocoa raw materials, intermediate products, and finished products with respect to their processing stages and countries of origin. Toward this end, various works have studied whether cocoa samples from different stages of a typical cocoa processing pipeline, or from different origins, can be distinguished. Success in this direction has been limited: unfermented and fermented cocoa samples have been shown to group into separate clusters in PCA, but a clear clustering by country of origin has remained elusive. In this work we suggest an alternative approach to this problem through the framework of correlation networks. For 140 cocoa samples belonging to eight countries and three progressive stages in a typical cocoa processing pipeline, we compute pairwise Spearman and Pearson correlation coefficients based on the LC-MS profiles and derive correlation networks by retaining only correlations higher than a threshold. Progressively increasing this threshold reveals, first, processing-stage (or sample-type) modules (or network clusters) at low and intermediate correlation thresholds and then country-specific modules at high correlation thresholds. We present both qualitative and quantitative evidence through network visualization and node connectivity statistics. Besides demonstrating that the two data properties are separable via this network-based method, our work suggests a new approach for classifying cocoa samples with the nested attributes of processing stage (sample type) and country of origin, with the possibility of including additional factors (e.g., hybrid variety) in the analysis.

2 citations


Cites methods from "MZmine 2: Modular framework for pro..."

  • ...Briefly, LC-MS data of all the samples was processed using MZMine (Pluskal et al., 2010) giving peak area list and corresponding m/z ratio and retention times....

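The correlation-network procedure summarized in this entry can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: samples are compared by Pearson or Spearman correlation of their feature profiles, an edge is kept when the correlation exceeds the threshold, and "modules" are approximated here as connected components (the study may define network clusters differently). The profiles `A` to `D` are invented toy data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def ranks(v):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def network_modules(profiles, threshold, corr=pearson):
    """Keep edges with correlation above threshold; return connected components."""
    names = sorted(profiles)
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if corr(profiles[a], profiles[b]) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            comp.append(u)
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        comps.append(sorted(comp))
    return sorted(comps)


# Toy feature-intensity profiles for four hypothetical samples.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [1, 3, 2, 4], "D": [1, 3, 2, 5]}
for t in (0.7, 0.9, 0.99):
    print(t, network_modules(profiles, t))
```

On this toy data, raising the threshold from 0.7 to 0.9 to 0.99 splits one module into `[A, B]` and `[C, D]`, and then isolates `C` and `D`, mirroring the progressive-threshold idea described in the abstract.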

Book Chapter
TL;DR: This chapter introduces methods and concepts for designing and maintaining robust data analysis pipelines so that reproducibility can be increased in parallel, and gives an overview of existing solutions for QC/QA, including different quality metrics and methods for longitudinal monitoring.
Abstract: In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this book chapter focused on mass spectrometry-based proteomics approaches, we introduce how data analysis reproducibility and data quality can influence each other and how data quality and data analysis designs can be used to increase robustness and improve reproducibility. We first introduce methods and concepts to design and maintain robust data analysis pipelines so that reproducibility can be increased in parallel. The technical aspects related to data analysis reproducibility are challenging, and current ways to increase overall robustness are multifaceted. Software containerization and cloud infrastructures play an important part. We will also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies, since experimental variability plays a substantial role in analysis reproducibility. Therefore, we give an overview of existing solutions for QC/QA, including different quality metrics and methods for longitudinal monitoring. The efficient use of both types of approaches undoubtedly provides a way to improve the experimental reliability, reproducibility, and level of consistency in proteomics analytical measurements.

2 citations

Journal Article
TL;DR: In this paper, an LC-MS/MS analysis approach combined with feature-based molecular networking was adopted to offer a holistic overview of the metabolome diversity of the Egyptian Centaurea species.
Abstract: Centaurea is a genus comprising over 250 herbaceous flowering species and is used traditionally to treat several ailments. Among the Egyptian Centaurea species, C. lipii was reported to be cytotoxic against multidrug-resistant cancer cells. In this context, we aimed to explore the metabolome of C. lipii and compare it with other members of the genus in order to identify its bioactive principles. An LC-MS/MS analysis approach combined with feature-based molecular networking was adopted to offer a holistic overview of the metabolome diversity of the Egyptian Centaurea species. The studied plants included C. alexandrina, C. calcitrapa, C. eryngioides, C. glomerata, C. lipii, C. pallescens, C. pumilio, and C. scoparia. Their constitutive metabolome showed diverse chemical classes such as cinnamic acids, sesquiterpene lactones, flavonoids, and lignans. Linking the recorded metabolome to the previously reported cytotoxicity identified sesquiterpene lactones as the major contributors to this activity. To confirm these findings, bioassay-guided fractionation of C. lipii was carried out and led to the isolation of the sesquiterpene lactone cynaropicrin, with an IC50 of 1.817 µM against the CCRF-CEM leukemia cell line. The adopted methodology highlighted the uniqueness of the constitutive metabolome of C. lipii and identified the sesquiterpene lactones as the responsible cytotoxic metabolites.

2 citations

Journal Article
26 May 2022-Toxins
TL;DR: Investigation of the venom of the Iranian scorpion Hottentotta saulcyi by applying mass-spectrometry-based proteomic and lipidomic approaches to assess the diversity of components revealed that the venom’s proteome composition is largely dominated by Na+- and K+-channel-impairing toxic peptides.
Abstract: Scorpion venom is a complex secretory mixture of components with potential biological and physiological properties that has attracted many researchers because of its promising applications from clinical and pharmacological perspectives. In this study, we investigated the venom of the Iranian scorpion Hottentotta saulcyi (Simon, 1880) by applying mass-spectrometry-based proteomic and lipidomic approaches to assess the diversity of components present in the venom. The data revealed that the venom's proteome composition is largely dominated by Na+- and K+-channel-impairing toxic peptides, followed by enzymatic and non-enzymatic protein families, e.g., angiotensin-converting enzyme, serine protease, metalloprotease, hyaluronidase, carboxypeptidase, and cysteine-rich secretory peptide. Furthermore, lipids comprise ~1.2% of the dry weight of the crude venom. Phospholipids, ether-phospholipids, oxidized phospholipids, triacylglycerol, cardiolipins, very-long-chain sphingomyelins, and ceramides were the most intensely detected lipid species in the scorpion venom and may act either independently or synergistically with proteins and peptides during envenomation. The results provide detailed information on the chemical makeup of the venom, improving our understanding of the biological molecules it contains, giving better insight into the medical significance of the venom, and helping to improve the medical care of patients suffering from scorpion accidents in the relevant regions such as Iran, Iraq, Turkey, and Afghanistan.

2 citations

Posted Content
03 Jun 2022-bioRxiv
TL;DR: This study provides molecular insight into cardenolide sequestration and highlights the great potential of mass spectrometry imaging for understanding the kinetics of multiple compounds including endogenous metabolites, plant toxins, or insecticides in insects.
Abstract: Although the monarch butterfly (Danaus plexippus) is famous for sequestering milkweed cardenolides, the mechanism of sequestration and the localization of cardenolides in its caterpillars are still unknown. While monarchs tolerate cardenolides by means of a resistant Na+/K+-ATPase, it is unclear how closely related species such as the non-sequestering common crow (Euploea core) cope with these toxins. Using novel atmospheric-pressure scanning microprobe matrix-assisted laser desorption/ionization mass spectrometry imaging, we compared the distribution of cardenolides in caterpillars of D. plexippus and E. core. Specifically, we tested at which physiological scale quantitative differences between the two species are mediated and how cardenolides are distributed across body tissues. Whereas D. plexippus sequestered most cardenolides from milkweed (Asclepias curassavica), no cardenolides were found in the tissues of E. core. Remarkably, quantitative differences already manifest in the gut lumen: while monarchs retain and accumulate cardenolides above plant concentrations, the toxins are degraded in the gut lumen of crows. We visualized cardenolide transport across the monarch midgut epithelium and identified integument cells as the final site of storage, where defenses might be perceived by predators. Our study provides molecular insight into cardenolide sequestration and highlights the great potential of mass spectrometry imaging for understanding the kinetics of multiple compounds, including endogenous metabolites, plant toxins, and insecticides, in insects.

2 citations

References
Journal Article
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Abstract: Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with a graphical genome map of chromosomal locations, displayed by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as tools for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

24,024 citations

Journal Article
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing

23,396 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...The RANSAC algorithm [20] is a non-deterministic iterative algorithm that estimates parameters of a mathematical model from a set of observed data, which may include outliers....

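The quoted description of RANSAC can be made concrete with a small Python sketch that fits a straight line to retention-time pairs from two hypothetical runs, some of which are mismatched peaks (outliers). The linear model, tolerance, iteration count, seed, and data are illustrative assumptions; the quoted passage does not say which model MZmine 2 fits. Each iteration draws a minimal random sample (two points), counts the points lying within tolerance of the candidate line, and keeps the largest consensus set, which is then refit by least squares.

```python
import random


def line_through(p, q):
    """Slope and intercept of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def least_squares(points):
    """Ordinary least-squares line through the consensus set."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points, n_iter=200, tol=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed makes the random sampling reproducible
    best = []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)  # minimal sample that defines a line
        if p[0] == q[0]:
            continue  # degenerate sample: vertical line
        a, b = line_through(p, q)
        consensus = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(consensus) > len(best):
            best = consensus
    return least_squares(best), best


# Hypothetical RT pairs (run 1, run 2) in minutes: run 2 drifts as 1.02*t + 0.3,
# plus two mismatched peaks acting as outliers.
pairs = [(t, 1.02 * t + 0.3) for t in range(1, 11)] + [(3.0, 9.0), (7.0, 1.5)]
(slope, intercept), inliers = ransac_line(pairs)
print(round(slope, 3), round(intercept, 3), len(inliers))  # 1.02 0.3 10
```

An ordinary least-squares fit over all twelve pairs would be pulled toward the two mismatched peaks; the consensus step excludes them before the final fit.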

Journal Article
TL;DR: A new computer program, Mascot, is presented that integrates all three types of search used for protein identification against a sequence database with mass spectrometry data; its scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...The flexibility of MZmine 2, however, allows for easy expansion to other dataset types such as gas chromatography-MS, as well as interoperation with popular proteomics search engines such as MASCOT....


  • ...For proteomic applications, a module allowing identification of peptide peaks using the MASCOT [19] search engine and MS/MS spectra is under development....


Journal Article
TL;DR: Locally weighted regression is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series.
Abstract: Locally weighted regression, or loess, is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series. With local fitting we can estimate a much wider class of regression surfaces than with the usual classes of parametric functions, such as polynomials. The goal of this article is to show, through applications, how loess can be used for three purposes: data exploration, diagnostic checking of parametric models, and providing a nonparametric regression surface. Along the way, the following methodology is introduced: (a) a multivariate smoothing procedure that is an extension of univariate locally weighted regression; (b) statistical procedures that are analogous to those used in the least-squares fitting of parametric functions; (c) several graphical methods that are useful tools for understanding loess estimates and checking the a

5,188 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...Step 3: Apply the locally-weighted scatterplot smoothing (LOESS) method for regression [21] on all points in the model obtained with RANSAC....

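Step 3 can be illustrated with a minimal local linear LOESS in Python, of the kind that could smooth the RANSAC consensus points into a nonlinear correction curve. This sketch follows the usual tricube-weighted local linear fit but omits the robustness iterations of Cleveland's full procedure; the `frac` span parameter and the toy data are assumptions, not MZmine 2 settings.

```python
def loess(xs, ys, frac=0.5):
    """Local linear regression with tricube weights (no robustness iterations)."""
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbourhood size; the k-th neighbour gets weight 0
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weight (1 - u^3)^3 for u = |x - x0| / h < 1, zero otherwise.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local weighted mean if degenerate
        fitted.append(my + slope * (x0 - mx))  # weighted line evaluated at x0
    return fitted


xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + 1.0 for x in xs]
smoothed = loess(xs, ys, frac=0.5)
print(max(abs(s - y) for s, y in zip(smoothed, ys)) < 1e-9)  # True
```

A local linear fit reproduces exactly linear data, so the check above is a sanity test; on curved retention-time drift the local fits follow the curvature that a single global line would miss.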

Journal Article
TL;DR: METLIN includes an annotated list of known metabolite structural information that is easily cross-correlated with its catalogue of high-resolution Fourier transform mass spectrometry (FTMS) spectra, tandem mass spectrometry (MS/MS) spectra, and LC/MS data.
Abstract: Endogenous metabolites have gained increasing interest over the past 5 years largely for their implications in diagnostic and pharmaceutical biomarker discovery. METLIN (http://metlin.scripps.edu), a freely accessible web-based data repository, has been developed to assist in a broad array of metabolite research and to facilitate metabolite identification through mass analysis. METLIN includes an annotated list of known metabolite structural information that is easily cross-correlated with its catalogue of high-resolution Fourier transform mass spectrometry (FTMS) spectra, tandem mass spectrometry (MS/MS) spectra, and LC/MS data.

1,953 citations


"MZmine 2: Modular framework for pro..." refers methods in this paper

  • ...In MZmine 2, identification of peaks can be performed either by searching a custom database of m/z values and retention times, or by connecting to an online resource such as PubChem [15], KEGG [16], METLIN [17], or HMDB [18] directly from the MZmine 2 interface (Figure 4)....


  • ...Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, Custodio DE, Abagyan R, Siuzdak G: METLIN: a metabolite mass spectral database....

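Searching a custom database of m/z values and retention times, as described in the quoted passage, amounts to matching each peak within an m/z tolerance and a retention-time tolerance. The Python sketch below assumes a ppm-based m/z tolerance and an absolute retention-time window in minutes; the entries `compound A` to `compound C`, their values, and the default tolerances are hypothetical, not MZmine 2's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbEntry:
    name: str
    mz: float  # reference m/z
    rt: float  # reference retention time, minutes


def ppm_error(observed, reference):
    """Relative m/z deviation in parts per million."""
    return (observed - reference) / reference * 1e6


def identify(mz, rt, database, mz_tol_ppm=5.0, rt_tol_min=0.2):
    """Return (name, ppm error) for entries matching both m/z and RT, best m/z match first."""
    hits = []
    for entry in database:
        err = ppm_error(mz, entry.mz)
        if abs(err) <= mz_tol_ppm and abs(rt - entry.rt) <= rt_tol_min:
            hits.append((entry.name, round(err, 2)))
    return sorted(hits, key=lambda h: abs(h[1]))


# Hypothetical custom database.
db = [
    DbEntry("compound A", 195.0877, 3.10),
    DbEntry("compound B", 195.0885, 5.40),
    DbEntry("compound C", 301.1410, 3.05),
]
print(identify(195.0880, 3.15, db))  # [('compound A', 1.54)]
print(identify(195.0885, 5.35, db))  # [('compound B', 0.0)]
```

In both queries, the other near-isobaric entry falls inside the m/z tolerance but is rejected by the retention-time window, which is why combining the two criteria narrows the candidate list.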
