Author

Mathijs Vogelzang

Bio: Mathijs Vogelzang is an academic researcher from the Institute for Systems Biology. The author has contributed to research in the topics Mass spectrometry data format and Data management. The author has an h-index of 2 and has co-authored 2 publications receiving 951 citations.

Papers
Journal Article
TL;DR: The 'mzXML' format is introduced, an open, generic XML (extensible markup language) representation of MS data that will facilitate data management, interpretation and dissemination in proteomics research.
Abstract: A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.

788 citations

Journal Article
TL;DR: A robust method is developed that detects high-quality spectra within the fraction of spectra unassigned by conventional sequence database searching and computes a quality score for each spectrum. It is demonstrated that iterative search strategies applied to the detected high-quality spectra significantly increase the number of spectra that can be assigned from datasets, and that biologically interesting new insights can be gained from existing data.

203 citations


Cited by
Journal Article
TL;DR: The ProteoWizard Toolkit is developed, a robust set of open-source, software libraries and applications designed to facilitate proteomics research that implements the first-ever, non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats.
Abstract: Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2], and characterize protein complexes [3]. Despite successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access [4,5]. In response, we have developed the ProteoWizard Toolkit, a robust set of open-source, software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever, non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. Additionally, ProteoWizard projects and applications, building upon the core libraries, are becoming standard tools for enabling significant proteomics inquiries.

2,480 citations

Journal Article
TL;DR: This review presents an overview of the dynamically developing field of mass spectrometry-based metabolomics, a technique that analyzes all detectable analytes in a given sample with subsequent classification of samples and identification of differentially expressed metabolites, which define the sample classes.
Abstract: This review presents an overview of the dynamically developing field of mass spectrometry-based metabolomics. Metabolomics aims at the comprehensive and quantitative analysis of wide arrays of metabolites in biological samples. These numerous analytes have very diverse physico-chemical properties and occur at different abundance levels. Consequently, comprehensive metabolomics investigations are primarily a challenge for analytical chemistry, and mass spectrometry specifically has vast potential as a tool for this type of investigation. Metabolomics requires special approaches for sample preparation, separation, and mass spectrometric analysis. Current examples of those approaches are described in this review. It primarily focuses on metabolic fingerprinting, a technique that analyzes all detectable analytes in a given sample with subsequent classification of samples and identification of differentially expressed metabolites, which define the sample classes. To perform this complex task, data analysis tools, metabolite libraries, and databases are required. Therefore, recent advances in metabolomics bioinformatics are also discussed.

1,954 citations

Journal Article
TL;DR: The ProteoWizard project provides a modular and extensible set of open-source, cross-platform tools and libraries that perform proteomics data analyses and enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access.
Abstract: Summary: The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library contains readers and writers of the mzML data format, which has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Availability: Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge, at http://proteowizard.sourceforge.net. This website also provides code examples and documentation. It is our hope the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged. Contact: darren@proteowizard.org; parag@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.

1,611 citations

Journal Article
09 Dec 2010-Nature
TL;DR: It is demonstrated that quantitative reactivity profiling can form the basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated catalytically active from inactive cysteine hydrolase designs.
Abstract: Cysteine is the most intrinsically nucleophilic amino acid in proteins, where its reactivity is tuned to perform diverse biochemical functions. The absence of a consensus sequence that defines functional cysteines in proteins has hindered their discovery and characterization. Here we describe a proteomics method to profile quantitatively the intrinsic reactivity of cysteine residues en masse directly in native biological systems. Hyper-reactivity was a rare feature among cysteines and it was found to specify a wide range of activities, including nucleophilic and reductive catalysis and sites of oxidative modification. Hyper-reactive cysteines were identified in several proteins of uncharacterized function, including a residue conserved across eukaryotic phylogeny that we show is required for yeast viability and is involved in iron-sulphur protein biogenesis. We also demonstrate that quantitative reactivity profiling can form the basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated catalytically active from inactive cysteine hydrolase designs.

1,295 citations

Book Chapter
E.R. Davies
01 Jan 1990
TL;DR: This chapter introduces the subject of statistical pattern recognition (SPR) by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier.
Abstract: This chapter introduces the subject of statistical pattern recognition (SPR). It starts by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier. The concepts of an optimal number of features, representativeness of the training data, and the need to avoid overfitting to the training data are stressed. The chapter shows that methods such as the support vector machine and artificial neural networks are subject to these same training limitations, although each has its advantages. For neural networks, the multilayer perceptron architecture and back-propagation algorithm are described. The chapter distinguishes between supervised and unsupervised learning, demonstrating the advantages of the latter and showing how methods such as clustering and principal components analysis fit into the SPR framework. The chapter also defines the receiver operating characteristic, which allows an optimum balance between false positives and false negatives to be achieved.

1,189 citations