Journal Article

A cross-platform toolkit for mass spectrometry and proteomics

TL;DR: The ProteoWizard Toolkit is a robust set of open-source software libraries and applications designed to facilitate proteomics research; it implements the first-ever non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats.
Abstract: Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples1, identify pathways affected by endogenous and exogenous perturbations2, and characterize protein complexes3. Despite these successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access4,5. In response, we have developed the ProteoWizard Toolkit, a robust set of open-source software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. Finally, ProteoWizard projects and applications, building upon the core libraries, are becoming standard tools for enabling significant proteomics inquiries.
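To make the idea of a unified, vendor-agnostic data access interface concrete, here is a minimal Python sketch. It is not ProteoWizard's API (the real libraries are written in C++); `Spectrum`, `SpectrumSource`, `register_reader`, `open_source`, `total_ion_current` and the two toy readers are hypothetical names, and the readers return fixed in-memory data in place of real open-format and vendor-format parsers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Spectrum:
    """Format-neutral spectrum record shared by every reader (hypothetical)."""
    native_id: str
    ms_level: int
    mz: List[float]
    intensity: List[float]


class SpectrumSource(ABC):
    """Common interface: analysis code only ever talks to this class."""

    @abstractmethod
    def __iter__(self) -> Iterator[Spectrum]:
        ...


# Registry mapping a lower-case file extension to a reader class.
_READERS: Dict[str, Callable[[str], SpectrumSource]] = {}


def register_reader(extension: str):
    """Class decorator that registers a reader for one file extension."""
    def decorator(cls):
        _READERS[extension.lower()] = cls
        return cls
    return decorator


def open_source(path: str) -> SpectrumSource:
    """Pick a reader by extension so callers never branch on file format."""
    extension = path.rsplit(".", 1)[-1].lower()
    if extension not in _READERS:
        raise ValueError(f"no reader registered for .{extension}")
    return _READERS[extension](path)


@register_reader("mzml")
class ToyOpenFormatReader(SpectrumSource):
    # Stand-in for an open-format (mzML) parser; yields fixed data.
    def __init__(self, path: str):
        self.path = path

    def __iter__(self) -> Iterator[Spectrum]:
        yield Spectrum("scan=1", 1, [400.0, 500.0], [10.0, 20.0])


@register_reader("raw")
class ToyVendorReader(SpectrumSource):
    # Stand-in for a vendor-format reader; real ones wrap vendor libraries.
    def __init__(self, path: str):
        self.path = path

    def __iter__(self) -> Iterator[Spectrum]:
        yield Spectrum("scan=1", 2, [147.1], [5.0])


def total_ion_current(source: SpectrumSource) -> float:
    """Vendor-agnostic analysis: identical code for every registered format."""
    return sum(sum(s.intensity) for s in source)
```

With these toy readers, `total_ion_current(open_source("run01.mzML"))` gives 30.0 and `total_ion_current(open_source("run01.raw"))` gives 5.0. The registry keeps format-specific code in the readers, so adding a format means registering one more class rather than editing every analysis tool.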


Citations
01 Aug 2000
TL;DR: A bioentrepreneurship course on assessing medical technology in the context of commercialization, addressing many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article
TL;DR: The PX submission tool simplifies submitting data to PRIDE by automating the labor-intensive, and therefore slow and expensive, process of manually downloading and editing files.
Abstract:
5. Tools available and ways to submit data to PX
  5.1. MS/MS data submissions to PRIDE
    5.1.1. Creation of supported files for "Complete" submissions
      5.1.1.1. PRIDE XML
      5.1.1.2. mzIdentML
    5.1.2. Checking the files before submission (initial quality assessment)
    5.1.3. File submission to PRIDE: the PX submission tool
      5.1.3.1. General Information
      5.1.3.2. Functionality, Design and Implementation Details
      5.1.3.3. New open source libraries made available with PX submission tool
      5.1.3.4. PX Submission Tool Java Web Start
    5.1.4. File submission to PRIDE: Command line support using Aspera
    5.1.5. Examples of Partial submissions to PRIDE
  5.2. SRM data submissions via PASSEL

2,436 citations

Journal Article
TL;DR: The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
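The abstract does not name the non-parametric test that was used. Purely as an illustration, the Mann-Whitney U test is one common rank-based way to ask whether a gene's expression differs between two sample groups (for example tumour versus normal). The sketch below computes U with average ranks for ties and a normal-approximation z-score without a tie correction; the function names and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float],
                   group_b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for group_a, normal-approximation z-score)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u_a, (u_a - mean_u) / sd_u
```

For made-up values `mann_whitney_u([8.1, 9.4, 7.7, 10.2], [2.0, 3.1, 1.5, 2.7])`, every tumour value outranks every normal value, so U reaches its maximum of 16 and z is about 2.31. The normal approximation is only reasonable for larger groups, and a genome-wide screen would also need correction for multiple testing.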

2,209 citations

Journal Article
18 Sep 2014-Nature
TL;DR: Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
Abstract: Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. Messenger RNA transcript abundance did not reliably predict protein abundance differences between tumours. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA 'microsatellite instability/CpG island methylation phenotype' transcriptomic subtype, but had distinct mutation, methylation and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34) and SRC (SRC proto-oncogene, non-receptor tyrosine kinase). Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.

1,183 citations

Journal Article
TL;DR: This work provides a freely available computer-generated tandem mass spectral library of 212,516 spectra covering 119,200 compounds from 26 lipid compound classes, including phospholipids, glycerolipids, bacterial lipoglycans and plant glycolipids.
Abstract: Current tandem mass spectral libraries for lipid annotations in metabolomics are limited in size and diversity. We provide a freely available computer-generated tandem mass spectral library of 212,516 spectra covering 119,200 compounds from 26 lipid compound classes, including phospholipids, glycerolipids, bacterial lipoglycans and plant glycolipids. We show platform independence by using tandem mass spectra from 40 different mass spectrometer types including low-resolution and high-resolution instruments.

729 citations

References
Journal Article
19 May 2011-Nature
TL;DR: Using a quantitative model, the authors obtain the first genome-scale prediction of mRNA and protein synthesis rates and find that the cellular abundance of proteins is controlled predominantly at the level of translation.
Abstract: Gene expression is a multistep process that involves the transcription, translation and turnover of messenger RNAs and proteins. Although it is one of the most fundamental processes of life, the entire cascade has never been quantified on a genome-wide scale. Here we simultaneously measured absolute mRNA and protein abundance and turnover by parallel metabolic pulse labelling for more than 5,000 genes in mammalian cells. Whereas mRNA and protein levels correlated better than previously thought, corresponding half-lives showed no correlation. Using a quantitative model we have obtained the first genome-scale prediction of synthesis rates of mRNAs and proteins. We find that the cellular abundance of proteins is predominantly controlled at the level of translation. Genes with similar combinations of mRNA and protein stability shared functional properties, indicating that half-lives evolved under energetic and dynamic constraints. Quantitative information about all stages of gene expression provides a rich resource and helps to provide a greater understanding of the underlying design principles.
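The abstract does not spell out its quantitative model. A common first-order description of protein turnover is dp/dt = k_sp·m − k_dp·p, with the degradation constant obtained from a measured half-life as k_dp = ln 2 / t½. The sketch below assumes that form and a steady state (dp/dt = 0), and ignores dilution by cell growth; it solves for the per-mRNA translation rate constant k_sp. The function names and numbers are hypothetical, not taken from the paper.

```python
import math


def degradation_rate(half_life_h: float) -> float:
    """First-order decay constant per hour: k = ln 2 / half-life."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_h


def steady_state_synthesis_rate(protein_copies: float, mrna_copies: float,
                                protein_half_life_h: float) -> float:
    """Translation rate constant k_sp (proteins per mRNA per hour).

    Assumes dp/dt = k_sp * m - k_dp * p with dp/dt = 0 (steady state)
    and no growth dilution, which rearranges to k_sp = k_dp * p / m.
    """
    if mrna_copies <= 0:
        raise ValueError("mRNA copy number must be positive")
    return degradation_rate(protein_half_life_h) * protein_copies / mrna_copies
```

For an illustrative gene with 20 mRNA copies, 100,000 protein copies and a 50 h protein half-life, `steady_state_synthesis_rate(100000, 20, 50)` gives about 69 proteins per mRNA per hour. The same rearrangement shows why two genes with equal mRNA levels can differ widely in protein abundance when their translation or degradation constants differ.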

5,635 citations

01 Aug 2000
TL;DR: A bioentrepreneurship course on assessing medical technology in the context of commercialization, addressing many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article
TL;DR: The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM).
Abstract: Summary: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to, and imports native output files from, Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. Availability: Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.
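To illustrate what an SRM transition list holds, the sketch below pairs a peptide's precursor m/z with the m/z of several singly charged y-ion fragments, using standard monoisotopic residue masses. It is a hypothetical helper, not Skyline's algorithm or export format: it covers only the 20 unmodified residues (any other letter raises `KeyError`), and it omits instrument-specific columns such as collision energy.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) of the unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide: str, length: int) -> float:
    """Singly charged y ion made of the last `length` residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-length:]) + WATER + PROTON


def transition_list(peptide: str, charge: int,
                    min_len: int = 3) -> List[Tuple[float, float, str]]:
    """Rows of (precursor m/z, product m/z, label) for y_min_len .. y_(n-1)."""
    q1 = precursor_mz(peptide, charge)
    return [
        (round(q1, 4), round(y_ion_mz(peptide, n), 4), f"y{n}")
        for n in range(min_len, len(peptide))
    ]
```

For example, `transition_list("PEPTIDE", 2)` yields four rows (y3 to y6) sharing the doubly charged precursor m/z of about 400.687, starting with y3 at about 376.171. Very short fragments are skipped by default because they tend to be less specific.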

3,794 citations

Journal Article
TL;DR: The ProteoWizard project provides a modular and extensible set of open-source, cross-platform tools and libraries that perform proteomics data analyses and enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access.
Abstract: Summary: The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access and performs standard proteomics and LC-MS dataset computations. The library, which contains readers and writers for the mzML data format, has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Availability: Cross-platform software that compiles using native compilers (e.g. GCC on Linux, MSVC on Windows and Xcode on OS X) is available for download free of charge at http://proteowizard.sourceforge.net. This website also provides code examples and documentation. It is our hope that the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged. Contact: darren@proteowizard.org; parag@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
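mzML is XML in which peak lists are stored as base64-encoded binary arrays whose encoding is described by controlled-vocabulary terms. The standard-library Python sketch below decodes one such array from a heavily trimmed, hypothetical `<spectrum>` fragment; it is not ProteoWizard code and ignores most of the schema. It assumes the mzML conventions that binary data are little-endian, that MS:1000523 and MS:1000521 mark 64-bit and 32-bit floats, that MS:1000574 marks zlib compression, and that MS:1000511 is the "ms level" term. `make_fragment` and `read_spectrum` are illustrative names.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib
from typing import List, Tuple

NS = {"mz": "http://psi.hupo.org/ms/mzml"}


def decode_binary_array(array_elem: ET.Element) -> List[float]:
    """Decode one <binaryDataArray> using its cvParam encoding terms."""
    accessions = {p.get("accession") for p in array_elem.findall("mz:cvParam", NS)}
    raw = base64.b64decode(array_elem.findtext("mz:binary", default="", namespaces=NS))
    if "MS:1000574" in accessions:        # zlib compression
        raw = zlib.decompress(raw)
    if "MS:1000523" in accessions:        # 64-bit float
        fmt_char, width = "d", 8
    elif "MS:1000521" in accessions:      # 32-bit float
        fmt_char, width = "f", 4
    else:
        raise ValueError("unsupported binary data type")
    count = len(raw) // width
    return list(struct.unpack(f"<{count}{fmt_char}", raw))  # little-endian


def make_fragment(values: List[float]) -> str:
    """Build a trimmed spectrum fragment holding a zlib-compressed m/z array."""
    packed = struct.pack(f"<{len(values)}d", *values)
    encoded = base64.b64encode(zlib.compress(packed)).decode("ascii")
    return f"""<spectrum xmlns="http://psi.hupo.org/ms/mzml" id="scan=1" index="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <binaryDataArrayList count="1">
    <binaryDataArray>
      <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float"/>
      <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression"/>
      <cvParam cvRef="MS" accession="MS:1000514" name="m/z array"/>
      <binary>{encoded}</binary>
    </binaryDataArray>
  </binaryDataArrayList>
</spectrum>"""


def read_spectrum(xml_text: str) -> Tuple[str, int, List[float]]:
    """Return (id, ms level, first binary array) from a spectrum fragment."""
    spectrum = ET.fromstring(xml_text)
    ms_level = next(
        int(p.get("value"))
        for p in spectrum.findall("mz:cvParam", NS)
        if p.get("accession") == "MS:1000511"
    )
    array = spectrum.find("mz:binaryDataArrayList/mz:binaryDataArray", NS)
    return spectrum.get("id"), ms_level, decode_binary_array(array)
```

Calling `read_spectrum(make_fragment([147.1128, 276.1554]))` returns `("scan=1", 2, [147.1128, 276.1554])`, since packing and unpacking 64-bit floats round-trips exactly. A production reader would also stream large files and handle the intensity array and other compression schemes.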

1,611 citations

Journal Article
TL;DR: The mapping of a protein interaction network around 32 known and candidate TNF-α/NF-κB pathway components is reported by using an integrated approach comprising tandem affinity purification, liquid-chromatography tandem mass spectrometry, network analysis and directed functional perturbation studies using RNA interference.
Abstract: Signal transduction pathways are modular composites of functionally interdependent sets of proteins that act in a coordinated fashion to transform environmental information into a phenotypic response. The pro-inflammatory cytokine tumour necrosis factor (TNF)-α triggers a signalling cascade, converging on the activation of the transcription factor NF-κB, which forms the basis for numerous physiological and pathological processes. Here we report the mapping of a protein interaction network around 32 known and candidate TNF-α/NF-κB pathway components by using an integrated approach comprising tandem affinity purification, liquid-chromatography tandem mass spectrometry, network analysis and directed functional perturbation studies using RNA interference. We identified 221 molecular associations and 80 previously unknown interactors, including 10 new functional modulators of the pathway. This systems approach provides significant insight into the logic of the TNF-α/NF-κB pathway and is generally applicable to other pathways relevant to human disease.

956 citations