Journal Article

Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry.

28 Oct 2003 - Analytical Chemistry (American Chemical Society) - Vol. 75, Iss. 23, pp. 6648-6657
TL;DR: The utility of the ASAPRatio program was clearly demonstrated by its speed and the accuracy of the generated protein abundance ratios and by its capability to identify specific core components of the RNA polymerase II transcription complex within a high background of copurifying proteins.
Abstract: We describe an algorithm for the automated statistical analysis of protein abundance ratios (ASAPRatio) of proteins contained in two samples. Proteins are labeled with distinct stable-isotope tags and fragmented, and the tagged peptide fragments are separated by liquid chromatography (LC) and analyzed by electrospray ionization (ESI) tandem mass spectrometry (MS/MS). The algorithm utilizes the signals recorded for the different isotopic forms of peptides of identical sequence and numerical and statistical methods, such as Savitzky-Golay smoothing filters, statistics for weighted samples, and Dixon's test for outliers, to evaluate protein abundance ratios and their associated errors. The algorithm also provides a statistical assessment to distinguish proteins of significant abundance changes from a population of proteins of unchanged abundance. To evaluate its performance, two sets of LC-ESI-MS/MS data were analyzed by the ASAPRatio algorithm without human intervention, and the data were related to the expected and manually validated values. The utility of the ASAPRatio program was clearly demonstrated by its speed and the accuracy of the generated protein abundance ratios and by its capability to identify specific core components of the RNA polymerase II transcription complex within a high background of copurifying proteins.
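The abstract names two of the statistical ingredients, Dixon's test for outliers and statistics for weighted samples. The following is a minimal sketch of how such steps could combine peptide-level ratios into one protein-level ratio. It is not the ASAPRatio implementation: the function names, the choice to work with log ratios, the weights, and the table of commonly tabulated two-sided 95% critical values for Dixon's Q are all assumptions made for illustration, and the critical values should be checked against a statistics reference before real use.

```python
import math

# Assumption: commonly tabulated two-sided 95% critical values for Dixon's Q,
# indexed by sample size n = 3..10. Verify before relying on them.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def reject_outliers(points, crit=Q95):
    """Drop extreme values one at a time while Dixon's Q exceeds the threshold.

    points: list of (log_ratio, weight) tuples (hypothetical input layout).
    Samples whose size is not in `crit` are returned without further testing.
    """
    kept = sorted(points)  # sorts by log_ratio first
    while len(kept) in crit:
        lowest, highest = kept[0][0], kept[-1][0]
        spread = highest - lowest
        if spread == 0:
            break  # all values identical, nothing to reject
        q_low = (kept[1][0] - lowest) / spread    # gap below the rest
        q_high = (highest - kept[-2][0]) / spread  # gap above the rest
        if max(q_low, q_high) <= crit[len(kept)]:
            break  # no value is extreme enough to reject
        kept = kept[:-1] if q_high >= q_low else kept[1:]
    return kept


def protein_ratio(peptide_ratios):
    """Combine peptide ratios into a protein ratio with a weighted mean.

    peptide_ratios: list of (ratio, weight) with ratio > 0 and weight > 0.
    Returns (protein ratio, weighted std. dev. of log ratios, peptides kept).
    """
    points = [(math.log(ratio), weight) for ratio, weight in peptide_ratios]
    kept = reject_outliers(points)
    total_weight = sum(w for _, w in kept)
    mean = sum(x * w for x, w in kept) / total_weight
    variance = sum(w * (x - mean) ** 2 for x, w in kept) / total_weight
    return math.exp(mean), math.sqrt(variance), len(kept)


if __name__ == "__main__":
    # Four peptides near a 1:1 ratio plus one aberrant measurement.
    example = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.0), (1.05, 1.0), (10.0, 1.0)]
    ratio, spread, n_kept = protein_ratio(example)
    print(f"ratio={ratio:.3f} log-sd={spread:.3f} peptides kept={n_kept}")
```

Averaging in log space treats a twofold increase and a twofold decrease symmetrically, which is why this sketch uses it; a real pipeline would derive the weights from the measured peak areas or their errors rather than setting them by hand.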


Citations
Journal Article
TL;DR: Serac, software developed to evaluate the ability of each method to quantify relative changes in protein abundance, is described; overall, spectral counting proved to be a more sensitive method for detecting proteins that undergo changes in abundance, whereas peak area intensity measurements yielded more accurate estimates of protein ratios.

1,241 citations


Cites background or methods from "Automated statistical analysis of p..."

  • ...3B, open squares for BSA), although combining the information from multiple peptides for each protein increased the confidence in the estimated ratio in much the same way that ratios from stable isotope analysis are combined to form protein ratios (32, 33)....

    [...]

  • ...Protein Ratios from Peak Area Intensities—The Serac PASC module calculates ratios of ion intensities for peptides matched between different experiments and averages the peptide ratios as a measure of protein change as in stable isotope labeling studies (32, 33)....

    [...]

Journal Article
TL;DR: The difficulties of interpreting shotgun proteomic data are illustrated, and the need for common nomenclature and transparent informatic approaches is discussed, along with related issues such as the state of protein sequence databases and their role in shotgun proteomics analysis, interpretation of relative peptide quantification data in the presence of multiple protein isoforms, and the integration of proteomic and transcriptional data.

983 citations


Cites methods from "Automated statistical analysis of p..."

  • ...Quantitative information (not discussed in the original publication) was extracted from the data using an automated tool, ASAPRatio (63), and then confirmed by manual inspection....

    [...]

Journal Article
TL;DR: The 'mzXML' format is introduced, an open, generic XML (extensible markup language) representation of MS data that will facilitate data management, interpretation and dissemination in proteomics research.
Abstract: A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.

788 citations

Journal Article
TL;DR: The full workflow of the TPP is described, along with an example on a sample data set, demonstrating that the setup and use of the tools are straightforward and well supported and do not require specialized informatic resources or knowledge.
Abstract: The Trans-Proteomic Pipeline (TPP) is a suite of software tools for the analysis of MS/MS data sets. The tools encompass most of the steps in a proteomic data analysis workflow in a single, integrated software system. Specifically, the TPP supports all steps from spectrometer output file conversion to protein-level statistical validation, including quantification by stable isotope ratios. We describe here the full workflow of the TPP and the tools therein, along with an example on a sample data set, demonstrating that the setup and use of the tools are straightforward and well supported and do not require specialized informatic resources or knowledge.

756 citations


Cites background or methods from "Automated statistical analysis of p..."

  • ...Within this workflow, the quantification analysis tools XPRESS [17], ASAPRatio [18], or Libra [19] may be used with data that derive from isotopically or isobarically labeled samples....

    [...]

  • ...The more recent ASAPRatio [18] is more sophisticated in its measurement of, and aggregation of measurements from, multiple peptide ions from the same peptide, as well as aggregation at the protein level....

    [...]

  • ...Step 9 of the tutorial demonstrates the use of ASAPRatio on the sample data set to derive abundance ratios for the two samples....

    [...]

Journal Article
TL;DR: The Trans‐Proteomic Pipeline is described, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels, and enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a range of different database search programs.
Abstract: The analysis of tandem mass (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and alternative methods that assign peptides to MS/MS spectra and infer protein identifications from those peptide assignments each write their results in different formats. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a variety of different database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.

726 citations


Cites background from "Automated statistical analysis of p..."

  • ...…many of these steps: PeptideProphet validates peptides assigned to MS/MS spectra (Keller et al, 2002a), XPRESS (Han et al, 2001) and ASAPRatio (Li et al, 2003) quantitate peptides and proteins in differentially labeled samples, Pep3D enables a view of the raw spectral data (Li et al, 2004), and…...

    [...]

References
Journal Article
TL;DR: A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations

Journal Article
TL;DR: A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample, and it is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications.
Abstract: A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation−maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identif...

4,544 citations

Journal Article
04 May 2001 - Science
TL;DR: An integrated approach to build, test, and refine a model of a cellular pathway, in which perturbations to critical pathway components are analyzed using DNA microarrays, quantitative proteomics, and databases of known physical interactions, suggests hypotheses about the regulation of galactose utilization and physical interactions between this and a variety of other metabolic pathways.
Abstract: We demonstrate an integrated approach to build, test, and refine a model of a cellular pathway, in which perturbations to critical pathway components are analyzed using DNA microarrays, quantitative proteomics, and databases of known physical interactions. Using this approach, we identify 997 messenger RNAs responding to 20 systematic perturbations of the yeast galactose-utilization pathway, provide evidence that approximately 15 of 289 detected proteins are regulated posttranscriptionally, and identify explicit physical interactions governing the cellular response to each perturbation. We refine the model through further iterations of perturbation and global measurements, suggesting hypotheses about the regulation of galactose utilization and physical interactions between this and a variety of other metabolic pathways.

2,056 citations

Journal Article
TL;DR: This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.
Abstract: The function of many of the uncharacterized open reading frames discovered by genomic sequencing can be determined at the level of expressed gene products, the proteome. However, identifying the cognate gene from minute amounts of protein has been one of the major problems in molecular biology. Using yeast as an example, we demonstrate here that mass spectrometric protein identification is a general solution to this problem given a completely sequenced genome. As a first screen, our strategy uses automated laser desorption ionization mass spectrometry of the peptide mixtures produced by in-gel tryptic digestion of a protein. Up to 90% of proteins are identified by searching sequence data bases by lists of peptide masses obtained with high accuracy. The remaining proteins are identified by partially sequencing several peptides of the unseparated mixture by nanoelectrospray tandem mass spectrometry followed by data base searching with multiple peptide sequence tags. In blind trials, the method led to unambiguous identification in all cases. In the largest individual protein identification project to date, a total of 150 gel spots, many of them at subpicomole amounts, were successfully analyzed, greatly enlarging a yeast two-dimensional gel data base. More than 32 proteins were novel and matched to previously uncharacterized open reading frames in the yeast genome. This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.

1,456 citations

Journal Article
TL;DR: Stable isotopic amino acids in cell culture (SILAC) is employed to differentially label proteins in EGF-stimulated versus unstimulated cells, and SILAC combined with modification-based affinity purification is shown to be a useful approach to detect specific and functional protein-protein interactions.
Abstract: Mass spectrometry-based proteomics can reveal protein-protein interactions on a large scale, but it has been difficult to separate background binding from functionally important interactions and still preserve weak binders. To investigate the epidermal growth factor receptor (EGFR) pathway, we employ stable isotopic amino acids in cell culture (SILAC) to differentially label proteins in EGF-stimulated versus unstimulated cells. Combined cell lysates were affinity-purified over the SH2 domain of the adapter protein Grb2 (GST-SH2 fusion protein) that specifically binds phosphorylated EGFR and Src homologous and collagen (Shc) protein. We identified 228 proteins, of which 28 were selectively enriched upon stimulation. EGFR and Shc, which interact directly with the bait, had large differential ratios. Many signaling molecules specifically formed complexes with the activated EGFR-Shc, as did plectin, epiplakin, cytokeratin networks, histone H3, the glycosylphosphatidylinositol (GPI)-anchored molecule CD59, and two novel proteins. SILAC combined with modification-based affinity purification is a useful approach to detect specific and functional protein-protein interactions.

730 citations