Journal ArticleDOI

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

01 Mar 2007-Nature Methods (Nature Publishing Group)-Vol. 4, Iss: 3, pp 207-214
TL;DR: This work clarifies the preferred methodology by addressing four issues based on observed decoy hit frequencies: the major assumptions made with this database search strategy are reasonable; concatenated target-decoy database searches are preferable to separate target and decoy database searches; the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
Abstract: Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
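The strategy summarized in the abstract can be illustrated with a minimal sketch. It assumes reversed protein sequences as decoys, a single concatenated target-decoy search, and the commonly used estimate that false positives among passing hits number roughly twice the passing decoy hits. The names `PSM`, `reverse_decoys`, `estimated_fp_rate` and the `REV_` prefix are hypothetical and chosen only for illustration; they are not taken from the paper or from any search engine.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide-spectrum match from a concatenated target-decoy search."""
    score: float
    is_decoy: bool


def reverse_decoys(targets: dict[str, str], prefix: str = "REV_") -> dict[str, str]:
    """Return the target entries plus one reversed decoy per target sequence."""
    combined = dict(targets)
    for name, sequence in targets.items():
        # Assumption: simple full-sequence reversal; other decoy schemes exist.
        combined[prefix + name] = sequence[::-1]
    return combined


def estimated_fp_rate(psms: list[PSM], threshold: float) -> float:
    """Estimate the FP rate among PSMs scoring at or above the threshold.

    Assumes incorrect matches hit target and decoy sequences equally often,
    so false positives are approximated as 2 x passing decoy hits.
    """
    passing = [p for p in psms if p.score >= threshold]
    if not passing:
        return 0.0
    decoy_hits = sum(p.is_decoy for p in passing)
    return min(1.0, 2 * decoy_hits / len(passing))
```

In practice the threshold would be lowered or raised until the estimated rate reaches an acceptable level, for example 1%.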
Citations
Journal ArticleDOI
TL;DR: MaxQuant, an integrated suite of algorithms specifically developed for high-resolution, quantitative MS data, detects peaks, isotope clusters and stable amino acid isotope–labeled (SILAC) peptide pairs as three-dimensional objects in m/z, elution time and signal intensity space and achieves mass accuracy in the p.p.b. range.
Abstract: Efficient analysis of very large amounts of raw data for peptide identification and protein quantification is a principal challenge in mass spectrometry (MS)-based proteomics. Here we describe MaxQuant, an integrated suite of algorithms specifically developed for high-resolution, quantitative MS data. Using correlation analysis and graph theory, MaxQuant detects peaks, isotope clusters and stable amino acid isotope-labeled (SILAC) peptide pairs as three-dimensional objects in m/z, elution time and signal intensity space. By integrating multiple mass measurements and correcting for linear and nonlinear mass offsets, we achieve mass accuracy in the p.p.b. range, a sixfold increase over standard techniques. We increase the proportion of identified fragmentation spectra to 73% for SILAC peptide pairs via unambiguous assignment of isotope and missed-cleavage state and individual mass precision. MaxQuant automatically quantifies several hundred thousand peptides per SILAC-proteome experiment and allows statistically robust identification and quantification of >4,000 proteins in mammalian cell lysates.

12,340 citations


Cites methods from "Target-decoy search strategy for in..."

  • ...We use a database containing all true protein sequences, concatenated with reversed nonsense versions of these sequence...

    [...]

Journal ArticleDOI
04 Sep 2008-Nature
TL;DR: It is shown that a single miRNA can repress the production of hundreds of proteins, but that this repression is typically relatively mild, and the data suggest that a miRNA can, by direct or indirect effects, tune protein synthesis from thousands of genes.
Abstract: Animal microRNAs (miRNAs) regulate gene expression by inhibiting translation and/or by inducing degradation of target messenger RNAs. It is unknown how much translational control is exerted by miRNAs on a genome-wide scale. We used a new proteomic approach to measure changes in synthesis of several thousand proteins in response to miRNA transfection or endogenous miRNA knockdown. In parallel, we quantified mRNA levels using microarrays. Here we show that a single miRNA can repress the production of hundreds of proteins, but that this repression is typically relatively mild. A number of known features of the miRNA-binding site such as the seed sequence also govern repression of human protein synthesis, and we report additional target sequence characteristics. We demonstrate that, in addition to downregulating mRNA levels, miRNAs also directly repress translation of hundreds of genes. Finally, our data suggest that a miRNA can, by direct or indirect effects, tune protein synthesis from thousands of genes.

3,412 citations

Journal ArticleDOI
TL;DR: An updated protocol covering the most important basic computational workflows for mass-spectrometry-based proteomics data analysis, including those designed for quantitative label-free proteomics, MS1-level labeling and isobaric labeling techniques is presented.
Abstract: MaxQuant is one of the most frequently used platforms for mass-spectrometry (MS)-based proteomics data analysis. Since its first release in 2008, it has grown substantially in functionality and can be used in conjunction with more MS platforms. Here we present an updated protocol covering the most important basic computational workflows, including those designed for quantitative label-free proteomics, MS1-level labeling and isobaric labeling techniques. This protocol presents a complete description of the parameters used in MaxQuant, as well as of the configuration options of its integrated search engine, Andromeda. This protocol update describes an adaptation of an existing protocol that substantially modifies the technique. Important concepts of shotgun proteomics and their implementation in MaxQuant are briefly reviewed, including different quantification strategies and the control of false-discovery rates (FDRs), as well as the analysis of post-translational modifications (PTMs). The MaxQuant output tables, which contain information about quantification of proteins and PTMs, are explained in detail. Furthermore, we provide a short version of the workflow that is applicable to data sets with simple and standard experimental designs. The MaxQuant algorithms are efficiently parallelized on multiple processors and scale well from desktop computers to servers with many cores. The software is written in C# and is freely available at http://www.maxquant.org.

2,811 citations

Journal ArticleDOI
TL;DR: It is found that TGF-β was required for the in vitro development of microglia that express the microglial molecular signature characteristic of adult microglia and that microglia were absent in the CNS of TGF-β1-deficient mice.
Abstract: Microglia are myeloid cells of the CNS that participate both in normal CNS function and in disease. We investigated the molecular signature of microglia and identified 239 genes and 8 microRNAs that were uniquely or highly expressed in microglia versus myeloid and other immune cells. Of the 239 genes, 106 were enriched in microglia as compared with astrocytes, oligodendrocytes and neurons. This microglia signature was not observed in microglial lines or in monocytes recruited to the CNS, and was also observed in human microglia. We found that TGF-β was required for the in vitro development of microglia that express the microglial molecular signature characteristic of adult microglia and that microglia were absent in the CNS of TGF-β1-deficient mice. Our results identify a unique microglial signature that is dependent on TGF-β signaling and provide insights into microglial biology and the possibility of targeting microglia for the treatment of CNS disease.

1,902 citations

Journal ArticleDOI
11 Jul 2008-Cell
TL;DR: This work predicts 19 proteins to be important for the function of complex I (CI) of the electron transport chain and validates a subset of these predictions using RNAi, including C8orf38, which is shown to have an inherited mutation in a lethal, infantile CI deficiency.

1,836 citations


Cites methods from "Target-decoy search strategy for in..."

  • ...- reverse database search, based on the target-decoy strategy (Elias and Gygi, 2007) and...

    [...]

References
Journal ArticleDOI
TL;DR: A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations

Journal ArticleDOI
TL;DR: The approach described in this manuscript provides a convenient method to interpret tandem mass spectra with known sequences in a protein database.

6,317 citations


"Target-decoy search strategy for in..." refers background in this paper

  • ...and applying scoring filters based on how the filters performed on an often smaller and unrelated training data se...

    [...]

Journal ArticleDOI
TL;DR: A statistical model is presented to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST, demonstrating that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides.
Abstract: We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.

4,861 citations
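The expectation maximization idea in the abstract above can be sketched as follows. This is a deliberately simplified illustration, not the published model: it assumes a single search score per peptide assignment and models correct and incorrect assignments as two Gaussian components, whereas the abstract describes a model that also uses the number of tryptic termini. The function names `normal_pdf` and `fit_two_component_em` are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def fit_two_component_em(scores, iterations=100):
    """Fit a two-Gaussian mixture by EM.

    Returns, for each score, the posterior probability of belonging to the
    higher-scoring component (assumed here to represent correct assignments).
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    mean_bad, mean_good = lo, hi
    sd_bad = sd_good = max((hi - lo) / 4, 1e-3)
    weight_good = 0.5
    n = len(scores)
    post = [0.5] * n
    for _ in range(iterations):
        # E-step: posterior probability that each score comes from the "good" component.
        post = []
        for s in scores:
            g = weight_good * normal_pdf(s, mean_good, sd_good)
            b = (1 - weight_good) * normal_pdf(s, mean_bad, sd_bad)
            post.append(g / (g + b) if g + b > 0 else 0.5)
        # M-step: re-estimate mixture weight, means and standard deviations.
        n_good = sum(post)
        n_bad = n - n_good
        if n_good < 1e-9 or n_bad < 1e-9:
            break  # one component has collapsed; keep the current posteriors
        mean_good = sum(p * s for p, s in zip(post, scores)) / n_good
        mean_bad = sum((1 - p) * s for p, s in zip(post, scores)) / n_bad
        var_good = sum(p * (s - mean_good) ** 2 for p, s in zip(post, scores)) / n_good
        var_bad = sum((1 - p) * (s - mean_bad) ** 2 for p, s in zip(post, scores)) / n_bad
        # Floor the standard deviations to avoid division by zero.
        sd_good = max(math.sqrt(var_good), 1e-3)
        sd_bad = max(math.sqrt(var_bad), 1e-3)
        weight_good = n_good / n
    return post
```

Assignments could then be filtered by posterior probability, with the expected error rate of the accepted set estimated from the probabilities themselves.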

Journal ArticleDOI
TL;DR: Strong cation exchange (SCX) and reversed-phase (RP) chromatography were combined to achieve two-dimensional separation prior to MS/MS, and 1,504 yeast proteins were unambiguously identified in a single analysis.
Abstract: Highly complex protein mixtures can be directly analyzed after proteolysis by liquid chromatography coupled with tandem mass spectrometry (LC−MS/MS). In this paper, we have utilized the combination of strong cation exchange (SCX) and reversed-phase (RP) chromatography to achieve two-dimensional separation prior to MS/MS. One milligram of whole yeast protein was proteolyzed and separated by SCX chromatography (2.1 mm i.d.) with fraction collection every minute during an 80-min elution. Eighty fractions were reduced in volume and then re-injected via an autosampler in an automated fashion using a vented-column (100 μm i.d.) approach for RP-LC−MS/MS analysis. More than 162 000 MS/MS spectra were collected with 26 815 matched to yeast peptides (7537 unique peptides). A total of 1504 yeast proteins were unambiguously identified in this single analysis. We present a comparison of this experiment with a previously published yeast proteome analysis by Yates and colleagues (Washburn, M. P.; Wolters, D.; Yates, J. ...

1,654 citations


"Target-decoy search strategy for in..." refers background in this paper

  • ...This is most simply done by reversing the target protein sequence...

    [...]

Journal ArticleDOI
TL;DR: A large-scale phosphorylation data set is provided with a measured error rate as determined by the target-decoy approach, an approach to maximize data set sensitivity by efficiently distracting incorrect peptide spectral matches (PSMs) is demonstrated, and a probability-based score is presented, the Ascore, that measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra.
Abstract: Data analysis and interpretation remain major logistical challenges when attempting to identify large numbers of protein phosphorylation sites by nanoscale reverse-phase liquid chromatography/tandem mass spectrometry (LC-MS/MS) (Supplementary Figure 1 online). In this report we address challenges that are often only addressable by laborious manual validation, including data set error, data set sensitivity and phosphorylation site localization. We provide a large-scale phosphorylation data set with a measured error rate as determined by the target-decoy approach, we demonstrate an approach to maximize data set sensitivity by efficiently distracting incorrect peptide spectral matches (PSMs), and we present a probability-based score, the Ascore, that measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra. We applied our methods in a fully automated fashion to nocodazole-arrested HeLa cell lysate where we identified 1,761 nonredundant phosphorylation sites from 491 proteins with a peptide false-positive rate of 1.3%.

1,465 citations