Author

Andrew Keller

Other affiliations: Institute for Systems Biology
Bio: Andrew Keller is an academic researcher at the University of Washington. The author has contributed to research on proteomics and tandem mass spectrometry, has an h-index of 21, and has co-authored 39 publications that have received 11,291 citations. Previous affiliations of Andrew Keller include the Institute for Systems Biology.

Papers
Journal Article
TL;DR: A statistical model is presented to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST, demonstrating that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides.
Abstract: We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.

4,861 citations
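
The abstract above describes using expectation maximization to separate correct from incorrect search results and to compute a probability for each assignment. A minimal sketch of that general idea follows. It is not the published model: it assumes a single hypothetical discriminant score per assignment, assumes both score distributions are Gaussian (the abstract does not say which distributions are used), and ignores the number of tryptic termini. The function names and parameters are illustrative.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_em(scores, n_iter=200):
    """Fit a two-Gaussian mixture to 1-D scores with EM.

    Assumption: the higher-scoring component represents correct assignments.
    Returns (prior_correct, (mean_c, sd_c), (mean_i, sd_i), posteriors),
    where posteriors[k] estimates P(correct | scores[k]).
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Crude initialisation from the lower and upper quartiles.
    mean_i = ordered[n // 4]
    mean_c = ordered[(3 * n) // 4]
    sd_i = sd_c = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    prior_c = 0.5
    posteriors = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability that each score is "correct".
        posteriors = []
        for x in scores:
            p_c = prior_c * normal_pdf(x, mean_c, sd_c)
            p_i = (1.0 - prior_c) * normal_pdf(x, mean_i, sd_i)
            total = p_c + p_i
            posteriors.append(p_c / total if total > 0.0 else 0.5)

        # M-step: re-estimate the mixing proportion, means and spreads.
        w_c = sum(posteriors)
        w_i = n - w_c
        if w_c < 1e-9 or w_i < 1e-9:
            break  # one component has collapsed; stop updating
        prior_c = w_c / n
        mean_c = sum(p * x for p, x in zip(posteriors, scores)) / w_c
        mean_i = sum((1.0 - p) * x for p, x in zip(posteriors, scores)) / w_i
        var_c = sum(p * (x - mean_c) ** 2 for p, x in zip(posteriors, scores)) / w_c
        var_i = sum((1.0 - p) * (x - mean_i) ** 2 for p, x in zip(posteriors, scores)) / w_i
        sd_c = max(math.sqrt(var_c), 1e-3)  # floor avoids a zero-width component
        sd_i = max(math.sqrt(var_i), 1e-3)

    return prior_c, (mean_c, sd_c), (mean_i, sd_i), posteriors


# Synthetic scores for illustration only; these are not real search results.
random.seed(0)
demo_scores = [random.gauss(0.0, 1.0) for _ in range(700)]
demo_scores += [random.gauss(5.0, 1.0) for _ in range(300)]
prior_correct, correct_params, incorrect_params, probs = fit_two_component_em(demo_scores)
```

Filtering on the returned posteriors (for example, keeping assignments with probability above a chosen cutoff) is what would give a predictable error rate, provided the distributional assumptions hold for the data at hand.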

Journal Article
TL;DR: A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample, and it is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications.
Abstract: A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation−maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identif...

4,544 citations

Journal Article
TL;DR: The Trans‐Proteomic Pipeline is described, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels, and enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a range of different database search programs.
Abstract: The analysis of tandem mass (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and alternative methods that assign peptides to MS/MS spectra and infer protein identifications from those peptide assignments each write their results in different formats. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a variety of different database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.

726 citations

Journal Article
TL;DR: The use of per-methyl esterification of peptides for relative quantification of proteins between two mixtures of proteins and automated de novo sequence derivation on the same dataset is demonstrated.
Abstract: We have demonstrated the use of per-methyl esterification of peptides for relative quantification of proteins between two mixtures of proteins and automated de novo sequence derivation on the same dataset. Protein mixtures for comparison were digested to peptides and resultant peptides methylated using either d0- or d3-methanol. Methyl esterification of peptides converted carboxylic acids, such as are present on the side chains of aspartic and glutamic acid as well as the carboxyl terminus, to their corresponding methyl esters. The separate d0- and d3-methylated peptide mixtures were combined and the mixture subjected to microcapillary high performance liquid chromatography/tandem mass spectrometry (HPLC/MS/MS). Parent proteins of methylated peptides were identified by correlative database searching of peptide tandem mass spectra. Ratios of proteins in the two original mixtures could be calculated by normalization of the area under the curve for identical charge states of d0- to d3-methylated peptides. An algorithm was developed that derived, without intervention, peptide sequence de novo by comparison of tandem mass spectra of d0- and d3-peptide methyl esters.

295 citations

Journal Article
TL;DR: This work describes a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of several methods to identify peptides.
Abstract: Several methods have been used to identify peptides that correspond to tandem mass spectra. In this work, we describe a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of these methods. As an example, these spectra were searched by the SEQUEST application against a human peptide sequence database. The numbers of resulting correct and incorrect peptide assignments were then determined. We show how the sensitivity and error rate are affected by the use of various filtering criteria based upon SEQUEST scores and the number of tryptic termini of assigned peptides.

274 citations
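
The abstract above reports how sensitivity and error rate change under different filtering criteria when the correct answers are known from a control mixture. The sketch below shows one way such an evaluation could be computed. The definitions are assumptions, since the abstract does not state them: sensitivity is taken as the fraction of all correct assignments that pass the filter, and error rate as the fraction of passing assignments that are incorrect. The `Assignment` fields, the single `score` value, and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    score: float          # hypothetical database search score
    tryptic_termini: int  # number of tryptic termini: 0, 1 or 2
    is_correct: bool      # known because the sample composition is known


def evaluate_filter(assignments, min_score, min_termini):
    """Return (sensitivity, error_rate) for a score/termini filter.

    Assumed definitions:
      sensitivity = correct assignments passing / all correct assignments
      error_rate  = incorrect assignments passing / all assignments passing
    """
    passed = [
        a for a in assignments
        if a.score >= min_score and a.tryptic_termini >= min_termini
    ]
    total_correct = sum(a.is_correct for a in assignments)
    passed_correct = sum(a.is_correct for a in passed)
    sensitivity = passed_correct / total_correct if total_correct else 0.0
    error_rate = (len(passed) - passed_correct) / len(passed) if passed else 0.0
    return sensitivity, error_rate


# Toy values for illustration only; these are not taken from the data set.
toy = [
    Assignment(3.5, 2, True),
    Assignment(2.8, 2, True),
    Assignment(1.9, 1, True),
    Assignment(3.1, 0, False),
    Assignment(1.2, 2, False),
    Assignment(0.8, 1, False),
]
loose = evaluate_filter(toy, min_score=2.0, min_termini=0)
strict = evaluate_filter(toy, min_score=2.0, min_termini=2)
```

Comparing `loose` with `strict` shows the kind of trade-off the abstract refers to: adding the termini requirement can lower the error rate, possibly at some cost in sensitivity.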


Cited by
Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, +245 more; Institutions (29)
15 Feb 2001, Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
13 Mar 2003, Nature
TL;DR: The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.
Abstract: Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein-protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.

6,597 citations

Journal Article
TL;DR: SILAC is a simple, inexpensive, and accurate procedure that can be used as a quantitative proteomic approach in any cell culture system and is applied to the relative quantitation of changes in protein expression during the process of muscle cell differentiation.

5,653 citations
