Author

Matthew Tepel

Bio: Matthew Tepel is an academic researcher from Harvard University. The author has contributed to research on de novo peptide sequencing and tandem mass spectrometry. The author has an h-index of 3 and has co-authored 3 publications receiving 472 citations.

Papers
Proceedings Article
01 Feb 2000
TL;DR: The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. The paper solves it by implicitly transforming the spectral data into an NC-spectrum graph G = (V, E) where |V| = 2k + 2; the approach can also be used to discover a modified amino acid in O(|V||E|) time.
Abstract: Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged molecules of prefix and suffix peptide subsequences and then measures mass/charge ratios of ...

242 citations

Journal Article
Ting Chen, Ming-Yang Kao, Matthew Tepel, John Rush, George M. Church
TL;DR: In this paper, the authors propose a dynamic programming-based method to reconstruct the peptide sequence from given tandem mass spectral data of k ions by implicitly transforming the spectral data into an NC-spectrum graph G = (V, E).
Abstract: Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged molecules of prefix and suffix peptide subsequences and then measures mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from a given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G = (V, E) where |V| = 2k + 2, we can solve this problem in O(|V||E|) time and O(|V|^2) space using dynamic programming. For an ideal noise-free spectrum with only b- and y-ions, we improve the algorithm to O(|V| + |E|) time and O(|V|) space. Our approach can be further used to discover a modified amino acid in O(|V||E|) time. The algorithms have been implemented and tested on experimental data.

224 citations
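
The abstract above states that k ions give a graph with |V| = 2k + 2 nodes but does not spell out the construction. The following is a minimal sketch of one way to get that node count, under assumptions not taken from the source: each observed mass is already expressed as a neutral sum of residue masses (proton and water offsets are ignored), each mass contributes itself and its complement with respect to the total, and two endpoint nodes are added. The function name `build_spectrum_graph`, the tolerance, and the four-residue mass table are illustrative only; this is not the authors' implementation and it does not include their dynamic programming step.

```python
# Monoisotopic residue masses in Da for a small illustrative subset of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}


def build_spectrum_graph(observed_masses, total_residue_mass, tol=0.01):
    """Build a simplified spectrum graph (hypothetical helper, not the paper's code).

    Assumption: every observed mass is a neutral sum of residue masses.
    Each of the k masses yields two nodes (the mass and its complement),
    and two endpoint nodes (0 and the total) are added, giving 2k + 2 nodes.
    """
    nodes = [0.0, total_residue_mass]
    for m in observed_masses:
        nodes.append(m)                       # mass read as a prefix
        nodes.append(total_residue_mass - m)  # complementary interpretation
    nodes.sort()

    # Connect node i to node j when their mass difference matches one residue.
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges.append((i, j, residue))
    return nodes, edges


# Toy example: peptide G-A-S with two observed fragment masses (k = 2).
total = sum(RESIDUE_MASS[r] for r in "GAS")
nodes, edges = build_spectrum_graph(
    [RESIDUE_MASS["G"], RESIDUE_MASS["S"]], total
)
```

In this toy case every path from the first node to the last spells either the peptide or its reverse, which illustrates why a sequencing method has to decide, for each pair of complementary nodes, which interpretation to trust.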

Posted Content
Ting Chen, Ming-Yang Kao, Matthew Tepel, John Rush, George M. Church
TL;DR: In this paper, the authors propose transforming the spectral data into an NC-spectrum graph and solving the de novo peptide sequencing problem in O(|V|+|E|) time and O(|V|) space using dynamic programming.
Abstract: The tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged prefix and suffix subsequences, and then measures mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from a given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G=(V,E) where |V|=2k+2, we can solve this problem in O(|V|+|E|) time and O(|V|) space using dynamic programming. Our approach can be further used to discover a modified amino acid in O(|V||E|) time and to analyze data with other types of noise in O(|V||E|) time. Our algorithms have been implemented and tested on actual experimental data.

15 citations


Cited by
Journal Article
TL;DR: A new de novo sequencing software package, PEAKS, is described, to extract amino acid sequence information without the use of databases, using a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum.
Abstract: A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, to extract amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum. The output of the software gives amino acid sequences with confidence scores for the entire sequences, as well as an additional novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with Lutefisk, a well-known de novo sequencing software, using quadrupole-time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins.

1,239 citations

Journal Article
TL;DR: Modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization, are presented.
Abstract: This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mining are also shown.

805 citations

Journal Article
TL;DR: A new algorithm, SHERENGA, is developed for de novo interpretation of MS/MS spectra; it automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer.
Abstract: Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide s...

601 citations

Journal Article
TL;DR: A tool is described, InsPecT, to identify posttranslational modifications using tandem mass spectrometry data, which identifies modified peptides with better or equivalent accuracy than other database search tools while being 2 orders of magnitude faster than SEQUEST, and substantially faster than X!TANDEM on complex mixtures.
Abstract: Reliable identification of posttranslational modifications is key to understanding various cellular regulatory processes. We describe a tool, InsPecT, to identify posttranslational modifications using tandem mass spectrometry data. InsPecT constructs database filters that proved to be very successful in genomics searches. Given an MS/MS spectrum S and a database D, a database filter selects a small fraction of database D that is guaranteed (with high probability) to contain a peptide that produced S. InsPecT uses peptide sequence tags as efficient filters that reduce the size of the database by a few orders of magnitude while retaining the correct peptide with very high probability. In addition to filtering, InsPecT also uses novel algorithms for scoring and validating in the presence of modifications, without explicit enumeration of all variants. InsPecT identifies modified peptides with better or equivalent accuracy than other database search tools while being 2 orders of magnitude faster than SEQUEST, ...

588 citations

Journal Article
TL;DR: ProLuCID was able to identify as many as 25% more proteins than SEQUEST and is able to take advantage of high resolution MS/MS spectra leading to further improvements in specificity when compared to low resolution tandem MS data.

420 citations