Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements

doi:10.1093/NAR/29.14.2994

Open Access, Journal Article

Alejandro A. Schäffer, +7 more

- 15 Jul 2001 -

Nucleic Acids Research

- Vol. 29, Iss: 14, pp 2994-3005

TLDR

The modification that accounts for most of the accuracy improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. Composition-based statistics are particularly beneficial for large-scale automated applications of PSI-BLAST.

Abstract:

PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This does not include the benefits from four modifications we included in the ‘baseline’ version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence’s amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
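The abstract does not say which ROC variant or cutoff was used. As an illustration only, the sketch below computes one common normalized form, the truncated ROC_n score: walk down a ranked hit list, and for each of the first n false positives count the true positives ranked above it, then divide by n times the total number of true positives. The function name `roc_n`, the cutoff `n`, and the example ranking are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Optional, Sequence


def roc_n(labels: Sequence[bool], n: int, total_true: Optional[int] = None) -> float:
    """Truncated ROC_n score for a ranked hit list (illustrative sketch).

    labels     -- hits ordered best-first; True marks a true positive.
    n          -- number of false positives to consider (assumed cutoff).
    total_true -- total true positives in the database; defaults to the
                  number of True entries in `labels`.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if total_true is None:
        total_true = sum(1 for is_true in labels if is_true)
    if total_true <= 0:
        raise ValueError("at least one true positive is required")

    true_seen = 0   # true positives encountered so far
    false_seen = 0  # false positives encountered so far
    area = 0        # sum over false positives of true positives ranked above

    for is_true in labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # If the list holds fewer than n false positives, treat the missing ones
    # as ranked below every listed hit.
    area += (n - false_seen) * true_seen

    # Normalize so that 0 is the worst score and 1 is the best.
    return area / (n * total_true)


if __name__ == "__main__":
    # Hypothetical ranking: True = annotated true positive, False = false positive.
    ranking = [True, True, False, True, False, False]
    print(roc_n(ranking, n=2))
```

A perfect ranking, with every true positive ahead of the first false positive, scores 1; a ranking whose first n hits are all false positives scores 0.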

Citations

Journal Article

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Robert C. Edgar

- 01 Mar 2004 -

Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.

Journal Article

BLAST+: architecture and applications.

Christiam Camacho, +6 more

- 15 Dec 2009 -

BMC Bioinformatics

TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.

Journal Article

MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment

Sudhir Kumar, +2 more

- 01 Jun 2004 -

Briefings in Bioinformatics

TL;DR: An overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA is provided.

Journal Article

Database resources of the National Center for Biotechnology Information

David L. Wheeler, +12 more

- 01 Jan 2004 -

Nucleic Acids Research

TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website.

Book

Accelerated Profile HMM Searches

Sean R. Eddy

TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

- 01 Oct 1990 -

Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

Journal Article

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul, +6 more

- 01 Sep 1997 -

Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Journal Article

The Protein Data Bank

Helen M. Berman, +7 more

- 01 Jan 2000 -

Nucleic Acids Research

TL;DR: Describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource.

Journal Article

Improved tools for biological sequence comparison.

William R. Pearson, +1 more

- 01 Apr 1988 -

Proceedings of the National Academy of S...

TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.

Journal Article

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981 -

Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).
