Kalign – an accurate and fast multiple sequence alignment algorithm

doi:10.1186/1471-2105-6-298

Open Access · Journal Article

Timo Lassmann, +1 more

12 Dec 2005

BMC Bioinformatics

Vol. 6, Issue 1, p. 298

TL;DR

Kalign, a method employing the Wu-Manber string-matching algorithm, is developed to improve both the accuracy and speed of multiple sequence alignment and is especially well suited for the increasingly important task of aligning large numbers of sequences.

Abstract:

The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, and structural properties, and to improving sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
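The abstract names the Wu-Manber algorithm but does not say which variant Kalign uses or how its matches feed into distance estimation or alignment. Purely as an illustration, the sketch below implements the bit-parallel approximate matcher described by Wu and Manber (1992, "Fast text searching allowing errors"), which reports every text position where a pattern ends with at most k edits (substitutions, insertions, or deletions). The function name `approx_match_ends` and its parameters are hypothetical, and this is not Kalign's own code. Note that the name Wu-Manber is also attached to a separate 1994 multi-pattern exact-search algorithm.

```python
from __future__ import annotations


def approx_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Hypothetical helper: 0-based end positions in `text` where `pattern`
    matches with at most `k` edits (bit-parallel Shift-And with errors)."""
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k must be >= 0")
    full = (1 << m) - 1        # keep every bit-vector to m bits
    accept = 1 << (m - 1)      # set when the whole pattern has matched

    # Character masks: bit i of masks[c] is set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # R[d] bit i: pattern[:i+1] matches a suffix of the text read so far
    # with at most d edits. Before any text is read, the first d pattern
    # characters can be deleted, so bits 0..d-1 start out set.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        mask = masks.get(ch, 0)
        new = [0] * (k + 1)
        # Exact level: extend previous matches (the | 1 lets a match start here).
        new[0] = ((R[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            new[d] = (
                (((R[d] << 1) | 1) & mask)   # match on this character
                | R[d - 1]                   # text character is an insertion
                | ((R[d - 1] << 1) | 1)      # substitution
                | (new[d - 1] << 1)          # pattern character is a deletion
            ) & full
        R = new
        if R[k] & accept:
            ends.append(j)
    return ends


print(approx_match_ends("ACGTACGT", "ACGA", 1))  # [2, 3, 4, 6, 7]
```

Each text character costs O(k) word operations when the pattern fits in one machine word; Python's unbounded integers lift that length limit at some cost in speed.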

Citations

Journal Article

Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega

Fabian Sievers, +11 more

01 Jan 2011

Molecular Systems Biology

TL;DR: A new program called Clustal Omega is described; it can align virtually any number of protein sequences quickly, delivers accurate alignments, and outperforms other packages in terms of execution time and quality.

Journal Article

Database resources of the National Center for Biotechnology Information

David L. Wheeler, +12 more

01 Jan 2004

Nucleic Acids Research

TL;DR: In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website.

Journal Article

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow, +40 more

01 Sep 2012

Genome Research

TL;DR: This work examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, that 62% of protein-coding genes have annotated polyA sites, and that over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.

Journal Article

Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments

Gerard Talavera, +1 more

01 Aug 2007

Systematic Biology

TL;DR: Examines whether phylogenetic reconstruction improves after alignment cleaning. Cleaned alignments produce better topologies, although, paradoxically, with lower bootstrap support, which indicates that divergent and problematic alignment regions, when present, may lead to topologies that appear better supported but are in fact more biased.

Journal Article

Recent developments in the MAFFT multiple sequence alignment program

Kazutaka Katoh, +1 more

01 Jul 2008

Briefings in Bioinformatics

TL;DR: The initial version of the MAFFT program was developed in 2002 and was updated in 2007 with two new techniques, the PartTree algorithm and the four-way consistency objective function, which improved the scalability of progressive alignment and the accuracy of ncRNA alignment, respectively.

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

01 Oct 1990

Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

Journal Article

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Stephen F. Altschul, +6 more

01 Sep 1997

Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Journal Article

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, +2 more

11 Nov 1994

Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Journal Article

The neighbor-joining method: a new method for reconstructing phylogenetic trees

Naruya Saitou, +1 more

01 Jul 1987

Molecular Biology and Evolution

TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.

Journal Article

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Robert C. Edgar

01 Mar 2004

Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.