T-Coffee: A novel method for fast and accurate multiple sequence alignment.

doi:10.1006/JMBI.2000.4042

Journal Article

Cedric Notredame¹²³, Desmond G. Higgins⁴, Jaap Heringa¹

National Institute for Medical Research¹, ISREC², Centre national de la recherche scientifique³, University College Cork⁴

08 Sep 2000-Journal of Molecular Biology (Academic Press)-Vol. 302, Iss: 1, pp 205-217

TL;DR: A new method for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives. It is based on the progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of that algorithm.

About: This article was published in the Journal of Molecular Biology on 2000-09-08 and has received 6,727 citations to date. It focuses on the topics Multiple sequence alignment and Alignment-free sequence analysis.

Citations

Journal Article

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Robert C. Edgar

01 Mar 2004-Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.

Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
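MUSCLE's first stage estimates pairwise distances from shared k-mers instead of aligning every pair. The Python sketch below assumes a fractional common k-mer score F (shared k-mers, counted with multiplicity, divided by the number of k-mers in the shorter sequence) and uses 1 - F as the distance. The function names are hypothetical, and MUSCLE details such as compressed amino-acid alphabets are not modeled.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Return 1 - F, where F is the fraction of k-mers shared by x and y.

    Assumed definition: F = sum over k-mers t of min(n_x(t), n_y(t)),
    divided by min(len(x), len(y)) - k + 1 (k-mers in the shorter sequence).
    """
    if min(len(x), len(y)) < k:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for missing keys, so unshared k-mers add nothing.
    shared = sum(min(n, cy[t]) for t, n in cx.items())
    f = shared / (min(len(x), len(y)) - k + 1)
    return 1.0 - f
```

For example, `kmer_distance("ACDEF", "ACDXF", k=2)` is 0.5: two of the four 2-mers (AC, CD) are shared. Counting words avoids the quadratic cost of a full pairwise alignment, which is what makes this kind of estimate attractive for thousands of sequences.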

37,524 citations

Cites background or methods from "T-Coffee: A novel method for fast a..."

...A variant on this strategy is used by T-Coffee (5), which aligns profiles by optimizing a score derived from local and global alignments of all pairs of input sequences....
...for alignment accuracy discrimination (5,7,8) as fewer assumptions are made about the population distribution....
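The MUSCLE excerpt describes T-Coffee scoring alignments against information pooled from all pairwise alignments. One way to sketch this "consistency" idea is library extension: a residue pair (x, y) keeps its direct weight and gains min(W(x, z), W(z, y)) for every residue z in a third sequence aligned to both. The Python below is a minimal sketch under that assumption; the data layout, the weights and the function name are hypothetical, and it is not T-Coffee's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Residue = Tuple[str, int]  # (sequence name, 0-based residue index)


def extend_library(
    primary: Iterable[Tuple[Residue, Residue, float]],
) -> Dict[Tuple[Residue, Residue], float]:
    """Triplet-style library extension over ordered residue pairs.

    primary holds (x, y, w) entries, one per residue pair aligned in some
    pairwise alignment, with x and y taken from different sequences.
    """
    adj: Dict[Residue, Dict[Residue, float]] = defaultdict(dict)
    for x, y, w in primary:
        # Weights from several source alignments of the same pair add up.
        adj[x][y] = adj[x].get(y, 0.0) + w
        adj[y][x] = adj[y].get(x, 0.0) + w

    extended: Dict[Tuple[Residue, Residue], float] = defaultdict(float)
    for x, partners in adj.items():
        for y, w in partners.items():
            extended[(x, y)] += w  # direct support
    for z, partners in adj.items():
        for x, w_xz in partners.items():
            for y, w_zy in partners.items():
                if x[0] != y[0]:  # x and y lie in different sequences
                    extended[(x, y)] += min(w_xz, w_zy)  # support via z
    return dict(extended)
```

In a toy library with A0-B0 (0.8), B0-C0 (0.6) and A0-C0 (0.5), the pair A0-B0 ends up with about 1.3 (0.8 direct plus 0.5 through C0), so pairs that third sequences agree with are favored when profiles are later aligned. Pairs never aligned directly, but linked through a shared partner, also receive weight.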

Journal Article

Clustal W and Clustal X version 2.0

Mark A. Larkin¹, Gordon Blackshields², Nigel P. Brown², R. Chenna², Paul A. McGettigan², Hamish McWilliam², Franck Valentin², Iain M. Wallace², Andreas Wilm², Rodrigo Lopez², J.D. Thompson², Toby J. Gibson², Desmond G. Higgins²

University College Dublin¹, European Bioinformatics Institute²

01 Nov 2007-Bioinformatics

TL;DR: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++, which facilitates further development of the alignment algorithms and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems.

Abstract: Summary: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems. Availability: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/ Contact: clustalw@ucd.ie

25,325 citations

Cites methods from "T-Coffee: A novel method for fast a..."

...This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems....

Journal Article

Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega

Fabian Sievers¹, Andreas Wilm², David Dineen¹, Toby J. Gibson, Kevin Karplus³, Weizhong Li⁴, Rodrigo Lopez⁴, Hamish McWilliam⁴, Michael Remmert⁵, Johannes Söding⁵, Julie D. Thompson⁶, Desmond G. Higgins¹

University College Dublin¹, Genome Institute of Singapore², University of California, Santa Cruz³, European Bioinformatics Institute⁴, Ludwig Maximilian University of Munich⁵, University of Strasbourg⁶

01 Jan 2011-Molecular Systems Biology

TL;DR: A new program called Clustal Omega is described, which can align virtually any number of protein sequences quickly and that delivers accurate alignments, and which outperforms other packages in terms of execution time and quality.

Abstract: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

12,489 citations

Cites background or methods from "T-Coffee: A novel method for fast a..."

...most (203 out of 218) BAliBASE test cases, the number of sequences is small and the script runs L-INS-i, which is the slow accurate program that uses the consistency heuristic (Notredame et al, 2000) that is also used by MSAprobs (Liu et al, 2010), Probalign, Probcons (Do et al, 2005) and T-Coffee....
...To counteract this effect, the consistency principle was developed (Notredame et al, 2000)....
...This has allowed the production of a new generation of more accurate aligners (e.g. T-Coffee (Notredame et al, 2000)) but at the expense of ease of computation....

Journal Article

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Kazutaka Katoh¹, Kazuharu Misawa, Kei-ichi Kuma¹, Takashi Miyata¹

Kyoto University¹

15 Jul 2002-Nucleic Acids Research

TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.

Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
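MAFFT's first technique turns each sequence into numeric tracks and uses the FFT to find offsets k at which two tracks correlate strongly, flagging candidate homologous regions. The Python sketch below computes the correlation c(k) = sum over n of a[n] * b[n + k] for one numeric track, using a textbook radix-2 FFT. The function names are hypothetical, and MAFFT's actual volume and polarity values, their normalization and the summing of the two tracks are not modeled; the toy tracks in the usage note are made-up numbers, not amino-acid properties.

```python
import cmath
from typing import Dict, List, Sequence


def fft(x: Sequence[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    With invert=True this returns the unnormalized inverse transform.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlation(a: Sequence[float], b: Sequence[float]) -> Dict[int, float]:
    """c(k) = sum_n a[n] * b[n + k] for every offset k where a and b overlap."""
    n, m = len(a), len(b)
    size = 1
    while size < n + m - 1:
        size *= 2  # zero-pad so the circular correlation never wraps
    fa = fft(list(a) + [0.0] * (size - n))
    fb = fft(list(b) + [0.0] * (size - m))
    prod = [x.conjugate() * y for x, y in zip(fa, fb)]
    raw = [v.real / size for v in fft(prod, invert=True)]
    # Non-negative offsets sit at the front; negative ones wrap to the end.
    return {k: raw[k % size] for k in range(-(n - 1), m)}


def best_offset(a: Sequence[float], b: Sequence[float]) -> int:
    """Offset k with the highest correlation (a candidate homologous shift)."""
    c = correlation(a, b)
    return max(c, key=c.get)
```

For example, `best_offset([1, 2, 3], [0, 0, 1, 2, 3, 0])` returns 2, the shift at which the first track lines up with the second. Padding to at least n + m - 1 points is what lets the O(N log N) transform reproduce the direct O(NM) sum.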

12,003 citations

Cites background or result from "T-Coffee: A novel method for fast a..."

...We have applied four methods described in Methods, NW-AP-2, NW-NS-2, FFT-NS-2 and FFT-NS-i, to this database to compare their efficiencies with those of five existing methods, DIALIGN (29,30), PIMA (31), CLUSTALW (7) version 1.82, PRRP (32) and T-COFFEE (9)....
...T-COFFEE marked the highest average accuracy, but the accuracy of FFT-NS-i is comparable with that of T-COFFEE....
...On the basis of such considerations, Notredame et al. (9) formulates a combination of NW and SW alignment procedures in T-COFFEE....
...Considerable improvements in the accuracy have recently been made in CLUSTALW (7) version 1.8, the most popular alignment program with excellent portability and operativity, and T-COFFEE (9), which provides alignments of the highest accuracy among known methods to date....
...The CPU times and the sum-of-pairs and column scores (8) of NW-AP-2, NW-NS-2, FFT-NS-2 and FFT-NS-i were compared with those of two existing methods, CLUSTALW (version 1.82) and T-COFFEE using these two data sets (Table 2)....

Journal Article

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Robert C. Edgar¹

University of California, Berkeley¹

19 Aug 2004-BMC Bioinformatics

TL;DR: MUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs, and a new option, MUSCLE-fast, designed for high-throughput applications.

Abstract: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. MUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle .

7,617 citations

References

Journal Article

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
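The first CLUSTAL W modification, down-weighting near-duplicates and up-weighting divergent sequences, can be illustrated with guide-tree weights: each sequence receives the sum of the branch lengths on its path to the root, with every branch divided by the number of sequences sharing it. Close relatives then split their shared history while divergent sequences keep long private branches. The Python below sketches that scheme under stated assumptions (a rooted guide tree with known branch lengths, a hypothetical nested-tuple tree format, no final rescaling); it is not the CLUSTAL W source.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree format: a node is (leaf_name, branch_length) or
# (list_of_child_nodes, branch_length); branch_length is the edge to the parent.
Node = Tuple[Union[str, list], float]


def leaf_names(node: Node) -> List[str]:
    """List the sequences at or below a node."""
    label, _ = node
    if isinstance(label, str):
        return [label]
    return [name for child in label for name in leaf_names(child)]


def tree_weights(root: Node) -> Dict[str, float]:
    """Weight = sum over root-to-leaf branches of length / (#leaves sharing it)."""
    weights: Dict[str, float] = {}

    def walk(node: Node, inherited: float) -> None:
        label, length = node
        # This branch is shared by every leaf below it, so split its length.
        total = inherited + length / len(leaf_names(node))
        if isinstance(label, str):
            weights[label] = total
        else:
            for child in label:
                walk(child, total)

    walk(root, 0.0)
    return weights
```

With `([([("A", 0.1), ("B", 0.1)], 0.3), ("C", 0.5)], 0.0)`, the near-duplicates A and B each get about 0.25 while the divergent C gets 0.5.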

63,427 citations

"T-Coffee: A novel method for fast a..." refers methods in this paper

...ClustalW (Thompson et al., 1994) is a progressive-alignment method....
...The most commonly used heuristic methods are based on the progressive-alignment strategy (Feng & Doolittle, 1987; Hogeweg & Hesper, 1984; Taylor, 1988), with ClustalW (Thompson et al., 1994) being the most widely used implementation....
...In the progressive alignment (Thompson et al., 1994), pair-wise alignments are first made to produce a distance matrix between all the sequences, which in turn is used to produce a guide tree using the neighbor-joining method (Saitou & Nei, 1987)....
...We use a so-called progressive strategy (Feng & Doolittle, 1987; Taylor, 1988; Thompson et al., 1994), which is similar to that used in ClustalW....

Journal Article

The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Naruya Saitou¹, Masatoshi Nei

University of Texas Health Science Center at Houston¹

01 Jul 1987-Molecular Biology and Evolution

TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.

Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
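The clustering loop described in this abstract, which progressive aligners use to build their guide trees, can be sketched with the usual textbook formulation of neighbor-joining: at each stage join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the current distance matrix, then replace the pair by a new node. The Python below assumes a symmetric distance matrix with a zero diagonal; the function name and internal node labels such as n1 are hypothetical, and ties are broken by input order.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def neighbor_joining(names: List[str], dist: List[List[float]]) -> List[Edge]:
    """Build an unrooted tree; return its edges as (node, node, branch length)."""
    nodes = list(names)
    d: Dict[Tuple[str, str], float] = {
        (a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)
    }
    edges: List[Edge] = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}  # row sums
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = d[i, j] - li                                  # branch j -> new node
        u = f"n{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        nodes = [k for k in nodes if k not in (i, j)]
        for k in nodes:
            d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        d[u, u] = 0.0
        nodes.append(u)
    a, b = nodes
    edges.append((a, b, d[a, b]))  # connect the last two nodes
    return edges
```

For distances generated by a tree with leaf edges A:1, B:2, C:3, D:4 and an internal edge of length 1, the function recovers exactly those five edges; reconstructing such additive trees is the property that makes the method a sound choice for guide trees.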

57,055 citations

"T-Coffee: A novel method for fast a..." refers methods in this paper

...In the progressive alignment (Thompson et al., 1994), pair-wise alignments are first made to produce a distance matrix between all the sequences, which in turn is used to produce a guide tree using the neighbor-joining method (Saitou & Nei, 1987)....

Journal Article

Improved tools for biological sequence comparison.

William R. Pearson¹, David J. Lipman

University of Virginia¹

01 Apr 1988-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.

Abstract: We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
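The first step of FASTA-style searching, finding short identical words (k-tuples) through a lookup table and tallying them by diagonal, can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and the later stages (rescoring the best regions with a substitution matrix, joining regions, and the final restricted alignment) are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def diagonal_hits(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count shared ktup-words on each diagonal (subject pos - query pos)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)  # lookup table of query words
    hits: Counter = Counter()
    for j in range(len(subject) - ktup + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for i in index.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return hits
```

For example, `diagonal_hits("ACDEFGH", "XXACDEFGH").most_common(1)` gives `[(2, 6)]`: all six shared 2-mers lie on the diagonal offset by two. Dense diagonals like this are the regions a FASTA-style search would examine further.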

12,432 citations

Journal Article

Identification of common molecular subsequences.

Temple F. Smith¹, Michael S. Waterman²

Northern Michigan University¹, Los Alamos National Laboratory²

25 Mar 1981-Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).
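The local-alignment recurrence behind the Smith-Waterman algorithm fits in a few lines: each cell takes the best of a diagonal step, a gap in either sequence, or zero, and the zero floor is what lets the best segment pair start anywhere. The Python below is a score-only sketch with assumed toy scores (match +2, mismatch -1, linear gap -2) rather than a substitution matrix or affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous DP row (row 0 is all zeros)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment
                         prev[j - 1] + s,    # align a[i-1] with b[j-1]
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman("TTACGTTT", "GGACGTGG")` returns 8, the score of the shared ACGT segment; keeping only two rows gives O(len(b)) memory, at the cost of not recovering the alignment itself.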

10,262 citations

"T-Coffee: A novel method for fast a..." refers methods in this paper

...For two-sequence comparisons, there is the well-known Smith and Waterman (1981) algorithm....

Book

Atlas of protein sequence and structure

M. A. Chang, M. O. Dayhoff, R. V. Eck, M. R. Sochard

01 Jan 1965

6,855 citations

"T-Coffee: A novel method for fast a..." refers methods in this paper

...General empirical models of protein evolution (Benner et al., 1992; Dayhoff, 1978; Henikoff & Henikoff, 1992) are widely used instead, but these can be difficult to apply when the sequences are less than 30% identical (Sander & Schneider, 1991)....