Clustal W and Clustal X version 2.0

doi:10.1093/BIOINFORMATICS/BTM404

Journal Article•DOI•

Mark A. Larkin¹, Gordon Blackshields², Nigel P. Brown², R. Chenna², Paul A. McGettigan², Hamish McWilliam², Franck Valentin², Iain M. Wallace², Andreas Wilm², Rodrigo Lopez², J.D. Thompson², Toby J. Gibson², Desmond G. Higgins²

University College Dublin¹, European Bioinformatics Institute²

01 Nov 2007-Bioinformatics (Oxford University Press)-Vol. 23, Iss: 21, pp 2947-2948

TL;DR: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++, which will facilitate further development of the alignment algorithms and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems.

Abstract: Summary: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems. Availability: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/ Contact: clustalw@ucd.ie

Citations

Journal Article•DOI•

Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega

Fabian Sievers¹, Andreas Wilm², David Dineen¹, Toby J. Gibson, Kevin Karplus³, Weizhong Li⁴, Rodrigo Lopez⁴, Hamish McWilliam⁴, Michael Remmert⁵, Johannes Söding⁵, Julie D. Thompson⁶, Desmond G. Higgins¹

University College Dublin¹, Genome Institute of Singapore², University of California, Santa Cruz³, European Bioinformatics Institute⁴, Ludwig Maximilian University of Munich⁵, University of Strasbourg⁶

01 Jan 2011-Molecular Systems Biology

TL;DR: A new program called Clustal Omega is described, which can align virtually any number of protein sequences quickly and that delivers accurate alignments, and which outperforms other packages in terms of execution time and quality.

Abstract: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

12,489 citations

Journal Article•DOI•

Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

Bruce J. Walker¹, Thomas Abeel², Terrance Shea¹, Margaret Priest¹, Amr Abouelliel¹, Sharadha Sakthikumar¹, Christina A. Cuomo¹, Qiandong Zeng¹, Jennifer R. Wortman¹, Sarah Young¹, Ashlee M. Earl¹

Broad Institute¹, Ghent University²

19 Nov 2014-PLOS ONE

TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.

Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small e.g., 180 bp and large e.g., 3-5 Kb inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.

5,659 citations

Journal Article•DOI•

SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building

Manolo Gouy, Stéphane Guindon, Olivier Gascuel

01 Feb 2010-Molecular Biology and Evolution

TL;DR: SeaView version 4 combines all the functions of the widely used programs SeaView and Phylo_win, and expands them by adding network access to sequence databases, alignment with arbitrary algorithm, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees.

Abstract: We present SeaView version 4, a multiplatform program designed to facilitate multiple alignment and phylogenetic tree building from molecular sequence data through the use of a graphical user interface. SeaView version 4 combines all the functions of the widely used programs SeaView (in its previous versions) and Phylo_win, and expands them by adding network access to sequence databases, alignment with arbitrary algorithm, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. In relation to the wide present offer of tools and algorithms for phylogenetic analyses, SeaView is especially useful for teaching and for occasional users of such software. SeaView is freely available at http://pbil.univ-lyon1.fr/software/seaview.

5,074 citations

Cites methods from "Clustal W and Clustal X version 2.0..."

...We thank Nicolas Galtier for contributing code from the Phylo_win program....

Journal Article•DOI•

Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions

Chunaram Choudhary¹, Chanchal Kumar¹, Florian Gnad¹, Michael L. Nielsen¹, Michael Rehman¹, Tobias C. Walther¹, Jesper V. Olsen¹, Matthias Mann¹

Max Planck Society¹

14 Aug 2009-Science

TL;DR: A proteomic-scale analysis of protein acetylation suggests that it is an important biological regulatory mechanism and the regulatory scope of lysine acetylations is broad and comparable with that of other major posttranslational modifications.

Abstract: Lysine acetylation is a reversible posttranslational modification of proteins and plays a key role in regulating gene expression. Technological limitations have so far prevented a global analysis of lysine acetylation's cellular roles. We used high-resolution mass spectrometry to identify 3600 lysine acetylation sites on 1750 proteins and quantified acetylation changes in response to the deacetylase inhibitors suberoylanilide hydroxamic acid and MS-275. Lysine acetylation preferentially targets large macromolecular complexes involved in diverse cellular processes, such as chromatin remodeling, cell cycle, splicing, nuclear transport, and actin nucleation. Acetylation impaired phosphorylation-dependent interactions of 14-3-3 and regulated the yeast cyclin-dependent kinase Cdc28. Our data demonstrate that the regulatory scope of lysine acetylation is broad and comparable with that of other major posttranslational modifications.

3,787 citations

Journal Article•DOI•

Discriminant analysis of principal components: a new method for the analysis of genetically structured populations

Thibaut Jombart¹, Sébastien Devillard², Francois Balloux¹

Imperial College London¹, University of Lyon²

15 Oct 2010-BMC Genetics

TL;DR: The Discriminant Analysis of Principal Components (DAPC) is introduced, a multivariate method designed to identify and describe clusters of genetically related individuals that performs generally better than STRUCTURE at characterizing population subdivision.

Abstract: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations. We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach allows extracting rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. Analysis of simulated data revealed that our approach performs generally better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.

3,770 citations

References

Journal Article•DOI•

Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
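
The first modification in the abstract, per-sequence weights that down-weight near-duplicates and up-weight divergent sequences, can be illustrated with a small sketch. It assumes a rooted guide tree with branch lengths and a weighting rule in which each branch length is shared equally among the sequences below it. The `Node` class, the function names and the toy tree are hypothetical and are not taken from the CLUSTAL W source code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Node of a rooted guide tree (hypothetical structure for this sketch)."""
    branch_length: float = 0.0            # length of the branch above this node
    name: Optional[str] = None            # sequence name, set for leaves only
    children: List["Node"] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    """Number of sequences (leaves) below a node, counting the node itself if it is a leaf."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def sequence_weights(root: Node) -> Dict[str, float]:
    """Assumed rule: each branch length is divided equally among the leaves below it,
    and a leaf's weight is the sum of its shares along the path from the root."""
    weights: Dict[str, float] = {}

    def walk(node: Node, accumulated: float) -> None:
        accumulated += node.branch_length / count_leaves(node)
        if not node.children:
            weights[node.name] = accumulated
        else:
            for child in node.children:
                walk(child, accumulated)

    walk(root, 0.0)
    return weights


# Toy tree: A and B are near-duplicates that share a long internal branch,
# while C is a divergent sequence on its own long branch.
pair = Node(branch_length=0.3, children=[Node(0.1, "A"), Node(0.1, "B")])
tree = Node(children=[pair, Node(0.5, "C")])
weights = sequence_weights(tree)
```

In this toy tree A and B split the shared 0.3 branch, so each ends up with a smaller weight than the divergent sequence C, which is the qualitative behaviour the abstract describes.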

63,427 citations

"Clustal W and Clustal X version 2.0..." refers methods in this paper

...This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems....
...The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/ Contact: clustalw@ucd.ie...

Journal Article•DOI•

The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.

Julie D. Thompson¹, Toby J. Gibson, Frederica Plewniak¹, Francois Jeanmougin¹, Desmond G. Higgins²

French Institute of Health and Medical Research¹, University College Cork²

01 Dec 1997-Nucleic Acids Research

TL;DR: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.

Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

"Clustal W and Clustal X version 2.0..." refers methods in this paper

...This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems....
...The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/ Contact: clustalw@ucd.ie...
...This has made the code complicated to maintain and develop, as the graphical interface must be constantly modified and recompiled for new operating systems and desktop environments (Windows, Macintosh, VMS, Unix and Linux)....
...The Qt toolbox provides a native look and feel on Windows, Linux and Mac platforms....

Journal Article•DOI•

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Robert C. Edgar

01 Mar 2004-Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.

Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
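
The idea of estimating distances by kmer counting, without aligning the sequences first, can be sketched as follows. This is a minimal illustration only: it assumes the distance is one minus the fraction of kmers shared by two sequences, normalised by the number of kmers in the shorter one. The function names, the default `k=3` and the normalisation are assumptions of this sketch, and whether MUSCLE uses exactly this definition is not stated in the abstract.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Alignment-free distance: 1 minus the fraction of shared kmers.

    Assumed normalisation: the number of kmers in the shorter sequence.
    """
    shortest = min(len(x), len(y))
    if shortest < k:
        raise ValueError("both sequences must contain at least k residues")
    # Counter intersection keeps the minimum count of each kmer.
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())
    return 1.0 - shared / (shortest - k + 1)


# Two toy peptides that differ only in the last residue share 3 of 4 kmers.
d_close = kmer_distance("ACDEFG", "ACDEFW")
d_far = kmer_distance("AAAA", "CCCC")
```

Because each sequence is scanned once and no dynamic programming is involved, a measure of this kind is cheap to compute for every pair, which is why kmer counting is attractive for building a first guide tree over thousands of sequences.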

37,524 citations

"Clustal W and Clustal X version 2.0..." refers background or methods in this paper

...They are needed routinely as parts of more complicated analyses or analysis pipelines and there are several very widely used packages, e.g. Clustal W (Thompson et al., 1994), Clustal X (Thompson et al., 1997), T-Coffee (Notredame et al., 2000), MAFFT (Katoh et al., 2002) and MUSCLE (Edgar, 2004)....
...Availability: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2....
...More recently, MAFFT and MUSCLE appeared; which were, initially, at least as accurate as Clustal, in terms of alignment accuracy, but which were also extremely fast; and able to align many thousands of sequences....

Journal Article•DOI•

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Kazutaka Katoh¹, Kazuharu Misawa, Kei-ichi Kuma¹, Takashi Miyata¹

Kyoto University¹

15 Jul 2002-Nucleic Acids Research

TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.

Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations

"Clustal W and Clustal X version 2.0..." refers background in this paper

...Availability: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2....

Journal Article•DOI•

T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Cedric Notredame¹⁻³, Desmond G. Higgins⁴, Jaap Heringa²

Centre national de la recherche scientifique¹, National Institute for Medical Research², ISREC³, University College Cork⁴

08 Sep 2000-Journal of Molecular Biology

TL;DR: T-Coffee is a new method for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives; it is broadly based on the progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of that algorithm.

6,727 citations

"Clustal W and Clustal X version 2.0..." refers methods in this paper

...This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems....