Template-based protein structure modeling using the RaptorX web server

doi:10.1038/NPROT.2012.085

Journal Article•DOI•

Morten Källberg¹,², Haipeng Wang¹, Sheng Wang¹, Jian Peng¹, Zhiyong Wang¹, Hui Lu², Jinbo Xu¹

Toyota Technological Institute at Chicago¹, University of Illinois at Chicago²

01 Aug 2012-Nature Protocols (Nature Publishing Group)-Vol. 7, Iss: 8, pp 1511-1522

TL;DR: This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling.

Abstract: A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ∼35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ∼6,000 sequences submitted by ∼1,600 users from around the world.

Citations

Journal Article•DOI•

The Phyre2 web portal for protein modeling, prediction and analysis

Lawrence A. Kelley¹, Stefans Mezulis¹, Christopher M. Yates¹,², Mark N. Wass¹,³, Michael J.E. Sternberg¹

Imperial College London¹, University College London², University of Kent³

07 May 2015-Nature Protocols

TL;DR: An updated protocol for Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants for a user's protein sequence.

Abstract: Phyre2 is a web-based tool for predicting and analyzing protein structure and function. Phyre2 uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyze amino acid variants in a protein sequence. Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

7,941 citations

Journal Article•DOI•

SWISS-MODEL: homology modelling of protein structures and complexes.

Andrew Waterhouse¹,², Martino Bertoni¹,², Stefan Bienert¹,², Gabriel Studer¹,², Gerardo Tauriello¹,², Rafal Gumienny¹,², Florian T Heer¹,², Tjaart A. P. de Beer¹,², Christine Rempfer¹,², Lorenza Bordoli¹,², Rosalba Lepore¹,², Torsten Schwede¹,²

University of Basel¹, Swiss Institute of Bioinformatics²

02 Jul 2018-Nucleic Acids Research

TL;DR: An update to the SWISS-MODEL server is presented, which includes the implementation of a new modelling engine, ProMod3, and the introduction of a new local model quality estimation method, QMEANDisCo.

Abstract: Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known protein sequences and experimentally determined structures. Fully automated workflows and servers simplify and streamline the homology modelling process, also allowing users without a specific computational expertise to generate reliable protein models and have easy access to modelling results, their visualization and interpretation. Here, we present an update to the SWISS-MODEL server, which pioneered the field of automated modelling 25 years ago and has been continuously developed since. Recently, its functionality has been extended to the modelling of homo- and heteromeric complexes. Starting from the amino acid sequences of the interacting proteins, both the stoichiometry and the overall structure of the complex are inferred by homology modelling. Other major improvements include the implementation of a new modelling engine, ProMod3, and the introduction of a new local model quality estimation method, QMEANDisCo. SWISS-MODEL is freely available at https://swissmodel.expasy.org.

7,022 citations

Journal Article•DOI•

SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information

Marco Biasini¹, Stefan Bienert¹, Andrew Waterhouse¹, Konstantin Arnold¹, Gabriel Studer¹, Tobias Schmidt¹, Florian Kiefer¹, Tiziano Gallo Cassarino¹, Martino Bertoni¹, Lorenza Bordoli¹, Torsten Schwede¹,²

Swiss Institute of Bioinformatics¹, University of Basel²

01 Jul 2014-Nucleic Acids Research

TL;DR: The latest version of the SWISS-MODEL expert system for protein structure modelling is described, which makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models.

Abstract: Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at http://swissmodel.expasy.org/.

4,235 citations

Journal Article•DOI•

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

Sheng Wang¹, Siqi Sun¹, Zhen Li¹, Renyu Zhang¹, Jinbo Xu¹

Toyota Technological Institute at Chicago¹

05 Jan 2017-PLOS Computational Biology

TL;DR: A new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks that greatly outperforms existing methods and leads to much more accurate contact-assisted folding.

Abstract: Motivation: Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability: http://raptorx.uchicago.edu/ContactMap/
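The "top L" and "top L/10" long-range accuracies quoted in this abstract can be made concrete with a short sketch. The abstract does not define the metric, so the code below assumes a common convention in contact-prediction benchmarks: a pair (i, j) counts as long-range when its sequence separation is at least 24 residues, and accuracy is the fraction of the L/k highest-scoring long-range pairs that are true contacts (L is the sequence length). The function name, the argument names and the 24-residue threshold are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_lk_long_range_accuracy(
    pred: Dict[Pair, float],
    true_contacts: Set[Pair],
    seq_len: int,
    k: int = 1,
    min_sep: int = 24,  # assumed long-range threshold, not stated in the abstract
) -> float:
    """Fraction of the top L/k predicted long-range pairs that are true contacts."""
    # Keep only long-range pairs, i.e. those separated by at least min_sep residues.
    candidates = [
        (prob, pair) for pair, prob in pred.items() if pair[1] - pair[0] >= min_sep
    ]
    # Highest predicted probability first; ties are broken by pair index.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    n_top = max(1, seq_len // k)
    top = candidates[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    # Divide by the number of pairs actually selected (at most L/k).
    return hits / len(top)


if __name__ == "__main__":
    # Toy data for a hypothetical 30-residue protein.
    predictions = {
        (0, 25): 0.9,
        (1, 28): 0.8,
        (2, 29): 0.7,
        (3, 27): 0.6,
        (0, 5): 0.99,  # short-range pair, ignored by the metric
    }
    native = {(0, 25), (2, 29), (0, 5)}
    print(top_lk_long_range_accuracy(predictions, native, seq_len=30, k=10))
```

With these toy inputs, L/10 is 3, so only the three highest-scoring long-range pairs are evaluated. Averaging this value over all test proteins would give a figure comparable in kind to those reported above, provided the benchmark uses the same conventions.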

779 citations

Journal Article•DOI•

RAD51B in Familial Breast Cancer

Liisa M. Pelttari¹, Sofia Khan¹, Mikko Vuorela², Johanna I. Kiiski¹, Sara Vilske¹, Viivi Nevanlinna¹, Salla Ranta¹, Johanna Schleutker³,⁴,⁵, Robert Winqvist², Anne Kallioniemi⁵, Thilo Dörk⁶, Natalia Bogdanova⁶, Jonine Figueroa, Paul D.P. Pharoah⁷, Marjanka K. Schmidt⁸, Alison M. Dunning⁷, Montserrat Garcia-Closas⁹, Manjeet K. Bolla⁷, Joe Dennis⁷, Kyriaki Michailidou⁷, Qin Wang⁷, John L. Hopper¹⁰, Melissa C. Southey¹⁰, Efraim H. Rosenberg⁸, Peter A. Fasching¹¹,¹², Matthias W. Beckmann¹¹, Julian Peto¹³, Isabel dos-Santos-Silva¹³, Elinor J. Sawyer¹⁴, Ian Tomlinson¹⁵, Barbara Burwinkel¹⁶,¹⁷, Harald Surowy¹⁶,¹⁷, Pascal Guénel¹⁸, Thérèse Truong¹⁸, Stig E. Bojesen¹⁹,²⁰, Børge G. Nordestgaard¹⁹,²⁰, Javier Benitez, Anna González-Neira, Susan L. Neuhausen²¹, Hoda Anton-Culver²², Hermann Brenner¹⁷, Volker Arndt¹⁷, Alfons Meindl²³, Rita K. Schmutzler²⁴, Hiltrud Brauch¹⁷,²⁵,²⁶, Thomas Brüning²⁷, Annika Lindblom²⁸, Sara Margolin²⁸, Arto Mannermaa²⁹, Jaana M. Hartikainen²⁹, Georgia Chenevix-Trench³⁰, kConFab¹⁰,³⁰, Aocs Investigators³¹, Laurien Van Dyck³¹, Hilde Janssen¹⁷,³², Jenny Chang-Claude¹⁷, Anja Rudolph, Paolo Radice, Paolo Peterlongo³³, Emily Hallberg³³, Janet E. Olson¹⁰,³⁴, Graham G. Giles¹⁰,³⁴, Roger L. Milne³⁵, Christopher A. Haiman³⁵, Fredrick Schumacher³⁶, Jacques Simard³⁶, Martine Dumont³⁷,³⁸, Vessela N. Kristensen³⁷,³⁸, Anne Lise Børresen-Dale³⁹, Wei Zheng³⁹, Alicia Beeghly-Fadiel⁴⁰, Mervi Grip⁴¹,⁴², Irene L. Andrulis⁴², Gord Glendon⁴³, Peter Devilee⁴⁴, Caroline Seynaeve⁴⁴, Maartje J. Hooning⁴⁵, Margriet Collée⁴⁶, Angela Cox⁴⁶, Simon S. Cross⁷, Mitul Shah⁷, Robert Luben¹⁷, Ute Hamann¹⁷,⁴⁷, Diana Torres⁴⁸, Anna Jakubowska⁴⁸, Jan Lubinski³³, Fergus J. Couch, Drakoulis Yannoukakos⁹, Nick Orr⁹, Anthony J. Swerdlow²⁸, Hatef Darabi²⁸, Jingmei Li²⁸, Kamila Czene²⁸, Per Hall⁷, Douglas F. Easton¹, Johanna Mattson¹, Carl Blomqvist¹, Kristiina Aittomäki¹, Heli Nevanlinna

University of Helsinki¹, University of Oulu², Turku University Hospital³, University of Turku⁴, University of Tampere⁵, Hannover Medical School⁶, University of Cambridge⁷, Netherlands Cancer Institute⁸, Institute of Cancer Research⁹, University of Melbourne¹⁰, University of Erlangen-Nuremberg¹¹, University of California, Los Angeles¹², University of London¹³, King's College London¹⁴, Wellcome Trust Centre for Human Genetics¹⁵, Heidelberg University¹⁶, German Cancer Research Center¹⁷, French Institute of Health and Medical Research¹⁸, University of Copenhagen¹⁹, Copenhagen University Hospital²⁰, Beckman Research Institute²¹, University of California, Irvine²², Technische Universität München²³, University of Cologne²⁴, Bosch²⁵, University of Tübingen²⁶, Ruhr University Bochum²⁷, Karolinska Institutet²⁸, University of Eastern Finland²⁹, QIMR Berghofer Medical Research Institute³⁰, Katholieke Universiteit Leuven³¹, University of Hamburg³², Mayo Clinic³³, Cancer Council Victoria³⁴, University of Southern California³⁵, Laval University³⁶, The Breast Cancer Research Foundation³⁷, Oslo University Hospital³⁸, Vanderbilt University³⁹, Oulu University Hospital⁴⁰, University of Toronto⁴¹, Lunenfeld-Tanenbaum Research Institute⁴², Leiden University Medical Center⁴³, Erasmus University Rotterdam⁴⁴, Erasmus University Medical Center⁴⁵, University of Sheffield⁴⁶, Pontifical Xavierian University⁴⁷, Pomeranian Medical University⁴⁸

05 May 2016-PLOS ONE

TL;DR: It is suggested that loss-of-function mutations in RAD51B are rare, but common variation at the RAD51B region is significantly associated with familial breast cancer risk.

Abstract: Common variation on 14q24.1, close to RAD51B, has been associated with breast cancer: rs999737 and rs2588809 with the risk of female breast cancer and rs1314913 with the risk of male breast cancer. The aim of this study was to investigate the role of RAD51B variants in breast cancer predisposition, particularly in the context of familial breast cancer in Finland. We sequenced the coding region of RAD51B in 168 Finnish breast cancer patients from the Helsinki region for identification of possible recurrent founder mutations. In addition, we studied the known rs999737, rs2588809, and rs1314913 SNPs and RAD51B haplotypes in 44,791 breast cancer cases and 43,583 controls from 40 studies participating in the Breast Cancer Association Consortium (BCAC) that were genotyped on a custom chip (iCOGS). We identified one putatively pathogenic missense mutation c.541C>T among the Finnish cancer patients and subsequently genotyped the mutation in additional breast cancer cases (n = 5259) and population controls (n = 3586) from Finland and Belarus. No significant association with breast cancer risk was seen in the meta-analysis of the Finnish datasets or in the large BCAC dataset. The association with previously identified risk variants rs999737, rs2588809, and rs1314913 was replicated among all breast cancer cases and also among familial cases in the BCAC dataset. The most significant association was observed for the haplotype carrying the risk-alleles of all the three SNPs both among all cases (odds ratio (OR): 1.15, 95% confidence interval (CI): 1.11-1.19, P = 8.88 × 10⁻¹⁶) and among familial cases (OR: 1.24, 95% CI: 1.16-1.32, P = 6.19 × 10⁻¹¹), compared to the haplotype with the respective protective alleles. Our results suggest that loss-of-function mutations in RAD51B are rare, but common variation at the RAD51B region is significantly associated with familial breast cancer risk.

715 citations

Cites methods from "Template-based protein structure mo..."

...Secondary structure prediction was done with RaptorX [27] and protein-protein interaction with PredictProtein [28]....
...According to RaptorX secondary structure prediction software the arginine in position 181 is located in beta-sheet with the likelihood of 83.4% but, as determined by PredictProtein, the amino acid is not predicted to directly participate in protein-protein interactions....

References

Journal Article•DOI•

The Protein Data Bank

Helen M. Berman¹, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne

Rutgers University¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information and plans for the future development of the resource are described.

Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

"Template-based protein structure mo..." refers to background or methods in this paper

...The numbered labels indicate the location of the following screen features: (1) tabs for switching between the three-state and eight-state prediction; (2) hovering over a residue will give detailed statistics on the secondary-state distribution; (3) the status: a current running time of the job; (4) a download link for the prediction results; and (5) a color-code legend for secondary structure diagram....
...The numbered labels indicate the location of the following screen features: (1) a drop-down menu for switching between alternative alignments; (2) the alignment between target sequence and template; (3) indication of the status: a current running time of the job; (4) a link for download of the prediction result; and (5) a legend indicating the alignment color coding....
...The numbered labels indicate the location of the following screen features: (1) the rank of currently selected model; (2) the quality score of the model; (3) the PDB IDs for the set templates used for modeling; (4) a drop-down menu for selecting alternative structure models; (5) tabs for switching between structure prediction, function annotation and BLAST output; (6) interactive viewer displaying the currently selected model structure; (7) menu for controlling the interactive viewer; (8) alignment used for structure modeling; (9) indication of the status: a current running time of the job; (10) download links for prediction results; and (11) a user guide for the interactive structure viewer....

Journal Article•DOI•

AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility

Garrett M. Morris¹, Ruth Huey¹, William Lindstrom¹, Michel F. Sanner¹, Richard K. Belew², David S. Goodsell¹, Arthur J. Olson¹

Scripps Research Institute¹, University of California, San Diego²

01 Dec 2009-Journal of Computational Chemistry

TL;DR: AutoDock4 incorporates limited flexibility in the receptor and its utility in analysis of covalently bound ligands is reported, using both a grid‐based docking method and a modification of the flexible sidechain technique.

Abstract: We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV protease complexes. We also report its utility in analysis of covalently bound ligands, using both a grid-based docking method and a modification of the flexible sidechain technique.

15,616 citations

Journal Article•DOI•

The Pfam protein families database

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article•DOI•

Pfam: the protein families database.

Robert D. Finn¹, Alex Bateman², Jody Clements¹, Penelope Coggill², Ruth Y. Eberhardt², Sean R. Eddy¹, Andreas Heger, Kirstie Hetherington³, Liisa Holm, Jaina Mistry², Erik L. L. Sonnhammer⁴, John Tate², Marco Punta²

Howard Hughes Medical Institute¹, European Bioinformatics Institute², Wellcome Trust Sanger Institute³, Stockholm University⁴

01 Jan 2014-Nucleic Acids Research

TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0; this update describes the changes to Pfam content and features made since the 2012 article.

Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations

Journal Article•DOI•

T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Cedric Notredame¹,²,³, Desmond G. Higgins⁴, Jaap Heringa¹

National Institute for Medical Research¹, ISREC², Centre national de la recherche scientifique³, University College Cork⁴

08 Sep 2000-Journal of Molecular Biology

TL;DR: A new method for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed compared with the most commonly used alternatives; it is broadly based on the progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of that algorithm.

6,727 citations