Marco Punta

Other affiliations: St. Jude Children's Research Hospital, Structural Genomics Consortium, Wellcome Trust Sanger Institute

Bio: Marco Punta is an academic researcher at the European Bioinformatics Institute. The author has contributed to research on topics including structural genomics and membrane proteins. The author has an h-index of 29 and has co-authored 51 publications receiving 31,990 citations. Previous affiliations of Marco Punta include St. Jude Children's Research Hospital and the Structural Genomics Consortium.

Papers published on a yearly basis (chart; years shown: 2000, 2002, 2003, 2005 to 2018, 2022, 2023)

Papers

Journal Article

The Pfam protein families database

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article

Pfam: the protein families database.

Robert D. Finn¹, Alex Bateman², Jody Clements¹, Penelope Coggill², Ruth Y. Eberhardt², Sean R. Eddy¹, Andreas Heger, Kirstie Hetherington³, Liisa Holm, Jaina Mistry², Erik L. L. Sonnhammer⁴, John Tate², Marco Punta²

Howard Hughes Medical Institute¹, European Bioinformatics Institute², Wellcome Trust Sanger Institute³, Stockholm University⁴

01 Jan 2014-Nucleic Acids Research

TL;DR: Pfam is a widely used database of protein families containing 14 831 manually curated entries in the current release, version 27.0; since the 2012 article, its features have been reviewed, with some removed, others enhanced and new ones developed.

Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations

Journal Article

The Pfam protein families database: towards a more sustainable future

Robert D. Finn¹, Penelope Coggill¹, Ruth Y. Eberhardt¹,², Sean R. Eddy³,⁴, Jaina Mistry¹, Alex L. Mitchell¹, Simon C. Potter¹, Marco Punta¹,⁵, Matloob Qureshi¹, Amaia Sangrador-Vegas¹, Gustavo A. Salazar¹, John Tate¹,², Alex Bateman¹

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², Howard Hughes Medical Institute³, Harvard University⁴, University of Paris⁵

04 Jan 2016-Nucleic Acids Research

TL;DR: Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set, and the facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

Abstract: In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

4,906 citations

Journal Article

The InterPro protein families database: the classification resource after 15 years

Alex L. Mitchell¹, Hsin-Yu Chang¹, Louise C. Daugherty¹, Matthew Fraser¹, Sarah Hunter¹, Rodrigo Lopez¹, Craig McAnulla¹, Conor McMenamin¹, Gift Nuka¹, Sebastien Pesseat¹, Amaia Sangrador-Vegas¹, Maxim Scheremetjew¹, Claudia Rato¹, Siew-Yit Yong¹, Alex Bateman¹, Marco Punta¹, Teresa K. Attwood², Christian J. A. Sigrist³, Nicole Redaschi³, Catherine Rivoire³, Ioannis Xenarios³, Daniel Kahn, Dominique Guyot, Peer Bork¹, Ivica Letunic¹, Julian Gough⁴, Matt E. Oates⁴, Daniel H. Haft⁵, Hongzhan Huang⁶, Darren A. Natale⁶, Cathy H. Wu⁶, Christine A. Orengo⁷, Ian Sillitoe⁷, Huaiyu Mi⁸, Paul Thomas⁸, Robert D. Finn¹

European Bioinformatics Institute¹, University of Manchester², Swiss Institute of Bioinformatics³, University of Bristol⁴, J. Craig Venter Institute⁵, Georgetown University Medical Center⁶, University College London⁷, University of Southern California⁸

28 Jan 2015-Nucleic Acids Research

TL;DR: The new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined, and the challenges faced by the resource given the explosive growth in sequence data in recent years are discussed.

Abstract: The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.

1,189 citations

Journal Article

InterPro in 2011: new developments in the family and domain prediction database

Sarah Hunter¹, Philip Jones, Alex L. Mitchell, Rolf Apweiler, Teresa K. Attwood, Alex Bateman, Thomas E. Bernard, David Binns, Peer Bork, Sarah W. Burge, Edouard de Castro, Penny Coggill, Matthew Corbett, Ujjwal Das, Louise C. Daugherty, Lauranne Duquenne, Robert D. Finn, Matthew Fraser, Julian Gough, Daniel H. Haft, Nicolas Hulo, Daniel Kahn, Elizabeth Kelly, Ivica Letunic, David M. Lonsdale, Rodrigo Lopez, Martin Madera, John Maslen, Craig McAnulla, Jennifer McDowall, Conor McMenamin, Huaiyu Mi, Prudence Mutowo-Muellenet, Nicola Mulder, Darren A. Natale, Christine A. Orengo, Sebastien Pesseat, Marco Punta, Antony F. Quinn, Catherine Rivoire, Amaia Sangrador-Vegas, Jeremy D. Selengut, Christian J. A. Sigrist, Maxim Scheremetjew, John Tate, Manjulapramila Thimmajanarthanan, Paul Thomas, Cathy H. Wu, Corin Yeats, Siew Yit Yong

European Bioinformatics Institute¹

01 Jan 2012-Nucleic Acids Research

TL;DR: An overview of new developments in the InterPro database and its associated software since 2009 is given, including updates to database content, curation processes and Web and programmatic interfaces.

Abstract: InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and metagenomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to

1,094 citations

Cited by

Journal Article

Gene Ontology: tool for the unification of biology

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock

Stanford University¹

01 May 2000-Nature Genetics

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations

Journal Article

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

Kazutaka Katoh¹, Daron M. Standley¹

Osaka University¹

01 Apr 2013-Molecular Biology and Evolution

TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.

Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum and 245 more authors (29 institutions)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

Search and clustering orders of magnitude faster than BLAST

Robert C. Edgar

01 Oct 2010-Bioinformatics

TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.

Abstract: Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

17,301 citations
