CDD/SPARCLE: the conserved domain database in 2020
Shennan Lu, Jiyao Wang, Farideh Chitsaz, Myra K. Derbyshire, Renata C. Geer, Noreen R. Gonzales, Marc Gwadz, David I. Hurwitz, Gabriele H. Marchler, James S. Song, Narmada Thanki, Roxanne A. Yamashita, Mingzhang Yang, Dachuan Zhang, Chanjuan Zheng, Christopher J. Lanczycki, Aron Marchler-Bauer
Abstract:
As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations and live search services, for single protein or nucleotide queries as well as for larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.