Protein Annotation

About: Protein Annotation is a research topic. Over its lifetime, 279 publications have been published within this topic, receiving 18,757 citations.

Papers

Journal Article

KEGG as a reference resource for gene and protein annotation

Minoru Kanehisa¹, Yoko Sato², Masayuki Kawashima², Miho Furumichi¹, Mao Tanabe¹

Kyoto University¹, Fujitsu²

04 Jan 2016, Nucleic Acids Research

TL;DR: The KEGG GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes, and new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database.

Abstract: KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.
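
The abstract describes KO assignments as the basis for reconstructing KEGG pathways and modules. A minimal sketch of that reconstruction step follows, assuming a greatly simplified module model: each module is treated as a list of steps, and a step counts as filled if the genome carries at least one of that step's alternative KOs. All KO and module identifiers are hypothetical placeholders, not real KEGG entries, and real KEGG module definitions use a richer expression syntax (for example, complexes and optional components) that this sketch does not handle.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gene -> KO assignments, as a KO assignment tool might report them.
# All identifiers are placeholders, not real KEGG entries.
ASSIGNMENTS: Dict[str, str] = {
    "gene1": "K_A",
    "gene2": "K_C",
    "gene3": "K_D",
}

# Simplified (hypothetical) module model: a module is a list of steps; each step
# is a set of alternative KOs, any one of which is enough to fill that step.
MODULES: Dict[str, List[Set[str]]] = {
    "M_hypothetical_1": [{"K_A", "K_B"}, {"K_C"}, {"K_D"}],
    "M_hypothetical_2": [{"K_A"}, {"K_E"}],
}


def module_completeness(
    assignments: Dict[str, str], modules: Dict[str, List[Set[str]]]
) -> Dict[str, Tuple[int, int]]:
    """Return (filled steps, total steps) for every module."""
    present = set(assignments.values())
    result: Dict[str, Tuple[int, int]] = {}
    for name, steps in modules.items():
        # A step is filled if any of its alternative KOs was assigned to some gene.
        filled = sum(1 for alternatives in steps if alternatives & present)
        result[name] = (filled, len(steps))
    return result


def complete_modules(
    assignments: Dict[str, str], modules: Dict[str, List[Set[str]]]
) -> List[str]:
    """Names of modules whose every step is filled, sorted alphabetically."""
    return sorted(
        name
        for name, (filled, total) in module_completeness(assignments, modules).items()
        if filled == total
    )


if __name__ == "__main__":
    for name, (filled, total) in module_completeness(ASSIGNMENTS, MODULES).items():
        print(f"{name}: {filled}/{total} steps present")
    print("complete:", complete_modules(ASSIGNMENTS, MODULES))
```

With these example assignments, the first hypothetical module has all three steps filled and the second has only one of two, so only M_hypothetical_1 is reported as complete.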

4,847 citations

Journal Article

CDD: a Conserved Domain Database for the functional annotation of proteins

Aron Marchler-Bauer¹, Shennan Lu¹, John B. Anderson¹, Farideh Chitsaz¹, Myra K. Derbyshire¹, Carol DeWeese-Scott¹, Jessica H. Fong¹, Lewis Y. Geer¹, Renata C. Geer¹, Noreen R. Gonzales¹, Marc Gwadz¹, David I. Hurwitz¹, John D. Jackson¹, Zhaoxi Ke¹, Christopher J. Lanczycki¹, Fu-Ping Lu¹, Gabriele H. Marchler¹, Mikhail Mullokandov¹, Marina V. Omelchenko¹, Cynthia L. Robertson¹, James S. Song¹, Narmada Thanki¹, Roxanne A. Yamashita¹, Dachuan Zhang¹, Naigong Zhang¹, Chanjuan Zheng¹, Stephen H. Bryant¹

National Institutes of Health¹

01 Jan 2011, Nucleic Acids Research

TL;DR: NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints.

Abstract: NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
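
The annotation scheme described above (a superfamily label on every footprint, plus a specific family assignment only when confidence is high) can be illustrated with a small sketch. It assumes each domain model belongs to exactly one superfamily cluster and has its own score cutoff above which a hit counts as a high-confidence assignment; the model names, superfamily names, and thresholds are hypothetical, and the sketch does not attempt to reproduce how CDD resolves overlapping hits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical clustering of redundant/homologous domain models into superfamilies.
SUPERFAMILY_OF: Dict[str, str] = {
    "model_a1": "SF_A",
    "model_a2": "SF_A",
    "model_b1": "SF_B",
}

# Hypothetical per-model score cutoffs for a high-confidence ("specific") hit.
SPECIFIC_THRESHOLD: Dict[str, float] = {
    "model_a1": 100.0,
    "model_a2": 80.0,
    "model_b1": 50.0,
}


@dataclass
class Hit:
    model: str    # domain model that matched the query protein
    start: int    # footprint start on the query (1-based, inclusive)
    end: int      # footprint end on the query (inclusive)
    score: float  # alignment score of the hit


@dataclass
class Annotation:
    start: int
    end: int
    superfamily: str               # always reported
    specific_model: Optional[str]  # set only for high-confidence hits


def annotate(hits: List[Hit]) -> List[Annotation]:
    """Label each footprint with its superfamily, adding the specific model if confident."""
    annotations: List[Annotation] = []
    for hit in sorted(hits, key=lambda h: h.start):
        confident = hit.score >= SPECIFIC_THRESHOLD[hit.model]
        annotations.append(
            Annotation(
                start=hit.start,
                end=hit.end,
                superfamily=SUPERFAMILY_OF[hit.model],
                specific_model=hit.model if confident else None,
            )
        )
    return annotations


EXAMPLE_HITS = [
    Hit("model_b1", 150, 230, 30.0),
    Hit("model_a2", 10, 120, 95.0),
]

if __name__ == "__main__":
    for ann in annotate(EXAMPLE_HITS):
        print(ann)
```

In this example the footprint at positions 10 to 120 clears its model's cutoff and is reported with both its superfamily and specific model, while the footprint at 150 to 230 is annotated only at the superfamily level.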

2,934 citations

Journal Article

The Universal Protein Resource (UniProt): an expanding universe of protein information

Cathy H. Wu¹, Rolf Apweiler, Amos Marc Bairoch, Darren A. Natale, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria Jesus Martin, Raja Mazumder, Claire O'Donovan, Nicole Redaschi, Baris E. Suzek

Georgetown University Medical Center¹

01 Jan 2006, Nucleic Acids Research

TL;DR: The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics.

Abstract: The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
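
The idea behind UniRef's identity levels can be sketched as greedy, threshold-based clustering: sequences are visited longest first, and each one joins the first existing representative it matches at or above the identity cutoff; otherwise it founds a new cluster. This is a toy illustration, not UniProt's pipeline. Identity here is the longest common subsequence divided by the shorter sequence's length (in effect, a global alignment in which gaps cost nothing), the sequences are made up, and production clustering relies on dedicated tools and additional criteria such as minimum alignment overlap.

```python
from typing import Dict, List

# Made-up toy sequences; not real UniProt entries.
SEQS: Dict[str, str] = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQK",  # one substitution relative to P1
    "P4": "MKSAYLAKHR",  # three substitutions relative to P1
    "P5": "GGGGWWWWPP",  # unrelated
}


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity: matched residues (gaps free) over the shorter sequence's length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative ID to the IDs in its cluster (representative first)."""
    # Longest sequences first, ties broken by ID so the result is deterministic.
    order = sorted(seqs, key=lambda sid: (-len(seqs[sid]), sid))
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    for label, cutoff in [("100%", 1.0), ("90%", 0.9), ("50%", 0.5)]:
        print(label, greedy_cluster(SEQS, cutoff))
```

Lowering the cutoff merges more sequences into fewer clusters, which is the sequence space compression the abstract refers to; with this toy data, the 1.0, 0.9 and 0.5 cutoffs yield four, three and two clusters respectively.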

1,092 citations

Journal Article

Protter: interactive protein feature visualization and integration with experimental proteomic data

Ulrich Omasits¹, Christian H. Ahrens¹, Sebastian Müller¹, Bernd Wollscheid¹

University of Zurich¹

15 Mar 2014, Bioinformatics

TL;DR: Protter, a web-based tool that supports interactive protein data analysis and hypothesis generation by visualizing both annotated sequence features and experimental proteomic data in the context of protein topology, is presented.

Abstract: Summary: The ability to integrate and visualize experimental proteomic evidence in the context of rich protein feature annotations represents an unmet need of the proteomics community. Here we present Protter, a web-based tool that supports interactive protein data analysis and hypothesis generation by visualizing both annotated sequence features and experimental proteomic data in the context of protein topology. Protter supports numerous proteomic file formats and automatically integrates a variety of reference protein annotation sources, which can be readily extended via modular plugins. A built-in export function produces publication-quality customized protein illustrations, also for large datasets. Visualizations of surfaceome datasets show the specific utility of Protter for the integrated visual analysis of membrane proteins and peptide selection for targeted proteomics. Availability and implementation: The Protter web application is available at http://wlab.ethz.ch/protter. Source code and installation instructions are available at http://ulo.github.io/Protter/.

969 citations

Journal Article

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt¹, Garth Brown¹, Susan M. Hiatt¹, Françoise Thibaud-Nissen¹, Alexander Astashyn¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Jennifer Hart¹, Melissa J. Landrum¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Nuala A. O'Leary¹, Shashikant Pujar¹, Bhanu Rajput¹, Sanjida H. Rangwala¹, Lillian D. Riddick¹, Andrei Shkeda¹, Hanzhen Sun¹, Pamela Tamez¹, Raymond E. Tully¹, Craig Wallin¹, David Webb¹, Janet Weber¹, Wendy Wu¹, Michael DiCuccio¹, Paul Kitts¹, Donna Maglott¹, Terence Murphy¹, James Ostell¹

National Institutes of Health¹

01 Jan 2014, Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration.

Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins, and we summarize the current status of the RefSeqGene project.

949 citations

Metrics

Papers: 279
Citations: 22,085

No. of papers in the topic in previous years
Year	Papers
2021	21
2020	14
2019	15
2018	12
2017	12
2016	10
