PROSITE, a protein domain database for functional characterization and annotation

doi:10.1093/NAR/GKP885

Open Access journal article


Christian J. A. Sigrist and six co-authors

Nucleic Acids Research, Vol. 38, pp. D161–D166, 1 January 2010

TL;DR

The paper describes AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles; the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs; and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources.

Abstract:

PROSITE consists of documentation entries describing protein domains, families and functional sites, together with the patterns and profiles used to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns that increases their discriminatory power by providing additional information about functionally and/or structurally critical amino acids. PROSITE is used extensively to annotate domain features in UniProtKB/Swiss-Prot entries: of the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (~70%) are annotated with PROSITE descriptors using information from ProRule. To allow better functional characterization of domains, PROSITE development focuses on subfamily-specific profiles and on a new profile-building method that gives more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles; the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs; and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54 of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.
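PROSITE patterns use a compact syntax that maps closely onto regular expressions: elements are separated by '-', 'x' stands for any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) gives a repeat count, and '<' / '>' anchor a match to the N- or C-terminus. The following is a minimal illustrative sketch in Python, not ScanProsite's implementation. It covers only the common pattern elements listed above and ignores profiles and ProRule conditions, which require separate scoring and residue checks. The function names `prosite_to_regex` and `scan` are hypothetical. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
from __future__ import annotations

import re

# One pattern element: a residue, 'x', [..] or {..}, plus an optional (n) or (n,m) repeat.
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into an equivalent Python regular expression.

    Only the common syntax is handled in this sketch; features such as
    '>' inside brackets are not supported.
    """
    body = pattern.strip().rstrip(".")  # patterns end with a period
    regex_parts = []
    for element in body.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # match must start at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # match must end at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            inner = residue[1:-1]
            core = f"[^{inner}]"  # any residue except those listed
        else:
            core = residue  # single residue or [..] class is already valid regex
        if low is not None:
            core += f"{{{low},{high}}}" if high else f"{{{low}}}"
        regex_parts.append(prefix + core + suffix)
    return "".join(regex_parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return non-overlapping matches as (start, end, residues), 1-based and inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
    print(scan("MKNFSAQNPTLNGTV", "N-{P}-[ST]-{P}."))
    # [(3, 6, 'NFSA'), (12, 15, 'NGTV')]
```

In the toy sequence, the N at position 8 is skipped because the next residue is a proline, which the {P} element forbids. Because the translation is literal, a pattern that is too permissive will also over-match here. That is the weakness ProRule addresses by adding constraints on critical residues.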

