Pfam: a comprehensive database of protein domain families based on seed alignments

TLDR
Pfam is a database based on hidden Markov model profiles (HMMs) that combines high quality with completeness. It was used to classify a large number of previously unannotated proteins from the Caenorhabditis elegans genome project.
Abstract
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405-420, 1997. © 1997 Wiley-Liss, Inc.

Citations

The Pfam protein families database

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

The sequence of the human genome.

J. Craig Venter, +272 more - 16 Feb 2001
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

DAVID: Database for Annotation, Visualization, and Integrated Discovery

TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

Role for a bidentate ribonuclease in the initiation step of RNA interference

TL;DR: Dicer is a member of the RNase III family of nucleases that specifically cleave double-stranded RNAs. It is evolutionarily conserved in worms, flies, plants, fungi and mammals, and has a distinctive structure that includes a helicase domain and dual RNase III motifs.

Profile hidden Markov models.

TL;DR: Profile HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise and complement standard pairwise comparison methods for large-scale sequence analysis.