Journal ArticleDOI
Pfam: a comprehensive database of protein domain families based on seed alignments
TLDR
Pfam is a database based on hidden Markov model profiles (HMMs) that combines high quality and completeness; using it, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified.
Abstract:
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405-420, 1997. © 1997 Wiley-Liss, Inc.
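The workflow in the abstract (build a model from a curated seed alignment, use it to find members, then pass what remains to automatic clustering) can be illustrated with a deliberately simplified sketch. This is not Pfam's implementation: it swaps the profile HMM for an ungapped position-specific scoring matrix with a uniform background, so insertions and deletions are not modelled, and it does not implement Domainer. The function names `build_profile`, `best_hit`, and `remove_domains`, the pseudocount, and the half-open interval convention are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seed_alignment, pseudocount=1.0):
    """Return per-column log-odds scores (vs. a uniform background).

    Assumption: an ungapped scoring matrix stands in for a profile HMM.
    """
    length = len(seed_alignment[0])
    if any(len(row) != length for row in seed_alignment):
        raise ValueError("seed alignment rows must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Gap characters and unknown residues are ignored when counting.
        counts = Counter(
            row[col] for row in seed_alignment if row[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20-letter alphabet contribute a neutral score.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


def remove_domains(sequence, domains, min_length=1):
    """Return the segments left after cutting out half-open (start, end) intervals."""
    segments, cursor = [], 0
    for start, end in sorted(domains):
        if start > cursor:
            segments.append(sequence[cursor:start])
        cursor = max(cursor, end)
    if cursor < len(sequence):
        segments.append(sequence[cursor:])
    return [s for s in segments if len(s) >= min_length]
```

With a toy seed alignment such as `["ACDE", "ACDE", "ACDF"]`, `best_hit` locates the matching window in `"GGGACDEGGG"` at position 3, and `remove_domains("GGGACDEGGG", [(3, 7)])` leaves the two flanking `"GGG"` segments, which play the role of the leftover sequence that a clustering step would then process.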
Citations
Journal ArticleDOI
The Pfam protein families database
Marco Punta,Penny Coggill,Ruth Y. Eberhardt,Jaina Mistry,John Tate,Chris Boursnell,Ningze Pang,Kristoffer Forslund,Goran Ceric,Jody Clements,Andreas Heger,Liisa Holm,Erik L. L. Sonnhammer,Sean R. Eddy,Alex Bateman,Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained, and some features of domains of unknown function (DUFs), which constitute a rapidly growing class of families within Pfam, are discussed.
Journal ArticleDOI
The sequence of the human genome.
J. Craig Venter,Mark Raymond Adams,Eugene W. Myers,Peter W. Li,Richard J. Mural,Granger G. Sutton,Hamilton O. Smith,Mark Yandell,Cheryl A. Evans,Robert A. Holt,Jeannine D. Gocayne,Peter Amanatides,Richard M. Ballew,Daniel H. Huson,Jennifer R. Wortman,Qing Zhang,Chinnappa D. Kodira,Xiangqun H. Zheng,Lin Chen,Marian P. Skupski,Gangadharan Subramanian,Paul Thomas,Jinghui Zhang,George L. Gabor Miklos,Catherine R. Nelson,Samuel Broder,Andrew G. Clark,J. H. Nadeau,Victor A. McKusick,Norton D. Zinder,Arnold J. Levine,Richard J. Roberts,M. I. Simon,Carolyn W. Slayman,Michael W. Hunkapiller,Randall Bolanos,Arthur L. Delcher,Ian M. Dew,Daniel Fasulo,Michael Flanigan,Liliana Florea,Aaron L. Halpern,Sridhar Hannenhalli,Saul A. Kravitz,Samuel Levy,Clark M. Mobarry,Knut Reinert,Karin A. Remington,Jane Abu-Threideh,Ellen M. Beasley,Kendra Biddick,Vivien Bonazzi,Rhonda Brandon,Michele Cargill,Ishwar Chandramouliswaran,Rosane Charlab,Kabir Chaturvedi,Zuoming Deng,Valentina Di Francesco,Patrick Dunn,Karen Eilbeck,Carlos Evangelista,Andrei Gabrielian,Weiniu Gan,Wangmao Ge,Fangcheng Gong,Zhiping Gu,Ping Guan,Thomas J. Heiman,Maureen E. Higgins,Rui-Ru Ji,Zhaoxi Ke,Karen A. Ketchum,Zhongwu Lai,Yiding Lei,Zhenya Li,Jiayin Li,Yong Liang,Xiaoying Lin,Fu Lu,Gennady V. Merkulov,Natalia Milshina,Helen M. Moore,Ashwinikumar K Naik,Vaibhav A. Narayan,Beena Neelam,Deborah Nusskern,Douglas B. Rusch,Steven L. Salzberg,Wei Shao,Bixiong Chris Shue,Jingtao Sun,Zhen Yuan Wang,Aihui Wang,Xin Wang,Jian Wang,Ming-Hui Wei,Ron Wides,Chunlin Xiao,Chunhua Yan,Alison Yao,Jane Ye,Ming Zhan,Weiqing Zhang,Hongyu Zhang,Qi Zhao,Liansheng Zheng,Fei Zhong,Wenyan Zhong,Shiaoping C. Zhu,Shaying Zhao,Dennis A. Gilbert,Suzanna Baumhueter,Gene Spier,Christine Carter,Anibal Cravchik,Trevor Woodage,Feroze Ali,Huijin An,Aderonke Awe,Danita Baldwin,Holly Baden,Mary Barnstead,Ian Barrow,Karen Beeson,Dana A. Busam,Amy Carver,Ming Lai Cheng,Liz Curry,Steve Danaher,Lionel Davenport,Raymond Desilets,Susanne Dietz,Kristina Dodson,Lisa Doup,Steven Ferriera,Neha Garg,Andres Gluecksmann,Brit J. Hart,Jason Haynes,Charles Haynes,Cheryl Heiner,Suzanne Hladun,Damon Hostin,Jarrett Houck,Timothy Howland,Chinyere Ibegwam,Jeffery Johnson,Francis Kalush,Lesley Kline,Shashi Koduru,Amy Love,Felecia Mann,David May,Steven McCawley,Tina C. McIntosh,Ivy McMullen,Mee Moy,Linda Moy,Brian Murphy,Keith Nelson,Cynthia Pfannkoch,Eric Pratts,Vinita Puri,Hina Qureshi,Matthew Reardon,Robert Rodriguez,Yu-Hui Rogers,Deanna Romblad,Bob Ruhfel,Richard T. Scott,Cynthia Sitter,Michelle Smallwood,Erin Stewart,Renee Strong,Ellen Suh,Reginald Thomas,Ni Ni Tint,Sukyee Tse,Claire Vech,Gary Wang,Jeremy Wetter,Sherita Williams,Monica Williams,Sandra Windsor,Emily Winn-Deen,Keriellen Wolfe,Jayshree Zaveri,Karena Zaveri,Josep F. Abril,Roderic Guigó,Michael J. Campbell,Kimmen Sjölander,Brian Karlak,Anish Kejariwal,Huaiyu Mi,Betty Lazareva,Thomas Hatton,Apurva Narechania,Karen Diemer,Anushya Muruganujan,Nan Guo,Shinji Sato,Vineet Bafna,Sorin Istrail,Ross Lippert,Russell Schwartz,Brian P. Walenz,Shibu Yooseph,David Allen,Anand Basu,James Baxendale,Louis Blick,Marcelo Caminha,John Carnes-Stine,Parris Caulk,Yen-Hui Chiang,My Coyne,Carl Dahlke,Anne Deslattes Mays,Maria Dombroski,Michael Donnelly,Dale Ely,Shiva Esparham,Carl Fosler,Harold Gire,Stephen Glanowski,Kenneth Glasser,Anna Glodek,Mark Gorokhov,Ken Graham,Barry Gropman,Michael Harris,Jeremy Heil,Scott Henderson,Jeffrey Hoover,Donald Jennings,Catherine Jordan,James Jordan,John Kasha,Leonid Kagan,Cheryl L. Kraft,Alexander Levitsky,Mark Lewis,Xiangjun Liu,John Lopez,Daniel Ma,William H. Majoros,Joe McDaniel,Sean C. Murphy,Matthew Newman,Trung Hieu Nguyen,Ngoc Nguyen,Marc Nodell,Sue Pan,Jim Peck,Marshall Peterson,William Rowe,Robert Sanders,John Scott,Michael Simpson,Thomas J. Smith,Arlan Sprague,Timothy B. Stockwell,Russell Turner,Eli Venter,Mei Wang,Meiyuan Wen,David Wu,Mitchell Wu,Ashley Xia,Ali Zandieh,Xiaohong Zhu
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Journal ArticleDOI
DAVID: Database for Annotation, Visualization, and Integrated Discovery
Glynn Dennis,Brad T. Sherman,Douglas A. Hosack,Jun Jun Yang,Wei Gao,H. Clifford Lane,Richard A. Lempicki
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Journal ArticleDOI
Role for a bidentate ribonuclease in the initiation step of RNA interference
TL;DR: Dicer is a member of the RNase III family of nucleases that specifically cleave double-stranded RNAs. It is evolutionarily conserved in worms, flies, plants, fungi and mammals, and has a distinctive structure that includes a helicase domain and dual RNase III motifs.
Journal ArticleDOI
Profile hidden Markov models.
TL;DR: Profile HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise and complement standard pairwise comparison methods for large-scale sequence analysis.
References
Journal ArticleDOI
Basic Local Alignment Search Tool
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Journal ArticleDOI
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Journal ArticleDOI
A comprehensive set of sequence analysis programs for the VAX
TL;DR: A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS operating system.
Journal ArticleDOI
SCOP: a structural classification of proteins database for the investigation of sequences and structures.
TL;DR: This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure and provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references.
Journal ArticleDOI
CLUSTAL V: improved software for multiple sequence alignment.
TL;DR: The CLUSTAL package of multiple sequence alignment programs has been completely rewritten and many new features added; the main ones are the ability to store and reuse old alignments and to calculate phylogenetic trees after alignment.