PROSITE, a protein domain database for functional characterization and annotation
Christian J. A. Sigrist, Lorenzo Cerutti, Edouard de Castro, Petra S. Langendijk-Genevaux, Virginie Bulliard, Amos Marc Bairoch, Nicolas Hulo
TLDR
AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources are described.
Abstract:
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily-specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.
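The abstract mentions patterns as one of the two descriptor types but does not show how a pattern is matched against a sequence. The following is a minimal sketch, not the ScanProsite implementation: it assumes the commonly documented PROSITE pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for sequence ends) and translates a simple pattern into a Python regular expression. The function names `prosite_to_regex` and `find_matches` and the test sequence are hypothetical and chosen only for illustration; profiles and ProRules are not covered.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # patterns are often written with a final period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # "[ABC]" and single residue letters are already valid regex fragments.
        if count:
            core += "{" + count + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched text) for every hit, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping occurrences.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the N-glycosylation site pattern; the sequence is made up.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTL"))
```

In the made-up sequence, the first asparagine is followed by G, S, A and therefore matches, while the second is followed by proline and is rejected by the `{P}` element. Short patterns like this one match very often by chance, which is part of the motivation for the additional, residue-level information that the abstract attributes to ProRule.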