The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes
Ross Overbeek, Tadhg P. Begley, Ralph Butler, Jomuna V. Choudhuri, Han-Yu Chuang, Matthew P. Cohoon, Valérie de Crécy-Lagard, Naryttza N. Diaz, Terry Disz, Robert D. Edwards, Michael Fonstein, Ed D. Frank, Svetlana Gerdes, Elizabeth M. Glass, Alexander Goesmann, Andrew C. Hanson, Dirk Iwata-Reuyl, Roy A. Jensen, Neema Jamshidi, Lutz Krause, Michael Kubal, Niels Bent Larsen, Burkhard Linke, Alice C. McHardy, Folker Meyer, Heiko Neuweger, Gary J. Olsen, Robert Olson, Andrei L. Osterman, Vasiliy A. Portnoy, Gordon D. Pusch, Dmitry A. Rodionov, Christian Rückert, Jason Steiner, Rick Stevens, Ines Thiele, Olga Vassieva, Yuzhen Ye, Olga Zagnitko, Veronika Vonstein
TL;DR: The paper describes the subsystem approach, offers the first release of a growing library of populated subsystems, and presents the SEED as the first annotation environment that supports this model of annotation.
Abstract:
The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.
Citations
Journal Article
The RAST Server: Rapid Annotations using Subsystems Technology
Ramy K. Aziz, Daniela Bartels, Aaron A. Best, Matthew DeJongh, Terrence Disz, Robert Edwards, Kevin Formsma, Svetlana Gerdes, Elizabeth M. Glass, Michael Kubal, Folker Meyer, Gary J. Olsen, Robert Olson, Andrei L. Osterman, Ross Overbeek, Leslie Klis McNeil, Daniel Paarmann, Tobias Paczian, Bruce Parrello, Gordon D. Pusch, Claudia I. Reich, Rick Stevens, Olga Vassieva, Veronika Vonstein, Andreas Wilke, Olga Zagnitko
TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.
Journal Article
Metagenomic biomarker discovery and explanation
Nicola Segata, Jacques Izard, Levi Waldron, Dirk Gevers, Larisa Miropolsky, Wendy S. Garrett, Curtis Huttenhower
TL;DR: A new method for metagenomic biomarker discovery is described and validated. It uses class comparison, tests of biological consistency, and effect size estimation to address the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities.
Journal Article
The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)
Ross Overbeek, Robert Olson, Gordon D. Pusch, Gary J. Olsen, James J. Davis, Terry Disz, Robert Edwards, Svetlana Gerdes, Bruce Parrello, Maulik Shukla, Veronika Vonstein, Alice R. Wattam, Fangfang Xia, Rick Stevens
TL;DR: The interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources are described.
Journal Article
The Metagenomics RAST Server: A Public Resource for the Automatic Phylogenetic and Functional Analysis of Metagenomes
Folker Meyer, Daniel Paarmann, Mark D'Souza, Robert Olson, Elizabeth M. Glass, Michael Kubal, Tobias Paczian, Alexis A. Rodriguez, Rick Stevens, Andreas Wilke, Jared Wilkening, Robert Edwards
TL;DR: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes that is stable, extensible, and freely available to all researchers.
Journal Article
BIGSdb: Scalable analysis of bacterial genome variation at the population level
TL;DR: The Bacterial Isolate Genome Sequence Database (BIGSdb) represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.
References
Journal Article
Gene Ontology: tool for the unification of biology
M Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Journal Article
KEGG: Kyoto Encyclopedia of Genes and Genomes
Minoru Kanehisa, Susumu Goto
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Journal Article
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
R. D. Fleischmann, M. D. Adams, Owen White, Rebecca A. Clayton, Ewen F. Kirkness, Anthony R. Kerlavage, Carol J. Bult, J. F. Tomb, Brian Dougherty, J. M. Merrick, et al.
TL;DR: An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence of the genome from the bacterium Haemophilus influenzae Rd.
Journal Article
MetaCyc: a multiorganism database of metabolic pathways and enzymes
Ron Caspi, Hartmut Foerster, Carol A. Fulcher, Rebecca Hopkinson, John L. Ingraham, Pallavi Kaipa, Markus Krummenacker, Suzanne M. Paley, John Pick, Seung Y. Rhee, Christophe Tissier, Peifen Zhang, Peter D. Karp
TL;DR: In the past 2 years the data content and the Pathway Tools software used to query, visualize and edit MetaCyc have been expanded significantly, and these enhancements are described in this paper.