Author

Inna Dubchak

Other affiliations: Joint Genome Institute, United States Department of Energy, Massachusetts Institute of Technology

Bio: Inna Dubchak is an academic researcher from Lawrence Berkeley National Laboratory. The author has contributed to research on topics including Genome and Gene. The author has an h-index of 64 and has co-authored 122 publications receiving 41,115 citations. Previous affiliations of Inna Dubchak include the Joint Genome Institute and the United States Department of Energy.

Topics: Genome, Gene, Comparative genomics, Genomics, Alternative splicing ...

[Chart: papers published on a yearly basis, 1997 to 2019]

Papers

Journal Article

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston¹, Kerstin Lindblad-Toh², Ewan Birney, Jane Rogers³, and 219 more authors (26 institutions)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal Article

The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)

Gerald A. Tuskan¹,², Stephen P. DiFazio¹,³, Stefan Jansson⁴, Joerg Bohlmann⁵, Igor V. Grigoriev⁶, Uffe Hellsten⁶, Nicholas H. Putnam⁶, Steven G. Ralph⁵, Stephane Rombauts⁷, Asaf Salamov⁶, Jacquie Schein, Lieven Sterck⁷, Andrea Aerts⁶, Rishikeshi Bhalerao⁴, Rishikesh P. Bhalerao⁸, Damien Blaudez⁹, Wout Boerjan⁷, Annick Brun⁹, Amy M. Brunner¹⁰, Victor Busov¹¹, Malcolm M. Campbell¹², John E. Carlson¹³, Michel Chalot⁹, Jarrod Chapman⁶, G.-L. Chen¹, Dawn Cooper⁵, Pedro M. Coutinho¹⁴, Jérémy Couturier⁹, Sarah F. Covert¹⁵, Quentin C. B. Cronk⁵, R. Cunningham¹, John M. Davis¹⁶, Sven Degroeve⁷, Annabelle Déjardin⁹, Claude W. dePamphilis¹³, John C. Detter⁶, Bill Dirks¹⁷, Inna Dubchak¹⁸,⁶, Sébastien Duplessis⁹, Jürgen Ehlting⁵, Brian E. Ellis⁵, Karla C Gendler¹⁹, David Goodstein⁶, Michael Gribskov²⁰, Jane Grimwood²¹, Andrew Groover²², Lee E. Gunter¹, Björn Hamberger⁵, Berthold Heinze, Yrjö Helariutta²³,⁸,²⁴, Bernard Henrissat¹⁴, D. Holligan¹⁵, Robert A. Holt, Wenyu Huang⁶, N. Islam-Faridi²², Steven J.M. Jones, M. Jones-Rhoades²⁵, Richard A. Jorgensen¹⁹, Chandrashekhar P. Joshi¹¹, Jaakko Kangasjärvi²⁴, Jan Karlsson⁴, Colin T. Kelleher⁵, Robert Kirkpatrick, Matias Kirst¹⁶, Annegret Kohler⁹, Udaya C. Kalluri¹, Frank W. Larimer¹, Jim Leebens-Mack¹⁵, Jean-Charles Leplé⁹, Philip F. LoCascio¹, Y. Lou⁶, Susan Lucas⁶, Francis Martin⁹, Barbara Montanini⁹, Carolyn A. Napoli¹⁹, David R. Nelson²⁶, C D Nelson²², Kaisa Nieminen²⁴, Ove Nilsson⁸, V. Pereda⁹, Gary F. Peter¹⁶, Ryan N. Philippe⁵, Gilles Pilate⁹, Alexander Poliakov¹⁸, J. Razumovskaya¹, Paul G. Richardson⁶, Cécile Rinaldi⁹, Kermit Ritland⁵, Pierre Rouzé⁷, D. Ryaboy¹⁸, Jeremy Schmutz²¹, J. Schrader²⁷, Bo Segerman⁴, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry⁶, Chung-Jui Tsai¹¹, Edward C. Uberbacher¹, Per Unneberg, Jorma Vahala²⁴, Kerr Wall¹³, Susan R. Wessler¹⁵, Guojun Yang¹⁵, T. Yin¹, Carl J. Douglas⁵, Marco A. Marra, Göran Sandberg⁸, Y. Van de Peer⁷, Daniel S. Rokhsar¹⁷,⁶

Institutions (27): Oak Ridge National Laboratory¹, University of Tennessee², West Virginia University³, Umeå University⁴, University of British Columbia⁵, United States Department of Energy⁶, Ghent University⁷, Swedish University of Agricultural Sciences⁸, Institut national de la recherche agronomique⁹, Virginia Tech¹⁰, Michigan Technological University¹¹, University of Toronto¹², Pennsylvania State University¹³, University of Provence¹⁴, University of Georgia¹⁵, University of Florida¹⁶, University of California, Berkeley¹⁷, Lawrence Berkeley National Laboratory¹⁸, University of Arizona¹⁹, Purdue University²⁰, Stanford University²¹, United States Department of Agriculture²², University of Turku²³, University of Helsinki²⁴, Massachusetts Institute of Technology²⁵, University of Tennessee Health Science Center²⁶, University of Tübingen²⁷

15 Sep 2006-Science

TL;DR: The draft genome of the black cottonwood tree, Populus trichocarpa, has been reported in this paper, with more than 45,000 putative protein-coding genes identified.

Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

4,025 citations

Journal Article

The Sorghum bicolor genome and the diversification of grasses

Andrew H. Paterson¹, John E. Bowers¹, Rémy Bruggmann², Inna Dubchak³, Jane Grimwood⁴, Heidrun Gundlach, Georg Haberer, Uffe Hellsten³, Therese Mitros⁵, Alexander Poliakov³, Jeremy Schmutz⁴, Manuel Spannagl, Haibao Tang¹, Xiyin Wang⁶,¹, Thomas Wicker⁷, Arvind K. Bharti², Jarrod Chapman³, F. Alex Feltus¹,⁸, Udo Gowik⁹, Igor V. Grigoriev³, Eric Lyons⁵, Christopher G. Maher¹⁰, Mihaela Martis, Apurva Narechania¹⁰, Robert Otillar³, Bryan W. Penning¹¹, Asaf Salamov³, Yu Wang, Lifang Zhang¹⁰, Nicholas C. Carpita¹¹, Michael Freeling⁵, Alan R. Gingle¹, C. Thomas Hash¹², Beat Keller⁷, Patricia E. Klein¹³, Stephen Kresovich¹⁴, Maureen C. McCann¹¹, Ray Ming¹⁵, Daniel G. Peterson¹,¹⁶, Mehboob-ur-Rahman¹,¹⁷, Doreen Ware¹⁸,¹⁰, Peter Westhoff⁹, Klaus F. X. Mayer, Joachim Messing², Daniel S. Rokhsar⁴,³

Institutions (18): University of Georgia¹, Rutgers University², United States Department of Energy³, Stanford University⁴, University of California, Berkeley⁵, North China University of Science and Technology⁶, University of Zurich⁷, Clemson University⁸, University of Düsseldorf⁹, Cold Spring Harbor Laboratory¹⁰, Purdue University¹¹, International Crops Research Institute for the Semi-Arid Tropics¹², Texas A&M University¹³, Cornell University¹⁴, University of Illinois at Urbana–Champaign¹⁵, Mississippi State University¹⁶, National Institute for Biotechnology and Genetic Engineering¹⁷, United States Department of Agriculture¹⁸

29 Jan 2009-Nature

TL;DR: An initial analysis of the ∼730-megabase Sorghum bicolor (L.) Moench genome is presented, placing ∼98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information.

Abstract: Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.

2,809 citations

Journal Article

The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions

Sabeeha S. Merchant¹, Simon E. Prochnik², Olivier Vallon³, Elizabeth H. Harris⁴, Steven J. Karpowicz¹, George B. Witman⁵, Astrid Terry², Asaf Salamov², Lillian K. Fritz-Laylin⁶, Laurence Maréchal-Drouard⁷, Wallace F. Marshall⁸, Liang-Hu Qu⁹, David R. Nelson¹⁰, Anton A. Sanderfoot¹¹, Martin H. Spalding¹², Vladimir V. Kapitonov¹³, Qinghu Ren, Patrick J. Ferris¹⁴, Erika Lindquist², Harris Shapiro², Susan Lucas², Jane Grimwood¹⁵, Jeremy Schmutz¹⁵, Pierre Cardol³,¹⁶, Heriberto Cerutti¹⁷, Guillaume Chanfreau¹, Chun-Long Chen⁹, Valérie Cognat⁷, Martin T. Croft¹⁸, Rachel M. Dent⁶, Susan K. Dutcher¹⁹, Emilio Fernández²⁰, Hideya Fukuzawa²¹, David González-Ballester²², Diego González-Halphen²³, Armin Hallmann, Marc Hanikenne¹⁶, Michael Hippler²⁴, William Inwood⁶, Kamel Jabbari²⁵, Ming Kalanon²⁶, Richard Kuras³, Paul A. Lefebvre¹¹, Stéphane D. Lemaire²⁷, Alexey V. Lobanov¹⁷, Martin Lohr²⁸, Andrea L Manuell²⁹, Iris Meier³⁰, Laurens Mets³¹, Maria Mittag³², Telsa M. Mittelmeier³³, James V. Moroney³⁴, Jeffrey L. Moseley²², Carolyn A. Napoli³³, Aurora M. Nedelcu³⁵, Krishna K. Niyogi⁶, Sergey V. Novoselov¹⁷, Ian T. Paulsen, Greg Pazour⁵, Saul Purton³⁶, Jean-Philippe Ral⁷, Diego Mauricio Riaño-Pachón³⁷, Wayne R. Riekhof, Linda A. Rymarquis³⁸, Michael Schroda, David B. Stern³⁹, James G. Umen¹⁴, Robert D. Willows⁴⁰, Nedra F. Wilson⁴¹, Sara L. Zimmer³⁹, Jens Allmer⁴², Janneke Balk¹⁸, Katerina Bisova⁴³, Chong-Jian Chen⁹, Marek Eliáš⁴⁴, Karla C Gendler³³, Charles R. Hauser⁴⁵, Mary Rose Lamb⁴⁶, Heidi K. Ledford⁶, Joanne C. Long¹, Jun Minagawa⁴⁷, M. Dudley Page¹, Junmin Pan⁴⁸, Wirulda Pootakham²², Sanja Roje⁴⁹, Annkatrin Rose⁵⁰, Eric Stahlberg³⁰, Aimee M. Terauchi¹, Pinfen Yang⁵¹, Steven G. Ball⁷, Chris Bowler²⁵, Carol L. Dieckmann³³, Vadim N. Gladyshev¹⁷, Pamela J. Green³⁸, Richard A. Jorgensen³³, Stephen P. Mayfield²⁹, Bernd Mueller-Roeber³⁷, Sathish Rajamani³⁰, Richard T. Sayre³⁰, Peter Brokstein², Inna Dubchak², David Goodstein², Leila Hornick², Y. Wayne Huang², Jinal Jhaveri², Yigong Luo², Diego Martinez², Wing Chi Abby Ngau², Bobby Otillar², Alexander Poliakov², Aaron Porter², Lukasz Szajkowski², Gregory Werner², Kemin Zhou², Igor V. Grigoriev², Daniel S. Rokhsar²,⁶, Arthur R. Grossman²²

Institutions (51): University of California, Los Angeles¹, United States Department of Energy², University of Paris³, Duke University⁴, University of Massachusetts Medical School⁵, University of California, Berkeley⁶, Centre national de la recherche scientifique⁷, University of California, San Francisco⁸, Sun Yat-sen University⁹, University of Tennessee Health Science Center¹⁰, University of Minnesota¹¹, Iowa State University¹², Genetic Information Research Institute¹³, Salk Institute for Biological Studies¹⁴, Stanford University¹⁵, University of Liège¹⁶, University of Nebraska–Lincoln¹⁷, University of Cambridge¹⁸, Washington University in St. Louis¹⁹, University of Córdoba (Spain)²⁰, Kyoto University²¹, Carnegie Institution for Science²², National Autonomous University of Mexico²³, University of Münster²⁴, École Normale Supérieure²⁵, University of Melbourne²⁶, University of Paris-Sud²⁷, University of Mainz²⁸, Scripps Research Institute²⁹, Ohio State University³⁰, University of Chicago³¹, University of Jena³², University of Arizona³³, Louisiana State University³⁴, University of New Brunswick³⁵, University College London³⁶, University of Potsdam³⁷, Delaware Biotechnology Institute³⁸, Boyce Thompson Institute for Plant Research³⁹, Macquarie University⁴⁰, Oklahoma State University Center for Health Sciences⁴¹, İzmir University of Economics⁴², Academy of Sciences of the Czech Republic⁴³, Charles University in Prague⁴⁴, St. Edward's University⁴⁵, University of Puget Sound⁴⁶, Hokkaido University⁴⁷, Tsinghua University⁴⁸, Washington State University⁴⁹, Appalachian State University⁵⁰, Marquette University⁵¹

12 Oct 2007-Science

TL;DR: Analyses of the Chlamydomonas genome advance the understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

Abstract: Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

2,554 citations

Journal Article

VISTA: computational tools for comparative genomics

Kelly A. Frazer, Lior Pachter, Alexander Poliakov, Edward M. Rubin, Inna Dubchak¹

Institutions (1): Lawrence Berkeley National Laboratory¹

01 Jul 2004-Nucleic Acids Research

TL;DR: The VISTA family of tools created to assist biologists in carrying out comparative analysis of DNA sequences is described and capabilities of the site are illustrated by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.

Abstract: Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/vista/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, to submit their own sequences of interest to several VISTA servers for various types of comparative analysis and to obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.

1,986 citations

Cited by

Changes in the switching rate of Plasmodium var genes lead to antigenic variation [in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

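TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.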

18,940 citations

Journal Article

BLAST+: architecture and applications.

Christiam Camacho¹, George Coulouris¹, Vahram Avagyan¹, Ning Ma¹, Jason S. Papadopoulos¹, Kevin Bealer¹, Thomas L. Madden¹

Institutions (1): National Institutes of Health¹

15 Dec 2009-BMC Bioinformatics

TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.

Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations

Journal Article

Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets

Benjamin P. Lewis¹, Christopher B. Burge¹, David P. Bartel¹

Institutions (1): Massachusetts Institute of Technology¹

14 Jan 2005-Cell

TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal Article

Integrative genomics viewer

James T. Robinson¹, Helga Thorvaldsdottir¹, Wendy Winckler¹, Mitchell Guttman¹, Eric S. Lander¹,², Gad Getz¹, Jill P. Mesirov¹

Institutions (2): Massachusetts Institute of Technology¹, Harvard University²

10 Jan 2011-Nature Biotechnology

TL;DR: The authors argue that analyzing large, diverse genome-wide data sets calls for efficient and intuitive visualization tools that scale to very large data sets and flexibly integrate multiple data types, including clinical data.

Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

10,798 citations

Journal Article

The RAST Server: Rapid Annotations using Subsystems Technology

Ramy K. Aziz¹,², Daniela Bartels³, Aaron A. Best⁴, Matthew DeJongh⁴, Terrence Disz⁵,³, Robert Edwards⁵, Kevin Formsma⁴, Svetlana Gerdes, Elizabeth M. Glass⁵, Michael Kubal³, Folker Meyer³,⁵, Gary J. Olsen⁶,⁵, Robert Olson⁵,³, Andrei L. Osterman⁷, Ross Overbeek, Leslie Klis McNeil⁶, Daniel Paarmann³, Tobias Paczian³, Bruce Parrello, Gordon D. Pusch³, Claudia I. Reich⁶, Rick Stevens⁵,³, Olga Vassieva, Veronika Vonstein, Andreas Wilke³, Olga Zagnitko

Institutions (7): Cairo University¹, University of Tennessee Health Science Center², University of Chicago³, Hope College⁴, Argonne National Laboratory⁵, University of Illinois at Urbana–Champaign⁶, Sanford-Burnham Institute for Medical Research⁷

08 Feb 2008-BMC Genomics

TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.

Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

9,397 citations
