Ross Overbeek

Other affiliations: University of Chicago

Bio: Ross Overbeek is an academic researcher from Argonne National Laboratory whose research covers topics including Genome and Genome project. The author has an h-index of 52 and has co-authored 123 publications receiving 31,415 citations. Previous affiliations of Ross Overbeek include the University of Chicago.

Topics: Genome, Genome project, Gene, Logic programming, Ribosomal RNA ...

Papers published on a yearly basis (chart not reproduced): 1981, 1984–2019, 2021

Papers

Journal Article

The RAST Server: Rapid Annotations using Subsystems Technology

Ramy K. Aziz¹,², Daniela Bartels³, Aaron A. Best⁴, Matthew DeJongh⁴, Terrence Disz³,⁵, Robert Edwards⁵, Kevin Formsma⁴, Svetlana Gerdes, Elizabeth M. Glass⁵, Michael Kubal³, Folker Meyer³,⁵, Gary J. Olsen⁵,⁶, Robert Olson³,⁵, Andrei L. Osterman⁷, Ross Overbeek, Leslie Klis McNeil⁶, Daniel Paarmann³, Tobias Paczian³, Bruce Parrello, Gordon D. Pusch³, Claudia I. Reich⁶, Rick Stevens³,⁵, Olga Vassieva, Veronika Vonstein, Andreas Wilke³, Olga Zagnitko

Institutions: Cairo University¹, University of Tennessee Health Science Center², University of Chicago³, Hope College⁴, Argonne National Laboratory⁵, University of Illinois at Urbana–Champaign⁶, Sanford-Burnham Institute for Medical Research⁷

08 Feb 2008-BMC Genomics

TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.

Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

9,397 citations

Journal Article

The RDP (Ribosomal Database Project)

Bonnie L. Maidak¹, Gary J. Olsen¹, Niels Larsen², Ross Overbeek³, Michael J. McCaughey¹, Carl R. Woese¹

Institutions: University of Illinois at Urbana–Champaign¹, Michigan State University², Argonne National Laboratory³

01 Jan 1997-Nucleic Acids Research

TL;DR: The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services and associated computer programs, including phylogenetically ordered rRNA alignments, derived phylogenetic trees and rRNA secondary structure diagrams.

Abstract: The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous FTP (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu), gopher (rdpgopher.life.uiuc.edu) and WWW (http://rdpwww.life.uiuc.edu/). The electronic mail and WWW servers provide ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree.

2,106 citations

Journal Article

The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

Ross Overbeek, Tadhg P. Begley¹, Ralph Butler², Jomuna V. Choudhuri³, Han-Yu Chuang⁴, Matthew P. Cohoon⁵, Valérie de Crécy-Lagard⁶, Naryttza N. Diaz³, Terry Disz⁵, Robert D. Edwards⁷,⁸, Michael Fonstein, Ed D. Frank⁹, Svetlana Gerdes, Elizabeth M. Glass⁹, Alexander Goesmann³, Andrew C. Hanson⁶, Dirk Iwata-Reuyl¹⁰, Roy A. Jensen⁶, Neema Jamshidi⁴, Lutz Krause³, Michael Kubal⁵, Niels Bent Larsen, Burkhard Linke³, Alice C. McHardy³, Folker Meyer³, Heiko Neuweger³, Gary J. Olsen¹¹, Robert Olson⁵, Andrei L. Osterman⁸, Vasiliy A. Portnoy⁴, Gordon D. Pusch, Dmitry A. Rodionov¹², Christian Rückert³, Jason Steiner⁴, Rick Stevens⁵,⁹, Ines Thiele⁴, Olga Vassieva, Yuzhen Ye⁸, Olga Zagnitko, Veronika Vonstein

Institutions: Cornell University¹, Middle Tennessee State University², Bielefeld University³, University of California, San Diego⁴, University of Chicago⁵, University of Florida⁶, San Diego State University⁷, Sanford-Burnham Institute for Medical Research⁸, Argonne National Laboratory⁹, Portland State University¹⁰, University of Illinois at Urbana–Champaign¹¹, Russian Academy of Sciences¹²

01 Jan 2005-Nucleic Acids Research

TL;DR: Describes the subsystem approach to genome annotation, presents the SEED as the first annotation environment to support this model, and offers the first release of a growing library of populated subsystems.

Abstract: The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.

1,896 citations

Journal Article

Insights into social insects from the genome of the honeybee Apis mellifera

George M. Weinstock, Gene E. Robinson, Richard A. Gibbs, Kim C. Worley, and 225 more authors (55 institutions)

26 Oct 2006-Nature

TL;DR: The genome sequence of the honeybee Apis mellifera is reported; population genetics suggests a novel African origin for the species and offers insights into whether Africanized bees spread throughout the New World via hybridization or displacement.

Abstract: Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.

1,673 citations

Journal Article

RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Thomas Brettin¹,², James J. Davis¹,², Terry Disz, Robert Edwards²,³, Svetlana Gerdes², Gary J. Olsen⁴, Robert Olson¹,², Ross Overbeek², Bruce Parrello², Gordon D. Pusch², Maulik Shukla⁵, James Thomason⁶, Rick Stevens¹,², Veronika Vonstein², Alice R. Wattam⁵, Fangfang Xia¹,²

Institutions: University of Chicago¹, Argonne National Laboratory², San Diego State University³, University of Illinois at Urbana–Champaign⁴, Virginia Tech⁵, Cold Spring Harbor Laboratory⁶

10 Feb 2015-Scientific Reports

TL;DR: Describes the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines, offers a choice of software for identifying and annotating genomic features, and allows custom features to be added to an annotation job.

Abstract: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

1,666 citations

Cited by

Journal Article

Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy

Qiong Wang, George M. Garrity¹, James M. Tiedje¹, James R. Cole

Institutions: Michigan State University¹

15 Aug 2007-Applied and Environmental Microbiology

TL;DR: The RDP Classifier can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes, and the majority of the classification errors appear to be due to anomalies in the current taxonomies.

Abstract: The Ribosomal Database Project (RDP) Classifier, a naive Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/.

16,048 citations

Journal Article

The sequence of the human genome.

J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (12 institutions)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Kazutaka Katoh¹, Kazuharu Misawa, Kei-ichi Kuma¹, Takashi Miyata¹

Institutions: Kyoto University¹

15 Jul 2002-Nucleic Acids Research

TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.

Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations

Journal Article

UCHIME improves sensitivity and speed of chimera detection

Robert C. Edgar, Brian J. Haas¹, Jose C. Clemente¹, Christopher Quince¹, Rob Knight¹

Institutions: University of Colorado Boulder¹

01 Aug 2011-Bioinformatics

TL;DR: UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences, and in testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus.

Abstract: Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments. Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer. Contact: [email protected] Availability: Source, binaries and data: http://drive5.com/uchime. Supplementary information: Supplementary data are available at Bioinformatics online.

11,904 citations

Journal Article

Prokka: Rapid Prokaryotic Genome Annotation

Torsten Seemann¹

Institutions: Victorian Life Sciences Computation Initiative¹

15 Jul 2014-Bioinformatics

TL;DR: Prokka is introduced, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer, and produces standards-compliant output files for further analysis or viewing in genome browsers.

Abstract: The multiplex capability and high yield of current day DNA-sequencing instruments has made bacterial whole genome sequencing a routine affair. The subsequent de novo assembly of reads into contigs has been well addressed. The final step of annotating all relevant genomic features on those contigs can be achieved slowly using existing web- and email-based systems, but these are not applicable for sensitive data or integrating into computational pipelines. Here we introduce Prokka, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Availability and implementation: Prokka is implemented in Perl and is freely available under an open source GPLv2 license from http://vicbioinformatics.com/.

10,432 citations
