Author

Rob Egan

Other affiliations: Joint Genome Institute

Bio: Rob Egan is an academic researcher at Lawrence Berkeley National Laboratory. The author has contributed to research on metagenomics and genomes, has an h-index of 10, and has co-authored 17 publications receiving 2,570 citations. Previous affiliations of Rob Egan include the Joint Genome Institute.

Topics: Metagenomics, Genome, Medicine, Data structure, Scalability

Papers

Journal Article

MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities

Dongwan D. Kang¹, Jeff Froula¹,², Rob Egan¹,², Zhong Wang¹

Lawrence Berkeley National Laboratory¹, Joint Genome Institute²

27 Aug 2015-PeerJ

TL;DR: MetaBAT integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning, and automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs.

Abstract: Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
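
As a rough illustration of one of the two signals named in the abstract, the sketch below computes a tetranucleotide frequency (TNF) profile for a contig. It is a minimal sketch and not MetaBAT's implementation: the function names are hypothetical, reverse complements are not merged, and a plain Euclidean distance stands in for the empirical probabilistic distance the paper describes.

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers over the unambiguous DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of every 4-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    # Slide a window of width 4 over the sequence and count each window.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are excluded from the total.
    total = sum(counts[t] for t in TETRAMERS)
    return {t: (counts[t] / total if total else 0.0) for t in TETRAMERS}


def tnf_distance(profile_a, profile_b):
    """Euclidean distance between two TNF profiles (an assumption, not MetaBAT's metric)."""
    return sum((profile_a[t] - profile_b[t]) ** 2 for t in TETRAMERS) ** 0.5
```

Contigs from the same genome tend to have similar profiles, so a small distance is weak evidence that two contigs belong in the same bin; the abstract's approach combines this signal with genome abundance.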

1,406 citations

Journal Article

Metagenomic discovery of biomass-degrading genes and genomes from cow rumen.

Matthias Hess¹,², Alexander Sczyrba¹,², Rob Egan¹,², Tae-Wan Kim³, Harshal A. Chokhawala³, Gary P. Schroth⁴, Shujun Luo⁴, Douglas S. Clark³, Feng Chen¹,², Tao Zhang¹,², Roderick I. Mackie⁵, Len A. Pennacchio¹,², Susannah G. Tringe¹,², Axel Visel¹,², Tanja Woyke¹,², Zhong Wang¹,², Edward M. Rubin¹,²

Lawrence Berkeley National Laboratory¹, Joint Genome Institute², University of California, Berkeley³, Illumina⁴, University of Illinois at Urbana–Champaign⁵

28 Jan 2011-Science

TL;DR: To characterize biomass-degrading genes and genomes, this work sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen and identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates.

Abstract: The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.

1,135 citations

Posted Content

De novo Identification of DNA Modifications Enabled by Genome-Guided Nanopore Signal Processing

Marcus H. Stoiber¹, Josh Quick², Rob Egan¹, Eun Lee J¹, Susan E. Celniker¹, Robert K. Neely², Nicholas J. Loman², Len A. Pennacchio¹, James B. Brown¹

Lawrence Berkeley National Laboratory¹, University of Birmingham²

15 Dec 2016-bioRxiv

TL;DR: The first algorithm for the identification of modified nucleotides without the need for prior training data is presented along with the open source software implementation, nanoraw, which accurately assigns contiguous raw nanopore signal to genomic positions, enabling novel data visualization and increasing power and accuracy for the discovery of covalently modified bases in native DNA.

Abstract: Advances in nanopore sequencing technology have enabled investigation of the full catalogue of covalent DNA modifications. We present the first algorithm for the identification of modified nucleotides without the need for prior training data along with the open source software implementation, nanoraw. Nanoraw accurately assigns contiguous raw nanopore signal to genomic positions, enabling novel data visualization, and increasing power and accuracy for the discovery of covalently modified bases in native DNA. Ground truth case studies utilizing synthetically methylated DNA show the capacity to identify three distinct methylation marks, 4mC, 5mC, and 6mA, in seven distinct sequence contexts without any changes to the algorithm. We demonstrate quantitative reproducibility simultaneously identifying 5mC and 6mA in native E. coli across biological replicates processed in different labs. Finally we propose a pipeline for the comprehensive discovery of DNA modifications in any genome without a priori knowledge of their chemical identities.

217 citations

Journal Article

ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies

Scott Clark¹, Rob Egan¹, Peter I. Frazier¹, Zhong Wang¹

Lawrence Berkeley National Laboratory¹

15 Feb 2013-Bioinformatics

TL;DR: The ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.

Abstract: Motivation: Researchers need general purpose methods for objectively evaluating the accuracy of single and metagenome assemblies and for automatically detecting any errors they may contain. Current methods do not fully meet this need because they require a reference, only consider one of the many aspects of assembly quality or lack statistical justification, and none are designed to evaluate metagenome assemblies. Results: In this article, we present an Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences’ own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process. Availability: ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.

165 citations

Proceedings Article

HipMer: an extreme-scale de novo genome assembler

Evangelos Georganas¹, Aydin Buluc¹, Jarrod Chapman¹, Steven Hofmeyr¹, Chaitanya Aluru², Rob Egan¹, Leonid Oliker¹, Daniel S. Rokhsar¹, Katherine Yelick¹

Lawrence Berkeley National Laboratory¹, University of California, Berkeley²

15 Nov 2015

TL;DR: HipMer is presented, the first high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code, and significantly improves scalability of parallel k-mer analysis for complex repetitive genomes that exhibit skewed frequency distributions.

Abstract: De novo whole genome assembly reconstructs genomic sequences from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, the first high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. First, we significantly improve scalability of parallel k-mer analysis for complex repetitive genomes that exhibit skewed frequency distributions. Next, we optimize the traversal of the de Bruijn graph of k-mers by employing a novel communication-avoiding parallel algorithm in a variety of use-case scenarios. Finally, we parallelize the Meraculous scaffolding modules by leveraging the one-sided communication capabilities of the Unified Parallel C while effectively mitigating load imbalance. Large-scale results on a Cray XC30 using grand-challenge genomes demonstrate efficient performance and scalability on thousands of cores. Overall, our pipeline accelerates Meraculous performance by orders of magnitude, enabling the complete assembly of the human genome in just 8.4 minutes on 15K cores of the Cray XC30, and creating unprecedented capability for extreme-scale genomic analysis.
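
For readers unfamiliar with the data structure mentioned in the abstract, the following is a minimal serial sketch of a de Bruijn graph of k-mers built from reads. It is a toy illustration under stated assumptions, not HipMer or Meraculous code: the function name is hypothetical, reads are assumed error-free, and nothing here is parallel or distributed.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each k-mer to the sorted k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Every window of width k in the read is a node of the graph.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        # Consecutive k-mers overlap by k-1 bases and are joined by an edge.
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGTC", "CGTA"], 3).items():
        print(node, "->", successors)
```

A node with more than one successor marks a branch that an assembler has to resolve. The abstract's contribution is traversing such a graph at scale with a communication-avoiding parallel algorithm, which this sketch does not attempt.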

79 citations

Cited by

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: This work describes SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrates that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Felipe A. Simão¹, Robert M. Waterhouse¹, Panagiotis Ioannidis¹, Evgenia V. Kriventseva¹, Evgeny M. Zdobnov¹

Swiss Institute of Bioinformatics¹

01 Oct 2015-Bioinformatics

TL;DR: This work proposes a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content, and implements the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO.

Abstract: Motivation: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. Results: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. Availability and implementation: Software implemented in Python and datasets available for download from http://busco.ezlab.org. Contact: evgeny.zdobnov@unige.ch. Supplementary information: Supplementary data are available at Bioinformatics online.

7,747 citations

Introducing the Special Issue on “Bioinformatics”

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: A bioentrepreneur course on the assessment of medical technology in the context of commercialization, addressing many issues unique to biomedical products.

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs

Felipe A. Simão

09 Jan 2016

TL;DR: This work proposes a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content, implemented in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO.

Abstract: MOTIVATION: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. RESULTS: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. AVAILABILITY AND IMPLEMENTATION: Software implemented in Python and datasets available for download from http://busco.ezlab.org. CONTACT: evgeny.zdobnov@unige.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4,036 citations

Journal Article

MetaSPAdes: A new versatile metagenomic assembler

Sergey Nurk¹, Dmitry Meleshko¹, Anton Korobeynikov¹, Pavel A. Pevzner¹,²

Saint Petersburg State University¹, University of California, San Diego²

01 May 2017-Genome Research

TL;DR: metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes.

Abstract: While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed of dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.

2,295 citations
