Author

Douglas B. Rusch

Other affiliations: J. Craig Venter Institute, Indiana University – Purdue University Indianapolis, Celera Corporation

Bio: Douglas B. Rusch is an academic researcher at Indiana University who has contributed to research on the topics Metagenomics and Genome. The author has an h-index of 38 and has co-authored 102 publications receiving 24,768 citations. Previous affiliations include the J. Craig Venter Institute and Indiana University – Purdue University Indianapolis.

Topics: Metagenomics, Genome, Medicine, Quorum sensing, Metastasis ...

Papers published on a yearly basis (chart; years shown: 1998, 2001 to 2003, and 2007 to 2023)

Papers

Journal Article

The sequence of the human genome.

J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (12 institutions)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
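The shredding step in this abstract (public sequence cut into 550-bp segments at 2.9-fold coverage, added to the 5.11-fold Celera reads for roughly eightfold effective coverage) can be illustrated with a minimal Python sketch. It assumes evenly spaced, fixed-length fragments; the function names `shred` and `effective_coverage` and the spacing rule are hypothetical illustrations, not Celera's published procedure.

```python
def shred(seq: str, fragment_len: int = 550, coverage: float = 2.9) -> list[str]:
    """Cut `seq` into fixed-length fragments whose evenly spaced start
    positions give roughly `coverage`-fold coverage of every base.

    Hypothetical illustration only; not the published Celera procedure.
    """
    if len(seq) <= fragment_len:
        return [seq] if seq else []
    # Step between fragment starts: 550 / 2.9 is about 190 bp.
    step = max(1, round(fragment_len / coverage))
    starts = list(range(0, len(seq) - fragment_len + 1, step))
    # Add one last fragment flush with the end so the tail is not skipped.
    if starts[-1] != len(seq) - fragment_len:
        starts.append(len(seq) - fragment_len)
    return [seq[s:s + fragment_len] for s in starts]


def effective_coverage(*coverages: float) -> float:
    """Coverage from independent data sets adds: 5.11 + 2.9 is about 8."""
    return sum(coverages)


if __name__ == "__main__":
    region = "ACGT" * 2500                      # toy 10,000-bp region
    pieces = shred(region)
    achieved = len(pieces) * 550 / len(region)
    print(len(pieces), f"{achieved:.1f}")       # 51 fragments, about 2.8-fold
    print(round(effective_coverage(5.11, 2.9), 2))
```

Evenly spaced starts keep the arithmetic transparent; a real pipeline could just as well sample start positions at random, which would give the same average coverage with uneven local depth.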

12,098 citations

Journal Article

The Genome Sequence of the Malaria Mosquito Anopheles gambiae

Robert A. Holt¹, G. Mani Subramanian¹, Aaron L. Halpern¹, Granger G. Sutton¹, Rosane Charlab¹, Deborah R. Nusskern¹, Patrick Wincker², Andrew G. Clark³, José M. C. Ribeiro⁴, Ron Wides⁵, Steven L. Salzberg⁶, Brendan J. Loftus⁶, Mark Yandell¹, William H. Majoros¹,⁶, Douglas B. Rusch¹, Zhongwu Lai¹, Cheryl L. Kraft¹, Josep F. Abril, Véronique Anthouard², Peter Arensburger⁷, Peter W. Atkinson⁷, Holly Baden¹, Véronique de Berardinis², Danita Baldwin¹, Vladimir Benes, Jim Biedler⁸, Claudia Blass, Randall Bolanos¹, Didier Boscus², Mary Barnstead¹, Shuang Cai¹, Kabir Chatuverdi¹, George K. Christophides, Mathew A. Chrystal⁹, Michele Clamp¹⁰, Anibal Cravchik¹, Val Curwen¹⁰, Ali N Dana⁹, Arthur L. Delcher¹, Ian M. Dew¹, Cheryl A. Evans¹, Michael Flanigan¹, Anne Grundschober-Freimoser¹¹, Lisa Friedli⁷, Zhiping Gu¹, Ping Guan¹, Roderic Guigó, Maureen E. Hillenmeyer⁹, Susanne L. Hladun¹, James R. Hogan⁹, Young S. Hong⁹, Jeffrey Hoover¹, Olivier Jaillon², Zhaoxi Ke¹,⁹, Chinnappa D. Kodira¹, E. B. Kokoza, Anastasios C. Koutsos¹², Ivica Letunic, Alex Levitsky¹, Yong Liang¹, Jhy-Jhu Lin¹,⁶, Neil F. Lobo⁹, John Lopez¹, Joel A. Malek⁶, Tina C. McIntosh¹, Stephan Meister, Jason R. Miller¹, Clark M. Mobarry¹, Emmanuel Mongin¹³, Sean D. Murphy¹, David A. O'Brochta¹¹, Cynthia Pfannkoch¹, Rong Qi¹, Megan A. Regier¹, Karin A. Remington¹, Hongguang Shao⁸, Maria V. Sharakhova⁹, Cynthia Sitter¹, Jyoti Shetty⁶, Thomas J. Smith¹, Renee Strong¹, Jingtao Sun¹, Dana Thomasova, Lucas Q. Ton⁹, Pantelis Topalis¹², Zhijian Tu⁸, Maria F. Unger⁹, Brian P. Walenz¹, Aihui Wang¹, Jian Wang¹, Mei Wang¹, X. Wang⁹, Kerry J. Woodford¹, Jennifer R. Wortman¹,⁶, Martin Wu⁶, Alison Yao¹, Evgeny M. Zdobnov, Hongyu Zhang¹, Qi Zhao¹, Shaying Zhao⁶, Shiaoping C. Zhu¹, Igor F. Zhimulev, Mario Coluzzi¹⁴, Alessandra della Torre¹⁴, Charles Roth¹⁵, Christos Louis¹², Francis Kalush¹, Richard J. Mural¹, Eugene W. Myers¹, Mark Raymond Adams¹, Hamilton O. Smith¹, Samuel Broder¹, Malcolm J. Gardner⁶, Claire M. Fraser⁶, Ewan Birney¹³, Peer Bork, Paul T. Brey¹⁵, J. Craig Venter¹,⁶, Jean Weissenbach², Fotis C. Kafatos, Frank H. Collins⁹, Stephen L. Hoffman¹ (15 institutions)

Celera Corporation¹, Centre national de la recherche scientifique², Cornell University³, National Institutes of Health⁴, Bar-Ilan University⁵, The Institute for Genomic Research⁶, University of California, Riverside⁷, Virginia Tech⁸, University of Notre Dame⁹, Wellcome Trust Sanger Institute¹⁰, University of Maryland Biotechnology Institute¹¹, University of Crete¹², European Bioinformatics Institute¹³, Sapienza University of Rome¹⁴, Pasteur Institute¹⁵

04 Oct 2002-Science

TL;DR: Analysis of the PEST strain of A. gambiae revealed strong evidence for about 14,000 protein-encoding transcripts, and prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted.

Abstract: Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
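Statements such as "91% of the genome was organized in 303 scaffolds" count how many of the largest scaffolds are needed to reach a given share of the genome. A minimal Python sketch of that count follows. It assumes scaffold lengths are already known and, unless a genome size is supplied, uses the total assembled length as the denominator; the function name `scaffolds_needed` and the toy lengths are hypothetical, not data from the paper.

```python
from typing import Optional


def scaffolds_needed(lengths: list[int], fraction: float,
                     genome_size: Optional[int] = None) -> int:
    """Return how many of the largest scaffolds are needed to cover
    `fraction` of `genome_size` (default: total assembled length).
    Returns -1 if all scaffolds together never reach that share."""
    total = genome_size if genome_size is not None else sum(lengths)
    target = fraction * total
    running = 0
    # Add scaffolds from largest to smallest until the target is reached.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count
    return -1


if __name__ == "__main__":
    toy = [23_100, 9_000, 4_000, 2_500, 800, 400, 200]   # hypothetical lengths (kbp)
    print(scaffolds_needed(toy, 0.91))                     # 4
    print(max(toy))                                        # largest scaffold
```

Passing an explicit `genome_size` matters when the assembly is known to be incomplete, since using the assembled total alone would overstate how much of the genome the scaffolds cover.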

2,033 citations

Journal Article

The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific

Douglas B. Rusch¹, Aaron L. Halpern¹, Granger G. Sutton¹, Karla B. Heidelberg¹,², Shannon J. Williamson¹, Shibu Yooseph¹, Dongying Wu¹,³, Jonathan A. Eisen¹,³, Jeff Hoffman¹, Karin A. Remington¹, Karen Beeson¹, Bao Duc Tran¹, Hamilton O. Smith¹, Holly Baden-Tillson¹, Clare Stewart¹, Joyce Thorpe¹, Jason Freeman¹, Cynthia Andrews-Pfannkoch¹, Joseph E. Venter¹, Kelvin Li¹, Saul A. Kravitz¹, John F. Heidelberg¹,², T. Utterback¹, Yu-Hui Rogers¹, Luisa I. Falcón⁴, Valeria Souza⁴, Germán Bonilla-Rosso⁴, Luis E. Eguiarte⁴, David M. Karl⁵, Shubha Sathyendranath⁶, Trevor Platt⁶, Eldredge Bermingham⁷, Victor A. Gallardo⁸, Giselle Tamayo-Castillo⁹, Michael Ferrari¹⁰, Robert L. Strausberg¹, Kenneth H. Nealson¹,², Robert Friedman¹, Marvin Frazier¹, J. Craig Venter¹ (10 institutions)

J. Craig Venter Institute¹, University of Southern California², University of California, Davis³, National Autonomous University of Mexico⁴, University of Hawaii⁵, Bedford Institute of Oceanography⁶, Smithsonian Tropical Research Institute⁷, University of Concepción⁸, University of Costa Rica⁹, Rutgers University¹⁰

13 Mar 2007-PLOS Biology

TL;DR: A metagenomic study of marine planktonic microbiota in which surface (mostly marine) water samples, analyzed as part of the Sorcerer II Global Ocean Sampling expedition, yielded an extensive dataset of 7.7 million sequencing reads.

Abstract: The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
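"Fragment recruitment" places environmental reads onto a reference genome and records how similar each read is to it. A minimal Python sketch of that idea follows, assuming ungapped alignments found by a brute-force scan and an identity threshold chosen by the caller; the names `Recruit`, `best_placement`, and `recruit` are hypothetical, and this is not the alignment procedure used in the Global Ocean Sampling analysis.

```python
from typing import NamedTuple, Optional


class Recruit(NamedTuple):
    read_index: int   # position of the read in the input list
    ref_start: int    # 0-based start of its best ungapped placement
    identity: float   # fraction of matching bases at that placement


def best_placement(read: str, ref: str) -> Optional[tuple[int, float]]:
    """Slide `read` along `ref` without gaps and return (start, identity)
    for the offset with the most matching bases (earliest offset wins ties)."""
    if not read or len(read) > len(ref):
        return None
    best_start, best_matches = 0, -1
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        if matches > best_matches:
            best_start, best_matches = start, matches
    return best_start, best_matches / len(read)


def recruit(reads: list[str], ref: str, min_identity: float) -> list[Recruit]:
    """Keep only reads whose best placement reaches `min_identity`."""
    hits: list[Recruit] = []
    for i, read in enumerate(reads):
        placement = best_placement(read, ref)
        if placement is not None and placement[1] >= min_identity:
            hits.append(Recruit(i, *placement))
    return hits


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"                        # hypothetical toy reference
    reads = ["ACGTTTGA", "TTGACCA", "GGGGGGGG"]
    for hit in recruit(reads, ref, min_identity=0.9):
        print(hit)
```

Plotting `ref_start` against `identity` for every recruited read gives the kind of position-versus-similarity view that recruitment is used for; a realistic version would use a gapped aligner instead of the quadratic scan.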

1,982 citations

Journal Article

The Sequence of the Human Genome

J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (1 institution)

01 Sep 2015-Clinical Chemistry

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

1,674 citations

Journal Article

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

Brian J. Haas¹, Arthur L. Delcher², Stephen M. Mount³, Jennifer R. Wortman², Roger Smith², Linda Hannick², Rama Maiti², Catherine M. Ronning², Douglas B. Rusch, Christopher D. Town², Steven L. Salzberg², Owen White² (3 institutions)

The Institute for Genomic Research¹, J. Craig Venter Institute², University of Maryland, College Park³

01 Oct 2003-Nucleic Acids Research

TL;DR: The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.

Abstract: The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ~27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
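The first step the abstract describes, grouping overlapping transcript alignments into clusters before building alignment assemblies, can be sketched as an interval sweep. This simplified Python illustration assumes each alignment is reduced to an inclusive (start, end) genomic span on one strand; the function name `cluster_alignments` is hypothetical, and PASA's splice-compatibility checks and construction of maximal assemblies are not shown.

```python
def cluster_alignments(spans: list[tuple[int, int]]) -> list[list[int]]:
    """Group alignments whose genomic spans overlap, directly or through a
    chain of overlaps. Returns clusters as lists of input indices, ordered
    by cluster start. Coordinates are inclusive."""
    order = sorted(range(len(spans)), key=lambda i: spans[i])
    clusters: list[list[int]] = []
    cluster_end = None
    for i in order:
        start, end = spans[i]
        if cluster_end is not None and start <= cluster_end:
            clusters[-1].append(i)              # overlaps the open cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([i])                # gap: start a new cluster
            cluster_end = end
    return clusters


if __name__ == "__main__":
    # Hypothetical EST/cDNA alignment spans on one chromosome strand.
    spans = [(100, 500), (450, 900), (2000, 2600), (880, 1200), (3000, 3100)]
    print(cluster_alignments(spans))            # [[0, 1, 3], [2], [4]]
```

Sorting once and tracking only the right edge of the open cluster keeps this step O(n log n); within each cluster, an assembler would then decide which alignments share consistent intron boundaries and can be merged.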

1,441 citations

Cited by

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Changes in Plasmodium var gene switching rates lead to antigenic variation [English-language original] / Paul H, Robert P, Christodoulou Z, et al // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

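TL;DR: ...PfEMP1) interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.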

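Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. Each haploid genome encodes a var gene family of about 60 members, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.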

18,940 citations

Journal Article

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Anton Bankevich¹, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner (1 institution)

Saint Petersburg Academic University¹

07 May 2012-Journal of Computational Biology

TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
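SPAdes belongs to the family of de Bruijn graph assemblers. The abstract does not describe the graph itself, so what follows is only a generic, minimal Python sketch of the underlying idea (nodes are (k-1)-mers, edges are k-mers seen in reads), without any of the refinements SPAdes adds for uneven coverage, errors, or chimeric reads. The function names `de_bruijn` and `extend_path` and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build a toy de Bruijn graph. Each distinct k-mer adds one edge from
    its (k-1)-mer prefix to its (k-1)-mer suffix; repeats are not duplicated."""
    graph: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            if suffix not in graph[prefix]:
                graph[prefix].append(suffix)
    return dict(graph)


def extend_path(graph: dict[str, list[str]], start: str) -> str:
    """Follow edges from `start` while the current node has exactly one
    successor, spelling out the sequence (a toy contig). Stops at branches,
    dead ends, or a node that was already visited."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:            # avoid looping forever on cycles
            break
        seen.add(node)
        seq += node[-1]
    return seq


if __name__ == "__main__":
    reads = ["ACGTC", "GTCAG"]          # hypothetical overlapping reads
    graph = de_bruijn(reads, k=3)
    print(extend_path(graph, "AC"))     # ACGTCAG
```

Because reads are broken into k-mers, overlaps between reads never have to be computed pairwise; the cost is that repeats and sequencing errors show up as branches, which is where real assemblers spend most of their effort.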

16,859 citations

Journal Article

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Cole Trapnell¹,², Brian A. Williams³, Geo Pertea¹, Ali Mortazavi³, Gordon Kwan³, Marijke J. van Baren⁴, Steven L. Salzberg¹, Barbara J. Wold³, Lior Pachter² (4 institutions)

University of Maryland, College Park¹, University of California, Berkeley², California Institute of Technology³, Washington University in St. Louis⁴

01 May 2010-Nature Biotechnology

TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Abstract: High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
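The abstract counts genes with complete switches in the dominant splice isoform over the differentiation time series. A simplified Python sketch of what such a switch means follows, assuming per-isoform abundance estimates are already available at each time point; Cufflinks's own analysis is more involved and is not reproduced here, and the function names and toy numbers are hypothetical.

```python
def dominant_isoforms(abundance: dict[str, list[float]]) -> list[str]:
    """Return the most abundant isoform at each time point.

    `abundance` maps isoform name -> abundance per time point (same length
    for every isoform). Ties go to the name that sorts first, because max()
    keeps the first maximal item it sees.
    """
    if not abundance:
        return []
    names = sorted(abundance)
    n_points = len(abundance[names[0]])
    return [max(names, key=lambda name: abundance[name][t]) for t in range(n_points)]


def has_dominant_switch(abundance: dict[str, list[float]]) -> bool:
    """True if the dominant isoform differs between at least two time points."""
    return len(set(dominant_isoforms(abundance))) > 1


if __name__ == "__main__":
    # Hypothetical toy genes with two isoforms over three time points.
    switching = {"iso_A": [50.0, 30.0, 5.0], "iso_B": [5.0, 20.0, 60.0]}
    steady = {"iso_A": [50.0, 40.0, 45.0], "iso_B": [1.0, 2.0, 3.0]}
    print(dominant_isoforms(switching))                           # ['iso_A', 'iso_A', 'iso_B']
    print(has_dominant_switch(switching), has_dominant_switch(steady))  # True False
```

This only compares point estimates; deciding whether a switch is real rather than noise requires accounting for the uncertainty in each abundance estimate.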

13,337 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the work demonstrates that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations
