Author

Dawn Thompson

Bio: Dawn Thompson is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research on topics including the genome and trinucleotide repeat expansion, has an h-index of 10, and has co-authored 14 publications receiving 14,621 citations. Previous affiliations of Dawn Thompson include Harvard University.

Papers

Journal Article

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹,², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh¹,⁴, Nir Friedman², Aviv Regev¹

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: The Trinity method for de novo assembly of full-length transcripts is presented and evaluated on samples from fission yeast, mouse, and whitefly (whose reference genome is not yet available), providing a unified solution for transcriptome reconstruction in any sample.

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations

Journal Article

Comprehensive comparative analysis of strand-specific RNA sequencing methods

Joshua Z. Levin¹, Moran Yassour¹,², Xian Adiconis¹, Chad Nusbaum¹, Dawn Thompson¹, Nir Friedman², Andreas Gnirke¹, Aviv Regev¹

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem²

01 Sep 2010-Nature Methods

TL;DR: In this paper, the authors developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method, using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark.

Abstract: Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

714 citations

Comprehensive comparative analysis of strand-specific RNA sequencing methods

Joshua Z. Levin¹, Moran Yassour¹,², Xian Adiconis¹, Chad Nusbaum¹, Dawn Thompson¹, Nir Friedman², Andreas Gnirke¹, Aviv Regev¹

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem²

01 Aug 2010

TL;DR: A comprehensive computational pipeline is developed to compare library quality metrics from any RNA-seq method, identifying the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing.

675 citations

Journal Article

Comparative functional genomics of the fission yeasts

Nicholas Rhind¹, Zehua Chen², Moran Yassour²,³, Dawn Thompson², Brian J. Haas², Naomi Habib³, Ilan Wapinski²,⁴, Sushmita Roy², Michael F. Lin², David I. Heiman², Sarah Young², Kanji Furuya⁵, Yabin Guo⁶, Alison L. Pidoux⁷, Huei Mei Chen⁸, Barbara Robbertse⁹, Jonathan M. Goldberg², Keita Aoki⁵, Elizabeth H. Bayne⁷, Aaron M. Berlin², Christopher A. Desjardins², Edward Dobbs⁷, Livio Dukaj¹, Lin Fan², Michael Fitzgerald², Courtney French³, Sharvari Gujja², Klavs R. Hansen¹⁰, Daniel Keifenheim¹, Joshua Z. Levin², Rebecca A. Mosher¹¹, Carolin A. Müller¹², Jenna Pfiffner², Margaret Priest², Carsten Russ², Agata Smialowska¹³,¹⁴, Peter Swoboda¹⁴, Sean M. Sykes², Matthew W. Vaughn¹⁰, Sonya Vengrova¹⁵, Ryan J. Yoder⁹, Qiandong Zeng², Robin C. Allshire⁷, David C. Baulcombe¹¹, Bruce W. Birren², William Brown¹², Karl Ekwall¹³,¹⁴, Manolis Kellis², Janet Leatherwood⁸, Henry L. Levin⁶, Hanah Margalit³, Robert A. Martienssen¹⁰, Conrad A. Nieduszynski¹², Joseph W. Spatafora⁹, Nir Friedman³, Jacob Z. Dalgaard¹⁵, Peter Baumann¹⁶,¹⁷,¹⁸, Hironori Niki⁵, Aviv Regev²,¹⁶, Chad Nusbaum²

University of Massachusetts Medical School¹, Massachusetts Institute of Technology², Hebrew University of Jerusalem³, Harvard University⁴, National Institute of Genetics⁵, National Institutes of Health⁶, University of Edinburgh⁷, State University of New York System⁸, Oregon State University⁹, Cold Spring Harbor Laboratory¹⁰, University of Cambridge¹¹, University of Nottingham¹², Södertörn University¹³, Karolinska Institutet¹⁴, University of Warwick¹⁵, Howard Hughes Medical Institute¹⁶, Stowers Institute for Medical Research¹⁷, University of Kansas¹⁸

20 May 2011-Science

TL;DR: Differences in gene content and regulation explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source and provide tools for investigation across the Schizosaccharomyces clade.

Abstract: The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.

474 citations

Journal Article

Ploidy Controls the Success of Mutators and Nature of Mutations during Budding Yeast Evolution

Dawn Thompson¹, Michael M. Desai¹, Andrew W. Murray¹

Harvard University¹

22 Aug 2006-Current Biology

TL;DR: It is concluded that the advantage of mutators depends on ploidy and that diploid mutators can give rise to beneficial mutations that are inaccessible to nonmutators and haploid mutators.

100 citations

Cited by

Journal Article

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li¹, Colin N. Dewey¹

University of Wisconsin-Madison¹

04 Aug 2011-BMC Bioinformatics

TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations

Journal Article

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

Cole Trapnell¹, Adam Roberts², Loyal A. Goff¹,³,⁴, Geo Pertea⁵, Daehwan Kim⁶,⁷, David R. Kelley¹,⁴, Harold Pimentel², Steven L. Salzberg⁵, John L. Rinn¹,⁴, Lior Pachter²

Broad Institute¹, University of California, Berkeley², Massachusetts Institute of Technology³, Harvard University⁴, Johns Hopkins University⁵, University of Maryland, College Park⁶, Johns Hopkins University School of Medicine⁷

01 Mar 2012-Nature Protocols

TL;DR: This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results; it takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

Abstract: Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

10,913 citations

Journal Article

A new coronavirus associated with human respiratory disease in China.

Fan Wu¹, Su Zhao², Bin Yu³, Yan-Mei Chen¹, Wen Wang³, Zhi gang Song¹, Yi Hu², Zhao Wu Tao², Jun Hua Tian³, Yuan Yuan Pei¹, Ming Li Yuan², Yu Ling Zhang¹, Fa Hui Dai¹, Yi Liu¹, Qi Min Wang¹, Jiao Jiao Zheng¹, Lin Xu¹, Edward C. Holmes¹,⁴, Yong-Zhen Zhang¹,³

Fudan University¹, Huazhong University of Science and Technology², Centers for Disease Control and Prevention³, University of Sydney⁴

03 Feb 2020-Nature

TL;DR: Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

Abstract: Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health [1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing [4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China [5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans. Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

9,231 citations

Journal Article

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea¹, Geo Pertea¹, Corina Antonescu¹, Tsung Cheng Chang², Joshua T. Mendell², Steven L. Salzberg¹

Johns Hopkins University¹, University of Texas Southwestern Medical Center²

01 Mar 2015-Nature Biotechnology

TL;DR: StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble complex RNA-seq data sets into transcripts, produces more complete and accurate reconstructions of genes and better estimates of expression levels.

Abstract: Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

6,594 citations
