Author

Lin Fan

Bio: Lin Fan is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Genome & Gene. The author has an h-index of 7 and has co-authored 8 publications receiving 14,429 citations. Previous affiliations of Lin Fan include Broad Institute.

Topics: Genome, Gene, Population, Transcription (biology), RNA

Papers

Journal Article

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹,², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh¹,⁴, Nir Friedman², Aviv Regev¹

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: This paper presents the Trinity method for de novo assembly of full-length transcripts and evaluates it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations

Journal Article

Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis

Andrea Pauli¹, Eivind Valen, Michael F. Lin, Manuel Garber, Nadine L. Vastenhouw, Joshua Z. Levin, Lin Fan, Albin Sandelin, John L. Rinn, Aviv Regev, Alexander F. Schier

Harvard University¹

01 Mar 2012-Genome Research

TL;DR: This study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.

Abstract: Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.

744 citations

Journal Article

Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells

Michal Rabani¹, Joshua Z. Levin¹, Lin Fan¹, Xian Adiconis¹, Raktima Raychowdhury¹, Manuel Garber¹, Andreas Gnirke¹, Chad Nusbaum¹, Nir Hacohen¹, Nir Friedman², Ido Amit¹, Aviv Regev¹,³

Broad Institute¹, Hebrew University of Jerusalem², Massachusetts Institute of Technology³

01 May 2011-Nature Biotechnology

TL;DR: This study combines metabolic labeling of RNA at high temporal resolution with advanced RNA quantification and computational modeling to estimate RNA transcription and degradation rates during the response of mouse dendritic cells to lipopolysaccharide.

Abstract: Cellular RNA levels are determined by the interplay of RNA production, processing and degradation. However, because most studies of RNA regulation do not distinguish the separate contributions of these processes, little is known about how they are temporally integrated. Here we combine metabolic labeling of RNA at high temporal resolution with advanced RNA quantification and computational modeling to estimate RNA transcription and degradation rates during the response of mouse dendritic cells to lipopolysaccharide. We find that changes in transcription rates determine the majority of temporal changes in RNA levels, but that changes in degradation rates are important for shaping sharp 'peaked' responses. We used sequencing of the newly transcribed RNA population to estimate temporally constant RNA processing and degradation rates genome wide. Degradation rates vary significantly between genes and contribute to the observed differences in the dynamic response. Certain transcripts, including those encoding cytokines and transcription factors, mature faster. Our study provides a quantitative approach to study the integrative process of RNA regulation.

552 citations

Journal Article

Comparative functional genomics of the fission yeasts

Nicholas Rhind¹, Zehua Chen², Moran Yassour²,³, Dawn Thompson², Brian J. Haas², Naomi Habib³, Ilan Wapinski²,⁴, Sushmita Roy², Michael F. Lin², David I. Heiman², Sarah Young², Kanji Furuya⁵, Yabin Guo⁶, Alison L. Pidoux⁷, Huei Mei Chen⁸, Barbara Robbertse⁹, Jonathan M. Goldberg², Keita Aoki⁵, Elizabeth H. Bayne⁷, Aaron M. Berlin², Christopher A. Desjardins², Edward Dobbs⁷, Livio Dukaj¹, Lin Fan², Michael Fitzgerald², Courtney French³, Sharvari Gujja², Klavs R. Hansen¹⁰, Daniel Keifenheim¹, Joshua Z. Levin², Rebecca A. Mosher¹¹, Carolin A. Müller¹², Jenna Pfiffner², Margaret Priest², Carsten Russ², Agata Smialowska¹³,¹⁴, Peter Swoboda¹³, Sean M. Sykes², Matthew W. Vaughn¹⁰, Sonya Vengrova¹⁵, Ryan J. Yoder⁹, Qiandong Zeng², Robin C. Allshire⁷, David C. Baulcombe¹¹, Bruce W. Birren², William Brown¹², Karl Ekwall¹³,¹⁴, Manolis Kellis², Janet Leatherwood⁸, Henry L. Levin⁶, Hanah Margalit³, Robert A. Martienssen¹⁰, Conrad A. Nieduszynski¹², Joseph W. Spatafora⁹, Nir Friedman³, Jacob Z. Dalgaard¹⁵, Peter Baumann¹⁶,¹⁷,¹⁸, Hironori Niki⁵, Aviv Regev²,¹⁶, Chad Nusbaum²

University of Massachusetts Medical School¹, Massachusetts Institute of Technology², Hebrew University of Jerusalem³, Harvard University⁴, National Institute of Genetics⁵, National Institutes of Health⁶, University of Edinburgh⁷, State University of New York System⁸, Oregon State University⁹, Cold Spring Harbor Laboratory¹⁰, University of Cambridge¹¹, University of Nottingham¹², Karolinska Institutet¹³, Södertörn University¹⁴, University of Warwick¹⁵, Howard Hughes Medical Institute¹⁶, University of Kansas¹⁷, Stowers Institute for Medical Research¹⁸

20 May 2011-Science

TL;DR: Differences in gene content and regulation explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source and provide tools for investigation across the Schizosaccharomyces clade.

Abstract: The fission yeast clade--comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus--occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.

474 citations

Journal Article

Genome analysis of three Pneumocystis species reveals adaptation mechanisms to life exclusively in mammalian hosts

Liang Ma¹, Zehua Chen², Da-Wei Huang³,⁴, Geetha Kutty¹, Mayumi Ishihara⁵, Honghui Wang¹, Amr Abouelleil², Lisa R. Bishop¹, Emma Davey¹, Rebecca Deng¹, Xilong Deng¹, Lin Fan², Giovanna Fantoni¹, Michael Fitzgerald², Emile Gogineni¹, Jonathan M. Goldberg²,⁶, Grace Handley¹, Xiaojun Hu⁴, Charles Huber¹, Xiaoli Jiao⁴, Kristine Jones⁴, Joshua Z. Levin², Yueqin Liu¹, Pendexter Macdonald², Alexandre Melnikov², Castle Raley⁴, Monica Sassi¹, Brad T. Sherman⁴, Xiaohong Song¹, Sean M. Sykes², Bao Tran⁴, Laura Walsh¹, Yun Xia¹, Jun Yang⁴, Sarah Young², Qiandong Zeng², Xin Zheng⁴, Robert M. Stephens⁴, Chad Nusbaum², Bruce W. Birren², Parastoo Azadi⁵, Richard A. Lempicki⁴, Christina A. Cuomo², Joseph A. Kovacs¹

National Institutes of Health¹, Massachusetts Institute of Technology², Laboratory of Molecular Biology³, Leidos⁴, University of Georgia⁵, Harvard University⁶

22 Feb 2016-Nature Communications

TL;DR: The findings suggest that Pneumocystis has developed unique mechanisms of adaptation to life exclusively in mammalian hosts, including dependence on the lungs for gas and nutrients and highly efficient strategies to escape both host innate and acquired immune defenses.

Abstract: Pneumocystis jirovecii is a major cause of life-threatening pneumonia in immunosuppressed patients including transplant recipients and those with HIV/AIDS, yet surprisingly little is known about the biology of this fungal pathogen. Here we report near complete genome assemblies for three Pneumocystis species that infect humans, rats and mice. Pneumocystis genomes are highly compact relative to other fungi, with substantial reductions of ribosomal RNA genes, transporters, transcription factors and many metabolic pathways, but contain expansions of surface proteins, especially a unique and complex surface glycoprotein superfamily, as well as proteases and RNA processing proteins. Unexpectedly, the key fungal cell wall components chitin and outer chain N-mannans are absent, based on genome content and experimental validation. Our findings suggest that Pneumocystis has developed unique mechanisms of adaptation to life exclusively in mammalian hosts, including dependence on the lungs for gas and nutrients and highly efficient strategies to escape both host innate and acquired immune defenses.

128 citations

Cited by

Journal Article

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹,², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh¹,⁴, Nir Friedman², Aviv Regev¹

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

15,665 citations

Journal Article

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li¹, Colin N. Dewey¹

University of Wisconsin-Madison¹

04 Aug 2011-BMC Bioinformatics

TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations

Journal Article

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

Cole Trapnell¹, Adam Roberts², Loyal A. Goff¹,³,⁴, Geo Pertea⁵, Daehwan Kim⁶,⁷, David R. Kelley¹,⁴, Harold Pimentel², Steven L. Salzberg⁵, John L. Rinn¹,⁴, Lior Pachter²

Broad Institute¹, University of California, Berkeley², Massachusetts Institute of Technology³, Harvard University⁴, Johns Hopkins University⁵, University of Maryland, College Park⁶, Johns Hopkins University School of Medicine⁷

01 Mar 2012-Nature Protocols

TL;DR: This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results, which takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

Abstract: Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

10,913 citations

Journal Article

A new coronavirus associated with human respiratory disease in China.

Fan Wu¹, Su Zhao², Bin Yu³, Yan-Mei Chen¹, Wen Wang³, Zhi gang Song¹, Yi Hu², Zhao Wu Tao², Jun Hua Tian³, Yuan Yuan Pei¹, Ming Li Yuan², Yu Ling Zhang¹, Fa Hui Dai¹, Yi Liu¹, Qi Min Wang¹, Jiao Jiao Zheng¹, Lin Xu¹, Edward C. Holmes¹,⁴, Yong-Zhen Zhang¹,³

Fudan University¹, Huazhong University of Science and Technology², Centers for Disease Control and Prevention³, University of Sydney⁴

03 Feb 2020-Nature

TL;DR: Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

Abstract: Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health [1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing [4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China [5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans. Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

9,231 citations

Journal Article

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea¹, Geo Pertea¹, Corina Antonescu¹, Tsung Cheng Chang², Joshua T. Mendell², Steven L. Salzberg¹

Johns Hopkins University¹, University of Texas Southwestern Medical Center²

01 Mar 2015-Nature Biotechnology

TL;DR: StringTie is a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble RNA-seq data sets into transcripts; it produces more complete and accurate reconstructions of genes and better estimates of expression levels than other leading transcript assembly programs.

Abstract: Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

6,594 citations
