Author

Chenghai Xue

Bio: Chenghai Xue is an academic researcher at Cold Spring Harbor Laboratory whose research topics include comparative genomics and the genome. The author has an h-index of 4 and has co-authored 4 publications receiving 7,311 citations.

Topics: Comparative genomics, Genome, Human genome, ENCODE, Gene

Papers

Journal Article

Landscape of transcription in human cells

Sarah Djebali, Carrie A. Davis¹, Angelika Merkel, Alexander Dobin¹, Timo Lassmann, Ali Mortazavi²,³, Andrea Tanzer, Julien Lagarde, Wei Lin¹, Felix Schlesinger¹, Chenghai Xue¹, Georgi K. Marinov³, Jainab Khatun⁴, Brian A. Williams³, Chris Zaleski¹, Joel Rozowsky⁵, Marion S. Röder, Felix Kokocinski⁶, Rehab F. Abdelhamid, Tyler Alioto, Igor Antoshechkin³, Michael T. Baer¹, Nadav Bar⁷, Philippe Batut¹, Kimberly Bell¹, Ian Bell⁸, Sudipto K. Chakrabortty¹, Xian Chen⁹, Jacqueline Chrast¹⁰, Joao Curado, Thomas Derrien, Jorg Drenkow¹, Erica Dumais⁸, Jacqueline Dumais⁸, Radha Duttagupta⁸, Emilie Falconnet¹¹, Meagan Fastuca¹, Kata Fejes-Toth¹, Pedro G. Ferreira, Sylvain Foissac⁸, Melissa J. Fullwood¹², Hui Gao⁸, David Gonzalez, Assaf Gordon¹, Harsha P. Gunawardena⁹, Cédric Howald¹⁰, Sonali Jha¹, Rory Johnson, Philipp Kapranov⁸, Brandon King³, Colin Kingswood, Oscar Junhong Luo¹², Eddie Park², Kimberly Persaud¹, Jonathan B. Preall¹, Paolo Ribeca, Brian A. Risk⁴, Daniel Robyr¹¹, Michael Sammeth, Lorian Schaffer³, Lei-Hoon See¹, Atif Shahab¹², Jørgen Skancke⁷, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner¹³, Diane Trout³, Nathalie Walters¹⁰, Huaien Wang¹, John A. Wrobel⁴, Yanbao Yu⁹, Xiaoan Ruan¹², Yoshihide Hayashizaki, Jennifer Harrow⁶, Mark Gerstein⁵, Tim Hubbard⁶, Alexandre Reymond¹⁰, Stylianos E. Antonarakis¹¹, Gregory J. Hannon¹, Morgan C. Giddings⁹,⁴, Yijun Ruan¹², Barbara J. Wold³, Piero Carninci, Roderic Guigó¹⁴, Thomas R. Gingeras⁸,¹

Cold Spring Harbor Laboratory¹, University of California, Irvine², California Institute of Technology³, Florida State University College of Arts and Sciences⁴, Yale University⁵, Wellcome Trust Sanger Institute⁶, Norwegian University of Science and Technology⁷, Affymetrix⁸, University of North Carolina at Chapel Hill⁹, University of Lausanne¹⁰, University of Geneva¹¹, Genome Institute of Singapore¹², Stanford University¹³, Pompeu Fabra University¹⁴

06 Sep 2012-Nature

TL;DR: Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.

Abstract: Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

4,450 citations

An integrated encyclopedia of DNA elements in the human genome

Ian Dunham, Anshul Kundaje, Shelley Force Aldred, Patrick J. Collins +439 more

01 Sep 2012

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

2,767 citations

Journal Article

An encyclopedia of mouse DNA elements (Mouse ENCODE)

John A. Stamatoyannopoulos¹, Michael Snyder², Ross C. Hardison³, Bing Ren⁴, Thomas R. Gingeras⁵, David M. Gilbert⁶, Mark Groudine⁷, M. A. Bender⁷, Rajinder Kaul¹, Theresa K. Canfield¹, Erica Giste¹, Audra K. Johnson¹, Mia Zhang⁷, Gayathri Balasundaram⁷, Rachel Byron⁷, Vaughan Roach¹, Peter J. Sabo¹, Richard Sandstrom¹, A Sandra Stehling¹, Robert E. Thurman¹, Sherman M. Weissman⁸, Philip Cayting⁸, Manoj Hariharan², Jin Lian⁸, Yong Cheng², Stephen G. Landt², Zhihai Ma², Barbara J. Wold⁹, Job Dekker¹⁰, Gregory E. Crawford¹¹, Cheryl A. Keller³, Weisheng Wu³, Christopher T. Morrissey³, Swathi Ashok Kumar³, Tejaswini Mishra³, Deepti Jain³, Marta Byrska-Bishop³, Daniel Blankenberg³, Bryan R. Lajoie², Gaurav Jain¹⁰, Amartya Sanyal¹⁰, Kaun-Bei Chen¹¹, Olgert Denas¹¹, James Taylor¹², Gerd A. Blobel¹³, Mitchell J. Weiss¹³, Max Pimkin¹³, Wulan Deng¹³, Georgi K. Marinov⁹, Brian A. Williams⁹, Katherine I. Fisher-Aylor⁹, Gilberto DeSalvo⁹, Anthony Kiralusha⁹, Diane Trout⁹, Henry Amrhein⁹, Ali Mortazavi¹⁴, Lee Edsall⁴, David McCleary⁴, Samantha Kuan⁴, Yin Shen⁴, Feng Yue⁴, Zhen Ye⁴, Carrie A. Davis⁵, Chris Zaleski⁵, Sonali Jha⁵, Chenghai Xue⁵, Alexander Dobin⁵, Wei Lin⁵, Meagan Fastuca⁵, Huaien Wang⁵, Roderic Guigó, Sarah Djebali, Julien Lagarde, Tyrone Ryba⁶, Takayo Sasaki⁶, Venkat S. Malladi¹⁵, Melissa S. Cline¹⁵, Vanessa M. Kirkup¹⁵, Katrina Learned¹⁵, Kate R. Rosenbloom¹⁵, W. James Kent¹⁵, Elise A. Feingold¹⁶, Peter J. Good¹⁶, Michael J. Pazin¹⁶, Rebecca F. Lowdon¹⁶, Leslie B Adams¹⁶

University of Washington¹, Stanford University², Pennsylvania State University³, University of California, San Diego⁴, Cold Spring Harbor Laboratory⁵, Florida State University⁶, Fred Hutchinson Cancer Research Center⁷, Yale University⁸, California Institute of Technology⁹, University of Massachusetts Medical School¹⁰, Duke University¹¹, Emory University¹², Children's Hospital of Philadelphia¹³, University of California, Irvine¹⁴, University of California, Santa Cruz¹⁵, National Institutes of Health¹⁶

13 Aug 2012-Genome Biology

TL;DR: The Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome to enable a broad range of mouse genomics efforts.

Abstract: To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

445 citations

Journal Article

Comparative analysis of the transcriptome across distant species

Mark Gerstein¹, Joel Rozowsky¹, Koon-Kiu Yan¹, Daifeng Wang¹, Chao Cheng², James B. Brown³,⁴, Carrie A. Davis⁵, LaDeana W. Hillier⁶, Cristina Sisu¹, Jingyi Jessica Li⁷,³, Baikang Pei¹, Arif Harmanci¹, Michael O. Duff⁸, Sarah Djebali⁹, Roger P. Alexander¹, Burak H. Alver¹⁰, Raymond K. Auerbach¹, Kimberly Bell⁵, Peter J. Bickel³, Max E. Boeck⁶, Nathan Boley³,⁴, Benjamin W. Booth⁴, Lucy Cherbas¹¹, Peter Cherbas¹¹, Chao Di¹², Alexander Dobin⁵, Jorg Drenkow⁵, Brent Ewing⁶, Gang Fang¹, Megan Fastuca⁵, Elise A. Feingold¹³, Adam Frankish¹⁴, Guanjun Gao¹², Peter J. Good¹³, Roderic Guigó⁹, Ann S. Hammonds⁴, Jen Harrow¹⁴, Roger A. Hoskins⁴, Cédric Howald¹⁵,¹⁶, Long Hu¹², Haiyan Huang³, Tim Hubbard¹⁴,¹⁷, Chau Huynh⁶, Sonali Jha⁵, Dionna M. Kasper¹, Masaomi Kato¹, Thomas C. Kaufman¹¹, Robert R. Kitchen¹, Erik Ladewig¹⁸, Julien Lagarde⁹, Eric C. Lai¹⁸, Jing Leng¹, Zhi Lu¹², Michael J. MacCoss⁶, Gemma E. May⁸,¹⁹, Rebecca McWhirter²⁰, Gennifer E. Merrihew⁶, David M. Miller²⁰, Ali Mortazavi²¹, Rabi Murad²¹, Brian Oliver¹³, Sara Olson⁸, Peter J. Park¹⁰, Michael J. Pazin¹³, Norbert Perrimon¹⁰,²², Dmitri D. Pervouchine⁹, Valerie Reinke¹, Alexandre Reymond¹⁵, Garrett Robinson³, Anastasia Samsonova²²,¹⁰, Gary Saunders²³,¹⁴, Felix Schlesinger⁵, Anurag Sethi¹, Frank J. Slack¹, William C. Spencer²⁰, Marcus H. Stoiber⁴,³, Pnina Strasbourger⁶, Andrea Tanzer⁹,²⁴, Owen Thompson⁶, Kenneth H. Wan⁴, Guilin Wang¹, Huaien Wang⁵, Kathie L. Watkins²⁰, Jiayu Wen¹⁸, Kejia Wen¹², Chenghai Xue⁵, Li Yang⁸,²⁵, Kevin Y. Yip²⁶, Chris Zaleski⁵, Yan Zhang¹, Henry Zheng¹, Steven E. Brenner³, Brenton R. Graveley⁸, Susan E. Celniker⁴, Thomas R. Gingeras⁵, Robert H. Waterston⁶

Yale University¹, Dartmouth College², University of California, Berkeley³, Lawrence Berkeley National Laboratory⁴, Cold Spring Harbor Laboratory⁵, University of Washington⁶, University of California, Los Angeles⁷, University of Connecticut Health Center⁸, Pompeu Fabra University⁹, Harvard University¹⁰, Indiana University¹¹, Tsinghua University¹², National Institutes of Health¹³, Wellcome Trust Sanger Institute¹⁴, University of Lausanne¹⁵, Swiss Institute of Bioinformatics¹⁶, King's College London¹⁷, Kettering University¹⁸, Carnegie Mellon University¹⁹, Vanderbilt University²⁰, University of California, Irvine²¹, Howard Hughes Medical Institute²², European Bioinformatics Institute²³, University of Vienna²⁴, CAS-MPG Partner Institute for Computational Biology²⁵, The Chinese University of Hong Kong²⁶

28 Aug 2014-Nature

TL;DR: It is found in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a ‘universal model’ based on a single set of organism-independent parameters.

Abstract: The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

284 citations

Cited by

Journal Article

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin¹, Carrie A. Davis¹, Felix Schlesinger¹, Jorg Drenkow¹, Chris Zaleski¹, Sonali Jha¹, Philippe Batut¹, Mark Chaisson¹, Thomas R. Gingeras¹

Cold Spring Harbor Laboratory¹

01 Jan 2013-Bioinformatics

TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure, outperforms other aligners by a factor of >50 in mapping speed.

Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

Tissue-based map of the human proteome

Mathias Uhlén¹,², Linn Fagerberg¹, Björn M. Hallström¹, Cecilia Lindskog³, Per Oksvold¹, Adil Mardinoglu⁴, Åsa Sivertsson¹, Caroline Kampf³, Evelina Sjöstedt¹,³, Anna Asplund³, IngMarie Olsson³, Karolina Edlund, Emma Lundberg¹, Sanjay Navani, Cristina Al-Khalili Szigyarto¹, Jacob Odeberg¹, Dijana Djureinovic³, Jenny Ottosson Takanen¹, Sophia Hober¹, Tove Alm¹, Per-Henrik Edqvist³, Holger Berling¹, Hanna Tegel¹, Jan Mulder³, Johan Rockberg¹, Peter Nilsson¹, Jochen M. Schwenk¹, Marica Hamsten¹, Kalle von Feilitzen¹, Mattias Forsberg¹, Lukas Persson¹, Fredric Johansson¹, Martin Zwahlen¹, Gunnar von Heijne⁵, Jens Nielsen²,⁴, Fredrik Pontén³

Royal Institute of Technology¹, Technical University of Denmark², Science for Life Laboratory³, Chalmers University of Technology⁴, Stockholm University⁵

23 Jan 2015-Science

TL;DR: A map of the human tissue proteome is presented, based on an integrated omics approach that combines quantitative transcriptomics at the tissue and organ level with tissue microarray-based immunohistochemistry to achieve spatial localization of proteins down to the single-cell level.

Abstract: Resolving the molecular details of proteome variation in the different tissues and organs of the human body will greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level. Our tissue-based analysis detected more than 90% of the putative protein-coding genes. We used this approach to explore the human secretome, the membrane proteome, the druggable proteome, the cancer proteome, and the metabolic functions in 32 different tissues and organs. All the data are integrated in an interactive Web-based database that allows exploration of individual proteins, as well as navigation of global expression patterns, in all major tissues and organs in the human body.

9,745 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome.

ENCODE Consortium

01 Jan 2012-Nature

8,106 citations

Journal Article

NCBI GEO: archive for functional genomics data sets—update

Tanya Barrett¹, Stephen E. Wilhite¹, Pierre Ledoux¹, Carlos Evangelista¹, Irene F. Kim¹, Maxim Tomashevsky¹, Kimberly A. Marshall¹, Katherine Phillippy¹, Patti M. Sherman¹, Michelle Holko¹, Andrey Yefanov¹, Hye Seung Lee¹, Naigong Zhang¹, Cynthia L. Robertson¹, Nadezhda Serova¹, Sean Davis¹, Alexandra Soboleva¹

National Institutes of Health¹

27 Nov 2012-Nucleic Acids Research

TL;DR: The Gene Expression Omnibus is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community and supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable.

Abstract: The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

6,683 citations
