Author

Hui Du

Bio: Hui Du is an academic researcher from Washington University in St. Louis. The author has contributed to research in topics: Biology & Medicine. The author has an h-index of 4 and has co-authored 4 publications receiving 2,651 citations.

Topics: Biology, Medicine, Genome, Gene, Chromosome 4

Papers

Journal Article

The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes.

Helen Skaletsky¹, Tomoko Kuroda-Kawaguchi¹, Patrick Minx², Holland S. Cordum², LaDeana W. Hillier², Laura G. Brown¹, Sjoerd Repping, Tatyana Pyntikova¹, Johar Ali², Tamberlyn Bieri², Asif T. Chinwalla², Andrew Delehaunty², Kim D. Delehaunty², Hui Du², Ginger A. Fewell², Lucinda Fulton², Robert S. Fulton², Tina Graves², Shunfang Hou², Philip Latrielle², Shawn Leonard², Elaine R. Mardis², Rachel Maupin², John Douglas Mcpherson², Tracie L. Miner², William E. Nash², Christine Nguyen², Philip Ozersky², Kymberlie H. Pepin², Susan M. Rock², Tracy Rohlfing², Kelsi Scott², Brian Schultz², Cindy Strong², Aye Mon Tin-Wollam², Shiaw-Pyng Yang², Robert H. Waterston², Richard K. Wilson², Steve Rozen¹, David C. Page¹

Massachusetts Institute of Technology¹, Washington University in St. Louis²

19 Jun 2003-Nature

TL;DR: The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. It is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic.

Abstract: The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (non-reciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.

2,022 citations

Journal Article

Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana

Klaus F. X. Mayer¹, C. Schüller¹, R. Wambutt, George Murphy² +230 more•Institutions (21)

16 Dec 1999-Nature

TL;DR: Analysis of 17.38 megabases of unique sequence, representing about 17% of the Arabidopsis genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements.

Abstract: The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.

411 citations

Journal Article

The DNA sequence of human chromosome 7

LaDeana W. Hillier¹, Robert S. Fulton¹, Lucinda Fulton¹, Tina Graves¹, Kymberlie H. Pepin¹, Caryn Wagner-McPherson¹, Dan Layman¹, Jason Maas¹, Sara Jaeger¹, Rebecca S. Walker¹, Kristine M. Wylie¹, Mandeep Sekhon¹, Michael C. Becker¹, Michelle O'Laughlin¹, Mark E. Schaller¹, Ginger A. Fewell¹, Kimberly D. Delehaunty¹, Tracie L. Miner¹, William E. Nash¹, Matt Cordes¹, Hui Du¹, Hui Sun¹, Jennifer Edwards¹, Holland Bradshaw-Cordum¹, Johar Ali¹, Stephanie Andrews¹, Amber Isak¹, Andrew Vanbrunt¹, Christine Nguyen¹, Feiyu Du¹, Betty Lamar¹, Laura Courtney¹, Joelle Kalicki¹, Philip Ozersky¹, Lauren Bielicki¹, Kelsi Scott¹, Andrea Holmes¹, Richard Harkins¹, Anthony R. Harris¹, Cindy Strong¹, Shunfang Hou¹, Chad Tomlinson¹, Sara Dauphin-Kohlberg¹, Amy Kozlowicz-Reilly¹, Shawn Leonard¹, Theresa Rohlfing¹, Susan M. Rock¹, Aye-Mon Tin-Wollam¹, Amanda Abbott¹, Patrick Minx¹, Rachel Maupin¹, Catrina Strowmatt¹, Phil Latreille¹, Nancy Miller¹, Doug Johnson¹, Jennifer Murray¹, Jeffrey Woessner¹, Michael C. Wendl¹, Shiaw-Pyng Yang¹, Brian Schultz¹, John W. Wallis¹, John Spieth¹, Tamberlyn Bieri¹, Joanne O. Nelson¹, Nicolas Berkowicz¹, Patricia Wohldmann¹, Lisa Cook¹, Matthew T. Hickenbotham¹, James M. Eldred¹, Donald Williams¹, Joseph A. Bedell¹, Elaine R. Mardis¹, Sandra W. Clifton¹, Stephanie L. Chissoe¹, Marco A. Marra¹, Marco A. Marra², Christopher K. Raymond³, Eric Haugen³, Will Gillett³, Yang Zhou³, R. James³, Karen A. Phelps³, Shawn Iadanoto³, Kerry L. Bubb³, Elizabeth Simms³, Ruth Levy³, James B. Clendenning³, Rajinder Kaul³, W. James Kent⁴, Terrence S. Furey⁴, Robert Baertsch⁴, Michael R. Brent¹, Evan Keibler¹, Paul Flicek¹, Peer Bork⁵, Mikita Suyama⁵, Jeffrey A. Bailey⁶, Matthew E. Portnoy⁷, David Torrents⁵, Asif T. Chinwalla¹, Warren Gish¹, Sean R. Eddy¹, John Douglas Mcpherson⁸, John Douglas Mcpherson¹, Maynard V. Olson³, Evan E. Eichler⁶, Eric D. Green⁷, Robert H. Waterston³, Robert H. Waterston¹, Richard K. Wilson¹

Washington University in St. Louis¹, BC Cancer Agency², University of Washington³, University of California, Santa Cruz⁴, European Bioinformatics Institute⁵, Case Western Reserve University⁶, National Institutes of Health⁷, Human Genome Sequencing Center⁸

10 Jul 2003-Nature

TL;DR: The euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far, has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence.

Abstract: Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame.

244 citations

Journal Article

Generation and annotation of the DNA sequences of human chromosomes 2 and 4

LaDeana W. Hillier¹, Tina Graves¹, Robert S. Fulton¹, Lucinda Fulton¹, Kymberlie H. Pepin¹, Patrick Minx¹, Caryn Wagner-McPherson¹, Dan Layman¹, Kristine M. Wylie¹, Mandeep Sekhon¹, Michael C. Becker¹, Ginger A. Fewell¹, Kimberly D. Delehaunty¹, Tracie L. Miner¹, William E. Nash¹, Colin Kremitzki¹, Lachlan G. Oddy¹, Hui Du¹, Hui Sun¹, Holland Bradshaw-Cordum¹, Johar Ali¹, Jason Carter¹, Matt Cordes¹, Anthony R. Harris¹, Amber Isak¹, Andrew Van Brunt¹, Christine Nguyen¹, Feiyu Du¹, Laura Courtney¹, Joelle Kalicki¹, Philip Ozersky¹, Scott Abbott¹, Jon R. Armstrong¹, Edward A. Belter¹, Lauren Caruso¹, Maria Cedroni¹, Marc Cotton¹, Teresa Davidson¹, Anu Desai¹, Glendoria Elliott¹, Thomas Erb¹, Catrina Fronick¹, Tony Gaige¹, William Haakenson¹, Krista Haglund¹, Andrea Holmes¹, Richard Harkins¹, Kyung Kim¹, Scott Kruchowski¹, Cindy Strong¹, Neenu Grewal¹, Ernest Goyea¹, Shunfang Hou¹, Andrew Levy¹, Scott Martinka¹, Kelly Mead¹, Michael D. McLellan¹, Rick Meyer¹, Jennifer Randall-Maher¹, Chad Tomlinson¹, Sara Dauphin-Kohlberg¹, Amy Kozlowicz-Reilly¹, Neha Shah¹, Sharhonda Swearengen-Shahid¹, Jacqueline E. Snider¹, Joseph T. Strong¹, Johanna Thompson¹, Martin Yoakum¹, Shawn Leonard¹, Charlene Pearman¹, Lee Trani¹, Maxim Radionenko¹, Jason Waligorski¹, Chunyan Wang¹, Susan M. Rock¹, Aye Mon Tin-Wollam¹, Rachel Maupin¹, Phil Latreille¹, Michael C. Wendl¹, Shiaw Pyng Yang¹, Craig Pohl¹, John W. Wallis¹, John Spieth¹, Tamberlyn Bieri¹, Nicolas Berkowicz¹, Joanne O. Nelson¹, John R. Osborne¹, Li Ding¹, Rekha Meyer¹, Aniko Sabo¹, Yoram Shotland¹, Prashant R. Sinha¹, Patricia Wohldmann¹, Lisa Cook¹, Matthew T. Hickenbotham¹, James M. Eldred¹, Donald Williams¹, Thomas A. Jones¹, Xinwei She², Francesca D. Ciccarelli, Elisa Izaurralde, James Taylor³, Jeremy Schmutz⁴, Richard M. Myers⁴, David R. Cox⁴, Xiaoqiu Huang⁵, John Douglas Mcpherson¹, John Douglas Mcpherson⁶, Elaine R. Mardis¹, Sandra W. Clifton¹, Wesley C. Warren¹, Asif T. Chinwalla¹, Sean R. Eddy¹, Marco A. Marra⁷, Marco A. Marra¹, Ivan Ovcharenko⁸, Terrence S. Furey⁹, Webb Miller³, Evan E. Eichler², Peer Bork, Mikita Suyama, David Torrents, Robert H. Waterston², Robert H. Waterston¹, Richard K. Wilson¹

Washington University in St. Louis¹, University of Washington², Pennsylvania State University³, Stanford University⁴, Iowa State University⁵, Baylor College of Medicine⁶, University of British Columbia⁷, Lawrence Livermore National Laboratory⁸, University of California, Santa Cruz⁹

07 Apr 2005-Nature

TL;DR: Extensive analyses confirm the underlying construction of the sequence, and expand the understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.

Abstract: Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.

107 citations

Journal Article

Plant pan-genomics: recent advances, new challenges, and roads ahead.

Wei Yi Li, Jianan Liu, Hongyu Zhang, Zengqing Liu, Yu Tao Wang, Longsheng Xing, Qiang He, Hui Du

01 Jun 2022-Journal of Genetics and Genomics

TL;DR: Pan-genomics can encompass most of the genetic diversity of a species or population and has proved to be a powerful tool for studying genomic evolution and the origin and domestication of species, and for providing information for plant improvement.

11 citations

Cited by

Journal Article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Arabidopsis Genome Initiative¹

J. Craig Venter Institute¹

14 Dec 2000-Nature

TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article

GSVA: gene set variation analysis for microarray and RNA-seq data.

Sonja Hänzelmann, Robert Castelo¹, Justin Guinney²

Pompeu Fabra University¹, Sage Bionetworks²

16 Jan 2013-BMC Bioinformatics

TL;DR: This work introduces Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner and constitutes a starting point to build pathway-centric models of biology.

Abstract: Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.

6,125 citations

Journal Article

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

Charity W. Law¹, Charity W. Law², Yunshun Chen¹, Yunshun Chen², Wei Shi¹, Wei Shi², Gordon K. Smyth², Gordon K. Smyth¹

University of Melbourne¹, Walter and Eliza Hall Institute of Medical Research²

03 Feb 2014-Genome Biology

TL;DR: New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments, and the voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline.

Abstract: New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.

4,475 citations

Journal Article

Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

Olof Emanuelsson¹, Henrik Nielsen², Søren Brunak², Gunnar von Heijne¹

Stockholm University¹, Technical University of Denmark²

21 Jul 2000-Journal of Molecular Biology

TL;DR: A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed and it is estimated that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%.

4,268 citations

Journal Article

Finishing the euchromatic sequence of the human genome

Chris P. Ponting, Daniel Barker

21 Oct 2004-Nature

TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.

Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations
