Author

Guangzhu He

Bio: Guangzhu He is an academic researcher from the University of Hong Kong who has contributed to research on sequence assembly and RNA-Seq. The author has an h-index of 6 and has co-authored 6 publications receiving 5,104 citations.

Papers

Journal Article

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Ruibang Luo¹, Binghang Liu¹, Yinlong Xie², Yinlong Xie¹, Zhenyu Li¹, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung¹, Siu-Ming Yiu¹, Shaoliang Peng³, Zhu Xiao-qian³, Guangming Liu³, Xiangke Liao³, Yingrui Li¹, Huanming Yang, Jian Wang, Tak-Wah Lam¹, Jun Wang

University of Hong Kong¹, South China University of Technology², National University of Defense Technology³

27 Dec 2012-GigaScience

TL;DR: This work provides an updated assembly version of the 2008 Asian genome using SOAPdenovo2, a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genomes.

Abstract: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genomes. Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and peak memory consumption was ~2/3 lower.
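The contig and scaffold N50 values quoted in this abstract summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that calculation, not code from SOAPdenovo2; the function name `n50` and the example contig lengths are hypothetical, and assembly evaluations often also report variants such as NG50, which uses the estimated genome size in place of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths
    print(n50(contigs))
```

With these hypothetical lengths the total is 300 bases; the two longest contigs (100 + 80 = 180) already exceed half of that, so the N50 is 80.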

4,284 citations

Journal Article

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

Yinlong Xie¹, Yinlong Xie², Gengxiong Wu, Jingbo Tang³, Ruibang Luo¹, Jordan Patterson⁴, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam¹, Yingrui Li, Xun Xu, Gane Ka-Shu Wong⁴, Jun Wang

University of Hong Kong¹, South China University of Technology², Central South University³, University of Alberta⁴

15 Jun 2014-Bioinformatics

TL;DR: The conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution, compared with two other popular transcriptome assemblers.

Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughput and decrease in costs of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. Results: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these well-annotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. Availability and implementation: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/. Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com. Supplementary information: Supplementary data are available at Bioinformatics online.

730 citations

Posted Content

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

Yinlong Xie¹, Yinlong Xie², Gengxiong Wu, Jingbo Tang³, Ruibang Luo², Jordan Patterson⁴, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam², Yingrui Li, Xun Xu, Gane Ka-Shu Wong⁴, Jun Wang

South China University of Technology¹, University of Hong Kong², Central South University³, University of Alberta⁴

29 May 2013-arXiv: Genomics

TL;DR: SOAPdenovo-Trans is a de novo transcriptome assembler designed specifically for RNA-Seq that provides higher contiguity, lower redundancy, and faster execution.

Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughput and decreasing costs of next-generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 × 90 bp paired ends) of this technology, de novo assembly to recover complete or full-length transcript sequences remains an algorithmic challenge. Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as our benchmark, we assessed how SOAPdenovo-Trans and two other popular software packages handle the practical issues of alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution. Availability and Implementation: Source code and user manual are available at this http URL. Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com

615 citations

Journal Article

Erratum to "SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler" [GigaScience, (2012), 1, 18]

Ruibang Luo, Binghang Liu, Yinlong Xie, Zhenyu Li, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung, Siu-Ming Yiu, Shaoliang Peng, Zhu Xiao-qian, Guangming Liu, Xiangke Liao, Yingrui Li, Huanming Yang, Jian Wang, Tak-Wah Lam, Jun Wang

01 Jan 2015

TL;DR: Erratum to the 2012 GigaScience article introducing SOAPdenovo2, an empirically improved memory-efficient short-read de novo assembler.

200 citations

Journal Article

De novo assembly of a haplotype-resolved human genome

Hongzhi Cao¹, Honglong Wu, Ruibang Luo, Shujia Huang², Yuhui Sun², Xin Tong, Yinlong Xie², Binghang Liu, Hailong Yang, Hancheng Zheng¹, Jian Li¹, Bo Li, Yu Wang², Fang Yang, Peng Sun, Siyang Liu¹, Peng Gao, Haodong Huang², Sun Jing, Dan Chen, Guangzhu He, Weihua Huang, Zheng Huang, Li Yue, Laurent C. A. M. Tellier¹, Xiao Liu¹, Qiang Feng¹, Xun Xu, Xiuqing Zhang, Lars Bolund³, Anders Krogh¹, Karsten Kristiansen¹, Radoje Drmanac, Snezana Drmanac, Rasmus Nielsen⁴, Songgang Li, Jian Wang, Huanming Yang⁵, Yingrui Li⁶, Gane Ka-Shu Wong⁷, Jun Wang⁸

University of Copenhagen¹, South China University of Technology², The Breast Cancer Research Foundation³, University of California, Berkeley⁴, King Abdulaziz University⁵, University of Queensland⁶, University of Alberta⁷, Macau University of Science and Technology⁸

01 Jun 2015-Nature Biotechnology

TL;DR: This haplotype-resolved diploid genome represents the most complete de novo human genome assembly to date and should aid in translating genotypes to phenotypes for the development of personalized medicine.

Abstract: The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.

79 citations

Cited by

Journal Article

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Daehwan Kim¹, Joseph M. Paggi², Chanhee Park¹, Christopher Bennett¹, Steven L. Salzberg³

University of Texas Southwestern Medical Center¹, Stanford University², Johns Hopkins University³

01 Aug 2019-Nature Biotechnology

TL;DR: This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.

Abstract: The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays. A graph-based genome indexing scheme enables variant-aware alignment of sequences with very low memory requirements.
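HISAT2 builds on the Ferragina Manzini (FM) index, which finds exact matches by processing a query from its last character to its first (backward search) over the Burrows–Wheeler transform (BWT) of the indexed text. The Python sketch below illustrates only that underlying idea on an ordinary linear string; it is not HISAT2's graph FM index, which additionally encodes genomic variants and haplotypes, and it uses naive suffix sorting and full occurrence tables where a practical index would use sampled, compressed structures. The function names are hypothetical.

```python
def build_fm_index(text):
    """Return (suffix array, C table, Occ table) for text + '$'.

    Naive construction for illustration: suffixes are sorted directly,
    and Occ stores a full prefix-count row for every character.
    """
    text += "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[j] is the character preceding the j-th smallest suffix
    # (text[-1] == '$' covers the suffix that starts at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][j] = occurrences of c in bwt[:j].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for j, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][j + 1] = occ[c][j] + (ch == c)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Narrow the suffix-array interval [lo, hi) one character at a time,
    from the last character of the pattern to the first."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find(text, pattern):
    """Return the sorted 0-based start positions of a non-empty pattern in text."""
    sa, C, occ = build_fm_index(text)
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    print(find("GATTACA", "A"))
```

Each backward-search step costs a constant number of table lookups, so the query time depends on the pattern length rather than the text length; that property is what makes FM-based aligners fast once the index is built.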

4,855 citations

Journal Article

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

Mihaela Pertea¹, Daehwan Kim¹, Geo Pertea¹, Jeffrey T. Leek¹, Steven L. Salzberg¹

Johns Hopkins University¹

01 Sep 2016-Nature Protocols

TL;DR: This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts.

Abstract: High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

3,755 citations

Journal Article

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹

National Institute of Informatics¹

15 May 2015-Bioinformatics

TL;DR: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner; on a soil metagenomics dataset it generated an assembly three times larger than previous methods, with longer contig N50 and average contig length.

Abstract: Summary: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a 252-Gbp soil metagenomics dataset in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., no pre-processing such as partitioning and normalization is needed. When compared with previous methods (Chikhi and Rizk, 2012; Howe et al., 2014) on assembling the soil data, MEGAHIT generated an assembly three times larger, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, a 4-fold improvement. Availability: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license. Contact: rb@l3-bioinfo.com, twlam@cs.hku.hk
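MEGAHIT assembles reads with a de Bruijn graph kept in a succinct (compressed) representation. As a rough illustration of the graph itself only, the following Python sketch builds an explicit, uncompressed de Bruijn graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers found in the reads, then spells out its maximal non-branching paths (unitigs). It is a toy under stated assumptions: reads are error-free, reverse complements are ignored, isolated cycles are not reported, and the function names are hypothetical rather than part of MEGAHIT.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every distinct k-mer in the reads adds one edge prefix -> suffix.
    Reverse complements and sequencing errors are ignored (toy model).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out the maximal non-branching paths of the graph.

    Paths start at every node that is not 1-in/1-out; isolated cycles
    made solely of 1-in/1-out nodes are not reported.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in sorted(graph):
        if one_in_one_out(start):
            continue
        for nxt in sorted(graph[start]):
            path = [start, nxt]
            # Extend while the current node has exactly one way in and out.
            while one_in_one_out(nxt):
                (nxt,) = graph[nxt]
                path.append(nxt)
            # Adjacent (k-1)-mers overlap: first node plus last base of each later node.
            contigs.append(path[0] + "".join(node[-1] for node in path[1:]))
    return contigs


if __name__ == "__main__":
    reads = ["ACGTT", "ACGTA"]  # hypothetical error-free reads
    print(unitigs(build_de_bruijn(reads, k=3)))
```

In this example the two reads share the prefix ACGT and then diverge, so the graph branches after the node GT and the sketch reports three unitigs (ACGT, GTA, GTT) rather than guessing which branch to follow.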

3,634 citations

Posted Content

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹

National Institute of Informatics¹

25 Sep 2014-arXiv: Genomics

TL;DR: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner; it avoids pre-processing such as partitioning and normalization, which might compromise result integrity.

Abstract: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a 252-Gbp soil metagenomics dataset in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing such as partitioning and normalization, which might compromise result integrity. MEGAHIT generates an assembly three times larger than the previous assembly, with longer contig N50 and average contig length. In addition, 55.8% of the reads were aligned to the assembly, four times as many as for the previous assembly. The source code of MEGAHIT is freely available at this https URL under the GPLv3 license.

2,673 citations

Journal Article

Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life.

Fredrik Bäckhed¹, Fredrik Bäckhed², Josefine Roswall², Yangqing Peng, Qiang Feng¹, Huijue Jia, Petia Kovatcheva-Datchary², Yin Li, Yan Xia, Hailiang Xie, Huanzi Zhong, Muhammad Tanweer Khan², Jianfeng Zhang, Junhua Li, Liang Xiao, Jumana Y. Al-Aama³, Dongya Zhang, Ying Shiuan Lee², Dorota Ewa Kotowska¹, Camilla Colding¹, Valentina Tremaroli², Ye Yin, Stefan Bergman², Xun Xu, Lise Madsen⁴, Lise Madsen¹, Karsten Kristiansen¹, Jovanna Dahlgren², Jun Wang

University of Copenhagen¹, University of Gothenburg², King Abdulaziz University³, National Institute of Nutrition, Hyderabad⁴

13 May 2015-Cell Host & Microbe

TL;DR: The gut microbiota of infants delivered by C-section showed significantly less resemblance to their mothers and nutrition had a major impact on early microbiota composition and function, with cessation of breast-feeding, rather than introduction of solid food, being required for maturation into an adult-like microbiota.

2,227 citations
