Home
/
Authors
/
Jingbo Tang

Author

Jingbo Tang

Bio: Jingbo Tang is an academic researcher from Central South University. The author has contributed to research in topics: Sequence assembly & Contig. The author has an hindex of 8, co-authored 15 publications receiving 6795 citations.

Topics: Sequence assembly, Contig, Phylogenomics, RNA-Seq, Genome ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

[...]

Ruibang Luo¹, Binghang Liu¹, Yinlong Xie², Yinlong Xie¹, Zhenyu Li¹, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung¹, Siu-Ming Yiu¹, Shaoliang Peng³, Zhu Xiao-qian³, Guangming Liu³, Xiangke Liao³, Yingrui Li¹, Huanming Yang, Jian Wang, Tak-Wah Lam¹, Jun Wang - Show less +27 more•Institutions (3)

University of Hong Kong¹, South China University of Technology², National University of Defense Technology³

27 Dec 2012-GigaScience

TL;DR: This work provides an updated assembly version of the 2008 Asian genome using SOAPdenovo2, a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome.

...read moreread less

Abstract: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome. Benchmark using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive to other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower during the point of largest memory consumption.

...read moreread less

4,284 citations

Journal Article•DOI•

Phylogenomics resolves the timing and pattern of insect evolution

[...]

Bernhard Misof, Shanlin Liu, Karen Meusemann¹, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen², Jessica L. Ware², Tomas Flouri³, Rolf G. Beutel⁴, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco³, Torsten Wappler⁵, Jes Rust⁵, Andre J. Aberer³, Ulrike Aspöck⁶, Ulrike Aspöck⁷, Horst Aspöck⁷, Daniela Bartel⁷, Alexander Blanke⁸, Simon Berger³, Alexander Böhm⁷, Thomas R. Buckley⁹, Brett Calcott¹⁰, Junqing Chen, Frank Friedrich¹¹, Makiko Fukui¹², Mari Fujita⁸, Carola Greve, Peter Grobe, Shengchang Gu, Ying Huang, Lars S. Jermiin¹, Akito Y. Kawahara¹³, Lars Krogmann¹⁴, Martin Kubiak¹¹, Robert Lanfear¹⁵, Robert Lanfear¹⁶, Robert Lanfear¹⁷, Harald Letsch⁷, Yiyuan Li, Zhenyu Li, Jiguang Li, Haorong Lu, Ryuichiro Machida⁸, Yuta Mashimo⁸, Pashalia Kapli¹⁸, Pashalia Kapli³, Duane D. McKenna¹⁹, Guanliang Meng, Yasutaka Nakagaki⁸, José Luis Navarrete-Heredia²⁰, Michael Ott²¹, Yanxiang Ou, Günther Pass⁷, Lars Podsiadlowski⁵, Hans Pohl⁴, Björn M. von Reumont²², Kai Schütte¹¹, Kaoru Sekiya⁸, Shota Shimizu⁸, Adam Slipinski¹, Alexandros Stamatakis³, Alexandros Stamatakis²³, Wenhui Song, Xu Su, Nikolaus U. Szucsich⁷, Meihua Tan, Xuemei Tan, Min Tang, Jingbo Tang, Gerald Timelthaler⁷, Shigekazu Tomizuka⁸, Michelle D. Trautwein²⁴, Xiaoli Tong²⁵, Toshiki Uchifune⁸, Manfred Walzl⁷, Brian M. Wiegmann²⁶, Jeanne Wilbrandt, Benjamin Wipfler⁴, Thomas K. F. Wong¹, Qiong Wu, Gengxiong Wu, Yinlong Xie, Shenzhou Yang, Qing Yang, David K. Yeates¹, Kazunori Yoshizawa²⁷, Qing Zhang, Rui Zhang, Wenwei Zhang, Yunhui Zhang, Jing Zhao, Chengran Zhou, Lili Zhou, Tanja Ziesmann, Shijie Zou, Yingrui Li, Xun Xu, Yong Zhang, Huanming Yang, Jian Wang, Jun Wang, Karl M. Kjer², Xin Zhou - Show less +102 more•Institutions (27)

Commonwealth Scientific and Industrial Research Organisation¹, Rutgers University², Heidelberg Institute for Theoretical Studies³, University of Jena⁴, University of Bonn⁵, Naturhistorisches Museum⁶, University of Vienna⁷, University of Tsukuba⁸, Landcare Research⁹, Johns Hopkins University¹⁰, University of Hamburg¹¹, Ehime University¹², Florida Museum of Natural History¹³, Staatliches Museum für Naturkunde Stuttgart¹⁴, Australian National University¹⁵, Macquarie University¹⁶, National Evolutionary Synthesis Center¹⁷, American Museum of Natural History¹⁸, University of Memphis¹⁹, University of Guadalajara²⁰, Bavarian Academy of Sciences and Humanities²¹, Natural History Museum²², Karlsruhe Institute of Technology²³, California Academy of Sciences²⁴, South China Agricultural University²⁵, North Carolina State University²⁶, Hokkaido University²⁷

07 Nov 2014-Science

TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

...read moreread less

Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relations hips. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

...read moreread less

1,998 citations

Journal Article•DOI•

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

[...]

Yinlong Xie¹, Yinlong Xie², Gengxiong Wu, Jingbo Tang³, Ruibang Luo², Jordan Patterson⁴, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam², Yingrui Li, Xun Xu, Gane Ka-Shu Wong⁴, Jun Wang - Show less +13 more•Institutions (4)

South China University of Technology¹, University of Hong Kong², Central South University³, University of Alberta⁴

15 Jun 2014-Bioinformatics

TL;DR: The conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution, compared with two other popular transcriptome assemblers.

...read moreread less

Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughputs and decrease in costs of next- generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 � 90 bp paired ends) from next generation sequencing makes de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. Results: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these wellannotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. Availability and implementation: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/. Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com Supplementary information: Supplementary data are available at Bioinformatics online.

...read moreread less

730 citations

Posted Content•

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

[...]

South China University of Technology¹, University of Hong Kong², Central South University³, University of Alberta⁴

29 May 2013-arXiv: Genomics

TL;DR: SOAPdenovo-Trans as mentioned in this paper is a de novo transcriptome assembler designed specifically for RNA-Seq that provides higher contiguity, lower redundancy, and faster execution.

...read moreread less

Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and decreasing costs of next generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 x 90 bp paired ends) of this technol- ogy, de novo assembly to recover complete or full-length transcript sequences remains an algorithmic challenge. Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as our benchmark, we assessed how SOAPdenovo- Trans and two other popular software handle the practical issues of alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution. Availability and Implementation: Source code and user manual are at this http URL Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com

...read moreread less

615 citations

Journal Article•DOI•

Erratum to "SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler" [GigaScience, (2012), 1, 18]

[...]

Ruibang Luo, Binghang Liu, Yinlong Xie, Zhenyu Li, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung, Siu-Ming Yiu, Shaoliang Peng, Zhu Xiao-qian, Guangming Liu, Xiangke Liao, Yingrui Li, Huanming Yang, Jian Wang, Tak-Wah Lam, Jun Wang - Show less +26 more

01 Jan 2015

TL;DR: This research presents a novel probabilistic approach to estimating the response of the immune system to laser-spot assisted, 3D image recognition.

...read moreread less

200 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

[...]

Daehwan Kim¹, Joseph M. Paggi², Chanhee Park¹, Christopher Bennett¹, Steven L. Salzberg³ - Show less +1 more•Institutions (3)

University of Texas Southwestern Medical Center¹, Stanford University², Johns Hopkins University³

01 Aug 2019-Nature Biotechnology

TL;DR: This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.

...read moreread less

Abstract: The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays. A graph-based genome indexing scheme enables variant-aware alignment of sequences with very low memory requirements.

...read moreread less

4,855 citations

Journal Article•DOI•

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

[...]

Mihaela Pertea¹, Daehwan Kim¹, Geo Pertea¹, Jeffrey T. Leek¹, Steven L. Salzberg¹ - Show less +1 more•Institutions (1)

Johns Hopkins University¹

01 Sep 2016-Nature Protocols

TL;DR: This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts.

...read moreread less

Abstract: High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

...read moreread less

3,755 citations

Journal Article•DOI•

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

[...]

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹ - Show less +1 more•Institutions (1)

National Institute of Informatics¹

15 May 2015-Bioinformatics

TL;DR: MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner and generated a three-time larger assembly, with longer contig N50 and average contig length.

...read moreread less

Abstract: Summary: MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., no pre-processing like partitioning and normalization was needed. When compared with previous methods (Chikhi and Rizk, 2012; Howe, et al., 2014) on assembling the soil data, MEGAHIT generated a 3-time larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a 4-fold improvement . Availability: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under GPLv3 license. Contact: rb@l3-bioinfo.com, twlam@cs.hku.hk

...read moreread less

3,634 citations

Journal Article•DOI•

PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses.

[...]

Robert Lanfear¹, Paul B. Frandsen², April M. Wright³, Tereza Senfeld⁴, Brett Calcott⁵ - Show less +1 more•Institutions (5)

Australian National University¹, Smithsonian Institution², Iowa State University³, Macquarie University⁴, University of Sydney⁵

23 Dec 2016-Molecular Biology and Evolution

TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.

...read moreread less

Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.

...read moreread less

3,445 citations

Posted Content•

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

[...]

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹ - Show less +1 more•Institutions (1)

National Institute of Informatics¹

25 Sep 2014-arXiv: Genomics

TL;DR: MEGAHIT as mentioned in this paper is a NGS de novo assembler for assembling large and complex metagenomics data in a time and cost-efficient manner, which avoids preprocessing like partitioning and normalization, which might compromise on result integrity.

...read moreread less

Abstract: MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing like partitioning and normalization, which might compromise on result integrity. MEGAHIT generates 3 times larger assembly, with longer contig N50 and average contig length than the previous assembly. 55.8% of the reads were aligned to the assembly, which is 4 times higher than the previous. The source code of MEGAHIT is freely available at this https URL under GPLv3 license.

...read moreread less

2,673 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse