Home
/
Authors
/
Zhong Wang

Author

Zhong Wang

Other affiliations: Joint Genome Institute, Yale University, Duke University ...read more

Bio: Zhong Wang is an academic researcher from Lawrence Berkeley National Laboratory. The author has contributed to research in topics: Genome & RNA-Seq. The author has an hindex of 29, co-authored 61 publications receiving 21060 citations. Previous affiliations of Zhong Wang include Joint Genome Institute & Yale University.

Topics: Genome, RNA-Seq, Transcriptome, Metagenomics, Gene ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2001
1993

Papers

PDF

Open Access

More filters

Deep sequencing of a plant transcriptome

[...]

Jeffrey Martin¹, Stephen M. Gross, James C. Schnable, Cindy Choi, Mei Wang, Kanwar Singh, Erika Lindquist, Feng Chen, Chia-Lin Wei, Zhong Wang - Show less +6 more•Institutions (1)

Lawrence Berkeley National Laboratory¹

20 Mar 2012

TL;DR: Martin et al. as discussed by the authors presented a deep sequencing of a plant transcriptome with the support of the US Department of Energy Joint Genome Institute under Contract Number DE-AC02-05CH11231.

...read moreread less

Abstract: Deep sequencing of a plant transcriptome Jeff Martin 1 , Stephen Gross 1 , James Schnable 2 , Cindy Choi 1 , Mei Wang 1 , Kanwar Singh 1 , Erika Lindquist 1 , Feng Chen 1 , Chia-Lin Wei 1 , and Zhong Wang 1 , 1 Department of Energy Joint Genome Institute // LBNL - Walnut Creek, CA 2 University of California Berkeley - Berkeley, CA a To whom correspondence may be addressed. E-mail: jamartin@lbl.gov March 20, 2012 ACKNOWLEDGMENTS: The work conducted by the US Department of Energy (DOE) Joint Genome Institute is supported by the Office of Science of the DOE under Contract Number DE-AC02-05CH11231. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States Government, or any agency thereof, or the Regents of the University of California. DISCLAIMER: This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California

...read moreread less

1 citations

Transcriptome Analysis of Drought-Tolerant CAM plants Agave deserti and Agave tequilana

[...]

Stephen M. Gross¹, Jeffrey Martin, June Simpson, Zhong Wang, Axel Visel - Show less +1 more•Institutions (1)

Lawrence Berkeley National Laboratory¹

25 Mar 2013

TL;DR: Comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data are presented and compared to those of additional plant species to identify biological functions of gene families displaying sequence divergence in agave species.

...read moreread less

Abstract: Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis) and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and their improvement for bioenergy development.

...read moreread less

1 citations

BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis

[...]

Karan Bhatia, Zhong Wang

22 Mar 2011

TL;DR: This work presents BioPig, a collection of cloud computing tools to scale data analysis and management of aflexible data scripting language that uses Apache's Hadoop data structure and map reduce framework to process very large data files in parallel and combine the results.

...read moreread less

Abstract: Next Generation sequencing is producing ever larger data sizes with a growth rate outpacing Moore's Law. The data deluge has made many of the current sequenceanalysis tools obsolete because they do not scale with data. Here we present BioPig, a collection of cloud computing tools to scale data analysis and management. Pig is aflexible data scripting language that uses Apache's Hadoop data structure and map reduce framework to process very large data files in parallel and combine the results.BioPig extends Pig with capability with sequence analysis. We will show the performance of BioPig on a variety of bioinformatics tasks, including screeningsequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets using the Rumen metagenome as an example.

...read moreread less

1 citations

Proceedings Article•DOI•

Automatic Outlier Detection for Genome Assembly Quality Assessment

[...]

Taghrid Samak¹, Rob Egan¹, Brian Bushnell¹, Dan Gunter¹, Alex Copeland¹, Zhong Wang¹ - Show less +2 more•Institutions (1)

Lawrence Berkeley National Laboratory¹

22 Oct 2013

TL;DR: A method to automatically detect errors in de novo assembled genomes by applying outlier detection algorithms to identify the precise locations of assembly errors.

...read moreread less

Abstract: In this work we describe a method to automatically detect errors in de novo assembled genomes. The method extends a Bayesian assembly quality evaluation framework, ALE, which computes the likelihood of an assembly given a set of unassembled data. Starting from ALE output, this method applies outlier detection algorithms to identify the precise locations of assembly errors. We show results from a microbial genome with manually curated assembly errors. Our method detects all deletions, 82.3% of insertions, and 88.8% of single base substitutions. It was also able to detect an inversion error that spans more than 400 bases.

...read moreread less

1 citations

Journal Article•DOI•

Persistent memory as an effective alternative to random access memory in metagenome assembly

[...]

Jing Xian Sun, Rob Egan, Harrison Ho, Yue-e Li, Zhong Wang - Show less +1 more

21 Apr 2022-BMC Bioinformatics

TL;DR: In this paper , the authors explored the possibility of using persistent memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers.

...read moreread less

Abstract: The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a process that is memory intensive and time consuming. Multi-terabyte sequences can become too large to be assembled on a single computer node, and there is no reliable method to predict the memory requirement due to data-specific memory consumption pattern. Currently, out-of-memory (OOM) is one of the most prevalent factors that causes metagenome assembly failures.In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) in datasets up to one terabase. We found that PMem can enable metagenome assemblers on terabyte-sized datasets by partially or fully substituting DRAM. Depending on the configured DRAM/PMEM ratio, running metagenome assemblies with PMem can achieve a similar speed as DRAM, while in the worst case it showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment.We demonstrated that PMem is capable of expanding the capacity of DRAM to allow larger metagenome assembly with a potential tradeoff in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to be generalized to other memory-intensive bioinformatics applications.

...read moreread less

1 citations

1
2
3
4
5
6
7
8
…
9
10
11
12
13
14

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

[...]

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹•Institutions (1)

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.

...read moreread less

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

...read moreread less

20,335 citations

疟原虫var基因转换速率变化导致抗原变异[英]／Paul H, Robert P, Christodoulou Z, et al//Proc Natl Acad Sci U S A

[...]

宁北芳, 朱淮民

28 Jul 2005

TL;DR: PfPMP1）与感染红细胞、树突状组胞以及胎盘的单个或多个受体作用，在黏附及免疫逃避中起关键的作�ly.

...read moreread less

Abstract: 抗原变异可使得多种致病微生物易于逃避宿主免疫应答。表达在感染红细胞表面的恶性疟原虫红细胞表面蛋白1（PfPMP1）与感染红细胞、内皮细胞、树突状细胞以及胎盘的单个或多个受体作用，在黏附及免疫逃避中起关键的作用。每个单倍体基因组var基因家族编码约60种成员，通过启动转录不同的var基因变异体为抗原变异提供了分子基础。

...read moreread less

18,940 citations

Journal Article•DOI•

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

[...]

Bo Li¹, Colin N. Dewey¹•Institutions (1)

University of Wisconsin-Madison¹

04 Aug 2011-BMC Bioinformatics

TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired- end reads, depending on the number of possible splice forms for each gene.

...read moreread less

Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

...read moreread less

14,524 citations

Journal Article•DOI•

Differential expression analysis for sequence count data.

[...]

Simon Anders, Wolfgang Huber

27 Oct 2010-Genome Biology

TL;DR: A method based on the negative binomial distribution, with variance and mean linked by local regression, is proposed and an implementation, DESeq, as an R/Bioconductor package is presented.

...read moreread less

Abstract: High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.

...read moreread less

13,356 citations

Journal Article•DOI•

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

[...]

Cole Trapnell¹, Cole Trapnell², Brian A. Williams³, Geo Pertea², Ali Mortazavi³, Gordon Kwan³, Marijke J. van Baren⁴, Steven L. Salzberg², Barbara J. Wold³, Lior Pachter¹ - Show less +6 more•Institutions (4)

University of California, Berkeley¹, University of Maryland, College Park², California Institute of Technology³, Washington University in St. Louis⁴

01 May 2010-Nature Biotechnology

TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

...read moreread less

Abstract: High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

...read moreread less

13,337 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse