Author

Liqing Zhang

Other affiliations: University of Chicago, University of California, Irvine, Virginia Bioinformatics Institute ...

Bio: Liqing Zhang is an academic researcher at Virginia Tech who has contributed to research on topics including Genome and Gene. The author has an h-index of 30 and has co-authored 120 publications, which have received 3566 citations. Previous affiliations of Liqing Zhang include the University of Chicago and the University of California, Irvine.

Topics: Genome, Gene, Indel, Population, Metagenomics ...

[Chart: papers published on a yearly basis, 2001 to 2023]

Papers

Journal Article

Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

Rami A. Dalloul¹, Julie A. Long², Aleksey V. Zimin³, Luqman Aslam⁴, Kathryn Beal⁵, Le Ann Blomberg², Pascal Bouffard⁶, David W. Burt⁷, Oswald Crasta⁸, Richard P. M. A. Crooijmans⁴, Kristal L. Cooper⁸, Roger A. Coulombe⁹, Supriyo De¹⁰, Mary E. Delany¹¹, Jerry B. Dodgson¹², Jennifer J Dong¹³, Clive Evans⁸, Karin M. Frederickson⁶, Paul Flicek⁵, Liliana Florea³, Otto Folkerts⁸, Martien A. M. Groenen⁴, Tim Harkins⁶, Javier Herrero⁵, Steve Hoffmann¹⁴, Hendrik-Jan Megens⁴, Andrew Jiang¹¹, Pieter J. de Jong¹⁵, Peter K. Kaiser¹⁶, Heebal Kim¹⁷, Kyu-Won Kim¹⁷, Sungwon Kim¹, David Langenberger¹⁴, Mi-Kyung Lee¹³, Taeheon Lee¹⁷, Shrinivasrao P. Mane⁸, Guillaume Marçais³, Manja Marz¹⁴,¹⁸, A. P. McElroy¹, Thero Modise⁸, Mikhail Nefedov¹⁵, Cedric Notredame, Ian R. Paton⁷, William S. Payne¹², Geo Pertea³, Dennis Prickett¹⁶, Daniela Puiu³, Dan Qioa¹, Emanuele Raineri, Magali Ruffier¹⁹, Steven L. Salzberg³, Michael C. Schatz³, Chantel F. Scheuring¹³, Carl J. Schmidt²⁰, Steven Schroeder², Stephen M. J. Searle¹⁹, Edward J. Smith¹, Jacqueline Smith⁷, Tad S. Sonstegard², Peter F. Stadler, Hakim Tafer¹⁴,²¹, Zhijian Jake Tu¹, Curtis P. Van Tassell², Albert J. Vilella⁵, Kelly P. Williams⁸, James A. Yorke³, Liqing Zhang¹, Hong-Bin Zhang¹³, Xiaojun Zhang¹³, Yang Zhang¹³, Kent M. Reed²²

Virginia Tech¹, United States Department of Agriculture², University of Maryland, College Park³, Wageningen University and Research Centre⁴, European Bioinformatics Institute⁵, Roche Applied Science⁶, University of Edinburgh⁷, Virginia Bioinformatics Institute⁸, Utah State University⁹, National Institutes of Health¹⁰, University of California, Davis¹¹, Michigan State University¹², Texas A&M University¹³, Leipzig University¹⁴, Children's Hospital Oakland Research Institute¹⁵, Institute for Animal Health¹⁶, Seoul National University¹⁷, University of Marburg¹⁸, Wellcome Trust Sanger Institute¹⁹, University of Delaware²⁰, University of Vienna²¹, University of Minnesota²²

07 Sep 2010-PLOS Biology

TL;DR: The combined application of next-generation sequencing platforms has provided an economical approach to unlocking the potential of the turkey genome.

Abstract: A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

415 citations

Journal Article

DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

Gustavo Arango-Argoty¹, Emily Garner¹, Amy Pruden¹, Lenwood S. Heath¹, Peter J. Vikesland¹, Liqing Zhang¹

Virginia Tech¹

01 Feb 2018-Microbiome

TL;DR: The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice, and DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs.

Abstract: Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the “best hits” of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models’ performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. 
The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg.
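
The precision and recall figures quoted in the abstract follow the standard definitions from counts of true positives, false positives, and false negatives. The sketch below only illustrates those definitions; the function name `precision_recall` and the counts are hypothetical and are not taken from the DeepARG paper or software.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion counts.

    tp: predicted ARG and truly an ARG
    fp: predicted ARG but not an ARG
    fn: true ARG that the method missed (a false negative)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=90, fp=2, fn=10)
print(f"precision={p:.3f} recall={r:.3f}")
```

Because recall has the false negatives in its denominator, a method that misses fewer true ARGs has higher recall, which is the relationship the abstract describes for DeepARG versus the best-hit approach.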

402 citations

Journal Article

Mammalian Housekeeping Genes Evolve More Slowly than Tissue-Specific Genes

Liqing Zhang¹, Wen-Hsiung Li¹

University of Chicago¹

01 Feb 2004-Molecular Biology and Evolution

TL;DR: The results show that, in comparison to tissue-specific genes, housekeeping genes on average evolve more slowly and are under stronger selective constraints as reflected by significantly smaller values of Ka/Ks, and contrary to the old textbook concept, approximately 74% of the housekeeping genes in this study belong to multigene families, not significantly different from that of the tissue-specific genes.

Abstract: Do housekeeping genes, which are turned on most of the time in almost every tissue, evolve more slowly than genes that are turned on only at specific developmental times or tissues? Recent large-scale gene expression studies enable us to have a better definition of housekeeping genes and to address the above question in detail. In this study, we examined 1581 human-mouse orthologous gene pairs for their patterns of sequence evolution, contrasting housekeeping genes with tissue-specific genes. Our results show that, in comparison to tissue-specific genes, housekeeping genes on average evolve more slowly and are under stronger selective constraints as reflected by significantly smaller values of Ka/Ks. Besides stronger purifying selection, we explored several other factors that can possibly slow down nonsynonymous rates in housekeeping genes. Although mutational bias might slightly slow the nonsynonymous rates in housekeeping genes, it is unlikely to be the major cause of the rate difference between the two types of genes. The codon usage pattern of housekeeping genes does not seem to differ from that of tissue-specific genes. Moreover, contrary to the old textbook concept, we found that approximately 74% of the housekeeping genes in our study belong to multigene families, not significantly different from that of the tissue-specific genes (approximately 70%). Therefore, the stronger selective constraints on housekeeping genes are not due to a lower degree of genetic redundancy.

356 citations

Journal Article

Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes

Mark J. Lawson¹, Liqing Zhang¹

Virginia Tech¹

21 Feb 2006-Genome Biology

TL;DR: The study provides insight into the evolution and distribution of SSRs in the two sequenced model plant genomes of monocots and dicots, and reveals that the distributions of SSRs appear highly non-random and vary a great deal in different regions of the genes in the genomes.

Abstract: Simple sequence repeats (SSRs) in DNA have been traditionally thought of as functionally unimportant and have been studied mainly as genetic markers. A recent handful of studies have shown, however, that SSRs in different positions of a gene can play important roles in determining protein function, genetic development, and regulation of gene expression. We have performed a detailed comparative study of the distribution of SSRs in the sequenced genomes of Arabidopsis thaliana and rice. SSRs in different genic regions - 5'untranslated region (UTR), 3'UTR, exon, and intron - show distinct patterns of distribution both within and between the two genomes. Especially notable is the much higher density of SSRs in 5'UTRs compared to the other regions and a strong affinity towards trinucleotide repeats in these regions for both rice and Arabidopsis. On a genomic level, mononucleotide repeats are the most prevalent type of SSRs in Arabidopsis and trinucleotide repeats are the most prevalent type in rice. Both plants have the same most common mononucleotide (A/T) and dinucleotide (AT and AG) repeats, but have little in common for the other types of repeats. Our work provides insight into the evolution and distribution of SSRs in the two sequenced model plant genomes of monocots and dicots. Our analyses reveal that the distributions of SSRs appear highly non-random and vary a great deal in different regions of the genes in the genomes.
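
To make the idea of counting mono-, di-, and trinucleotide repeats concrete, here is a minimal regular-expression sketch of SSR detection. It is not the method used in the paper: the function names and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per unit length.
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif (e.g. 'ATAT')."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_n in sorted(min_repeats.items()):
        # One unit of length k followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip 'AA' or 'ATAT' style units
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


example = "GGC" + "AT" * 7 + "CC" + "A" * 12 + "GT" + "CAG" * 5 + "T"
for start, unit, copies in find_ssrs(example):
    print(start, unit, copies)
```

Running this per genic region (5' UTR, exon, intron, 3' UTR) and dividing the counts by region length would give densities comparable in spirit to those discussed in the abstract; the region coordinates themselves would have to come from a genome annotation.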

221 citations

Journal Article

PseKNC-General: A cross-platform package for generating various modes of pseudo nucleotide compositions

Wei Chen¹, Xitong Zhang², Jordan Brooker³, Hao Lin⁴, Liqing Zhang¹, Kuo-Chen Chou⁵

Virginia Tech¹, University of Virginia², Vassar College³, University of Electronic Science and Technology of China⁴, King Abdulaziz University⁵

01 Jan 2015-Bioinformatics

TL;DR: PseKNC-General (the general form of pseudo k-tuple nucleotide composition) is developed, which allows for fast and accurate computation of all the widely used nucleotide structural and physicochemical properties of both DNA and RNA sequences.

Abstract: Summary: The avalanche of genomic sequences generated in the post-genomic age requires efficient computational methods for rapidly and accurately identifying biological features from sequence information. Towards this goal, we developed a freely available and open-source package, called PseKNC-General (the general form of pseudo k-tuple nucleotide composition), that allows for fast and accurate computation of all the widely used nucleotide structural and physicochemical properties of both DNA and RNA sequences. PseKNC-General can generate several modes of pseudo nucleotide compositions, including conventional k-tuple nucleotide compositions, Moreau–Broto autocorrelation coefficient, Moran autocorrelation coefficient, Geary autocorrelation coefficient, Type I PseKNC and Type II PseKNC. In every mode, >100 physicochemical properties are available for choosing. Moreover, it is flexible enough to allow the users to calculate PseKNC with user-defined properties. The package can be run on Linux, Mac and Windows systems and also provides a graphical user interface. Availability and implementation: The package is freely available at: http://lin.uestc.edu.cn/server/pseknc. Contact: chenweiimu@gmail.com or lqzhang@vt.edu or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: John Hancock. Received on July 22, 2014; revised on August 19, 2014; accepted on August 31, 2014.
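
As a rough illustration of the simplest mode listed in the abstract, the conventional k-tuple nucleotide composition, the sketch below computes the normalized frequency of every k-mer in a sequence. It does not use or reproduce the PseKNC-General package; the function name `kmer_composition` and its parameters are hypothetical, and the pseudo components based on physicochemical properties are not implemented here.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return {k-mer: frequency} over all 4**k k-mers of the alphabet.

    Windows containing characters outside the alphabet (e.g. 'N') are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


comp = kmer_composition("ACGTACGT", k=2)
print({km: round(f, 3) for km, f in comp.items() if f > 0})
```

The result is a fixed-length vector (16 entries for k=2, 64 for k=3), which is why this kind of composition is convenient as input to sequence classifiers. For RNA, passing `alphabet="ACGU"` would be the analogous choice.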

198 citations

Cited by

Changes in var gene switching rates in Plasmodium lead to antigenic variation [in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

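TL;DR: Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1) interacts with single or multiple receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with single or multiple receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.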

18,940 citations

Journal Article

Machine learning

Thomas G. Dietterich¹

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article

Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

Fumio Tajima¹

Kyushu University¹

30 Oct 1989-Genomics

TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

Evolution of Protein Molecules

S. Jeffery

01 Apr 1979-Biochemical Society Transactions

3,734 citations
