Author

Yadong Wang

Bio: Yadong Wang is an academic researcher from Harbin Institute of Technology. The author has contributed to research in topics: De Bruijn graph & Population. The author has an h-index of 19 and has co-authored 142 publications receiving 1,539 citations.


Papers
Journal ArticleDOI
TL;DR: Computational approaches to drug repositioning are reviewed and their characteristics highlighted to provide a reference for researchers developing more powerful approaches, and 76 important resources for drug repositioning are summarized.
Abstract: Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Unlike traditional drug development strategies, drug repositioning is efficient, economical, and low-risk. Three kinds of approaches are commonly used: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely applied in drug repositioning. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarized 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.

407 citations
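
As a toy illustration of one family of computational approaches covered by reviews such as the one above, the sketch below scores a candidate drug against a disease by its similarity to drugs already indicated for that disease (guilt-by-association). The drug names, similarity values, and scoring rule are all hypothetical placeholders, not methods or data taken from the paper.

```python
# Illustrative guilt-by-association repositioning score: a candidate drug is
# scored against a disease by its similarity (e.g. chemical or target-profile
# similarity) to drugs already approved for that disease.
# All names and similarity values below are hypothetical toy data.

known_indications = {
    "diseaseA": {"drug1", "drug2"},
}

# Pairwise drug-drug similarities (e.g. Tanimoto on fingerprints), toy values.
similarity = {
    ("drug3", "drug1"): 0.82,
    ("drug3", "drug2"): 0.40,
}

def repositioning_score(candidate: str, disease: str) -> float:
    """Score = highest similarity between the candidate and any drug
    already indicated for the disease."""
    approved = known_indications.get(disease, set())
    return max((similarity.get((candidate, d), 0.0) for d in approved), default=0.0)

print(repositioning_score("drug3", "diseaseA"))  # 0.82 -> promising candidate
```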

Journal ArticleDOI
TL;DR: This evaluation provides a comprehensive and objective comparison of several well‐known detection tools designed for WES data, which will assist researchers in choosing the most suitable tools for their research needs.
Abstract: Copy number variation (CNV) has been found to play an important role in human disease. Next-generation sequencing technology, including whole-genome sequencing (WGS) and whole-exome sequencing (WES), has become a primary strategy for studying the genetic basis of human disease. Several CNV calling tools have recently been developed on the basis of WES data. However, the comparative performance of these tools using real data remains unclear. An objective evaluation study of these tools in practical research situations would be beneficial. Here, we evaluated four well-known WES-based CNV detection tools (XHMM, CoNIFER, ExomeDepth, and CONTRA) using real data generated in house. After evaluation using six metrics, we found that the sensitive and accurate detection of CNVs in WES data remains challenging despite the many algorithms available. Each algorithm has its own strengths and weaknesses. None of the exome-based CNV calling methods performed well in all situations; in particular, compared with CNVs identified from high coverage WGS data from the same samples, all tools suffered from limited power. Our evaluation provides a comprehensive and objective comparison of several well-known detection tools designed for WES data, which will assist researchers in choosing the most suitable tools for their research needs.

213 citations
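
The sketch below shows, under assumed conventions, how WES-based CNV calls might be benchmarked against a higher-confidence truth set such as CNVs derived from high-coverage WGS on the same samples. The 50% reciprocal-overlap rule and the toy intervals are assumptions for illustration, not the exact metrics used in the evaluation above.

```python
# Benchmarking CNV calls against a truth set by reciprocal overlap.
# A predicted CNV counts as a true positive if it reciprocally overlaps a
# truth CNV by at least min_ro (50% is a common convention, assumed here).

def reciprocal_overlap(a, b):
    """Reciprocal overlap fraction of intervals a=(start, end), b=(start, end)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return min(inter / (a[1] - a[0]), inter / (b[1] - b[0]))

def sensitivity_precision(truth, calls, min_ro=0.5):
    tp = sum(any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    recovered = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth)
    precision = tp / len(calls) if calls else 0.0
    sensitivity = recovered / len(truth) if truth else 0.0
    return sensitivity, precision

truth = [(10_000, 25_000), (90_000, 120_000)]   # WGS-derived CNVs (toy)
calls = [(11_000, 24_000), (300_000, 310_000)]  # WES-based tool output (toy)
print(sensitivity_precision(truth, calls))       # (0.5, 0.5)
```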

Journal ArticleDOI
TL;DR: The hypergeometric test is used to functionally annotate a single lncRNA or a set of lncRNAs with significantly enriched functional terms among the protein-coding genes that are significantly co-expressed with the lncRNA(s).
Abstract: The GENCODE project has collected over 10,000 human long non-coding RNA (lncRNA) genes. However, the vast majority of them remain to be functionally characterized. Computational investigation of potential functions of human lncRNA genes is helpful to guide further experimental studies on lncRNAs. In this study, based on expression correlation between lncRNAs and protein-coding genes across 19 human normal tissues, we used the hypergeometric test to functionally annotate a single lncRNA or a set of lncRNAs with significantly enriched functional terms among the protein-coding genes that are significantly co-expressed with the lncRNA(s). The functional terms include all nodes in the Gene Ontology (GO) and 4,380 human biological pathways collected from 12 pathway databases. We successfully mapped 9,625 human lncRNA genes to GO terms and biological pathways, and then developed the first ontology-driven user-friendly web interface named lncRNA2Function, which enables researchers to browse the lncRNAs associated with a specific functional term, the functional terms associated with a specific lncRNA, or to assign functional terms to a set of human lncRNA genes, such as a cluster of co-expressed lncRNAs. The lncRNA2Function is freely available at http://mlg.hit.edu.cn/lncrna2function . The LncRNA2Function is an important resource for further investigating the functions of a single human lncRNA, or functionally annotating a set of human lncRNAs of interest.

138 citations
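
The enrichment step described above can be sketched with the hypergeometric test: given the protein-coding genes significantly co-expressed with an lncRNA, test whether a GO term or pathway is over-represented among them. The gene counts below are toy values, not figures from the paper.

```python
# Hypergeometric enrichment of a functional term among the protein-coding
# genes co-expressed with an lncRNA. Counts are toy values.
from scipy.stats import hypergeom

total_genes = 20_000   # protein-coding genes in the background
term_genes = 300       # background genes annotated with the functional term
coexpressed = 150      # genes significantly co-expressed with the lncRNA
overlap = 12           # co-expressed genes that carry the term

# P(X >= overlap) under the hypergeometric null of random overlap
p_value = hypergeom.sf(overlap - 1, total_genes, term_genes, coexpressed)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice, p-values computed across all GO terms and pathways would also be corrected for multiple testing before a term is reported as significantly enriched.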

Journal ArticleDOI
TL;DR: cuteSV is a sensitive, fast, and scalable long-read-based SV detection approach that uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to achieve sensitive SV detection.
Abstract: Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV .

114 citations
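
The clustering-and-refinement idea behind long-read SV callers such as cuteSV can be sketched as follows: signatures of one SV type collected from individual reads are grouped when their positions and sizes are close, and each cluster with enough supporting reads is refined into a candidate call. The thresholds and signatures below are placeholder values, not cuteSV's actual tuned heuristics.

```python
# Simplified clustering of per-read SV signatures into candidate calls.
# Thresholds are arbitrary placeholders for illustration only.

def cluster_signatures(signatures, max_pos_diff=500, max_size_diff=0.3, min_support=3):
    """signatures: list of (position, length) for one SV type, e.g. deletions."""
    calls = []
    cluster = []
    for pos, length in sorted(signatures):
        if cluster and (pos - cluster[-1][0] > max_pos_diff
                        or abs(length - cluster[-1][1]) > max_size_diff * cluster[-1][1]):
            if len(cluster) >= min_support:
                calls.append(consensus(cluster))
            cluster = []
        cluster.append((pos, length))
    if len(cluster) >= min_support:
        calls.append(consensus(cluster))
    return calls

def consensus(cluster):
    """Refine a cluster into one call: median position, median length, read support."""
    positions = sorted(p for p, _ in cluster)
    lengths = sorted(l for _, l in cluster)
    mid = len(cluster) // 2
    return positions[mid], lengths[mid], len(cluster)

sigs = [(10_050, 480), (10_070, 510), (10_020, 495), (55_000, 1200)]
print(cluster_signatures(sigs))  # [(10050, 495, 3)] -> one supported deletion call
```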

Journal ArticleDOI
TL;DR: The de Bruijn Graph-based Aligner (deBGA) is proposed, an innovative graph-based seed-and-extension algorithm that aligns HTS reads to a reference genome organized and indexed as a de Bruijn graph, making it particularly well-suited to handle the rapidly growing volumes of sequencing data.
Abstract: Motivation: As high-throughput sequencing (HTS) technology becomes ubiquitous and the volume of data continues to rise, HTS read alignment is becoming increasingly rate-limiting, which keeps pressing the development of novel read alignment approaches. Moreover, promising novel applications of HTS technology require aligning reads to multiple genomes instead of a single reference; however, it is still not viable for the state-of-the-art aligners to align large numbers of reads to multiple genomes. Results: We propose de Bruijn Graph-based Aligner (deBGA), an innovative graph-based seed-and-extension algorithm to align HTS reads to a reference genome that is organized and indexed using a de Bruijn graph. With its well-handling of repeats, deBGA is substantially faster than state-of-the-art approaches while maintaining similar or higher sensitivity and accuracy. This makes it particularly well-suited to handle the rapidly growing volumes of sequencing data. Furthermore, it provides a promising solution for aligning reads to multiple genomes and graph-based references in HTS applications. Availability and Implementation: deBGA is available at: https://github.com/hitbc/deBGA . Contact: ydwang@hit.edu.cn Supplementary information : Supplementary data are available at Bioinformatics online.

78 citations
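
The seed-and-extension idea can be sketched with a flat k-mer index of the reference: exact k-mer matches anchor a read before extension and scoring. deBGA's actual index is a de Bruijn graph of unitigs with dedicated repeat handling, so this is a deliberate simplification for illustration only.

```python
# Minimal seed lookup: index the reference by k-mers, then report exact seed
# hits for a read. Real graph-based aligners index unitigs of a de Bruijn
# graph rather than a flat k-mer table.
from collections import defaultdict

def build_kmer_index(reference: str, k: int = 11):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_hits(read: str, index, k: int = 11):
    """Yield (read_offset, reference_position) pairs for every exact k-mer seed."""
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            yield j, pos

reference = "ACGTACGTTGCAGGCTAACGTTAGCCGTACGATCGATCGGATC"
index = build_kmer_index(reference)
print(list(seed_hits("GCAGGCTAACGTT", index)))  # seeds anchoring the read near position 9
```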


Cited by

Journal ArticleDOI
Heng Li
TL;DR: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database; it is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
Abstract: Motivation Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Results Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability and implementation https://github.com/lh3/minimap2. Supplementary information Supplementary data are available at Bioinformatics online.

6,264 citations
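
Minimap2's seeding builds on (w,k)-minimizers: within every window of w consecutive k-mers, only the smallest k-mer is kept, thinning the index while still guaranteeing seeds in any sufficiently long exact match. The sketch below shows only this core selection rule and omits minimap2's strand handling, hashing, and chaining.

```python
# (w,k)-minimizer selection: keep the lexicographically smallest k-mer in each
# window of w consecutive k-mers. Illustrative only; real implementations hash
# k-mers and handle both strands.

def minimizers(seq: str, k: int = 5, w: int = 4):
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        j = min(range(w), key=lambda x: window[x])  # smallest k-mer in the window
        picked.add((start + j, window[j]))
    return sorted(picked)

print(minimizers("ACGTTGCAGGCTAACGTTAGCC"))  # (position, k-mer) pairs retained as seeds
```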

Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine, Nov 4-8, 2007, Singapore; PL02-05. All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations

01 Jan 2011
TL;DR: The sheer volume and scope of this flood of genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations