Author

Yadong Wang

Bio: Yadong Wang is an academic researcher from Harbin Institute of Technology. The author has contributed to research in topics: De Bruijn graph & Population. The author has an h-index of 19 and has co-authored 142 publications receiving 1,539 citations.


Papers
Journal ArticleDOI
TL;DR: Computational approaches to drug repositioning are reviewed and their characteristics highlighted to provide a reference for researchers developing more powerful approaches, and 76 important resources for drug repositioning are summarized.
Abstract: Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Unlike traditional drug development strategies, drug repositioning is efficient, economical, and low-risk. Three kinds of approaches are commonly used: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely applied in drug repositioning. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarized 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.

407 citations
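
As a toy illustration of one family of computational approaches covered by reviews such as the one above, the sketch below scores a candidate drug against a disease by its similarity to drugs already indicated for that disease (guilt-by-association). The drug names, similarity values, and scoring rule are all hypothetical placeholders, not methods or data taken from the paper.

```python
# Illustrative guilt-by-association repositioning score: a candidate drug is
# scored against a disease by its similarity (e.g. chemical or target-profile
# similarity) to drugs already approved for that disease.
# All names and similarity values below are hypothetical toy data.

known_indications = {
    "diseaseA": {"drug1", "drug2"},
}

# Pairwise drug-drug similarities (e.g. Tanimoto on fingerprints), toy values.
similarity = {
    ("drug3", "drug1"): 0.82,
    ("drug3", "drug2"): 0.40,
}

def repositioning_score(candidate: str, disease: str) -> float:
    """Score = highest similarity between the candidate and any drug
    already indicated for the disease."""
    approved = known_indications.get(disease, set())
    return max((similarity.get((candidate, d), 0.0) for d in approved), default=0.0)

print(repositioning_score("drug3", "diseaseA"))  # 0.82 -> promising candidate
```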

Journal ArticleDOI
TL;DR: This evaluation provides a comprehensive and objective comparison of several well‐known detection tools designed for WES data, which will assist researchers in choosing the most suitable tools for their research needs.
Abstract: Copy number variation (CNV) has been found to play an important role in human disease. Next-generation sequencing technology, including whole-genome sequencing (WGS) and whole-exome sequencing (WES), has become a primary strategy for studying the genetic basis of human disease. Several CNV calling tools have recently been developed on the basis of WES data. However, the comparative performance of these tools using real data remains unclear. An objective evaluation study of these tools in practical research situations would be beneficial. Here, we evaluated four well-known WES-based CNV detection tools (XHMM, CoNIFER, ExomeDepth, and CONTRA) using real data generated in house. After evaluation using six metrics, we found that the sensitive and accurate detection of CNVs in WES data remains challenging despite the many algorithms available. Each algorithm has its own strengths and weaknesses. None of the exome-based CNV calling methods performed well in all situations; in particular, compared with CNVs identified from high coverage WGS data from the same samples, all tools suffered from limited power. Our evaluation provides a comprehensive and objective comparison of several well-known detection tools designed for WES data, which will assist researchers in choosing the most suitable tools for their research needs.

213 citations
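
The sketch below shows, under assumed conventions, how WES-based CNV calls might be benchmarked against a higher-confidence truth set such as CNVs derived from high-coverage WGS on the same samples. The 50% reciprocal-overlap rule and the toy intervals are assumptions for illustration, not the exact metrics used in the evaluation above.

```python
# Benchmarking CNV calls against a truth set by reciprocal overlap.
# A predicted CNV counts as a true positive if it reciprocally overlaps a
# truth CNV by at least min_ro (50% is a common convention, assumed here).

def reciprocal_overlap(a, b):
    """Reciprocal overlap fraction of intervals a=(start, end), b=(start, end)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return min(inter / (a[1] - a[0]), inter / (b[1] - b[0]))

def sensitivity_precision(truth, calls, min_ro=0.5):
    tp = sum(any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    recovered = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth)
    precision = tp / len(calls) if calls else 0.0
    sensitivity = recovered / len(truth) if truth else 0.0
    return sensitivity, precision

truth = [(10_000, 25_000), (90_000, 120_000)]   # WGS-derived CNVs (toy)
calls = [(11_000, 24_000), (300_000, 310_000)]  # WES-based tool output (toy)
print(sensitivity_precision(truth, calls))       # (0.5, 0.5)
```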

Journal ArticleDOI
TL;DR: The hypergeometric test is used to functionally annotate a single lncRNA or a set of lncRNAs with significantly enriched functional terms among the protein-coding genes that are significantly co-expressed with the lncRNA(s).
Abstract: The GENCODE project has collected over 10,000 human long non-coding RNA (lncRNA) genes. However, the vast majority of them remain to be functionally characterized. Computational investigation of potential functions of human lncRNA genes is helpful to guide further experimental studies on lncRNAs. In this study, based on expression correlation between lncRNAs and protein-coding genes across 19 human normal tissues, we used the hypergeometric test to functionally annotate a single lncRNA or a set of lncRNAs with significantly enriched functional terms among the protein-coding genes that are significantly co-expressed with the lncRNA(s). The functional terms include all nodes in the Gene Ontology (GO) and 4,380 human biological pathways collected from 12 pathway databases. We successfully mapped 9,625 human lncRNA genes to GO terms and biological pathways, and then developed the first ontology-driven user-friendly web interface named lncRNA2Function, which enables researchers to browse the lncRNAs associated with a specific functional term, the functional terms associated with a specific lncRNA, or to assign functional terms to a set of human lncRNA genes, such as a cluster of co-expressed lncRNAs. The lncRNA2Function is freely available at http://mlg.hit.edu.cn/lncrna2function . The LncRNA2Function is an important resource for further investigating the functions of a single human lncRNA, or functionally annotating a set of human lncRNAs of interest.

138 citations
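
The enrichment step described above can be sketched with the hypergeometric test: given the protein-coding genes significantly co-expressed with an lncRNA, test whether a GO term or pathway is over-represented among them. The gene counts below are toy values, not figures from the paper.

```python
# Hypergeometric enrichment of a functional term among the protein-coding
# genes co-expressed with an lncRNA. Counts are toy values.
from scipy.stats import hypergeom

total_genes = 20_000   # protein-coding genes in the background
term_genes = 300       # background genes annotated with the functional term
coexpressed = 150      # genes significantly co-expressed with the lncRNA
overlap = 12           # co-expressed genes that carry the term

# P(X >= overlap) under the hypergeometric null of random overlap
p_value = hypergeom.sf(overlap - 1, total_genes, term_genes, coexpressed)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice, p-values computed across all GO terms and pathways would also be corrected for multiple testing before a term is reported as significantly enriched.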

Journal ArticleDOI
TL;DR: cuteSV is a sensitive, fast, and scalable long-read-based SV detection approach that uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to achieve sensitive SV detection.
Abstract: Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV .

114 citations
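
The clustering-and-refinement idea behind long-read SV callers such as cuteSV can be sketched as follows: signatures of one SV type collected from individual reads are grouped when their positions and sizes are close, and each cluster with enough supporting reads is refined into a candidate call. The thresholds and signatures below are placeholder values, not cuteSV's actual tuned heuristics.

```python
# Simplified clustering of per-read SV signatures into candidate calls.
# Thresholds are arbitrary placeholders for illustration only.

def cluster_signatures(signatures, max_pos_diff=500, max_size_diff=0.3, min_support=3):
    """signatures: list of (position, length) for one SV type, e.g. deletions."""
    calls = []
    cluster = []
    for pos, length in sorted(signatures):
        if cluster and (pos - cluster[-1][0] > max_pos_diff
                        or abs(length - cluster[-1][1]) > max_size_diff * cluster[-1][1]):
            if len(cluster) >= min_support:
                calls.append(consensus(cluster))
            cluster = []
        cluster.append((pos, length))
    if len(cluster) >= min_support:
        calls.append(consensus(cluster))
    return calls

def consensus(cluster):
    """Refine a cluster into one call: median position, median length, read support."""
    positions = sorted(p for p, _ in cluster)
    lengths = sorted(l for _, l in cluster)
    mid = len(cluster) // 2
    return positions[mid], lengths[mid], len(cluster)

sigs = [(10_050, 480), (10_070, 510), (10_020, 495), (55_000, 1200)]
print(cluster_signatures(sigs))  # [(10050, 495, 3)] -> one supported deletion call
```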

Journal ArticleDOI
TL;DR: The de Bruijn Graph-based Aligner (deBGA) is proposed, an innovative graph-based seed-and-extension algorithm that aligns HTS reads to a reference genome organized and indexed as a de Bruijn graph, making it particularly well-suited to handle the rapidly growing volumes of sequencing data.
Abstract: Motivation: As high-throughput sequencing (HTS) technology becomes ubiquitous and the volume of data continues to rise, HTS read alignment is becoming increasingly rate-limiting, which keeps pressing the development of novel read alignment approaches. Moreover, promising novel applications of HTS technology require aligning reads to multiple genomes instead of a single reference; however, it is still not viable for the state-of-the-art aligners to align large numbers of reads to multiple genomes. Results: We propose de Bruijn Graph-based Aligner (deBGA), an innovative graph-based seed-and-extension algorithm to align HTS reads to a reference genome that is organized and indexed using a de Bruijn graph. With its well-handling of repeats, deBGA is substantially faster than state-of-the-art approaches while maintaining similar or higher sensitivity and accuracy. This makes it particularly well-suited to handle the rapidly growing volumes of sequencing data. Furthermore, it provides a promising solution for aligning reads to multiple genomes and graph-based references in HTS applications. Availability and Implementation: deBGA is available at: https://github.com/hitbc/deBGA . Contact: ydwang@hit.edu.cn Supplementary information : Supplementary data are available at Bioinformatics online.

78 citations
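
The seed-and-extension idea can be sketched with a flat k-mer index of the reference: exact k-mer matches anchor a read before extension and scoring. deBGA's actual index is a de Bruijn graph of unitigs with dedicated repeat handling, so this is a deliberate simplification for illustration only.

```python
# Minimal seed lookup: index the reference by k-mers, then report exact seed
# hits for a read. Real graph-based aligners index unitigs of a de Bruijn
# graph rather than a flat k-mer table.
from collections import defaultdict

def build_kmer_index(reference: str, k: int = 11):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_hits(read: str, index, k: int = 11):
    """Yield (read_offset, reference_position) pairs for every exact k-mer seed."""
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            yield j, pos

reference = "ACGTACGTTGCAGGCTAACGTTAGCCGTACGATCGATCGGATC"
index = build_kmer_index(reference)
print(list(seed_hits("GCAGGCTAACGTT", index)))  # seeds anchoring the read near position 9
```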


Cited by

Journal ArticleDOI
Heng Li
TL;DR: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database; it is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
Abstract: Motivation Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Results Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability and implementation https://github.com/lh3/minimap2. Supplementary information Supplementary data are available at Bioinformatics online.

6,264 citations
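
Minimap2's seeding builds on (w,k)-minimizers: within every window of w consecutive k-mers, only the smallest k-mer is kept, thinning the index while still guaranteeing seeds in any sufficiently long exact match. The sketch below shows only this core selection rule and omits minimap2's strand handling, hashing, and chaining.

```python
# (w,k)-minimizer selection: keep the lexicographically smallest k-mer in each
# window of w consecutive k-mers. Illustrative only; real implementations hash
# k-mers and handle both strands.

def minimizers(seq: str, k: int = 5, w: int = 4):
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        j = min(range(w), key=lambda x: window[x])  # smallest k-mer in the window
        picked.add((start + j, window[j]))
    return sorted(picked)

print(minimizers("ACGTTGCAGGCTAACGTTAGCC"))  # (position, k-mer) pairs retained as seeds
```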

Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine, Nov 4-8, 2007, Singapore; PL02-05. All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations

01 Jan 2011
TL;DR: The sheer volume and scope of this flood of genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations