Author

Wing-Kin Sung

Bio: Wing-Kin Sung is an academic researcher at the National University of Singapore. He has contributed to research on topics including Gene and Chromatin immunoprecipitation, has an h-index of 64, and has co-authored 327 publications receiving 26,116 citations. His previous affiliations include the University of Hong Kong and Yale University.


Papers
Journal ArticleDOI
TL;DR: A supervised learning approach is employed that learns and models expression patterns under different conditions and controls from a training collection of known BCL6 targets and randomly chosen decoys; it identifies BCL6 targets with high accuracy, making the authors joint best performers of the challenge.
Abstract: In the Dialogue for Reverse Engineering Assessments and Methods Conference (DREAM2) BCL6 target identification challenge, we were given a list of 200 genes and tasked to identify which ones are the true targets of BCL6 using an independent panel of gene-expression data. Initial efforts using conventional motif-scanning approaches to find BCL6 binding sites in the promoters of the 200 genes as a means of identifying BCL6 true targets proved unsuccessful. Instead, we performed a large-scale comparative study of multiple expression data under different conditions. Specifically, we employed a supervised learning approach that learns and models the expression patterns under different conditions and controls from a training collection of known BCL6 targets and randomly chosen decoys. Genes in the given list whose expression matches well with that of the training set of known BCL6 targets are more likely to be BCL6 targets. Using this approach, we are able to identify BCL6 targets with high accuracy, making us joint best performers of the challenge.
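The classification scheme the abstract describes can be sketched in a few lines: score a candidate gene by how well its expression profile matches known targets rather than decoys. All data, the nearest-centroid scoring choice, and the array sizes below are invented for illustration; the authors' actual features and models are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_conditions = 200

# Synthetic expression matrices: known targets share a coherent expression
# pattern across conditions, while decoys are random noise.
pattern = rng.normal(size=n_conditions)
known_targets = pattern + 0.3 * rng.normal(size=(20, n_conditions))
decoys = rng.normal(size=(20, n_conditions))

X = np.vstack([known_targets, decoys])
y = np.array([1] * 20 + [0] * 20)

# Minimal nearest-centroid classifier: score a candidate by its correlation
# with the mean profile of known targets minus its correlation with the
# mean decoy profile.
target_mean = X[y == 1].mean(axis=0)
decoy_mean = X[y == 0].mean(axis=0)

def target_score(profile):
    """Higher score = expression matches the known targets more closely."""
    r_target = np.corrcoef(profile, target_mean)[0, 1]
    r_decoy = np.corrcoef(profile, decoy_mean)[0, 1]
    return r_target - r_decoy

# A candidate that follows the target pattern outranks a random one.
candidate_like_target = pattern + 0.3 * rng.normal(size=n_conditions)
candidate_random = rng.normal(size=n_conditions)
print(target_score(candidate_like_target) > target_score(candidate_random))
```

Genes whose profiles score high under such a model would be ranked as likely targets, which is the qualitative behavior the abstract reports.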

1 citation

Journal ArticleDOI
TL;DR: In this article, the authors evaluate sequence data from the PathChip high-density hybridization array for epidemiological interpretation of detected pathogens, and show that PathChip-derived sequences yield outbreak clustering in phylogenetic trees similar to that obtained from classical Sanger-derived sequences.
Journal ArticleDOI
TL;DR: The "International Conference on Genome Informatics", popularly known as "GIW", is probably one of the oldest, if not the oldest, regular annual conferences in computational biology, having survived all turns of the tempestuous development of this field of research.
Abstract: The "International Conference on Genome Informatics", popularly known as "GIW", is probably one of the oldest, if not the oldest, regular annual conferences in computational biology, having survived all turns of the tempestuous development of this field of research [1]. It is impossible to overestimate its role in establishing and enhancing the computational biology and bioinformatics research community in the Asia-Pacific region and its interaction with the world-wide research effort. Importantly, it has provided a friendly forum where scientists, especially from the region, could exchange and publish their research findings. It has accompanied and furthered the growth of computational biology and bioinformatics research, in both quantity and quality, in the Asia-Pacific region. GIW was first held as an open workshop ("Genome Informatics Workshop", thus GIW) at Kikai Shinko Kaikan in Tokyo during December 3-4, 1990, essentially just before the Japanese Human Genome Project started in the following year. Whereas GIW was originally an intra-Japanese affair, it became an international conference in 1993, and the currently used name was adopted in 2001. During the last ~15 years the conference has always been attended by several hundred participants; thus, it is not really a "workshop" any longer. Whereas GIW played more the role of a regional incubator in the early years, it has recently become one of the important, truly international conference venues in the bioinformatics field for scientific exchange. It provides unique opportunities to bridge theory and experiment, academia and industry, science from the East and the West. The conference site was in Tokyo or Yokohama exclusively until 2006 (as well as in 2009). GIW 2007 (the 18th edition) was the first one held outside Japan, in the Biopolis in Singapore.
Other locations in the Asia-Pacific region were to follow: the 19th GIW at the Gold Coast in Australia (2008), the 21st GIW in Hangzhou (China) in 2010, the 22nd GIW in Busan (South Korea) in 2011 and the 23rd GIW in Tainan (Taiwan) in 2012. Remarkably, the 24th GIW has been awarded to Singapore again [2] and, notably, is held in the same premises as the conference in 2007, namely in the Matrix Building of Biopolis. Singaporean bioinformaticians might tend to see this as recognition of their research efforts during recent years, though the geographically central location in the Asia-Pacific region, the excellent transport hub and the infrastructural support of Singapore suggest an alternative, equally important explanation. All events happen only if an activist champions them, and the Singaporean community is grateful to Limsoon Wong for his lobbying effort to attract important conferences here. Given the maturity of the research area and today's scientific fashions, efforts that classify as systems biology occupy a prominent place in GIW 2013. In total, eighteen submissions have qualified for this special issue of BMC Systems Biology. The systems biology approach aims at a holistic perspective: to explain and to predict phenotypic properties that are influenced by a multitude of factors, using complex and, desirably, quantitative theoretical models. Given the absence of a consistent, predictive biological theory of the kind physicists have enjoyed for many decades, some might consider the quest for an integrated systems approach grandiloquent and premature. There are serious arguments for this view, such as the fact that about 50% of all eukaryotic genes lack even tentative functional characterizations and, most likely, not even half of the biomolecular mechanisms are known [3]. Despite full genome sequencing, even a stable reference proteome cannot be deduced [4]. Thus, quantitative and predictive biology has a long way to go.
Nevertheless, large-scale experimental techniques open new opportunities: most prominently nucleic acid sequencing, but also epigenetic analyses, large-scale expression studies, proteomics with its large sets of protein-protein interaction data, the ever-growing library of biomacromolecular structures, and automated methods for analyzing cellular and tissue images [5]. For carefully selected questions, interesting and important insights can be deduced from these data at the systems level, insights that can even reach into biomedical and biotechnological applications. The papers collected in this special edition exemplify how far research has moved forward.

Cited by
Journal ArticleDOI
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]
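The core mechanism the abstract names, backward search over the Burrows-Wheeler Transform, can be sketched for exact matching. BWA itself adds mismatch and gap handling, suffix-array construction, and precomputed occurrence tables; everything below is a simplified, self-contained illustration with an invented toy genome.

```python
def bwt(text):
    """Burrows-Wheeler Transform via full rotation sort (fine for a demo;
    real aligners build it from a suffix array instead)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_str, pattern):
    """Count occurrences of pattern in the original text, processing the
    pattern from its last character to its first (LF-mapping)."""
    # C[c] = number of characters in the text strictly smaller than c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; real indexes precompute this
        # so each step is O(1).
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)          # current suffix-array interval
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo                     # number of matches in the text

genome = "ACAACG"
index = bwt(genome)
print(backward_search(index, "AC"))    # 2: "AC" occurs twice in ACAACG
```

Each pattern character narrows one interval over the sorted suffixes, which is why the search cost depends on the pattern length rather than the genome length once the index is built.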

43,862 citations

Journal ArticleDOI
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.
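The quality-aware backtracking idea can be illustrated outside the FM-index: when an exact match fails, try substitutions at low-quality positions first, since those bases are the likeliest sequencing errors. The reference string, read, and helper function below are invented for illustration; Bowtie performs this search inside its Burrows-Wheeler index, not by naive substring scanning.

```python
def align_with_one_mismatch(read, quals, ref):
    """Return (position, corrected_read) for the first hit found, or None.
    quals are Phred-style scores, one per base; lower = less reliable."""
    if read in ref:
        return ref.index(read), read
    # Backtrack: attempt substitutions at positions ordered by ascending
    # quality, mirroring the "mutate the least trustworthy base first" idea.
    for i in sorted(range(len(read)), key=lambda i: quals[i]):
        for base in "ACGT":
            if base == read[i]:
                continue
            candidate = read[:i] + base + read[i + 1:]
            if candidate in ref:
                return ref.index(candidate), candidate
    return None

ref = "TTGGCAACGTACGAA"

read = "AACGTACG"                      # matches exactly at offset 5
print(align_with_one_mismatch(read, [30] * 8, ref)[0])   # 5

bad_read = "AACTTACG"                  # one error at position 3 (low quality)
quals = [30, 30, 30, 5, 30, 30, 30, 30]
pos, fixed = align_with_one_mismatch(bad_read, quals, ref)
print(pos, fixed)                      # 5 AACGTACG
```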

20,335 citations

Journal ArticleDOI
06 Sep 2012 - Nature
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
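The mail-filtering example in the fourth category can be made concrete with a toy scorer that learns per-word evidence from messages the user kept or rejected. The messages, words, and additive scoring rule below are invented; a real filter would use something like naive Bayes with smoothing over far more data.

```python
from collections import Counter

# Tiny invented training set: mail the user kept vs. mail the user rejected.
kept = ["meeting moved to friday", "draft of the paper attached"]
rejected = ["win a free prize now", "free offer click now"]

def word_counts(msgs):
    c = Counter()
    for m in msgs:
        c.update(m.split())
    return c

kept_counts, rej_counts = word_counts(kept), word_counts(rejected)

def reject_score(message):
    """Positive score = looks like mail this user rejects; the rules are
    maintained automatically as the training sets grow."""
    score = 0
    for w in message.split():
        score += rej_counts.get(w, 0) - kept_counts.get(w, 0)
    return score

print(reject_score("free prize now"))          # 5: words seen only in rejected mail
print(reject_score("paper draft for friday"))  # -3: words seen only in kept mail
```

Because the counts come from each user's own decisions, each user effectively gets a personalized filter without writing any rules, which is exactly the customization argument the passage makes.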

13,246 citations

Journal ArticleDOI
TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
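The "dynamic Poisson" idea can be sketched as testing each candidate window against the maximum of the genome-wide background rate and rates estimated from surrounding regions, which makes the test conservative in locally tag-rich areas. The rates and counts below are illustrative, and MACS's tag shift-size modeling is omitted entirely.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): 1 minus the CDF up to k-1."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def dynamic_poisson_pvalue(tag_count, lambda_bg, local_lambdas):
    """MACS-style test: use the largest plausible background rate for this
    locus, so local biases (e.g. open chromatin) cannot inflate significance."""
    lam = max([lambda_bg] + list(local_lambdas))
    return poisson_sf(tag_count, lam)

# 50 tags in a window; the genome-wide rate predicts 10, but the surrounding
# region is tag-rich (local rates 30 and 25), so the local model is used.
p_naive = poisson_sf(50, 10.0)
p_dynamic = dynamic_poisson_pvalue(50, 10.0, [30.0, 25.0])
print(p_dynamic > p_naive)   # True: the dynamic model is more conservative
```

A fixed genome-wide rate would call almost anything in a tag-rich region a peak; taking the maximum over local rates is what the abstract means by capturing local biases for more robust predictions.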

13,008 citations