Home
/
Authors
/
Wing-Kin Sung

Author

Wing-Kin Sung

Other affiliations: University of Hong Kong, Yale University, Huazhong Agricultural University ...read more

Bio: Wing-Kin Sung is an academic researcher from National University of Singapore. The author has contributed to research in topics: Gene & Chromatin immunoprecipitation. The author has an hindex of 64, co-authored 327 publications receiving 26116 citations. Previous affiliations of Wing-Kin Sung include University of Hong Kong & Yale University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1994

Papers

PDF

Open Access

More filters

Journal Article•DOI•

An AR-ERG transcriptional signature defined by long-range chromatin interactomes in prostate cancer cells.

[...]

Zhizhuo Zhang¹, Kern Rei Chng², Shreyas Lingadahalli³, Zikai Chen³, Zikai Chen², Mei Hui Liu², Huy Hoang Do², Shaojiang Cai², Nicola J. Rinaldi⁴, Huay Mei Poh², Guoliang Li², Guoliang Li⁵, Ying Ying Sung², Charlie L. Heng², Leighton J. Core⁶, Si Kee Tan², Xiaoan Ruan², John T. Lis⁶, Manolis Kellis⁷, Manolis Kellis⁴, Yijun Ruan², Wing-Kin Sung², Wing-Kin Sung¹, Edwin Cheung², Edwin Cheung³ - Show less +21 more•Institutions (7)

National University of Singapore¹, Genome Institute of Singapore², University of Macau³, Massachusetts Institute of Technology⁴, Huazhong Agricultural University⁵, Cornell University⁶, Broad Institute⁷

03 Jan 2019-Genome Research

TL;DR: A novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET).

...read moreread less

Abstract: The aberrant activities of transcription factors such as the androgen receptor (AR) underpin prostate cancer development. While the AR cis-regulation has been extensively studied in prostate cancer, information pertaining to the spatial architecture of the AR transcriptional circuitry remains limited. In this paper, we propose a novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET). We identified ERG-associated long-range chromatin interactions as a cooperative component in the AR-associated chromatin interactome, acting in concert to achieve coordinated regulation of a subset of AR target genes. Through multifaceted functional data analysis, we found that AR-ERG interaction hub regions are characterized by distinct functional signatures, including bidirectional transcription and cotranscription factor binding. In addition, cancer-associated long noncoding RNAs were found to be connected near protein-coding genes through AR-ERG looping. Finally, we found strong enrichment of prostate cancer genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) at AR-ERG co-binding sites participating in chromatin interactions and gene regulation, suggesting GWAS target genes identified from chromatin looping data provide more biologically relevant findings than using the nearest gene approach. Taken together, our results revealed an AR-ERG-centric higher-order chromatin structure that drives coordinated gene expression in prostate cancer progression and the identification of potential target genes for therapeutic intervention.

...read moreread less

42 citations

Book Chapter•DOI•

Approximate String Matching Using Compressed Suffix Arrays

[...]

Trinh N. D. Huynh¹, Wing-Kai Hon², Tak-Wah Lam², Wing-Kin Sung¹•Institutions (2)

National University of Singapore¹, University of Hong Kong²

05 Jul 2004

TL;DR: This paper gives a solution using O(n) bits indexing data structure with O(mlog2 n) query time to the k-difference problem with k≥1, the first result which requires linear indexing space.

...read moreread less

Abstract: Let T be a text of length n and P be a pattern of length m, both strings over a fixed finite alphabet A. The k-difference (k-mismatch, respectively) problem is to find all occurrences of P in T that have edit distance (Hamming distance, respectively) at most k from P. In this paper we investigate a well-studied case in which k=1 and T is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster [16-19]. This paper gives a solution using O(n) bits indexing data structure with O(mlog2 n) query time. To the best of our knowledge, this is the first result which requires linear indexing space. The results can be extended for the k-difference problem with k≥1.

...read moreread less

42 citations

Proceedings Article•DOI•

Protein structure and fold prediction using tree-augmented naive Bayesian classifier.

[...]

Arunkumar Chinnasamy¹, Wing-Kin Sung¹, Ankush Mittal²•Institutions (2)

National University of Singapore¹, Indian Institute of Technology Roorkee²

01 Dec 2003

TL;DR: This paper presents a framework using the Tree-Augmented Networks (TAN) based on the theory of learning Bayesian networks but with less restrictive assumptions than the naiveBayesian networks to enhance TAN's performance.

...read moreread less

Abstract: For determining the structure class and fold class of Protein Structure, computer-based techniques have became essential considering the large volume of the data. Several techniques based on sequence similarity. Neural Networks, SVMs, etc have been applied. This paper presents a framework using the Tree-Augmented Networks (TAN) based on the theory of learning Bayesian networks but with less restrictive assumptions than the naive Bayesian networks. In order to enhance TAN's performance, pre-processing of data is done by feature discretization and post-processing is done by using Mean Probability Voting (MPV) scheme. The advantage of using Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the TAN structure probabilities to determine the significance of each feature (say, Hydrophobicity) for each class, which help to further understand the mystery of protein structure. Experimental results and comparison with other works over two databases show the effectiveness of our TAN based framework. The idea is implemented as the BAYESPROT web server and it is available at http://www-appn.comp.nus.edu.sg/-bioinfo/bayesprot/Default.htm.

...read moreread less

42 citations

Journal Article•DOI•

Inferring phylogenetic relationships avoiding forbidden rooted triplets.

[...]

Ying Jun He¹, Trinh N. D. Huynh¹, Jesper Jansson¹, Wing-Kin Sung¹•Institutions (1)

National University of Singapore¹

01 Feb 2006-Journal of Bioinformatics and Computational Biology

TL;DR: This paper considers the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set C and none of the Root Triplets in another given set F and provides some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.

...read moreread less

Abstract: To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set and none of the rooted triplets in another given set . Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.

...read moreread less

41 citations

Proceedings Article•DOI•

Using indirect protein-protein interactions for protein complex predication.

[...]

Hon Nian Chua¹, Kang Ning, Wing-Kin Sung, Hon Wai Leong, Limsoon Wong - Show less +1 more•Institutions (1)

National University of Singapore¹

01 Jan 2007

TL;DR: The use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms, and the complex finding algorithm performs very well on interaction networks modified in this way.

...read moreread less

Abstract: Protein complexes are fundamental for understanding principles of cellular organizations. Accurate and fast protein complex prediction from the PPI networks of increasing sizes can serve as a guide for biological experiments to discover novel protein complexes. However, protein complex prediction from PPI networks is a hard problem, especially in situations where the PPI network is noisy. We know from previous work that proteins that do not interact, but share interaction partners (level-2 neighbors) often share biological functions. The strength of functional association can be estimated using a topological weight, FS-Weight. Here we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. All direct and indirect interactions are first weighted using topological weight (FS-Weight). Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied on this modified network. We also propose a novel algorithm that searches for cliques in the modified network, and merge cliques to form clusters using a "partial clique merging" method. In this paper, we show that 1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; 2) our complex finding algorithm performs very well on interaction networks modified in this way. Since no any other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.

...read moreread less

40 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
…
17
18
19
20
21
22
23
…
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Fast and accurate short read alignment with Burrows–Wheeler transform

[...]

Heng Li¹, Richard Durbin¹•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jul 2009-Bioinformatics

TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

...read moreread less

Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

...read moreread less

43,862 citations

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

[...]

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹•Institutions (1)

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.

...read moreread less

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

...read moreread less

20,335 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

[...]

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

...read moreread less

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

...read moreread less

13,548 citations

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•DOI•

Model-based Analysis of ChIP-Seq (MACS)

[...]

Yong Zhang¹, Tao Liu¹, Clifford A. Meyer¹, Jérôme Eeckhoute², David S. Johnson, Bradley E. Bernstein¹, Bradley E. Bernstein³, Chad Nusbaum³, Richard M. Myers⁴, Myles Brown², Wei Li⁵, X. Shirley Liu¹ - Show less +8 more•Institutions (5)

Harvard University¹, Brigham and Women's Hospital², Broad Institute³, Stanford University⁴, Baylor College of Medicine⁵

17 Sep 2008-Genome Biology

TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.

...read moreread less

Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.

...read moreread less

13,008 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse