
Showing papers by Wing-Kin Sung published in 2013


Journal Article
TL;DR: Findings from a whole-genome sequencing study of 88 matched HCC tumor/normal pairs, 81 of which are Hepatitis B virus (HBV) positive, find beta-catenin to be the most frequently mutated oncogene and TP53 the most frequently mutated tumor suppressor.
Abstract: Hepatocellular carcinoma (HCC) is one of the most deadly cancers worldwide and has no effective treatment, yet the molecular basis of hepatocarcinogenesis remains largely unknown. Here we report findings from a whole-genome sequencing (WGS) study of 88 matched HCC tumor/normal pairs, 81 of which are Hepatitis B virus (HBV) positive, seeking to identify genetically altered genes and pathways implicated in HBV-associated HCC. We find beta-catenin to be the most frequently mutated oncogene (15.9%) and TP53 the most frequently mutated tumor suppressor (35.2%). The Wnt/beta-catenin and JAK/STAT pathways, altered in 62.5% and 45.5% of cases, respectively, are likely to act as two major oncogenic drivers in HCC. This study also identifies several prevalent and potentially actionable mutations, including activating mutations of Janus kinase 1 (JAK1) in 9.1% of patients, and provides a path toward therapeutic intervention in the disease.

457 citations


Journal Article
12 Dec 2013-Nature
TL;DR: This study explores the transcriptional interactomes of three mouse cells of progressive lineage commitment and sets the stage for the full-scale dissection of spatial and temporal genome structures and their roles in orchestrating development.
Abstract: A chromatin interaction analysis with paired-end tagging (ChIA-PET) approach is used to delineate chromatin interactions mediated by RNA polymerase II in several different stem-cell populations; putative long-range promoter–enhancer interactions are inferred, indicating that linear juxtaposition does not necessarily guide enhancer target selection and revealing prevalent cell-specific enhancer usage. Gene transcription requires dynamic chromatin connectivity between promoters bound by RNA polymerase II and their corresponding distal-acting enhancers. In this paper the authors use the ChIA-PET (chromatin interaction analysis with paired-end tagging) approach to delineate chromatin interactions mediated by RNA polymerase II in embryonic stem cells, neural stem cells and neurosphere progenitor cells. Putative enhancer–promoter interactions can be inferred, and many enhancers associate with promoters located beyond their nearest active genes, indicating that linear juxtaposition does not necessarily guide enhancer target selection. This work illustrates the possible importance of underlying chromatin structures in nuclear function. In multicellular organisms, transcription regulation is one of the central mechanisms modelling lineage differentiation and cell-fate determination [1]. Transcription requires dynamic chromatin configurations between promoters and their corresponding distal regulatory elements [2]. It is believed that their communication occurs within large discrete foci of aggregated RNA polymerases termed transcription factories in three-dimensional nuclear space [3]. However, the dynamic nature of chromatin connectivity has not been characterized at the genome-wide level.
Here, through a chromatin interaction analysis with paired-end tagging approach [3,4,5] using an antibody that primarily recognizes the pre-initiation complexes of RNA polymerase II [6], we explore the transcriptional interactomes of three mouse cells of progressive lineage commitment, including pluripotent embryonic stem cells [7], neural stem cells [8] and neurosphere stem/progenitor cells [9]. Our global chromatin connectivity maps reveal approximately 40,000 long-range interactions, suggest precise enhancer–promoter associations and delineate cell-type-specific chromatin structures. Analysis of the complex regulatory repertoire shows that there are extensive colocalizations among promoters and distal-acting enhancers. Most of the enhancers associate with promoters located beyond their nearest active genes, indicating that the linear juxtaposition is not the only guiding principle driving enhancer target selection. Although promoter–enhancer interactions exhibit high cell-type specificity, promoters involved in interactions are found to be generally common and mostly active among different cells. Chromatin connectivity networks reveal that the pivotal genes of reprogramming functions are transcribed within physical proximity to each other in embryonic stem cells, linking chromatin architecture to coordinated gene expression. Our study sets the stage for the full-scale dissection of spatial and temporal genome structures and their roles in orchestrating development.

446 citations


Journal Article
TL;DR: A “molecular signature” of AA-induced DNA damage is presented, which helps to explain the mutagenic effects of AA and may also be useful as a way to detect unsuspected AA exposure as a cause of cancer.
Abstract: Aristolochic acid (AA), a natural product of Aristolochia plants found in herbal remedies and health supplements, is a group 1 carcinogen that can cause nephrotoxicity and upper urinary tract urothelial cell carcinoma (UTUC). Whole-genome and exome analysis of nine AA-associated UTUCs revealed a strikingly high somatic mutation rate (150 mutations/Mb), exceeding smoking-associated lung cancer (8 mutations/Mb) and ultraviolet radiation-associated melanoma (111 mutations/Mb). The AA-UTUC mutational signature was characterized by A:T to T:A transversions at the sequence motif A[C|T]AGG, located primarily on nontranscribed strands. AA-induced mutations were also significantly enriched at splice sites, suggesting a role for splice-site mutations in UTUC pathogenesis. RNA sequencing of AA-UTUC confirmed a general up-regulation of nonsense-mediated decay machinery components and aberrant splicing events associated with splice-site mutations. We observed a high frequency of somatic mutations in chromatin modifiers, particularly KDM6A, in AA-UTUC, demonstrated the sufficiency of AA to induce renal dysplasia in mice, and reproduced the AA mutational signature in experimentally treated human renal tubular cells. Finally, exploring other malignancies that were not known to be associated with AA, we screened 93 hepatocellular carcinoma genomes/exomes and identified AA-like mutational signatures in 11. Our study highlights an unusual genome-wide AA mutational signature and the potential use of mutation signatures as "molecular fingerprints" for interrogating high-throughput cancer genome data to infer previous carcinogen exposures.

242 citations


Journal Article
TL;DR: A deterministic method called ChromSDE is presented, which applies semi-definite programming techniques to find the best structure fitting the observed data and uses golden section search to find the correct parameter for converting the contact frequency to spatial distance; the authors also show, both theoretically and empirically, that this conversion parameter changes with resolution.
Abstract: For a long period of time, scientists studied genomes while assuming they are linear. Recently, chromosome conformation capture (3C)-based technologies, such as Hi-C, have been developed that provide contact frequencies between pairs of loci on a genome-wide scale. The technology revealed that two far-apart loci can interact in the tested genome, indicating that the tested genome forms a three-dimensional (3D) chromosomal structure within the nucleus. With the available Hi-C data, our next challenge is to model the 3D chromosomal structure from the 3C-derived data computationally. This article presents a deterministic method called ChromSDE, which applies semi-definite programming techniques to find the best structure fitting the observed data and uses golden section search to find the correct parameter for converting the contact frequency to spatial distance. Further, we develop a measure called consensus index to indicate if the Hi-C data corresponds to a single structure or a mixture of ...

118 citations


Book Chapter
07 Apr 2013
TL;DR: A deterministic method called ChromSDE is presented, which applies semi-definite programming techniques to find the best structure fitting the observed data and uses golden section search to find the correct parameter for converting the contact frequency to spatial distance; it is shown to be much more accurate and robust than existing methods.
Abstract: For a long period of time, scientists studied genomes assuming they are linear. Recently, chromosome conformation capture (3C)-based technologies, such as Hi-C, have been developed that provide contact frequencies between pairs of loci on a genome-wide scale. The technology revealed that two far-apart loci can interact in the tested genome, indicating that the tested genome forms a 3D chromosomal structure within the nucleus. With the available Hi-C data, our next challenge is to model the 3D chromosomal structure from the 3C-derived data computationally. This paper presents a deterministic method called ChromSDE, which applies semi-definite programming techniques to find the best structure fitting the observed data and uses golden section search to find the correct parameter for converting the contact frequency to spatial distance. To the best of our knowledge, ChromSDE is the only method that can guarantee recovering the correct structure in the noise-free case. In addition, we show, both theoretically and empirically, that the parameter for converting contact frequency to spatial distance changes with resolution. Using simulation data and real Hi-C data, we showed that ChromSDE is much more accurate and robust than existing methods. Finally, we demonstrated that interesting biological findings can be uncovered from our predicted 3D structure.
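The golden section search step can be illustrated with a generic one-dimensional minimizer. This is a minimal sketch and not ChromSDE's implementation. It assumes that the objective minimized over the conversion parameter is unimodal on the search interval. In ChromSDE that objective would measure how well the recovered structure fits the data at a given parameter value. Here a toy quadratic stands in for it, and the function name is hypothetical.

```python
import math
from typing import Callable

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_search(f: Callable[[float], float], lo: float, hi: float,
                          tol: float = 1e-6) -> float:
    """Approximate minimizer of f on [lo, hi], assuming f is unimodal there."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # left interior point
    d = a + INV_PHI * (b - a)  # right interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


# Toy stand-in objective with its minimum at 1.5.
best = golden_section_search(lambda alpha: (alpha - 1.5) ** 2, 0.1, 5.0)
print(round(best, 3))  # 1.5
```

Each iteration reuses one of the two interior evaluations, so only one new objective evaluation is needed per step. This matters when every evaluation is expensive, as it presumably is when each one requires fitting a full structure.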

40 citations


Journal Article
TL;DR: A pathogen chip system (PathChip), developed at the Genome Institute of Singapore (GIS), using a random-tagged PCR coupled to a chip with over 170,000 probes, has the potential to recognize all known human viral pathogens.
Abstract: Determining the viral etiology of respiratory tract infections (RTI) has been limited for the most part to specific primer PCR-based methods due to their increased sensitivity and specificity compared to other methods, such as tissue culture. However, specific primer approaches have limited the ability to fully understand the diversity of infecting pathogens. A pathogen chip system (PathChip), developed at the Genome Institute of Singapore (GIS), using a random-tagged PCR coupled to a chip with over 170,000 probes, has the potential to recognize all known human viral pathogens. We tested 290 nasal wash specimens from Filipino children <2 years of age with respiratory tract infections using culture and 3 PCR methods: EraGen, Luminex, and the GIS PathChip. The PathChip had good diagnostic accuracy, ranging from 85.9% (95% confidence interval [CI], 81.3 to 89.7%) for rhinovirus/enteroviruses to 98.6% (95% CI, 96.5 to 99.6%) for PIV 2, compared to the other methods, and additionally identified a number of viruses not detected by these methods.

25 citations


Proceedings Article
06 Jan 2013
TL;DR: New deterministic algorithms for constructing consensus trees that are faster than all the previously known ones are presented.
Abstract: A consensus tree is a single phylogenetic tree that summarizes the branching structure in a given set of conflicting phylogenetic trees. Many different types of consensus trees have been proposed in the literature; three of the most well-known and widely used ones are the majority rule consensus tree, the loose consensus tree, and the greedy consensus tree. This paper presents new deterministic algorithms for constructing them that are faster than all the previously known ones. Given k phylogenetic trees with n leaves each and with identical leaf label sets, our algorithms run in O(nk log k) time (majority rule consensus tree), O(nk) time (loose consensus tree), and O(n^2 k) time (greedy consensus tree).
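A naive sketch makes the majority rule definition concrete. It collects the clusters of each tree (the leaf set below each internal node), counts them across the input trees, and keeps those that appear in strictly more than half of the trees. This is not the paper's O(nk log k) algorithm. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Union

# Hypothetical encoding: a rooted tree is a nested tuple whose leaves are strings.
Tree = Union[str, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal node of one tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def majority_rule_clusters(trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clusters present in strictly more than half of the input trees."""
    counts = Counter(c for t in trees for c in clusters(t))
    return {c for c, n in counts.items() if 2 * n > len(trees)}


trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "c"), ("b", "d")),
]
print(sorted("".join(sorted(c)) for c in majority_rule_clusters(trees)))
# ['ab', 'abcd']
```

Any two majority clusters must co-occur in at least one input tree, so they are pairwise compatible and always define a single consensus tree.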

12 citations


Journal Article
TL;DR: A de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation) is presented, which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user.
Abstract: Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information usually has to be supplied by the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipi...
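The core EM idea behind this kind of motif learner can be shown with the simplest possible model. This is a MEME-style sketch, not SEME itself. It assumes a uniform background, exactly one motif site per sequence, and a position weight matrix (PWM) initialised from a seed k-mer. SEME's position and sequence-rank preferences, variable motif length extension and importance sampling are all omitted. The function names are hypothetical.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background probability per base


def em_motif(seqs: List[str], seed: str, iterations: int = 20,
             pseudo: float = 0.1) -> List[Dict[str, float]]:
    """Estimate a PWM of width len(seed) by EM, one motif site per sequence."""
    width = len(seed)
    # Initial PWM favours the seed letter in every column.
    pwm = [{b: (0.7 if b == s else 0.1) for b in ALPHABET} for s in seed]
    for _ in range(iterations):
        counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
        for seq in seqs:
            # E-step: likelihood ratio (motif vs. background) of a site at each start.
            ratios = [math.prod(pwm[j][seq[i + j]] / BACKGROUND for j in range(width))
                      for i in range(len(seq) - width + 1)]
            total = sum(ratios)
            for i, r in enumerate(ratios):
                weight = r / total  # posterior probability that the site starts at i
                for j in range(width):
                    counts[j][seq[i + j]] += weight
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]
    return pwm


def consensus(pwm: List[Dict[str, float]]) -> str:
    """Most probable base in each PWM column."""
    return "".join(max(col, key=col.get) for col in pwm)


seqs = ["GGGGTTGACAGGGG", "CCCCTTGACACCCC", "ATATTTGACAATAT"]
print(consensus(em_motif(seqs, seed="TTGCCA")))  # TTGACA
```

Starting from a seed with one mismatch, the posterior weight concentrates on the planted sites. The learned PWM then converges to the planted motif.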

12 citations


Book Chapter
07 Apr 2013
TL;DR: A deterministic algorithm for building the majority rule consensus tree of an input collection of conflicting phylogenetic trees with identical leaf labels is presented and it is shown that the algorithm is fast in practice.
Abstract: A deterministic algorithm for building the majority rule consensus tree of an input collection of conflicting phylogenetic trees with identical leaf labels is presented. Its worst-case running time is O(nk), where n is the size of the leaf label set and k is the number of input phylogenetic trees. This is optimal since the input size is Ω(nk). Experimental results show that the algorithm is fast in practice.

9 citations


Journal Article
TL;DR: A new $O(n^{2} \sqrt{\log n})$-time algorithm is described to solve the problem of computing the R* consensus tree of two given (rooted) phylogenetic trees with a leaf label set of cardinality n.
Abstract: The previously fastest algorithms for computing the R* consensus tree of two given (rooted) phylogenetic trees with a leaf label set of cardinality n run in $\Theta(n^{3})$ time (Bryant and Berry in Adv. Appl. Math. 27(4):705–732, 2001; Kannan et al. in SIAM J. Comput. 27(6):1695–1724, 1998). In this manuscript, we describe a new $O(n^{2} \sqrt{\log n})$-time algorithm to solve the problem. This is a significant improvement because the R* consensus tree is defined in terms of a set $\mathcal{R}_{\mathit{maj}}$ which may contain $\Omega(n^{3})$ elements, so any direct approach that explicitly constructs $\mathcal{R}_{\mathit{maj}}$ requires $\Omega(n^{3})$ time.
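The cubic cost of the direct approach shows up clearly in a naive construction of $\mathcal{R}_{\mathit{maj}}$. The sketch below enumerates all leaf triples and keeps a rooted triplet xy|z when more input trees display it than display either alternative resolution of the same three leaves. That is the usual definition as we understand it, and it is an assumption here. The nested-tuple encoding and the function names are hypothetical. This is the Θ(n^3)-style approach the paper avoids, not its algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Optional, Set, Tuple, Union

Tree = Union[str, tuple]  # hypothetical encoding: nested tuples, string leaves
Triplet = Tuple[FrozenSet[str], str]  # ({x, y}, z) encodes the rooted triplet xy|z


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal node of one tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def triplet(cl: Set[FrozenSet[str]], a: str, b: str, c: str) -> Optional[Triplet]:
    """Resolution of {a, b, c} in a tree given by its clusters, or None for a fan."""
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        # xy|z holds iff some cluster contains x and y but not z.
        if any(x in k and y in k and z not in k for k in cl):
            return frozenset((x, y)), z
    return None


def r_maj(trees: List[Tree]) -> Set[Triplet]:
    """Triplets that occur more often than each alternative resolution."""
    cls = [clusters(t) for t in trees]
    leaves = sorted(max(cls[0], key=len))  # the root cluster holds every leaf
    result: Set[Triplet] = set()
    for a, b, c in combinations(leaves, 3):  # Theta(n^3) triples
        votes: Counter = Counter()
        for cl in cls:
            t = triplet(cl, a, b, c)
            if t is not None:
                votes[t] += 1
        for t, n in votes.items():
            if all(n > m for u, m in votes.items() if u != t):
                result.add(t)
    return result


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
print(sorted("".join(sorted(pair)) + "|" + out for pair, out in r_maj([t1, t2])))
# ['ab|c', 'ab|d']
```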

9 citations


Posted Content
TL;DR: In this paper, the authors present two deterministic algorithms for constructing consensus trees; the first constructs the majority rule consensus tree in O(kn) time, which is optimal since the input size is Ω(kn).
Abstract: This paper presents two new deterministic algorithms for constructing consensus trees. Given an input of k phylogenetic trees with identical leaf label sets and n leaves each, the first algorithm constructs the majority rule (+) consensus tree in O(kn) time, which is optimal since the input size is Ω(kn), and the second one constructs the frequency difference consensus tree in min(O(kn^2), O(kn (k+log^2 n))) time.
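The frequency difference consensus tree keeps a cluster when that cluster occurs in more input trees than every cluster incompatible with it. Two clusters are compatible when one contains the other or they are disjoint. The naive sketch below applies that definition directly, as we understand it. It is far slower than the paper's bounds, and the tree encoding and function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Union

Tree = Union[str, tuple]  # hypothetical encoding: nested tuples, string leaves


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal node of one tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def compatible(x: FrozenSet[str], y: FrozenSet[str]) -> bool:
    """Nested or disjoint clusters can coexist in one tree."""
    return x <= y or y <= x or not (x & y)


def frequency_difference_clusters(trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clusters more frequent than every cluster they conflict with."""
    counts = Counter(c for t in trees for c in clusters(t))
    return {c for c, n in counts.items()
            if all(n > m for d, m in counts.items() if not compatible(c, d))}


trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "c"), ("b", "d")),
    ((("a", "d"), "b"), "c"),
]
print(sorted("".join(sorted(c)) for c in frequency_difference_clusters(trees)))
# ['ab', 'abcd', 'abd']
```

For these four trees, no non-trivial cluster occurs in more than two of them, so the majority rule tree would be a star. The frequency difference rule additionally keeps {a, b} and {a, b, d}.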

Journal Article
TL;DR: Using the compressed DAWG proposed in this paper, the local alignment problem can be solved in O(nm) worst-case time and the same average-case time.
Abstract: Suffix tree, suffix array, and directed acyclic word graph (DAWG) are data structures for indexing a text. Although they enable efficient pattern matching, they require O(n log n) bits, which makes them impractical for indexing long texts such as the human genome. Recently, the development of compressed data structures has allowed us to simulate the suffix tree and suffix array using O(n) bits. However, there is still no O(n)-bit data structure for the DAWG with full functionality. This work introduces an $n(H_{k}(\overline{S})+ 2 H_{0}^{*}(\mathcal{T}_{\overline{S}}))+o(n)$-bit compressed data structure for simulating the DAWG (where $H_{k}(\overline{S})$ and $H_{0}^{*}(\mathcal{T}_{\overline{S}})$ are the empirical entropies of the reversed sequence and the reversed suffix tree topology, respectively). We also propose an application of the DAWG to improve the time complexity of the local alignment problem. In this application, the previously proposed solutions using BWT (a version of compressed suffix array) run in $O(n^{2} m)$ worst-case time and $O(n^{0.628} m)$ average-case time, where n and m are the lengths of the database and the query, respectively. Using the compressed DAWG proposed in this paper, the problem can be solved in O(nm) worst-case time and the same average-case time.
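For readers unfamiliar with the DAWG, the sketch below builds an ordinary, uncompressed DAWG (also called a suffix automaton) using the classic online construction. It stores transitions in Python dicts. That is exactly the pointer-based kind of representation the paper sets out to compress, so the sketch shows which queries a DAWG answers rather than the paper's compressed encoding. The class and method names are hypothetical.

```python
from typing import Dict, List


class SuffixAutomaton:
    """Uncompressed DAWG of all substrings of a text (online construction)."""

    def __init__(self, text: str) -> None:
        self.next: List[Dict[str, int]] = [{}]  # outgoing edges per state
        self.link: List[int] = [-1]             # suffix links
        self.length: List[int] = [0]            # longest string reaching each state
        last = 0
        for ch in text:
            cur = self._new_state(self.length[last] + 1)
            p = last
            # Add an edge on ch from every suffix state that lacks one.
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone takes over the shorter strings q represented.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.next) - 1

    def contains(self, pattern: str) -> bool:
        """True if pattern occurs as a substring, in O(len(pattern)) steps."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True

    def count_distinct_substrings(self) -> int:
        # Each non-initial state v accounts for length[v] - length[link[v]] substrings.
        return sum(self.length[v] - self.length[self.link[v]]
                   for v in range(1, len(self.length)))


sa = SuffixAutomaton("abab")
print(sa.contains("aba"), sa.contains("bb"), sa.count_distinct_substrings())
# True False 7
```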

Book Chapter
TL;DR: Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains that have 3D structures available in the PDB database.
Abstract: Many important biological processes, such as the signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed, and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch of 3-10 amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interactions between domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the interacting proteins' sequence data. With the recent increase in protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains that have 3D structures available in the PDB database.


Book ChapterDOI
02 Sep 2013
TL;DR: Two new deterministic algorithms for constructing consensus trees from k phylogenetic trees with identical leaf label sets and n leaves each are presented: the first constructs the majority rule (+) consensus tree in O(kn) time, which is optimal since the input size is Ω(kn), and the second constructs the frequency difference consensus tree in min{O(kn^2), O(kn(k + log^2 n))} time.
Abstract: This paper presents two new deterministic algorithms for constructing consensus trees. Given an input of k phylogenetic trees with identical leaf label sets and n leaves each, the first algorithm constructs the majority rule (+) consensus tree in O(kn) time, which is optimal since the input size is Ω(kn), and the second one constructs the frequency difference consensus tree in min{O(kn^2), O(kn(k + log^2 n))} time.

Journal Article
TL;DR: The "International Conference on Genome Informatics", popularly known as "GIW", is probably one of the oldest, if not the oldest, annual regular conferences in computational biology, and it has survived every turn in the tempestuous development of this field of research.
Abstract: The "International Conference on Genome Informatics", popularly known as "GIW", is probably one of the oldest, if not the oldest, annual regular conferences in computational biology, and it has survived every turn in the tempestuous development of this field of research [1]. It is impossible to overestimate its role in establishing and enhancing the computational biology and bioinformatics research community in the Asia-Pacific region and its interaction with the worldwide research effort. Importantly, it has provided a friendly forum where scientists, especially from the region, could exchange and publish their research findings. It has accompanied and furthered the growth of computational biology and bioinformatics research in both quantity and quality in the Asia-Pacific region. The GIW was first held as an open workshop ("Genome Informatics Workshop", thus, GIW) at Kikai Shinko Kaikan in Tokyo during December 3-4, 1990, just before the Japanese Human Genome Project started the following year. Whereas GIW was originally an intra-Japanese affair, it changed to an international conference in 1993, and the currently used name of the conference was adopted in 2001. During the last ~15 years, the conference has always been attended by several hundred participants; thus, it is not really a "workshop" any longer. Whereas GIW had more the role of a regional incubator in the early years, it has recently become one of the important, truly international conference venues in the bioinformatics field for scientific exchange. It provides unique opportunities to bridge theory and experiment, academia and industry, science from the East and the West. The conference was held exclusively in Tokyo or Yokohama until 2006 (as well as in 2009). GIW 2007 (the 18th edition) was the first one held outside Japan, in the Biopolis in Singapore.
Other locations in the Asia-Pacific region followed: the 19th GIW at the Gold Coast in Australia (2008), the 21st GIW in Hangzhou (China) in 2010, the 22nd GIW in Busan (South Korea) in 2011 and the 23rd GIW in Tainan (Taiwan). Remarkably, the 24th GIW has been awarded to Singapore again [2] and, notably, is held in the same premises as the conference in 2007, namely in the Matrix Building of Biopolis. Singaporean bioinformaticians might tend to see this as recognition of their research efforts over the last years. However, Singapore's geographically central location in the Asia-Pacific region, its excellent transport hub and its infrastructural support offer an alternative, equally important explanation. All events happen only if an activist champions them. The Singaporean community is grateful to Limsoon Wong for his lobbying effort to attract important conferences here. Given the maturity of the research area and today's scientific fashions, efforts that classify as systems biology occupy a prominent place in GIW 2013. In total, eighteen submissions have qualified for this special issue of BMC Systems Biology. The systems biology approach aims at a holistic perspective: it seeks to explain and predict phenotypic properties that are influenced by a multitude of factors, using complex theoretical and, desirably, quantitative models. Given the absence of a consistent, predictive biological theory of the kind physicists have been used to for many decades, some might consider the quest for an integrated, systems approach grandiloquent and premature. There are serious arguments for this view. For example, about 50% of all eukaryote genes lack even tentative functional characterizations and, most likely, not even half of the biomolecular mechanisms are known [3]. Despite full genome sequencing, even a stable reference proteome cannot be deduced [4]. Thus, quantitative and predictive biology has a long way to go.
Nevertheless, the large-scale experimental techniques, most prominently nucleic acid sequencing but also epigenetics analyses, large-scale expression studies, proteomics with the large sets of protein-protein interaction data, the ever growing library of biomacromolecular structures and automated methods for analyzing cellular and tissue images [5] open new opportunities and, for carefully selected questions, interesting and important insights can be deduced from this data at the systems level that can even reach out into biomedical and biotechnological applications. The papers collected in this special edition exemplify how far research has moved forward.