
Showing papers by "Wing-Kin Sung" published in 2015


Journal ArticleDOI
TL;DR: Using DNA paired-end-tag (DNA-PET) whole-genome sequencing, 15 gastric cancers from Southeast Asians were analyzed and recurrent fusions between CLDN18, a tight junction gene, and ARHGAP26, a gene encoding a RHOA inhibitor were found.

114 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used whole-genome sequencing of osteosarcoma (OS) to find features of TP53 intron 1 rearrangements suggesting a unique mechanism correlated with transcription.
Abstract: Somatic mutations of TP53 are among the most common in cancer and germline mutations of TP53 (usually missense) can cause Li-Fraumeni syndrome (LFS). Recently, recurrent genomic rearrangements in intron 1 of TP53 have been described in osteosarcoma (OS), a highly malignant neoplasm of bone belonging to the spectrum of LFS tumors. Using whole-genome sequencing of OS, we found features of TP53 intron 1 rearrangements suggesting a unique mechanism correlated with transcription. Screening of 288 OS and 1,090 tumors of other types revealed evidence for TP53 rearrangements in 46 (16%) OS, while none were detected in other tumor types, indicating this rearrangement to be highly specific to OS. We revisited a four-generation LFS family where no TP53 mutation had been identified and found a 445 kb inversion spanning from the TP53 intron 1 towards the centromere. The inversion segregated with tumors in the LFS family. Cancers in this family had loss of heterozygosity, retaining the rearranged allele and resulting in TP53 expression loss. In conclusion, intron 1 rearrangements cause p53-driven malignancies by both germline and somatic mechanisms and provide an important mechanism of TP53 inactivation in LFS, which might in part explain the diagnostic gap of formerly classified "TP53 wild-type" LFS.

55 citations


Journal ArticleDOI
TL;DR: A new technique for maintaining a dynamic trie T of size at most 2^w nodes under the unit-cost RAM model with a fixed word size w is proposed, based on the idea of partitioning T into a set of linked small tries, each of which can be maintained efficiently.
Abstract: The dynamic trie is a fundamental data structure with applications in many areas of computer science. This paper proposes a new technique for maintaining a dynamic trie T of size at most 2^w nodes under the unit-cost RAM model with a fixed word size w. It is based on the idea of partitioning T into a set of linked small tries, each of which can be maintained efficiently. Our method is not only space-efficient, but also allows the longest common prefix between any query pattern P and the strings currently stored in T to be computed in o(|P|) time for small alphabets, and allows any leaf to be inserted into or deleted from T in o(log|T|) time. To demonstrate the usefulness of our new data structure, we apply it to LZ-compression. Significantly, we obtain the first algorithm for generating the LZ78 encoding of a given string of length n over an alphabet of size σ in sublinear (o(n)) time and sublinear (o(n log σ) bits) working space for small alphabets ( $\sigma= 2^{o(\log n \frac{\log\log\log n}{(\log\log n)^{2}})}$ ). Moreover, the working space for our new algorithm is asymptotically less than or equal to the space for storing the output compressed text, regardless of the alphabet size.
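For readers unfamiliar with LZ78, the following is a minimal sketch of the textbook encoder driven by a trie, which is the kind of dynamic trie workload the abstract refers to. It is not the paper's linked-trie technique and does not achieve its sublinear bounds: the trie here is an ordinary Python dictionary, and the function names `lz78_encode` and `lz78_decode` are illustrative assumptions.

```python
def lz78_encode(text):
    """Return the LZ78 factorization of text as (phrase_index, char) pairs.

    Textbook version (assumption: not the paper's sublinear algorithm).
    The trie is a dict mapping (parent_node, char) -> child_node; node 0 is the root.
    """
    trie = {}
    output = []
    node = 0      # current position in the trie
    next_id = 1   # id given to the next new phrase (trie leaf)
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            # Still inside a known phrase: keep walking down the trie.
            node = child
        else:
            # Longest known prefix ends here: insert a new leaf and emit a pair.
            trie[(node, ch)] = next_id
            next_id += 1
            output.append((node, ch))
            node = 0
    if node != 0:
        # Input ended inside a known phrase; emit it with an empty extension.
        output.append((node, ""))
    return output


def lz78_decode(pairs):
    """Invert lz78_encode by rebuilding the phrase table."""
    phrases = [""]  # phrase 0 is the empty string (the trie root)
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)
```

Each step of the encoder is a longest-prefix search followed by a leaf insertion, which are the two trie operations whose cost the abstract bounds.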

26 citations


Journal ArticleDOI
08 Sep 2015-PLOS ONE
TL;DR: This work improves the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks, and considers that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data.
Abstract: Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of the Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved genome assembly version 4.1 and its annotations using data derived from paired-end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET, etc.). The large-insert (~10 kb, ~17 kb) paired-end DNA-PET with high-throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome “fragmentation”, reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data.

21 citations


Journal ArticleDOI
TL;DR: An O(m log m)-time algorithm is given for finding superbubbles, a complex generalization of bubbles used for analyzing assembly graphs, in a graph with m edges.
Abstract: In genome assembly graphs, motifs such as tips, bubbles, and cross links are studied in order to find sequencing errors and to understand the nature of the genome. Superbubble, a complex generalization of bubbles, was recently proposed as an important subgraph class for analyzing assembly graphs. At present, a quadratic time algorithm is known. This paper gives an O(m log m)-time algorithm to solve this problem for a graph with m edges.
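The abstract does not define a superbubble. As a rough illustration, the sketch below tests one candidate (entrance, exit) pair against the commonly cited conditions, which are assumed here rather than taken from this listing: the exit is reachable from the entrance, the vertices reachable forward from the entrance match those reaching the exit backward, and the enclosed subgraph is acyclic. It does not check the minimality condition, and it is a brute-force check per pair, not the O(m log m) algorithm of the paper. The names `reachable` and `looks_like_superbubble` are hypothetical.

```python
def reachable(adj, start, blocked):
    """Vertices reachable from start along adj edges, never expanding `blocked`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        if u == blocked:
            continue  # may arrive at the blocked vertex, but never pass through it
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def looks_like_superbubble(adj, s, t):
    """Check reachability, matching and acyclicity for the pair (s, t).

    adj maps each vertex to a list of successors. Minimality is NOT checked.
    """
    if s == t:
        return False
    # Build the reverse graph for the backward search from t.
    radj = {}
    for u, successors in adj.items():
        for v in successors:
            radj.setdefault(v, []).append(u)
    forward = reachable(adj, s, blocked=t)
    if t not in forward:
        return False  # reachability fails
    backward = reachable(radj, t, blocked=s)
    if forward != backward:
        return False  # matching fails (e.g. a tip or an extra entry point)
    # Acyclicity of the induced subgraph, via Kahn's topological sort.
    indegree = {u: 0 for u in forward}
    for u in forward:
        for v in adj.get(u, ()):
            if v in forward:
                indegree[v] += 1
    queue = [u for u in forward if indegree[u] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in adj.get(u, ()):
            if v in forward:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)
    return removed == len(forward)
```

Running this check over all vertex pairs would be far slower than the algorithm the abstract describes; it is meant only to make the subgraph class concrete.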

19 citations


Journal ArticleDOI
TL;DR: BatAlign is an algorithm that integrates two strategies called ‘Reverse-Alignment’ and ‘Deep-Scan’ to improve the accuracy of read alignment, and it obtained the highest F-measures in read alignment on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets.
Abstract: Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near/across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies called ‘Reverse-Alignment’ and ‘Deep-Scan’ to improve the accuracy of read alignment. In our experiments, BatAlign was able to obtain the highest F-measures in read alignment on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign were able to recover 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in detecting SVs and other polymorphic variants accurately using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.
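The F-measure used to compare aligners is the harmonic mean of precision and recall. The sketch below shows the standard formula; how the BatAlign evaluation counted correct, incorrect and missed alignments is not stated here, so the three count parameters are assumptions about the input.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, or 0.0 when undefined.

    The counts are hypothetical inputs, e.g. reads aligned correctly,
    reads aligned incorrectly, and reads that should have aligned but did not.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so an aligner cannot score well by maximizing recall alone.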

11 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: In this paper, the authors presented a fast algorithm for finding the Adams consensus tree of a set of conflicting phylogenetic trees with identical leaf labels, for the first time improving the time complexity of a widely used algorithm invented by Adams in 1972.
Abstract: This paper presents a fast algorithm for finding the Adams consensus tree of a set of conflicting phylogenetic trees with identical leaf labels, for the first time improving the time complexity of a widely used algorithm invented by Adams in 1972 [1]. Our algorithm applies the centroid path decomposition technique [9] in a new way to traverse the input trees' centroid paths in unison, and runs in $O(k n \log n)$ time, where k is the number of input trees and n is the size of the leaf label set. (In comparison, the old algorithm from 1972 has a worst-case running time of $O(k n^2)$.) For the special case of k = 2, an even faster algorithm running in $O(n \cdot \frac{\log n}{\log\log n})$ time is provided, which relies on an extension of the wavelet tree-based technique by Bose et al. [6] for orthogonal range counting on a grid. Our extended wavelet tree data structure also supports truncated range maximum queries efficiently and may be of independent interest to algorithm designers.

2 citations


01 Jan 2015
TL;DR: In this paper, the authors proposed an O(m log m)-time algorithm for finding superbubbles in a graph with m edges.
Abstract: In genome assembly graphs, motifs such as tips, bubbles, and cross links are studied in order to find sequencing errors and to understand the nature of the genome. Superbubble, a complex generalization of bubbles, was recently proposed as an important subgraph class for analyzing assembly graphs. At present, a quadratic time algorithm is known. This paper gives an O(m log m)-time algorithm to solve this problem for a graph with m edges.