scispace - formally typeset

Showing papers by "Wing-Kin Sung" published in 2019


Journal Article
TL;DR: This study shows that immune-related genes are primed for transcription by proximal lncRNAs, and the insertion of UMLILO into the chemokine topologically associating domain in mouse macrophages resulted in training of Cxcl genes, providing strong evidence that lncRNA-mediated regulation is central to the establishment of trained immunity.
Abstract: Accumulation of trimethylation of histone H3 at lysine 4 (H3K4me3) on immune-related gene promoters underlies robust transcription during trained immunity. However, the molecular basis for this remains unknown. Here we show three-dimensional chromatin topology enables immune genes to engage in chromosomal contacts with a subset of long noncoding RNAs (lncRNAs) we have defined as immune gene–priming lncRNAs (IPLs). We show that the prototypical IPL, UMLILO, acts in cis to direct the WD repeat-containing protein 5 (WDR5)–mixed lineage leukemia protein 1 (MLL1) complex across the chemokine promoters, facilitating their H3K4me3 epigenetic priming. This mechanism is shared amongst several trained immune genes. Training mediated by β-glucan epigenetically reprograms immune genes by upregulating IPLs in a manner dependent on nuclear factor of activated T cells. The murine chemokine topologically associating domain lacks an IPL, and the Cxcl genes are not trained. Strikingly, the insertion of UMLILO into the chemokine topologically associating domain in mouse macrophages resulted in training of Cxcl genes. This provides strong evidence that lncRNA-mediated regulation is central to the establishment of trained immunity. This study shows that immune-related genes are primed for transcription by proximal lncRNAs. One such lncRNA, UMLILO, directs the WDR5–MLL1 complex to CXCL chemokine promoters, facilitating H3K4me3 deposition.

167 citations


Journal Article
TL;DR: The results reveal hierarchical and modular 3D genome architecture for transcriptional regulation in rice and reveal spatial correlation between the genetic regulation of eQTLs and e-traits.
Abstract: Insight into high-resolution three-dimensional genome organization and its effect on transcription remains largely elusive in plants. Here, using a long-read ChIA-PET approach, we map H3K4me3- and RNA polymerase II (RNAPII)-associated promoter–promoter interactions and H3K9me2-marked heterochromatin interactions at nucleotide/gene resolution in rice. The chromatin architecture is separated into independent spatial interacting modules with distinct transcriptional potential and covers approximately 82% of the genome. Compared with inactive modules, active modules contain the majority of active loop genes at higher density and contribute most of the transcriptional activity in rice. In addition, promoter–promoter interacting genes tend to be transcribed cooperatively. In contrast, the heterochromatin-mediated loops form relatively stable structural domains in the chromatin configuration. Furthermore, we examine the impact of genetic variation on chromatin interactions and transcription and identify a spatial correlation between the genetic regulation of eQTLs and e-traits. Thus, our results reveal hierarchical and modular 3D genome architecture for transcriptional regulation in rice. Three-dimensional genome organization and its effect on transcription remain elusive in rice. Here, the authors map promoter–promoter interactions and heterochromatin interactions using ChIA-PET and reveal spatial correlation between the genetic regulation of eQTLs and e-traits.

66 citations
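The abstract does not say how the spatial interacting modules are delimited. As a purely illustrative assumption, one could treat each module as a connected component of the loop graph, with loop anchors as nodes and ChIA-PET loops as edges. The sketch below groups loops that way using a union-find; the function name `interaction_modules` and the input format are hypothetical, and the paper's actual module definition may differ.

```python
from collections import defaultdict


def interaction_modules(loops):
    """Group loop anchors into connected components (hypothetical 'modules').

    loops: iterable of (anchor_a, anchor_b) pairs; an anchor is any hashable
    ID, e.g. a gene name or a (chrom, start, end) tuple.
    """
    parent = {}

    def find(x):
        # Path-halving find; unseen anchors become their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in loops:
        union(a, b)

    groups = defaultdict(set)
    for anchor in list(parent):
        groups[find(anchor)].add(anchor)
    # Largest module first; ties ordered by member names for determinism.
    return sorted(groups.values(), key=lambda s: (-len(s), sorted(map(str, s))))


print(interaction_modules([("g1", "g2"), ("g2", "g3"), ("g4", "g5")]))
```

Union-find keeps the grouping near-linear in the number of loops, which matters at genome scale; a graph library's connected-components routine would give the same partition.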


Posted Content
20 Dec 2019-bioRxiv
TL;DR: HyPo, a Hybrid Polisher, utilises short as well as long reads within a single run to polish a long-read assembly of small and large genomes, and exploits unique genomic k-mers to selectively polish segments of contigs using partial order alignment of selected read segments.
Abstract: Efforts towards making population-scale long-read genome assemblies (especially of human genomes) viable have intensified recently with the emergence of many fast assemblers. Because these fast assemblers rely on polishing for assembly accuracy, polishing has become crucial. We present HyPo, a Hybrid Polisher, which utilises short as well as long reads within a single run to polish a long-read assembly of small and large genomes. It exploits unique genomic k-mers to selectively polish segments of contigs using partial order alignment of selected read segments. As demonstrated on human genome assemblies, HyPo generates significantly more accurate polished assemblies in about one-third of the time and with about half the memory required by Racon, the polisher in widespread use at the time.

62 citations
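A minimal sketch of the k-mer selection idea, assuming that "unique genomic k-mers" can be approximated as k-mers whose short-read count falls within a chosen range `[lo, hi]` (the thresholds and function names are hypothetical). It only marks contig stretches not covered by any such "solid" k-mer as candidates for polishing; it ignores reverse complements and performs no partial order alignment, so it is not HyPo's implementation.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across the short reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_segments(contig, read_kmers, k, lo, hi):
    """Return half-open [start, end) intervals of the contig that are not
    covered by any 'solid' k-mer (short-read count within [lo, hi])."""
    covered = [False] * len(contig)
    for i in range(len(contig) - k + 1):
        if lo <= read_kmers.get(contig[i:i + k], 0) <= hi:
            for j in range(i, i + k):
                covered[j] = True
    segments, start = [], None
    for pos, ok in enumerate(covered):
        if not ok and start is None:
            start = pos                    # an uncovered run begins
        elif ok and start is not None:
            segments.append((start, pos))  # the uncovered run ends
            start = None
    if start is not None:
        segments.append((start, len(contig)))
    return segments


kmers = count_kmers(["ACGTAC"] * 3, 3)
print(weak_segments("ACGTACTTTT", kmers, 3, 2, 5))  # candidate segment: (6, 10)
```

Restricting expensive alignment work to the uncovered stretches is what makes a selective strategy cheaper than re-polishing every base.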


Journal Article
TL;DR: A novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET).
Abstract: The aberrant activities of transcription factors such as the androgen receptor (AR) underpin prostate cancer development. While AR cis-regulation has been extensively studied in prostate cancer, information on the spatial architecture of the AR transcriptional circuitry remains limited. In this paper, we propose a novel framework to profile long-range chromatin interactions associated with AR and its collaborative transcription factor, erythroblast transformation-specific related gene (ERG), using chromatin interaction analysis by paired-end tag (ChIA-PET). We identified ERG-associated long-range chromatin interactions as a cooperative component of the AR-associated chromatin interactome, acting in concert to achieve coordinated regulation of a subset of AR target genes. Through multifaceted functional data analysis, we found that AR-ERG interaction hub regions are characterized by distinct functional signatures, including bidirectional transcription and co-transcription factor binding. In addition, cancer-associated long noncoding RNAs were found to be connected near protein-coding genes through AR-ERG looping. Finally, we found strong enrichment of prostate cancer genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) at AR-ERG co-binding sites participating in chromatin interactions and gene regulation, suggesting that GWAS target genes identified from chromatin looping data are more biologically relevant than those assigned by the nearest-gene approach. Taken together, our results reveal an AR-ERG-centric higher-order chromatin structure that drives coordinated gene expression in prostate cancer progression, and identify potential target genes for therapeutic intervention.

42 citations
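To make the contrast between the nearest-gene approach and loop-based target assignment concrete, here is a hedged sketch for a single chromosome. Genes are represented by one TSS coordinate each and loops by pairs of half-open anchor intervals; the gene names, coordinates, and function names are all hypothetical.

```python
def nearest_gene(snp_pos, genes):
    """Nearest-gene rule. genes: dict name -> TSS position."""
    return min(genes, key=lambda g: abs(genes[g] - snp_pos))


def loop_target_genes(snp_pos, loops, genes):
    """Genes whose TSS lies in an anchor looped to an anchor containing the SNP.

    loops: list of ((s1, e1), (s2, e2)) half-open anchor intervals.
    """
    targets = set()
    for a, b in loops:
        # A loop is undirected, so check both anchor orientations.
        for here, there in ((a, b), (b, a)):
            if here[0] <= snp_pos < here[1]:
                for g, tss in genes.items():
                    if there[0] <= tss < there[1]:
                        targets.add(g)
    return targets


genes = {"GENE_A": 1_000, "GENE_B": 50_000}
loops = [((1_200, 2_000), (49_500, 50_500))]
print(nearest_gene(1_500, genes), loop_target_genes(1_500, loops, genes))
```

With a SNP at 1,500 the nearest-gene rule picks `GENE_A`, while the loop linking the SNP's anchor to the anchor around 50,000 points to `GENE_B`: the kind of discrepancy the abstract argues favours looping data.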


Journal Article
TL;DR: The algorithm BatMeth2 is developed, which can align BS reads with high accuracy while allowing for variable-length indels with respect to the reference genome and improves DNA methylation calling, particularly for regions close to indels.
Abstract: DNA methylation plays crucial roles in most eukaryotic organisms. Bisulfite sequencing (BS-Seq) is a sequencing approach that provides quantitative cytosine methylation levels genome-wide at single-base resolution. However, genomic variations such as insertions and deletions (indels) affect methylation calling, and the alignment of reads near or across indels becomes inaccurate in the presence of polymorphisms. Hence, the simultaneous detection of DNA methylation and indels is important for exploring the mechanisms of functional regulation in organisms. These problems motivated us to develop the algorithm BatMeth2, which can align BS reads with high accuracy while allowing for variable-length indels with respect to the reference genome. The results from simulated and real bisulfite DNA methylation data demonstrated that our proposed method increases alignment accuracy. Additionally, BatMeth2 can calculate the methylation levels of individual loci, genomic regions or functional regions such as genes/transposable elements. Additional programs were also developed to provide methylation data annotation, visualization, and differentially methylated cytosine/region (DMC/DMR) detection. The whole package provides new tools and will benefit bisulfite data analysis. BatMeth2 improves DNA methylation calling, particularly for regions close to indels. It is an autorun package and easy to use. In addition, a DNA methylation visualization program and a differential analysis program are provided in BatMeth2. We believe that BatMeth2 will facilitate the study of the mechanisms of DNA methylation in development and disease. BatMeth2 is an open-source software program and is available on GitHub (https://github.com/GuoliangLi-HZAU/BatMeth2/).

37 citations
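In bisulfite sequencing, unmethylated cytosines are read as T while methylated cytosines remain C, so a site's methylation level is commonly estimated as C reads divided by C plus T reads. The sketch below assumes that convention and pools counts across a region (a coverage-weighted level); it is not necessarily the exact formula BatMeth2 uses, and the input layout is hypothetical.

```python
def site_level(methylated, unmethylated):
    """Methylation level of one cytosine: C reads / (C + T reads).

    Returns None when the site has no coverage.
    """
    total = methylated + unmethylated
    return methylated / total if total else None


def region_level(sites, start, end):
    """Pooled (coverage-weighted) level over cytosines in [start, end).

    sites: dict position -> (C read count, T read count).
    """
    m = u = 0
    for pos, (c, t) in sites.items():
        if start <= pos < end:
            m += c
            u += t
    return site_level(m, u)


sites = {10: (3, 1), 20: (0, 4), 30: (5, 5)}
print(site_level(3, 1), region_level(sites, 0, 25))
```

Pooling read counts weights well-covered sites more heavily; averaging per-site levels instead would treat every cytosine equally, and the two can differ noticeably in low-coverage regions.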


Journal Article
TL;DR: Transcriptome sequencing of HCC patients reveals key cancer molecules and clinically relevant pathways deregulated/mutated in HCC patients and suggests that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts likely occurs during tumorigenesis.
Abstract: Hepatocellular carcinoma (HCC) is the second deadliest cancer, with late presentation and limited treatment options, highlighting an urgent need to better understand HCC in order to identify early-stage biomarkers and uncover therapeutic targets for the development of novel therapies. Deep transcriptome sequencing of tumor and paired non-tumor liver tissues was performed to comprehensively evaluate the profiles of both host and hepatitis B virus (HBV) transcripts in HCC patients. Differential gene expression patterns and the dysregulated genes associated with clinical outcomes were analyzed. Somatic mutations were identified from the sequencing data and the deleterious mutations were predicted. Lastly, human-HBV chimeric transcripts were identified, and their distribution, potential function and expression association were analyzed. Expression profiling identified the significantly upregulated TP73 as a nodal molecule modulating the expression of apoptotic genes. Approximately 2.5% of dysregulated genes significantly correlated with HCC clinical characteristics. Of the 110 identified genes, those involved in post-translational modification, cell division and/or transcriptional regulation were upregulated, while those involved in redox reactions were downregulated in tumors of patients with poor prognosis. Mutation signature analysis showed that somatic mutations in HCC tumors were mainly non-synonymous, frequently affecting genes in the microenvironment and cancer pathways. Recurrent mutations occurred mainly in ribosomal genes. The most frequently mutated genes were generally associated with a poorer clinical prognosis. Lastly, transcriptome sequencing suggests that HBV replication in the tumors of HCC patients is rare. HBV-human fusion transcripts are common, with the favored HBV insertion site being the HBx C-terminus and the favored host insertion sites being gene introns (in tumors) and introns/intergenic regions (in non-tumors). HBV-fused genes in tumors were mainly involved in RNA binding, while those in non-tumor tissues varied widely. These observations suggest that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts may occur during tumorigenesis. Transcriptome sequencing of HCC patients reveals key cancer molecules and clinically relevant pathways deregulated/mutated in HCC patients and suggests that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts likely occurs during tumorigenesis.

21 citations


Journal Article
TL;DR: It is demonstrated that the Drosophila Integrator complex prevents dedifferentiation of intermediate neural progenitors (INPs) during neural stem cell (neuroblast) lineage development by regulating the key transcription factor Erm, which also suppresses INP dedifferentiation.

14 citations


Book Chapter
27 Feb 2019
TL;DR: The r-gathering problem is studied for the case where C and F lie on a line, and an \(O(|C| + |F|)\)-time algorithm is presented to solve it.
Abstract: In this paper, we revisit the r-gathering problem. Given sets C and F of points on the plane and a distance d(c, f) for each \(c \in C\) and \(f \in F\), an r-gathering of C to F is an assignment A of C to open facilities \(F' \subseteq F\) such that r or more members of C are assigned to each open facility. The cost of an r-gathering is \(\max_{c \in C} d(c, A(c))\). The r-gathering problem asks for an r-gathering of minimum cost. In this paper, we study the r-gathering problem when C and F lie on a line and present an \(O(|C| + |F|)\)-time algorithm to solve it. Our solution is optimal, since any algorithm needs to read C and F at least once.

5 citations
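The linear-time algorithm is not described here, but the problem definition in the abstract can be checked directly. The exhaustive solver below, for points on a line with \(d(c, f) = |c - f|\), enumerates every assignment of customers to facilities, so it takes \(O(|F|^{|C|})\) time and serves only as a reference for tiny inputs; it is not the paper's method, and the function name is hypothetical.

```python
from itertools import product


def r_gathering_brute_force(C, F, r):
    """Exact minimum r-gathering cost by exhaustive search (tiny inputs only).

    C, F: customer and facility coordinates on a line.
    Returns None if no feasible r-gathering exists.
    """
    if not C:
        return 0
    best = None
    for assign in product(range(len(F)), repeat=len(C)):
        load = [0] * len(F)
        for f in assign:
            load[f] += 1
        # Every open facility (load > 0) must serve at least r customers.
        if any(0 < x < r for x in load):
            continue
        cost = max(abs(c - F[f]) for c, f in zip(C, assign))
        if best is None or cost < best:
            best = cost
    return best


C = [0, 1, 2, 10, 11, 12]
F = [1, 11]
print(r_gathering_brute_force(C, F, 3))  # both facilities open, cost 1
```

With r = 4 the two clusters can no longer both be served, so all six customers must go to one facility and the cost jumps to 11; with r = 7 there is no feasible r-gathering at all.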


Book Chapter
27 Feb 2019
TL;DR: This study focuses on one of the most frequently used consensus tree problems, the greedy consensus tree problem, and describes an \(O(k^2 n)\)-time solution, which is the fastest when \(k = O(\sqrt{n} \log n)\).
Abstract: A consensus tree is a phylogenetic tree that summarizes the branching information of a set of conflicting phylogenetic trees. Computing a consensus tree is a major step in phylogenetic tree reconstruction. It also finds application in predicting a species tree from a set of gene trees. Here, we focus our study on one of the most frequently used consensus tree problems, the greedy consensus tree problem. Given k phylogenetic trees leaf-labeled by n taxa, the previously best known algorithm for constructing a greedy consensus tree of these k trees runs in \(O(k n^{1.5} \log n)\) time. Here, we describe an \(O(k^2 n)\)-time solution. Our method is the fastest when \(k = O(\sqrt{n} \log n)\).

4 citations
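For reference, the greedy consensus tree can be described by a simple (and much slower) procedure: count how many input trees contain each cluster, then scan clusters from most to least frequent, keeping each one that is compatible (disjoint or nested) with every cluster kept so far. The sketch assumes each tree is given as its set of non-trivial clusters and breaks frequency ties by sorted leaf labels, an arbitrary choice; it does not implement the paper's \(O(k^2 n)\) algorithm.

```python
from collections import Counter


def compatible(x, y):
    """Two clusters can coexist in one tree iff they are disjoint or nested."""
    return not (x & y) or x <= y or y <= x


def greedy_consensus(trees):
    """trees: list of sets of non-trivial clusters (frozensets of leaf labels).

    Returns the accepted clusters in the order they were added.
    """
    freq = Counter(c for tree in trees for c in tree)
    # Most frequent first; ties broken by sorted labels for determinism.
    order = sorted(freq, key=lambda c: (-freq[c], sorted(c)))
    accepted = []
    for c in order:
        if all(compatible(c, a) for a in accepted):
            accepted.append(c)
    return accepted


t1 = {frozenset("ab"), frozenset("abc")}
t2 = {frozenset("ab"), frozenset("cd")}
t3 = {frozenset("ac"), frozenset("abc")}
print(greedy_consensus([t1, t2, t3]))  # keeps {a,b} and {a,b,c}
```

Because the accepted clusters are pairwise compatible, they always define a single rooted tree; the cost of this naive version comes from the pairwise compatibility checks, which is what faster algorithms avoid.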


Journal Article
TL;DR: In the version of this article initially published, ‘+’ and ‘–’ labels were missing from the graph keys at the bottom of Fig. 8d.
Abstract: In the version of this article initially published, ‘+’ and ‘–’ labels were missing from the graph keys at the bottom of Fig. 8d. The error has been corrected in the HTML and PDF versions of the article.

3 citations


Book Chapter
23 Jul 2019
TL;DR: In this article, the authors propose two new algorithms for computing the rooted triplet distance between two phylogenetic networks of arbitrary levels, without restrictions on the networks' in- and out-degrees, running in \(\mathrm{O}(n^{2} m + \lambda^{3})\) and \(\mathrm{O}(m + k^{3} d^{3} \lambda + \lambda^{3})\) time, respectively.
Abstract: The rooted triplet distance measures the structural dissimilarity of two phylogenetic trees or networks by counting the number of rooted trees with exactly three leaf labels that occur as embedded subtrees in one, but not both of them. Suppose that \(N_1 = (V_1, E_1)\) and \(N_2 = (V_2, E_2)\) are rooted phylogenetic networks over a common leaf label set of size \(\lambda \), that \(N_i\) has level \(k_i\) and maximum in-degree \(d_i\) for \(i \in \{1,2\}\), and that the networks’ out-degrees are unbounded. Denote \(n = \max (|V_1|, |V_2|)\), \(m = \max (|E_1|, |E_2|)\), \(k = \max (k_1, k_2)\), and \(d = \max (d_1, d_2)\). Previous work has shown how to compute the rooted triplet distance between \(N_1\) and \(N_2\) in \(\mathrm {O}(\lambda \log \lambda )\) time in the special case \(k \le 1\). For \(k > 1\), no efficient algorithms are known; a trivial approach leads to a running time of \(\mathrm {\Omega }(n^{7} \lambda ^{3})\) and the only existing non-trivial algorithm imposes restrictions on the networks’ in- and out-degrees (in particular, it does not work when non-binary nodes are allowed). In this paper, we develop two new algorithms that have no such restrictions. Their running times are \(\mathrm {O}(n^{2} m + \lambda ^{3})\) and \(\mathrm {O}(m + k^{3} d^{3} \lambda + \lambda ^{3})\), respectively. We also provide implementations of our algorithms and evaluate their performance in practice. This is the first publicly available software for computing the rooted triplet distance between unrestricted networks of arbitrary levels.
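As a baseline for the special case of two rooted trees (the level-0 case), the rooted triplet distance can be computed by brute force: for every leaf triple, find in each tree the pair whose lowest common ancestor is strictly deepest (or record a fan if there is none), and count the triples whose topologies differ. This sketch examines all \(\mathrm{O}(\lambda^{3})\) triples and finds each LCA by walking up the tree; it only illustrates the definition, is not one of the paper's network algorithms, and its parent-dictionary input format is an assumption.

```python
from itertools import combinations


def _ancestors(parent, v):
    """Path from v up to the root (inclusive); the root has no parent entry."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def _lca_depth(parent, u, v):
    """Depth (root = 0) of the lowest common ancestor of u and v."""
    seen = set(_ancestors(parent, u))
    for w in _ancestors(parent, v):
        if w in seen:
            return len(_ancestors(parent, w)) - 1
    raise ValueError("nodes are in different trees")


def triplet_topology(parent, a, b, c):
    """The pair with the strictly deepest LCA, or 'fan' if no such pair."""
    depths = {
        frozenset((a, b)): _lca_depth(parent, a, b),
        frozenset((a, c)): _lca_depth(parent, a, c),
        frozenset((b, c)): _lca_depth(parent, b, c),
    }
    deepest = max(depths.values())
    pairs = [p for p, d in depths.items() if d == deepest]
    return pairs[0] if len(pairs) == 1 else "fan"


def rooted_triplet_distance(parent1, parent2, leaves):
    """Number of leaf triples whose induced topologies differ."""
    return sum(
        triplet_topology(parent1, *t) != triplet_topology(parent2, *t)
        for t in combinations(sorted(leaves), 3)
    )


t1 = {"a": "x", "b": "x", "x": "r", "c": "r"}  # ((a,b),c)
t2 = {"b": "y", "c": "y", "y": "s", "a": "s"}  # (a,(b,c))
print(rooted_triplet_distance(t1, t2, "abc"))
```

Networks are harder because a leaf triple can be displayed with several topologies at once, which is why the paper's algorithms are needed beyond this tree-only baseline.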

01 Jan 2019
TL;DR: Although the authors' genome consists of a set of linear polymers of nucleotides, these polymers form a three-dimensional structure through chromatin interactions, which helps to explain tissue-specific expression profiles and how mutations in intergenic regions affect the expression of oncogenes and tumor-suppressor genes in diseases.
Abstract: Although our genome consists of a set of linear polymers of nucleotides, these polymers form a three-dimensional structure through chromatin interactions. Understanding the three-dimensional structure of our genome is important, since recent research has shown that it helps to explain tissue-specific expression profiles and how mutations in intergenic regions affect the expression of oncogenes and tumor-suppressor genes in diseases.

Book Chapter
TL;DR: The previously fastest algorithm for computing the rooted triplet distance between two input galled trees runs in \(O(n^{2.687})\) time, where n is the cardinality of the leaf label set.
Abstract: The previously fastest algorithm for computing the rooted triplet distance between two input galled trees (i.e., phylogenetic networks whose cycles are vertex-disjoint) runs in \(O(n^{2.687})\) time, where n is the cardinality of the leaf label set. Here, we present an \(O(n \log n)\)-time solution. Our strategy is to transform the input so that the answer can be obtained by applying an existing \(O(n \log n)\)-time algorithm for the simpler case of two phylogenetic trees a constant number of times.

Posted Content
TL;DR: It is mathematically proved that with a perfect ucc classifier, perfect clustering of individual instances inside the bags is possible even when no annotations on individual instances are given during training.
Abstract: A weakly supervised learning-based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag-level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of the models. We mathematically prove that with a perfect $ucc$ classifier, perfect clustering of individual instances inside the bags is possible even when no annotations on individual instances are given during training. We have constructed a neural-network-based $ucc$ classifier and experimentally shown that the clustering performance of our framework with our weakly supervised $ucc$ classifier is comparable to that of fully supervised learning models where labels for all instances are known. Furthermore, we have tested the applicability of our framework to a real-world task of semantic segmentation of breast cancer metastases in histological lymph node sections and shown that the performance of our weakly supervised framework is comparable to that of a fully supervised U-Net model.
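To illustrate what the bag-level label looks like, the sketch below chunks a stream of labelled instances into fixed-size bags and keeps only each bag's unique class count, so a model trained on the output never sees instance labels. The chunking scheme and function names are hypothetical; the paper's actual bag construction and classifier are not shown.

```python
def ucc(labels):
    """Unique class count of one bag: the number of distinct instance labels."""
    return len(set(labels))


def make_bags(instance_labels, bag_size):
    """Split instances into consecutive bags of bag_size, returning
    (instance indices, ucc label) pairs. Instance labels are used only to
    compute the bag label; leftover instances that do not fill a bag are dropped."""
    bags = []
    for i in range(0, len(instance_labels) - bag_size + 1, bag_size):
        chunk = instance_labels[i:i + bag_size]
        bags.append((list(range(i, i + bag_size)), ucc(chunk)))
    return bags


print(make_bags([0, 0, 1, 2, 2, 2], 3))
```

The first bag mixes classes 0 and 1, so its label is 2; the second contains only class 2, so its label is 1.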

Posted Content
TL;DR: It is mathematically proved that a perfect $ucc$ classifier can, in principle, be used to perfectly cluster individual instances inside the bags, and it is experimentally shown that the clustering performance of the framework with this classifier is comparable to that of fully supervised learning models.
Abstract: A weakly supervised learning-based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag-level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of the models. We mathematically prove that a perfect $ucc$ classifier can, in principle, be used to perfectly cluster individual instances inside the bags. In other words, perfect clustering of individual instances is possible even when no annotations on individual instances are given during training. We have constructed a neural-network-based $ucc$ classifier and experimentally shown that the clustering performance of our framework with our $ucc$ classifier is comparable to that of fully supervised learning models. We have also observed that our $ucc$ classifiers can potentially be used for zero-shot learning, as they learn better semantic features than fully supervised models for 'unseen classes', which have never been input into the models during training.

Journal ArticleDOI
TL;DR: An \(O(\log n)\) amortized-per-character algorithm to compute LCF online is presented, where n is the length of the string, and the Minimum Closed Factorization (MCF) problem, which identifies the minimum number of closed factors that cover X, is introduced.
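The summary is truncated and does not define closed factors. Assuming the usual definition from combinatorics on words (a string is closed if its length is at most 1, or if some proper border occurs in it exactly twice, once as a prefix and once as a suffix), a brute-force check for a single string might look like this. It is cubic in the worst case and unrelated to the online algorithm summarized here; the function name is hypothetical.

```python
def is_closed(w):
    """True if w is closed: |w| <= 1, or some proper border of w occurs in w
    exactly twice (necessarily once as a prefix and once as a suffix)."""
    n = len(w)
    if n <= 1:
        return True
    for length in range(1, n):
        border = w[:length]
        if w[n - length:] != border:
            continue  # not a border
        # Count occurrences of the border, overlapping ones included.
        occurrences = sum(1 for i in range(n - length + 1) if w.startswith(border, i))
        if occurrences == 2:
            return True
    return False


for s in ["abaab", "aab", "aaa"]:
    print(s, is_closed(s))
```

For "aaa" the border "a" occurs three times, but the longer border "aa" occurs exactly twice (overlapping), so the string is closed; "aab" has no border at all and is not.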