
Showing papers by "Wing-Kin Sung" published in 2021


Journal Article
TL;DR: Zhang et al. proposed an improved Xception network, L2MXception, which integrates a regularization term combining the L2-norm and the mean to detect and identify peach diseases.
Abstract: Peach diseases can cause severe yield reduction and decreased quality in peach production. Rapid and accurate detection and identification of peach diseases is therefore of great importance. Deep learning has been applied to detect peach diseases from imaging data. However, peach disease image data are difficult to collect and the samples are imbalanced, and popular deep networks perform poorly on such data. This paper proposes an improved Xception network, L2MXception, which integrates a regularization term combining the L2-norm and the mean. Using the collected peach disease image dataset, the results of seven mainstream deep learning models were compared in detail, and an improved loss function integrating the L2-norm and mean regularization term (L2M Loss) was developed. Experiments showed that the Xception model with L2M Loss outperformed the current best method for peach disease prediction. The validation accuracy of L2MXception reached up to 93.85%, 28.48% higher than that of the original Xception model. The proposed L2MXception network may have great potential for the early identification of peach diseases.

15 citations


Journal Article
TL;DR: Wang et al. proposed SurVirus, an improved virus integration caller that corrects the alignment of reads crucial for discovering integrations, and used publicly available datasets to show that existing methods predict hundreds of thousands of false positives.
Abstract: A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.

15 citations


Journal Article
TL;DR: Wang et al. constructed the first well-resourced and comprehensive allele-specific DNA methylation (ASM) database for 47 species, obtained data on DNA methylation levels, ASM and allele-specific expressed genes (ASEGs), and analyzed the ASM/ASEG distribution patterns of these species.
Abstract: DNA methylation is known to be the most stable epigenetic modification and has been extensively studied in relation to cell differentiation, development, X chromosome inactivation and disease. Allele-specific DNA methylation (ASM) is a well-established mechanism for genomic imprinting and regulates imprinted gene expression. Previous studies have confirmed that certain special regions with ASM are susceptible and closely related to human carcinogenesis and plant development. In addition, recent studies have proven ASM to be an effective tumour marker. However, research on the functions of ASM in diseases and development is still extremely scarce. Here, we collected 4400 BS-Seq datasets and 1598 corresponding RNA-Seq datasets from 47 species, including human and mouse, to establish a comprehensive ASM database. We obtained the data on DNA methylation level, ASM and allele-specific expressed genes (ASEGs) and further analysed the ASM/ASEG distribution patterns of these species. In-depth ASM distribution analysis and differential methylation analysis conducted in nine cancer types showed results consistent with the reported changes in ASM in key tumour genes and revealed several potential ASM tumour-related genes. Finally, integrating these results, we constructed the first well-resourced and comprehensive ASM database for 47 species (ASMdb, www.dna-asmdb.com).

10 citations


Journal Article
TL;DR: Rice (Oryza sativa) is one of the most important crops in the world and a common model plant for genomic research, and epigenomic information, including DNA methylation, histone modification, and chromatin accessibility, has been characterized in the Xian/Indica and Geng/Japonica genomes.

10 citations


Journal Article
TL;DR: This article shows that Drosophila Pr-Set7 regulates the reactivation of neural stem cells (NSCs), whose ability to switch between quiescence and proliferation is crucial for brain development and homeostasis.
Abstract: The ability of neural stem cells (NSCs) to switch between quiescence and proliferation is crucial for brain development and homeostasis. Increasing evidence suggests that variants of histone lysine methyltransferases including KMT5A are associated with neurodevelopmental disorders. However, the function of KMT5A/Pr-set7/SETD8 in the central nervous system is not well established. Here, we show that Drosophila Pr-Set7 is a novel regulator of NSC reactivation. Loss of function of pr-set7 causes a delay in NSC reactivation and loss of H4K20 monomethylation in the brain. Through NSC-specific in vivo profiling, we demonstrate that Pr-set7 binds to the promoter region of cyclin-dependent kinase 1 (cdk1) and Wnt pathway transcriptional co-activator earthbound1/jerky (ebd1). Further validation indicates that Pr-set7 is required for the expression of cdk1 and ebd1 in the brain. Similar to Pr-set7, Cdk1 and Ebd1 promote NSC reactivation. Finally, overexpression of Cdk1 and Ebd1 significantly suppressed NSC reactivation defects observed in pr-set7-depleted brains. Therefore, Pr-set7 promotes NSC reactivation by regulating Wnt signaling and cell cycle progression. Our findings may contribute to the understanding of mammalian KMT5A/PR-SET7/SETD8 during brain development.

8 citations


Journal Article
TL;DR: IndelEnsembler detected 34 093 deletions, 12 913 tandem duplications and 9773 insertions in whole-genome sequencing data from 1047 Arabidopsis accessions.
Abstract: Large indels greatly impact observable phenotypes in many organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications than AthCNV, although our dataset contains fewer large indels overall. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination was the major mechanism of deletion formation in the Arabidopsis genome. A neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions are in low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs), and some of them could affect trait performance. For instance, a deletion-based genome-wide association study (DEL-GWAS) showed that accessions carrying a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in northern Sweden carried this deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. Neither deletion can be detected in AthCNV and, interestingly, the two never co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time.
This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills this gap.

8 citations
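The IndelEnsembler abstract above reports that about 49% of deletions are in low LD with surrounding SNPs. It does not say which LD statistic was used; a common choice is r², the squared Pearson correlation between two variants' genotypes across samples. The sketch below assumes each deletion and SNP is coded 0/1 per accession (a simplification that suits largely homozygous, selfing Arabidopsis accessions). The function name `ld_r2` and the toy genotypes are hypothetical, and no cutoff for "low LD" is implied.

```python
from typing import Sequence


def ld_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation (r^2) between two 0/1-coded variants.

    x and y hold one genotype per accession (e.g. 1 = deletion present,
    0 = absent; 1 = alternative SNP allele, 0 = reference allele).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic variant has no defined LD; report 0 here
    return (cov * cov) / (var_x * var_y)


# A deletion versus two nearby SNPs across six hypothetical accessions.
deletion = [1, 1, 1, 0, 0, 0]
snp_linked = [1, 1, 1, 0, 0, 0]    # perfectly tags the deletion
snp_unlinked = [1, 0, 1, 0, 1, 0]  # only weakly associated with it
print(ld_r2(deletion, snp_linked), ld_r2(deletion, snp_unlinked))
```

A deletion whose r² with every nearby SNP is small cannot be tagged by those SNPs, which is why such deletions can carry association signals that SNP-GWAS misses.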


Journal Article
05 Oct 2021-Oncogene
TL;DR: In this article, the authors used single-cell transcriptome analysis to reconstruct the transcriptional network of the androgen receptor (AR) in prostate cancer and showed that AR directly regulates a set of signature genes in the ER-to-Golgi protein vesicle-mediated transport pathway.
Abstract: Androgen receptor (AR) plays a central role in driving prostate cancer (PCa) progression. How AR promotes this process is still not completely clear. Herein, we used single-cell transcriptome analysis to reconstruct the transcriptional network of AR in PCa. Our work shows AR directly regulates a set of signature genes in the ER-to-Golgi protein vesicle-mediated transport pathway. The expression of these genes is required for maximum androgen-dependent ER-to-Golgi trafficking, cell growth, and survival. Our analyses also reveal the signature genes are associated with PCa progression and prognosis. Moreover, we find inhibition of the ER-to-Golgi transport process with a small molecule enhanced antiandrogen-mediated tumor suppression of hormone-sensitive and insensitive PCa. Finally, we demonstrate AR collaborates with CREB3L2 in mediating ER-to-Golgi trafficking in PCa. In summary, our findings uncover a critical role for dysregulation of ER-to-Golgi trafficking expression and function in PCa progression, provide detailed mechanistic insights for how AR tightly controls this process, and highlight the prospect of targeting the ER-to-Golgi pathway as a therapeutic strategy for advanced PCa.

7 citations


Journal Article
TL;DR: Zeng et al. developed HIVID2, a method for accurately detecting virus integration into host genomes; it is a significant upgrade of HIVID and performs a paired-end combination (PE-combination) of potentially integrated reads.
Abstract: Motivation: Virus integration into the host genome is frequently reported to be closely associated with many human diseases, and detecting virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes. Results: Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) of potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. Compared with existing methods, HIVID2 greatly improves specificity and sensitivity and predicts breakpoints closer to the real integrations. The advantage of our method was demonstrated using both simulated and real data sets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which were verified using protein expression data. In addition, HIVID2 lets the user decide whether to automatically perform advanced analysis of the identified virus integrations. Through tests on simulated and real data, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs in both sensitivity and specificity. We believe that HIVID2 will help enhance future research on virus integration. Availability: HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/. Contact: Xi Zeng (zengxi@mail.hzau.edu.cn), Linghao Zhao (michael_yifan@126.com). Supplementary information: Supplementary data are available at Bioinformatics online.

4 citations
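The HIVID2 abstract above names a PE-combination step but does not describe it. One generic way to combine a read pair, assuming the two reads overlap, is to reverse-complement the second read and join it to the first at their best overlap (the idea behind read-merging tools such as FLASH). The sketch below is that generic technique, not HIVID2's code; `merge_pair`, `min_overlap` and `max_mismatch_rate` are hypothetical names and parameters.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A/C/G/T/N."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.1) -> Optional[str]:
    """Merge an overlapping read pair into one fragment sequence.

    read2 is reverse-complemented onto read1's strand; the longest overlap
    between the end of read1 and the start of read2 with at most
    max_mismatch_rate mismatches wins. Returns None if no overlap qualifies
    (such pairs would need a different treatment).
    """
    r1 = read1.upper()
    r2 = reverse_complement(read2)
    for olap in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        tail, head = r1[len(r1) - olap:], r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olap:
            # On mismatches, this simple sketch keeps read1's base.
            return r1 + r2[olap:]
    return None


fragment = "GATTACAGGCTTACCGTAAGCTCGATGGCA"
read1 = fragment[:20]                     # forward read
read2 = reverse_complement(fragment[8:])  # mate read, sequenced from the other end
print(merge_pair(read1, read2) == fragment)
```

Merging a pair into one longer sequence before remapping gives the aligner more context, which plausibly helps place reads that straddle a virus-host junction.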


Journal Article
TL;DR: SurVIndel is a novel caller that detects deletions and tandem duplications from paired-end next-generation sequencing data and outperforms existing methods on both simulated and real biological datasets.
Abstract: Motivation: Structural variations (SVs) are large-scale mutations in a genome; although less frequent than point mutations, they are responsible for more heritable differences between individuals because of their large size. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole-genome sequencing data have become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from solved, since existing solutions often perform poorly on real data. Results: We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired-end next-generation sequencing data. SurVIndel uses discordant read pairs and clipped reads, as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets. Availability: SurVIndel is available at https://github.com/Mesh89/SurVIndel.

4 citations
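The SurVIndel abstract above lists discordant read pairs as one of its signals. Its exact rules are not given here; as a minimal sketch of the textbook signatures, assuming a forward-reverse (FR) Illumina library with a known insert-size mean and standard deviation: pairs mapping too far apart suggest a deletion, and outward-facing (RF) pairs suggest a tandem duplication junction. The `ReadPair` type, `classify_pair` and the 3-SD cutoff `k` are hypothetical; clipped reads and SurVIndel's statistical tests are not modeled.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapping of one read pair on a single chromosome (hypothetical type)."""
    left_start: int       # leftmost coordinate of the upstream-mapped mate
    left_reverse: bool    # True if the upstream mate is on the reverse strand
    right_end: int        # rightmost coordinate of the downstream-mapped mate
    right_reverse: bool   # True if the downstream mate is on the reverse strand


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  k: float = 3.0) -> str:
    """Label a pair by the SV signal it carries, assuming an FR library."""
    insert = pair.right_end - pair.left_start
    upper = mean_insert + k * sd_insert
    lower = max(0.0, mean_insert - k * sd_insert)
    if not pair.left_reverse and pair.right_reverse:      # expected FR orientation
        if insert > upper:
            return "deletion"        # mates map too far apart
        if insert < lower:
            return "short_insert"    # mates map too close together
        return "concordant"
    if pair.left_reverse and not pair.right_reverse:      # RF: outward-facing mates
        return "tandem_duplication"
    return "other"                   # FF/RR pairs (e.g. inversions) are not handled


for p in [ReadPair(1000, False, 1400, True),
          ReadPair(1000, False, 3000, True),
          ReadPair(1000, True, 1300, False)]:
    print(classify_pair(p, mean_insert=400, sd_insert=50))
```

A real caller would cluster many such pairs supporting the same breakpoint rather than trust any single pair.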


Journal Article
TL;DR: Based on whole-genome resequencing data of 78 ducks (Anas platyrhynchos) and 31 published whole-genome duck sequences, this article detected three geographically distinct genetic groups: local Chinese, wild, and local Southeast/South Asian populations.
Abstract: The most prolific duck genetic resource in the world is located in Southeast/South Asia, but little is known about the domestication and complex histories of these duck populations. Based on whole-genome resequencing data of 78 ducks (Anas platyrhynchos) and 31 published whole-genome duck sequences, we detected three geographically distinct genetic groups: local Chinese, wild, and local Southeast/South Asian populations. We inferred the demographic history of these duck populations with different geographical distributions and found that the Chinese and Southeast/South Asian ducks shared similar demographic features. The Chinese domestic ducks experienced the strongest population bottleneck, caused by domestication and the last glacial maximum (LGM) period, whereas the Chinese wild ducks experienced a relatively weak bottleneck caused by domestication only. Furthermore, the bottleneck was more severe in the local Southeast/South Asian populations than in the local Chinese populations, which resulted in a smaller effective population size for the former (7,100–11,900). We show that extensive gene flow has occurred between the Southeast/South Asian and Chinese populations, and between the Southeast Asian and South Asian populations. Prolonged gene flow was detected between the Guangxi population from China and its neighboring Southeast/South Asian populations. In addition, based on multiple statistical approaches, we identified a genomic region containing three genes (PNPLA8, THAP5, and DNAJB9) on duck chromosome 1 with a high probability of gene flow between the Guangxi and Southeast/South Asian populations. Finally, we detected strong signatures of selection in genes involved in signaling pathways of nervous system development (e.g., ADCYAP1R1 and PDC) and in genes associated with morphological traits such as cell growth (e.g., IGF1R).
Our findings provide valuable information for a better understanding of the domestication and demographic history of the duck, and of the gene flow between local duck populations from Southeast/South Asia and China.

4 citations


Journal Article
TL;DR: The Muscovy duck (Cairina moschata) is an economically important duck species with favorable growth and carcass composition parameters compared with other ducks.
Abstract: The Muscovy duck (Cairina moschata) is an economically important duck species, with favourable growth and carcass composition parameters in comparison to other ducks. However, limited genomic resources for the Muscovy duck hinder our understanding of its evolution and genetic diversity. We combined linked-read sequencing technology and reference-guided methods for de novo genome assembly. The final draft assembly was 1.12 Gbp, with 29 autosomes, one sex chromosome and 4,583 unlocalized scaffolds, and an N50 size of 77.35 Mb. Based on universal single-copy orthologues (BUSCO), the completeness of the draft genome assembly was estimated to be 93.30 %. Genome annotation identified 15,580 genes, 15,537 (99.72 %) of which were annotated in public databases. We conducted comparative genomic analyses and found that species-specific and rapidly expanding gene families (compared to other birds) in the Muscovy duck are mainly involved in the Calcium signaling, Adrenergic signaling in cardiomyocytes, and GnRH signaling pathways. In comparison to the common domestic duck (Anas platyrhynchos), we identified 104 genes exhibiting strong signals of adaptive evolution (Ka/Ks > 1). Most of these genes were associated with immune defence pathways (e.g. IFNAR1 and TLR5), indicating differences in immune responses between the two species. Additionally, we combined divergence and polymorphism data to demonstrate the “faster-Z effect” of chromosome evolution. The chromosome-level genome assembly of the Muscovy duck and the comparative genomic analyses provide valuable resources for future molecular ecology studies, as well as for studies of the evolutionary arms race between the host and influenza viruses.
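The Muscovy duck assembly above is summarized partly by its N50. As a minimal sketch of the standard definition (not code from the paper), N50 is the largest length L such that sequences of length at least L together cover at least half of the total assembly length; `n50` is a hypothetical helper name.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    covered = 0
    for length in ordered:
        covered += length
        if 2 * covered >= total:   # reached half of the assembly
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the toy call the total is 54; the three longest sequences (10 + 9 + 8 = 27) reach half of it, so N50 is 8.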

Journal Article
TL;DR: This paper revisits the r-gathering problem and proposes an \(\mathrm{O}(|C| + |F|)\)-time algorithm for it, which is optimal since any algorithm needs to read \(C\) and \(F\) at least once.
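The r-gathering summary above does not restate the problem. Under the commonly used min-max formulation (an assumption here), \(C\) is a set of customers and \(F\) a set of facilities: open some facilities and assign every customer to an open facility so that each open facility serves at least r customers, minimizing the largest customer-to-facility distance. The exhaustive search below, with points on a line and the hypothetical name `r_gathering_bruteforce`, only makes that objective concrete on tiny inputs. It takes exponential time and is not the paper's linear-time algorithm.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def r_gathering_bruteforce(customers: Sequence[float], facilities: Sequence[float],
                           r: int) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Exhaustively solve a tiny min-max r-gathering instance on a line.

    Each customer is assigned to one facility (given by index); a facility is
    open iff it serves at least one customer, and every open facility must
    serve at least r customers. Returns (cost, assignment) minimizing the
    largest customer-to-facility distance, or None if no assignment is
    feasible. Runs in O(|F|^|C|) time, so it is for illustration only.
    """
    if not customers:
        return (0.0, ())
    best = None
    for assignment in product(range(len(facilities)), repeat=len(customers)):
        load = [0] * len(facilities)
        for f in assignment:
            load[f] += 1
        if any(0 < served < r for served in load):
            continue  # some open facility serves fewer than r customers
        cost = max(abs(customers[i] - facilities[f]) for i, f in enumerate(assignment))
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


print(r_gathering_bruteforce([0, 1, 10, 11], [0.5, 10.5], r=2))  # (0.5, (0, 0, 1, 1))
```

With r = 2 each facility can take the two customers closest to it; with r = 3 that split becomes infeasible and all four customers must share one facility.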

Posted Content
09 Jul 2021-bioRxiv
TL;DR: Wang et al. developed a deep multiple instance learning model that predicts tumor purity from H&E-stained digital histopathology slides; it can be used for high-throughput sample selection for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability.
Abstract: Tumor purity is the proportion of cancer cells in the tumor tissue. Accurate tumor purity estimation is crucial for accurate pathologic evaluation and for sample selection to minimize normal cell contamination in high-throughput genomic analysis. We developed a novel deep multiple instance learning model that predicts tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity from slides of fresh-frozen sections in eight different TCGA cohorts and of formalin-fixed paraffin-embedded sections in a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values, which were inferred from genomic data and accepted as the gold standard. In addition, we obtained spatially resolved tumor purity maps and showed that tumor purity varies spatially within a sample. Our analyses of the tumor purity maps also suggested that pathologists might have chosen high tumor content regions within the slides during tumor purity estimation in the TCGA cohorts, which resulted in higher values than the genomic tumor purity values. In short, our model can be used for high-throughput sample selection for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Moreover, spatial tumor purity maps can help us better understand the tumor microenvironment, a key determinant of tumor formation and therapeutic response.

Journal Article
TL;DR: This paper develops two new algorithms for computing the rooted triplet distance between phylogenetic networks of arbitrary levels, with no restrictions on the networks' in- and out-degrees.
Abstract: The rooted triplet distance measures the structural dissimilarity of two phylogenetic trees or networks by counting the number of rooted trees with exactly three leaf labels that occur as embedded subtrees in one, but not both of them. Suppose that \(N_1 = (V_1, E_1)\) and \(N_2 = (V_2, E_2)\) are rooted phylogenetic networks over a common leaf label set of size \(\lambda \), that \(N_i\) has level \(k_i\) and maximum in-degree \(d_i\) for \(i \in \{1,2\}\), and that the networks’ out-degrees are unbounded. Denote \(n = \max (|V_1|, |V_2|)\), \(m = \max (|E_1|, |E_2|)\), \(k = \max (k_1, k_2)\), and \(d = \max (d_1, d_2)\). Previous work has shown how to compute the rooted triplet distance between \(N_1\) and \(N_2\) in \(\mathrm {O}(\lambda \log \lambda )\) time in the special case \(k \le 1\). For \(k > 1\), no efficient algorithms are known; a trivial approach leads to a running time of \(\mathrm {\Omega }(n^{7} \lambda ^{3})\) and the only existing non-trivial algorithm imposes restrictions on the networks’ in- and out-degrees (in particular, it does not work when non-binary nodes are allowed). In this paper, we develop two new algorithms that have no such restrictions. Their running times are \(\mathrm {O}(n^{2} m + \lambda ^{3})\) and \(\mathrm {O}(m + k^{3} d^{3} \lambda + \lambda ^{3})\), respectively. We also provide implementations of our algorithms and evaluate their performance in practice. This is the first publicly available software for computing the rooted triplet distance between unrestricted networks of arbitrary levels.
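To make the definition above concrete, here is a brute-force sketch for the simplest case: two rooted trees over the same leaf labels (trees are level-0 networks). It enumerates all leaf triples, so it takes roughly \(\mathrm{O}(\lambda^3 h)\) time for tree height \(h\), and it is neither of the paper's algorithms; networks, in which one leaf triple can display several triplets, are not handled. The child-to-parent tree encoding and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

Tree = Dict[str, Optional[str]]  # child -> parent; the root maps to None


def path_to_root(tree: Tree, v: Optional[str]) -> List[str]:
    """v followed by its ancestors, ending at the root."""
    path = []
    while v is not None:
        path.append(v)
        v = tree[v]
    return path


def lca_depth(tree: Tree, u: str, v: str) -> int:
    """Depth (root = 0) of the lowest common ancestor of u and v."""
    up = path_to_root(tree, u)
    ancestors_v = set(path_to_root(tree, v))
    for i, w in enumerate(up):
        if w in ancestors_v:
            return len(up) - 1 - i
    raise ValueError("u and v are not in the same tree")


def triplets(tree: Tree) -> Set[tuple]:
    """All rooted triplets (resolved ab|c or fan) displayed by a rooted tree."""
    internal = {p for p in tree.values() if p is not None}
    leaves = sorted(v for v in tree if v not in internal)
    found = set()
    for a, b, c in combinations(leaves, 3):
        ab, ac, bc = lca_depth(tree, a, b), lca_depth(tree, a, c), lca_depth(tree, b, c)
        if ab > ac:
            found.add(("resolved", (a, b), c))   # a, b meet below c: ab|c
        elif ac > ab:
            found.add(("resolved", (a, c), b))   # ac|b
        elif bc > ab:
            found.add(("resolved", (b, c), a))   # bc|a
        else:
            found.add(("fan", (a, b, c)))        # all three meet at one node
    return found


def rooted_triplet_distance(t1: Tree, t2: Tree) -> int:
    """Number of rooted triplets displayed by exactly one of the two trees."""
    return len(triplets(t1) ^ triplets(t2))


t1 = {"r": None, "x": "r", "y": "x", "a": "y", "b": "y", "c": "x", "d": "r"}  # ((a,b),c),d
t2 = {"r": None, "x": "r", "y": "x", "a": "y", "c": "y", "b": "x", "d": "r"}  # ((a,c),b),d
print(rooted_triplet_distance(t1, t2))  # 2
```

Because a tree displays exactly one triplet per leaf triple, each leaf triple on which two trees disagree contributes two triplets to this count: here ab|c (only in t1) and ac|b (only in t2).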

Posted Content
06 Jul 2021-bioRxiv
TL;DR: This paper shows that Drosophila Parafibromin/Hyrax (Hyx) inhibits neural stem cell (NSC) overgrowth by governing cell polarity; loss of hyx leads to symmetric NSC division and the formation of supernumerary NSCs in the larval brain.
Abstract: Neural stem cells (NSCs) divide asymmetrically to balance their self-renewal and differentiation. An imbalance can lead to NSC overgrowth and tumour formation. The function of Parafibromin, a conserved tumour suppressor, in the nervous system is not established. Here, we demonstrate that Drosophila Parafibromin/Hyrax (Hyx) inhibits NSC overgrowth by governing cell polarity. Hyx is essential for apicobasal polarity, localizing both apical and basal proteins asymmetrically in NSCs. Loss of hyx results in the symmetric division of NSCs, leading to the formation of supernumerary NSCs in the larval brain. Human Parafibromin fully rescues NSC overgrowth and cell polarity defects in Drosophila hyx mutant brains. Hyx plays a novel role in maintaining the interphase microtubule-organizing center and mitotic spindle formation in NSCs. Hyx is required for the proper localization of the key centrosomal protein Polo and the microtubule-binding proteins Msps and D-TACC in dividing NSCs. This study discovers that Hyx has a brain tumour suppressor-like function and maintains NSC polarity by regulating centrosome function and microtubule growth. The new paradigm that Parafibromin orchestrates cell polarity and centrosomal assembly may be relevant to Parafibromin/HRPT2-associated cancers.