Author

Jia Wen

Bio: Jia Wen is an academic researcher from the University of North Carolina at Chapel Hill. The author has contributed to research in topics: Medicine & Genome-wide association study. The author has an h-index of 10 and has co-authored 29 publications receiving 697 citations. Previous affiliations of Jia Wen include Nanjing Agricultural University & Huazhong Agricultural University.

Papers
Journal ArticleDOI
Mark Chaisson, Ashley D. Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, Tobias Rausch, Eugene J. Gardner, Oscar L. Rodriguez, Li Guo, Ryan L. Collins, Xian Fan, Jia Wen, Robert E. Handsaker, Susan Fairley, Zev N. Kronenberg, Xiangmeng Kong, Fereydoun Hormozdiari, Dillon Lee, Aaron M. Wenger, Alex Hastie, Danny Antaki, Thomas Anantharaman, Peter A. Audano, Harrison Brand, Stuart Cantsilieris, Han Cao, Eliza Cerveira, Chong Chen, Xintong Chen, Chen-Shan Chin, Zechen Chong, Nelson T. Chuang, Christine C. Lambert, Deanna M. Church, Laura Clarke, Andrew Farrell, Joey Flores, Timur R. Galeev, David U. Gorkin, Madhusudan Gujral, Victor Guryev, William Haynes Heaton, Jonas Korlach, Sushant Kumar, Jee Young Kwon, Ernest T. Lam, Jong Eun Lee, Joyce V. Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li, Patrick Marks, Karine A. Viaud-Martinez, Sascha Meiers, Katherine M. Munson, Fabio C. P. Navarro, Bradley J. Nelson, Conor Nodzak, Amina Noor, Sofia Kyriazopoulou-Panagiotopoulou, Andy Wing Chun Pang, Yunjiang Qiu, Gabriel Rosanio, Mallory Ryan, Adrian M. Stütz, Diana C.J. Spierings, Alistair Ward, Anne Marie E. Welch, Ming Xiao, Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley, Ernesto Lowy, Sergei Yakneen, Steven A. McCarroll, Goo Jun, Li Ding, Chong-Lek Koh, Bing Ren, Paul Flicek, Ken Chen, Mark Gerstein, Pui-Yan Kwok, Peter M. Lansdorp, Gabor T. Marth, Jonathan Sebat, Xinghua Shi, Ali Bashir, Kai Ye, Scott E. Devine, Michael E. Talkowski, Ryan E. Mills, Tobias Marschall, Jan O. Korbel, Evan E. Eichler, Charles Lee
TL;DR: A suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms is applied to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner.
Abstract: The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

606 citations
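
The abstract above separates indels (<50 bp) from SVs (≥50 bp) by size. A minimal sketch of that cutoff follows, assuming variants are given as reference/alternate allele strings; the function name `classify_variant` and the example alleles are hypothetical and not taken from the paper. Balanced events such as inversions do not change allele length, so a length-only rule like this one cannot label them.

```python
from collections import Counter

SV_MIN_LENGTH_BP = 50  # cutoff stated in the abstract: indels < 50 bp, SVs >= 50 bp


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by the absolute length difference between its alleles."""
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "substitution"  # same-length change; neither indel nor SV by size
    return "SV" if size >= SV_MIN_LENGTH_BP else "indel"


# Hypothetical alleles, made up for illustration only.
examples = [
    ("A", "ATG"),            # 2 bp insertion
    ("A" * 50, "A"),         # 49 bp deletion
    ("A", "A" + "C" * 50),   # 50 bp insertion
    ("A", "G"),              # single-base substitution
]
counts = Counter(classify_variant(ref, alt) for ref, alt in examples)
```

Counting labels per genome in this way is how totals of the kind quoted in the abstract would be tallied, although the study's actual callset construction is far more involved.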

Journal ArticleDOI
14 Oct 2020-Nature
TL;DR: It is shown that chromatin interactions underlie several aspects of gene regulation, with transposable elements and disease-associated variants enriched at distal interacting regions in a cell-type-specific manner.
Abstract: Lineage-specific epigenomic changes during human corticogenesis have been difficult to study owing to challenges with sample availability and tissue heterogeneity. For example, previous studies using single-cell RNA sequencing identified at least 9 major cell types and up to 26 distinct subtypes in the dorsal cortex alone [1,2]. Here we characterize cell-type-specific cis-regulatory chromatin interactions, open chromatin peaks, and transcriptomes for radial glia, intermediate progenitor cells, excitatory neurons, and interneurons isolated from mid-gestational samples of the human cortex. We show that chromatin interactions underlie several aspects of gene regulation, with transposable elements and disease-associated variants enriched at distal interacting regions in a cell-type-specific manner. In addition, promoters with increased levels of chromatin interactivity (termed super-interactive promoters) are enriched for lineage-specific genes, suggesting that interactions at these loci contribute to the fine-tuning of transcription. Finally, we develop CRISPRview, a technique that integrates immunostaining, CRISPR interference, RNAscope, and image analysis to validate cell-type-specific cis-regulatory elements in heterogeneous populations of primary cells. Our findings provide insights into cell-type-specific gene expression patterns in the developing human cortex and advance our understanding of gene regulation and lineage specification during this crucial developmental window.

100 citations

Journal ArticleDOI
TL;DR: This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes’ contribution to gene expression, by providing a deep auto-encoder model for predicting gene expression from SNP genotypes.
Abstract: Gene expression is a key intermediate step through which genotypes lead to a particular trait. Gene expression is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes’ contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.

75 citations
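
The abstract above describes pretraining with a denoising auto-encoder, then fine-tuning a multilayer perceptron with dropout. The sketch below illustrates that general two-stage idea in NumPy and is not the authors' implementation. It assumes a single hidden layer rather than a stack, masking noise as the corruption, a linear decoder, synthetic binary genotypes, and arbitrary hyperparameters; every function and variable name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_denoising_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=200):
    """Train one denoising auto-encoder layer; return encoder weights and bias."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    for _ in range(epochs):
        mask = rng.random(X.shape) > corruption   # masking noise: zero some inputs
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b)
        X_hat = H @ W_dec + b_dec                 # linear decoder
        err = (X_hat - X) / len(X)                # reconstruct the clean input
        dH = (err @ W_dec.T) * H * (1 - H)        # backprop through the sigmoid
        W_dec -= lr * H.T @ err
        b_dec -= lr * err.sum(axis=0)
        W -= lr * X_noisy.T @ dH
        b -= lr * dH.sum(axis=0)
    return W, b


def finetune_with_dropout(X, y, W, b, keep_prob=0.8, lr=0.1, epochs=500):
    """Fine-tune the pretrained layer plus a linear output unit for regression."""
    W, b = W.copy(), b.copy()
    v = np.zeros(W.shape[1])
    c = 0.0
    for _ in range(epochs):
        H = sigmoid(X @ W + b)
        drop = (rng.random(H.shape) < keep_prob) / keep_prob  # inverted dropout
        Hd = H * drop
        err = (Hd @ v + c - y) / len(y)
        dH = np.outer(err, v) * drop * H * (1 - H)
        v -= lr * Hd.T @ err
        c -= lr * err.sum()
        W -= lr * X.T @ dH
        b -= lr * dH.sum(axis=0)
    return W, b, v, c


def predict(X, W, b, v, c):
    return sigmoid(X @ W + b) @ v + c             # no dropout at prediction time


# Synthetic stand-in data: 200 samples, 20 binary genotypes, one expression trait.
X = rng.integers(0, 2, (200, 20)).astype(float)
y = X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=200)

W0, b0 = pretrain_denoising_layer(X, n_hidden=8)
params = finetune_with_dropout(X, y, W0, b0)
pred = predict(X, *params)
mse = float(np.mean((pred - y) ** 2))
```

Scaling the kept units by `1 / keep_prob` during training (inverted dropout) lets prediction run without any rescaling. A comparison like the one in the abstract would evaluate `mse` on held-out samples rather than on the training data used here.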

Journal ArticleDOI
TL;DR: This study indicates that integration of population divergence analysis, genome-wide association study and expression analysis is an efficient approach to identify candidate domestication-related genes.
Abstract: Flowering time and seed size are traits related to domestication. However, identification of domestication-related loci/genes controlling these traits in soybean has rarely been reported. In this study, we identified a total of 48 domestication-related loci based on RAD-seq genotyping of a natural population comprising 286 accessions. Among these, four on chromosome 12 and an additional two on chromosomes 11 and 15 were associated with flowering time, and four on chromosomes 11 and 16 were associated with seed size. Of the five genes associated with flowering time and the three genes associated with seed size, three genes (Glyma11g18720, Glyma11g15480 and Glyma15g35080) were homologous to Arabidopsis genes; the other five genes were found for the first time to be associated with these two traits. Glyma11g18720 and Glyma05g28130 were co-expressed with five genes homologous to flowering time genes in Arabidopsis, and Glyma11g15480 was co-expressed with 24 genes homologous to seed development genes in Arabidopsis. This study indicates that integration of population divergence analysis, genome-wide association study and expression analysis is an efficient approach to identify candidate domestication-related genes.

58 citations

Journal ArticleDOI
TL;DR: It is suggested that ultra-rare structural variants that affect the boundaries of topologically associated domains (TADs) increase risk for schizophrenia, and that alterations in TAD boundaries may lead to dysregulation of gene expression.
Abstract: Despite considerable progress in schizophrenia genetics, most findings have been for large rare structural variants and common variants in well-imputed regions with few genes implicated from exome sequencing. Whole genome sequencing (WGS) can potentially provide a more complete enumeration of etiological genetic variation apart from the exome and regions of high linkage disequilibrium. We analyze high-coverage WGS data from 1162 Swedish schizophrenia cases and 936 ancestry-matched population controls. Our main objective is to evaluate the contribution to schizophrenia etiology from a variety of genetic variants accessible to WGS but not by previous technologies. Our results suggest that ultra-rare structural variants that affect the boundaries of topologically associated domains (TADs) increase risk for schizophrenia. Alterations in TAD boundaries may lead to dysregulation of gene expression. Future mechanistic studies will be needed to determine the precise functional effects of these variants on biology.

55 citations


Cited by
Journal Article
Fumio Tajima
30 Oct 1989-Genomics
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in a population.

11,521 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Book ChapterDOI
01 Jan 2009
TL;DR: This book discusses the effects of cross-fertilisation and self-fertilisation in plants, including the heights and weights of crossed and self-fertilised plants and their production of seeds.
Abstract: 1. Introductory remarks 2. Convolvulaceae 3. Scrophulariaceae, Gesneriaceae, Labiatae, etc. 4. Cruciferae, Papaveraceae, Resedaceae, etc. 5. Geraniaceae, Leguminosae, Onagraceae, etc. 6. Solanaceae, Primulaceae, Polygoneae, etc. 7. Summary of the heights and weights of the crossed and self-fertilised plants 8. Difference between crossed and self-fertilised plants in constitutional vigour and in other respects 9. The effects of cross-fertilisation and self-fertilisation on the production of seeds 10. Means of fertilisation 11. The habits of insects in relation to the fertilisation of flowers 12. General results Index.

1,224 citations

Journal ArticleDOI
TL;DR: Hifiasm is a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph.
Abstract: Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.

884 citations