Author

Yu Wang

Bio: Yu Wang is an academic researcher at Vanderbilt University Medical Center. The author has contributed to research on the topics Population and Gene. The author has an h-index of 24 and has co-authored 49 publications receiving 5248 citations. Previous affiliations of Yu Wang include Vanderbilt University and Zhejiang University.


Papers
Journal Article
29 Jan 2009-Nature
TL;DR: An initial analysis of the ∼730-megabase Sorghum bicolor (L.) Moench genome is presented, placing ∼98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information.
Abstract: Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.

2,809 citations

01 Jan 2015
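TL;DR: The contribution of rare and low-frequency variants to human traits is largely unexplored. Sequencing whole genomes or exomes of nearly 10,000 individuals from population-based and disease collections characterizes over 24 million novel sequence variants, yields a highly accurate imputation reference panel, and identifies novel alleles associated with levels of triglycerides, adiponectin and low-density lipoprotein cholesterol.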
Abstract: The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.

824 citations

Journal Article
TL;DR: quanTIseq is a method to quantify the fractions of ten immune cell types from bulk RNA-sequencing data; it was extensively validated in blood and tumor samples using simulated, flow cytometry, and immunohistochemistry data.
Abstract: We introduce quanTIseq, a method to quantify the fractions of ten immune cell types from bulk RNA-sequencing data. quanTIseq was extensively validated in blood and tumor samples using simulated, flow cytometry, and immunohistochemistry data. quanTIseq analysis of 8000 tumor samples revealed that cytotoxic T cell infiltration is more strongly associated with the activation of the CXCR3/CXCL9 axis than with mutational load and that deconvolution-based cell scores have prognostic value in several solid cancers. Finally, we used quanTIseq to show how kinase inhibitors modulate the immune contexture and to reveal immune-cell types that underlie differential patients’ responses to checkpoint blockers. Availability: quanTIseq is available at http://icbi.at/quantiseq.

572 citations

Posted Content
17 Aug 2018-bioRxiv
TL;DR: QuanTIseq, a deconvolution method that quantifies the densities of ten immune cell types from bulk RNA sequencing data and tissue imaging data, is developed and used to show how kinase inhibitors modulate the immune contexture and suggest that it might have predictive value for immunotherapy.
Abstract: The immune contexture has a prognostic value in several cancers and the study of its pharmacological modulation could identify drugs acting synergistically with immune checkpoint blockers. However, the quantification of the immune contexture is hampered by the lack of simple and efficient methods. We developed quanTIseq, a deconvolution method that quantifies the densities of ten immune cell types from bulk RNA sequencing data and tissue imaging data. We performed extensive validation using simulated data, flow cytometry data, and immunohistochemistry data from three cancer cohorts. Analysis of 8,000 samples showed that the activation of the CXCR3/CXCL9 axis, rather than the mutational load, is associated with cytotoxic T cell infiltration. We also show the prognostic value of deconvolution-based immunoscore and T cell/B cell score in several solid cancers. Finally, we used quanTIseq to show how kinase inhibitors modulate the immune contexture, and we suggest that it might have predictive value for immunotherapy.

337 citations

Journal Article
Ayush Giri1, Jacklyn N. Hellwege2, Jacob M. Keaton1,2, Jihwan Park3, Chengxiang Qiu3, Helen R. Warren4,5, Eric S. Torstenson1,2, Csaba P. Kovesdy6, Yan V. Sun7, Otis D. Wilson1,2, Cassianne Robinson-Cohen1, Christianne L. Roumie1, Cecilia P. Chung1, K A Birdwell1,6, Scott M. Damrauer6, Scott L. DuVall, Derek Klarin, Kelly Cho8, Yu Wang1, Evangelos Evangelou9,10, Claudia P. Cabrera4,5, Louise V. Wain5,11, Rojesh Shrestha3, Brian S. Mautz1, Elvis A. Akwo1, Muralidharan Sargurupremraj12, Stéphanie Debette12, Michael Boehnke13, Laura J. Scott13, Jian'an Luan14, J-H. Zhao14, Sara M. Willems14, Sébastien Thériault15, Nabi Shah16,17, Christopher Oldmeadow18, Peter Almgren19, Ruifang Li-Gao20, Niek Verweij21, Thibaud Boutin22, Massimo Mangino23,24, Ioanna Ntalla4, Elena V. Feofanova25, Praveen Surendran14, James P. Cook26, Savita Karthikeyan14, Najim Lahrouchi27, Ching-Ti Liu28, Nuno Sepúlveda29, Tom G. Richardson30, Aldi T. Kraja31, Philippe Amouyel32, Martin Farrall33, Neil Poulter9, Markku Laakso34, Eleftheria Zeggini35, Peter S. Sever36, Robert A. Scott14, Claudia Langenberg14, Nicholas J. Wareham14, David Conen37, C.N.A. Palmer17, John Attia18, Daniel I. Chasman38, Paul M. Ridker38, Olle Melander19, Dennis O. Mook-Kanamori20, P. van der Harst21, Francesco Cucca39, David Schlessinger36, Caroline Hayward22, Tim D. Spector23, M-R. Jarvelin1, Branwen J. Hennig29,40, Nicholas J. Timpson30, W-Q. Wei1, J C Smith1, Yaomin Xu1, Michael E. Matheny, E E Siew1, C M Lindgren27,33,41, K-H. Herzig, George Dedoussis42, Josh C. Denny1, Bruce M. Psaty43, J.M.M. Howson14, Patricia B. Munroe4,5, Christopher Newton-Cheh44, Mark J. Caulfield4,5, Paul Elliott5,9, J M Gaziano45,46, John Concato, P.W.F. Wilson6, Philip S. Tsao46, D.R. Velez Edwards1,2, Katalin Susztak3, Christopher J. O'Donnell38, Adriana M. Hung1,2, Todd L. Edwards1,2
TL;DR: Analysis of blood pressure data from the Million Veteran Program trans-ethnic cohort identifies common and rare variants, and genetically predicted gene expression across multiple tissues associated with systolic, diastolic and pulse pressure in over 775,000 individuals.
Abstract: In this trans-ethnic multi-omic study, we reinterpret the genetic architecture of blood pressure to identify genes, tissues, phenomes and medication contexts of blood pressure homeostasis. We discovered 208 novel common blood pressure SNPs and 53 rare variants in genome-wide association studies of systolic, diastolic and pulse pressure in up to 776,078 participants from the Million Veteran Program (MVP) and collaborating studies, with analysis of the blood pressure clinical phenome in MVP. Our transcriptome-wide association study detected 4,043 blood pressure associations with genetically predicted gene expression of 840 genes in 45 tissues, and mouse renal single-cell RNA sequencing identified upregulated blood pressure genes in kidney tubule cells.

310 citations


Cited by
Journal Article
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4, and 514 more authors (90 institutions)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article
Fumio Tajima1
30 Oct 1989-Genetics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article
Patrick S. Schnable1, Doreen Ware2, Robert S. Fulton3, Joshua C. Stein2, and 156 more authors (18 institutions)
20 Nov 2009-Science
TL;DR: The sequence of the maize genome reveals it to be the most complex genome known to date. The authors also report the correlation of methylation-poor regions with Mu transposon insertions and recombination, and how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state.
Abstract: We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome. These were responsible for the capture and amplification of numerous gene fragments and affect the composition, sizes, and positions of centromeres. We also report on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and/or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state. These analyses inform and set the stage for further investigations to improve our understanding of the domestication and agricultural improvements of maize.

3,761 citations

Journal Article
14 Jan 2010-Nature
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations