Author
Dongmei Yu
Other affiliations: Broad Institute, Massachusetts Institute of Technology, VA Boston Healthcare System
Bio: Dongmei Yu is an academic researcher from Harvard University. The author has contributed to research in topics: Genome-wide association study & Tourette syndrome. The author has an h-index of 23 and has co-authored 44 publications receiving 12,541 citations. Previous affiliations of Dongmei Yu include Broad Institute & Massachusetts Institute of Technology.
Papers
Harvard University1, Broad Institute2, Boston Children's Hospital3, University of Washington4, University of Arizona5, Cardiff University6, Google7, Icahn School of Medicine at Mount Sinai8, Samsung Medical Center9, Vertex Pharmaceuticals10, University of Michigan11, University of Cambridge12, State University of New York Upstate Medical University13, Karolinska Institutet14, University of Eastern Finland15, Wellcome Trust Centre for Human Genetics16, University of Oxford17, Cedars-Sinai Medical Center18, University of Ottawa19, University of Pennsylvania20, University of North Carolina at Chapel Hill21, University of Helsinki22, University of California, San Diego23, University of Mississippi Medical Center24
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
8,758 citations
Harvard University1, Broad Institute2, Cardiff University3, Icahn School of Medicine at Mount Sinai4, University of Michigan5, University of Cambridge6, Karolinska Institutet7, University of Eastern Finland8, University of Oxford9, Cedars-Sinai Medical Center10, University of Ottawa11, University of Helsinki12, University of Pennsylvania13, University of North Carolina at Chapel Hill14, University of Mississippi Medical Center15
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.
1,552 citations
Verneri Anttila1,2, Brendan Bulik-Sullivan1,2 +717 more • Institutions (270)
TL;DR: It is demonstrated that, in the general population, the personality trait neuroticism is significantly correlated with almost every psychiatric disorder and migraine, and it is shown that both psychiatric and neurological disorders have robust correlations with cognitive and personality measures.
Abstract: Disorders of the brain can exhibit considerable epidemiological comorbidity and often share symptoms, provoking debate about their etiologic overlap. We quantified the genetic sharing of 25 brain disorders from genome-wide association studies of 265,218 patients and 784,643 control participants and assessed their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders share common variant risk, whereas neurological disorders appear more distinct from one another and from the psychiatric disorders. We also identified significant sharing between disorders and a number of brain phenotypes, including cognitive measures. Further, we conducted simulations to explore how statistical power, diagnostic misclassification, and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a risk factor for brain disorders and the value of heritability-based methods in understanding their etiology.
1,357 citations
TL;DR: Genetic influences on psychiatric disorders transcend diagnostic boundaries, suggesting substantial pleiotropy of contributing loci within genes that show heightened expression in the brain throughout the lifespan, beginning prenatally in the second trimester, and play prominent roles in neurodevelopmental processes.
781 citations
TL;DR: A meta-analysis from two independent OCD consortia, investigating a total of 2688 individuals of European ancestry with OCD and 7037 genomically matched controls, concludes that the largest single OCD genome-wide study to date represents a major integrative step in elucidating the genetic causes of OCD.
Abstract: Two obsessive-compulsive disorder (OCD) genome-wide association studies (GWASs) have been published by independent OCD consortia, the International Obsessive-Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and the OCD Collaborative Genetics Association Study (OCGAS), but many of the top-ranked signals were supported in only one study. We therefore conducted a meta-analysis from the two consortia, investigating a total of 2688 individuals of European ancestry with OCD and 7037 genomically matched controls. No single-nucleotide polymorphisms (SNPs) reached genome-wide significance. However, in comparison with the two individual GWASs, the distribution of P-values shifted toward significance. The top haplotypic blocks were tagged with rs4733767 (P=7.1 × 10⁻⁷; odds ratio (OR)=1.21; confidence interval (CI): 1.12-1.31, CASC8/CASC11), rs1030757 (P=1.1 × 10⁻⁶; OR=1.18; CI: 1.10-1.26, GRID2) and rs12504244 (P=1.6 × 10⁻⁶; OR=1.18; CI: 1.11-1.27, KIT). Variants located in or near the genes ASB13, RSPO4, DLGAP1, PTPRD, GRIK2, FAIM2 and CDH20, identified in linkage peaks and the original GWASs, were among the top signals. Polygenic risk scores for each individual study predicted case-control status in the other by explaining 0.9% (P=0.003) and 0.3% (P=0.0009) of the phenotypic variance in OCGAS and the European IOCDF-GC target samples, respectively. The common SNP heritability in the combined OCGAS and IOCDF-GC sample was estimated to be 0.28 (s.e.=0.04). Strikingly, ∼65% of the SNP-based heritability in the OCGAS sample was accounted for by SNPs with minor allele frequencies of ≥40%. This joint analysis constituting the largest single OCD genome-wide study to date represents a major integrative step in elucidating the genetic causes of OCD.
356 citations
Cited by
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
4,913 citations
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
4,489 citations
Max Planck Society1, Broad Institute2, University of California, Berkeley3, European Bioinformatics Institute4, National Institutes of Health5, University of Massachusetts Medical School6, University of Washington7, Spanish National Research Council8, University of Montana9, Croatian Academy of Sciences and Arts10, University of Oviedo11, University of Bonn12, Emory University13, University College Cork14, Harvard University15
TL;DR: The genomic data suggest that Neandertals mixed with modern human ancestors some 120,000 years ago, leaving traces of Neandertal DNA in contemporary humans, and that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.
Abstract: Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.
3,575 citations
TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
3,289 citations
TL;DR: Examination of the oral and gut microbiome of melanoma patients undergoing anti-programmed cell death 1 protein (PD-1) immunotherapy suggested enhanced systemic and antitumor immunity in responding patients with a favorable gut microbiome as well as in germ-free mice receiving fecal transplants from responding patients.
Abstract: Preclinical mouse models suggest that the gut microbiome modulates tumor response to checkpoint blockade immunotherapy; however, this has not been well-characterized in human cancer patients. Here we examined the oral and gut microbiome of melanoma patients undergoing anti-programmed cell death 1 protein (PD-1) immunotherapy (n = 112). Significant differences were observed in the diversity and composition of the patient gut microbiome of responders versus nonresponders. Analysis of patient fecal microbiome samples (n = 43, 30 responders, 13 nonresponders) showed significantly higher alpha diversity (P < 0.01) and relative abundance of bacteria of the Ruminococcaceae family (P < 0.01) in responding patients. Metagenomic studies revealed functional differences in gut bacteria in responders, including enrichment of anabolic pathways. Immune profiling suggested enhanced systemic and antitumor immunity in responding patients with a favorable gut microbiome as well as in germ-free mice receiving fecal transplants from responding patients. Together, these data have important implications for the treatment of melanoma patients with immune checkpoint inhibitors.
2,791 citations