Showing papers by "David Altshuler" published in 2015
••
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
12,661 citations
01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
3,247 citations
••
Harvard University1, Broad Institute2, Cardiff University3, Icahn School of Medicine at Mount Sinai4, University of Michigan5, University of Cambridge6, Karolinska Institutet7, University of Eastern Finland8, University of Oxford9, Cedars-Sinai Medical Center10, University of Ottawa11, University of Helsinki12, University of Pennsylvania13, University of North Carolina at Chapel Hill14, University of Mississippi Medical Center15
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.
1,552 citations
••
Broad Institute1, Harvard University2, Washington University in St. Louis3, University of Copenhagen4, University of Milan5, University of Oxford6, University of North Carolina at Chapel Hill7, Fred Hutchinson Cancer Research Center8, University of Verona9, University of Ottawa10, University of Cambridge11, Memorial Hospital of South Bend12, University of Amsterdam13, University of Leicester14, Technische Universität München15, University of Lübeck16, Duke University17, University of Western Ontario18, Heidelberg University19, Medical University of Graz20, Synlab Group21, National Institutes of Health22, University of Pennsylvania23, University of Alabama at Birmingham24, University of Minnesota25, Wake Forest University26, Stanford University27, University of Mississippi28, Karolinska Institutet29, Merck & Co.30, University of Washington31, Group Health Cooperative32, University of Virginia33, University of Vermont34, Boston University35, University of Missouri–Kansas City36, University of Southern California37, Cleveland Clinic38, Ohio State University39, University of Texas Health Science Center at Houston40, University of Michigan41
TL;DR: Kathiresan et al. used exome sequencing of nearly 10,000 people to identify alleles associated with early-onset myocardial infarction; mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) were associated with disease risk.
Abstract: Exome sequence analysis of nearly 10,000 people was carried out to identify alleles associated with early-onset myocardial infarction; mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) were associated with disease risk, identifying the key roles of low-density lipoprotein cholesterol and metabolism of triglyceride-rich lipoproteins. Sekar Kathiresan and colleagues use exome sequencing of nearly 10,000 people to probe the contribution of multiple rare mutations within a gene to risk for myocardial infarction at a population level. They find that mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) are associated with disease risk. When compared with non-carriers, LDLR mutation carriers had higher plasma levels of LDL cholesterol, whereas APOA5 mutation carriers had higher plasma levels of triglycerides. As well as confirming that APOA5 is a myocardial infarction gene, this work informs the design and conduct of rare-variant association studies for complex diseases. Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance (refs 1, 2). When MI occurs early in life, genetic inheritance is a major component to risk (ref. 1). Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families (refs 3–8), whereas common variants at more than 45 loci have been associated with MI risk in the population (refs 9–15). Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance.
At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol (ref. 16). Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase (refs 15, 17) and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
521 citations
••
TL;DR: This paper performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry, and identified 49 distinct association signals at these loci including five mapping in or near KCNQ1.
Abstract: We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.
370 citations
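The abstract above refers to 'credible sets' without describing how they are built. A minimal sketch of one common construction follows: rank the variants at a signal by posterior probability and keep the smallest group whose cumulative probability reaches a chosen coverage. The variant names, posterior values, and the 99% coverage are hypothetical illustrations, not figures from the paper, and the paper's own procedure may differ.

```python
def credible_set(posteriors: dict[str, float], coverage: float = 0.99) -> list[str]:
    """Return the smallest set of variants whose summed posterior reaches `coverage`.

    `posteriors` maps variant IDs to (possibly unnormalised) posterior weights.
    """
    if not posteriors:
        return []
    total = sum(posteriors.values())
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: list[str] = []
    cumulative = 0.0
    for variant, weight in ranked:
        chosen.append(variant)
        cumulative += weight / total  # normalise so the weights sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for four variants at a single association signal.
example = {"var_a": 0.90, "var_b": 0.06, "var_c": 0.035, "var_d": 0.005}
print(credible_set(example))
```

A smaller credible set means the association data localise the signal to fewer candidate variants, which is what makes follow-up functional work tractable.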
••
Medical Research Council1, National Institute for Health Research2, Massachusetts Institute of Technology3, University of Washington4, Harvard University5, Max Delbrück Center for Molecular Medicine6, National Institutes of Health7, European Bioinformatics Institute8, St. Vincent's Health System9, Victor Chang Cardiac Research Institute10, University of New South Wales11, Harefield Hospital12, University of Louisville13, University of Mississippi Medical Center14, Charité15
TL;DR: It is shown that TTNtv is the most common genetic cause of DCM in ambulant patients in the community; the study identifies clinically important manifestations of TTNtv-positive DCM and defines the penetrance and outcomes of TTNtv in the general population.
Abstract: The recent discovery of heterozygous human mutations that truncate full-length titin (TTN, an abundant structural, sensory, and signaling filament in muscle) as a common cause of end-stage dilated cardiomyopathy (DCM) promises new prospects for improving heart failure management. However, realization of this opportunity has been hindered by the burden of TTN-truncating variants (TTNtv) in the general population and uncertainty about their consequences in health or disease. To elucidate the effects of TTNtv, we coupled TTN gene sequencing with cardiac phenotyping in 5267 individuals across the spectrum of cardiac physiology and integrated these data with RNA and protein analyses of human heart tissues. We report diversity of TTN isoform expression in the heart, define the relative inclusion of TTN exons in different isoforms (using the TTN transcript annotations available at http://cardiodb.org/titin), and demonstrate that these data, coupled with the position of the TTNtv, provide a robust strategy to discriminate pathogenic from benign TTNtv. We show that TTNtv is the most common genetic cause of DCM in ambulant patients in the community, identify clinically important manifestations of TTNtv-positive DCM, and define the penetrance and outcomes of TTNtv in the general population. By integrating genetic, transcriptome, and protein analyses, we provide evidence for a length-dependent mechanism of disease. These data inform diagnostic criteria and management strategies for TTNtv-positive DCM patients and for TTNtv that are identified as incidental findings.
341 citations
••
University of Bonn1, Columbia University2, Broad Institute3, Harvard University4, Utrecht University5, German Center for Neurodegenerative Diseases6, University of Texas MD Anderson Cancer Center7, University of Minnesota8, University of Colorado Denver9, University of California, San Francisco10, University of Duisburg-Essen11, University of Münster12, Charité13, Ludwig Maximilian University of Munich14, Dartmouth College15, North Shore-LIJ Health System16, University of Antwerp17
TL;DR: The first meta-analysis in AA is performed by combining data from two genome-wide association studies (GWAS), and replication with supplemented ImmunoChip data for a total of 3,253 cases and 7,543 controls, finding new molecular pathways disrupted in AA that support the causal role of aberrant immune processes in AA.
Abstract: Alopecia areata (AA) is a prevalent autoimmune disease with 10 known susceptibility loci. Here we perform the first meta-analysis of research on AA by combining data from two genome-wide association studies (GWAS), and replication with supplemented ImmunoChip data for a total of 3,253 cases and 7,543 controls. The strongest region of association is the major histocompatibility complex, where we fine-map four independent effects, all implicating human leukocyte antigen-DR as a key aetiologic driver. Outside the major histocompatibility complex, we identify two novel loci that exceed the threshold of statistical significance, containing ACOXL/BCL2L11(BIM) (2q13); GARP (LRRC32) (11q13.5), as well as a third nominally significant region SH2B3(LNK)/ATXN2 (12q24.12). Candidate susceptibility gene expression analysis in these regions demonstrates expression in relevant immune cells and the hair follicle. We integrate our results with data from seven other autoimmune diseases and provide insight into the alignment of AA within these disorders. Our findings uncover new molecular pathways disrupted in AA, including autophagy/apoptosis, transforming growth factor beta/Tregs and JAK kinase signalling, and support the causal role of aberrant immune processes in AA.
210 citations
••
TL;DR: The results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
Abstract: Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
132 citations
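The exome-wide significance level quoted in the abstract above (2.5 × 10⁻⁶) matches what a Bonferroni correction gives for a family-wise error rate of 0.05 spread over roughly 20,000 gene-level tests. The abstract does not state how the threshold was derived, so both the gene count and the 0.05 family-wise rate in this sketch are assumptions.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level that caps the family-wise error rate at `family_alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests


# Assumption: about 20,000 protein-coding genes, each tested once.
ASSUMED_GENE_COUNT = 20_000
alpha_per_gene = bonferroni_threshold(0.05, ASSUMED_GENE_COUNT)
print(f"{alpha_per_gene:.1e}")
```

Under these assumptions the per-gene threshold comes out at the level quoted in the abstract, which illustrates why gene-based tests need large samples: each gene must clear a bar four orders of magnitude stricter than the nominal 0.05.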
••
TL;DR: IMP2 limits longevity and regulates nutrient and energy metabolism in the mouse by controlling the translation of its client mRNAs.
130 citations
••
TL;DR: It is found that the AMY1 copy number in an individual's genome is generally even (rather than odd) and partially correlates with nearby SNPs, which do not associate with body mass index (BMI).
Abstract: Hundreds of genes reside in structurally complex, poorly understood regions of the human genome. One such region contains the three amylase genes (AMY2B, AMY2A and AMY1) responsible for digesting starch into sugar. Copy number of AMY1 is reported to be the largest genomic influence on obesity, although genome-wide association studies for obesity have found this locus unremarkable. Using whole-genome sequence analysis, droplet digital PCR and genome mapping, we identified eight common structural haplotypes of the amylase locus that suggest its mutational history. We found that the AMY1 copy number in an individual's genome is generally even (rather than odd) and partially correlates with nearby SNPs, which do not associate with body mass index (BMI). We measured amylase gene copy number in 1,000 obese or lean Estonians and in 2 other cohorts totaling ∼3,500 individuals. We had 99% power to detect the lower bound of the reported effects on BMI, yet found no association.
127 citations
••
TL;DR: A luminescent insulin secretion assay, created by inserting Gaussia luciferase into the C-peptide portion of proinsulin, enables large-scale investigations of beta-cell function and requires 40-fold less time and expense than the traditional ELISA.
••
Wellcome Trust Centre for Human Genetics1, University of Michigan2, University of Oxford3, Broad Institute4, University of Texas Health Science Center at Houston5, University of Copenhagen6, University of Chicago7, McGill University8, Harvard University9, Lund University10, King's College London11, Science for Life Laboratory12, Texas Biomedical Research Institute13, University of California, San Francisco14, University of Mississippi Medical Center15, University of Southern Denmark16, Ninewells Hospital17, Uppsala University18, Helsinki University Central Hospital19, Steno Diabetes Center20, Aalborg University21, University of Eastern Finland22, National Institutes of Health23, University of North Carolina at Chapel Hill24, Cedars-Sinai Medical Center25, King Abdulaziz University26, University of Southern California27, Boston University28, Massachusetts Institute of Technology29
TL;DR: In this article, the authors analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry and identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal.
Abstract: Genome wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5 × 10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF) = 1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.
••
TL;DR: A strong enrichment of drug target genes associated with T2D was detected, primarily driven by insulin and thiazolidinedione targets, which was replicated in an independent meta-analysis (Metabochip), illustrating the utility of this approach in identifying potential side effects.
Abstract: Genome-wide association studies (GWAS) have uncovered >65 common variants associated with type 2 diabetes (T2D); however, their relevance for drug development is not yet clear. Of note, the first two T2D-associated loci (PPARG and KCNJ11/ABCC8) encode known targets of antidiabetes medications. We therefore tested whether other genes/pathways targeted by antidiabetes drugs are associated with T2D. We compiled a list of 102 genes in pathways targeted by marketed antidiabetic medications and applied Gene Set Enrichment Analysis (MAGENTA [Meta-Analysis Gene-set Enrichment of variaNT Associations]) to this gene set, using available GWAS meta-analyses for T2D and seven quantitative glycemic traits. We detected a strong enrichment of drug target genes associated with T2D (P = 2 × 10⁻⁵; 14 potential new associations), primarily driven by insulin and thiazolidinedione (TZD) targets, which was replicated in an independent meta-analysis (Metabochip). The glycemic traits yielded no enrichment. The T2D enrichment signal was largely due to multiple genes of modest effects (P = 4 × 10⁻⁴, after removing known loci), highlighting new associations for follow-up (ACSL1, NFKB1, SLC2A2, incretin targets). Furthermore, we found that TZD targets were enriched for LDL cholesterol associations, illustrating the utility of this approach in identifying potential side effects. These results highlight the potential biomedical relevance of genes revealed by GWAS and may provide new avenues for tailored therapy and T2D treatment design.
••
TL;DR: Delayed failures of structural augmentation with cement during kyphoplasty do occur and can lead to additional surgeries, and a possible predictive index may include wall integrity of the vertebral body, competency of the posterior tension band, and location of the kyphoplasty at a junctional spinal level.
Abstract: OBJECT Pathological compression fractures in cancer patients cause significant pain and disability. Spinal metastases affect quality of life near the end of life and may require multiple procedures, including medical palliative care and open surgical decompression and fixation. An increasingly popular minimally invasive technique to treat metastatic instabilities is kyphoplasty. Even though it may alleviate pain due to pathological fractures, it may fail. However, delayed kyphoplasty failures with retropulsed cement and neural element compression have not been well reported. Such failures necessitate open surgical decompression and stabilization, and cement inserted during the kyphoplasty complicates salvage surgeries in patients with a disease-burdened spine. The authors sought to examine the incidence of delayed failure of structural kyphoplasty in a series of cement augmentations for pathological compression fractures. The goal was to identify risk predictors by analyzing patient and disease characteristics to reduce kyphoplasty failure and to prevent excessive surgical procedures at the end of life. METHODS The authors retrospectively reviewed the records of all patients with metastatic cancer from 2010 to 2013 who had undergone a procedure involving cement augmentation for a pathological compression fracture at their institution. The authors examined the characteristics of the patients, diseases, and radiographic fractures. RESULTS In total, 37 patients underwent cement augmentation in 75 spinal levels during 45 surgeries. Four patients had delayed structural kyphoplasty failure necessitating surgical decompression and fusion. The mean time to kyphoplasty failure was 2.88 ± 1.24 months. The mean loss of vertebral body height was 16% in the patients in whom kyphoplasty failed and 32% in patients in whom kyphoplasty did not fail. No posterior intraoperative cement extravasation was observed in the patients in whom kyphoplasty had failed. 
The mean spinal instability neoplastic score was 10.8 in the patients in whom kyphoplasty failed and 10.1 in those in whom kyphoplasty did not fail. Approximately 50% of the kyphoplasty failures occurred at junctional spinal levels. All the patients in whom kyphoplasty failed had fractures in 3 or more cortical walls before treatment, whereas 46% of patients in the nonfailure group had fractures with breaching of 3 or more walls. CONCLUSIONS Although rare, delayed failures of structural augmentation with cement during kyphoplasty do occur and can lead to additional surgeries. A possible predictive index may include wall integrity of the vertebral body, competency of the posterior tension band, and location of the kyphoplasty at a junctional spinal level. Additional studies are required to confirm these findings.