Showing papers by "Gonçalo R. Abecasis" published in 2016
••
TL;DR: Improvements to imputation machinery are described that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools.
Abstract: Christian Fuchsberger, Goncalo Abecasis and colleagues describe a new web-based imputation service that enables rapid imputation of large numbers of samples and allows convenient access to large reference panels of sequenced individuals. Their state space reduction provides a computationally efficient solution for genotype imputation with no loss in imputation accuracy.
2,556 citations
••
Wellcome Trust Sanger Institute1, University of Michigan2, University of Oxford3, University of Geneva4, University of Exeter5, Greifswald University Hospital6, National Research Council7, University of Bristol8, University of Colorado Boulder9, University of Washington10, Fred Hutchinson Cancer Research Center11, SUNY Downstate Medical Center12, Erasmus University Rotterdam13, University of Trieste14, VU University Amsterdam15, King's College London16, South London and Maudsley NHS Foundation Trust17, University of Edinburgh18, Harvard University19, National Institutes of Health20, Harokopio University21, Innsbruck Medical University22, Broad Institute23, Lund University24, University of Helsinki25, Norwegian University of Science and Technology26, University of Cambridge27, University of Minnesota28, Technische Universität München29, University of North Carolina at Chapel Hill30, University of Toronto31, McGill University32, Leiden University33, University of Pennsylvania34, University of Groningen35, Utrecht University36, Churchill Hospital37
TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
2,149 citations
01 Jan 2016
TL;DR: In this article, a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry is presented.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
1,261 citations
••
TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform.
Abstract: Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20× improvement in speed compared to a competing method, SHAPEIT2.
1,246 citations
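The positional Burrows-Wheeler transform named in the TL;DR above can be illustrated with a minimal sketch. It shows only the textbook prefix-ordering step on binary haplotypes, not Eagle2's actual data structure; the function name `pbwt_prefix_orders` and the toy haplotypes are assumptions made for illustration.

```python
def pbwt_prefix_orders(haplotypes):
    """Return, for each site, the haplotype order sorted by reversed prefix.

    haplotypes: list of equal-length lists of 0/1 alleles (hypothetical toy input).
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    orders = []
    for site in range(n_sites):
        # Stable partition: haplotypes carrying 0 at this site come first,
        # then those carrying 1, each group keeping its previous relative order.
        zeros = [h for h in order if haplotypes[h][site] == 0]
        ones = [h for h in order if haplotypes[h][site] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


# Toy panel: haplotypes 0 and 3 are identical, so they end up adjacent.
toy_panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
toy_orders = pbwt_prefix_orders(toy_panel)
```

Because haplotypes that share a long match ending at a site sit next to each other in this ordering, candidate matches can be found without comparing every pair of haplotypes.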
••
Christian Fuchsberger1,2, Jason Flannick3,4 +346 more•Institutions (77)
TL;DR: In this paper, the authors performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing for 12,940 individuals from five ancestry groups.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
866 citations
01 Jan 2016
TL;DR: Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies; large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
698 citations
••
University of Pennsylvania1, Pierre-and-Marie-Curie University2, University of Cambridge3, Leiden University Medical Center4, Wellcome Trust Sanger Institute5, British Heart Foundation6, University of Glasgow7, Glasgow Clinical Research Facility8, International Centre for Diarrhoeal Disease Research, Bangladesh9, University of Michigan10, University of Lübeck11, Copenhagen University Hospital12, University of Copenhagen13, National Institutes of Health14, Pennsylvania State University15, University of Hamburg16, Pasteur Institute17, University of Strasbourg18, University of Toulouse19, Ludwig Maximilian University of Munich20, University of Insubria21, Queen's University Belfast22, National Institute for Health Research23, Technische Universität München24, Harvard University25, Washington University in St. Louis26
TL;DR: In this paper, the authors identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI.
Abstract: Scavenger receptor BI (SR-BI) is the major receptor for high-density lipoprotein (HDL) cholesterol (HDL-C). In humans, high amounts of HDL-C in plasma are associated with a lower risk of coronary heart disease (CHD). Mice that have depleted Scarb1 (SR-BI knockout mice) have markedly elevated HDL-C levels but, paradoxically, increased atherosclerosis. The impact of SR-BI on HDL metabolism and CHD risk in humans remains unclear. Through targeted sequencing of coding regions of lipid-modifying genes in 328 individuals with extremely high plasma HDL-C levels, we identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI. The P376L variant impairs posttranslational processing of SR-BI and abrogates selective HDL cholesterol uptake in transfected cells, in hepatocyte-like cells derived from induced pluripotent stem cells from the homozygous subject, and in mice. Large population-based studies revealed that subjects who are heterozygous carriers of the P376L variant have significantly increased levels of plasma HDL-C. P376L carriers have a profound HDL-related phenotype and an increased risk of CHD (odds ratio = 1.79, which is statistically significant).
417 citations
••
TL;DR: A meta-analysis of genome-wide association studies for estimated glomerular filtration rate suggests that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.
Abstract: Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, 19 associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.
409 citations
••
TL;DR: RVTESTS is developed, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals and provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection.
Abstract: Motivation: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data.
Availability and implementation: RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests
Contact: zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com
Supplementary information: Supplementary data are available at Bioinformatics online.
344 citations
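The joint analysis of multiple rare variants in a gene region, described in the abstract above, often starts by collapsing them into one score per sample. The sketch below shows only that collapsing step of a burden-style test. It is not RVTESTS code: the function name `burden_scores`, the frequency cutoff, and the assumption that the alternate allele is the minor allele are all illustrative.

```python
def burden_scores(genotypes, freq_cutoff=0.01):
    """Collapse rare variants in one gene into a per-sample burden score.

    genotypes: list of samples, each a list of alternate-allele counts (0, 1 or 2)
    for the variants in the gene. Assumes the alternate allele is the minor allele.
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    # Alternate-allele frequency of each variant across all samples.
    freqs = [
        sum(sample[j] for sample in genotypes) / (2 * n_samples)
        for j in range(n_variants)
    ]
    # Keep variants that are observed at least once and are at or below the cutoff.
    rare = [j for j, f in enumerate(freqs) if 0 < f <= freq_cutoff]
    # Burden score: number of rare alternate alleles carried by each sample.
    return [sum(sample[j] for j in rare) for sample in genotypes]


# Toy data: variants 0 and 1 have frequency 0.125, variant 2 has frequency 0.875.
toy_genotypes = [
    [0, 1, 2],
    [1, 0, 2],
    [0, 0, 1],
    [0, 0, 2],
]
toy_burden = burden_scores(toy_genotypes, freq_cutoff=0.25)
```

In a full analysis the resulting scores would be tested against the phenotype, for example in a regression with covariates; that step is left out here.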
••
Stanford University1, Wellcome Trust Sanger Institute2, Massachusetts Institute of Technology3, Arizona State University4, American Museum of Natural History5, Broad Institute6, Cornell University7, University of Queensland8, European Bioinformatics Institute9, Yeshiva University10, Virginia Tech11, Wellcome Trust12, University of Michigan13, Harvard University14, Ewha Womans University15, Columbia University16
TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants, and the more complex variants are projected onto it to estimate the number of mutations for each class; the phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined.
Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.
280 citations
••
Harvard University1, Samsung Medical Center2, Boston University3, Washington University in St. Louis4, Broad Institute5, Humanitas University6, University of Mississippi Medical Center7, Technische Universität München8, University of Leicester9, University of Leeds10, University of Ottawa11, Wellcome Trust Centre for Human Genetics12, University of Michigan13, Erasmus University Rotterdam14, University of Pennsylvania15, National Institute for Health Research16, University of Parma17
TL;DR: Beyond CHD, genetically lowered Lp(a) levels are associated with a lower risk of peripheral vascular disease, stroke, heart failure, and aortic stenosis, and show no association with 31 other disorders, including type 2 diabetes and cancer.
••
University of Twente1, VU University Amsterdam2, QIMR Berghofer Medical Research Institute3, University of Minnesota4, University of Edinburgh5, University of Illinois at Urbana–Champaign6, University of Tartu7, Erasmus University Medical Center8, University of Chicago9, University of Tampere10, Western General Hospital11, Martin Luther University of Halle-Wittenberg12, University of Helsinki13, Virginia Commonwealth University14, National Institutes of Health15, Greifswald University Hospital16, Karolinska Institutet17, University of Michigan18, Washington University in St. Louis19, Estonian Academy of Sciences20, Duke University21, University of Bristol22, Radboud University Nijmegen23, University of Greifswald24, University of Queensland25, University of Brescia26, VU University Medical Center27, Wellcome Trust Sanger Institute28, University of Split29, Turku University Hospital30, University of Turku31, Indiana University32, University of Missouri33, Florida State University34, Trinity College, Dublin35, University of Southern Denmark36
TL;DR: A large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts shows that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits.
Abstract: Extraversion is a relatively stable and heritable personality trait associated with numerous psychosocial, lifestyle and health outcomes. Despite its substantial heritability, no genetic variants have been detected in previous genome-wide association (GWA) studies, which may be due to relatively small sample sizes of those studies. Here, we report on a large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts. Extraversion item data from multiple personality inventories were harmonized across inventories and cohorts. No genome-wide significant associations were found at the single nucleotide polymorphism (SNP) level but there was one significant hit at the gene level for a long non-coding RNA site (LOC101928162). Genome-wide complex trait analysis in two large cohorts showed that the additive variance explained by common SNPs was not significantly different from zero, but polygenic risk scores, weighted using linkage information, significantly predicted extraversion scores in an independent cohort. These results show that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits. Future studies are required to further determine which genetic variants, by what modes of gene action, constitute the heritable nature of extraversion.
••
TL;DR: A genome-wide association meta-analysis of 4 QRS traits in up to 73,518 individuals of European ancestry provides new insights into genes and biological pathways controlling myocardial mass and may help identify novel therapeutic targets.
17 Nov 2016
TL;DR: A human genetics study sheds light on how HDL (good) cholesterol protects against cardiovascular disease by identifying a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI.
Abstract: A scavenger that protects the heart. Coronary heart disease is a tale of two forms of plasma cholesterol. In contrast to the well-established effects of “bad” cholesterol (LDL-C), the role of “good” cholesterol (HDL-C) is mysterious. Elevated HDL-C correlates with a lower risk of heart disease, yet drugs that raise HDL-C levels do not reduce risk. Zanoni et al. found that some people with exceptionally high levels of HDL-C carry a rare sequence variant in the gene encoding the major HDL-C receptor, scavenger receptor BI. This variant destroys the receptor's ability to take up HDL-C. Interestingly, people with this variant have a higher risk of heart disease despite having high levels of HDL-C. Science, this issue p. 1166. A human genetics study sheds light on how HDL (good) cholesterol protects against cardiovascular disease. Scavenger receptor BI (SR-BI) is the major receptor for high-density lipoprotein (HDL) cholesterol (HDL-C). In humans, high amounts of HDL-C in plasma are associated with a lower risk of coronary heart disease (CHD). Mice that have depleted Scarb1 (SR-BI knockout mice) have markedly elevated HDL-C levels but, paradoxically, increased atherosclerosis. The impact of SR-BI on HDL metabolism and CHD risk in humans remains unclear. Through targeted sequencing of coding regions of lipid-modifying genes in 328 individuals with extremely high plasma HDL-C levels, we identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI. The P376L variant impairs posttranslational processing of SR-BI and abrogates selective HDL cholesterol uptake in transfected cells, in hepatocyte-like cells derived from induced pluripotent stem cells from the homozygous subject, and in mice. Large population-based studies revealed that subjects who are heterozygous carriers of the P376L variant have significantly increased levels of plasma HDL-C. P376L carriers have a profound HDL-related phenotype and an increased risk of CHD (odds ratio = 1.79, which is statistically significant).
••
National Institutes of Health1, Université de Montréal2, University of Greifswald3, Kanazawa University4, Broad Institute5, Harvard University6, Johns Hopkins University7, Icahn School of Medicine at Mount Sinai8, GlaxoSmithKline9, University of Minnesota10, University of Texas Health Science Center at Houston11, Vanderbilt University12, University of Washington13, University of North Carolina at Chapel Hill14, University of Virginia15, University of Edinburgh16, Erasmus University Rotterdam17, Imperial College London18, University of Ioannina19, University of Turku20, University of Vermont21, Morehouse School of Medicine22, University of Michigan23, Boston University24, Pennsylvania State University25, King Abdulaziz University26, Queen Mary University of London27, University of Leicester28, Glenfield Hospital29, Technische Universität München30, University of Lübeck31, University of Iceland32, Wake Forest University33, University of California, Los Angeles34, Baylor College of Medicine35, Stanford University36, University of Mississippi37, University of Tartu38, Stony Brook University39, Lund University40, Uppsala University41, University of Auckland42, Group Health Cooperative43, Greifswald University Hospital44, University of Wisconsin–Milwaukee45, Fred Hutchinson Cancer Research Center46
TL;DR: The authors' large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.
Abstract: Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets’ important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.
••
TL;DR: From the analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but the collective experience offers some valuable lessons for WGS initiatives.
Abstract: Massively parallel whole-genome sequencing (WGS) data have ushered in a new era in human genetics. These data are now being used to understand the role of rare variants in complex traits and to advance the goals of precision medicine. The technological and computing advances that have enabled us to generate WGS data on thousands of individuals have also outpaced our ability to perform analyses in scientifically and statistically rigorous and thoughtful ways. The past several years have witnessed the application of whole-exome sequencing (WES) to complex traits and diseases. From our analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but our collective experience offers some valuable lessons for WGS initiatives. These include caveats associated with generating automated pipelines for quality control and analysis of rare variants; the importance of studying minority populations; sample size requirements and efficient study designs for identifying rare-variant associations; and the significance of incidental findings in population-based genetic research. With the ESP as an example, we offer guidance and a framework on how to conduct a large-scale association study in the era of WGS.
••
Janina S. Ried, Janina M. Jeff1, Audrey Y. Chu2, Jennifer L. Bragg-Gresham3 +327 more•Institutions (76)
TL;DR: In this paper, the authors examined whether genetic variants affect body shape as a composite phenotype that is represented by a combination of anthropometric traits, and identified six novel loci: LEMD2 and CD47 for AvPC1, RPS6KA5/C14orf159 and GANAB for AvPC3, and ARL15 and ANP32 for AvPC4.
Abstract: Large consortia have revealed hundreds of genetic loci associated with anthropometric traits, one trait at a time. We examined whether genetic variants affect body shape as a composite phenotype that is represented by a combination of anthropometric traits. We developed an approach that calculates averaged PCs (AvPCs) representing body shape derived from six anthropometric traits (body mass index, height, weight, waist and hip circumference, waist-to-hip ratio). The first four AvPCs explain >99% of the variability, are heritable, and associate with cardiometabolic outcomes. We performed genome-wide association analyses for each body shape composite phenotype across 65 studies and meta-analysed summary statistics. We identify six novel loci: LEMD2 and CD47 for AvPC1, RPS6KA5/C14orf159 and GANAB for AvPC3, and ARL15 and ANP32 for AvPC4. Our findings highlight the value of using multiple traits to define complex phenotypes for discovery, which are not captured by single-trait analyses, and may shed light onto new pathways.
••
Mayo Clinic1, University of Texas at Brownsville2, University of Texas Southwestern Medical Center3, Yale University4, Cincinnati Children's Hospital Medical Center5, University of Texas Health Science Center at Houston6, University of Exeter7, University of Michigan8, University of Texas Health Science Center at San Antonio9, Harvard University10, University of California, San Francisco11
TL;DR: It is shown that both PSAP reduction and overexpression lead to significantly elevated extracellular PGRN levels, and PSAP-induced changes in PGRN levels and oligomerization replicate in human-derived fibroblasts obtained from a GRN mutation carrier, further supporting PSAP as a potential PGRN-related therapeutic target.
Abstract: Progranulin (GRN) loss-of-function mutations leading to progranulin protein (PGRN) haploinsufficiency are prevalent genetic causes of frontotemporal dementia. Reports also indicated PGRN-mediated neuroprotection in models of Alzheimer's and Parkinson's disease; thus, increasing PGRN levels is a promising therapeutic for multiple disorders. To uncover novel PGRN regulators, we linked whole-genome sequence data from 920 individuals with plasma PGRN levels and identified the prosaposin (PSAP) locus as a new locus significantly associated with plasma PGRN levels. Here we show that both PSAP reduction and overexpression lead to significantly elevated extracellular PGRN levels. Intriguingly, PSAP knockdown increases PGRN monomers, whereas PSAP overexpression increases PGRN oligomers, partly through a protein-protein interaction. PSAP-induced changes in PGRN levels and oligomerization replicate in human-derived fibroblasts obtained from a GRN mutation carrier, further supporting PSAP as a potential PGRN-related therapeutic target. Future studies should focus on addressing the relevance and cellular mechanism by which PGRN oligomeric species provide neuroprotection.
••
University of Texas at Austin1, University of Michigan2, University of Texas Health Science Center at Houston3, Harvard University4, University of Exeter5, Texas Biomedical Research Institute6, Baylor College of Medicine7, University of Texas Health Science Center at San Antonio8, Yale University9, University of Western Australia10, Kuwait University11, University of Pennsylvania12
TL;DR: The GAW19 data are an expansion of the data used at GAW18, which included the family-based whole genome sequence, blood pressure, and simulated phenotypes, but not the gene expression data or the set of 1943 unrelated individuals with exome sequence.
Abstract: The Genetic Analysis Workshops (GAW) are a forum for development, testing, and comparison of statistical genetic methods and software. Each contribution to the workshop includes an application to a specified data set. Here we describe the data distributed for GAW19, which focused on analysis of human genomic and transcriptomic data. GAW19 data were donated by the T2D-GENES Consortium and the San Antonio Family Heart Study and included whole genome and exome sequences for odd-numbered autosomes, measures of gene expression, systolic and diastolic blood pressures, and related covariates in two Mexican American samples. These two samples were a collection of 20 large families with whole genome sequence and transcriptomic data and a set of 1943 unrelated individuals with exome sequence. For each sample, simulated phenotypes were constructed based on the real sequence data. ‘Functional’ genes and variants for the simulations were chosen based on observed correlations between gene expression and blood pressure. The simulations focused primarily on additive genetic models but also included a genotype-by-medication interaction. A total of 245 genes were designated as ‘functional’ in the simulations with a few genes of large effect and most genes explaining < 1 % of the trait variation. An additional phenotype, Q1, was simulated to be correlated among related individuals, based on theoretical or empirical kinship matrices, but was not associated with any sequence variants. Two hundred replicates of the phenotypes were simulated. The GAW19 data are an expansion of the data used at GAW18, which included the family-based whole genome sequence, blood pressure, and simulated phenotypes, but not the gene expression data or the set of 1943 unrelated individuals with exome sequence.
••
Broad Institute1, Harvard University2, Pennsylvania State University3, Beth Israel Deaconess Medical Center4, Mayo Clinic5, University of Michigan6, University of Virginia Health System7, University of Florida8, University of Washington9, University of Cincinnati Academic Health Center10, Jagiellonian University Medical College11, University of Graz12, Utrecht University13, Radboud University Nijmegen14, University of Brescia15, Autonomous University of Barcelona16, University of Arizona17, University of Maryland, Baltimore18, National Institutes of Health19, Wake Forest University20
TL;DR: The hypothesis that CETP DNA sequence variants associated with higher HDL‐C also increase risk for ICH is tested.
Abstract: Objective
In observational epidemiologic studies, higher plasma high-density lipoprotein cholesterol (HDL-C) has been associated with increased risk of intracerebral hemorrhage (ICH). DNA sequence variants that decrease cholesteryl ester transfer protein (CETP) gene activity increase plasma HDL-C; as such, medicines that inhibit CETP and raise HDL-C are in clinical development. Here, we test the hypothesis that CETP DNA sequence variants associated with higher HDL-C also increase risk for ICH.
Methods
We performed two candidate-gene analyses of CETP. First, we tested individual CETP variants in a discovery cohort of 1149 ICH cases and 1238 controls from 3 studies, followed by replication in 1625 cases and 1845 controls from 5 studies. Second, we constructed a genetic risk score composed of 7 independent variants at the CETP locus and tested this score for association with HDL-C as well as ICH risk.
Results
Twelve variants within CETP demonstrated nominal association with ICH, with the strongest association at the rs173539 locus (odds ratio (OR) 1.25, standard error (SE) 0.06, p=6.0×10⁻⁴) with no heterogeneity across studies (I²=0%). This association was replicated in patients of European ancestry (p=0.03). A genetic score of CETP variants found to increase HDL-C by ∼2.85 mg/dL in the Global Lipids Genetics Consortium was strongly associated with ICH risk (OR 1.86, SE 0.13, p=1.39×10⁻⁶).
Interpretation
Genetic variants in CETP associated with increased HDL-C raise the risk of ICH. Given ongoing therapeutic development in CETP inhibition and other HDL-raising strategies, further exploration of potential adverse cerebrovascular outcomes may be warranted.
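The abstract does not say how the 7 variants were combined into a score. A common construction, assumed here, is a weighted sum of effect-allele counts. The function name, the per-allele weights, and the genotype below are hypothetical illustrations, not values from the study.

```python
from typing import Sequence


def genetic_risk_score(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted allele-count score for one individual.

    allele_counts: number of HDL-C-raising alleles (0, 1 or 2) at each variant.
    weights: per-allele effect size for each variant (hypothetical here).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    if any(c not in (0, 1, 2) for c in allele_counts):
        raise ValueError("allele counts must be 0, 1 or 2")
    return sum(c * w for c, w in zip(allele_counts, weights))


# Hypothetical individual genotyped at 7 variants, with made-up equal weights.
example_counts = [0, 1, 2, 1, 0, 2, 1]
example_weights = [0.5] * 7
example_score = genetic_risk_score(example_counts, example_weights)
```

In an analysis of this kind the score would then be entered as a predictor in a regression on HDL-C or on case/control status; that step is omitted here.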
••
University of Oxford1, Wellcome Trust Centre for Human Genetics2, University of Michigan3, Wellcome Trust Sanger Institute4, University of Tokyo5, University of Cambridge6, Ealing Hospital7, Harvard University8, Boston University9, Institute of Genomics and Integrative Biology10, Broad Institute11, Centre national de la recherche scientifique12, University of Texas Health Science Center at Houston13, University of North Carolina at Chapel Hill14, University of Chicago15, Texas Biomedical Research Institute16, University of California, San Francisco17, Systems Research Institute18, University of Haifa19, Albert Einstein College of Medicine20, The Chinese University of Hong Kong21, University of Mississippi Medical Center22, Jawaharlal Nehru University23, Hallym University24, Seoul National University25, National Institutes of Health26, Imperial College Healthcare27, University of Pennsylvania28, National University of Singapore29, Vanderbilt University30, Beta31, Imperial College London32, Life Sciences Institute33, University of Liverpool34
TL;DR: Transancestral fine-mapping was undertaken in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent to provide insight into the mechanisms through which type 2 diabetes association signals are mediated, and to suggest future routes to understanding the biology of specific disease susceptibility loci.
Abstract: To gain insight into potential regulatory mechanisms through which the effects of variants at four established type 2 diabetes (T2D) susceptibility loci (CDKAL1, CDKN2A-B, IGF2BP2 and KCNQ1) are mediated, we undertook transancestral fine-mapping in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent. Through high-density imputation and conditional analyses, we identified seven distinct association signals at these four loci, each with allelic effects on T2D susceptibility that were homogenous across ancestry groups. By leveraging differences in the structure of linkage disequilibrium between diverse populations, and increased sample size, we localised the variants most likely to drive each distinct association signal. We demonstrated that integration of these genetic fine-mapping data with genomic annotation can highlight potential causal regulatory elements in T2D-relevant tissues. These analyses provide insight into the mechanisms through which T2D association signals are mediated, and suggest future routes to understanding the biology of specific disease susceptibility loci.
••
TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform.
Abstract: Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing within a genotyped cohort, an approach that can attain high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here, we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ≈20x speedup and ≈10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2x the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
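As a rough illustration of the positional Burrows-Wheeler transform mentioned above, the sketch below builds the positional prefix ordering of a small binary haplotype panel. It shows only the basic ordering step of the PBWT, not Eagle2's data structure or phasing algorithm; the function name and the toy panel are assumptions made for this example.

```python
from typing import List


def pbwt_prefix_orders(haplotypes: List[List[int]]) -> List[List[int]]:
    """Return the haplotype ordering after each site of a 0/1 panel.

    orders[k] lists haplotype indices sorted by their reversed prefix
    (sites k-1, k-2, ..., 0), so haplotypes that share a long match
    ending just before site k sit next to each other.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition on the allele at site k: zeros first, then ones.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


# Toy panel: haplotypes 0 and 3 are identical and end up adjacent.
panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
final_order = pbwt_prefix_orders(panel)[-1]
```

Each site is processed with one pass over the current ordering, so the whole construction is linear in the size of the panel, which is what makes this representation attractive for large reference panels.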
••
TL;DR: The population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland, and greater allele sharing with mainland populations on the X chromosome than on the autosomes provides evidence for a sex-biased demographic history in Sardinia.
Abstract: The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome compared to the autosome, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight to the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.
••
TL;DR: After sequencing genes from 95 GWAS loci in participants with extremely high plasma lipid levels, one new coding variant is identified associated with TG, providing insight regarding design of similar sequencing studies with respect to sample size, follow-up, and analysis methodology.
••
Baylor College of Medicine1, Fred Hutchinson Cancer Research Center2, University of Wisconsin–Milwaukee3, University of Antioquia4, University of North Carolina at Chapel Hill5, Boston University6, National Institutes of Health7, University of California, Los Angeles8, University of Minnesota9, University of Washington10, Harvard University11, University of Mississippi12, Ohio State University13, New York University14, University of Michigan15, Broad Institute16
TL;DR: This study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response.
Abstract: Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed on the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P=4.0 × 10⁻⁸) in European-Americans, where rare variants in this gene are predicted to decrease WHR. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, aggregation of rare variants in COBLL1, previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P=2.23 × 10⁻⁴) with higher WHR in European-Americans. However, these significant results are not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and the small sample sizes. Our study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response.
••
Memorial Sloan Kettering Cancer Center1, Fred Hutchinson Cancer Research Center2, University of Washington3, Harvard University4, University of Michigan5, University of Nantes6, German Cancer Research Center7, Kaiser Permanente8, University of Southern California9, Translational Genomics Research Institute10, University of Toronto11, New York University12, University of Melbourne13, Ontario Institute for Cancer Research14, University of Hawaii at Manoa15, Baylor College of Medicine16, Massey University17, University of Pittsburgh18, National Institutes of Health19, University of Utah20
TL;DR: The utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up is supported.
Abstract: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) associated with colorectal cancer risk. These SNPs may tag correlated variants with biological importance. Fine-mapping around GWAS loci can facilitate detection of functional candidates and additional independent risk variants. We analyzed 11,900 cases and 14,311 controls in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. To fine-map genomic regions containing all known common risk variants, we imputed high-density genetic data from the 1000 Genomes Project. We tested single-variant associations with colorectal tumor risk for all variants spanning genomic regions 250-kb upstream or downstream of 31 GWAS-identified SNPs (index SNPs). We queried the University of California, Santa Cruz Genome Browser to examine evidence for biological function. Index SNPs did not show the strongest association signals with colorectal tumor risk in their respective genomic regions. Bioinformatics analysis of SNPs showing smaller P-values in each region revealed 21 functional candidates in 12 loci (5q31.1, 8q24, 11q13.4, 11q23, 12p13.32, 12q24.21, 14q22.2, 15q13, 18q21, 19q13.1, 20p12.3, and 20q13.33). We did not observe evidence of additional independent association signals in GWAS-identified regions. Our results support the utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up. Such efforts may aid the eventual discovery of disease-causing variant(s).
••
TL;DR: Using whole genomes and peripheral white blood cell transcriptomes from 624 Sardinian individuals, Sardinian eQTLs were identified at genes involved in malarial resistance and multiple sclerosis, reflecting the long-term epidemiological history of the island’s population.
Abstract: Identifying functional non-coding variants can enhance genome interpretation and inform novel genetic risk factors. We used whole genomes and peripheral white blood cell transcriptomes from 624 Sardinian individuals to identify non-coding variants that contribute to population, family, and individual differences in transcript abundance. We identified 21,183 independent expression quantitative trait loci (eQTLs) and 6,768 independent splicing quantitative trait loci (sQTLs) influencing 73% and 41% of all tested genes, respectively. When we compared Sardinian eQTLs to those previously identified in Europe, we identified differentiated eQTLs at genes involved in malarial resistance and multiple sclerosis, reflecting the long-term epidemiological history of the island's population. Taking advantage of pedigree data for the population sample, we identify segregating patterns of outlier gene expression and allelic imbalance in 61 Sardinian trios. We identified 809 expression outliers (median z-score of 2.97) averaging 13.3 genes with outlier expression per individual. We then connected these outlier expression events to rare non-coding variants. Our results provide new insight into the effects of non-coding variants and their relationship to population history, traits and individual genetic risk.
••
TL;DR: This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension.
Abstract: The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naive multiple hypothesis threshold correction hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first method estimates the effective number of independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis test that exploits prior computational predictions of the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the need to develop alternative approaches that improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
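To make the first approach concrete, the sketch below shows how an estimated effective fraction of independent tests loosens a Bonferroni-style threshold. Only the 15% figure comes from the abstract; the function name, the significance level of 0.05, and the count of one million variants are assumptions for illustration, and the extreme value estimation of the fraction itself is not reproduced.

```python
def corrected_threshold(n_tests: int, alpha: float = 0.05,
                        effective_fraction: float = 1.0) -> float:
    """Per-test significance threshold, alpha / (effective number of tests).

    effective_fraction is the estimated share of tests that behave as
    independent; 1.0 gives the plain Bonferroni correction.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0.0 < effective_fraction <= 1.0:
        raise ValueError("effective_fraction must be in (0, 1]")
    return alpha / (n_tests * effective_fraction)


# Hypothetical study with one million tested variants.
naive = corrected_threshold(1_000_000)                              # plain Bonferroni
adjusted = corrected_threshold(1_000_000, effective_fraction=0.15)  # 15% effective tests
```

With these assumed inputs the adjusted threshold is about 6.7 times less stringent than the naive one (1 / 0.15), which is the gain in power the first approach aims for.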
••
Fred Hutchinson Cancer Research Center1, University of Michigan2, National Institutes of Health3, German Cancer Research Center4, University of Southern California5, Harvard University6, Mount Sinai Hospital, Toronto7, Medical University of Vienna8, Ontario Institute for Cancer Research9, University of Hawaii10, University of Utah11
TL;DR: Large scale whole genome sequencing with imputation into GWAS improves the understanding of the genetic architecture of colorectal cancer.
Abstract: Whole-genome sequencing (WGS) has started a new era in human genetics in which data can be used to more fully understand the role of genetic variation in common complex diseases, including the role of less frequent and rare variants and structural variation. To explore the impact of these variants on colorectal cancer risk we conducted the first large scale WGS study for colorectal cancer (CRC) including 1,961 CRC cases and 981 controls. These WGS data as well as those from the Haplotype Reference Consortium were imputed in 13,104 CRC cases and 15,521 controls with genome-wide association study (GWAS) data that are part of the Colorectal Cancer Family Registry (CCFR) and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). Focusing on rare and less frequent variants, insertions and deletions, we observed potentially novel variants: a less frequent variant (MAF = 0.026) on chromosome 5 located in NREP/STARD4-AS1 (p-value 4E-08); and a novel rare multi-allelic variant (MAF = 0.003) on chromosome 9 near KLF9 and TRPM3 (p-value 2E-09; the other allele of this multi-allelic variant had a MAF of 0.0003 and p-value of 0.55). Furthermore, we observed an independent locus close to the known region 8q24 that was located upstream of GSDMC (MAF = 0.16, p-value 5E-08). Within the known region 8q23/EIF3H we identified several low frequency variants with similar MAF (0.0181 to 0.0204), including a 6bp deletion, with p-values between 4E-08 and 1E-09 that were independent of the common variant signal in this region. In addition, we identified statistically significant (p
Citation Format: Jeroen Huyghe, Sai Chen, Hyun M. Kang, Tabitha A. Harrison, Sonja I. Berndt, Stephane Bezieau, Hermann Brenner, Graham Casey, Andrew T. Chan, Jenny Chang-Claude, Gallinger J. Steven, Stephen B. Gruber, Andrea Gsur, Michael Hoffmeister, Thomas J. Hudson, Loic Le Marchand, Polly A. Newcomb, John D. Potter, Conghui Qu, Martha L. Slattery, Joshua D. Smith, Emily White, Li Hsu, Goncalo R. Abecasis, Deborah A. Nickerson, Ulrike Peters. Large scale whole genome sequencing with imputation into GWAS improves our understanding of the genetic architecture of colorectal cancer. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5230.