scispace - formally typeset

Showing papers by "Gonçalo R. Abecasis" published in 2016


Journal Article
TL;DR: Improvements to imputation machinery are described that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools.
Abstract: Christian Fuchsberger, Gonçalo Abecasis and colleagues describe a new web-based imputation service that enables rapid imputation of large numbers of samples and allows convenient access to large reference panels of sequenced individuals. Their state space reduction provides a computationally efficient solution for genotype imputation with no loss in imputation accuracy.

2,556 citations
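
The abstract credits the speed-up to state space reduction. A minimal sketch of the idea follows, assuming fixed-size blocks of sites purely for illustration: within each block, identical reference haplotypes are collapsed into one representative, so a hidden Markov model over reference haplotypes needs only as many states as there are distinct haplotypes in that block. The function names `split_into_blocks` and `reduce_block` are hypothetical, and the published method's choice of block boundaries may differ.

```python
def split_into_blocks(haplotypes, block_size):
    """Cut each haplotype string (one allele character per site) into consecutive blocks.

    Returns one list per block, holding that block's slice of every haplotype.
    Fixed-size blocks are a simplifying assumption of this sketch.
    """
    n_sites = len(haplotypes[0])
    return [
        [hap[start:start + block_size] for hap in haplotypes]
        for start in range(0, n_sites, block_size)
    ]


def reduce_block(block):
    """Collapse identical haplotypes within one block.

    Returns (unique, membership): the distinct haplotypes in first-seen order,
    and, for each input haplotype, the index of its representative in `unique`.
    """
    index_of = {}
    unique = []
    membership = []
    for hap in block:
        if hap not in index_of:
            index_of[hap] = len(unique)
            unique.append(hap)
        membership.append(index_of[hap])
    return unique, membership


# Toy reference panel: four haplotypes over six sites (hypothetical data).
panel = ["001100", "001111", "001100", "011100"]
for block in split_into_blocks(panel, 3):
    unique, membership = reduce_block(block)
    print(len(block), "haplotypes ->", len(unique), "states", membership)
```

For the toy panel, each block needs two states instead of four; the saving grows with the number of reference haplotypes that are identical over a block.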


Journal Article
Shane A. McCarthy, Sayantan Das, Warren W. Kretzschmar, Olivier Delaneau, Andrew R. Wood, Alexander Teumer, Hyun Min Kang, Christian Fuchsberger, Petr Danecek, Kevin Sharp, Yang Luo, C Sidore, Alan Kwong, Nicholas J. Timpson, Seppo Koskinen, Scott I. Vrieze, Laura J. Scott, He Zhang, Anubha Mahajan, Jan H. Veldink, Ulrike Peters, Carlos N. Pato, Cornelia M. van Duijn, Christopher E. Gillies, Ilaria Gandin, Massimo Mezzavilla, Arthur Gilly, Massimiliano Cocca, Michela Traglia, Andrea Angius, Jeffrey C. Barrett, D.I. Boomsma, Kari Branham, Gerome Breen, Chad M. Brummett, Fabio Busonero, Harry Campbell, Andrew T. Chan, Sai Chen, Emily Y. Chew, Francis S. Collins, Laura J Corbin, George Davey Smith, George Dedoussis, Marcus Dörr, Aliki-Eleni Farmaki, Luigi Ferrucci, Lukas Forer, Ross M. Fraser, Stacey Gabriel, Shawn Levy, Leif Groop, Tabitha A. Harrison, Andrew T. Hattersley, Oddgeir L. Holmen, Kristian Hveem, Matthias Kretzler, James Lee, Matt McGue, Thomas Meitinger, David Melzer, Josine L. Min, Karen L. Mohlke, John B. Vincent, Matthias Nauck, Deborah A. Nickerson, Aarno Palotie, Michele T. Pato, Nicola Pirastu, Melvin G. McInnis, J. Brent Richards, Cinzia Sala, Veikko Salomaa, David Schlessinger, Sebastian Schoenherr, P. Eline Slagboom, Kerrin S. Small, Tim D. Spector, Dwight Stambolian, Marcus A. Tuke, Jaakko Tuomilehto, Leonard H. van den Berg, Wouter van Rheenen, Uwe Völker, Cisca Wijmenga, Daniela Toniolo, Eleftheria Zeggini, Paolo Gasparini, Matthew G. Sampson, James F. Wilson, Timothy M. Frayling, Paul I.W. de Bakker, Morris A. Swertz, Steven A. McCarroll, Charles Kooperberg, Annelot M. Dekker, David Altshuler, Cristen J. Willer, William G. Iacono, Samuli Ripatti, Nicole Soranzo, Klaudia Walter, Anand Swaroop, Francesco Cucca, Carl A. Anderson, Richard M. Myers, Michael Boehnke, Mark I. McCarthy, Richard Durbin, Gonçalo R. Abecasis, Jonathan Marchini
TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

2,149 citations
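
Imputation accuracy at a given minor allele frequency (MAF) is commonly scored as the squared Pearson correlation between true genotypes and imputed dosages. The sketch below is minimal and assumes genotypes coded 0/1/2 and dosages in [0, 2]. The helper names are hypothetical, and this is not necessarily the exact metric used in the panel's own evaluation. For scale, a MAF of 0.1% among 64,976 haplotypes corresponds to about 65 copies of the minor allele.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def dosage_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector is constant (r^2 is undefined there).
    """
    n = len(true_genotypes)
    mean_x = sum(true_genotypes) / n
    mean_y = sum(dosages) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_genotypes, dosages))
    sxx = sum((x - mean_x) ** 2 for x in true_genotypes)
    syy = sum((y - mean_y) ** 2 for y in dosages)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical true genotypes and imputed dosages at one site.
truth = [0, 0, 1, 0, 2, 0, 1, 0]
imputed = [0.1, 0.0, 0.9, 0.2, 1.7, 0.0, 1.1, 0.1]
print(minor_allele_frequency(truth), dosage_r2(truth, imputed))
```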


Shane A. McCarthy, Sayantan Das, Warren W. Kretzschmar, Olivier Delaneau, Andrew R. Wood, Alexander Teumer, Hyun Min Kang, Christian Fuchsberger, Petr Danecek, Kevin Sharp, Yang Luo, Carlo Sidore, Alan Kwong, Nicholas J. Timpson, Seppo Koskinen, Scott I. Vrieze, Laura J. Scott, He Zhang, Anubha Mahajan, Jan H. Veldink, Ulrike Peters, Carlos N. Pato, Cornelia M. van Duijn, Christopher E. Gillies, Ilaria Gandin, Massimo Mezzavilla, Arthur Gilly, Massimiliano Cocca, Michela Traglia, Andrea Angius, Jeffrey C. Barrett, D.I. Boomsma, Kari Branham, Gerome Breen, Chad M. Brummett, Fabio Busonero, Harry Campbell, Andrew T. Chan, Sai Chen, Emily Y. Chew, Francis S. Collins, Laura J Corbin, George Davey Smith, George Dedoussis, Marcus Dörr, Aliki-Eleni Farmaki, Luigi Ferrucci, Lukas Forer, Ross M. Fraser, Stacey Gabriel, Shawn Levy, Leif Groop, Tabitha A. Harrison, Andrew T. Hattersley, Oddgeir L. Holmen, Kristian Hveem, Matthias Kretzler, James Lee, Matt McGue, Thomas Meitinger, David Melzer, Josine L. Min, Karen L. Mohlke, John B. Vincent, Matthias Nauck, Deborah A. Nickerson, Aarno Palotie, Michele T. Pato, Nicola Pirastu, Melvin G. McInnis, J. Brent Richards, Cinzia Sala, Veikko Salomaa, David Schlessinger, Sebastian Schoenherr, P. Eline Slagboom, Kerrin S. Small, Tim D. Spector, Dwight Stambolian, Marcus A. Tuke, Jaakko Tuomilehto, Leonard H. van den Berg, Wouter van Rheenen, Uwe Völker, Cisca Wijmenga, Daniela Toniolo, Eleftheria Zeggini, Paolo Gasparini, Matthew G. Sampson, James F. Wilson, Timothy M. Frayling, Paul I.W. de Bakker, Morris A. Swertz, Steven A. McCarroll, Charles Kooperberg, Annelot M. Dekker, David Altshuler, Cristen J. Willer, William G. Iacono, Samuli Ripatti, Nicole Soranzo, Klaudia Walter, Anand Swaroop, Francesco Cucca, Carl A. Anderson, Richard M. Myers, Michael Boehnke, Mark I. McCarthy, Richard Durbin, Gonçalo R. Abecasis, Jonathan Marchini
01 Jan 2016
TL;DR: In this article, a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry is presented.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

1,261 citations


Journal Article
TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform.
Abstract: Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.

1,246 citations
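
Eagle2's reference data structure builds on the positional Burrows-Wheeler transform (PBWT). The sketch below shows only the basic PBWT step, not Eagle2 itself. At each site, a stable partition of haplotypes by allele turns the previous ordering into one sorted by reversed prefix. As a result, haplotypes that share long matches ending at the current site end up adjacent. Haplotypes are assumed to be lists of 0/1 alleles, and the function name is hypothetical.

```python
def pbwt_prefix_arrays(haplotypes):
    """Build the PBWT positional prefix arrays for a set of binary haplotypes.

    haplotypes: list of M equal-length lists of 0/1 alleles.
    Returns N + 1 orderings of haplotype indices; ordering k sorts the
    haplotypes by their reversed prefix over sites 0..k-1.
    """
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k: within each allele group,
        # the previous (reversed-prefix) ordering is preserved.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


# Hypothetical toy panel of five haplotypes over four sites.
haps = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 1, 1]]
for k, order in enumerate(pbwt_prefix_arrays(haps)):
    print(k, order)
```

Each step costs time linear in the number of haplotypes, so the whole construction scales with haplotypes times sites. This linear scaling is what makes PBWT-based indexing attractive for very large reference panels.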


Journal Article
11 Jul 2016-Nature
TL;DR: In this paper, the authors performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing for 12,940 individuals from five ancestry groups.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

866 citations


01 Jan 2016
TL;DR: Variants associated with type 2 diabetes after sequencing were overwhelmingly common, and most fell within regions previously identified by genome-wide association studies; large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

698 citations


Journal Article
11 Mar 2016-Science
TL;DR: In this paper, the authors identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI.
Abstract: Scavenger receptor BI (SR-BI) is the major receptor for high-density lipoprotein (HDL) cholesterol (HDL-C). In humans, high amounts of HDL-C in plasma are associated with a lower risk of coronary heart disease (CHD). Mice that have depleted Scarb1 (SR-BI knockout mice) have markedly elevated HDL-C levels but, paradoxically, increased atherosclerosis. The impact of SR-BI on HDL metabolism and CHD risk in humans remains unclear. Through targeted sequencing of coding regions of lipid-modifying genes in 328 individuals with extremely high plasma HDL-C levels, we identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI. The P376L variant impairs posttranslational processing of SR-BI and abrogates selective HDL cholesterol uptake in transfected cells, in hepatocyte-like cells derived from induced pluripotent stem cells from the homozygous subject, and in mice. Large population-based studies revealed that subjects who are heterozygous carriers of the P376L variant have significantly increased levels of plasma HDL-C. P376L carriers have a profound HDL-related phenotype and an increased risk of CHD (odds ratio = 1.79, which is statistically significant).

417 citations
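
The reported odds ratio summarizes a 2×2 table of variant carriers versus non-carriers by CHD status. The sketch below computes the unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval. The counts in the example are hypothetical and are not the study's data. The published estimate may also come from a pooled or covariate-adjusted analysis rather than this simple form.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Woolf 95% confidence interval.

    a: carriers with CHD          b: carriers without CHD
    c: non-carriers with CHD      d: non-carriers without CHD
    All counts must be positive (no continuity correction is applied).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a two-sided test that is significant at the 5% level. This is the sense in which an odds ratio such as 1.79 is called statistically significant.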


Journal Article
Cristian Pattaro, Alexander Teumer, Mathias Gorski, Audrey Y. Chu, +732 more (157 institutions)
TL;DR: A meta-analysis of genome-wide association studies for estimated glomerular filtration rate suggests that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.
Abstract: Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, 19 associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.

409 citations
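
A GWAS meta-analysis combines per-study effect estimates for each variant. A common choice is fixed-effect inverse-variance weighting, which METAL implements, for example, in its standard-error scheme. The sketch below assumes that scheme, which is not necessarily the exact weighting used for this eGFR analysis. The function name and example numbers are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) under N(0, 1)
    return beta, se, z, p


# Hypothetical per-study estimates for one variant.
print(ivw_meta([0.02, 0.03, 0.01], [0.010, 0.015, 0.012]))
```

Studies with smaller standard errors, which are usually the larger cohorts, receive proportionally more weight. The pooled standard error also shrinks as studies are added, which is how combining many cohorts gains power.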


Journal Article
TL;DR: RVTESTS is developed, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals and provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection.
Abstract: Motivation: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data. Availability and implementation: RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests. Contact: zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com. Supplementary information: Supplementary data are available at Bioinformatics online.

344 citations
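
RVTESTS implements many gene-level statistics. One of the simplest is a collapsing burden test, which scores each individual by whether they carry any rare allele in the gene and then tests that score against the trait. The sketch below is a minimal score test for a quantitative trait with no covariates. It assumes genotypes coded as alternate-allele counts. It is not RVTESTS code, and the function name and data are hypothetical.

```python
import math


def burden_score_test(genotypes, phenotype):
    """Collapsing (any-rare-allele) burden score test for a quantitative trait.

    genotypes: one row per individual, one column per rare variant in the gene,
               entries are alternate-allele counts (0, 1, 2).
    phenotype: one trait value per individual.
    Returns (statistic, p_value); the statistic is approximately chi-square
    with 1 df under the null hypothesis of no association.
    """
    # Collapse the gene: 1 if the individual carries any rare allele, else 0.
    burden = [1.0 if any(g > 0 for g in row) else 0.0 for row in genotypes]
    n = len(phenotype)
    mean_x = sum(burden) / n
    mean_y = sum(phenotype) / n
    score = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, phenotype))
    trait_var = sum((y - mean_y) ** 2 for y in phenotype) / (n - 1)
    info = trait_var * sum((x - mean_x) ** 2 for x in burden)
    if info == 0:
        return 0.0, 1.0  # no variation in burden or trait: nothing to test
    statistic = score * score / info
    # Upper tail of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical gene with two rare variants in four individuals.
genotypes = [[0, 0], [1, 0], [0, 1], [0, 0]]
phenotype = [1.0, 3.0, 3.0, 1.0]
print(burden_score_test(genotypes, phenotype))
```

Collapsing boosts power when most rare variants in the gene act in the same direction. Variance-component tests such as SKAT are the usual alternative when effects may differ in sign.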


Journal Article
TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants, and the more complex variants are projected onto it to estimate the number of mutations in each class; the phylogeny shows bursts of extreme expansion in male numbers that occurred independently in each of the five continental superpopulations examined.
Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

280 citations


Journal Article
TL;DR: Beyond CHD, genetically lowered Lp(a) levels are associated with a lower risk of peripheral vascular disease, stroke, heart failure, and aortic stenosis, and show no association with 31 other disorders, including type 2 diabetes and cancer.

Journal Article
Stéphanie Martine van den Berg, Marleen H. M. de Moor, Karin J. H. Verweij, Robert F. Krueger, Michelle Luciano, Alejandro Arias Vasquez, Lindsay K. Matteson, Jaime Derringer, Tõnu Esko, Najaf Amin, Scott D. Gordon, Narelle K. Hansell, Amy B. Hart, Ilkka Seppälä, Jennifer E. Huffman, Bettina Konte, Jari Lahti, Minyoung Lee, Michael B. Miller, Teresa Nutile, Toshiko Tanaka, Alexander Teumer, Alexander Viktorin, Juho Wedenoja, Abdel Abdellaoui, Gonçalo R. Abecasis, Daniel E. Adkins, Arpana Agrawal, Jueri Allik, Katja Appel, Timothy B. Bigdeli, Fabio Busonero, Harry Campbell, Paul T. Costa, George Davey Smith, Gail Davies, Harriet de Wit, Jun Ding, Barbara E. Engelhardt, Johan G. Eriksson, Iryna O. Fedko, Luigi Ferrucci, Barbara Franke, Ina Giegling, Richard A. Grucza, Annette M. Hartmann, Andrew C. Heath, Kati Heinonen, Anjali K. Henders, Georg Homuth, Jouke-Jan Hottenga, William G. Iacono, Joost G. E. Janzing, Markus Jokela, Robert Karlsson, John P. Kemp, Matthew G. Kirkpatrick, Antti Latvala, Terho Lehtimäki, David C. Liewald, Pamela A. F. Madden, Chiara Magri, Patrik K. E. Magnusson, Jonathan Marten, Andrea Maschio, Hamdi Mbarek, Sarah E. Medland, Evelin Mihailov, Yuri Milaneschi, Grant W. Montgomery, Matthias Nauck, Michel G. Nivard, Klaasjan G. Ouwens, Aarno Palotie, Erik Pettersson, Ozren Polasek, Yong Qian, Laura Pulkki-Råback, Olli T. Raitakari, Anu Realo, Richard J. Rose, Daniela Ruggiero, Carsten Oliver Schmidt, Wendy S. Slutske, Rossella Sorice, John M. Starr, Beate St Pourcain, Angelina R. Sutin, Nicholas J. Timpson, Holly Trochet, Sita H. Vermeulen, Eero Vuoksimaa, Elisabeth Widen, Jasper Wouda, Margaret J. Wright, Lina Zgaga, David J. Porteous, Alessandra Minelli, Abraham A. Palmer, Dan Rujescu, Marina Ciullo, Caroline Hayward, Igor Rudan, Andres Metspalu, Jaakko Kaprio, Ian J. Deary, Katri Räikkönen, James F. Wilson, Liisa Keltikangas-Järvinen, Laura J. Bierut, John M. Hettema, Hans J. Grabe, Brenda W.J.H. Penninx, Cornelia M. van Duijn, David M. Evans, David Schlessinger, Nancy L. Pedersen, Antonio Terracciano, Matt McGue, Nicholas G. Martin, Dorret I. Boomsma
TL;DR: A large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts shows that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits.
Abstract: Extraversion is a relatively stable and heritable personality trait associated with numerous psychosocial, lifestyle and health outcomes. Despite its substantial heritability, no genetic variants have been detected in previous genome-wide association (GWA) studies, which may be due to relatively small sample sizes of those studies. Here, we report on a large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts. Extraversion item data from multiple personality inventories were harmonized across inventories and cohorts. No genome-wide significant associations were found at the single nucleotide polymorphism (SNP) level but there was one significant hit at the gene level for a long non-coding RNA site (LOC101928162). Genome-wide complex trait analysis in two large cohorts showed that the additive variance explained by common SNPs was not significantly different from zero, but polygenic risk scores, weighted using linkage information, significantly predicted extraversion scores in an independent cohort. These results show that extraversion is a highly polygenic personality trait, with an architecture possibly different from other complex human traits, including other personality traits. Future studies are required to further determine which genetic variants, by what modes of gene action, constitute the heritable nature of extraversion.
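
At its core, a polygenic score is a weighted sum of each individual's allele dosages. The sketch below shows only that plain sum. The LD-informed weighting mentioned in the abstract is not implemented here. The weights and dosages in the example are hypothetical, as is the function name.

```python
def polygenic_scores(dosage_matrix, weights):
    """Score each individual as the weighted sum of their allele dosages.

    dosage_matrix: one row per individual, one column per SNP (dosages in [0, 2]).
    weights: one effect weight per SNP (e.g. GWAS effect sizes).
    """
    scores = []
    for row in dosage_matrix:
        if len(row) != len(weights):
            raise ValueError("each row needs exactly one dosage per weight")
        scores.append(sum(d * w for d, w in zip(row, weights)))
    return scores


# Hypothetical dosages for two individuals at three SNPs.
print(polygenic_scores([[0, 1, 2], [2, 0, 1]], [0.5, -1.0, 0.25]))
```

For a highly polygenic trait, each weight is tiny and individually noisy. Any predictive signal comes from summing over very many SNPs, which is why such scores can predict a trait even when no single SNP reaches genome-wide significance.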

Journal Article
Pim van der Harst, Jessica van Setten, Niek Verweij, Georg Vogler, +182 more (54 institutions)
TL;DR: A genome-wide association meta-analysis of 4 QRS traits in up to 73,518 individuals of European ancestry provides new insights into genes and biological pathways controlling myocardial mass and may help identify novel therapeutic targets.

17 Nov 2016
TL;DR: A human genetics study sheds light on how HDL (good) cholesterol protects against cardiovascular disease, identifying a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI.
Abstract: A scavenger that protects the heart. Coronary heart disease is a tale of two forms of plasma cholesterol. In contrast to the well-established effects of “bad” cholesterol (LDL-C), the role of “good” cholesterol (HDL-C) is mysterious. Elevated HDL-C correlates with a lower risk of heart disease, yet drugs that raise HDL-C levels do not reduce risk. Zanoni et al. found that some people with exceptionally high levels of HDL-C carry a rare sequence variant in the gene encoding the major HDL-C receptor, scavenger receptor BI. This variant destroys the receptor's ability to take up HDL-C. Interestingly, people with this variant have a higher risk of heart disease despite having high levels of HDL-C. Science, this issue p. 1166. A human genetics study sheds light on how HDL (good) cholesterol protects against cardiovascular disease. Scavenger receptor BI (SR-BI) is the major receptor for high-density lipoprotein (HDL) cholesterol (HDL-C). In humans, high amounts of HDL-C in plasma are associated with a lower risk of coronary heart disease (CHD). Mice that have depleted Scarb1 (SR-BI knockout mice) have markedly elevated HDL-C levels but, paradoxically, increased atherosclerosis. The impact of SR-BI on HDL metabolism and CHD risk in humans remains unclear. Through targeted sequencing of coding regions of lipid-modifying genes in 328 individuals with extremely high plasma HDL-C levels, we identified a homozygote for a loss-of-function variant, in which leucine replaces proline 376 (P376L), in SCARB1, the gene encoding SR-BI. The P376L variant impairs posttranslational processing of SR-BI and abrogates selective HDL cholesterol uptake in transfected cells, in hepatocyte-like cells derived from induced pluripotent stem cells from the homozygous subject, and in mice. Large population-based studies revealed that subjects who are heterozygous carriers of the P376L variant have significantly increased levels of plasma HDL-C. P376L carriers have a profound HDL-related phenotype and an increased risk of CHD (odds ratio = 1.79, which is statistically significant).

Journal Article
John D. Eicher, Nathalie Chami, Tim Kacprowski, Akihiro Nomura, Ming-Huei Chen, Lisa R. Yanek, Salman M. Tajuddin, Ursula M. Schick, Andrew J. Slater, Nathan Pankratz, Linda M. Polfus, Claudia Schurmann, Ayush Giri, Jennifer A. Brody, Leslie A. Lange, Ani Manichaikul, W. David Hill, Raha Pazoki, Paul Elliot, Evangelos Evangelou, Ioanna Tzoulaki, He Gao, Anne-Claire Vergnaud, Rasika A. Mathias, Diane M. Becker, Lewis C. Becker, Amber A. Burt, David R. Crosslin, Leo-Pekka Lyytikäinen, Kjell Nikus, Jussi Hernesniemi, Mika Kähönen, Emma Raitoharju, Nina Mononen, Olli T. Raitakari, Terho Lehtimäki, Mary Cushman, Neil A. Zakai, Deborah A. Nickerson, Laura M. Raffield, Rakale C. Quarells, Cristen J. Willer, Gina M. Peloso, Gonçalo R. Abecasis, Dajiang J. Liu, Panos Deloukas, Nilesh J. Samani, Heribert Schunkert, Jeanette Erdmann, Myriam Fornage, Melissa A. Richard, Jean-Claude Tardif, John D. Rioux, Marie-Pierre Dubé, Simon de Denus, Yingchang Lu, Erwin P. Bottinger, Ruth J. F. Loos, Albert V. Smith, Tamara B. Harris, Lenore J. Launer, Vilmundur Gudnason, Digna R. Velez Edwards, Eric S. Torstenson, Yongmei Liu, Russell P. Tracy, Jerome I. Rotter, Stephen S. Rich, Heather M. Highland, Eric Boerwinkle, Jin Li, Ethan M. Lange, James G. Wilson, Evelin Mihailov, Reedik Mägi, Joel N. Hirschhorn, Andres Metspalu, Tõnu Esko, Caterina Vacchi-Suzzi, Mike A. Nalls, Alan B. Zonderman, Michele K. Evans, Gunnar Engström, Marju Orho-Melander, Olle Melander, Michelle L. O'Donoghue, Dawn M. Waterworth, Lars Wallentin, Harvey D. White, James S. Floyd, Traci M. Bartz, Kenneth Rice, Bruce M. Psaty, John M. Starr, David C. Liewald, Caroline Hayward, Ian J. Deary, Andreas Greinacher, Uwe Völker, Thomas Thiele, Henry Völzke, Frank J. A. van Rooij, André G. Uitterlinden, Oscar H. Franco, Abbas Dehghan, Todd L. Edwards, Santhi K. Ganesh, Sekar Kathiresan, Nauder Faraday, Paul L. Auer, Alexander P. Reiner, Guillaume Lettre, Andrew D. Johnson
TL;DR: The authors' large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.
Abstract: Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets’ important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.

Journal Article
TL;DR: From the analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but the collective experience offers some valuable lessons for WGS initiatives.
Abstract: Massively parallel whole-genome sequencing (WGS) data have ushered in a new era in human genetics. These data are now being used to understand the role of rare variants in complex traits and to advance the goals of precision medicine. The technological and computing advances that have enabled us to generate WGS data on thousands of individuals have also outpaced our ability to perform analyses in scientifically and statistically rigorous and thoughtful ways. The past several years have witnessed the application of whole-exome sequencing (WES) to complex traits and diseases. From our analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but our collective experience offers some valuable lessons for WGS initiatives. These include caveats associated with generating automated pipelines for quality control and analysis of rare variants; the importance of studying minority populations; sample size requirements and efficient study designs for identifying rare-variant associations; and the significance of incidental findings in population-based genetic research. With the ESP as an example, we offer guidance and a framework on how to conduct a large-scale association study in the era of WGS.
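
Sample-size requirements can be roughed out with a standard approximation. Consider a 1-df test of a single variant that explains a fraction q2 of a quantitative trait's variance. The test's non-centrality is about N * q2 / (1 - q2), which gives N ≈ (z_{1-alpha/2} + z_{power})^2 * (1 - q2) / q2. The sketch below applies that approximation. Rare-variant aggregate tests and case-control designs need different calculations, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(q2, alpha=5e-8, power=0.8):
    """Approximate N for a 1-df test of a variant explaining fraction q2 of trait variance.

    Uses the non-centrality N * q2 / (1 - q2) of the 1-df chi-square test and a
    normal approximation to power (the negligible opposite tail is ignored).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 * (1 - q2) / q2)


# Genome-wide significance, 80% power, variants explaining 1% vs 0.1% of variance.
print(required_sample_size(0.01), required_sample_size(0.001))
```

The required N grows roughly as 1/q2. Rare variants of modest effect explain little variance, so single-variant designs quickly need very large samples, which is part of why rare variants are often aggregated by gene.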

Journal Article
TL;DR: In this paper, the authors examined whether genetic variants affect body shape as a composite phenotype represented by a combination of anthropometric traits, and identified six novel loci: LEMD2 and CD47 for AvPC1, RPS6KA5/C14orf159 and GANAB for AvPC3, and ARL15 and ANP32 for AvPC4.
Abstract: Large consortia have revealed hundreds of genetic loci associated with anthropometric traits, one trait at a time. We examined whether genetic variants affect body shape as a composite phenotype that is represented by a combination of anthropometric traits. We developed an approach that calculates averaged PCs (AvPCs) representing body shape derived from six anthropometric traits (body mass index, height, weight, waist and hip circumference, waist-to-hip ratio). The first four AvPCs explain >99% of the variability, are heritable, and associate with cardiometabolic outcomes. We performed genome-wide association analyses for each body shape composite phenotype across 65 studies and meta-analysed summary statistics. We identify six novel loci: LEMD2 and CD47 for AvPC1, RPS6KA5/C14orf159 and GANAB for AvPC3, and ARL15 and ANP32 for AvPC4. Our findings highlight the value of using multiple traits to define complex phenotypes for discovery, which are not captured by single-trait analyses, and may shed light onto new pathways.
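
The abstract does not spell out how the PCs were averaged across studies. For illustration only, this sketch assumes one way to obtain a shared body-shape axis from several studies. It takes a sample-size-weighted average of the studies' trait correlation matrices and then extracts the leading principal component of that average by power iteration. This is not necessarily the AvPC procedure. All names, the two-trait matrices, and the sample sizes in the example are hypothetical.

```python
import math


def weighted_average_matrix(matrices, weights):
    """Element-wise weighted average of equally sized square matrices."""
    total = sum(weights)
    size = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(size)]
        for i in range(size)
    ]


def leading_component(matrix, iterations=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration).

    Assumes the largest eigenvalue is strictly dominant in magnitude.
    """
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size))
    return value, vec


# Two hypothetical studies, two traits, sample sizes 1,000 and 3,000.
study_a = [[1.0, 0.6], [0.6, 1.0]]
study_b = [[1.0, 0.8], [0.8, 1.0]]
pooled = weighted_average_matrix([study_a, study_b], [1000, 3000])
value, loadings = leading_component(pooled)
print(pooled, value, loadings)
```

Dividing each eigenvalue by the number of traits (the trace of a correlation matrix) gives the share of variability that component explains. This ratio is the quantity behind statements such as the first four components explaining more than 99%.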

Journal Article
TL;DR: Both PSAP reduction and overexpression lead to significantly elevated extracellular PGRN levels, and PSAP-induced changes in PGRN levels and oligomerization replicate in human-derived fibroblasts obtained from a GRN mutation carrier, further supporting PSAP as a potential PGRN-related therapeutic target.
Abstract: Progranulin (GRN) loss-of-function mutations leading to progranulin protein (PGRN) haploinsufficiency are prevalent genetic causes of frontotemporal dementia. Reports also indicated PGRN-mediated neuroprotection in models of Alzheimer's and Parkinson's disease; thus, increasing PGRN levels is a promising therapeutic for multiple disorders. To uncover novel PGRN regulators, we linked whole-genome sequence data from 920 individuals with plasma PGRN levels and identified the prosaposin (PSAP) locus as a new locus significantly associated with plasma PGRN levels. Here we show that both PSAP reduction and overexpression lead to significantly elevated extracellular PGRN levels. Intriguingly, PSAP knockdown increases PGRN monomers, whereas PSAP overexpression increases PGRN oligomers, partly through a protein-protein interaction. PSAP-induced changes in PGRN levels and oligomerization replicate in human-derived fibroblasts obtained from a GRN mutation carrier, further supporting PSAP as a potential PGRN-related therapeutic target. Future studies should focus on addressing the relevance and cellular mechanism by which PGRN oligomeric species provide neuroprotection.

Journal ArticleDOI
TL;DR: The GAW19 data are an expansion of the data used at GAW18, which included the family-based whole genome sequence, blood pressure, and simulated phenotypes, but not the gene expression data or the set of 1943 unrelated individuals with exome sequence.
Abstract: The Genetic Analysis Workshops (GAW) are a forum for development, testing, and comparison of statistical genetic methods and software. Each contribution to the workshop includes an application to a specified data set. Here we describe the data distributed for GAW19, which focused on analysis of human genomic and transcriptomic data. GAW19 data were donated by the T2D-GENES Consortium and the San Antonio Family Heart Study and included whole genome and exome sequences for odd-numbered autosomes, measures of gene expression, systolic and diastolic blood pressures, and related covariates in two Mexican American samples. These two samples were a collection of 20 large families with whole genome sequence and transcriptomic data and a set of 1943 unrelated individuals with exome sequence. For each sample, simulated phenotypes were constructed based on the real sequence data. ‘Functional’ genes and variants for the simulations were chosen based on observed correlations between gene expression and blood pressure. The simulations focused primarily on additive genetic models but also included a genotype-by-medication interaction. A total of 245 genes were designated as ‘functional’ in the simulations with a few genes of large effect and most genes explaining < 1 % of the trait variation. An additional phenotype, Q1, was simulated to be correlated among related individuals, based on theoretical or empirical kinship matrices, but was not associated with any sequence variants. Two hundred replicates of the phenotypes were simulated. The GAW19 data are an expansion of the data used at GAW18, which included the family-based whole genome sequence, blood pressure, and simulated phenotypes, but not the gene expression data or the set of 1943 unrelated individuals with exome sequence.
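A phenotype like Q1, correlated among relatives through a kinship matrix but not associated with any sequence variant, can be simulated by drawing from a multivariate normal distribution with covariance h² · 2Φ + (1 − h²) · I, where Φ is the kinship matrix. The sketch below assumes a unit total variance, a hypothetical heritability of 0.5 and a hypothetical three-person pedigree; GAW19's actual simulation procedure and parameters may differ. It uses a hand-written Cholesky factorization so that it needs only the standard library.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def phenotype_covariance(kinship: Sequence[Sequence[float]], h2: float) -> Matrix:
    """Covariance h2 * 2*Phi + (1 - h2) * I for a trait with unit total variance."""
    n = len(kinship)
    return [
        [h2 * 2.0 * kinship[i][j] + ((1.0 - h2) if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L times L-transpose equal to a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_kinship_phenotype(kinship: Sequence[Sequence[float]], h2: float,
                               rng: random.Random) -> List[float]:
    """One replicate y = L z with z ~ N(0, I), so Cov(y) = h2*2*Phi + (1-h2)*I.

    No genotype enters the model, so y is correlated among relatives but is not
    associated with any sequence variant.
    """
    low = cholesky(phenotype_covariance(kinship, h2))
    z = [rng.gauss(0.0, 1.0) for _ in range(len(kinship))]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(kinship))]


if __name__ == "__main__":
    # Hypothetical trio: two unrelated parents (indices 0, 1) and their child (2).
    # Self-kinship is 0.5 for non-inbred individuals; parent-child kinship is 0.25.
    trio_kinship = [[0.5, 0.0, 0.25], [0.0, 0.5, 0.25], [0.25, 0.25, 0.5]]
    rng = random.Random(19)
    replicates = [simulate_kinship_phenotype(trio_kinship, 0.5, rng) for _ in range(200)]
    print(len(replicates), replicates[0])
```

Generating 200 independent draws mirrors the 200 phenotype replicates described above; an empirical kinship matrix could be passed in place of the theoretical one, provided it is positive definite.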

Journal ArticleDOI
TL;DR: The hypothesis that CETP DNA sequence variants associated with higher HDL‐C also increase risk for ICH is tested.
Abstract: Objective: In observational epidemiologic studies, higher plasma high-density lipoprotein cholesterol (HDL-C) has been associated with increased risk of intracerebral hemorrhage (ICH). DNA sequence variants that decrease cholesteryl ester transfer protein (CETP) gene activity increase plasma HDL-C; as such, medicines that inhibit CETP and raise HDL-C are in clinical development. Here, we test the hypothesis that CETP DNA sequence variants associated with higher HDL-C also increase risk for ICH. Methods: We performed two candidate-gene analyses of CETP. First, we tested individual CETP variants in a discovery cohort of 1149 ICH cases and 1238 controls from 3 studies, followed by replication in 1625 cases and 1845 controls from 5 studies. Second, we constructed a genetic risk score composed of 7 independent variants at the CETP locus and tested this score for association with HDL-C as well as ICH risk. Results: Twelve variants within CETP demonstrated nominal association with ICH, with the strongest association at the rs173539 locus (odds ratio (OR) 1.25, standard error (SE) 0.06, p = 6.0 × 10−4) with no heterogeneity across studies (I² = 0%). This association was replicated in patients of European ancestry (p = 0.03). A genetic score of CETP variants found to increase HDL-C by ∼2.85 mg/dL in the Global Lipids Genetics Consortium was strongly associated with ICH risk (OR 1.86, SE 0.13, p = 1.39 × 10−6). Interpretation: Genetic variants in CETP associated with increased HDL-C raise the risk of ICH. Given ongoing therapeutic development in CETP inhibition and other HDL-raising strategies, further exploration of potential adverse cerebrovascular outcomes may be warranted.
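A genetic risk score of the kind described in the Methods is commonly computed as a weighted sum of effect-allele dosages. The sketch below assumes that form, with each variant weighted by its per-allele effect on HDL-C; the seven weights and the genotypes in the example are hypothetical, not the values used in the study.

```python
import statistics
from typing import List, Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted allele score: sum over variants of effect-allele dosage x per-allele weight.

    dosages are expected counts (0..2) of the HDL-C-raising allele per variant.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardized_scores(cohort: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Scores for a whole cohort, centred and scaled to unit standard deviation."""
    raw = [genetic_risk_score(person, weights) for person in cohort]
    mu, sd = statistics.mean(raw), statistics.stdev(raw)
    return [(s - mu) / sd for s in raw]


if __name__ == "__main__":
    # Hypothetical per-allele HDL-C effects (mg/dL) for 7 independent CETP variants.
    weights = [0.9, 0.5, 0.4, 0.3, 0.3, 0.2, 0.25]
    cohort = [
        [2, 1, 0, 1, 2, 0, 1],
        [0, 0, 1, 1, 0, 2, 0],
        [1, 2, 2, 0, 1, 1, 2],
    ]
    print([genetic_risk_score(p, weights) for p in cohort])
    print(standardized_scores(cohort, weights))
```

The score would then enter a case-control regression for ICH in place of a single variant's genotype.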

Journal ArticleDOI
TL;DR: Transancestral fine-mapping was undertaken in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent; the results provide insight into the mechanisms through which type 2 diabetes association signals are mediated and suggest future routes to understanding the biology of specific disease susceptibility loci.
Abstract: To gain insight into potential regulatory mechanisms through which the effects of variants at four established type 2 diabetes (T2D) susceptibility loci (CDKAL1, CDKN2A-B, IGF2BP2 and KCNQ1) are mediated, we undertook transancestral fine-mapping in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent. Through high-density imputation and conditional analyses, we identified seven distinct association signals at these four loci, each with allelic effects on T2D susceptibility that were homogeneous across ancestry groups. By leveraging differences in the structure of linkage disequilibrium between diverse populations, and increased sample size, we localised the variants most likely to drive each distinct association signal. We demonstrated that integration of these genetic fine-mapping data with genomic annotation can highlight potential causal regulatory elements in T2D-relevant tissues. These analyses provide insight into the mechanisms through which T2D association signals are mediated, and suggest future routes to understanding the biology of specific disease susceptibility loci.
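Localising the variants most likely to drive a signal is often expressed as a credible set built from per-variant Bayes factors. The sketch below shows one common single-population version, not necessarily the transancestral model used here: it assumes one causal variant per signal, equal prior probability for every variant, Wakefield's approximate Bayes factor with a hypothetical prior effect-size SD of 0.2, and a hypothetical 99% coverage. A transancestral analysis would additionally model effects across ancestry groups.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log of Wakefield's approximate Bayes factor in favour of association.

    ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W))), with V = se^2,
    W = prior_sd^2 and z = beta / se.
    """
    v, w = se ** 2, prior_sd ** 2
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + w)) + z2 * w / (2.0 * (v + w))


def posterior_probs(log_bfs: Sequence[float]) -> List[float]:
    """Posterior causal probabilities under a single-causal-variant model (log-sum-exp)."""
    top = max(log_bfs)
    exps = [math.exp(b - top) for b in log_bfs]
    total = sum(exps)
    return [e / total for e in exps]


def credible_set(pps: Sequence[float], coverage: float = 0.99) -> List[int]:
    """Smallest set of variant indices whose posterior probabilities reach `coverage`."""
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pps[i]
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical log-odds effects and standard errors for five variants in one signal.
    betas = [0.12, 0.11, 0.05, 0.02, 0.10]
    ses = [0.02, 0.02, 0.02, 0.02, 0.03]
    pps = posterior_probs([log_abf(b, s) for b, s in zip(betas, ses)])
    print(pps, credible_set(pps))
```

Larger samples and differing LD across populations sharpen the Bayes factors, which is what shrinks the credible set.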

Posted ContentDOI
07 Jul 2016-bioRxiv
TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform.
Abstract: Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing within a genotyped cohort, an approach that can attain high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here, we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ≈20x speedup and ≈10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2x the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
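The positional Burrows-Wheeler transform (PBWT) underlying Eagle2's data structure can be illustrated by its core step, building the positional prefix arrays of a biallelic haplotype panel. This is only the basic PBWT sort (as introduced by Durbin), not Eagle2's own reference data structure or phasing search; the panel below is hypothetical.

```python
from typing import List, Sequence


def pbwt_prefix_arrays(haplotypes: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build the positional prefix arrays a_0 .. a_N of a biallelic haplotype panel.

    haplotypes[i][k] is the allele (0 or 1) of haplotype i at site k.
    a_k lists haplotype indices sorted by their reversed prefixes ending just
    before site k, so haplotypes that match over long stretches ending there
    sit next to each other.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(len(haplotypes)))
    arrays = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k: zeros first, then ones,
        # each group keeping the relative order it had in a_k.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


if __name__ == "__main__":
    panel = [
        [0, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1],
    ]
    for k, a in enumerate(pbwt_prefix_arrays(panel)):
        print(k, a)
```

Each site costs one linear pass over the panel, which is why PBWT-based structures scale to reference panels as large as the HRC: finding haplotypes that match a target over long stretches reduces to looking at neighbours in these sorted orders.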

Posted ContentDOI
07 Dec 2016-bioRxiv
TL;DR: The population of the mountainous Gennargentu region shows elevated genetic isolation, with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland; greater allele sharing with pre-Neolithic and Neolithic mainland populations on the X chromosome than on the autosomes provides evidence for a sex-biased demographic history in Sardinia.
Abstract: The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome than on the autosomes, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of how disease risk alleles are shared between Sardinia and mainland populations.

Journal ArticleDOI
TL;DR: After sequencing genes from 95 GWAS loci in participants with extremely high plasma lipid levels, one new coding variant associated with triglycerides (TG) was identified; the study also provides insight into the design of similar sequencing studies with respect to sample size, follow-up, and analysis methodology.

Journal ArticleDOI
TL;DR: This study indicates that the combined effect of rare variants contributes to inter-individual variation in fat distribution through the regulation of insulin response.
Abstract: Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed at the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P = 4.0 × 10−8) in European-Americans, where rare variants in this gene are predicted to decrease WHR. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, the aggregate of rare variants in COBLL1, a gene previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P = 2.23 × 10−4) with higher WHR in European-Americans. However, these significant results were not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and to the small sample sizes. Our study indicates that the combined effect of rare variants contributes to inter-individual variation in fat distribution through the regulation of insulin response.
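Gene-level rare variant testing can take several forms; one of the simplest is a burden test, which collapses each person's rare alleles in a gene into a single count and regresses the trait on that count. The sketch below assumes that burden form, a hypothetical MAF cutoff of 1%, no covariates, and a normal approximation for the p-value; it is not a reproduction of the methods applied to the Exome Sequencing Project data, and all genotypes and WHR values are made up.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]], mafs: Sequence[float],
                  maf_threshold: float = 0.01) -> List[int]:
    """Per-person count of minor alleles at variants with MAF below the threshold."""
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(person[j] for j in rare) for person in genotypes]


def burden_regression(scores: Sequence[float], phenotype: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope of phenotype on burden score, with its standard error."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("burden score is constant (no variation in rare-allele carriage)")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(scores, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


def normal_two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for beta / se under a standard normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


if __name__ == "__main__":
    mafs = [0.004, 0.15, 0.008, 0.002]  # hypothetical variant MAFs in one gene
    genotypes = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 2, 1, 1], [0, 0, 0, 0],
                 [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0]]
    whr = [0.86, 0.80, 0.74, 0.88, 0.87, 0.81, 0.85, 0.79]
    scores = burden_scores(genotypes, mafs)
    beta, se = burden_regression(scores, whr)
    print(scores, beta, se, normal_two_sided_p(beta, se))
```

A negative slope corresponds to rare alleles lowering WHR, the direction reported for IKBKB; variance-component tests such as SKAT would instead allow variants within a gene to act in different directions.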

Journal ArticleDOI
05 Jul 2016-PLOS ONE
TL;DR: The utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up is supported.
Abstract: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) associated with colorectal cancer risk. These SNPs may tag correlated variants with biological importance. Fine-mapping around GWAS loci can facilitate detection of functional candidates and additional independent risk variants. We analyzed 11,900 cases and 14,311 controls in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. To fine-map genomic regions containing all known common risk variants, we imputed high-density genetic data from the 1000 Genomes Project. We tested single-variant associations with colorectal tumor risk for all variants spanning genomic regions 250-kb upstream or downstream of 31 GWAS-identified SNPs (index SNPs). We queried the University of California, Santa Cruz Genome Browser to examine evidence for biological function. Index SNPs did not show the strongest association signals with colorectal tumor risk in their respective genomic regions. Bioinformatics analysis of SNPs showing smaller P-values in each region revealed 21 functional candidates in 12 loci (5q31.1, 8q24, 11q13.4, 11q23, 12p13.32, 12q24.21, 14q22.2, 15q13, 18q21, 19q13.1, 20p12.3, and 20q13.33). We did not observe evidence of additional independent association signals in GWAS-identified regions. Our results support the utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up. Such efforts may aid the eventual discovery of disease-causing variant(s).
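Defining the fine-mapping regions amounts to collecting every variant within 250 kb on either side of each index SNP and then asking which variant in the region has the smallest P-value. The sketch below assumes variants are given as (id, chromosome, position) tuples on a single genome build; the identifiers, positions and P-values in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, str, int]  # (variant id, chromosome, position in bp)
WINDOW_BP = 250_000  # 250 kb upstream and downstream of each index SNP


def fine_mapping_regions(variants: List[Variant], index_snps: List[Variant],
                         window: int = WINDOW_BP) -> Dict[str, List[Variant]]:
    """Map each index SNP id to the variants within +/- window bp on the same chromosome."""
    regions: Dict[str, List[Variant]] = {}
    for snp_id, chrom, pos in index_snps:
        start, end = max(1, pos - window), pos + window
        regions[snp_id] = [v for v in variants if v[1] == chrom and start <= v[2] <= end]
    return regions


def strongest_variant(region: List[Variant], pvalues: Dict[str, float]) -> str:
    """Id of the variant with the smallest P-value in a region."""
    return min(region, key=lambda v: pvalues[v[0]])[0]


if __name__ == "__main__":
    index_snps = [("idx_8q24", "8", 1_000_000)]  # hypothetical index SNP
    variants = [
        ("var_a", "8", 800_000),     # inside the window
        ("var_b", "8", 1_300_000),   # 300 kb away: outside
        ("var_c", "9", 1_000_000),   # other chromosome: outside
        ("idx_8q24", "8", 1_000_000),
    ]
    pvalues = {"var_a": 1e-9, "idx_8q24": 1e-7}
    region = fine_mapping_regions(variants, index_snps)["idx_8q24"]
    print([v[0] for v in region], strongest_variant(region, pvalues))
```

In this made-up region a nearby variant outranks the index SNP, the kind of pattern the abstract reports across the 31 regions.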

Posted ContentDOI
21 Jul 2016-bioRxiv
TL;DR: Using whole genomes and peripheral white blood cell transcriptomes from 624 Sardinian individuals, Sardinian eQTLs were identified at genes involved in malarial resistance and multiple sclerosis, reflecting the long-term epidemiological history of the island’s population.
Abstract: Identifying functional non-coding variants can enhance genome interpretation and inform novel genetic risk factors. We used whole genomes and peripheral white blood cell transcriptomes from 624 Sardinian individuals to identify non-coding variants that contribute to population, family, and individual differences in transcript abundance. We identified 21,183 independent expression quantitative trait loci (eQTLs) and 6,768 independent splicing quantitative trait loci (sQTLs) influencing 73% and 41% of all tested genes, respectively. When we compared Sardinian eQTLs to those previously identified in Europe, we identified differentiated eQTLs at genes involved in malarial resistance and multiple sclerosis, reflecting the long-term epidemiological history of the island's population. Taking advantage of pedigree data for the population sample, we identify segregating patterns of outlier gene expression and allelic imbalance in 61 Sardinian trios. We identified 809 expression outliers (median z-score of 2.97), averaging 13.3 genes with outlier expression per individual. We then connected these outlier expression events to rare non-coding variants. Our results provide new insight into the effects of non-coding variants and their relationship to population history, traits and individual genetic risk.
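Expression outliers of the kind counted above are typically flagged by z-scoring each gene's expression across individuals and keeping values whose absolute z-score exceeds a cutoff. The sketch below assumes already-normalized expression values and a hypothetical cutoff of |z| ≥ 2; the abstract does not state the study's threshold, and real pipelines usually also remove technical and hidden confounders first. Gene names and values are made up.

```python
import statistics
from typing import Dict, List, Tuple


def expression_outliers(expr: Dict[str, List[float]],
                        threshold: float = 2.0) -> List[Tuple[str, int, float]]:
    """Return (gene, individual index, z-score) for every |z| >= threshold.

    z is computed per gene across individuals using the sample standard deviation.
    Genes with constant expression are skipped.
    """
    outliers = []
    for gene, values in expr.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        for idx, x in enumerate(values):
            z = (x - mean) / sd
            if abs(z) >= threshold:
                outliers.append((gene, idx, z))
    return outliers


if __name__ == "__main__":
    expr = {
        "GENE_A": [1.0] * 9 + [10.0],  # hypothetical: individual 9 is extreme
        "GENE_B": [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0],
    }
    print(expression_outliers(expr))
```

With trios, an outlier shared by a parent and child who also share a rare non-coding allele is the kind of segregating pattern the study links to rare variants.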


Journal ArticleDOI
TL;DR: This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension.
Abstract: The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naive multiple-testing correction of the significance threshold hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first method estimates the effective number of independent tests actually performed in a whole genome association study by using an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, genotypes are more strongly correlated than in a set of unrelated samples. Using our procedure, we estimate that the effective number of tests represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic or diastolic blood pressure. The second approach implements biological relevance-driven hypothesis testing by exploiting prior computational predictions of the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension.
The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had previously been detected in genome-wide association studies conducted in an Asian population. Our results highlight the need to develop alternative approaches that improve the efficiency of detecting plausible candidate associations in whole genome sequencing studies.

Proceedings ArticleDOI
TL;DR: Large scale whole genome sequencing with imputation into GWAS improves the understanding of the genetic architecture of colorectal cancer.
Abstract: Whole-genome sequencing (WGS) has started a new era in human genetics in which data can be used to more fully understand the role of genetic variation in common complex diseases, including the role of less frequent and rare variants and structural variation. To explore the impact of these variants on colorectal cancer risk, we conducted the first large-scale WGS study for colorectal cancer (CRC), including 1,961 CRC cases and 981 controls. These WGS data, as well as those from the Haplotype Reference Consortium, were imputed into 13,104 CRC cases and 15,521 controls with genome-wide association study (GWAS) data that are part of the Colorectal Cancer Family Registry (CCFR) and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). Focusing on rare and less frequent variants, insertions and deletions, we observed potentially novel variants: a less frequent variant (MAF = 0.026) on chromosome 5 located in NREP/STARD4-AS1 (p-value 4E-08); and a novel rare multi-allelic variant (MAF = 0.003) on chromosome 9 near KLF9 and TRPM3 (p-value 2E-09; the other allele of this multi-allelic variant had a MAF of 0.0003 and p-value of 0.55). Furthermore, we observed an independent locus close to the known region 8q24 that was located upstream of GSDMC (MAF = 0.16, p-value 5E-08). Within the known region 8q23/EIF3H we identified several low-frequency variants with similar MAF (0.0181 to 0.0204), including a 6-bp deletion, with p-values between 4E-08 and 1E-09 that were independent of the common variant signal in this region. In addition, we identified statistically significant (p
Citation Format: Jeroen Huyghe, Sai Chen, Hyun M. Kang, Tabitha A. Harrison, Sonja I. Berndt, Stephane Bezieau, Hermann Brenner, Graham Casey, Andrew T. Chan, Jenny Chang-Claude, Gallinger J. Steven, Stephen B. Gruber, Andrea Gsur, Michael Hoffmeister, Thomas J. Hudson, Loic Le Marchand, Polly A. Newcomb, John D. Potter, Conghui Qu, Martha L. Slattery, Joshua D. Smith, Emily White, Li Hsu, Goncalo R. Abecasis, Deborah A. Nickerson, Ulrike Peters. Large scale whole genome sequencing with imputation into GWAS improves our understanding of the genetic architecture of colorectal cancer. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5230.