scispace - formally typeset

Showing papers by "Gonçalo R. Abecasis" published in 2015


Journal Article
Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations


01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing and dense microarray genotyping.

3,247 citations


01 Jan 2015
TL;DR: This paper conducted a genome-wide association study and meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals.
Abstract: Obesity is heritable and predisposes to many diseases. To understand the genetic basis of obesity better, here we conduct a genome-wide association study and Metabochip meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals. This analysis identifies 97 BMI-associated loci (P < […]) […] >20% of BMI variation. Pathway analyses provide strong support for a role of the central nervous system in obesity susceptibility and implicate new genes and pathways, including those related to synaptic function, glutamate signalling, insulin secretion/action, energy metabolism, lipid biology and adipogenesis.

2,721 citations
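GWAS meta-analyses of this kind combine per-study SNP effect estimates, and a common choice is fixed-effect inverse-variance weighting. The following is a minimal Python sketch of that textbook estimator for a single SNP, assuming each study supplies an effect estimate and its standard error; the name `ivw_meta` is illustrative, and this is not claimed to be the exact software or settings used in the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect, inverse-variance-weighted combination of per-study estimates for one SNP."""
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty lists of effect estimates and standard errors")
    weights = [1.0 / se ** 2 for se in ses]   # each study weighted by 1 / SE^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal P value
    return beta, se, z, p
```

For two hypothetical studies reporting 0.2 and 0.0 with equal standard errors of 0.1, `ivw_meta([0.2, 0.0], [0.1, 0.1])` gives a pooled estimate of 0.1 with a standard error of about 0.071.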


Journal Article
Thomas W. Winkler, Anne E. Justice, Mariaelisa Graff, Llilda Barata and 435 more authors (106 institutions)
TL;DR: In this paper, the authors performed meta-analyses of 114 studies with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium.
Abstract: Genome-wide association studies (GWAS) have identified more than 100 genetic variants contributing to BMI, a measure of body size, or waist-to-hip ratio (adjusted for BMI, WHRadjBMI), a measure of body shape. Body size and shape change as people grow older and these changes differ substantially between men and women. To systematically screen for age- and/or sex-specific effects of genetic variants on BMI and WHRadjBMI, we performed meta-analyses of 114 studies (up to 320,485 individuals of European descent) with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium. Each study tested the association of up to ~2.8M SNPs with BMI and WHRadjBMI in four strata (men ≤50y, men >50y, women ≤50y, women >50y) and summary statistics were combined in stratum-specific meta-analyses. We then screened for variants that showed age-specific effects (G x AGE), sex-specific effects (G x SEX) or age-specific effects that differed between men and women (G x AGE x SEX). For BMI, we identified 15 loci (11 previously established for main effects, four novel) that showed significant (FDR<5%) age-specific effects, of which 11 had larger effects in younger (<50y) than in older adults (≥50y). No sex-dependent effects were identified for BMI. For WHRadjBMI, we identified 44 loci (27 previously established for main effects, 17 novel) with sex-specific effects, of which 28 showed larger effects in women than in men, five showed larger effects in men than in women, and 11 showed opposite effects between sexes. No age-dependent effects were identified for WHRadjBMI. This is the first genome-wide interaction meta-analysis to report convincing evidence of age-dependent genetic effects on BMI. In addition, we confirm the sex-specificity of genetic effects on WHRadjBMI. These results may provide further insights into the biology that underlies weight change with age or the sexual dimorphism of body shape.

584 citations
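Screening for G x SEX or G x AGE effects comes down to asking whether a SNP's effect differs between strata. A minimal sketch of one standard way to test that, assuming the two stratum-specific estimates come from non-overlapping samples (so they are independent) and using a simple z-test on their difference; the function name is illustrative and the paper's exact test may differ.

```python
import math


def difference_test(beta1, se1, beta2, se2):
    """z-test for a difference between two stratum-specific SNP effects.

    Assumes the two strata (e.g. men and women) are non-overlapping samples,
    so the estimates are independent and their covariance is zero.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return z, p
```

Applied genome-wide, the resulting P values would then be screened with a multiple-testing criterion such as the FDR threshold mentioned in the abstract.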


Journal Article
Ron Do, Nathan O. Stitziel, Hong-Hee Won, Anders Berg Jørgensen, Stefano Duga, Pier Angelica Merlini, Adam Kiezun, Martin Farrall, Anuj Goel, Or Zuk, Illaria Guella, Rosanna Asselta, Leslie A. Lange, Gina M. Peloso, Paul L. Auer, Domenico Girelli, Nicola Martinelli, Deborah N. Farlow, Mark A. DePristo, Robert Roberts, Alex Stewart, Danish Saleheen, John Danesh, Stephen E. Epstein, Suthesh Sivapalaratnam, G. Kees Hovingh, John J.P. Kastelein, Nilesh J. Samani, Heribert Schunkert, Jeanette Erdmann, Svati H. Shah, William E. Kraus, Robert W. Davies, Majid Nikpay, Christopher T. Johansen, Jian Wang, Robert A. Hegele, Eliana Hechter, Winfried März, Marcus E. Kleber, Jie Huang, Andrew D. Johnson, Mingyao Li, Greg L. Burke, Myron D. Gross, Yongmei Liu, Themistocles L. Assimes, Gerardo Heiss, Ethan M. Lange, Aaron R. Folsom, Herman A. Taylor, Oliviero Olivieri, Anders Hamsten, Robert Clarke, Dermot F. Reilly, Wu Yin, Manuel A. Rivas, Peter Donnelly, Jacques E. Rossouw, Bruce M. Psaty, David M. Herrington, James G. Wilson, Stephen S. Rich, Michael J. Bamshad, Russell P. Tracy, L. Adrienne Cupples, Daniel J. Rader, Muredach P. Reilly, John A. Spertus, Sharon Cresci, Jaana Hartiala, W.H. Wilson Tang, Stanley L. Hazen, Hooman Allayee, Alexander P. Reiner, Christopher S. Carlson, Charles Kooperberg, Rebecca D. Jackson, Eric Boerwinkle, Eric S. Lander, Stephen M. Schwartz, David S. Siscovick, Ruth McPherson, Anne Tybjærg-Hansen, Gonçalo R. Abecasis, Hugh Watkins, Deborah A. Nickerson, Diego Ardissino, Shamil R. Sunyaev, Christopher J. O'Donnell, David Altshuler, Stacey Gabriel, Sekar Kathiresan
05 Feb 2015-Nature
TL;DR: Kathiresan et al. used exome sequencing of nearly 10,000 people to identify alleles associated with early-onset myocardial infarction; mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) were associated with disease risk.
Abstract: Exome sequence analysis of nearly 10,000 people was carried out to identify alleles associated with early-onset myocardial infarction; mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) were associated with disease risk, identifying the key roles of low-density lipoprotein cholesterol and metabolism of triglyceride-rich lipoproteins. Sekar Kathiresan and colleagues use exome sequencing of nearly 10,000 people to probe the contribution of multiple rare mutations within a gene to risk for myocardial infarction at a population level. They find that mutations in low-density lipoprotein receptor (LDLR) or apolipoprotein A-V (APOA5) are associated with disease risk. When compared with non-carriers, LDLR mutation carriers had higher plasma levels of LDL cholesterol, whereas APOA5 mutation carriers had higher plasma levels of triglycerides. As well as confirming that APOA5 is a myocardial infarction gene, this work informs the design and conduct of rare-variant association studies for complex diseases. Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance (refs 1, 2). When MI occurs early in life, genetic inheritance is a major component to risk (ref. 1). Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families (refs 3–8), whereas common variants at more than 45 loci have been associated with MI risk in the population (refs 9–15). Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance.
At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol (ref. 16). Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl−1. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase (refs 15, 17) and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.

521 citations
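Gene-level rare-variant analyses like this one often collapse all qualifying rare variants in a gene into a single carrier indicator per person and compare carrier frequencies between cases and controls. A minimal sketch of that burden-style comparison, reporting an odds ratio with a Woolf (log-scale) 95% confidence interval; the function names and the counts in the usage example are hypothetical, and the study's actual association tests are not reproduced here.

```python
import math


def is_carrier(genotypes):
    """Collapse one sample's alt-allele counts at qualifying rare variants into a carrier flag."""
    return any(g > 0 for g in genotypes)


def count_carriers(samples):
    """Number of samples carrying at least one qualifying rare allele."""
    return sum(1 for genotypes in samples if is_carrier(genotypes))


def carrier_odds_ratio(case_carriers, n_cases, ctrl_carriers, n_controls, z=1.96):
    """Odds ratio for carrier status with a Woolf (log-scale) 95% confidence interval."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = ctrl_carriers, n_controls - ctrl_carriers
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: a continuity correction or exact test is needed")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lo, hi)
```

With 20 carriers among 1,000 hypothetical cases and 5 among 1,000 hypothetical controls, `carrier_odds_ratio(20, 1000, 5, 1000)` gives an odds ratio of about 4.06.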


Journal Article
TL;DR: This work demonstrates how the application of software engineering techniques can help to keep imputation broadly accessible and speed up imputation by an order of magnitude compared with the previous implementation.
Abstract: Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation. Availability and implementation: minimac2, including source code, documentation, and examples, is available at http://genome.sph.umich.edu/wiki/Minimac2 Contact: cfuchsb@umich.edu, goncalo@umich.edu

454 citations
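Imputation tools in the MaCH/minimac family rest on a hidden Markov model, in the spirit of Li and Stephens, in which each target haplotype is treated as a mosaic of reference haplotypes. The sketch below is a deliberately naive forward-backward pass over such a model, assuming a single constant switch probability `rho` and mismatch probability `eps` (real tools estimate these per marker interval); `impute_haplotype` is a hypothetical name, and none of minimac2's engineering speed-ups are reproduced.

```python
def impute_haplotype(ref, target, rho=0.01, eps=0.01):
    """Expected allele (0/1 dosage) at every marker of one target haplotype.

    ref:    K reference haplotypes, each a list of M alleles coded 0/1
    target: M observed alleles coded 0/1, with None at untyped markers
    rho:    probability of switching template haplotype between markers (assumed constant)
    eps:    per-marker mismatch probability (assumed constant)
    """
    K, M = len(ref), len(target)

    def emit(m, k):
        if target[m] is None:        # untyped marker carries no information
            return 1.0
        return 1.0 - eps if ref[k][m] == target[m] else eps

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    fwd = []
    cur = [emit(0, k) / K for k in range(K)]
    for m in range(M):
        if m > 0:
            total = sum(fwd[-1])
            cur = [emit(m, k) * ((1 - rho) * fwd[-1][k] + rho * total / K) for k in range(K)]
        s = sum(cur)
        fwd.append([x / s for x in cur])

    # Backward pass, rescaled the same way.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        weighted = [emit(m + 1, j) * bwd[m + 1][j] for j in range(K)]
        total = sum(weighted)
        cur = [(1 - rho) * weighted[k] + rho * total / K for k in range(K)]
        s = sum(cur)
        bwd[m] = [x / s for x in cur]

    # Posterior probability of each template, then the expected allele.
    dosages = []
    for m in range(M):
        post = [fwd[m][k] * bwd[m][k] for k in range(K)]
        s = sum(post)
        dosages.append(sum(p * ref[k][m] for k, p in enumerate(post)) / s)
    return dosages
```

Because the switch term `rho * total / K` is shared by all states, each marker costs O(K) rather than O(K²); this is a standard trick for this model, not a claim about minimac2's internals.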


Journal Article
Kyle J. Gaulton, Teresa Ferreira, Yeji Lee and 258 more authors (73 institutions)
TL;DR: This paper performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry, and identified 49 distinct association signals at these loci including five mapping in or near KCNQ1.
Abstract: We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.

370 citations
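Credible sets of the kind described in this abstract are commonly built from approximate Bayes factors under the assumption of a single causal variant per distinct signal: variants are ranked by posterior probability and added until the cumulative probability reaches the target coverage (for example 99%). A minimal sketch, assuming Wakefield's approximate Bayes factor with a prior effect variance of 0.04 (an illustrative choice) and equal prior probability for every variant; the function names are hypothetical and the study's exact priors and conditioning steps may differ.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor for association vs. no association."""
    v = se ** 2
    r = prior_var / (v + prior_var)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def credible_set(variants, coverage=0.99, prior_var=0.04):
    """variants: list of (name, beta, se) for one association signal.

    Assumes exactly one causal variant and equal priors, so posterior
    probabilities are proportional to the Bayes factors.
    """
    labf = [log_abf(b, s, prior_var) for _, b, s in variants]
    top = max(labf)
    w = [math.exp(x - top) for x in labf]   # subtract the max to avoid overflow
    total = sum(w)
    ranked = sorted(((wi / total, name) for wi, (name, _, _) in zip(w, variants)), reverse=True)
    chosen, cum = [], 0.0
    for prob, name in ranked:
        chosen.append((name, prob))
        cum += prob
        if cum >= coverage:
            break
    return chosen
```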


Journal Article
TL;DR: This work presents vt normalize, a software tool that normalizes the representation of genetic variants in VCF; it demonstrates the inconsistent representation of variants across existing sequence analysis tools and shows that the tool facilitates integration of diverse variant types and call sets.
Abstract: Summary: A genetic variant can be represented in the Variant Call Format (VCF) in multiple different ways. Inconsistent representation of variants between variant callers and analyses will magnify discrepancies between them and complicate variant filtering and duplicate removal. We present a software tool vt normalize that normalizes representation of genetic variants in the VCF. We formally define variant normalization as the consistent representation of genetic variants in an unambiguous and concise way and derive a simple general algorithm to enforce it. We demonstrate the inconsistent representation of variants across existing sequence analysis tools and show that our tool facilitates integration of diverse variant types and call sets. Availability and implementation: The source code is available for download at http://github.com/atks/vt. More detailed documentation is available at http://genome.sph.umich.edu/wiki/Variant_Normalization. Contact: hmkang@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.

363 citations
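Normalization in this sense means making a variant parsimonious (no base shared by all alleles that could be dropped) and left-aligned. A minimal Python sketch of a truncate-and-extend loop that achieves this, assuming the chromosome sequence is available as an in-memory string and alleles are uppercase; `normalize` is a hypothetical function, not vt's own code or command-line interface.

```python
def normalize(pos, alleles, ref_seq):
    """Make a variant parsimonious and left-aligned.

    pos:     1-based position of the first base of REF
    alleles: [REF, ALT1, ...] as uppercase strings
    ref_seq: the chromosome's reference sequence (position p is ref_seq[p - 1])
    """
    if len(set(alleles)) != len(alleles):
        raise ValueError("alleles must be distinct")
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # Drop a final base that every allele shares.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, prepend one base of left reference context.
        if any(a == "" for a in alleles):
            if pos <= 1:
                raise ValueError("cannot extend past the start of the chromosome")
            pos -= 1
            alleles = [ref_seq[pos - 1] + a for a in alleles]
            changed = True
    # Drop leading bases shared by all alleles while each keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles
```

For example, deleting one `CA` unit from the sequence `GCACAT` can be written at position 3 as `ACA` to `A`; `normalize(3, ["ACA", "A"], "GCACAT")` rewrites it as `GCA` to `G` at position 1, the leftmost equivalent representation.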


Journal Article
Marleen H. M. de Moor, Stéphanie Martine van den Berg, Karin J. H. Verweij, Robert F. Krueger, Michelle Luciano, Alejandro Arias Vasquez, Lindsay K. Matteson, Jaime Derringer, Tõnu Esko, Najaf Amin, Scott D. Gordon, Narelle K. Hansell, Amy B. Hart, Ilkka Seppälä, Jennifer E. Huffman, Bettina Konte, Jari Lahti, Minyoung Lee, Michael B. Miller, Teresa Nutile, Toshiko Tanaka, Alexander Teumer, Alexander Viktorin, Juho Wedenoja, Gonçalo R. Abecasis, Daniel E. Adkins, Arpana Agrawal, Jüri Allik, Katja Appel, Timothy B. Bigdeli, Fabio Busonero, Harry Campbell, Paul T. Costa, George Davey Smith, Gail Davies, Harriet de Wit, Jun Ding, Barbara E. Engelhardt, Johan G. Eriksson, Iryna O. Fedko, Luigi Ferrucci, Barbara Franke, Ina Giegling, Richard A. Grucza, Annette M. Hartmann, Andrew C. Heath, Kati Heinonen, Anjali K. Henders, Georg Homuth, Jouke-Jan Hottenga, William G. Iacono, Joost G. E. Janzing, Markus Jokela, Robert Karlsson, John P. Kemp, Matthew G. Kirkpatrick, Antti Latvala, Terho Lehtimäki, David C. Liewald, Pamela A. F. Madden, Chiara Magri, Patrik K. E. Magnusson, Jonathan Marten, Andrea Maschio, Sarah E. Medland, Evelin Mihailov, Yuri Milaneschi, Grant W. Montgomery, Matthias Nauck, Klaasjan G. Ouwens, Aarno Palotie, Erik Pettersson, Ozren Polasek, Yong Qian, Laura Pulkki-Råback, Olli T. Raitakari, Anu Realo, Richard J. Rose, Daniela Ruggiero, Carsten Oliver Schmidt, Wendy S. Slutske, Rossella Sorice, John M. Starr, Beate St Pourcain, Angelina R. Sutin, Nicholas J. Timpson, Holly Trochet, Sita H. Vermeulen, Eero Vuoksimaa, Elisabeth Widen, Jasper Wouda, Margaret J. Wright, Lina Zgaga, David J. Porteous, Alessandra Minelli, Abraham A. Palmer, Dan Rujescu, Marina Ciullo, Caroline Hayward, Igor Rudan, Andres Metspalu, Jaakko Kaprio, Ian J. Deary, Katri Räikkönen, James F. Wilson, Liisa Keltikangas-Järvinen, Laura J. Bierut, John M. Hettema, Hans Joergen Grabe, Cornelia M. van Duijn, David M. Evans, David Schlessinger, N. L. Pedersen, Antonio Terracciano, Matt McGue, Brenda W.J.H. Penninx, Nicholas G. Martin, Dorret I. Boomsma
TL;DR: This study identifies a novel locus for neuroticism located in a known gene that has been associated with bipolar disorder and schizophrenia in previous studies and shows that neuroticism is influenced by many genetic variants of small effect that are either common or tagged by common variants.
Abstract: Importance Neuroticism is a pervasive risk factor for psychiatric conditions. It genetically overlaps with major depressive disorder (MDD) and is therefore an important phenotype for psychiatric genetics. The Genetics of Personality Consortium has created a resource for genome-wide association analyses of personality traits in more than 63 000 participants (including MDD cases). Objectives To identify genetic variants associated with neuroticism by performing a meta-analysis of genome-wide association results based on 1000 Genomes imputation; to evaluate whether common genetic variants as assessed by single-nucleotide polymorphisms (SNPs) explain variation in neuroticism by estimating SNP-based heritability; and to examine whether SNPs that predict neuroticism also predict MDD. Design, Setting, and Participants Genome-wide association meta-analysis of 30 cohorts with genome-wide genotype, personality, and MDD data from the Genetics of Personality Consortium. The study included 63 661 participants from 29 discovery cohorts and 9786 participants from a replication cohort. Participants came from Europe, the United States, or Australia. Analyses were conducted between 2012 and 2014. Main Outcomes and Measures Neuroticism scores harmonized across all 29 discovery cohorts by item response theory analysis, and clinical MDD case-control status in 2 of the cohorts. Results A genome-wide significant SNP was found on 3p14 in MAGI1 (rs35855737; P = 9.26 × 10−9 in the discovery meta-analysis). This association was not replicated (P = .32), but the SNP was still genome-wide significant in the meta-analysis of all 30 cohorts (P = 2.38 × 10−8). Common genetic variants explain 15% of the variance in neuroticism. Polygenic scores based on the meta-analysis of neuroticism in 27 cohorts significantly predicted neuroticism (1.09 × 10−12 < P < .05) and MDD (4.02 × 10−9 < P < .05) in the 2 other cohorts. 
Conclusions and Relevance This study identifies a novel locus for neuroticism. The variant is located in a known gene that has been associated with bipolar disorder and schizophrenia in previous studies. In addition, the study shows that neuroticism is influenced by many genetic variants of small effect that are either common or tagged by common variants. These genetic variants also influence MDD. Future studies should confirm the role of the MAGI1 locus for neuroticism and further investigate the association of MAGI1 and the polygenic association to a range of other psychiatric disorders that are phenotypically correlated with neuroticism.

286 citations
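A polygenic score of the kind used in the neuroticism analysis is, in its simplest form, a weighted sum of an individual's allele dosages, with weights taken from discovery GWAS effect estimates and SNPs selected by a discovery P-value threshold. A minimal sketch with hypothetical SNP identifiers and function name; it omits the linkage-disequilibrium pruning that real scores normally apply, and is not the consortium's exact scoring procedure.

```python
def polygenic_score(betas, pvalues, dosages, p_threshold):
    """Weighted sum of effect-allele dosages for one individual.

    betas:       {snp: per-allele effect from the discovery meta-analysis}
    pvalues:     {snp: discovery P value}
    dosages:     {snp: effect-allele dosage (0-2) in the target individual}
    p_threshold: include only SNPs with discovery P below this value
    """
    score = 0.0
    for snp, beta in betas.items():
        if pvalues[snp] < p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score
```

Scores computed this way in held-out cohorts are then regressed on the phenotype (here, neuroticism or MDD status) to test whether they predict it.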


Journal Article
TL;DR: GotCloud is presented, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data that automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information.
Abstract: The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies.

257 citations


Journal Article
TL;DR: Many independent loci contribute to population genetic differences in height and body mass index in 9,416 individuals across 14 European countries.
Abstract: Across-nation differences in the mean values for complex traits are common, but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10(-8); BMI, P < 5.95 × 10(-4)), and we find an among-population genetic correlation for tall and slender individuals (r = -0.80, 95% CI = -0.95, -0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).

Journal Article
TL;DR: This genome-wide association study discerns differences in genetic risk factors for psoriatic arthritis (PsA) and cutaneous-only psoriasis (PsC) and finds multiple independent susceptibility variants in the IL12B, NOS2, and IFIH1 regions.
Abstract: Psoriasis vulgaris (PsV) is a common inflammatory and hyperproliferative skin disease. Up to 30% of people with PsV eventually develop psoriatic arthritis (PsA), an inflammatory musculoskeletal condition. To discern differences in genetic risk factors for PsA and cutaneous-only psoriasis (PsC), we carried out a genome-wide association study (GWAS) of 1,430 PsA case subjects and 1,417 unaffected control subjects. Meta-analysis of this study with three other GWASs and two targeted genotyping studies, encompassing a total of 9,293 PsV case subjects, 3,061 PsA case subjects, 3,110 PsC case subjects, and 13,670 unaffected control subjects of European descent, detected 10 regions associated with PsA and 11 with PsC at genome-wide (GW) significance. Several of these association signals (IFNLR1, IFIH1, NFKBIA for PsA; TNFRSF9, LCE3C/B, TRAF3IP2, IL23A, NFKBIA for PsC) have not previously achieved GW significance. After replication, we also identified a PsV-associated SNP near CDKAL1 (rs4712528, odds ratio [OR] = 1.16, p = 8.4 × 10(-11)). Among identified psoriasis risk variants, three were more strongly associated with PsC than PsA (rs12189871 near HLA-C, p = 5.0 × 10(-19); rs4908742 near TNFRSF9, p = 0.00020; rs10888503 near LCE3A, p = 0.0014), and two were more strongly associated with PsA than PsC (rs12044149 near IL23R, p = 0.00018; rs9321623 near TNFAIP3, p = 0.00022). The PsA-specific variants were independent of previously identified psoriasis variants near IL23R and TNFAIP3. We also found multiple independent susceptibility variants in the IL12B, NOS2, and IFIH1 regions. These results provide insights into the pathogenetic similarities and differences between PsC and PsA.

Journal Article
TL;DR: Many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune-related functions, and novel lncRNAs are enriched for localization in the epidermal differentiation complex.
Abstract: Although analysis pipelines have been developed to use RNA-seq to identify long non-coding RNAs (lncRNAs), inference of their biological and pathological relevance remains a challenge. As a result, most transcriptome studies of autoimmune disease have only assessed protein-coding transcripts. We used RNA-seq data from 99 lesional psoriatic, 27 uninvolved psoriatic, and 90 normal skin biopsies, and applied computational approaches to identify and characterize expressed lncRNAs. We detect 2,942 previously annotated and 1,080 novel lncRNAs which are expected to be skin specific. Notably, over 40% of the novel lncRNAs are differentially expressed and the proportions of differentially expressed transcripts among protein-coding mRNAs and previously-annotated lncRNAs are lower in psoriasis lesions versus uninvolved or normal skin. We find that many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune related functions, and that novel lncRNAs are enriched for localization in the epidermal differentiation complex. We also identify distinct tissue-specific expression patterns and epigenetic profiles for novel lncRNAs, some of which are shown to be regulated by cytokine treatment in cultured human keratinocytes. Together, our results implicate many lncRNAs in the immunopathogenesis of psoriasis, and our results provide a resource for lncRNA studies in other autoimmune diseases.

Journal Article
TL;DR: The new associations would have been missed in analyses based on 1000 Genomes Project data, underlining the advantages of large-scale sequencing in this founder population of Sardinians.
Abstract: We report ∼17.6 million genetic variants from whole-genome sequencing of 2,120 Sardinians; 22% are absent from previous sequencing-based compilations and are enriched for predicted functional consequences. Furthermore, ∼76,000 variants common in our sample (frequency >5%) are rare elsewhere (<0.5% in the 1000 Genomes Project). We assessed the impact of these variants on circulating lipid levels and five inflammatory biomarkers. We observe 14 signals, including 2 major new loci, for lipid levels and 19 signals, including 2 new loci, for inflammatory markers. The new associations would have been missed in analyses based on 1000 Genomes Project data, underlining the advantages of large-scale sequencing in this founder population.

Journal Article
TL;DR: The combined analysis, consisting of over 15,000 cases and 27,000 controls, identifies five new psoriasis susceptibility loci at genomewide significance, and demonstrates that NFKBIZ is a TRAF3IP2–dependent target of IL-17 signaling in human skin keratinocytes, thereby functionally linking two strong candidate genes.
Abstract: Psoriasis is a chronic autoimmune disease with complex genetic architecture. Previous genome-wide association studies (GWAS) and a recent meta-analysis using Immunochip data have uncovered 36 susce ...

Journal Article
TL;DR: The authors speculated that differences in autoantigen-binding repertoires between a heterozygote's two expressed HLA variants might result in additional non-additive risk effects; the HLA allele interactions they identified generally increased disease risk and explained moderate but significant fractions of phenotypic variance.
Abstract: Human leukocyte antigen (HLA) genes confer substantial risk for autoimmune diseases on a log-additive scale. Here we speculated that differences in autoantigen-binding repertoires between a heterozygote's two expressed HLA variants might result in additional non-additive risk effects. We tested the non-additive disease contributions of classical HLA alleles in patients and matched controls for five common autoimmune diseases: rheumatoid arthritis (n(cases) = 5,337), type 1 diabetes (T1D; n(cases) = 5,567), psoriasis vulgaris (n(cases) = 3,089), idiopathic achalasia (n(cases) = 727) and celiac disease (n(cases) = 11,115). In four of the five diseases, we observed highly significant, non-additive dominance effects (rheumatoid arthritis, P = 2.5 x 10(-12); T1D, P = 2.4 x 10(-10); psoriasis, P = 5.9 x 10(-6); celiac disease, P = 1.2 x 10(-87)). In three of these diseases, the non-additive dominance effects were explained by interactions between specific classical HLA alleles (rheumatoid arthritis, P = 1.8 x 10(-3); T1D, P = 8.6 x 10(-27); celiac disease, P = 6.0 x 10(-100)). These interactions generally increased disease risk and explained moderate but significant fractions of phenotypic variance (rheumatoid arthritis, 1.4%; T1D, 4.0%; celiac disease, 4.1%) beyond a simple additive model.
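The contrast between log-additive and non-additive (dominance) effects can be written as a logistic model for a single allele carried in x = 0, 1 or 2 copies. This is a generic illustration of dominance coding, not the exact model fitted in the paper, which examined classical HLA alleles and their pairwise interactions:

```latex
\begin{aligned}
\operatorname{logit}\Pr(\text{case} \mid x) &= \beta_0 + \beta_a x + \beta_d\,\mathbb{1}[x = 1], \qquad x \in \{0, 1, 2\} \\
x = 0:&\quad \beta_0 \\
x = 1:&\quad \beta_0 + \beta_a + \beta_d \\
x = 2:&\quad \beta_0 + 2\beta_a \\
\beta_d &= \bigl(\beta_0 + \beta_a + \beta_d\bigr) - \tfrac{1}{2}\bigl(\beta_0 + (\beta_0 + 2\beta_a)\bigr)
\end{aligned}
```

So the dominance term is the heterozygote's departure from the midpoint of the two homozygotes on the log-odds scale, and setting it to zero recovers the purely log-additive model.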

Journal Article
TL;DR: LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×, and will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
Abstract: Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
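Procrustes analysis finds the scaling, orthogonal transformation and translation that best superimpose one set of coordinates onto another. A minimal NumPy sketch of ordinary Procrustes analysis with scaling, fitted on individuals whose coordinates are known in both spaces and then applied to new samples; handling a higher-dimensional source by zero-padding the target to the same number of columns is an assumption made here for illustration, and this is not LASER's code.

```python
import numpy as np


def procrustes_fit(X, Y):
    """Scale rho, orthogonal A and shift b minimising ||Y - (rho * X @ A + b)||_F.

    X: (n, q) coordinates of the same n individuals in the source space
    Y: (n, p) coordinates in the reference space, p <= q; Y is zero-padded
       to q columns so that a q x q orthogonal matrix can be fitted
    """
    n, q = X.shape
    p = Y.shape[1]
    Yp = np.hstack([Y, np.zeros((n, q - p))])
    mx, my = X.mean(axis=0), Yp.mean(axis=0)
    Xc, Yc = X - mx, Yp - my
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    A = U @ Vt                         # best orthogonal map (reflections allowed)
    rho = s.sum() / np.sum(Xc ** 2)    # optimal isotropic scale
    b = my - rho * (mx @ A)
    return rho, A, b, p


def procrustes_apply(X_new, rho, A, b, p):
    """Map new source-space coordinates into the reference space (first p columns)."""
    return (rho * (X_new @ A) + b)[:, :p]
```

When `Y` is an exact scaled rotation plus shift of `X`, the fit recovers that transformation; with real data it gives the least-squares best match.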

Journal Article
TL;DR: Sequencing-based whole-genome association analyses of the impact of rare and founder variants on stature in 6,307 individuals on the island of Sardinia yield findings consistent with selection for shorter stature in Sardinia and a suggestive human example of the proposed 'island effect' reducing the size of large mammals.
Abstract: We report sequencing-based whole-genome association analyses to evaluate the impact of rare and founder variants on stature in 6,307 individuals on the island of Sardinia. We identify two variants with large effects. One variant, which introduces a stop codon in the GHR gene, is relatively frequent in Sardinia (0.87% versus <0.01% elsewhere) and in the homozygous state causes Laron syndrome involving short stature. We find that this variant reduces height in heterozygotes by an average of 4.2 cm (-0.64 s.d.). The other variant, in the imprinted KCNQ1 gene (minor allele frequency (MAF) = 7.7% in Sardinia versus <1% elsewhere), reduces height by an average of 1.83 cm (-0.31 s.d.) when maternally inherited. Additionally, polygenic scores indicate that known height-decreasing alleles are at systematically higher frequencies in Sardinians than would be expected by genetic drift. The findings are consistent with selection for shorter stature in Sardinia and a suggestive human example of the proposed 'island effect' reducing the size of large mammals.
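
A polygenic score sums an individual's effect-allele counts weighted by per-allele effect sizes. Averaged over a population in Hardy-Weinberg equilibrium, its expectation is 2 Σ p_j β_j. Higher frequencies of height-decreasing (negative-effect) alleles therefore lower the expected score. The sketch below assumes NumPy and shows both calculations with made-up effect sizes and frequencies. It does not reproduce the study's comparison against genetic drift, and the function names are hypothetical.

```python
import numpy as np


def polygenic_scores(dosages, effects):
    """Per-individual score: effect-allele counts (0..2) times per-allele effects, summed.

    dosages: (individuals, variants); effects: (variants,), e.g. cm per allele.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(effects, dtype=float)


def expected_mean_score(freqs, effects):
    """Population mean score under Hardy-Weinberg equilibrium: sum_j 2 * p_j * beta_j."""
    return float(2.0 * np.dot(np.asarray(freqs, dtype=float), np.asarray(effects, dtype=float)))


# Illustrative (hypothetical) numbers: two height-decreasing variants and one increasing one.
effects = [-0.1, 0.05, -0.2]
print(polygenic_scores([[0, 1, 2], [2, 2, 0]], effects))
print(expected_mean_score([0.5, 0.5, 0.5], effects), expected_mean_score([0.6, 0.5, 0.6], effects))
```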


Journal Article
TL;DR: This is the first study to confirm the link between early and late menopause and breast cancer risk using genetic information, and it identifies both common and low-frequency coding variants associated with age at natural menopause (ANM).
Abstract: Menopause timing has a major impact on infertility and risk of disease. Younger age at natural (nonsurgical) menopause (ANM) is associated with a higher risk of osteoporosis, cardiovascular disease, and type 2 diabetes and a lower risk of breast cancer. Late menopause is associated with a higher risk of breast cancer. It is well known that the age at which women go through menopause is partly determined by genes, but the underlying mechanisms are poorly understood. Genome-wide association studies have identified 18 common genetic variants associated with ANM. These variants explain less than 5% of the variation in ANM compared with the 21% explained by all common variants on genome-wide association study arrays. This genome-wide association study was the collaborative effort of researchers from 177 institutions worldwide. The study was designed to investigate genetic variants associated with timing of menopause among a population of approximately 70,000 women of European ancestry. A dual strategy was used to identify both common and, for the first time, low-frequency coding variants associated with ANM. The causal relationship between ANM and breast cancer was investigated using a Mendelian randomization approach. Combined analysis identified 1208 single-nucleotide polymorphisms (SNPs) of a total of approximately 2.6 million that reached the genome-wide significance threshold for association with ANM. Forty-four regions with common variants were identified; among these 44 loci were 2 rare low-frequency missense alleles of large effect. A majority of ANM SNPs were enriched in DNA damage response (DDR) genes, including the first common coding variant in BRCA1 associated with any complex trait. 
Mendelian randomization analyses supported a causal relationship between delayed ANM and breast cancer risk (approximately 6% increase in risk per year; P = 3 × 10⁻¹⁴); increased risk with delayed menopause appeared to be mediated primarily by prolonged sex hormone exposure in a woman’s lifetime, not DDR mechanisms. This is the first study to confirm the link between early and late menopause and breast cancer risk using genetic information. Age at natural menopause genetic variants influence breast cancer risk primarily through variation in menopause timing. Although carrying higher numbers of ANM-increasing variants and enrichment in DDR genes are associated with a modest increase in breast cancer risk, the major mechanism for increased risk appears to be prolonged estrogen and/or progesterone exposure due to delayed menopause.
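
Mendelian randomization treats genetic variants as instruments for the exposure (here, ANM). A common way to combine per-variant estimates is the fixed-effect inverse-variance weighted (IVW) estimator. The abstract does not say which estimator the study used, so the sketch below is illustrative only, and the function name is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    beta_exposure: per-allele effects of each instrument on the exposure (e.g. years of ANM)
    beta_outcome:  per-allele effects on the outcome (e.g. log odds of breast cancer)
    se_outcome:    standard errors of beta_outcome
    Returns (causal estimate per unit of exposure, its standard error).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # inverse variance of the outcome association
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

If the estimate is on the log-odds scale, exponentiating it gives an odds ratio per year of exposure. For example, math.exp(0.058) is about 1.06, which matches the ~6% per-year increase if that figure is read as an odds ratio. That reading is an assumption.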

Journal Article
TL;DR: In this article, the authors compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA).
Abstract: The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies.
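
MACH-Rsq estimates the squared correlation between imputed and true genotypes. It compares the observed variance of imputed allele dosages with the variance expected under Hardy-Weinberg equilibrium at the estimated allele frequency, 2p(1 − p). Well-imputed dosages spread out like real genotypes, while uninformative ones cluster near the mean. The sketch below assumes NumPy and computes a genotype-level version of that ratio, then filters variants on it. MaCH and minimac may compute the statistic from haplotype-level probabilities, so treat this as an approximation. The 0.3 default is only an example of a commonly used cutoff; according to this study, such standard thresholds may need adjusting for rare variants imputed with a study-specific panel. The function names are hypothetical.

```python
import numpy as np


def mach_rsq(dosages):
    """Approximate Rsq: observed dosage variance over the HWE-expected variance 2p(1-p)."""
    d = np.asarray(dosages, dtype=float)
    p = d.mean() / 2.0                 # estimated alternate allele frequency
    expected = 2.0 * p * (1.0 - p)     # binomial genotype variance under HWE
    if expected == 0.0:
        return 0.0                     # monomorphic dosages carry no information
    return float(d.var() / expected)   # population variance (ddof=0)


def keep_well_imputed(dosage_matrix, threshold=0.3):
    """Indices of variants (columns) whose approximate Rsq reaches the threshold."""
    m = np.asarray(dosage_matrix, dtype=float)
    return [j for j in range(m.shape[1]) if mach_rsq(m[:, j]) >= threshold]
```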

Journal Article
TL;DR: It is demonstrated that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function as well as common variants that explain ≥20% of the variance in TSH and FT4.
Abstract: Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.
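
The sequence kernel association test (SKAT) aggregates per-variant score statistics over a region. For a quantitative trait with null-model residuals r, Q = Σ_j w_j² (Σ_i G_ij r_i)². The sketch below assumes NumPy and computes Q for an intercept-only null model with Beta(1, 25) minor-allele-frequency weights, i.e. the density evaluated at each MAF, which upweights rarer variants. It replaces the usual analytic mixture-of-chi-squares p-value with a permutation p-value. Covariate adjustment and the scaling constants used by published implementations are omitted, and the function names are hypothetical.

```python
import numpy as np


def beta_1_25_weight(maf):
    """Beta(1, 25) density at the MAF, 25 * (1 - maf) ** 24; rarer variants get larger weights."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(y, G):
    """SKAT-style statistic for a quantitative trait with an intercept-only null model.

    y: (n,) trait values; G: (n, m) minor-allele counts for the m variants in the set.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    resid = y - y.mean()               # residuals under the null model
    maf = G.mean(axis=0) / 2.0         # assumes G counts minor alleles
    w = beta_1_25_weight(maf)
    scores = G.T @ resid               # per-variant score statistics
    return float(np.sum((w * scores) ** 2))


def skat_permutation_p(y, G, n_perm=1000, seed=0):
    """Permutation p-value: shuffling y is valid here because the null has no covariates."""
    rng = np.random.default_rng(seed)
    observed = skat_q(y, G)
    hits = sum(skat_q(rng.permutation(y), G) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```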

Journal Article
Anubha Mahajan, Xueling Sim, Hui Jin Ng, Alisa K. Manning, Manuel A. Rivas, Heather M. Highland, Adam E. Locke, Niels Grarup, Hae Kyung Im, Pablo Cingolani, Jason Flannick, Pierre Fontanillas, Christian Fuchsberger, Kyle J. Gaulton, Tanya M. Teslovich, N. William Rayner, Neil R. Robertson, Nicola L. Beer, Jana K. Rundle, Jette Bork-Jensen, Claes Ladenvall, Christine Blancher, David Buck, Gemma Buck, Noël P. Burtt, Stacey Gabriel, Anette P. Gjesing, Christopher J. Groves, Mette Hollensted, Jeroen R. Huyghe, Anne U. Jackson, Goo Jun, Johanne Marie Justesen, Massimo Mangino, Jacquelyn Murphy, Matt J. Neville, Robert C. Onofrio, Kerrin S. Small, Heather M. Stringham, Ann-Christine Syvänen, Joseph Trakalo, Gonçalo R. Abecasis, Graeme I. Bell, John Blangero, Nancy J. Cox, Ravindranath Duggirala, Craig L. Hanis, Mark Seielstad, James G. Wilson, Cramer Christensen, Ivan Brandslund, Rainer Rauramaa, Gabriela L. Surdulescu, Alex S. F. Doney, Lars Lannfelt, Allan Linneberg, Bo Isomaa, Tiinamaija Tuomi, Marit E. Jørgensen, Torben Jørgensen, Johanna Kuusisto, Matti Uusitupa, Veikko Salomaa, Tim D. Spector, Andrew D. Morris, Colin N. A. Palmer, Francis S. Collins, Karen L. Mohlke, Richard N. Bergman, Erik Ingelsson, Lars Lind, Jaakko Tuomilehto, Torben Hansen, Richard M. Watanabe, Inga Prokopenko, Josée Dupuis, Fredrik Karpe, Leif Groop, Markku Laakso, Oluf Pedersen, Jose C. Florez, Andrew P. Morris, David Altshuler, James B. Meigs, Michael Boehnke, Mark I. McCarthy, Cecilia M. Lindgren, Anna L. Gloyn
TL;DR: In this article, the authors analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry and identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing fasting glucose (FG) levels, conditionally independent of each other and of the non-coding GWAS signal.
Abstract: Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P<5 × 10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF)=1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.

Journal Article
TL;DR: In this paper, the levels of A1, A2 and fetal hemoglobins were analyzed concurrently for the first time, detecting 23 associations at 10 loci, including five previously undetected loci: MPHOSPH9, PLTP-PCIF1, ZFPM1 (FOG1), NFIX and CCND3.
Abstract: We report genome-wide association study results for the levels of A1, A2 and fetal hemoglobins, analyzed for the first time concurrently. Integrating high-density array genotyping and whole-genome sequencing in a large general population cohort from Sardinia, we detected 23 associations at 10 loci. Five signals are due to variants at previously undetected loci: MPHOSPH9, PLTP-PCIF1, ZFPM1 (FOG1), NFIX and CCND3. Among the signals at known loci, ten are new lead variants and four are new independent signals. Half of all variants also showed pleiotropic associations with different hemoglobins, which further corroborated some of the detected associations and identified features of coordinated hemoglobin species production.

Journal Article
TL;DR: In the current issue, Awh et al. have further refined their genetic subgroups based on outcome and extended their claim that AREDS supplements can be harmful to individuals with certain genotypes.


Journal Article
TL;DR: This work proposes methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses and demonstrates that, for moderate contamination levels, contamination-adjusted calls eliminate 48%-77% of the genotyping errors.
Abstract: DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
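
One simple way to build contamination into genotype calling is to treat each read as coming from the sample with probability 1 − α and from an unknown contaminating individual with probability α, where α is the estimated contamination fraction. If the contaminant's alleles are drawn at the population allele frequency f, a read carries the alternate allele with probability (1 − α)·q(G) + α·f, where q(G) is the sample's own alternate-allele read probability. The sketch below implements that simplified per-read mixture. It is an assumption for illustration, not the authors' likelihood: it treats reads as independent (ignoring that contaminating reads at a site come from one individual), uses a single hypothetical error rate, and makes a maximum-likelihood call without a genotype prior. The function names are hypothetical.

```python
import math


def alt_read_prob(genotype, alpha, contaminant_freq, error=0.01):
    """P(read shows the alt allele) for a sample with 0, 1 or 2 alt copies.

    A fraction alpha of reads is assumed to come from a contaminating individual
    whose alt allele appears with the population frequency contaminant_freq.
    """
    own = {0: error, 1: 0.5, 2: 1.0 - error}[genotype]
    return (1.0 - alpha) * own + alpha * contaminant_freq


def log_likelihoods(n_ref, n_alt, alpha, contaminant_freq, error=0.01):
    """Binomial log-likelihood (up to a constant) of the read counts for genotypes 0, 1, 2."""
    out = []
    for g in (0, 1, 2):
        p = alt_read_prob(g, alpha, contaminant_freq, error)
        out.append(n_alt * math.log(p) + n_ref * math.log(1.0 - p))
    return out


def call_genotype(n_ref, n_alt, alpha, contaminant_freq, error=0.01):
    """Maximum-likelihood genotype (number of alt alleles)."""
    ll = log_likelihoods(n_ref, n_alt, alpha, contaminant_freq, error)
    return max(range(3), key=lambda g: ll[g])
```

With 15 reference and 5 alternate reads, the uncontaminated model (α = 0) calls a heterozygote. With α = 0.2 and f = 0.5, the 5 alternate reads are better explained by contamination, and the call becomes homozygous reference.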

Journal Article
TL;DR: The results demonstrate allelic heterogeneity at IL12B and identify a high-risk MHC class I haplotype, consistent with the existence of multiple psoriasis effectors in the MHC, and find that SNP rs114255771 yielded a more significant association than any HLA allele or amino-acid residue.
Abstract: Previous studies have identified 41 independent genome-wide significant psoriasis susceptibility loci. After our first psoriasis genome-wide association study, we designed a custom genotyping array to fine-map eight genome-wide significant susceptibility loci known at that time (IL23R, IL13, IL12B, TNIP1, MHC, TNFAIP3, IL23A and RNF114), enabling genotyping of 2269 single-nucleotide polymorphisms (SNPs) in the eight loci for 2699 psoriasis cases and 2107 unaffected controls of European ancestry. We imputed these data using the latest 1000 Genomes reference haplotypes, which included both indels and SNPs, to increase the marker density of the eight loci to 49 239 genetic variants. Using stepwise conditional association analysis, we identified nine independent signals distributed across six of the eight loci. In the major histocompatibility complex (MHC) region, we detected three independent signals at rs114255771 (P=2.94 × 10⁻⁷⁴), rs6924962 (P=3.21 × 10⁻¹⁹) and rs892666 (P=1.11 × 10⁻¹⁰). Near IL12B we detected two independent signals at rs62377586 (P=7.42 × 10⁻¹⁶) and rs918518 (P=3.22 × 10⁻¹¹). Only one signal was observed in each of the TNIP1 (rs17728338; P=4.15 × 10⁻¹³), IL13 (rs1295685; P=1.65 × 10⁻⁷), IL23A (rs61937678; P=1.82 × 10⁻⁷) and TNFAIP3 (rs642627; P=5.90 × 10⁻⁷) regions. We also imputed variants for eight HLA genes and found that SNP rs114255771 yielded a more significant association than any HLA allele or amino-acid residue. Further analysis revealed that the HLA-C*06-B*57 haplotype tagged by this SNP had a significantly higher odds ratio than other HLA-C*06-bearing haplotypes. The results demonstrate allelic heterogeneity at IL12B and identify a high-risk MHC class I haplotype, consistent with the existence of multiple psoriasis effectors in the MHC.
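
Stepwise conditional analysis adds, one at a time, the variant with the strongest association after conditioning on the variants already selected. It stops when no remaining variant passes a significance threshold. The sketch below assumes NumPy and uses linear regression on a quantitative trait with a normal-approximation p-value to stay short; a case-control study such as this one would normally use logistic regression. The 5 × 10⁻⁸ default is the conventional genome-wide threshold, not necessarily the one the authors used (several of the signals above have larger P values). The function names are hypothetical.

```python
import math

import numpy as np


def conditional_z(y, covariates, g):
    """Wald z-statistic for g in the linear model y ~ 1 + covariates + g."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), *covariates, g])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / math.sqrt(cov[-1, -1])


def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stepwise_conditional(y, G, p_threshold=5e-8):
    """Forward selection over the columns of G (n individuals x m variants)."""
    selected = []
    while len(selected) < G.shape[1]:
        covariates = [G[:, k] for k in selected]
        candidates = [j for j in range(G.shape[1]) if j not in selected]
        zs = {j: conditional_z(y, covariates, G[:, j]) for j in candidates}
        best = max(zs, key=lambda j: abs(zs[j]))
        if two_sided_p(zs[best]) >= p_threshold:
            break                                # no remaining independent signal
        selected.append(best)
    return selected
```

Each candidate is tested conditional on all selected variants. A marker that is associated only through linkage disequilibrium with an already selected variant therefore falls below the threshold and is not added.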

Journal Article
TL;DR: This work describes situations where family‐based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values and finds that when sample sizes are limited and only a modest fraction of all trait‐associated variants can be identified, family samples are more powerful.
Abstract: Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
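
Family-based versions of gene-level tests must account for the correlation between relatives' trait values. The minimal sketch below assumes NumPy. It collapses rare variants into a per-person burden count, then fits y ~ 1 + burden by generalized least squares with Cov(y) = σ_g²·2Φ + σ_e²·I, where Φ is the kinship matrix. The variance components are treated as known, and the names are hypothetical. In practice they are estimated (for example with a linear mixed model), and this is not the authors' C++ implementation.

```python
import math

import numpy as np


def burden_scores(G, maf, maf_cutoff=0.01):
    """Count each individual's minor alleles at variants with MAF below the cutoff."""
    rare = np.asarray(maf) < maf_cutoff
    return np.asarray(G, dtype=float)[:, rare].sum(axis=1)


def gls_burden_test(y, burden, kinship, sigma_g2, sigma_e2):
    """GLS test of y ~ 1 + burden with Cov(y) = sigma_g2 * 2 * kinship + sigma_e2 * I.

    Variance components are assumed known here. Returns (effect estimate, z statistic).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    V = sigma_g2 * 2.0 * np.asarray(kinship, dtype=float) + sigma_e2 * np.eye(n)
    Vinv = np.linalg.inv(V)
    X = np.column_stack([np.ones(n), burden])
    XtVinvX = X.T @ Vinv @ X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
    cov_beta = np.linalg.inv(XtVinvX)            # exact because V is taken as known
    return float(beta[1]), float(beta[1] / math.sqrt(cov_beta[1, 1]))
```

With Φ = ½I (unrelated individuals) and σ_g² = 0, the test reduces to ordinary least squares. For a pair of full siblings the kinship coefficient is 0.25, so the corresponding off-diagonal entry of 2Φ is 0.5.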

Journal Article
TL;DR: New insights are provided into the underlying genetic architecture of gene expression across tissues, along with a new resource for interpreting the function of disease- and trait-associated structural variants.
Abstract: Genome-wide gene expression quantitative trait locus (eQTL) mapping has focused on single-nucleotide polymorphisms and has helped interpret findings from disease-mapping studies. The functional effect of structural variants, especially short insertions and deletions (indels), has not been well investigated. Here we impute 1,380,133 indels based on the latest 1000 Genomes Project panel into three eQTL data sets from multiple tissues. Imputation of indels increased power by 9.9% and identified indel-specific eQTLs for 325 genes. We find that introns and the vicinities of UTRs are more enriched for indel eQTLs, and that 3.6% (single-tissue) to 9.2% (multi-tissue) of previously identified eSNPs were tags of indel eQTLs (eindels). Functional analyses identify epigenetic marks, gene ontology categories and disease GWAS loci affected by SNP and indel eQTLs showing tissue-consistent or tissue-specific effects. This study provides new insights into the underlying genetic architecture of gene expression across tissues and a new resource for interpreting the function of disease- and trait-associated structural variants.