Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)
Nature (Nature Publishing Group), Vol. 526, Issue 7571, pp. 68-74, 1 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
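
The headline total in the abstract follows from its own breakdown. Here is a quick check in Python that uses only the rounded counts quoted in the abstract. The dictionary keys are illustrative labels.

```python
# Variant counts as quoted in the abstract (rounded, as published).
counts = {
    "SNPs": 84_700_000,
    "indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(counts.values())
print(f"total: {total:,}")  # total: 88,360,000, i.e. "over 88 million"

# Share of each variant class in the combined call set.
for name, n in counts.items():
    print(f"{name}: {100 * n / total:.2f}%")
```
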
Citations
Journal Article
TL;DR: The study identifies naturally occurring polymorphisms in DPP4 that negatively impact cellular entry of MERS-CoV and might thus modulate MERS development in infected patients.
Abstract: Middle East respiratory syndrome (MERS) coronavirus (MERS-CoV) causes a severe respiratory disease in humans. The MERS-CoV spike (S) glycoprotein mediates viral entry into target cells. For this, MERS-CoV S engages the host cell protein dipeptidyl peptidase 4 (DPP4, CD26), and the interface between MERS-CoV S and DPP4 has been resolved at the atomic level. Here, we asked whether naturally occurring polymorphisms in DPP4 that alter amino acid residues required for MERS-CoV S binding influence cellular entry of MERS-CoV. By screening public databases, we identified fourteen such polymorphisms. Introduction of the respective mutations into DPP4 revealed that all except one (Δ346-348) were compatible with robust DPP4 expression. Four polymorphisms (K267E, K267N, A291P and Δ346-348) strongly reduced binding of MERS-CoV S to DPP4 and S protein-driven host cell entry, as determined using soluble S protein and S protein-bearing rhabdoviral vectors, respectively. Two polymorphisms (K267E and A291P) were analyzed in the context of authentic MERS-CoV and were found to attenuate viral replication. Collectively, we identified naturally occurring polymorphisms in DPP4 that negatively impact cellular entry of MERS-CoV and might thus modulate MERS development in infected patients.

78 citations

Journal Article
Gabriel Cuellar-Partida, Joyce Y. Tung, Nicholas Eriksson, Eva Albrecht, Fazil Aliev, Ole A. Andreassen, Inês Barroso, Jacques S. Beckmann, Marco P. Boks, Dorret I. Boomsma, Heather A. Boyd, Monique M.B. Breteler, Harry Campbell, Daniel I. Chasman, Lynn Cherkas, Gail Davies, Eco J. C. de Geus, Ian J. Deary, Panos Deloukas, Danielle M. Dick, David L. Duffy, Johan G. Eriksson, Tõnu Esko, Bjarke Feenstra, Frank Geller, Christian Gieger, Ina Giegling, Scott D. Gordon, Jiali Han, Thomas Hansen, Annette M. Hartmann, Caroline Hayward, Kauko Heikkilä, Andrew A. Hicks, Joel N. Hirschhorn, Jouke-Jan Hottenga, Jennifer E. Huffman, Liang-Dar Hwang, M. Arfan Ikram, Jaakko Kaprio, John P. Kemp, Kay-Tee Khaw, Norman Klopp, Bettina Konte, Zoltán Kutalik, Jari Lahti, Xin Li, Ruth J. F. Loos, Michelle Luciano, Sigurdur H. Magnusson, Massimo Mangino, Pedro Marques-Vidal, Nicholas G. Martin, Wendy L. McArdle, Mark I. McCarthy, Carolina Medina-Gomez, Mads Melbye, Scott Melville, Andres Metspalu, Lili Milani, Vincent Mooser, Mari Nelis, Dale R. Nyholt, Kevin S. O’Connell, Roel A. Ophoff, Cameron D. Palmer, Aarno Palotie, Teemu Palviainen, Guillaume Paré, Lavinia Paternoster, Leena Peltonen, Brenda W.J.H. Penninx, Ozren Polasek, Peter P. Pramstaller, Inga Prokopenko, Katri Räikkönen, Samuli Ripatti, Fernando Rivadeneira, Igor Rudan, Dan Rujescu, Johannes H. Smit, George Davey Smith, Jordan W. Smoller, Nicole Soranzo, Tim D. Spector, Beate St Pourcain, John M. Starr, Hreinn Stefansson, Stacy Steinberg, Maris Teder-Laving, Gudmar Thorleifsson, Kari Stefansson, Nicholas J. Timpson, André G. Uitterlinden, Cornelia M. van Duijn, Frank J. A. van Rooij, J.M. Vink, Peter Vollenweider, Eero Vuoksimaa, Gérard Waeber, Nicholas J. Wareham, Nicole M. Warrington, Dawn M. Waterworth, Thomas Werge, H.-Erich Wichmann, Elisabeth Widen, Gonneke Willemsen, Alan F. Wright, Margaret J. Wright, Mousheng Xu, Jing Hua Zhao, Peter Kraft, David A. Hinds, Cecilia M. Lindgren, Reedik Mägi, Benjamin M. Neale, David M. Evans, Sarah E. Medland
TL;DR: It is suggested that handedness is highly polygenic and that the genetic variants that predispose to left-handedness may underlie part of the association with some psychiatric disorders.
Abstract: Handedness has been extensively studied because of its relationship with language and the over-representation of left-handers in some neurodevelopmental disorders. Using data from the UK Biobank, 23andMe and the International Handedness Consortium, we conducted a genome-wide association meta-analysis of handedness (N = 1,766,671). We found 41 loci associated (P < 5 × 10⁻⁸) with left-handedness and 7 associated with ambidexterity. Tissue-enrichment analysis implicated the central nervous system (CNS) in the aetiology of handedness. Pathways including regulation of microtubules and brain morphology were also highlighted. We found suggestive positive genetic correlations between left-handedness and neuropsychiatric traits, including schizophrenia and bipolar disorder. Furthermore, the genetic correlation between left-handedness and ambidexterity is low (rG = 0.26), which implies that these traits are largely influenced by different genetic mechanisms. Our findings suggest that handedness is highly polygenic and that the genetic variants that predispose to left-handedness may underlie part of the association with some psychiatric disorders. A genome-wide association study of more than 1.7 million individuals identified 41 genetic variants associated with left-handedness and 7 associated with ambidexterity. The genetic correlation between the two traits was low, implying different aetiologies.

78 citations
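
The handedness abstract uses a cutoff of P < 5 × 10⁻⁸. This is the conventional genome-wide significance threshold. It is commonly motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The sketch below shows that reasoning and then filters results against the threshold. `bonferroni_threshold`, the locus names and the P values are all hypothetical and do not come from the study.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that bounds the family-wise error rate at alpha."""
    return alpha / n_tests


# Roughly one million independent common-variant tests is the usual rationale.
GWS = bonferroni_threshold(0.05, 1_000_000)
print(f"{GWS:.1e}")

# Hypothetical (locus, P value) pairs, for illustration only.
results = [("locus_a", 3.2e-12), ("locus_b", 4.9e-8), ("locus_c", 6.0e-8), ("locus_d", 1.0e-3)]
significant = [name for name, p in results if p < GWS]
print(significant)  # ['locus_a', 'locus_b']
```
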

Journal Article
TL;DR: It is hypothesized that the selective patterns observed in Europeans were driven by a change in dietary composition of fatty acids following the transition to agriculture, resulting in a lower intake of arachidonic acid and eicosapentaenoic acid, but a higher intake of linoleic acid and α-linolenic acid.
Abstract: FADS genes encode fatty acid desaturases that are important for the conversion of short-chain polyunsaturated fatty acids (PUFAs) to long-chain fatty acids. Prior studies indicate that the FADS genes have been subjected to strong positive selection in Africa, South Asia, Greenland, and Europe. By comparing FADS sequencing data from present-day and Bronze Age (5,000 to 3,000 years ago) Europeans, we identify possible targets of selection in the European population, which suggest that selection has targeted different alleles in the FADS genes in Europe than in South Asia or Greenland. The alleles showing the strongest changes in allele frequency since the Bronze Age are associated with expression changes and multiple lipid-related phenotypes. Furthermore, the selected alleles are associated with a decrease in linoleic acid and an increase in arachidonic and eicosapentaenoic acids among Europeans; this is the opposite of the effect observed for selected alleles in Inuit from Greenland. We show that multiple SNPs in the region affect expression levels and PUFA synthesis. Additionally, we find evidence for a gene-environment interaction influencing low-density lipoprotein (LDL) levels between alleles affecting PUFA synthesis and PUFA dietary intake: carriers of the derived allele display lower LDL cholesterol levels with a higher intake of PUFAs. We hypothesize that the selective patterns observed in Europeans were driven by a change in dietary composition of fatty acids following the transition to agriculture, resulting in a lower intake of arachidonic acid and eicosapentaenoic acid, but a higher intake of linoleic acid and α-linolenic acid.

78 citations


Cites background or methods from "A global reference for human genetic variation"

  • …We used the 1000 Genomes Project Phase 3 data (Genomes Project et al. 2015), specifically the five populations of European ancestry CEU, GBR, FIN, IBS, and TSI (often combined into a super population denoted as EUR), and an ancient DNA data set comprised of low-coverage (0.7× average coverage)…

  • …Taking advantage of genome-wide sequencing data from 101 Bronze Age individuals by Allentoft et al. (2015) as well as data from the 1000 Genomes Project (Genomes Project et al. 2015), we identify novel potential targets of selection in the FADS region.

  • …In addition to CEU, we also compared two other groups comprised of panels from the 1000 Genomes Project (Genomes Project et al. 2015), Northern European (FIN, GBR, CEU) and Southern European (TSI, IBS), to the Bronze Age data from (Allentoft et al. 2015) (fig. …

  • …of temporal allele frequency differentiation in Europe, by comparing the allele frequencies of FADS SNPs in present-day CEU from the 1000 Genomes Project (Genomes Project et al. 2015) and 54 Bronze Age Europeans (Allentoft et al. 2015) (supplementary table 7, Supplementary Material online).
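
These excerpts compare per-SNP allele frequencies between present-day 1000 Genomes samples and Bronze Age genomes. The sketch below shows that comparison and assumes allele counts are already in hand. The SNP identifiers, counts and class name are hypothetical. The cited study's actual tests must handle low-coverage ancient data and are more involved.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    derived: int  # copies of the derived allele observed
    total: int    # total allele copies observed

    @property
    def freq(self) -> float:
        return self.derived / self.total


# Hypothetical counts per SNP: (present-day sample, ancient sample).
snps = {
    "snp1": (AlleleCounts(150, 200), AlleleCounts(40, 100)),
    "snp2": (AlleleCounts(100, 200), AlleleCounts(48, 100)),
    "snp3": (AlleleCounts(20, 200), AlleleCounts(30, 100)),
}

# Frequency change since the ancient sample; positive means the derived allele rose.
delta = {sid: modern.freq - ancient.freq for sid, (modern, ancient) in snps.items()}

# Rank SNPs by the magnitude of change, largest first.
ranked = sorted(delta, key=lambda sid: abs(delta[sid]), reverse=True)
print(ranked)  # ['snp1', 'snp3', 'snp2']
```
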

Journal Article
TL;DR: This study identifies the first shared genetic risk loci of AgP and CP with genome‐wide significance and highlights the role of innate and adaptive immunity in the etiology of periodontitis.
Abstract: Periodontitis is one of the most common inflammatory diseases, with a prevalence of 11% worldwide for the severe forms and an estimated heritability of 50%. The disease is characterized by destruction of the alveolar bone due to an aberrant host inflammatory response to a dysbiotic oral microbiome. Previous genome-wide association studies (GWAS) have reported several suggestive susceptibility loci. Here, we conducted a GWAS using a German and Dutch case-control sample of aggressive periodontitis (AgP; 896 cases, 7,104 controls), a rare but highly severe and early-onset form of periodontitis, and validated the associations in a German sample of severe forms of the more moderate phenotype, chronic periodontitis (CP; 993 cases, 1,419 controls). Positive findings were replicated in a Turkish sample of AgP (223 cases, 564 controls). A locus at SIGLEC5 (sialic acid binding Ig-like lectin 5) and a chromosomal region downstream of the DEFA1A3 locus (defensin alpha 1-3) were associated with both disease phenotypes and reached genome-wide significance in the pooled samples, with P = 1.09E-08 (rs4284742-G; OR = 1.34, 95% CI = 1.21-1.48) and P = 5.48E-10 (rs2738058-T; OR = 1.28, 95% CI = 1.18-1.38), respectively. SIGLEC5 is expressed in various myeloid immune cells and is classified as an inhibitory receptor with the potential to mediate SHP-1/SHP-2 tyrosine phosphatase-dependent signaling. Alpha defensins are antimicrobial peptides expressed in neutrophils and at mucosal surfaces, with a role in phagocyte-mediated host defense. This study identifies the first shared genetic risk loci of AgP and CP with genome-wide significance and highlights the role of innate and adaptive immunity in the etiology of periodontitis.
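
The periodontitis abstract reports each association as an odds ratio (OR) with a 95% confidence interval. The sketch below shows how such an interval arises, using Woolf's log method on a 2 × 2 table of allele counts. This textbook method is a stand-in: GWAS usually estimate ORs by logistic regression with covariates. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: risk alleles in cases      b: other alleles in cases
    c: risk alleles in controls   d: other alleles in controls
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical allele counts, for illustration only.
or_, lo, hi = odds_ratio_ci(600, 400, 500, 500)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Intervals of this Wald type are symmetric on the log scale, so the OR is close to the geometric mean of its bounds. The reported values are consistent with this: √(1.21 × 1.48) ≈ 1.34 and √(1.18 × 1.38) ≈ 1.28.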

77 citations


Cites methods from "A global reference for human genetic variation"

  • …Next, we used 1000G Phase 3 data of Northern Europeans from Utah (1000GP3-CEU) to flip all genotypes on the forward strand (40).
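
This excerpt flips genotypes onto the forward strand of a reference panel. For each SNP, this amounts to complementing the alleles when they match the panel only after complementation. The sketch below shows that rule; the helper names are hypothetical. A/T and C/G SNPs are strand-ambiguous and cannot be resolved this way. Pipelines commonly resolve them by allele frequency or drop them.

```python
from typing import Optional, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
Alleles = Tuple[str, str]


def is_strand_ambiguous(alleles: Alleles) -> bool:
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT[alleles[0]] == alleles[1]


def align_to_panel(sample: Alleles, panel: Alleles) -> Optional[Alleles]:
    """Express sample alleles on the panel's (forward) strand, or None if unresolvable."""
    if is_strand_ambiguous(sample):
        return None  # complementing leaves the allele set unchanged, so strand is unknown
    if set(sample) == set(panel):
        return sample  # already on the panel's strand
    flipped = (COMPLEMENT[sample[0]], COMPLEMENT[sample[1]])
    if set(flipped) == set(panel):
        return flipped  # sample alleles were reported on the reverse strand
    return None  # alleles disagree with the panel even after flipping


print(align_to_panel(("T", "C"), ("A", "G")))  # ('A', 'G')
```
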

Proceedings Article
TL;DR: This work proposes using de Bruijn graphs as path indexes, compressing them by merging redundant subgraphs, and encoding them with the Burrows-Wheeler transform, resulting in a fast, space-efficient, and versatile index.
Abstract: Variation graphs, which represent genetic variation within a population, are replacing sequences as reference genomes. Path indexes are one of the most important tools for working with variation graphs. They generalize text indexes to graphs, allowing one to find the paths matching the query string. We propose using de Bruijn graphs as path indexes, compressing them by merging redundant subgraphs, and encoding them with the Burrows-Wheeler transform. The resulting fast, space-efficient, and versatile index is used in the variation graph toolkit vg.

77 citations
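
A path index answers one question: at which graph positions can a query string start, across every path the graph spells? The sketch below builds an uncompressed k-mer index for a hypothetical toy graph by enumerating every path. It illustrates only the query semantics. It is not the compressed, BWT-encoded de Bruijn structure the paper proposes, and path enumeration grows exponentially with the number of variant sites. The node layout and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy variation graph: shared segments GAT and ACA with a
# SNP bubble between them (node 2 carries T, node 3 carries C).
NODES = {1: "GAT", 2: "T", 3: "C", 4: "ACA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def enumerate_paths(node, edges):
    """All paths from `node` to a sink, by depth-first recursion (exponential in general)."""
    if not edges[node]:
        return [[node]]
    return [[node] + tail for nxt in edges[node] for tail in enumerate_paths(nxt, edges)]


def build_kmer_index(nodes, edges, source, k):
    """Map each k-mer spelled by some path to the (node, offset) positions where it starts."""
    index = defaultdict(set)
    for path in enumerate_paths(source, edges):
        spelled = "".join(nodes[n] for n in path)
        # positions[i] is the graph position that contributed spelled[i].
        positions = [(n, off) for n in path for off in range(len(nodes[n]))]
        for i in range(len(spelled) - k + 1):
            index[spelled[i:i + k]].add(positions[i])
    return dict(index)


index = build_kmer_index(NODES, EDGES, source=1, k=3)
print(index.get("TTA"))  # {(1, 2)}: only the path through node 2 spells TTA
print(index.get("ACA"))  # {(4, 0)}: shared by both paths
```
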

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
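
The maximal segment pair (MSP) is the highest-scoring pair of equal-length, ungapped segments from two sequences. BLAST approximates it quickly. For small inputs it can be computed exactly: scan every diagonal with a running maximum-subarray (Kadane) pass. The sketch below uses an arbitrary +1/-1 match/mismatch scheme for illustration; BLAST itself uses substitution matrices or configurable reward/penalty scores. The function name is hypothetical.

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Exact maximal segment pair score: best ungapped local alignment of a and b."""
    best = 0
    # Diagonal d = i - j pairs a[i] with b[j]; an ungapped segment pair lies on one diagonal.
    for d in range(-(len(b) - 1), len(a)):
        i, j = max(d, 0), max(-d, 0)
        running = 0
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            running = max(running, 0)  # Kadane: discard a prefix whose score went negative
            best = max(best, running)
            i += 1
            j += 1
    return best


print(msp_score("TTACGTTT", "ACGT"))  # 4: ACGT aligned to ACGT
```

A segment can extend through a mismatch when the matches on either side outweigh it. For example, "ACGAACG" against "ACGTACG" scores 5 along the main diagonal.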

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
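
A SAM record is a single tab-separated line. It has 11 mandatory columns, including a bitwise FLAG and a CIGAR string. The parsing sketch below models its example record on the read r001 from the SAM specification. `parse_sam_line` and `reference_span` are hypothetical helpers, not part of SAMtools.

```python
import re

# The 11 mandatory, tab-separated SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line: str) -> dict:
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])  # 1-based leftmost mapping position
    record["tags"] = values[11:]        # optional TAG:TYPE:VALUE fields, left unparsed
    return record


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment ('*' yields 0)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


rec = parse_sam_line("r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*")
is_reverse = bool(rec["FLAG"] & 0x10)  # 0x10: SEQ is reverse complemented
is_unmapped = bool(rec["FLAG"] & 0x4)  # 0x4: segment unmapped
end = rec["POS"] + reference_span(rec["CIGAR"]) - 1  # 1-based inclusive end
print(is_reverse, is_unmapped, end)  # False False 22
```
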

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
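
BED intervals use 0-based, half-open coordinates, so an interval ending at 200 does not overlap one starting at 200. The overlap query sketched below follows that convention and reports the shared portion of each overlapping pair. It is a naive all-pairs scan with hypothetical names, far simpler than what BEDTools needs to handle the large datasets it targets.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end excluded (half-open).
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping portion of every overlapping (a, b) pair; naive all-pairs scan."""
    shared = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            # Half-open intervals overlap iff each one starts before the other ends.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                shared.append((chrom_a, max(start_a, start_b), min(end_a, end_b)))
    return shared


features = [("chr1", 100, 200)]
annotations = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 100, 200)]
print(intersect(features, annotations))  # [('chr1', 150, 200)]
```
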

Journal Article
Nature, 6 Sep 2012
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, such as validation, merging and comparison, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, such as validation, merging and comparison, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
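
In a VCF data line, per-sample genotypes sit in the GT subfield of each sample column, and FORMAT gives the position of GT. The sketch below computes an ALT allele frequency from those calls for a biallelic site. The example record is modeled on the one in the VCF specification. `alt_allele_frequency` is a hypothetical helper, not part of VCFtools.

```python
import re


def alt_allele_frequency(vcf_line: str) -> float:
    """ALT allele frequency from the GT calls on one biallelic VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Columns 0-7: CHROM POS ID REF ALT QUAL FILTER INFO; 8: FORMAT; 9+: samples.
    gt_index = fields[8].split(":").index("GT")
    alt = called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        for allele in re.split(r"[/|]", genotype):  # '|' phased, '/' unphased
            if allele == ".":
                continue  # missing call
            called += 1
            alt += allele != "0"  # any non-REF allele index counts as ALT
    return alt / called if called else float("nan")


line = "\t".join([
    "20", "14370", "rs6054257", "G", "A", "29", "PASS", "NS=3;DP=14;AF=0.5;DB;H2",
    "GT:GQ:DP:HQ", "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
])
print(alt_allele_frequency(line))  # 0.5, matching the record's INFO AF=0.5
```
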