Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
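
As a quick arithmetic check on the totals quoted in the abstract, the three variant classes can be summed directly. This is a minimal sketch that uses only the rounded counts stated above; the variable names are illustrative and the figures are not exact call-set sizes.

```python
# Rounded variant counts as quoted in the abstract (not exact call-set sizes).
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(variant_counts.values())
print(f"total variants: {total / 1e6:.2f} million")

# Share of each variant class in the total, as a percentage.
for name, count in variant_counts.items():
    print(f"{name}: {100 * count / total:.2f}%")
```

The rounded counts sum to 88.36 million, which is consistent with "over 88 million variants"; SNPs make up roughly 96% of that total.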
Citations
Journal Article
TL;DR: This study identifies eight significant genetic associations with intrusive reexperiencing of trauma in post-traumatic stress disorder (PTSD) from the Million Veteran Program, a large biobank focused on US veterans.
Abstract: Post-traumatic stress disorder (PTSD) is a major problem among military veterans and civilians alike, yet its pathophysiology remains poorly understood. We performed a genome-wide association study and bioinformatic analyses, which included 146,660 European Americans and 19,983 African Americans in the US Million Veteran Program, to identify genetic risk factors relevant to intrusive reexperiencing of trauma, which is the most characteristic symptom cluster of PTSD. In European Americans, eight distinct significant regions were identified. Three regions had values of P < 5 × 10⁻¹⁰: CAMKV; chromosome 17 closest to KANSL1, but within a large high linkage disequilibrium region that also includes CRHR1; and TCF4. Associations were enriched with respect to the transcriptomic profiles of striatal medium spiny neurons. No significant associations were observed in the African American cohort of the sample. Results in European Americans were replicated in the UK Biobank data. These results provide new insights into the biology of PTSD in a well-powered genome-wide association study.

128 citations

Journal Article
31 Oct 2019-Cell
TL;DR: The largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from rural Uganda, finds evidence of geographically correlated fine-scale population substructure.

128 citations


Cites methods or result from "A global reference for human geneti..."

  • ...On comparing results using an older version of the ClinVar database (2014 version), we find clear evidence of ascertainment bias in the older database, with a greater number of clinically significant disease alleles/individual among Europeans compared with Africans, as have been reported before (Table S4.2; Auton et al., 2015)....

  • ...In order to minimize ascertainment bias, we used Ugandan sequence data on 2,100 individuals called and refined across the Ugandan and 1000 Genomes Project phase 3 (Auton et al., 2015) and AGVP panel (Gurdasani et al....

  • ...We removed low complexity regions of sequence as defined by the 1000 Genomes Project, as described previously (Auton et al., 2015)....

Posted Content
TL;DR: The results suggest that the distribution of genetic variance among the loci discovered in GWAS take a simple form that depends on one evolutionary parameter, and provide a simple interpretation for missing heritability and why it varies among traits.
Abstract: Genome-wide association studies (GWAS) in humans are revealing the genetic architecture of biomedical, life history and anthropomorphic traits, i.e., the frequencies and effect sizes of variants contributing to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes - notably, by mutation, natural selection and genetic drift. Because many quantitative traits are subject to stabilizing selection and genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multi-dimensional trait space. We solve the model for the phenotypic distribution and allelic dynamics at steady state and derive robust, closed form solutions for summaries of genetic architecture. Our results suggest that the distribution of genetic variance among the loci discovered in GWAS take a simple form that depends on one evolutionary parameter, and provide a simple interpretation for missing heritability and why it varies among traits. We test our predictions against the results of GWAS for height and body mass index (BMI) and find that they fit the data well, allowing us to make inferences about the degree of pleiotropy and the mutational target size. Our findings help to understand why GWAS for height explain more of the heritable variance than similarly-sized GWAS for BMI, and to predict how future increases in sample size will translate into explained heritability.

127 citations

Journal Article
TL;DR: The inconsistent association between asthma and gene expression levels in blood or lung cells from older children and adults suggests that genotype effects may mediate asthma risk or protection during critical developmental windows and/or in response to relevant exposures in early life.
Abstract: Chromosome 17q12-21 remains the most highly replicated and significant asthma locus. Genotypes in the core region defined by the first genome-wide association study correlate with expression of 2 genes, ORM1-like 3 (ORMDL3) and gasdermin B (GSDMB), making these prime candidate asthma genes, although recent studies have implicated gasdermin A (GSDMA) distal to and post-GPI attachment to proteins 3 (PGAP3) proximal to the core region as independent loci. We review 10 years of studies on the 17q12-21 locus and suggest that genotype-specific risks for asthma at the proximal and distal loci are not specific to early-onset asthma and mediated by PGAP3, ORMDL3, and/or GSDMA expression. We propose that the weak and inconsistent associations of 17q single nucleotide polymorphisms with asthma in African Americans is due to the high frequency of some 17q alleles, the breakdown of linkage disequilibrium on African-derived chromosomes, and possibly different early-life asthma endotypes in these children. Finally, the inconsistent association between asthma and gene expression levels in blood or lung cells from older children and adults suggests that genotype effects may mediate asthma risk or protection during critical developmental windows and/or in response to relevant exposures in early life. Thus studies of young children and ethnically diverse populations are required to fully understand the relationship between genotype and asthma phenotype and the gene regulatory architecture at this locus.

127 citations

Journal Article
Stéphanie Boisson-Dupuis, Noé Ramírez-Alejo, Zhi Li, Etienne Patin, Geetha Rao, Gaspard Kerner, Che Kang Lim, Dimitry N. Krementsov, Nicholas Hernandez, Cindy S. Ma, Qian Zhang, Janet Markle, Rubén Martínez-Barricarte, Kathryn Payne, Robert Fisch, Caroline Deswarte, Joshua Halpern, Matthieu Bouaziz, Jeanette Mulwa, Durga Sivanesan, Tomi Lazarov, Rodrigo Naves, Patricia García, Yuval Itan, Bertrand Boisson, Alix Checchi, Fabienne Jabot-Hanin, Aurélie Cobat, Andrea Guennoun, Carolyn C. Jackson, Sevgi Pekcan, Zafer Caliskaner, Jaime Inostroza, Beatriz Tavares Costa-Carvalho, Jose Antonio Tavares de Albuquerque, Humberto García-Ortiz, Lorena Orozco, Tayfun Ozcelik, Ahmed Abid, Ismail Abderahmani Rhorfi, Hicham Souhi, Hicham Naji Amrani, Adil Zegmout, Frederic Geissmann, Stephen W. Michnick, Ingrid Müller-Fleckenstein, Bernhard Fleckenstein, Anne Puel, Michael J. Ciancanelli, Nico Marr, Hassan Abolhassani, María Elvira Balcells, Antonio Condino-Neto, Alexis Strickler, Katia Abarca, Cory Teuscher, Hans D. Ochs, Ismail Reisli, Esra Hazar Sayar, Jamila El-Baghdadi, Jacinta Bustamante, Lennart Hammarström, Stuart G. Tangye, Sandra Pellegrini, Lluis Quintana-Murci, Laurent Abel, Jean-Laurent Casanova
TL;DR: Homozygosity for the catalytically inactive P1104A missense variant of the TYK2 Janus kinase selectively disrupts the induction of IFN-γ by IL-23 and is a common monogenic etiology of tuberculosis.
Abstract: Inherited IL-12Rβ1 and TYK2 deficiencies impair both IL-12- and IL-23-dependent IFN-γ immunity and are rare monogenic causes of tuberculosis, each found in less than 1/600,000 individuals. We show that homozygosity for the common TYK2 P1104A allele, which is found in about 1/600 Europeans and between 1/1000 and 1/10,000 individuals in regions other than East Asia, is more frequent in a cohort of patients with tuberculosis from endemic areas than in ethnicity-adjusted controls (P = 8.37 × 10⁻⁸; odds ratio, 89.31; 95% CI, 14.7 to 1725). Moreover, the frequency of P1104A in Europeans has decreased, from about 9% to 4.2%, over the past 4000 years, consistent with purging of this variant by endemic tuberculosis. Surprisingly, we also show that TYK2 P1104A impairs cellular responses to IL-23, but not to IFN-α, IL-10, or even IL-12, which, like IL-23, induces IFN-γ via activation of TYK2 and JAK2. Moreover, TYK2 P1104A is properly docked on cytokine receptors and can be phosphorylated by the proximal JAK, but lacks catalytic activity. Last, we show that the catalytic activity of TYK2 is essential for IL-23, but not IL-12, responses in cells expressing wild-type JAK2. In contrast, the catalytic activity of JAK2 is redundant for both IL-12 and IL-23 responses, because the catalytically inactive P1057A JAK2, which is also docked and phosphorylated, rescues signaling in cells expressing wild-type TYK2. In conclusion, homozygosity for the catalytically inactive P1104A missense variant of TYK2 selectively disrupts the induction of IFN-γ by IL-23 and is a common monogenic etiology of tuberculosis.
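
The allele and homozygote frequencies quoted in this abstract can be related with a short calculation. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection acting within a generation), an assumption the abstract does not state explicitly; under it, an allele at frequency q is expected in homozygous form at frequency q². The function name is illustrative.

```python
def homozygote_frequency(allele_frequency: float) -> float:
    """Expected homozygote frequency q**2, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return allele_frequency ** 2


# Allele frequencies quoted in the abstract for Europeans.
present_day = homozygote_frequency(0.042)    # about 4.2% today
four_kya = homozygote_frequency(0.09)        # about 9% roughly 4000 years ago

print(f"present-day homozygotes: about 1 in {1 / present_day:.0f}")
print(f"homozygotes ~4000 years ago: about 1 in {1 / four_kya:.0f}")
```

With q = 0.042 the expected homozygote frequency is about 1 in 567, in line with the "about 1/600 Europeans" figure; with q = 0.09 it would have been about 1 in 123.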

127 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
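
To make the SAM format concrete, here is a minimal sketch of reading one alignment line in plain Python. It is not how SAMtools itself is implemented; it only splits the eleven mandatory tab-separated fields and decodes two bits of the FLAG field. The example record, the `SamRecord` class, and `parse_sam_line` are hypothetical names and data introduced for illustration. Header lines (those starting with `@`) are not handled.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    """The eleven mandatory fields of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tags after field 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10],
    )


# Made-up record: a 5-base read aligned to the reverse strand of chr1.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.is_reverse)
```

For real data, an established library or SAMtools itself is the better choice, since BAM compression, indexing, and optional tags all need handling that this sketch omits.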

45,957 citations

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
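
The core operation described here, finding overlaps between two sets of genomic features, can be sketched in a few lines. The version below is a deliberately naive all-pairs comparison, not the algorithm BEDTools uses, and the feature lists are made up for illustration. It assumes BED-style coordinates, which are 0-based and half-open, so two intervals that merely touch do not overlap.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end) with BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of two intervals, or None if they do not overlap."""
    if a[0] != b[0]:
        return None
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def intersect(features_a: List[Interval], features_b: List[Interval]) -> List[Interval]:
    """Naive O(n*m) intersection of two feature lists."""
    hits = []
    for a in features_a:
        for b in features_b:
            shared = overlap(a, b)
            if shared is not None:
                hits.append(shared)
    return hits


# Hypothetical features, for illustration only.
genes = [("chr1", 100, 200), ("chr2", 50, 80)]
peaks = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 0, 50)]
print(intersect(genes, peaks))
```

The quadratic loop is fine for a handful of features but not for sequencing-scale data, which is the efficiency gap the abstract says dedicated tools are meant to close.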

18,858 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
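
As an illustration of the VCF layout described above, the following sketch parses the eight fixed columns of a single data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a plain-Python illustration rather than the VCFtools API, and the record and the function name `parse_vcf_line` are hypothetical. Header lines starting with `#` and per-sample genotype columns are left out.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of key=value pairs or bare flags.
    info_dict: Dict[str, Any] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),            # 1-based position on the reference
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),      # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Made-up record with two alternate alleles and one flag in INFO.
example = "1\t12345\t.\tA\tG,T\t50\tPASS\tAC=3,1;AF=0.25,0.1;DB"
variant = parse_vcf_line(example)
print(variant["chrom"], variant["pos"], variant["alt"], variant["info"])
```

Values in INFO are kept as strings here; a fuller parser would consult the header's type declarations before converting them.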

10,164 citations