Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors; 90 institutions
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal ArticleDOI
TL;DR: It is reported that selection in the region with the second most extreme signal of positive selection in Greenlandic Inuit favored a deeply divergent haplotype that is closely related to the sequence in the Denisovan genome, and was likely introgressed from an archaic population.
Abstract: A recent study conducted the first genome-wide scan for selection in Inuit from Greenland using single nucleotide polymorphism chip data. Here, we report that selection in the region with the second most extreme signal of positive selection in Greenlandic Inuit favored a deeply divergent haplotype that is closely related to the sequence in the Denisovan genome, and was likely introgressed from an archaic population. The region contains two genes, WARS2 and TBX15, and has previously been associated with adipose tissue differentiation and body-fat distribution in humans. We show that the adaptively introgressed allele has been under selection in a much larger geographic region than just Greenland. Furthermore, it is associated with changes in expression of WARS2 and TBX15 in multiple tissues including the adrenal gland and subcutaneous adipose tissue, and with regional DNA methylation changes in TBX15.

107 citations


Cites background or methods from "A global reference for human geneti..."

  • ...In figure 1A, we show the geographic distribution of allele frequencies for one of these SNPs (rs2298080) as an example, using data from phase 3 of the 1,000 Genomes Project (Auton et al. 2015) and the Geography of Genetic Variants Browser (Marcus and Novembre 2016)....

  • ...%) in non-American Eurasians (EUR + EAS + SAS) from phase 3 of the 1,000 Genomes Project (Auton et al. 2015)....

  • ...…the genome into nonoverlapping 40 kb windows and computed, in each window, the number of SNPs where the Denisovan allele is at a frequency higher than 20% in Eurasians but less than 1% in Africans, using the population panels from phase 3 of the 1,000 Genomes Project (Auton et al. 2015)....

  • ...…alleles in the high-coverage Denisova genome (Meyer et al. 2012), 4) the alleles present in a present-day human genome from the 1,000 Genomes Project (Auton et al. 2015) (HG00436) that is homozygous for the introgressed tract, 5) the alleles in a present-day human genome (HG00407) that does not…...

  • ...FST Scan in Native Americans We extracted sequencing data in BAM format for a 100 Mbp region surrounding the putatively introgressed haplotype from the 1,000 Genomes Project data set (Auton et al. 2015)....
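
The window scan quoted in the excerpts above (nonoverlapping 40 kb windows, counting SNPs where the Denisovan allele exceeds 20% frequency in Eurasians but stays below 1% in Africans) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the citing authors' pipeline: the function name, the input layout of `(position, eurasian_freq, african_freq)` tuples, and the toy values are all hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 40_000  # 40 kb windows, as stated in the excerpt


def count_candidate_snps(snps, window=WINDOW_SIZE,
                         min_eurasian=0.20, max_african=0.01):
    """Count qualifying SNPs per nonoverlapping window.

    `snps` is an iterable of (position, eurasian_freq, african_freq) tuples,
    where each frequency is that of the Denisovan-matching allele
    (hypothetical input layout). Windows are indexed by position // window,
    which is an assumed convention.
    """
    counts = Counter()
    for position, eurasian_freq, african_freq in snps:
        if eurasian_freq > min_eurasian and african_freq < max_african:
            counts[position // window] += 1
    return dict(counts)


# Toy data, invented purely for illustration.
toy_snps = [
    (1_000, 0.25, 0.000),   # window 0, passes both thresholds
    (39_999, 0.30, 0.005),  # window 0, passes both thresholds
    (40_000, 0.50, 0.020),  # window 1, fails the African threshold
    (85_000, 0.21, 0.000),  # window 2, passes both thresholds
    (86_000, 0.10, 0.000),  # window 2, fails the Eurasian threshold
]
window_counts = count_candidate_snps(toy_snps)
```

Windows with no qualifying SNP are simply absent from the returned dictionary, which keeps the result sparse for a genome-wide scan.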

Journal ArticleDOI
26 Feb 2019-JAMA
TL;DR: The findings suggest that NUDT15 genotyping may be considered prior to initiation of thiopurine therapy; however, further study including additional validation in independent cohorts is required.
Abstract: Importance: Use of thiopurines may be limited by myelosuppression. TPMT pharmacogenetic testing identifies only 25% of at-risk patients of European ancestry. Among patients of East Asian ancestry, NUDT15 variants are associated with thiopurine-induced myelosuppression (TIM). Objective: To identify genetic variants associated with TIM among patients of European ancestry with inflammatory bowel disease (IBD). Design, Setting, and Participants: Case-control study of 491 patients affected by TIM and 679 thiopurine-tolerant unaffected patients who were recruited from 89 international sites between March 2012 and November 2015. Genome-wide association studies (GWAS) and exome-wide association studies (EWAS) were conducted in patients of European ancestry. The replication cohort comprised 73 patients affected by TIM and 840 thiopurine-tolerant unaffected patients. Exposures: Genetic variants associated with TIM. Main Outcomes and Measures: Thiopurine-induced myelosuppression, defined as a decline in absolute white blood cell count to 2.5 × 10^9/L or less or a decline in absolute neutrophil cell count to 1.0 × 10^9/L or less leading to a dose reduction or drug withdrawal. Results: Among 1077 patients (398 affected and 679 unaffected; median age at IBD diagnosis, 31.0 years [interquartile range, 21.2 to 44.1 years]; 540 [50%] women; 602 [56%] diagnosed as having Crohn disease), 919 (311 affected and 608 unaffected) were included in the GWAS analysis and 961 (328 affected and 633 unaffected) in the EWAS analysis. The GWAS analysis confirmed association of TPMT (chromosome 6, rs11969064) with TIM (30.5% [95/311] affected vs 16.4% [100/608] unaffected patients; odds ratio [OR], 2.3 [95% CI, 1.7 to 3.1], P = 5.2 × 10^-9). The EWAS analysis demonstrated an association with an in-frame deletion in NUDT15 (chromosome 13, rs746071566) and TIM (5.8% [19/328] affected vs 0.2% [1/633] unaffected patients; OR, 38.2 [95% CI, 5.1 to 286.1], P = 1.3 × 10^-8), which was replicated in a different cohort (2.7% [2/73] affected vs 0.2% [2/840] unaffected patients; OR, 11.8 [95% CI, 1.6 to 85.0], P = .03). Carriage of any of 3 coding NUDT15 variants was associated with an increased risk (OR, 27.3 [95% CI, 9.3 to 116.7], P = 1.1 × 10^-7) of TIM, independent of TPMT genotype and thiopurine dose. Conclusions and Relevance: Among patients of European ancestry with IBD, variants in NUDT15 were associated with increased risk of TIM. These findings suggest that NUDT15 genotyping may be considered prior to initiation of thiopurine therapy; however, further study including additional validation in independent cohorts is required.
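
As a rough check on the figures in this abstract, a crude (unadjusted) odds ratio can be computed directly from the carrier counts it reports. The sketch below is an illustration only: the function name is hypothetical, and the crude values need not match the published estimates exactly, since those presumably come from a fitted model (an assumption; the abstract does not say how they were derived).

```python
def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from a 2x2 table of carriers vs non-carriers."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    # Odds of carriage among cases divided by odds of carriage among controls.
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Counts taken from the abstract above.
or_tpmt = crude_odds_ratio(95, 311, 100, 608)            # reported OR: 2.3
or_nudt15_discovery = crude_odds_ratio(19, 328, 1, 633)  # reported OR: 38.2
or_nudt15_replication = crude_odds_ratio(2, 73, 2, 840)  # reported OR: 11.8

for label, value in [
    ("TPMT", or_tpmt),
    ("NUDT15 discovery", or_nudt15_discovery),
    ("NUDT15 replication", or_nudt15_replication),
]:
    print(f"{label}: crude OR = {value:.1f}")
```

The crude values land close to the reported ones (about 2.2, 38.9 and 11.8). The wide confidence intervals in the abstract reflect how few unaffected carriers there are: one in the discovery cohort and two in the replication cohort.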

106 citations

Journal ArticleDOI
TL;DR: The results demonstrate that comprehensive catalogs of GxE interactions are indispensable to thoroughly annotate genes and bridge epidemiological and genome-wide association studies.
Abstract: Gene-by-environment (GxE) interactions determine common disease risk factors and biomedically relevant complex traits. However, quantifying how the environment modulates genetic effects on human quantitative phenotypes presents unique challenges. Environmental covariates are complex and difficult to measure and control at the organismal level, as found in GWAS and epidemiological studies. An alternative approach focuses on the cellular environment using in vitro treatments as a proxy for the organismal environment. These cellular environments simplify the organism-level environmental exposures to provide a tractable influence on subcellular phenotypes, such as gene expression. Expression quantitative trait loci (eQTL) mapping studies identified GxE interactions in response to drug treatment and pathogen exposure. However, eQTL mapping approaches are infeasible for large-scale analysis of multiple cellular environments. Recently, allele-specific expression (ASE) analysis emerged as a powerful tool to identify GxE interactions in gene expression patterns by exploiting naturally occurring environmental exposures. Here we characterized genetic effects on the transcriptional response to 50 treatments in five cell types. We discovered 1455 genes with ASE (FDR < 10%) and 215 genes with GxE interactions. We demonstrated a major role for GxE interactions in complex traits. Genes with a transcriptional response to environmental perturbations showed sevenfold higher odds of being found in GWAS. Additionally, 105 genes that indicated GxE interactions (49%) were identified by GWAS as associated with complex traits. Examples include GIPR-caffeine interaction and obesity and include LAMP3-selenium interaction and Parkinson disease. Our results demonstrate that comprehensive catalogs of GxE interactions are indispensable to thoroughly annotate genes and bridge epidemiological and genome-wide association studies.

106 citations


Cites background from "A global reference for human geneti..."

  • ...…expression quantitative trait loci (reQTL) mapping studies found that SNPs associated with specific immune traits are enriched for infection reQTL and for expression quantitative trait loci (eQTL) identified only in infected cells (Barreiro et al. 2012; Fairfax et al. 2014; Lee et al. 2014)....

Journal ArticleDOI
TL;DR: It is found that Native American ancestry components in Latin Americans correspond geographically to the present-day genetic structure of Native groups, and that sources of non-Native ancestry, and admixture timings, match documented migratory flows.
Abstract: Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the intermixing (admixture) of Native Americans, Europeans and Sub-Saharan Africans. Using novel haplotype-based methods, here we infer sub-continental ancestry in over 6,500 Latin Americans and evaluate the impact of regional ancestry variation on physical appearance. We find that Native American ancestry components in Latin Americans correspond geographically to the present-day genetic structure of Native groups, and that sources of non-Native ancestry, and admixture timings, match documented migratory flows. We also detect South/East Mediterranean ancestry across Latin America, probably stemming mostly from the clandestine colonial migration of Christian converts of non-European origin (Conversos). Furthermore, we find that ancestry related to highland (Central Andean) versus lowland (Mapuche) Natives is associated with variation in facial features, particularly nose morphology, and detect significant differences in allele frequencies between these groups at loci previously associated with nose morphology in this sample.

106 citations

Journal ArticleDOI
TL;DR: The RV analyses showed nonrandom distributions over the affected proteins, and different distributions were observed between aHUS and C3G that clarify their phenotypes.
Abstract: Atypical hemolytic uremic syndrome (aHUS) and C3 glomerulopathy (C3G) are associated with dysregulation and overactivation of the complement alternative pathway. Typically, gene analysis for aHUS and C3G is undertaken in small patient numbers, yet it is unclear which genes most frequently predispose to aHUS or C3G. Accordingly, we performed a six-center analysis of 610 rare genetic variants in 13 mostly complement genes (CFH, CFI, CD46, C3, CFB, CFHR1, CFHR3, CFHR4, CFHR5, CFP, PLG, DGKE, and THBD) from >3500 patients with aHUS and C3G. We report 371 novel rare variants (RVs) for aHUS and 82 for C3G. Our new interactive Database of Complement Gene Variants was used to extract allele frequency data for these 13 genes using the Exome Aggregation Consortium server as the reference genome. For aHUS, significantly more protein-altering rare variation was found in five genes (CFH, CFI, CD46, C3, and DGKE) than in the Exome Aggregation Consortium (allele frequency < 0.01%), thus correlating these with aHUS. For C3G, an association was only found for RVs in C3 and the N-terminal C3b-binding or C-terminal nonsurface-associated regions of CFH. In conclusion, the RV analyses showed nonrandom distributions over the affected proteins, and different distributions were observed between aHUS and C3G that clarify their phenotypes.

106 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
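
To make the format described in this abstract concrete, the sketch below reads one tab-separated VCF data line and estimates the non-reference allele frequency from the per-sample `GT` fields. It is a minimal pure-Python illustration, not VCFtools itself or its API; the function name and the example record (including the placeholder ID `rs_example`) are hypothetical, and all non-reference alleles are lumped together for simplicity.

```python
import re


def alt_allele_frequency(vcf_line):
    """Fraction of called alleles that are non-reference in one VCF data line.

    Assumes the standard column order: CHROM, POS, ID, REF, ALT, QUAL,
    FILTER, INFO, FORMAT, then one column per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Alleles are separated by "|" (phased) or "/" (unphased).
        for allele in re.split(r"[|/]", genotype):
            if allele == ".":  # missing call, skip it
                continue
            called += 1
            if allele != "0":  # "0" denotes the reference allele
                alt_count += 1
    return alt_count / called if called else None


# Hypothetical record with four samples, one of them uncalled.
example_line = "\t".join(
    ["1", "10177", "rs_example", "A", "AC", "100", "PASS", ".", "GT",
     "0|1", "1|1", "0|0", ".|."]
)
frequency = alt_allele_frequency(example_line)
```

Here three of the six called alleles are non-reference, so `frequency` is 0.5. A real workflow would normally rely on an established parser and on indexed, compressed files rather than splitting lines by hand.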

10,164 citations