Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal ArticleDOI
TL;DR: Genome-wide association meta-analysis of data sets from Iceland and the UK identifies 16 new risk loci for osteoarthritis, including missense variants in SMO, IL11, and COL11A1.
Abstract: Osteoarthritis has a highly negative impact on quality of life because of the associated pain and loss of joint function. Here we describe the largest meta-analysis so far of osteoarthritis of the hip and the knee in samples from Iceland and the UK Biobank (including 17,151 hip osteoarthritis patients, 23,877 knee osteoarthritis patients, and more than 562,000 controls). We found 23 independent associations at 22 loci in the additive meta-analyses, of which 16 of the loci were novel: 12 for hip and 4 for knee osteoarthritis. Two associations are between rare or low-frequency missense variants and hip osteoarthritis, affecting the genes SMO (rs143083812, frequency 0.11%, odds ratio (OR) = 2.8, P = 7.9 × 10⁻¹², p.Arg173Cys) and IL11 (rs4252548, frequency 2.08%, OR = 1.30, P = 2.1 × 10⁻¹¹, p.Arg112His). A common missense variant in the COL11A1 gene also associates with hip osteoarthritis (rs3753841, frequency 61%, P = 5.2 × 10⁻¹⁰, OR = 1.08, p.Pro1284Leu). In addition, using a recessive model, we confirm an association between hip osteoarthritis and a variant of CHADL1 (rs117018441, P = 1.8 × 10⁻²⁵, OR = 5.9). Furthermore, we observe a complex relationship between height and risk of osteoarthritis.

112 citations

Journal ArticleDOI
TL;DR: It is suggested that the Neandertal-introgressed haplotype likely reintroduced an ancestral splice variant of OAS1 encoding a more active protein, suggesting that adaptive introgression occurred as a means to resurrect adaptive variation that had been lost outside Africa.
Abstract: The 2’-5’ oligoadenylate synthetase (OAS) locus encodes for three OAS enzymes (OAS1-3) involved in innate immune response. This region harbors high amounts of Neandertal ancestry in non-African populations; yet, strong evidence of positive selection in the OAS region is still lacking. Here we used a broad array of selection tests in concert with neutral coalescent simulations to demonstrate a signal of adaptive introgression at the OAS locus. Furthermore, we characterized the functional consequences of the Neandertal haplotype in the transcriptional regulation of OAS genes at baseline and infected conditions. We found that cells from people with the Neandertal-like haplotype express lower levels of OAS3 upon infection, as well as distinct isoforms of OAS1 and OAS2. We present evidence that a Neandertal haplotype at the OAS locus was subjected to positive selection in the human population. This haplotype is significantly associated with functional consequences at the level of transcriptional regulation of innate immune responses. Notably, we suggest that the Neandertal-introgressed haplotype likely reintroduced an ancestral splice variant of OAS1 encoding a more active protein, suggesting that adaptive introgression occurred as a means to resurrect adaptive variation that had been lost outside Africa.

112 citations


Cites methods from "A global reference for human geneti..."

  • ...these results, using data from phase 3 of the 1000 Genomes Project [20], by examining the relationships of all modern human sequences at the OAS locus with the...

    [...]

  • ...For the frequency analysis, we combined these filtered datasets with 10 samples from outside of Africa (5-Eur European: CEU, FIN, GBR, IBS, TSI; 5-East Asian: CDX, CHB, CHS, JPT, KHB; see Additional file 2: Table S7 for population codes) and one sub-Saharan African sample YRI (Yoruba in Ibadan, Nigeria) from the 1000 Genomes Project Phase 3 [20], which we downloaded from (https:// mathgen....

    [...]

  • ...1000 Genomes Project [20] at the 11 NLS falling within the bounds of the three OAS genes, based on the an-...

    [...]

Journal ArticleDOI
TL;DR: Using whole genome sequencing data from 212 gastric tumors, the authors identify recurring mutations at specific CTCF binding sites that are common across gastrointestinal cancers and associated with chromosomal instability.
Abstract: Tissue-specific driver mutations in non-coding genomic regions remain undefined for most cancer types. Here, we unbiasedly analyze 212 gastric cancer (GC) whole genomes to identify recurrently mutated non-coding regions in GC. Applying comprehensive statistical approaches to accurately model background mutational processes, we observe significant enrichment of non-coding indels (insertions/deletions) in three gastric lineage-specific genes. We further identify 34 mutation hotspots, of which 11 overlap CTCF binding sites (CBSs). These CBS hotspots remain significant even after controlling for a genome-wide elevated mutation rate at CBSs. In 3 out of 4 tested CBS hotspots, mutations are nominally associated with expression change of neighboring genes. CBS hotspot mutations are enriched in tumors showing chromosomal instability, co-occur with neighboring chromosomal aberrations, and are common in gastric (25%) and colorectal (19%) tumors but rare in other cancer types. Mutational disruption of specific CBSs may thus represent a tissue-specific mechanism of tumorigenesis conserved across gastrointestinal cancers.

111 citations

Posted ContentDOI
18 Apr 2020-bioRxiv
TL;DR: A protein structural analysis was employed to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 binding, and found the number of predicted unfavorable changes significantly correlated with the binding score.
Abstract: The novel coronavirus SARS-CoV-2 is the cause of Coronavirus Disease-2019 (COVID-19). The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of 410 vertebrates, including 252 mammals, to study cross-species conservation of ACE2 and its likelihood to function as a SARS-CoV-2 receptor. We designed a five-category ranking score based on the conservation properties of 25 amino acids important for the binding between receptor and virus, classifying all species from very high to very low. Only mammals fell into the medium to very high categories, and only catarrhine primates in the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 binding, and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (<0.1%) variants in 10/25 binding sites. In addition, we observed evidence of positive selection in ACE2 in multiple species, including bats. Utilized appropriately, our results may lead to the identification of intermediate host species for SARS-CoV-2, justify the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care.

111 citations


Cites methods from "A global reference for human geneti..."

  • ...All variants at the 25 residues critical for effective ACE2 binding to SARS-CoV-2-S (10, 12, 20) were compiled from dbSNP (90), 1KGP (91), Topmed (92), UK10K (93), and CHINAMAP (24)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a method called GLIMPSE is proposed for haplotype phasing and genotype imputation of low-coverage sequencing datasets from modern reference panels, which achieves a remarkable performance across different coverages and human populations.
Abstract: Low-coverage whole-genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined because current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. GLIMPSE achieves imputation of a genome for less than US$1 in computational cost, considerably outperforming other methods and improving imputation accuracy over the full allele frequency range. As a proof of concept, we show that 1× coverage enables effective gene expression association studies and outperforms dense SNP arrays in rare variant burden tests. Overall, this study illustrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies. GLIMPSE is a new method for haplotype phasing and genotype imputation of low-coverage sequencing datasets from large reference panels. GLIMPSE shows remarkable performance across different coverages and human populations.

111 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
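
For readers unfamiliar with the VCF format described in the VCFtools entry above, the following is a minimal sketch of how one tab-separated VCF data line might be split into its eight fixed columns. It is illustrative only and is not part of VCFtools or its Perl API; the function name `parse_vcf_line` and the example record (including the placeholder ID `rsEXAMPLE` and all values) are assumptions made up for this sketch.

```python
# Fixed leading columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one tab-separated VCF data line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the reference
    record["ALT"] = record["ALT"].split(",")  # several alternate alleles are comma-separated
    info = {}
    if record["INFO"] != ".":                 # "." marks a missing value
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # keys without "=" act as flags
    record["INFO"] = info
    return record


# Made-up example record; the values do not come from any real dataset.
example = "1\t10177\trsEXAMPLE\tA\tAC\t100\tPASS\tAF=0.42;VT=INDEL"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"]["AF"])
```

The sketch ignores header lines (those starting with `#`) and per-sample genotype columns, both of which a real VCF file normally contains.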