Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more (90 institutions)
01 Oct 2015 - Nature (Nature Publishing Group) - Vol. 526, Iss. 7571, pp. 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal ArticleDOI
TL;DR: The study identifies a large number of mammals that can potentially be infected by SARS-CoV-2 via their ACE2 proteins, which may assist the identification of intermediate hosts for SARS-CoV-2 and hence reduce the opportunity for a future outbreak of COVID-19.
Abstract: The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of COVID-19. The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the conservation of ACE2 and its potential to be used as a receptor by SARS-CoV-2. We designed a five-category binding score based on the conservation properties of 25 amino acids important for the binding between ACE2 and the SARS-CoV-2 spike protein. Only mammals fell into the medium to very high categories and only catarrhine primates into the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 spike protein binding and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (frequency <0.001) variants in 10/25 binding sites. In addition, we found significant signals of selection and accelerated evolution in the ACE2 coding sequence across all mammals, and specific to the bat lineage. Our results, if confirmed by additional experimental data, may lead to the identification of intermediate host species for SARS-CoV-2, guide the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care.

466 citations


Cites methods from "A global reference for human geneti..."

  • ...All variants at the 25 residues critical for effective ACE2 binding to SARS-CoV-2-S (10, 12, 20) were compiled from dbSNP (90), 1KGP (91), Topmed (92), UK10K (93), and CHINAMAP (24)....

Journal ArticleDOI
TL;DR: This genetic atlas provides evidence linking associated SNPs to causal genes, offers new insight into osteoporosis pathophysiology, and highlights opportunities for drug development.
Abstract: Osteoporosis is a common aging-related disease diagnosed primarily using bone mineral density (BMD). We assessed genetic determinants of BMD as estimated by heel quantitative ultrasound in 426,824 individuals, identifying 518 genome-wide significant loci (301 novel), explaining 20% of its variance. We identified 13 bone fracture loci, all associated with estimated BMD (eBMD), in ~1.2 million individuals. We then identified target genes enriched for genes known to influence bone density and strength (maximum odds ratio (OR) = 58, P = 1 × 10⁻⁷⁵) from cell-specific features, including chromatin conformation and accessible chromatin sites. We next performed rapid-throughput skeletal phenotyping of 126 knockout mice with disruptions in predicted target genes and found an increased abnormal skeletal phenotype frequency compared to 526 unselected lines (P < 0.0001). In-depth analysis of one gene, DAAM2, showed a disproportionate decrease in bone strength relative to mineralization. This genetic atlas provides evidence linking associated SNPs to causal genes, offers new insight into osteoporosis pathophysiology, and highlights opportunities for drug development.

466 citations

Journal ArticleDOI
15 Jun 2017-Nature
TL;DR: This study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer.
Abstract: Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

462 citations

Journal ArticleDOI
18 Apr 2019-Cell
TL;DR: A new polygenic predictor comprising 2.1 million common variants is derived and validated to quantify inherited susceptibility to obesity, and is tested in more than 300,000 individuals ranging from middle age to birth.

452 citations


Cites background or methods from "A global reference for human geneti..."

  • ...Derivation, Validation, and Testing of a Genome-wide Polygenic Score for Obesity A genome-wide polygenic score (GPS) for obesity was derived by starting with two independent datasets: first, a list of 2,100,302 common genetic variants and estimated effect of each on BMI from a large GWAS study (Locke et al., 2015) and second, genetic information from 503 individuals of European ancestry from the 1000 Genomes Study used to measure “linkage disequilibrium,” the correlation between genetic variants (1000 Genomes Project Consortium et al., 2015)....

  • ...For our score derivation, we used summary statistics from a recent genome-wide association study (GWAS) for body mass index (BMI) including up to 339,224 individuals and a linkage disequilibrium reference panel of 503 European samples from 1000 Genomes phase 3 version 5 (Locke et al., 2015; 1000 Genomes Project Consortium et al., 2015)....

  • ...We then estimated heritability using the resulting association statistics and a linkage disequilibrium reference panel of individuals of European ancestry from the 1000 Genomes Study (1000 Genomes Project Consortium et al., 2015)....

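The excerpts above describe a score built from a list of variants and the estimated effect of each on BMI. As a minimal sketch of that general idea, a polygenic score can be written as an effect-weighted sum of allele dosages. This is an illustration only: the variant IDs, effect sizes, and the function name `polygenic_score` are hypothetical, and the linkage-disequilibrium adjustment that the citing paper performs with the 1000 Genomes reference panel is not reproduced here.

```python
from typing import Dict


def polygenic_score(effects: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum effect-weighted allele dosages over variants present in both inputs.

    effects: variant ID -> per-allele effect estimate (e.g. from GWAS summary statistics)
    dosages: variant ID -> number of effect alleles carried by one individual (0 to 2)
    """
    return sum(
        beta * dosages[variant]
        for variant, beta in effects.items()
        if variant in dosages  # variants not genotyped in this individual are skipped
    )


# Hypothetical variant IDs and effect sizes, for illustration only.
effects = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.125}
dosages = {"variant_a": 2, "variant_b": 1, "variant_d": 2}

score = polygenic_score(effects, dosages)
print(score)
```

Skipping variants that are missing from the genotype data is one possible design choice; a real pipeline might instead impute them or fail loudly.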

Journal ArticleDOI
TL;DR: The authors review the role of genetic structural variation in disease and the pathogenic potential of changes to the 3D genome.
Abstract: Structural and quantitative chromosomal rearrangements, collectively referred to as structural variation (SV), contribute to a large extent to the genetic diversity of the human genome and thus are of high relevance for cancer genetics, rare diseases and evolutionary genetics. Recent studies have shown that SVs can not only affect gene dosage but also modulate basic mechanisms of gene regulation. SVs can alter the copy number of regulatory elements or modify the 3D genome by disrupting higher-order chromatin organization such as topologically associating domains. As a result of these position effects, SVs can influence the expression of genes distant from the SV breakpoints, thereby causing disease. The impact of SVs on the 3D genome and on gene expression regulation has to be considered when interpreting the pathogenic potential of these variant types.

451 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
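The overlap search that the abstract above mentions can be illustrated with a small sketch. It assumes BED-style intervals (0-based start, end-exclusive) given as `(chrom, start, end)` tuples. The function name `overlap` and the example coordinates are hypothetical, and this is not how BEDTools itself is implemented.

```python
from typing import Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of two BED-style intervals, or None if they do not overlap."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return None  # features on different chromosomes never overlap
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    # With half-open coordinates, intervals that merely touch share no bases.
    return (chrom_a, start, end) if start < end else None


# Hypothetical coordinates, for illustration only.
print(overlap(("chr1", 100, 200), ("chr1", 150, 250)))
print(overlap(("chr1", 100, 200), ("chr1", 200, 300)))
```

A pairwise check like this is quadratic when applied to every pair of features; tools built for large datasets generally rely on sorted input or interval indexes instead.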

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
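To make the variant classes named in the abstract above concrete, here is a minimal sketch that reads one tab-separated VCF data line and labels each alternate allele. It relies on the fixed leading VCF columns (CHROM, POS, ID, REF, ALT). The classification rules are a deliberate simplification, the function name `classify_vcf_record` and the example records are hypothetical, and the code does not use or mirror the VCFtools API.

```python
from typing import List, Tuple


def classify_vcf_record(line: str) -> List[Tuple[str, int, str, str, str]]:
    """Label each ALT allele of one VCF data line as snp, indel, structural, or other."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    labelled = []
    for allele in alt.split(","):  # multi-allelic sites list ALT alleles comma-separated
        if allele in (".", "*"):
            kind = "other"  # missing or spanning-deletion placeholder
        elif allele.startswith("<") or "[" in allele or "]" in allele:
            kind = "structural"  # symbolic allele such as <DEL>, or breakend notation
        elif len(ref) == 1 and len(allele) == 1:
            kind = "snp"
        elif len(ref) != len(allele):
            kind = "indel"  # simplification: no length cutoff separating indels from larger events
        else:
            kind = "other"  # e.g. multi-nucleotide substitution
        labelled.append((chrom, int(pos), ref, allele, kind))
    return labelled


# Hypothetical records, for illustration only.
records = [
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tAT\tA,ATT\t.\tPASS\t.",
    "1\t300\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL",
]
for record in records:
    print(classify_vcf_record(record))
```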