Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more authors; 90 institutions
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
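As a quick check of the figures quoted in the abstract, the three variant classes can be summed to confirm the "over 88 million" total. This is only an arithmetic sketch using the rounded counts stated above; the variable names are illustrative.

```python
# Rounded variant counts as quoted in the abstract above.
snps = 84_700_000               # single nucleotide polymorphisms
indels = 3_600_000              # short insertions/deletions
structural_variants = 60_000    # structural variants

total = snps + indels + structural_variants

# The abstract states "in total over 88 million variants".
print(f"{total:,} variants in total")
print(f"SNP share of all variants: {snps / total:.1%}")
```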
Citations
Journal ArticleDOI
01 May 2017-Genetics
TL;DR: It is shown that the Yoruba SFS is not informative enough to discriminate between several different models of growth, and that for such simple demographies, the fit of one-parameter models outperforms the stairway plot, a recently developed model-flexible method.
Abstract: Some methods for demographic inference based on the observed genetic diversity of current populations rely on the use of summary statistics such as the Site Frequency Spectrum (SFS). Demographic models can be either model-constrained with numerous parameters, such as growth rates, timing of demographic events, and migration rates, or model-flexible, with an unbounded collection of piecewise constant sizes. It is still debated whether demographic histories can be accurately inferred based on the SFS. Here, we illustrate this theoretical issue on an example of demographic inference for an African population. The SFS of the Yoruba population (data from the 1000 Genomes Project) is fit to a simple model of population growth described with a single parameter (e.g., founding time). We infer a time to the most recent common ancestor of 1.7 million years (MY) for this population. However, we show that the Yoruba SFS is not informative enough to discriminate between several different models of growth. We also show that for such simple demographies, the fit of one-parameter models outperforms the stairway plot, a recently developed model-flexible method. The use of this method on simulated data suggests that it is biased by the noise intrinsically present in the data.
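For readers unfamiliar with the summary statistic named in this abstract, the following is a minimal sketch of how an unfolded Site Frequency Spectrum can be tabulated from phased haplotypes. It assumes a toy 0/1 matrix (0 = ancestral allele, 1 = derived allele); the function name and the data are hypothetical and are not taken from the cited study.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS as a list of length n - 1.

    Entry i - 1 is the number of sites at which the derived allele
    is carried by exactly i of the n sampled haplotypes (1 <= i <= n - 1).
    Sites that are monomorphic in the sample (0 or n copies) are ignored.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Count derived alleles per site (column sums of the matrix).
    derived_counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(n_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy data: 4 haplotypes (rows) by 5 sites (columns). Made up for illustration.
toy_haplotypes = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]

sfs = site_frequency_spectrum(toy_haplotypes)
print(sfs)  # [2, 1, 1]: two singletons, one doubleton, one tripleton
```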

76 citations

Posted ContentDOI
07 Jun 2016-bioRxiv
TL;DR: Probes on the Infinium MethylationEPIC BeadChip are identified that are predicted to measure methylation at polymorphic sites and hybridise to multiple genomic regions and are intended to be used for quality control procedures when analysing data derived from this platform.
Abstract: Genome-wide analysis of DNA methylation has now become a relatively inexpensive technique thanks to array-based methylation profiling technologies. The recently developed Illumina Infinium MethylationEPIC BeadChip interrogates methylation at over 850,000 sites across the human genome, covering 99% of RefSeq genes. This array supersedes the widely used Infinium HumanMethylation450 BeadChip, which has permitted insights into the relationship between DNA methylation and a wide range of conditions and traits. Previous research has identified issues with certain probes on both the HumanMethylation450 BeadChip and its predecessor, the Infinium HumanMethylation27 BeadChip, which were predicted to affect array performance. These issues concerned probe-binding specificity and the presence of polymorphisms at target sites. Using in silico methods, we have identified probes on the Infinium MethylationEPIC BeadChip that are predicted to (i) measure methylation at polymorphic sites and (ii) hybridise to multiple genomic regions. We intend these resources to be used for quality control procedures when analysing data derived from this platform.

76 citations

Journal ArticleDOI
TL;DR: It is shown that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data and an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs is observed.
Abstract: Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky ( https://github.com/TheJacksonLaboratory/Picky ), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.

76 citations

Journal ArticleDOI
19 Aug 2020-Nature
TL;DR: A genome-wide association study shows that myocardial trabeculae are an important determinant of cardiac performance in the adult heart, identifies conserved pathways that regulate structural complexity and reveals the influence of trabeculae on the susceptibility to cardiovascular disease.
Abstract: The inner surfaces of the human heart are covered by a complex network of muscular strands that is thought to be a remnant of embryonic development. The function of these trabeculae in adults and their genetic architecture are unknown. Here we performed a genome-wide association study to investigate image-derived phenotypes of trabeculae using the fractal analysis of trabecular morphology in 18,096 participants of the UK Biobank. We identified 16 significant loci that contain genes associated with haemodynamic phenotypes and regulation of cytoskeletal arborization. Using biomechanical simulations and observational data from human participants, we demonstrate that trabecular morphology is an important determinant of cardiac performance. Through genetic association studies with cardiac disease phenotypes and Mendelian randomization, we find a causal relationship between trabecular morphology and risk of cardiovascular disease. These findings suggest a previously unknown role for myocardial trabeculae in the function of the adult heart, identify conserved pathways that regulate structural complexity and reveal the influence of the myocardial trabeculae on susceptibility to cardiovascular disease.

76 citations

Journal ArticleDOI
TL;DR: A genome-wide analysis comparing 4503 OD cases, 4173 opioid-exposed controls, and 32,500 opioid-unexposed controls, including participants of European and African descent, found variants associated with OE, which highlights the difference between dependence and exposure and the importance of considering the definition of controls in studies of addiction.
Abstract: To provide insights into the biology of opioid dependence (OD) and opioid use (i.e., exposure, OE), we completed a genome-wide analysis comparing 4503 OD cases, 4173 opioid-exposed controls, and 32,500 opioid-unexposed controls, including participants of European and African descent (EUR and AFR, respectively). Among the variants identified, rs9291211 was associated with OE (exposed vs. unexposed controls; EUR z = −5.39, p = 7.2 × 10^−8). This variant regulates the transcriptomic profiles of SLC30A9 and BEND4 in multiple brain tissues and was previously associated with depression, alcohol consumption, and neuroticism. A phenome-wide scan of rs9291211 in the UK Biobank (N > 360,000) found association of this variant with propensity to use dietary supplements (p = 1.68 × 10^−8). With respect to the same OE phenotype in the gene-based analysis, we identified SDCCAG8 (EUR + AFR z = 4.69, p = 10^−6), which was previously associated with educational attainment, risk-taking behaviors, and schizophrenia. In addition, rs201123820 showed a genome-wide significant difference between OD cases and unexposed controls (AFR z = 5.55, p = 2.9 × 10^−8) and a significant association with musculoskeletal disorders in the UK Biobank (p = 4.88 × 10^−7). A polygenic risk score (PRS) based on a GWAS of risk-tolerance (n = 466,571) was positively associated with OD (OD vs. unexposed controls, p = 8.1 × 10^−5; OD cases vs. exposed controls, p = 0.054) and OE (exposed vs. unexposed controls, p = 3.6 × 10^−5). A PRS based on a GWAS of neuroticism (n = 390,278) was positively associated with OD (OD vs. unexposed controls, p = 3.2 × 10^−5; OD vs. exposed controls, p = 0.002) but not with OE (p = 0.67). Our analyses highlight the difference between dependence and exposure and the importance of considering the definition of controls in studies of addiction.

76 citations


Cites methods from "A global reference for human geneti..."

  • ...The 1000 Genomes Project Phase 3 reference panel [19] was used as a reference for the ancestry assignment....

  • ...Imputation was performed using SHAPEIT2 [20] and IMPUTE2 [21], and the 1000 Genomes Project Phase 3 reference panel, which includes five continental groups [19]....

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
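To make the format described in this abstract more concrete, here is a minimal sketch of reading one VCF data line using only the Python standard library. It relies on the eight fixed, tab-separated columns of the format (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The `Variant` type, the helper names, and the example records are hypothetical, and the SNP/indel/SV classification is a deliberate simplification; this is not the VCFtools API.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: list
    info: dict


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):       # "." marks a missing INFO field
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True   # flag keys carry no value
    return Variant(chrom, int(pos), ref, alt.split(","), info_dict)


def variant_kind(variant):
    """Rough classification by allele shape (simplified for illustration)."""
    if any(a.startswith("<") for a in variant.alts):    # symbolic allele, e.g. <DEL>
        return "SV"
    if len(variant.ref) == 1 and all(len(a) == 1 for a in variant.alts):
        return "SNP"
    return "indel"


# Made-up records for illustration; not real 1000 Genomes data.
snp = parse_vcf_line("1\t1000\t.\tA\tG\t50\tPASS\tAF=0.1;DB")
deletion = parse_vcf_line("1\t2000\t.\tAT\tA\t.\tPASS\t.")
print(variant_kind(snp), snp.info)
print(variant_kind(deletion), deletion.info)
```

Real files also carry `##` meta-information lines, a `#CHROM` header, and per-sample genotype columns, which this sketch ignores.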

10,164 citations