Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
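The >1% threshold in the abstract is a statement about alternate-allele frequency across phased haplotypes. The following is a minimal sketch that assumes biallelic sites and genotype strings in the VCF phased style (for example "0|1"). The function names `alt_allele_frequency` and `is_common` are hypothetical and are not part of any 1000 Genomes software.

```python
def alt_allele_frequency(genotypes: list[str]) -> float:
    """Fraction of called haplotypes carrying a non-reference allele.

    genotypes: one phased diploid call per individual, e.g. "0|1".
    Hypothetical helper for illustration; assumes '|'-separated phased calls.
    """
    alleles = []
    for gt in genotypes:
        alleles.extend(gt.split("|"))  # two haplotypes per individual
    called = [a for a in alleles if a != "."]  # skip missing allele calls
    if not called:
        raise ValueError("no called alleles at this site")
    return sum(a != "0" for a in called) / len(called)


def is_common(freq: float, threshold: float = 0.01) -> bool:
    # "Common" here means a frequency strictly above the threshold (1% by default).
    return freq > threshold


# Four individuals give eight haplotypes, three of which carry the alternate allele.
freq = alt_allele_frequency(["0|1", "1|1", "0|0", "0|0"])
print(freq, is_common(freq))
```

Consider the full sample of 2,504 individuals, which gives 5,008 haplotypes. A frequency above 1% then requires at least 51 copies of the alternate allele, since 50/5,008 is about 0.998% and 51/5,008 is about 1.018%.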
Citations
Journal Article
TL;DR: There is robust evidence for an independent, causal effect of intelligence in lowering AD risk, and the causal effect of educational attainment is likely to be mediated by intelligence.
Abstract: Objectives: To examine whether educational attainment and intelligence have causal effects on risk of Alzheimer's disease (AD), independently of each other. Design: Two-sample univariable and multivariable Mendelian randomization (MR) to estimate the causal effects of education on intelligence and vice versa, and the total and independent causal effects of both education and intelligence on AD risk. Participants: 17 008 AD cases and 37 154 controls from the International Genomics of Alzheimer's Project (IGAP) consortium. Main outcome measure: Odds ratio (OR) of AD per standard deviation increase in years of schooling (SD = 3.6 years) and intelligence (SD = 15 points on intelligence test). Results: There was strong evidence of a causal, bidirectional relationship between intelligence and educational attainment, with the magnitude of effect being similar in both directions [OR for intelligence on education = 0.51 SD units, 95% confidence interval (CI): 0.49, 0.54; OR for education on intelligence = 0.57 SD units, 95% CI: 0.48, 0.66]. Similar overall effects were observed for both educational attainment and intelligence on AD risk in the univariable MR analysis: with each SD increase in years of schooling and intelligence, odds of AD were, on average, 37% (95% CI: 23-49%) and 35% (95% CI: 25-43%) lower, respectively. There was little evidence from the multivariable MR analysis that educational attainment affected AD risk once intelligence was taken into account (OR = 1.15, 95% CI: 0.68-1.93), but intelligence affected AD risk independently of educational attainment to a similar magnitude to that observed in the univariable analysis (OR = 0.69, 95% CI: 0.44-0.88). Conclusions: There is robust evidence for an independent, causal effect of intelligence in lowering AD risk. The causal effect of educational attainment on AD risk is likely to be mediated by intelligence.

70 citations
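The abstract above reports two-sample MR estimates. The sketch below illustrates the general technique and is not the authors' analysis code. It computes per-variant Wald ratios and the standard fixed-effect inverse-variance-weighted (IVW) estimate from summary statistics. It also converts a log odds ratio to the "percent lower odds" form used in the Results; for example, an OR of 0.63 corresponds to 37% lower odds. The function names and input values are hypothetical, and the multivariable MR used in the study is not shown.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    # Causal estimate from a single genetic instrument.
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    # Fixed-effect IVW: weighted regression of outcome effects on exposure
    # effects through the origin, with first-order weights 1 / se_outcome**2.
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, 1 / math.sqrt(den)  # (estimate, standard error)


def percent_lower_odds(log_or: float) -> float:
    # Percent reduction in odds implied by a (negative) log odds ratio.
    return (1 - math.exp(log_or)) * 100


# Hypothetical summary statistics for three instruments (variants).
bx = [0.10, 0.08, 0.12]     # variant-exposure effects
by = [-0.05, -0.03, -0.07]  # variant-outcome effects on the log-odds scale
se = [0.02, 0.015, 0.025]   # standard errors of the outcome effects
beta, se_beta = ivw_estimate(bx, by, se)
print(beta, se_beta, percent_lower_odds(beta))
```

With a single instrument, the IVW estimate reduces to that variant's Wald ratio. This is a convenient sanity check.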

Journal Article
TL;DR: The paper highlights the marked disparity in the sampling populations of mGWAS carried out to date and draws attention to the critical need for inclusion of diverse populations.
Abstract: The involvement of the microbiome in health and disease is well established. Microbiome genome-wide association studies (mGWAS) are used to elucidate the interaction of host genetic variation with the microbiome. The emergence of this relatively new field has been facilitated by the advent of next generation sequencing technologies that enable the investigation of the complex interaction between host genetics and microbial communities. In this paper, we review recent studies investigating host-microbiome interactions using mGWAS. Additionally, we highlight the marked disparity in the sampling population of mGWAS carried out to date and draw attention to the critical need for inclusion of diverse populations.

70 citations

Journal Article
TL;DR: This paper proposes three practical strategies for reducing re-identification risks in beacons. These strategies manipulate the beacon so that the presence of rare alleles is obscured, and budget the number of accesses per user for each individual genome.

70 citations
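A beacon answers only yes/no queries of the form "does any genome in the dataset carry this allele at this position?". The toy class below illustrates the two ideas named in the summary: hiding rare alleles, and budgeting each user's accesses per individual genome. It is not the paper's algorithm. The class name, the `min_carriers` cutoff and the per-(user, genome) accounting rule are all hypothetical.

```python
from collections import defaultdict


class ToyBeacon:
    """Illustrative beacon; thresholds and the accounting rule are hypothetical."""

    def __init__(self, carriers: dict, min_carriers: int = 2, budget: int = 3):
        # carriers: {(chrom, pos, alt): set of individual IDs carrying alt}
        self.carriers = carriers
        self.min_carriers = min_carriers  # hide alleles carried by fewer individuals
        self.budget = budget              # max "yes" answers per (user, individual)
        self.spent = defaultdict(int)

    def query(self, user: str, chrom: str, pos: int, alt: str) -> bool:
        ids = self.carriers.get((chrom, pos, alt), set())
        # Individuals whose budget for this user is exhausted no longer contribute.
        visible = {i for i in ids if self.spent[(user, i)] < self.budget}
        if len(visible) < self.min_carriers:
            return False  # obscure rare (or budget-exhausted) alleles
        for i in visible:
            self.spent[(user, i)] += 1  # charge each genome that made the answer "yes"
        return True


beacon = ToyBeacon({("1", 200, "G"): {"s1", "s2"}}, min_carriers=2, budget=2)
print([beacon.query("u", "1", 200, "G") for _ in range(3)])
```

In this design, a "no" answer reveals nothing about which genomes are present. A user who repeatedly probes the same individuals eventually stops receiving "yes" answers that depend on them.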

Journal Article
TL;DR: Simulation and numerical algorithms are used to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance, and RVATs are not robust to realistic human evolutionary forces.
Abstract: The role of rare alleles in complex phenotypes has been hotly debated, but most rare variant association tests (RVATs) do not account for the evolutionary forces that affect genetic architecture. Here, we use simulation and numerical algorithms to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance. We then assess the ability of RVATs to detect causal loci using simulations and human RNA-seq data. Surprisingly, we find that statistical performance is worst for phenotypes in which genetic variance is due mainly to rare alleles, and explosive population growth decreases power. Although many studies have attempted to identify causal rare variants, few have reported novel associations. This has sometimes been interpreted to mean that rare variants make negligible contributions to complex trait heritability. Our work shows that RVATs are not robust to realistic human evolutionary forces, so general conclusions about the impact of rare variants on complex traits may be premature.

70 citations
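Many RVATs aggregate rare variants within a gene before testing, because individual rare variants are observed too few times to test one by one. The sketch below shows one such approach: a CAST-style collapsing test. Each individual is scored as a carrier of any rare allele or of none, and carrier counts in cases and controls are compared with Fisher's exact test. It is not one of the specific tests or simulations in the paper, and the 1% rare-variant cutoff and function names are assumptions.

```python
from math import comb


def carrier_status(genotypes, freqs, maf_threshold=0.01):
    # genotypes: per individual, a list of alt-allele counts (0/1/2) per variant.
    # freqs: alt-allele frequency of each variant; only rare ones are collapsed.
    rare = [j for j, f in enumerate(freqs) if f < maf_threshold]
    return [any(g[j] > 0 for j in rare) for g in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    # 2x2 table [[a, b], [c, d]] = [[case carriers, case non-carriers],
    #                              [control carriers, control non-carriers]].
    row1, col1, n = a + b, a + c, a + b + c + d

    def p(x):
        # Hypergeometric probability that x of the col1 carriers are cases.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-7))


def collapsing_test(case_carriers, control_carriers):
    a = sum(case_carriers)
    c = sum(control_carriers)
    return fisher_exact_two_sided(a, len(case_carriers) - a, c, len(control_carriers) - c)


cases = carrier_status([[0, 1], [1, 1], [0, 2], [0, 0]], [0.2, 0.005])
controls = carrier_status([[1, 0], [0, 0], [0, 1], [2, 0]], [0.2, 0.005])
print(cases, controls, collapsing_test(cases, controls))
```

Collapsing gains power when most rare variants in the gene act in the same direction. As the abstract notes, the power of tests of this kind also depends on the underlying evolutionary history.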

Journal Article
TL;DR: The results suggest that RAB10 could be a promising therapeutic target for AD prevention and can be expanded and adapted to other phenotypes, thus serving as a model for future efforts to identify rare variants for AD and other complex human diseases.
Abstract: While age and the APOE e4 allele are major risk factors for Alzheimer’s disease (AD), a small percentage of individuals with these risk factors exhibit AD resilience by living well beyond 75 years of age without any clinical symptoms of cognitive decline. We used over 200 “AD resilient” individuals and an innovative, pedigree-based approach to identify genetic variants that segregate with AD resilience. First, we performed linkage analyses in pedigrees with resilient individuals and a statistical excess of AD deaths. Second, we used whole genome sequences to identify candidate SNPs in significant linkage regions. Third, we replicated SNPs from the linkage peaks that reduced risk for AD in an independent dataset and in a gene-based test. Finally, we experimentally characterized replicated SNPs. rs142787485 in RAB10 confers significant protection against AD (p value = 0.0184, odds ratio = 0.5853). Moreover, we replicated this association in an independent series of unrelated individuals (p value = 0.028, odds ratio = 0.69) and used a gene-based test to confirm a role for RAB10 variants in modifying AD risk (p value = 0.002). Experimentally, we demonstrated that knockdown of RAB10 resulted in a significant decrease in Aβ42 (p value = 0.0003) and in the Aβ42/Aβ40 ratio (p value = 0.0001) in neuroblastoma cells. We also found that RAB10 expression is significantly elevated in human AD brains (p value = 0.04). Our results suggest that RAB10 could be a promising therapeutic target for AD prevention. In addition, our gene discovery approach can be expanded and adapted to other phenotypes, thus serving as a model for future efforts to identify rare variants for AD and other complex human diseases.

70 citations


Additional excerpts

  • ...002), COSMIC (v68) [31], dbSNP (build 138 (08/09/2013)), 1000 Genome Frequency (v3) [32], TargetScan (v6....


References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
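The MSP score is the highest score of any pair of equal-length segments from the two sequences, aligned without gaps. BLAST approximates it heuristically by seeding on short word hits and extending them. The sketch below instead computes the score exactly, scanning every diagonal with Kadane's maximum-subarray rule. This is O(mn) and only practical for short sequences. The +1/-1 match/mismatch scores are an assumption for illustration; BLAST itself scores protein alignments with substitution matrices.

```python
def msp_score(s: str, t: str, match: int = 1, mismatch: int = -1) -> int:
    """Exact maximal segment pair score (ungapped local alignment) of s and t."""
    best = 0
    # Each offset d = j - i is one diagonal, i.e. one ungapped placement of s on t.
    for d in range(-(len(s) - 1), len(t)):
        run = 0
        i = max(0, -d)
        while i < len(s) and i + d < len(t):
            score = match if s[i] == t[i + d] else mismatch
            run = max(0, run + score)  # Kadane: restart when the running sum goes negative
            best = max(best, run)
            i += 1
    return best


print(msp_score("GGACGTCC", "TTACGTAA"))
```

For the example pair, the best segment pair is the shared "ACGT" on the zero-offset diagonal.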

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
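Each SAM alignment line has 11 mandatory tab-separated fields, and POS is the 1-based leftmost reference position. The sketch below assumes well-formed input and ignores optional tags. It parses those fields and uses the CIGAR string to find where the alignment ends on the reference; only the M, D, N, = and X operations consume reference bases. The function names are hypothetical; real pipelines would use SAMtools or an existing library instead.

```python
import re

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))  # optional TAG:TYPE:VALUE fields are ignored
    rec["FLAG"] = int(rec["FLAG"])
    rec["POS"] = int(rec["POS"])
    return rec


def reference_end(pos: int, cigar: str) -> int:
    # Last reference base covered by the alignment (1-based, inclusive).
    # Assumes a real CIGAR string, not the "*" placeholder for unavailable CIGARs.
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ref_len = sum(int(n) for n, op in ops if op in REF_CONSUMING)
    return pos + ref_len - 1


rec = parse_sam_line("r1\t0\tchr1\t100\t60\t5M2I3M1D4M\t*\t0\t0\tACGTAACGTAACGT\tIIIIIIIIIIIIII")
print(rec["RNAME"], rec["POS"], reference_end(rec["POS"], rec["CIGAR"]))
```

In the example, the insertion (2I) consumes read bases but not reference bases, while the deletion (1D) does the opposite.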

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
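BED intervals are 0-based with an exclusive end, so two intervals on the same chromosome overlap exactly when each starts before the other ends. The sketch below is a naive intersect-style report under that convention. The helper names are hypothetical, and BEDTools itself is built for far larger inputs than this O(n*m) scan can handle.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(lines: Iterable[str]) -> Iterator[Interval]:
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split("\t")[:3]  # extra BED columns are ignored
        yield Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a_set: Iterable[Interval], b_set: list) -> Iterator[Interval]:
    # Report the overlapping portion of every overlapping A/B pair.
    for a in a_set:
        for b in b_set:
            if overlaps(a, b):
                yield Interval(a.chrom, max(a.start, b.start), min(a.end, b.end))


a = list(parse_bed(["chr1\t100\t200\tgeneA"]))
b = list(parse_bed(["chr1\t150\t250", "chr1\t200\t300", "chr2\t100\t200"]))
print(list(intersect(a, b)))
```

Under the half-open convention, the interval chr1:200-300 touches chr1:100-200 but does not overlap it. This is a common source of off-by-one errors when mixing BED with 1-based formats such as VCF or SAM.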

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
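A VCF data line has eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. These are optionally followed by a FORMAT column and one column per sample. The sketch below is a minimal parser that assumes an uncompressed, well-formed line. It is not the VCFtools Perl API, the function names are hypothetical, and the example record is written in the style of the VCF specification.

```python
VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_info(info: str) -> dict:
    # INFO is ';'-separated; entries are KEY=VALUE pairs or bare flags.
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags carry no value
    return out


def parse_vcf_record(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(VCF_COLUMNS, cols[:8]))
    rec["POS"] = int(rec["POS"])        # 1-based reference position
    rec["ALT"] = rec["ALT"].split(",")  # several alternate alleles are allowed
    rec["INFO"] = parse_info(rec["INFO"])
    if len(cols) > 9:
        fmt = cols[8].split(":")        # e.g. GT:GQ
        rec["samples"] = [dict(zip(fmt, s.split(":"))) for s in cols[9:]]
    return rec


rec = parse_vcf_record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\tGT:GQ\t0|0:48\t1|0:48")
print(rec["POS"], rec["ALT"], rec["INFO"]["AF"], rec["samples"][1]["GT"])
```

Values are kept as strings here. A fuller parser would use the `##INFO` and `##FORMAT` header lines to convert each field to its declared type and number.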