Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015, Nature (Nature Publishing Group), Vol. 526, Iss. 7571, pp. 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: This paper applied a multivariate approach that leverages genetic correlations among externalizing traits for genome-wide association analyses and identified more than 500 genetic loci, which were enriched for genes expressed in the brain and related to nervous system development.
Abstract: Behaviors and disorders related to self-regulation, such as substance use, antisocial behavior and attention-deficit/hyperactivity disorder, are collectively referred to as externalizing and have shared genetic liability. We applied a multivariate approach that leverages genetic correlations among externalizing traits for genome-wide association analyses. By pooling data from ~1.5 million people, our approach is statistically more powerful than single-trait analyses and identifies more than 500 genetic loci. The loci were enriched for genes expressed in the brain and related to nervous system development. A polygenic score constructed from our results predicts a range of behavioral and medical outcomes that were not part of genome-wide analyses, including traits that until now lacked well-performing polygenic scores, such as opioid use disorder, suicide, HIV infections, criminal convictions and unemployment. Our findings are consistent with the idea that persistent difficulties in self-regulation can be conceptualized as a neurodevelopmental trait with complex and far-reaching social and health correlates.

79 citations

Journal Article
31 Jan 2020, Science
TL;DR: It is found that the genetic architecture of schizophrenia in Africans generally reflects that of Europeans but that the greater genetic variation in Africa provides more power to detect relationships of genes to phenotypes.
Abstract: Africa, the ancestral home of all modern humans, is the most informative continent for understanding the human genome and its contribution to complex disease. To better understand the genetics of schizophrenia, we studied the illness in the Xhosa population of South Africa, recruiting 909 cases and 917 age-, gender-, and residence-matched controls. Individuals with schizophrenia were significantly more likely than controls to harbor private, severely damaging mutations in genes that are critical to synaptic function, including neural circuitry mediated by the neurotransmitters glutamine, γ-aminobutyric acid, and dopamine. Schizophrenia is genetically highly heterogeneous, involving severe ultrarare mutations in genes that are critical to synaptic plasticity. The depth of genetic variation in Africa revealed this relationship with a moderate sample size and informed our understanding of the genetics of schizophrenia worldwide.

79 citations

Posted Content
04 Apr 2019, bioRxiv
TL;DR: Vireo is presented, a computationally efficient Bayesian model to demultiplex single-cell data from pooled experimental designs that can be applied in settings when only partial or no genotype information is available.
Abstract: The joint analysis of multiple samples using single-cell RNA-seq is a promising experimental design, offering increased throughput while allowing batch variation to be accounted for. To achieve multi-sample designs, genetic variants that segregate between the samples in the pool have been proposed as natural barcodes for cell demultiplexing. Existing demultiplexing strategies rely on access to complete genotype data from the pooled samples, which greatly limits the applicability of such methods, in particular when genetic variation is not the primary object of study. To address this, we here present Vireo, a computationally efficient Bayesian model to demultiplex single-cell data from pooled experimental designs. Uniquely, our model can be applied in settings when only partial or no genotype information is available. Using simulations based on synthetic mixtures and results on real data, we demonstrate the robustness of our model and illustrate the utility of multi-sample experimental designs for common expression analyses.

78 citations

Journal Article
TL;DR: This genome-wide association study analyzed criterion counts of comorbid AD and MD in African American and European American data sets collected as part of the Yale-Penn study of the genetics of drug and alcohol dependence to determine whether polygenic risk alleles are shared with neuropsychiatric traits or subcortical brain volumes.
Abstract: Importance: Alcohol dependence (AD) and major depression (MD) are leading causes of disability that often co-occur. Genetic epidemiologic data have shown that AD and MD share a common possible genetic cause. The molecular nature of this shared genetic basis is poorly understood. Objectives: To detect genetic risk variants for comorbid AD and MD and to determine whether polygenic risk alleles are shared with neuropsychiatric traits or subcortical brain volumes. Design, Setting, and Participants: This genome-wide association study analyzed criterion counts of comorbid AD and MD in African American and European American data sets collected as part of the Yale-Penn study of the genetics of drug and alcohol dependence from February 14, 1999, to January 13, 2015. After excluding participants never exposed to alcohol or with missing information for any diagnostic criterion, genome-wide association studies were performed on 2 samples (the Yale-Penn 1 and Yale-Penn 2 samples) totaling 4653 African American participants and 3169 European American participants (analyzed separately). Tests were performed to determine whether polygenic risk scores derived from potentially related traits in European American participants could be used to estimate comorbid AD and MD. Main Outcomes and Measures: Comorbid criterion counts (ranging from 0 to 14) for AD (7 criteria) and MD (9 criteria, scaled to 7) as defined by the DSM-IV. Results: Of the 7822 participants (3342 women and 4480 men; mean [SD] age, 40.1 [10.7] years), the median comorbid criterion count was 6.2 (interquartile range, 2.3-10.9). Under the linear regression model, rs139438618 at the semaphorin 3A (SEMA3A [OMIM 603961]) locus was significantly associated with AD and MD comorbidity in African American participants in the Yale-Penn 1 sample (β = 0.89; 95% CI, 0.57-1.20; P = 2.76 × 10^−8). In the independent Yale-Penn 2 sample, the association was also significant (β = 0.83; 95% CI, 0.39-1.28; P = 2.06 × 10^−4). Meta-analysis of the 2 samples yielded a more robust association (β = 0.87; 95% CI, 0.61-1.12; P = 2.41 × 10^−11). There was no significant association identified in European American participants. Analyses of polygenic risk scores showed that individuals with a higher risk of neuroticism (β = 1.01; 95% CI, 0.50-1.52) or depressive symptoms (β = 0.87; 95% CI, 0.32-1.42) and a lower level of subjective well-being (β = −0.94; 95% CI, −1.46 to −0.42) and educational attainment (β = −1.00; 95% CI, −1.57 to −0.44) had a higher level of AD and MD comorbidity, while larger intracranial (β = 1.07; 95% CI, 0.50 to 1.64) and smaller putamen volumes (β = −1.16; 95% CI, −1.86 to −0.46) were associated with higher risks of AD and MD comorbidity. Conclusions and Relevance: SEMA3A variation is significantly and replicably associated with comorbid AD and MD in African American participants. Analyses of polygenic risk scores identified pleiotropy with neuropsychiatric traits and brain volumes. Further studies are warranted to understand the biological and genetic mechanisms of this comorbidity, which could facilitate development of medications and other treatments for comorbid AD and MD.

78 citations


Cites background from "A global reference for human geneti..."

  • ...html)(31) and the 1000 Genomes phase 3 reference panel.(32) The African American and European American samples were imputed separately....


Journal Article
TL;DR: Using whole-exome sequencing, the authors identify rare truncating variants in TTN that associate with familial and early-onset AF and show defects in cardiac sarcomere assembly in ttn.2-mutant zebrafish.
Abstract: A family history of atrial fibrillation constitutes a substantial risk of developing the disease; however, the pathogenesis of this complex disease is poorly understood. We perform whole-exome sequencing on 24 families with at least three family members diagnosed with atrial fibrillation (AF) and find that titin-truncating variants (TTNtv) are significantly enriched in these patients (P = 1.76 × 10^−6). This finding is replicated in an independent cohort of early-onset lone AF patients (n = 399; odds ratio = 36.8; P = 4.13 × 10^−6). A CRISPR/Cas9-modified zebrafish carrying a truncating variant of titin is used to investigate the effect of TTNtv in atrial development. We observe compromised assembly of the sarcomere in both atria and ventricle and a longer PR interval, and heterozygous adult zebrafish have a higher degree of fibrosis in the atria, indicating that TTNtv are important risk factors for AF. This aligns with the early onset of the disease and adds an important dimension to the understanding of the molecular predisposition for AF.

78 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
TL;DR: BEDTools is a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. It allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
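
The overlap search that the BEDTools abstract describes can be illustrated with a minimal Python sketch. This is not BEDTools' implementation or API: the names `Interval`, `parse_bed_line`, `overlaps` and `intersect` are hypothetical, the comparison is a naive all-pairs loop, and the only assumption taken from the BED convention is that coordinates are 0-based with an exclusive end.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic feature using BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def parse_bed_line(line: str) -> Interval:
    """Read the first three tab-separated BED columns; extra columns are ignored."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap when each one starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(features_a: Iterable[Interval],
              features_b: Iterable[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive all-pairs comparison, O(len(a) * len(b)); fine only for small inputs."""
    features_b = list(features_b)
    return [(a, b) for a in features_a for b in features_b if overlaps(a, b)]
```

Because the end coordinate is exclusive, a feature ending at position 200 and one starting at position 200 on the same chromosome are adjacent and are not reported as overlapping. Tools built for large datasets would replace the all-pairs loop with an indexed or sorted-sweep search.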

Journal Article
06 Sep 2012, Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
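
To make the VCF description above more concrete, here is a minimal Python sketch that reads one VCF data line. It is not the VCFtools API: the names `VcfRecord`, `parse_vcf_line` and `variant_type` are hypothetical, and the sketch assumes only the eight fixed tab-separated columns of a data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), with INFO holding semicolon-separated `key=value` pairs or bare flags. Any sample values fed to it are illustrative only.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    """A simplified view of one VCF data line (hypothetical helper type)."""
    chrom: str
    pos: int            # 1-based position on the reference
    id: str
    ref: str
    alts: List[str]     # ALT may list several comma-separated alleles
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Split the eight fixed columns; genotype columns, if any, are ignored."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    info_map: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Entries without "=" are flags and are stored as True.
            info_map[key] = value if sep else True
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), info_map)


def variant_type(rec: VcfRecord) -> str:
    """Label a record 'SNP' when REF and every ALT are single bases, else 'other'."""
    bases = set("ACGTN")
    if rec.ref in bases and all(a in bases for a in rec.alts):
        return "SNP"
    return "other"
```

The classification is deliberately coarse: short indels, symbolic alleles and structural variants all fall under "other", and values in INFO are left as strings rather than converted according to the file header.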