Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal ArticleDOI
19 Apr 2018-Cell
TL;DR: Natural selection on genetic variants in the PDE10A gene is shown to have increased spleen size in the Bajau, providing them with a larger reservoir of oxygenated red blood cells; there is also evidence of strong selection specific to the Bajau on BDKRB2, a gene affecting the human diving reflex.

116 citations


Cites methods from "A global reference for human geneti..."

  • ...We therefore merged our sequencing data from the Bajau and Saluan with Han Chinese genomes from the 1000 Genomes Project (Auton et al., 2015) and performed a genome-wide selection scan using a new method for detecting local selection, akin to the PBS statistic (Yi et al., 2010) but based on an explicit likelihood model and adjusted to account for admixture and differing ancestral components (Cheng et al., 2016)....


Journal ArticleDOI
TL;DR: The approach "re-discovered" genes previously implicated in IHH and introduced an approach for highly adaptable variant quality filtering that leads to well-calibrated results, and developed a user-friendly software package for performing gene-based burden testing against public databases.
Abstract: The genetic causes of many Mendelian disorders remain undefined. Factors such as lack of large multiplex families, locus heterogeneity, and incomplete penetrance hamper these efforts for many disorders. Previous work suggests that gene-based burden testing—where the aggregate burden of rare, protein-altering variants in each gene is compared between case and control subjects—might overcome some of these limitations. The increasing availability of large-scale public sequencing databases such as Genome Aggregation Database (gnomAD) can enable burden testing using these databases as controls, obviating the need for additional control sequencing for each study. However, there exist various challenges with using public databases as controls, including lack of individual-level data, differences in ancestry, and differences in sequencing platforms and data processing. To illustrate the approach of using public data as controls, we analyzed whole-exome sequencing data from 393 individuals with idiopathic hypogonadotropic hypogonadism (IHH), a rare disorder with significant locus heterogeneity and incomplete penetrance against control subjects from gnomAD (n = 123,136). We leveraged presumably benign synonymous variants to calibrate our approach. Through iterative analyses, we systematically addressed and overcame various sources of artifact that can arise when using public control data. In particular, we introduce an approach for highly adaptable variant quality filtering that leads to well-calibrated results. Our approach “re-discovered” genes previously implicated in IHH (FGFR1, TACR3, GNRHR). Furthermore, we identified a significant burden in TYRO3, a gene implicated in hypogonadotropic hypogonadism in mice. Finally, we developed a user-friendly software package TRAPD (Test Rare vAriants with Public Data) for performing gene-based burden testing against public databases.

116 citations

Journal ArticleDOI
01 Apr 2022-Science
TL;DR: A complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled the comprehensive characterization of pericentromeric and centromeric repeats, which constitute 6.2% of the genome.
Abstract: Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

116 citations

Posted ContentDOI
14 Jun 2018-bioRxiv
TL;DR: For the first time, specific loci pointing to a potential role of 4 genes (DARS2, ARFGEF2, DCAKD and GATAD2A) that distinguish between BD and SCZ are identified, providing an opportunity to understand the biology contributing to clinical differences of these disorders.
Abstract: Schizophrenia and bipolar disorder are two distinct diagnoses that share symptomology. Understanding the genetic factors contributing to the shared and disorder-specific symptoms will be crucial for improving diagnosis and treatment. In genetic data consisting of 53,555 cases (20,129 bipolar disorder [BD], 33,426 schizophrenia [SCZ]) and 54,065 controls, we identified 114 genome-wide significant loci implicating synaptic and neuronal pathways shared between disorders. Comparing SCZ to BD (23,585 SCZ, 15,270 BD) identified four genomic regions including one with disorder-independent causal variants and potassium ion response genes as contributing to differences in biology between the disorders. Polygenic risk score (PRS) analyses identified several significant correlations within case-only phenotypes including SCZ PRS with psychotic features and age of onset in BD. For the first time, we discover specific loci that distinguish between BD and SCZ and identify polygenic components underlying multiple symptom dimensions. These results point to the utility of genetics to inform symptomology and potential treatment.

116 citations

Posted ContentDOI
Anubha Mahajan, Cassandra N. Spracklen, Weihua Zhang, Maggie C.Y. Ng, and 242 more authors (95 institutions)
23 Sep 2020-medRxiv
TL;DR: Improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying foundations for functional investigations.
Abstract: We assembled an ancestrally diverse collection of genome-wide association studies of type 2 diabetes (T2D) in 180,834 cases and 1,159,055 controls (48.9% non-European descent). We identified 277 loci at genome-wide significance (p 50% posterior probability. This improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying foundations for functional investigations. Trans-ancestry genetic risk scores enhanced transferability across diverse populations, providing a step towards more effective clinical translation to improve global health.

115 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations