Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more authors (90 institutions)
01 Oct 2015 - Nature (Nature Publishing Group) - Vol. 526, Iss. 7571, pp. 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
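As a quick consistency check on the counts quoted in the abstract, the three variant classes can be summed. This is a minimal Python sketch that uses only the rounded figures above, so the total is approximate; the variable names are illustrative.

```python
# Rounded variant counts as quoted in the abstract, in millions.
counts_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000 structural variants
}

# Sum the three classes and compute the fraction contributed by SNPs.
total = sum(counts_millions.values())
snp_share = counts_millions["SNPs"] / total

print(f"total: {total:.2f} million variants")
print(f"SNP share of all variants: {snp_share:.1%}")
```

The sum is consistent with the abstract's "over 88 million variants", with SNPs making up the large majority.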
Citations
Journal ArticleDOI
TL;DR: A large-scale genome-wide association study (GWAS) of general cognitive ability was presented in this paper, which showed significant enrichment for genes causing Mendelian disorders with an intellectual disability phenotype.

90 citations

Journal ArticleDOI
TL;DR: In this article, a deep learning model based on U-Net architecture was proposed to accurately predict open chromatin peak calls in rare cell populations and identify cell-type-specific regulatory signatures underlying Type 2 diabetes.
Abstract: Objective: Type 2 diabetes (T2D) is a complex disease characterized by pancreatic islet dysfunction, insulin resistance, and disruption of blood glucose levels. Genome-wide association studies (GWAS) have identified >400 independent signals that encode genetic predisposition. More than 90% of associated single-nucleotide polymorphisms (SNPs) localize to non-coding regions and are enriched in chromatin-defined islet enhancer elements, indicating a strong transcriptional regulatory component to disease susceptibility. Pancreatic islets are a mixture of cell types that express distinct hormonal programs, so each cell type may contribute differentially to the underlying regulatory processes that modulate T2D-associated transcriptional circuits. Existing chromatin profiling methods such as ATAC-seq and DNase-seq, applied to islets in bulk, produce aggregate profiles that mask important cellular and regulatory heterogeneity. Methods: We present genome-wide single-cell chromatin accessibility profiles in >1,600 cells derived from a human pancreatic islet sample using single-cell combinatorial indexing ATAC-seq (sci-ATAC-seq). We also developed a deep learning model based on U-Net architecture to accurately predict open chromatin peak calls in rare cell populations. Results: We show that sci-ATAC-seq profiles allow us to deconvolve alpha, beta, and delta cell populations and identify cell-type-specific regulatory signatures underlying T2D. Particularly, T2D GWAS SNPs are significantly enriched in beta cell-specific and across cell-type shared islet open chromatin, but not in alpha or delta cell-specific open chromatin. We also demonstrate, using less abundant delta cells, that deep learning models can improve signal recovery and feature reconstruction of rarer cell populations. Finally, we use co-accessibility measures to nominate the cell-specific target genes at 104 non-coding T2D GWAS signals. Conclusions: Collectively, we identify the islet cell type of action across genetic signals of T2D predisposition and provide higher-resolution mechanistic insights into genetically encoded risk pathways.

90 citations

Journal ArticleDOI
27 May 2020-Nature
TL;DR: It is demonstrated that NADH reductive stress mediates the effects of GCKR variation on many metabolic traits, including circulating triglyceride levels, glucose tolerance and FGF21 levels, and the work underscores the utility of genetic tools such as LbNOX to empower studies of ‘causal metabolism’.
Abstract: The cellular NADH/NAD+ ratio is fundamental to biochemistry, but the extent to which it reflects versus drives metabolic physiology in vivo is poorly understood. Here we report the in vivo application of Lactobacillus brevis (Lb)NOX, a bacterial water-forming NADH oxidase, to assess the metabolic consequences of directly lowering the hepatic cytosolic NADH/NAD+ ratio in mice. By combining this genetic tool with metabolomics, we identify circulating α-hydroxybutyrate levels as a robust marker of an elevated hepatic cytosolic NADH/NAD+ ratio, also known as reductive stress. In humans, elevations in circulating α-hydroxybutyrate levels have previously been associated with impaired glucose tolerance, insulin resistance and mitochondrial disease, and are associated with a common genetic variant in GCKR, which has previously been associated with many seemingly disparate metabolic traits. Using LbNOX, we demonstrate that NADH reductive stress mediates the effects of GCKR variation on many metabolic traits, including circulating triglyceride levels, glucose tolerance and FGF21 levels. Our work identifies an elevated hepatic NADH/NAD+ ratio as a latent metabolic parameter that is shaped by human genetic variation and contributes causally to key metabolic traits and diseases. Moreover, it underscores the utility of genetic tools such as LbNOX to empower studies of ‘causal metabolism’. The authors identify an increased hepatic NADH/NAD+ ratio as an underlying metabolic parameter that is shaped by human genetic variation and contributes causally to key metabolic traits and diseases.

89 citations

Journal ArticleDOI
TL;DR: LPL-FCS patients had lower postheparin LPL activity and a trend toward higher TGs, whereas low-density lipoprotein cholesterol was higher in non-LPL-FCS patients, according to a phase 3 randomized placebo-controlled trial of volanesorsen.

89 citations

Journal ArticleDOI
TL;DR: The study shows that TGF-β1 has a critical and nonredundant role in the development and homeostasis of intestinal immunity and the CNS in humans.
Abstract: Transforming growth factor (TGF)-β1 (encoded by TGFB1) is the prototypic member of the TGF-β family of 33 proteins that orchestrate embryogenesis, development and tissue homeostasis. Following its discovery, enormous interest and numerous controversies have emerged about the role of TGF-β in coordinating the balance of pro- and anti-oncogenic properties, pro- and anti-inflammatory effects, or pro- and anti-fibrinogenic characteristics. Here we describe three individuals from two pedigrees with biallelic loss-of-function mutations in the TGFB1 gene who presented with severe infantile inflammatory bowel disease (IBD) and central nervous system (CNS) disease associated with epilepsy, brain atrophy and posterior leukoencephalopathy. The proteins encoded by the mutated TGFB1 alleles were characterized by impaired secretion, function or stability of the TGF-β1-LAP complex, which is suggestive of perturbed bioavailability of TGF-β1. Our study shows that TGF-β1 has a critical and nonredundant role in the development and homeostasis of intestinal immunity and the CNS in humans.

89 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
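
The overlap search described in the BEDTools abstract can be illustrated with a small Python sketch. It is not BEDTools' implementation: it is a naive all-pairs comparison over made-up intervals, assuming BED-style coordinates (0-based start, end exclusive), and the names `overlap`, `intersect`, `peaks` and `genes` are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end) with a 0-based start and an exclusive end, as in BED.
Interval = Tuple[str, int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Return the number of bases shared by a and b (0 if they do not overlap)."""
    if a[0] != b[0]:
        return 0  # different chromosomes never overlap
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def intersect(
    features: List[Interval], annotations: List[Interval]
) -> List[Tuple[Interval, Interval, int]]:
    """Naive O(n*m) search for every overlapping (feature, annotation) pair."""
    hits = []
    for f in features:
        for g in annotations:
            n = overlap(f, g)
            if n > 0:
                hits.append((f, g, n))
    return hits


# Made-up example intervals, for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 550), ("chr2", 200, 300)]

hits = intersect(peaks, genes)
for feature, annotation, n_bases in hits:
    print(feature, annotation, n_bases)
```

Because the end coordinate is exclusive, the chr2 intervals that merely touch at position 200 are not reported as overlapping. A production tool would index or sort the intervals rather than compare every pair.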

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations
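
To make the VCF description above concrete, here is a minimal Python sketch that parses one tab-separated VCF data line using the format's eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It does not use VCFtools or its Perl API; the `VcfRecord` type, the `parse_vcf_line` helper and the example record are hypothetical and for illustration only.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int  # 1-based position on the reference
    id: str
    ref: str
    alts: List[str]
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, str] = {}
    for entry in info.split(";"):
        if entry == ".":
            continue  # "." marks an empty INFO column
        key, _, value = entry.partition("=")
        info_dict[key] = value  # flag entries without "=" get an empty value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), info_dict)


def is_snp(record: VcfRecord) -> bool:
    """True when REF and every ALT allele are a single base."""
    return len(record.ref) == 1 and all(len(a) == 1 for a in record.alts)


# Made-up record, not taken from any real call set.
example = "1\t12345\trs0000001\tA\tG,T\t100\tPASS\tAC=10,5;AF=0.2,0.1;DB"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.alts, is_snp(record))
```

Real files also carry header lines beginning with `#` and optional per-sample genotype columns, which this sketch ignores.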