Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: This Review summarizes the efforts of an international working group that aimed to survey the current landscape of blood-based AD biomarkers and outlines operational steps for an effective academic–industry co-development pathway from identification and assay development to validation for clinical use.
Abstract: Biomarker discovery and development for clinical research, diagnostics and therapy monitoring in clinical trials have advanced rapidly in key areas of medicine - most notably, oncology and cardiovascular diseases - allowing rapid early detection and supporting the evolution of biomarker-guided, precision-medicine-based targeted therapies. In Alzheimer disease (AD), breakthroughs in biomarker identification and validation include cerebrospinal fluid and PET markers of amyloid-β and tau proteins, which are highly accurate in detecting the presence of AD-associated pathophysiological and neuropathological changes. However, the high cost, insufficient accessibility and/or invasiveness of these assays limit their use as viable first-line tools for detecting patterns of pathophysiology. Therefore, a multistage, tiered approach is needed, prioritizing development of an initial screen to exclude from these tests the high numbers of people with cognitive deficits who do not demonstrate evidence of underlying AD pathophysiology. This Review summarizes the efforts of an international working group that aimed to survey the current landscape of blood-based AD biomarkers and outlines operational steps for an effective academic-industry co-development pathway from identification and assay development to validation for clinical use.

396 citations

Journal Article
TL;DR: This article reports a genome-wide association study of self-reported daytime napping in the UK Biobank and uses Mendelian randomization to explore causal associations with cardiometabolic outcomes.
Abstract: Daytime napping is a common, heritable behavior, but its genetic basis and causal relationship with cardiometabolic health remain unclear. Here, we perform a genome-wide association study of self-reported daytime napping in the UK Biobank (n = 452,633) and identify 123 loci of which 61 replicate in the 23andMe research cohort (n = 541,333). Findings include missense variants in established drug targets for sleep disorders (HCRTR1, HCRTR2), genes with roles in arousal (TRPC6, PNOC), and genes suggesting an obesity-hypersomnolence pathway (PNOC, PATJ). Association signals are concordant with accelerometer-measured daytime inactivity duration and 33 loci colocalize with loci for other sleep phenotypes. Cluster analysis identifies three distinct clusters of nap-promoting mechanisms with heterogeneous associations with cardiometabolic outcomes. Mendelian randomization shows potential causal links between more frequent daytime napping and higher blood pressure and waist circumference.

393 citations

Journal Article
TL;DR: To provide a global distribution map of CYP alleles with clinical importance, whole-genome and exome sequencing data from 56,945 unrelated individuals of five major human populations are integrated to derive the frequencies of 176 CYP haplotypes, providing an extensive resource for major genetic determinants of drug metabolism.
Abstract: Genetic polymorphisms in cytochrome P450 (CYP) genes can result in altered metabolic activity toward a plethora of clinically important medications. Thus, single nucleotide variants and copy number variations in CYP genes are major determinants of drug pharmacokinetics and toxicity and constitute pharmacogenetic biomarkers for drug dosing, efficacy, and safety. Strikingly, the distribution of CYP alleles differs considerably between populations with important implications for personalized drug therapy and healthcare programs. To provide a global distribution map of CYP alleles with clinical importance, we integrated whole-genome and exome sequencing data from 56,945 unrelated individuals of five major human populations. By combining this dataset with population-specific linkage information, we derive the frequencies of 176 CYP haplotypes, providing an extensive resource for major genetic determinants of drug metabolism. Furthermore, we aggregated this dataset into spectra of predicted functional variability in the respective populations and discuss the implications for population-adjusted pharmacological treatment strategies.

389 citations

Journal Article
30 Sep 2020-Nature
TL;DR: Risk of severe COVID-19 is conferred by a genomic segment that is inherited from Neanderthals and is carried by around 50% and 16% of people in south Asia and Europe, respectively.
Abstract: A recent genetic association study [1] identified a gene cluster on chromosome 3 as a risk locus for respiratory failure after infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A separate study (COVID-19 Host Genetics Initiative) [2] comprising 3,199 hospitalized patients with coronavirus disease 2019 (COVID-19) and control individuals showed that this cluster is the major genetic risk factor for severe symptoms after SARS-CoV-2 infection and hospitalization. Here we show that the risk is conferred by a genomic segment of around 50 kilobases in size that is inherited from Neanderthals and is carried by around 50% of people in south Asia and around 16% of people in Europe.

382 citations

Journal Article
TL;DR: This Review discusses bioinformatics tools that have been devised to handle the numerous characteristic features of these long-range data types, with applications in genome assembly, genetic variant detection, haplotype phasing, transcriptomics and epigenomics.
Abstract: Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.

381 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
TL;DR: BEDTools is a software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations