Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: Two risk scores were developed to help predict liver cancer in individuals with obesity-related metabolic complications and it was shown that the risk scores helped to identify the risk of liver cancer both in high-risk individuals and in the general population.

130 citations

Posted Content
09 Sep 2016-bioRxiv
TL;DR: This study represents a large systematic application of transcriptome sequencing to rare disease diagnosis and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches.
Abstract: Exome and whole-genome sequencing are becoming increasingly routine approaches in Mendelian disease diagnosis. Despite their success, the current diagnostic rate for genomic analyses across a variety of rare diseases is approximately 25-50% [1-4]. Here, we explore the utility of transcriptome sequencing (RNA-seq) as a complementary diagnostic tool in a cohort of 50 patients with genetically undiagnosed rare neuromuscular disorders. We describe an integrated approach to analyze patient muscle RNA-seq, leveraging an analysis framework focused on the detection of transcript-level changes that are unique to the patient compared to over 180 control skeletal muscle samples. We demonstrate the power of RNA-seq to validate candidate splice-disrupting mutations and to identify splice-altering variants in both exonic and deep intronic regions, yielding an overall diagnosis rate of 35%. We also report the discovery of a highly recurrent de novo intronic mutation in COL6A1 that results in a dominantly acting splice-gain event, disrupting the critical glycine repeat motif of the triple helical domain. We identify this pathogenic variant in a total of 27 genetically unsolved patients in an external collagen VI-like dystrophy cohort, thus explaining approximately 25% of patients clinically suggestive of collagen VI dystrophy in whom prior genetic analysis is negative. Overall, this study represents the largest systematic application of transcriptome sequencing to rare disease diagnosis to date and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches.

129 citations

Journal Article
TL;DR: A computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies and shows that the corresponding sequence is highly accurate and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome.
Abstract: We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.

128 citations

Journal Article
TL;DR: This genetic correlation and 2-sample mendelian randomization study uses large-scale genome-wide association data sources to explore the genetic overlap and associations between inflammatory activity, metabolic dysregulation, and individual depressive symptoms.
Abstract:
Importance: Observational studies highlight associations of C-reactive protein (CRP), a general marker of inflammation, and interleukin 6 (IL-6), a cytokine-stimulating CRP production, with individual depressive symptoms. However, it is unclear whether inflammatory activity is associated with individual depressive symptoms and to what extent metabolic dysregulation underlies the reported associations.
Objective: To explore the genetic overlap and associations between inflammatory activity, metabolic dysregulation, and individual depressive symptoms.
GWAS Data Sources: Genome-wide association study (GWAS) summary data of European individuals, including the following: CRP levels (204 402 individuals); 9 individual depressive symptoms (3 of which did not differentiate between underlying diametrically opposite symptoms [eg, insomnia and hypersomnia]) as measured with the Patient Health Questionnaire 9 (up to 117 907 individuals); summary statistics for major depression, including and excluding UK Biobank participants, resulting in sample sizes of 500 199 and up to 230 214 individuals, respectively; insomnia (up to 386 533 individuals); body mass index (BMI) (up to 322 154 individuals); and height (up to 253 280 individuals).
Design: In this genetic correlation and 2-sample mendelian randomization (MR) study, linkage disequilibrium score (LDSC) regression was applied to infer single-nucleotide variant-based heritability and genetic correlation estimates. Two-sample MR tested potential causal associations of genetic variants associated with CRP levels, IL-6 signaling, and BMI with depressive symptoms. The study dates were November 2019 to April 2020.
Results: Based on large GWAS data sources, genetic correlation analyses revealed consistent false discovery rate (FDR)-controlled associations (genetic correlation range, 0.152-0.362; FDR P = .006 to P < .001) between CRP levels and depressive symptoms that were similar in size to genetic correlations of BMI with depressive symptoms. Two-sample MR analyses suggested that genetic upregulation of IL-6 signaling was associated with suicidality (estimate [SE], 0.035 [0.010]; FDR plus Bonferroni correction P = .01), a finding that remained stable across statistical models and sensitivity analyses using alternative instrument selection strategies. Mendelian randomization analyses did not consistently show associations of higher CRP levels or IL-6 signaling with other depressive symptoms, but higher BMI was associated with anhedonia, tiredness, changes in appetite, and feelings of inadequacy.
Conclusions and Relevance: This study reports coheritability between CRP levels and individual depressive symptoms, which may result from the potentially causal association of metabolic dysregulation with anhedonia, tiredness, changes in appetite, and feelings of inadequacy. The study also found that IL-6 signaling is associated with suicidality. These findings may have clinical implications, highlighting the potential of anti-inflammatory approaches, especially IL-6 blockade, as a putative strategy for suicide prevention.

128 citations

Journal Article
TL;DR: This review explores ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied and provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise.
Abstract: Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.

128 citations


Cites background from "A global reference for human geneti..."

  • ...eated an extensive catalogue of common human genetic variants, the differences in DNA sequences, based on microarray data. These studies eventually progressed into the ‘1000 Genomes Project’ (Genomes Project et al., 2015), which leveraged NGS technologies. In cancer research, The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Research, 2008) contains profiles of tumors and matched normal samples from more than 11000 ...


References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
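
The feature-overlap task that the BEDTools abstract describes can be illustrated with a minimal Python sketch. This is not BEDTools code: the `Interval` type and the `overlaps` and `find_overlaps` helpers are hypothetical names, the example features are made up, and BED-style 0-based, half-open coordinates are assumed.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic feature in BED-style coordinates (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two features on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlaps(queries, annotations):
    """Naive all-pairs comparison; illustrative only, quadratic in input size."""
    return [(q, f) for q in queries for f in annotations if overlaps(q, f)]


# Made-up example features, not real annotations.
queries = [Interval("chr1", 100, 200), Interval("chr2", 50, 60)]
annotations = [Interval("chr1", 150, 300), Interval("chr1", 200, 250)]
hits = find_overlaps(queries, annotations)
```

Because the end coordinate is exclusive, a feature ending at 200 and one starting at 200 are adjacent rather than overlapping, so only the first query and the first annotation pair up here. A production tool would likely use sorted input or an index instead of the all-pairs loop.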

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
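
As a rough illustration of the record layout the VCF abstract refers to, the sketch below splits one tab-delimited data line into the format's eight fixed columns and labels it by allele length. It is plain Python rather than the VCFtools Perl API; `parse_record` and `variant_type` are hypothetical helpers, the sample lines are invented, and the classification is a simplification that ignores many cases a real parser would handle.

```python
# The eight fixed columns that open every VCF data line.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_record(line: str) -> dict:
    """Split one tab-delimited VCF data line into its fixed columns."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based reference position
    record["ALT"] = record["ALT"].split(",")  # multiple alternates are comma-separated
    return record


def variant_type(record: dict) -> str:
    """Simplified labelling: symbolic alleles, single-base SNPs, everything else indel."""
    if any(alt.startswith("<") for alt in record["ALT"]):
        return "symbolic"
    if len(record["REF"]) == 1 and all(len(alt) == 1 for alt in record["ALT"]):
        return "SNP"
    return "indel"


# Invented example lines, not taken from any released call set.
snp = parse_record("1\t10505\t.\tA\tT\t100\tPASS\tAF=0.0002\n")
ins = parse_record("1\t10177\t.\tA\tAC\t100\tPASS\tAF=0.4\n")
```

Header lines beginning with `#` and the per-sample genotype columns that follow the eight fixed columns are deliberately left out of this sketch.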