scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
More filters
Journal ArticleDOI
22 Mar 2018
TL;DR: Lancet is presented, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs, and has better accuracy, especially for indel detection, than widely used somatic callers.
Abstract: Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system, which is essential for variant prioritization, and detects low-frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local-assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.

86 citations

Journal ArticleDOI
05 Mar 2020-Cell
TL;DR: The comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes project is leveraged to demonstrate that-in addition to the dichotomy of high- and low-impact variants-there is a third group of medium-impact putative passengers, including undetected weak drivers.

86 citations

Journal ArticleDOI
TL;DR: It is reported here that missense mutations in the epigenetic regulator SMCHD1 mapping to the extended ATPase domain of the encoded protein cause BAMS in all 14 cases studied and biochemical insight into its enzymatic function is provided.
Abstract: Bosma arhinia microphthalmia syndrome (BAMS) is an extremely rare and striking condition characterized by complete absence of the nose with or without ocular defects. We report here that missense mutations in the epigenetic regulator SMCHD1 mapping to the extended ATPase domain of the encoded protein cause BAMS in all 14 cases studied. All mutations were de novo where parental DNA was available. Biochemical tests and in vivo assays in Xenopus laevis embryos suggest that these mutations may behave as gain-of-function alleles. This finding is in contrast to the loss-of-function mutations in SMCHD1 that have been associated with facioscapulohumeral muscular dystrophy (FSHD) type 2. Our results establish SMCHD1 as a key player in nasal development and provide biochemical insight into its enzymatic function that may be exploited for development of therapeutics for FSHD.

86 citations

Journal ArticleDOI
TL;DR: A large genome-wide association study (GWAS) is performed in 26,679 individuals and 27 independent genome- wide associated signals are identified, including 13 new risk loci for SSc as well as loci specific for limited cutaneous and diffuse SSc and, defining credible sets and performing functional annotation, highlight key pathways and cell types for S sc.
Abstract: Systemic sclerosis (SSc) is an autoimmune disease that shows one of the highest mortality rates among rheumatic diseases. We perform a large genome-wide association study (GWAS), and meta-analysis with previous GWASs, in 26,679 individuals and identify 27 independent genome-wide associated signals, including 13 new risk loci. The novel associations nearly double the number of genome-wide hits reported for SSc thus far. We define 95% credible sets of less than 5 likely causal variants in 12 loci. Additionally, we identify specific SSc subtype-associated signals. Functional analysis of high-priority variants shows the potential function of SSc signals, with the identification of 43 robust target genes through HiChIP. Our results point towards molecular pathways potentially involved in vasculopathy and fibrosis, two main hallmarks in SSc, and highlight the spectrum of critical cell types for the disease. This work supports a better understanding of the genetic basis of SSc and provides directions for future functional experiments.

86 citations


Cites background or methods from "A global reference for human geneti..."

  • ...To quantify the extent of this overlap, we investigated whether SSc associations were non-randomly distributed in histone marks of active promoters (H3K9ac, H3K4me2, H3K4me3, H3K4ac), active enhancers (H3K27ac, H3K4me1, H2BK20ac), and active (or at least accessible) genes (H3K79me1, H2BK15ac) across the 127 reference epigenomes available from the Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects15,24....

    [...]

  • ...Overlap of SNPs with chromatin marks was interrogated in selected cell lines from the Roadmap Epigenomics Project15 (Supplementary Table 1)....

    [...]

  • ...In addition, we explored overlap with histone marks of active promoters and active enhancers (H3K9ac, H3K4me1, and H3K27ac) of cell types relevant to the disease using data from the Roadmap Epigenomics Project15 (Supplementary Table 1) (Methods)....

    [...]

  • ...GARFIELD provides an annotation panel that includes 1005 annotations (genetic annotations, chromatin states, histone modifications, DNase I hypersensitive sites and transcription factor binding sites in different cell lines) from ENCODE, GENCODE and Roadmap Epigenomics projects15,24,64....

    [...]

  • ...The second option was selected for our enrichment analysis using annotations for 127 reference epigenomes (Supplementary Data 13) and 9 epigenetic marks (H2BK20ac, H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K4ac, H3K79me1, H2BK15ac) obtained from the Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects15,24....

    [...]

Journal ArticleDOI
TL;DR: A computational pipeline involving statistical and machine-learning methods that can assess the risk of progression to PsA based on genetic markers is developed and it is shown that the underlying genetic differences between psoriasis subtypes can be used for individualized subtype risk assessment.
Abstract: Psoriatic arthritis (PsA) is a complex chronic musculoskeletal condition that occurs in ~30% of psoriasis patients. Currently, no systematic strategy is available that utilizes the differences in genetic architecture between PsA and cutaneous-only psoriasis (PsC) to assess PsA risk before symptoms appear. Here, we introduce a computational pipeline for predicting PsA among psoriasis patients using data from six cohorts with >7000 genotyped PsA and PsC patients. We identify 9 new loci for psoriasis or its subtypes and achieve 0.82 area under the receiver operator curve in distinguishing PsA vs. PsC when using 200 genetic markers. Among the top 5% of our PsA prediction we achieve >90% precision with 100% specificity and 16% recall for predicting PsA among psoriatic patients, using conditional inference forest or shrinkage discriminant analysis. Combining statistical and machine-learning techniques, we show that the underlying genetic differences between psoriasis subtypes can be used for individualized subtype risk assessment.

85 citations

References
More filters
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing webbased methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations