Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)
Nature (Nature Publishing Group), Vol. 526, Issue 7571, pp. 68-74, 1 October 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
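The ">1% frequency" threshold in the abstract refers to how often a variant's allele appears across sampled chromosomes. As a minimal sketch of that arithmetic (not the project's pipeline), the code below computes an alternate-allele frequency from phased genotypes. It assumes biallelic sites and VCF-style genotype strings such as `0|1`; the helper names and the toy data are hypothetical. With 2,504 diploid individuals there are 5,008 haplotypes, so a variant seen on 60 of them sits just above 1%.

```python
# Sketch: alternate-allele frequency from phased diploid genotypes.
# Assumptions (not from the paper): biallelic sites, VCF-style genotype
# strings such as "0|1" or "0/1", and "." marking a missing allele.

def alt_allele_frequency(genotypes: list[str]) -> float:
    """Fraction of called alleles that are non-reference."""
    alt = called = 0
    for gt in genotypes:
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":
                continue  # missing allele: excluded from numerator and denominator
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else 0.0


def is_common(genotypes: list[str], threshold: float = 0.01) -> bool:
    """True if the alternate-allele frequency is above `threshold` (1% by default)."""
    return alt_allele_frequency(genotypes) > threshold


# Toy data: 2,504 diploid samples carry 5,008 haplotypes; 60 samples are heterozygous.
toy = ["0|1"] * 60 + ["0|0"] * (2504 - 60)
print(round(alt_allele_frequency(toy), 4))  # 0.012 (60 / 5,008)
print(is_common(toy))                       # True
```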
Citations
Journal Article
TL;DR: While the selection signatures do not show enrichment in archaic hominin-derived genome sequences, they overlap with SNPs associated with modern human traits, with the strongest overlaps observed for alcohol- or nutrition-metabolism-related traits.
Abstract: Understanding natural selection is crucial to unveiling evolution of modern humans. Here, we report natural selection signatures in the Japanese population using 2234 high-depth whole-genome sequence (WGS) data (25.9×). Using rare singletons, we identify signals of very recent selection for the past 2000–3000 years in multiple loci (ADH cluster, MHC region, BRAP-ALDH2, SERHL2). In a large-scale genome-wide association study (GWAS) dataset (n = 171,176), variants with selection signatures show enrichment in heterogeneity of derived allele frequency spectra among the geographic regions of Japan, highlighted by two major regional clusters (Hondo and Ryukyu). While the selection signatures do not show enrichment in archaic hominin-derived genome sequences, they overlap with the SNPs associated with modern human traits. The strongest overlaps are observed for the alcohol or nutrition metabolism-related traits. Our study illustrates the value of high-depth WGS to understand evolution and its relationship with disease risk. Recent natural selection left signals in human genomes. Here, Okada et al. generate high-depth whole-genome sequence (WGS) data (25.9×) from 2,234 Japanese people of the BioBank Japan Project (BBJ), and identify signals of recent natural selection which overlap variants associated with human traits.

115 citations
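One simple way to ask whether a variant's derived-allele frequency differs between two regions, such as the Hondo and Ryukyu clusters named above, is a 2×2 Pearson chi-square test on allele counts. This is a minimal sketch under that assumption, not necessarily the heterogeneity statistic Okada et al. used; the function name and the counts in the example are hypothetical. The p-value uses the 1-degree-of-freedom identity P(X > x) = erfc(√(x/2)).

```python
import math

def allele_frequency_chi2(alt_a: int, n_a: int, alt_b: int, n_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) that two groups share one allele frequency.

    alt_a/alt_b: derived-allele counts; n_a/n_b: chromosomes sampled per group.
    Returns (statistic, p_value).
    """
    observed = [[alt_a, n_a - alt_a], [alt_b, n_b - alt_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [alt_a + alt_b, total - alt_a - alt_b]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("empty group or monomorphic allele")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts for two regions (not data from the study).
stat, p = allele_frequency_chi2(300, 1000, 200, 1000)
print(f"chi2 = {stat:.2f}, p = {p:.1e}")
```

Testing thousands of variants this way would also need a multiple-testing correction.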

Journal Article
TL;DR: Despite low mutational load, numerous samples show marked intratumoral genetic heterogeneity, including branching evolution revealed by multiregion sequencing; collectively, these observations redefine the molecular underpinnings of ACC progression and identify further targets for precision therapies.
Abstract:
BACKGROUND: Adenoid cystic carcinoma (ACC) is a rare malignancy arising in salivary glands and other sites, characterized by high rates of relapse and distant spread. Recurrent/metastatic (R/M) ACCs are generally incurable, due to a lack of active systemic therapies. To improve outcomes, deeper understanding of genetic alterations and vulnerabilities in R/M tumors is needed.
METHODS: An integrated genomic analysis of 1,045 ACCs (177 primary, 868 R/M) was performed to identify alterations associated with advanced and metastatic tumors. Intratumoral genetic heterogeneity, germline mutations, and therapeutic actionability were assessed.
RESULTS: Compared with primary tumors, R/M tumors were enriched for alterations in key Notch (NOTCH1, 26.3% vs. 8.5%; NOTCH2, 4.6% vs. 2.3%; NOTCH3, 5.7% vs. 2.3%; NOTCH4, 3.6% vs. 0.6%) and chromatin-remodeling (KDM6A, 15.2% vs. 3.4%; KMT2C/MLL3, 14.3% vs. 4.0%; ARID1B, 14.1% vs. 4.0%) genes. TERT promoter mutations (13.1% of R/M cases) were mutually exclusive with both NOTCH1 mutations (q = 3.3 × 10⁻⁴) and MYB/MYBL1 fusions (q = 5.6 × 10⁻³), suggesting discrete, alternative mechanisms of tumorigenesis. This network of alterations defined 4 distinct ACC subgroups: MYB+NOTCH1+, MYB+/other, MYB^WT NOTCH1+, and MYB^WT TERT+. Despite low mutational load, we identified numerous samples with marked intratumoral genetic heterogeneity, including branching evolution across multiregion sequencing.
CONCLUSION: These observations collectively redefine the molecular underpinnings of ACC progression and identify further targets for precision therapies.
FUNDING: Adenoid Cystic Carcinoma Research Foundation, Pershing Square Sohn Cancer Research grant, the PaineWebber Chair, Stand Up 2 Cancer, NIH R01 CA205426, the STARR Cancer Consortium, NCI R35 CA232097, the Frederick Adler Chair, Cycle for Survival, the Jayme Flowers Fund, The Sebastian Nativo Fund, NIH K08 DE024774 and R01 DE027738, and MSKCC through NIH/NCI Cancer Center Support Grant (P30 CA008748).

115 citations
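Mutual exclusivity, as reported above for TERT promoter mutations versus NOTCH1 mutations, is often assessed with a one-sided Fisher's exact test for too few co-altered samples, followed by a false-discovery-rate correction to obtain q-values. The abstract does not say which test the authors used, so this is only a sketch of that common approach; the function names are hypothetical and no study counts are reproduced.

```python
from math import comb

def exclusivity_p(both: int, a_only: int, b_only: int, neither: int) -> float:
    """One-sided Fisher's exact test for too few co-altered samples.

    Under the hypergeometric null with fixed margins, returns P(X <= both),
    where X is the number of samples altered in both genes A and B.
    """
    n = both + a_only + b_only + neither
    n_a = both + a_only   # samples altered in gene A
    n_b = both + b_only   # samples altered in gene B
    lowest = max(0, n_a + n_b - n)  # smallest overlap the margins allow
    tail = sum(comb(n_b, k) * comb(n - n_b, n_a - k) for k in range(lowest, both + 1))
    return tail / comb(n, n_a)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical 2x2 counts: both, A only, B only, neither.
print(exclusivity_p(2, 40, 60, 200))
```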

Journal Article
TL;DR: RAS-specific ACMG-AMP specifications optimized the utility of available clinical evidence and Ras/MAPK pathway–specific characteristics to consistently classify RASopathy-associated variants, highlighting how grouping genes by shared features promotes rapid multigenic variant assessment without sacrificing specificity and accuracy.

115 citations
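The RAS-specific rules above refine the generic ACMG-AMP framework, in which evidence codes of different strengths (very strong, strong, moderate, supporting; stand-alone, strong, supporting on the benign side) are combined into five tiers. As a rough sketch, the code below implements only the generic combining rules of the 2015 ACMG-AMP guideline as commonly summarized. It does not model any RAS/MAPK-specific changes to which codes apply or at what strength, and the function names are hypothetical.

```python
from collections import Counter

# Evidence-code prefixes from the 2015 ACMG-AMP guideline.
PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")

def _strength(code: str) -> str:
    """Map a code such as "PM2" to its strength prefix ("PM")."""
    for prefix in PREFIXES:
        if code.startswith(prefix):
            return prefix
    raise ValueError(f"unrecognised evidence code: {code}")

def classify(codes: list[str]) -> str:
    """Combine evidence codes (e.g. "PVS1", "PM2", "BP4") into one of five tiers."""
    n = Counter(_strength(c) for c in codes)
    vs, s, m, p = n["PVS"], n["PS"], n["PM"], n["PP"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    benign = n["BA"] >= 1 or n["BS"] >= 2
    likely_benign = (n["BS"] >= 1 and n["BP"] >= 1) or n["BP"] >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"  # contradictory evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(classify(["PVS1", "PM2"]))  # Likely pathogenic
```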

Journal Article
TL;DR: A reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data is described.
Abstract: Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms. Assembly of haplotype-resolved human genomes is achieved by combining short and long reads.

115 citations
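The three quality figures quoted above (quality value, contig N50, switch error rate) follow standard definitions, sketched below as commonly used. This is not the authors' evaluation code: a QV is Phred-scaled, so QV > 40 means fewer than one error per 10,000 bases; N50 is the contig length at which the longest contigs first cover half the assembly; and a switch error is a flip in relative phase between adjacent heterozygous sites. The function names and toy inputs are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise ValueError("no contigs given")

def qv_to_error_rate(qv: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    return 10 ** (-qv / 10)

def switch_error_rate(hap: list[int], truth: list[int]) -> float:
    """Switch errors per adjacent pair of heterozygous sites.

    hap and truth give the allele (0 or 1) on one haplotype at each heterozygous
    site, in genomic order; a switch is a flip in their relative phase.
    """
    in_phase = [h == t for h, t in zip(hap, truth)]
    switches = sum(x != y for x, y in zip(in_phase, in_phase[1:]))
    return switches / (len(in_phase) - 1)

print(n50([50, 30, 20, 10, 5]))  # 30: contigs 50 + 30 = 80 of 115 bases cover half
print(qv_to_error_rate(40))      # per-base error rate for QV 40 (1 in 10,000)
print(switch_error_rate([0, 0, 1, 1], [0, 0, 0, 0]))  # one switch over three site pairs
```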

Journal Article
TL;DR: A genotype reference panel for the Japanese population followed by GWAS is constructed and a gene-based analysis identifies two genes with multiple height-increasing rare and low-frequency nonsynonymous variants, suggesting that height-associated rare variants are under different selection pressure in Japanese and European populations.
Abstract: Human height is a representative phenotype to elucidate genetic architecture. However, the majority of large studies have been performed in European populations. To investigate the rare and low-frequency variants associated with height, we construct a reference panel (N = 3,541) for genotype imputation by integrating the whole-genome sequence data from 1,037 Japanese with that of the 1000 Genomes Project, and perform a genome-wide association study in 191,787 Japanese. We report 573 height-associated variants, including 22 rare and 42 low-frequency variants. These 64 variants explain 1.7% of the phenotypic variance. Furthermore, a gene-based analysis identifies two genes with multiple height-increasing rare and low-frequency nonsynonymous variants (SLC27A3 and CYP26B1; P(SKAT-O) < 2.5 × 10⁻⁶). Our analysis shows a general tendency of the effect sizes of rare variants towards increasing height, which is contrary to findings among Europeans, suggesting that height-associated rare variants are under different selection pressure in Japanese and European populations. Thousands of genetic loci are known to associate with human height, but these are mainly based on studies in European ancestry populations. Here, Akiyama et al. construct a genotype reference panel for the Japanese population followed by GWAS and report 573 height-associated variants in 191,787 Japanese.

115 citations
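A statement such as "these 64 variants explain 1.7% of the phenotypic variance" rests on a standard additive-model formula. For a standardized trait, a variant with allele frequency p and per-allele effect β (in trait standard deviations) explains 2p(1-p)β² of the variance, and independent variants add up. The sketch below illustrates that formula only. The frequency bins (rare < 1%, low-frequency 1-5%) are an assumption for illustration, not thresholds stated in the abstract, and the function names and effect sizes are hypothetical.

```python
def frequency_class(maf: float) -> str:
    """Assumed bins: rare < 1%, low-frequency 1-5%, common >= 5%."""
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"

def variance_explained(maf: float, beta: float) -> float:
    """Share of a standardized trait's variance explained by one additive variant.

    beta is the per-allele effect in trait standard deviations; 2p(1-p) is the
    variance of the allele count under Hardy-Weinberg equilibrium.
    """
    return 2 * maf * (1 - maf) * beta ** 2

def total_variance_explained(variants: list[tuple[float, float]]) -> float:
    """Sum over (maf, beta) pairs, assuming the variants are independent (no LD)."""
    return sum(variance_explained(maf, beta) for maf, beta in variants)

# Hypothetical effects: a rare large-effect variant vs a common small-effect one.
for maf, beta in [(0.005, 0.3), (0.30, 0.05)]:
    print(frequency_class(maf), f"{variance_explained(maf, beta):.5f}")
```

The example shows why rare variants, even with large effects, tend to explain little variance: the 2p(1-p) factor is small when p is small.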

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
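BLAST is described above as approximating the maximal segment pair (MSP): the highest-scoring pair of equal-length, ungapped segments from two sequences. For intuition, the sketch below computes the MSP exhaustively. It runs Kadane's maximum-subarray algorithm along every diagonal, which is O(len(a) × len(b)) and is exactly the cost BLAST's heuristics avoid. The +5/−4 match/mismatch scores are an illustrative assumption, and the function name is hypothetical.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> tuple[int, int, int, int]:
    """Exhaustive maximal segment pair for two sequences (ungapped).

    Returns (score, start_in_a, start_in_b, length), using 0-based starts.
    """
    best = (0, 0, 0, 0)
    # Each diagonal is fixed by offset = i - j between positions in a and b.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)   # first position of this diagonal in a
        j = i - offset       # first position of this diagonal in b
        run, run_start, k = 0, 0, 0
        while i + k < len(a) and j + k < len(b):
            s = match if a[i + k] == b[j + k] else mismatch
            if run <= 0:
                run, run_start = s, k  # a non-positive prefix never helps: restart
            else:
                run += s
            if run > best[0]:
                best = (run, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best

print(msp_score("TTACGTT", "GGACGGG"))  # (15, 2, 2, 3): the shared "ACG"
```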

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
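A SAM alignment line is tab-separated, with 11 mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. The sketch below parses one line in plain Python to show that structure. It is not SAMtools' implementation, and the function names and the example read are hypothetical. POS is 1-based, CIGAR operations M, D, N, = and X consume reference bases, and FLAG bit 0x4 marks an unmapped read.

```python
import re

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its 11 mandatory fields plus raw tags."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    rec["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left unparsed
    return rec

def reference_span(cigar: str) -> int:
    """Number of reference bases covered: CIGAR ops M, D, N, = and X."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")

def is_unmapped(flag: int) -> bool:
    return bool(flag & 0x4)

example = "read1\t0\tchr1\t100\t60\t5M2I3M1D4M\t*\t0\t0\tACGTACGTACGTAC\t*\tNM:i:3"
rec = parse_sam_line(example)
end = rec["POS"] + reference_span(rec["CIGAR"]) - 1  # 1-based inclusive end
print(rec["RNAME"], rec["POS"], end)  # chr1 100 112
```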

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
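The core BEDTools question, whether features in one set overlap features in another, depends on BED's 0-based, half-open coordinates. Two intervals [s1, e1) and [s2, e2) on the same chromosome overlap exactly when s1 < e2 and s2 < e1, so intervals that merely touch do not overlap. The sketch below is a naive, quadratic illustration of that rule, similar in spirit to reporting overlapping portions. It is not BEDTools' algorithm or output format, and the function names are hypothetical.

```python
from collections import defaultdict

def read_bed(lines):
    """Yield (chrom, start, end) from BED text (0-based, half-open)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield chrom, int(start), int(end)

def intersect(a, b):
    """Return the overlapping portion for each overlapping pair from a and b."""
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    hits = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            # Half-open intervals overlap iff each starts before the other ends.
            if start < b_end and b_start < end:
                hits.append((chrom, max(start, b_start), min(end, b_end)))
    return hits

a = list(read_bed(["chr1\t100\t200\n"]))
b = list(read_bed(["chr1\t150\t250\n", "chr1\t200\t300\n", "chr2\t100\t200\n"]))
print(intersect(a, b))  # [('chr1', 150, 200)]; [200, 300) only touches [100, 200)
```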

Journal Article
Nature, 6 September 2012
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
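A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). When genotypes are present, these are followed by FORMAT and one column per sample. The sketch below parses such a line and shows the idea behind range retrieval: records are kept sorted by position per chromosome, and a binary search returns those inside a query window. In practice, bgzip-compressed VCFs are typically indexed with tabix rather than held in memory like this. The function names and toy records are hypothetical, and this is not VCFtools' implementation.

```python
import bisect

def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into fixed columns, parsed INFO and sample columns."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    info_dict = {}
    for entry in info.split(";"):
        if entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True  # flag keys carry no value
    return {"CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
            "ALT": alt.split(","), "QUAL": qual, "FILTER": filt, "INFO": info_dict,
            "FORMAT": cols[8] if len(cols) > 8 else None, "SAMPLES": cols[9:]}

def build_index(records):
    """Map chrom -> (sorted positions, records in the same order)."""
    index = {}
    for r in sorted(records, key=lambda r: (r["CHROM"], r["POS"])):
        positions, rows = index.setdefault(r["CHROM"], ([], []))
        positions.append(r["POS"])
        rows.append(r)
    return index

def query(index, chrom, start, end):
    """Records with start <= POS <= end (1-based, inclusive)."""
    positions, rows = index.get(chrom, ([], []))
    return rows[bisect.bisect_left(positions, start):bisect.bisect_right(positions, end)]

toy = [parse_vcf_line(f"{c}\t{p}\t.\tA\tG\t50\tPASS\tAF=0.02\tGT\t0|1")
       for c, p in [("chr1", 3000), ("chr1", 1000), ("chr2", 1500), ("chr1", 2000)]]
print([r["POS"] for r in query(build_index(toy), "chr1", 1500, 3000)])  # [2000, 3000]
```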