Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more; Institutions (90)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: Probes on the Infinium MethylationEPIC BeadChip are identified that are predicted to measure methylation at polymorphic sites and hybridise to multiple genomic regions and are intended to be used for quality control procedures when analysing data derived from this platform.
Abstract: Genome-wide analysis of DNA methylation has now become a relatively inexpensive technique thanks to array-based methylation profiling technologies. The recently developed Illumina Infinium MethylationEPIC BeadChip interrogates methylation at over 850,000 sites across the human genome, covering 99% of RefSeq genes. This array supersedes the widely used Infinium HumanMethylation450 BeadChip, which has permitted insights into the relationship between DNA methylation and a wide range of conditions and traits. Previous research has identified issues with certain probes on both the HumanMethylation450 BeadChip and its predecessor, the Infinium HumanMethylation27 BeadChip, which were predicted to affect array performance. These issues concerned probe-binding specificity and the presence of polymorphisms at target sites. Using in silico methods, we have identified probes on the Infinium MethylationEPIC BeadChip that are predicted to (i) measure methylation at polymorphic sites and (ii) hybridise to multiple genomic regions. We intend these resources to be used for quality control procedures when analysing data derived from this platform.

226 citations

Journal Article
TL;DR: It is shown that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts.
Abstract: Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for more than 200,000 years, were probably well adapted to this environment and its local pathogens. It is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three Toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, and the third haplotype is most similar to the Denisovan genome. The Toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi, and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.

225 citations

Journal Article
25 Jan 2019-Science
TL;DR: The detection of recombination and de novo mutations (DNMs) requires genetic data on a proband and its parents, and a fine resolution of these events is possible only with whole-genome sequence data, allowing us to identify crossovers and DNMs in families at a high resolution.
Abstract: Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs. Crossovers exhibit a mutagenic effect, with overrepresentation of DNMs within 1 kilobase of crossovers in males and females. In females, a higher mutation rate is observed up to 40 kilobases from crossovers, particularly for complex crossovers, which increase with maternal age. We identified 35 loci associated with the recombination rate or the location of crossovers, demonstrating extensive genetic control of meiotic recombination, and our results highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.

224 citations

Journal Article
TL;DR: This work demonstrates the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders, and argues that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
Abstract: Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.

221 citations

Journal Article
TL;DR: A likelihood-based approach that analyzes summary-level statistics and external linkage disequilibrium information estimates effect-size distributions of common variants, characterized by the proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects, and is used to predict the sample sizes needed to identify SNPs that explain most of the heritability found in genome-wide association studies.
Abstract: We developed a likelihood-based approach for analyzing summary-level statistics and external linkage disequilibrium information to estimate effect-size distributions of common variants, characterized by the proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects. Analysis of results available across 32 genome-wide association studies showed that, while all traits are highly polygenic, there is wide diversity in the degree and nature of polygenicity. Psychiatric diseases and traits related to mental health and ability appear to be most polygenic, involving a continuum of small effects. Most other traits, including major chronic diseases, involve clusters of SNPs that have distinct magnitudes of effects. We predict that the sample sizes needed to identify SNPs that explain most heritability found in genome-wide association studies will range from a few hundred thousand to multiple millions, depending on the underlying effect-size distributions of the traits. Accordingly, we project the risk-prediction ability of polygenic risk scores across a wide variety of diseases.

221 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations