Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
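As a quick arithmetic check on the totals quoted in the abstract, the sketch below sums the three reported variant classes. The dictionary and function names are hypothetical, chosen only for illustration; the figures are the rounded values from the abstract, so the sum is approximate.

```python
# Variant counts as reported in the abstract, in millions (rounded figures).
reported_counts_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000 structural variants
}


def total_variants_millions(counts):
    """Sum per-class variant counts (all values in millions)."""
    return sum(counts.values())


total = total_variants_millions(reported_counts_millions)
# The abstract states "in total over 88 million variants".
print(f"{total:.2f} million variants")
```

The sum of the rounded figures is consistent with the abstract's "over 88 million variants".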
Citations
Journal ArticleDOI
TL;DR: The study shows that genetic variation in cis-regulatory elements affects gene expression in a manner dependent on lymphocyte activation status, contributing to the interindividual complexity of immune responses.
Abstract: Genetic studies have revealed that autoimmune susceptibility variants are over-represented in memory CD4+ T cell regulatory elements [1-3]. Understanding how genetic variation affects gene expression in different T cell physiological states is essential for deciphering genetic mechanisms of autoimmunity [4,5]. Here, we characterized the dynamics of genetic regulatory effects at eight time points during memory CD4+ T cell activation with high-depth RNA-seq in healthy individuals. We discovered widespread, dynamic allele-specific expression across the genome, where the balance of alleles changes over time. These genes were enriched fourfold within autoimmune loci. We found pervasive dynamic regulatory effects within six HLA genes. HLA-DQB1 alleles had one of three distinct transcriptional regulatory programs. Using CRISPR-Cas9 genomic editing we demonstrated that a promoter variant is causal for T cell-specific control of HLA-DQB1 expression. Our study shows that genetic variation in cis-regulatory elements affects gene expression in a manner dependent on lymphocyte activation status, contributing to the interindividual complexity of immune responses.

71 citations

Journal ArticleDOI
TL;DR: This data resource provides a foundation for developing new operational systems for molecular surveillance and for accelerating research and development of new vector control tools, and provides a unique resource for the study of population genomics and evolutionary biology in eukaryotic species with high levels of genetic diversity under strong anthropogenic evolutionary pressures.
Abstract: Mosquito control remains a central pillar of efforts to reduce malaria burden in sub-Saharan Africa. However, insecticide resistance is entrenched in malaria vector populations, and countries with a high malaria burden face a daunting challenge to sustain malaria control with a limited set of surveillance and intervention tools. Here we report on the second phase of a project to build an open resource of high-quality data on genome variation among natural populations of the major African malaria vector species Anopheles gambiae and Anopheles coluzzii. We analyzed whole genomes of 1142 individual mosquitoes sampled from the wild in 13 African countries, as well as a further 234 individuals comprising parents and progeny of 11 laboratory crosses. The data resource includes high-confidence single-nucleotide polymorphism (SNP) calls at 57 million variable sites, genome-wide copy number variation (CNV) calls, and haplotypes phased at biallelic SNPs. We use these data to analyze genetic population structure and characterize genetic diversity within and between populations. We illustrate the utility of these data by investigating species differences in isolation by distance, genetic variation within proposed gene drive target sequences, and patterns of resistance to pyrethroid insecticides. This data resource provides a foundation for developing new operational systems for molecular surveillance and for accelerating research and development of new vector control tools. It also provides a unique resource for the study of population genomics and evolutionary biology in eukaryotic species with high levels of genetic diversity under strong anthropogenic evolutionary pressures.

71 citations

Journal ArticleDOI
TL;DR: It has become evident that deep African population history is captured by relationships among African hunter-gatherers, as the world's deepest population divergences occur among these groups, and that the deepest population divergence dates to 300,000 years before present.
Abstract: In the last three decades, genetic studies have played an increasingly important role in exploring human history. They have helped to conclusively establish that anatomically modern humans first ap...

71 citations


Cites background or methods from "A global reference for human geneti..."

  • ...The recent publication of full genome sequence data from Bantu language speakers is promising (4, 20, 44, 76), and the inclusion of more population groups across the continent in genome-wide studies should help to refine and extend hypotheses regarding large- and fine-scale movements of Bantu language speakers....


  • ...Data were obtained from several studies (4, 16, 44, 53, 70, 89, 125), and the details of the analysis are described in the Supplemental Methods....


Journal ArticleDOI
TL;DR: It is hypothesized that differential coevolution of HPV16 lineages with different but closely related ancestral human populations and subsequent host-switch events in parallel with introgression of archaic alleles into the genomes of modern human ancestors may be largely responsible for the present-day differential prevalence and association with cancers for HPV16 variants.
Abstract: Every human suffers a number of papillomavirus (PV) infections through life, most of them asymptomatic. Notable exceptions are persistent infections by Human papillomavirus 16 (HPV16), the most oncogenic infectious agent for humans and responsible for most infection-driven anogenital cancers. Oncogenic potential is not homogeneous among HPV16 lineages, and genetic variation within HPV16 exhibits some geographic structure. However, an in-depth analysis of the HPV16 evolutionary history was still wanting. We have analyzed extant HPV16 diversity and compared the evolutionary and phylogeographical patterns of humans and of HPV16. We show that codivergence with modern humans explains at most 30% of the present viral geographical distribution. The most explanatory scenario suggests that ancestral HPV16 already infected ancestral human populations and that viral lineages co-diverged with the hosts in parallel with the split between archaic Neanderthal-Denisovans and ancestral modern human populations, generating the ancestral HPV16A and HPV16BCD viral lineages, respectively. We propose that after out-of-Africa migration of modern human ancestors, sexual transmission between human populations introduced HPV16A into modern human ancestor populations. We hypothesize that differential coevolution of HPV16 lineages with different but closely related ancestral human populations and subsequent host-switch events in parallel with introgression of archaic alleles into the genomes of modern human ancestors may be largely responsible for the present-day differential prevalence and association with cancers for HPV16 variants.

71 citations


Cites background from "A global reference for human geneti..."

  • ...And fourth, while human genetic diversity is highest in sub-Saharan Africa (1000 Genomes Project Consortium et al. 2015), our results of HPV16 E6 variability showed consistently higher genetic diversity estimates outside sub-Saharan Africa (table 4)....


Journal ArticleDOI
TL;DR: Replacement of the GRCh37 standard reference with QTRG in a best practices genome analysis workflow resulted in an average of 7× deeper coverage depth (an improvement of 23%) and 756,671 fewer variants on average, a reduction of 16% that is attributed to common Qatari alleles being present in QTRG.
Abstract: Reaching the full potential of precision medicine depends on the quality of personalized genome interpretation. In order to facilitate precision medicine in regions of the Middle East and North Africa (MENA), a population-specific reference genome for the indigenous Arab population of Qatar (QTRG) was constructed by incorporating allele frequency data from sequencing of 1,161 Qataris, representing 0.4% of the population. A total of 20.9 million SNPs and 3.1 million indels were observed in Qatar, including an average of 1.79% novel variants per individual genome. Replacement of the GRCh37 standard reference with QTRG in a best practices genome analysis workflow resulted in an average of 7× deeper coverage depth (an improvement of 23%), and 756,671 fewer variants on average, a reduction of 16% that is attributed to common Qatari alleles being present in the QTRG reference. The benefit of using QTRG varies across ancestries, a factor that should be taken into consideration when selecting an appropriate reference for analysis.

71 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
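
The feature-overlap task described in the BEDTools abstract can be illustrated with a minimal Python sketch. This is not BEDTools' implementation or API: the `Interval` type and the `overlaps` and `find_overlaps` helpers are hypothetical names, the coordinates are made up, and the naive pairwise scan is an assumption chosen for clarity rather than speed. It assumes BED-style 0-based, half-open intervals.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlaps(features: List[Interval],
                  annotations: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive O(n*m) scan; real tools use indexing or sorted sweeps instead."""
    return [(f, a) for f in features for a in annotations if overlaps(f, a)]


# Made-up example data: two features and two annotation intervals.
reads = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
genes = [Interval("chr1", 150, 300), Interval("chr2", 100, 200)]

pairs = find_overlaps(reads, genes)
print(pairs)
```

Because the intervals are half-open, two features that merely touch (one ending at position 10, the next starting at 10) are not counted as overlapping.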

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
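
To make the VCF description above concrete, here is a minimal Python sketch that splits one tab-delimited VCF data line into its first five fixed columns and classifies the record by allele lengths. It is not VCFtools' API: `VcfRecord`, `parse_vcf_line` and `classify` are hypothetical names, the example records are made up for illustration, and the classification rule is a simplification (for instance, same-length multi-base substitutions are reported as "other").

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    """First five fixed VCF columns (hypothetical helper type)."""
    chrom: str
    pos: int          # 1-based position on the reference
    id: str
    ref: str
    alts: List[str]   # ALT column split on commas


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one data line (not a '#' header line) of a VCF file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","))


def classify(record: VcfRecord) -> str:
    """Simplified classification based only on REF/ALT allele strings."""
    if record.alts == ["."]:
        return "no alternate allele"
    if any(alt.startswith("<") for alt in record.alts):
        return "symbolic"  # e.g. <DEL>, often used for structural variants
    if len(record.ref) == 1 and all(len(alt) == 1 for alt in record.alts):
        return "SNP"
    if any(len(alt) != len(record.ref) for alt in record.alts):
        return "indel"
    return "other"


# Made-up example records, one substitution and one deletion.
snp = parse_vcf_line("20\t14370\trs0000001\tG\tA\t29\tPASS\t.")
deletion = parse_vcf_line("20\t1234567\t.\tGTC\tG\t50\tPASS\t.")
print(classify(snp), classify(deletion))
```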