Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)
Nature (Nature Publishing Group), 01 Oct 2015, Vol. 526, Issue 7571, pp. 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
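The ">1%" threshold in the abstract is an allele-frequency cutoff. As a minimal sketch (not the project's own pipeline), assuming genotypes arrive as VCF-style GT strings such as "0|1", the alternate-allele frequency at one site and a 1% filter could be computed as below. Treating the cutoff as a minor-allele-frequency test, and the function names themselves, are illustrative assumptions.

```python
def alt_allele_frequency(genotypes):
    """Return the alternate-allele frequency across all called alleles.

    genotypes: GT strings such as "0|1", "1/1" or "./."; "." marks a missing
    allele. Any non-zero allele index counts as alternate, so multiallelic
    sites are collapsed into one "non-reference" class in this sketch.
    """
    alt = called = 0
    for gt in genotypes:
        # Phased ("|") and unphased ("/") separators are treated the same.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # skip missing calls
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else 0.0


def is_common(genotypes, threshold=0.01):
    """True when the minor allele frequency exceeds `threshold` (1% by default)."""
    freq = alt_allele_frequency(genotypes)
    return min(freq, 1.0 - freq) > threshold
```

For example, `alt_allele_frequency(["0|0", "0|1", "1|1", "0|0"])` counts 3 alternate alleles out of 8 called alleles and returns 0.375.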
Citations
Journal Article
TL;DR: Based on the POSTAR platform, novel insights into post-transcriptional regulation are obtained, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs.
Abstract: We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (∼23 million) and computationally predicted (approximately 117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splicing, editing, and modification), RNA secondary structures, disease-associated variants, and gene expression and function. Moreover, POSTAR provides a friendly, multi-mode, integrated search interface, which helps users to connect multiple RBP binding sites with post-transcriptional regulatory events, phenotypes, and diseases. Based on our platform, we were able to obtain novel insights into post-transcriptional regulation, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs. In summary, POSTAR represents an early effort to systematically annotate post-transcriptional regulatory maps and explore the putative roles of RBPs in human diseases.

104 citations

Journal Article
TL;DR: A detailed study of a 5,000-y-old mass grave from southern Poland belonging to the Globular Amphora culture and containing the remains of 15 men, women, and children, all killed by blows to the head, provides an unprecedented level of insight into the kinship structure and social behavior of a Late Neolithic community.
Abstract: The third millennium BCE was a period of major cultural and demographic changes in Europe that signaled the beginning of the Bronze Age. People from the Pontic steppe expanded westward, leading to the formation of the Corded Ware complex and transforming the genetic landscape of Europe. At the time, the Globular Amphora culture (3300–2700 BCE) existed over large parts of Central and Eastern Europe, but little is known about their interaction with neighboring Corded Ware groups and steppe societies. Here we present a detailed study of a Late Neolithic mass grave from southern Poland belonging to the Globular Amphora culture and containing the remains of 15 men, women, and children, all killed by blows to the head. We sequenced their genomes to between 1.1- and 3.9-fold coverage and performed kinship analyses that demonstrate that the individuals belonged to a large extended family. The bodies had been carefully laid out according to kin relationships by someone who evidently knew the deceased. From a population genetic viewpoint, the people from Koszyce are clearly distinct from neighboring Corded Ware groups because of their lack of steppe-related ancestry. Although the reason for the massacre is unknown, it is possible that it was connected with the expansion of Corded Ware groups, which may have resulted in competition for resources and violent conflict. Together with the archaeological evidence, these analyses provide an unprecedented level of insight into the kinship structure and social behavior of a Late Neolithic community.

104 citations


Cites methods from "A global reference for human geneti..."

  • ...We identified runs of homozygosity using IBDseq (41) on a merge of the imputed genotypes of the Koszyce individuals and 214 individuals from two populations (IBS, TSI) of the 1000 Genomes Project (33)....

  • ...The combined VCF was then split into separate files containing 200,000 markers each and imputed separately using Beagle 4.0 (r1399) (32), using the 1000 Genomes (33) phase 3 map included with Beagle (*.phase3.v5a.snps.vcf.gz and plink.chr*.GRCh37.map) with input through the genotype likelihood option....

  • ...We called biallelic sites present in the 1000 Genomes (--genotyping_mode GENOTYPE_GIVEN_ALLELES) filtering for transitions by setting the genotype likelihoods to 0 for all three genotypes (e.g., hom ref, het, and hom alt)....

  • ...Auton A, et al.; 1000 Genomes Project Consortium (2015) A global reference for human genetic variation....

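The quoted methods mention two mechanical steps: splitting the combined VCF into files of 200,000 markers each before imputation, and removing transitions. A minimal sketch of both follows, under simplifying assumptions: sites are plain (chrom, pos, ref, alt) tuples, and transition sites are dropped outright rather than having their genotype likelihoods set to 0 as the quoted pipeline does. In ancient-DNA work transitions are commonly excluded because post-mortem deamination produces C-to-T and G-to-A changes. All function names are hypothetical.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base change is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if the biallelic SNP ref->alt is a transition."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        raise ValueError("expected two different single-base alleles")
    return frozenset((ref, alt)) in TRANSITIONS


def drop_transitions(sites):
    """Keep only transversion sites from (chrom, pos, ref, alt) tuples."""
    return [s for s in sites if not is_transition(s[2], s[3])]


def chunk(markers, size=200_000):
    """Split a marker list into consecutive chunks of at most `size` items."""
    return [markers[i:i + size] for i in range(0, len(markers), size)]
```

Zeroing likelihoods, as in the quoted pipeline, keeps the site positions in the file while removing their information; dropping the sites, as here, is the simpler equivalent for illustration.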

Journal Article
Cell, 24 Nov 2021
TL;DR: Single-cell chromatin accessibility assays were applied to 30 adult human tissue types from multiple donors to profile the activity of gene regulatory elements across diverse cell types and tissues of the human body.

104 citations

Journal Article
TL;DR: A genome-wide association study of host resistance to severe Plasmodium falciparum malaria in over 17,000 individuals from 11 malaria-endemic countries identifies five replicable associations, and its dataset will provide a major building block for future research on the genetic determinants of disease in these diverse human populations.
Abstract: We conducted a genome-wide association study of host resistance to severe Plasmodium falciparum malaria in over 17,000 individuals from 11 malaria-endemic countries, undertaking a wide-ranging analysis which identifies five replicable associations with genome-wide levels of evidence. Our findings include a newly implicated variant on chromosome 6 associated with risk of cerebral malaria, and the discovery of an erythroid-specific transcription start site underlying the association in ATP2B4. Previously reported HLA associations cannot be replicated in this dataset. We estimate substantial heritability of severe malaria (h² ≈ 23%), of which around 10% is explained by the currently identified associations. Our dataset will provide a major building block for future research on the genetic determinants of disease in these diverse human populations.

104 citations
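The malaria abstract gives a heritability of about 23% and says around 10% of it is explained by the identified associations. Reading "10%" as a share of the heritability (the wording leaves this open), the implied fraction of total variance in severe-malaria risk is roughly:

```latex
h^2_{\text{explained}} \approx 0.10 \times h^2 \approx 0.10 \times 0.23 \approx 0.023
```

That is, on the order of 2% of the total variance under this reading.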

Journal Article
Blood, 31 Oct 2019
TL;DR: Pegylated interferon alfa-2a (PEG) is an effective therapy for patients with essential thrombocythemia (ET) or polycythemia vera (PV) who were previously refractory and/or intolerant to hydroxyurea (HU). The Myeloproliferative Disorders Research Consortium (MPD-RC) 111 study was an investigator-initiated, international, multicenter, phase 2 trial evaluating the ability of PEG therapy to induce complete and partial hematologic responses.

104 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
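The maximal segment pair (MSP) is the highest-scoring pair of equal-length, ungapped segments, one taken from each sequence. BLAST approximates it heuristically for speed; the exhaustive computation below is only a reference sketch of what is being approximated. It assumes a simple +1/-1 match/mismatch score in place of a real substitution matrix, and the function name is hypothetical.

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score of the maximal segment pair (best ungapped local alignment).

    Every pair of equal-length segments lies on one diagonal of the
    alignment matrix, so a maximum-subarray (Kadane) scan is run along
    each diagonal and the best total is kept.
    Time O(len(a) * len(b)), constant extra space.
    """
    best = 0
    n, m = len(a), len(b)
    # A diagonal is identified by its offset d = j - i, from -(n - 1) to m - 1.
    for d in range(-(n - 1), m):
        run = 0
        i = max(0, -d)
        j = i + d
        while i < n and j < m:
            s = match if a[i] == b[j] else mismatch
            # Restart the segment once its running score falls to 0 or below.
            run = max(0, run + s)
            best = max(best, run)
            i += 1
            j += 1
    return best
```

For instance, `msp_score("GATTACA", "TTAC")` finds the exact match of "TTAC" inside "GATTACA" and scores it 4.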

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
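A SAM alignment line is tab-delimited, with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields. The sketch below parses one line and derives two common quantities: FLAG bits (0x4 unmapped, 0x10 reverse strand) and the number of reference bases the CIGAR string spans. It is illustrative only; real pipelines would use SAMtools or a binding such as pysam, and the class and function names here are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)

    def reference_span(self) -> int:
        """Number of reference bases covered by the alignment ('*' gives 0)."""
        return sum(int(n) for n, op in CIGAR_OP.findall(self.cigar)
                   if op in REF_CONSUMING)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one tab-delimited SAM alignment line (not an '@' header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    q, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]
    return SamRecord(q, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, tuple(cols[11:]))
```

For a CIGAR of `5M2I3M1D4M`, the insertion consumes only read bases and the deletion only reference bases, so the reference span is 5 + 3 + 1 + 4 = 13.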

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
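The core BEDTools question, "which features overlap?", rests on BED's 0-based, half-open coordinates: intervals [s1, e1) and [s2, e2) on the same chromosome overlap exactly when s1 < e2 and s2 < e1. The sketch below applies that rule with a naive sorted scan and reports the overlapping portion of each pair. It is an illustration of the idea, not the BEDTools algorithm, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping blank, comment and track lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))  # int() ignores trailing newline
    return intervals


def intersect(a_list: List[Interval], b_list: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every overlapping (a, b) pair.

    b is sorted by (chrom, start), so once a same-chromosome b interval
    starts at or after a's end, no later one on that chromosome can overlap.
    """
    out = []
    b_sorted = sorted(b_list)
    for chrom, a_start, a_end in sorted(a_list):
        for b_chrom, b_start, b_end in b_sorted:
            if b_chrom != chrom:
                continue
            if b_start >= a_end:
                break  # half-open: touching intervals do not overlap
            if a_start < b_end:
                out.append((chrom, max(a_start, b_start), min(a_end, b_end)))
    return out
```

Note that `("chr1", 100, 200)` and `("chr1", 200, 250)` are adjacent but do not overlap under half-open coordinates, which is a frequent source of off-by-one errors when mixing BED with 1-based formats such as GFF.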

Journal Article
Nature, 06 Sep 2012
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
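A VCF data line has eight fixed tab-delimited columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample. INFO holds semicolon-separated `key=value` pairs or bare flags. The sketch below parses one data line (not a `#` header line) into a plain dict; values are left as strings apart from POS and QUAL. It is only an illustration of the layout, not VCFtools' Perl API, and the function name is hypothetical.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse one VCF data line into a dict of fixed fields, INFO and samples."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]

    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True  # flags carry no '='

    record = {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position of the first REF base
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }
    if len(cols) > 9:
        # FORMAT lists the colon-separated keys used by every sample column.
        keys = cols[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record
```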