Journal Article

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: An extensive characterization of probes on the EPIC and HM450 microarrays is presented, including mappability to the latest genome build, genomic copy number of the 3′ nested subsequence and influence of polymorphisms including a previously unrecognized color channel switch for Type I probes.
Abstract: Illumina Infinium DNA Methylation BeadChips represent the most widely used genome-scale DNA methylation assays. Existing strategies for masking Infinium probes overlapping repeats or single nucleotide polymorphisms (SNPs) are based largely on ad hoc assumptions and subjective criteria. In addition, the recently introduced MethylationEPIC (EPIC) array expands on the utility of this platform, but has not yet been well characterized. We present in this paper an extensive characterization of probes on the EPIC and HM450 microarrays, including mappability to the latest genome build, genomic copy number of the 3′ nested subsequence and influence of polymorphisms including a previously unrecognized color channel switch for Type I probes. We show empirical evidence for exclusion criteria for underperforming probes, providing a sounder basis than current ad hoc criteria for exclusion. In addition, we describe novel probe uses, exemplified by the addition of a total of 1052 SNP probes to the existing 59 explicit SNP probes on the EPIC array and the use of these probes to predict ethnicity. Finally, we present an innovative out-of-band color channel application for the dual use of 62 371 probes as internal bisulfite conversion controls.

484 citations

Journal Article
Carolina Roselli, Mark Chaffin, Lu-Chen Weng, +257 more (82 institutions)
TL;DR: This large, multi-ethnic genome-wide association study identifies 97 loci significantly associated with atrial fibrillation that are enriched for genes involved in cardiac development, electrophysiology, structure and contractile function.
Abstract: Atrial fibrillation (AF) affects more than 33 million individuals worldwide and has a complex heritability. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for AF to date, consisting of more than half a million individuals, including 65,446 with AF. In total, we identified 97 loci significantly associated with AF, including 67 that were novel in a combined-ancestry analysis, and 3 that were novel in a European-specific analysis. We sought to identify AF-associated genes at the GWAS loci by performing RNA-sequencing and expression quantitative trait locus analyses in 101 left atrial samples, the most relevant tissue for AF. We also performed transcriptome-wide analyses that identified 57 AF-associated genes, 42 of which overlap with GWAS loci. The identified loci implicate genes enriched within cardiac developmental, electrophysiological, contractile and structural pathways. These results extend our understanding of the biological pathways underlying AF and may facilitate the development of therapeutics for AF.

477 citations


Cites methods from "A global reference for human geneti..."

  • ...was confirmed by assessing principal components in the samples combined with 1000 Genomes European samples.(41) Associations between gene expression and genotypes were tested in a linear regression model with QTLtools v1....


  • ...Studies without available HRC imputation were included based on imputation to the 1000 Genomes Phase 1 integrated v3 panel (March 2012).(41) Participants of the SiGN study were imputed to a combined reference panel consisting of 1000 Genomes phase 1 plus Genome of the Netherlands....


Journal Article
03 Nov 2017-Science
TL;DR: The genome of a female Neandertal from ~50,000 years ago from Vindija Cave, Croatia, is sequenced to ~30-fold genomic coverage, allowing 10 to 20% more Neandertal DNA to be identified in present-day humans, including variants involved in low-density lipoprotein cholesterol concentrations, schizophrenia, and other diseases.
Abstract: To date, the only Neandertal genome that has been sequenced to high quality is from an individual found in Southern Siberia. We sequenced the genome of a female Neandertal from ~50,000 years ago from Vindija Cave, Croatia, to ~30-fold genomic coverage. She carried 1.6 differences per 10,000 base pairs between the two copies of her genome, fewer than present-day humans, suggesting that Neandertal populations were of small size. Our analyses indicate that she was more closely related to the Neandertals that mixed with the ancestors of present-day humans living outside of sub-Saharan Africa than the previously sequenced Neandertal from Siberia, allowing 10 to 20% more Neandertal DNA to be identified in present-day humans, including variants involved in low-density lipoprotein cholesterol concentrations, schizophrenia, and other diseases.

473 citations

Posted Content
22 Mar 2018-bioRxiv
TL;DR: This study demonstrates that, as previously predicted, increasing GWAS sample sizes continues to deliver, by discovery of new loci, increasing prediction accuracy and providing additional data to achieve deeper insight into complex trait biology.
Abstract: Genome-wide association studies (GWAS) stand as powerful experimental designs for identifying DNA variants associated with complex traits and diseases. In the past decade, both the number of such studies and their sample sizes have increased dramatically. Recent GWAS of height and body mass index (BMI) in ~250,000 European participants have led to the discovery of ~700 and ~100 nearly independent SNPs associated with these traits, respectively. Here we combine summary statistics from those two studies with GWAS of height and BMI performed in ~450,000 UK Biobank participants of European ancestry. Overall, our combined GWAS meta-analysis reaches N~700,000 individuals and substantially increases the number of GWAS signals associated with these traits. We identified 3,290 and 716 near-independent SNPs associated with height and BMI, respectively (at a revised genome-wide significance threshold of p

470 citations

Journal Article
TL;DR: Comparative population genomics is on its way to providing a solution to 'Lewontin's paradox' — the discrepancy between the many orders of magnitude of variation in population size and the much narrower distribution of diversity levels.
Abstract: The degree of genetic diversity differs greatly among species and across genomic loci within genomes. The wide ranges in genetic diversity have important implications, including for evolution, conservation and management of wild and domesticated species. In this Review, the authors discuss how genome-scale sequencing strategies are providing insight into the varied determinants of genetic variation both among species and across genomic regions.

470 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
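
As an illustration of the "searching for overlaps between features" task named in the abstract above, here is a minimal Python sketch. It is not BEDTools' implementation: the `Interval` type, the `overlaps` and `intersect` helpers, and the example coordinates are all hypothetical, and the pairwise loop is deliberately naive. The only convention it relies on is that BED intervals are 0-based and half-open.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A BED-style feature (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open interval)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(features: List[Interval],
              annotations: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (feature, annotation) pair that overlaps.

    Naive O(n * m) comparison; real tools use indexed or sorted-sweep
    strategies to scale to large datasets.
    """
    return [(f, g) for f in features for g in annotations if overlaps(f, g)]


# Made-up coordinates, for illustration only.
features = [Interval("chr1", 100, 200), Interval("chr1", 300, 400)]
annotations = [
    Interval("chr1", 150, 250),  # overlaps the first feature
    Interval("chr1", 200, 300),  # only touches both features, no shared base
    Interval("chr2", 100, 200),  # same coordinates, different chromosome
]
pairs = intersect(features, annotations)
```

Because the intervals are half-open, two features that merely touch (one ends at 200, the next starts at 200) are not reported as overlapping.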

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations
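
To make the VCF description in the abstract above concrete, the following Python sketch splits one VCF data line into its eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not part of VCFtools or its Perl API: the `parse_vcf_line` function and the example record are hypothetical, and the sketch ignores header lines and per-sample genotype columns.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of a single VCF data line.

    Hypothetical helper for illustration; header lines (starting with '#')
    and FORMAT/sample columns are not handled.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, Any] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # bare flag -> True

    return {
        "chrom": chrom,
        "pos": int(pos),        # POS is 1-based in VCF
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": flt,
        "info": info_dict,
    }


# A made-up record, for illustration only.
record = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tAF=0.25;DB\n")
```

INFO values are kept as strings here, since their types are declared in the VCF header, which this sketch does not read.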