A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
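
The variant totals quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary name and the rounding are assumptions made for this example, and the counts are the rounded figures from the abstract rather than exact call-set numbers.

```python
# Variant counts in millions, as quoted (rounded) in the abstract.
# The dictionary name is a hypothetical label chosen for this sketch.
counts_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000 variants
}

# Sum of the three classes; the abstract describes this as "over 88 million".
total = sum(counts_millions.values())

# Fraction of all reported variants that are SNPs.
snp_share = counts_millions["SNPs"] / total

print(f"total: {total:.2f} million variants")
print(f"SNP share: {snp_share:.1%}")
```

The three classes sum to about 88.36 million, consistent with the "over 88 million variants" stated in the abstract, and SNPs make up the large majority of that total.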

Citations

Journal Article•DOI•

DNA fragility in the parallel evolution of pelvic reduction in stickleback fish.

Kathleen T. Xie¹, Guliang Wang², Abbey Thompson¹, Julia I. Wucherpfennig¹, Thomas E. Reimchen³, Andrew D. C. MacColl⁴, Dolph Schluter⁵, Michael A. Bell⁶, Karen M. Vasquez², David M. Kingsley¹•Institutions (6)

Stanford University¹, University of Texas at Austin², University of Victoria³, University of Nottingham⁴, University of British Columbia⁵, Stony Brook University⁶

04 Jan 2019-Science

TL;DR: It is shown that a pelvic enhancer of the Pitx1 gene lies within a region of the genome that is prone to double-strand DNA breaks owing to TG-dinucleotide repeats, which could lead to elevated mutation rates that facilitate repeated adaptation to new environments.

Abstract: Evolution generates a remarkable breadth of living forms, but many traits evolve repeatedly, by mechanisms that are still poorly understood. A classic example of repeated evolution is the loss of pelvic hindfins in stickleback fish (Gasterosteus aculeatus). Repeated pelvic loss maps to recurrent deletions of a pelvic enhancer of the Pitx1 gene. Here, we identify molecular features contributing to these recurrent deletions. Pitx1 enhancer sequences form alternative DNA structures in vitro and increase double-strand breaks and deletions in vivo. Enhancer mutability depends on DNA replication direction and is caused by TG-dinucleotide repeats. Modeling shows that elevated mutation rates can influence evolution under demographic conditions relevant for sticklebacks and humans. DNA fragility may thus help explain why the same loci are often used repeatedly during parallel adaptive evolution.

157 citations

Journal Article•DOI•

Genetic predisposition to mosaic Y chromosome loss in blood.

Deborah J. Thompson¹, Giulio Genovese²,³, Jonatan Halvardson⁴, Jacob C. Ulirsch²,³, Daniel J Wright¹,⁵, Chikashi Terao, Olafur B. Davidsson⁶, Felix R. Day¹,⁷, Patrick Sulem⁶, Yunxuan Jiang, Marcus Danielsson⁴, Hanna Davies⁴, Joe Dennis¹, Malcolm G. Dunlop⁸, Douglas F. Easton¹, Victoria A Fisher, Florian Zink⁶, Richard S. Houlston⁹, Martin Ingelsson¹⁰, Siddhartha Kar¹, Nicola D. Kerrison¹, Ben Kinnersley⁹, Ragnar P. Kristjansson⁶, Philip J. Law⁹, Rong Li¹¹, Chey Loveday⁹, Jonas Mattisson⁴, Steven A. McCarroll³,², Yoshinori Murakami¹², Anna Murray¹³, Paweł Olszewski¹⁴, Edyta Rychlicka-Buniowska¹⁴,⁴, Robert A. Scott¹, Unnur Thorsteinsdottir⁶,¹⁵, Ian Tomlinson¹⁶, Behrooz Torabi Moghadam⁴, Clare Turnbull⁹,¹⁷, Nicholas J. Wareham¹, Daniel F. Gudbjartsson¹⁵,⁶, Yoichiro Kamatani¹⁸, Eva Hoffmann¹, Steve P Jackson¹⁵,⁶, Kari Stefansson, Adam Auton¹, Ken K. Ong, Mitchell J. Machiela³,¹⁹, Po-Ru Loh⁴,¹⁴, Jan P. Dumanski, Stephen J. Chanock¹⁰,⁴, Lars Forsberg¹,⁷, John R. B. Perry¹,⁷•Institutions (19)

University of Cambridge¹, Harvard University², Broad Institute³, Science for Life Laboratory⁴, Wellcome Trust Sanger Institute⁵, deCODE genetics⁶, Erasmus University Rotterdam⁷, Western General Hospital⁸, Institute of Cancer Research⁹, Uppsala University¹⁰, Johns Hopkins University School of Medicine¹¹, University of Tokyo¹², University of Exeter¹³, Gdańsk Medical University¹⁴, University of Iceland¹⁵, University of Birmingham¹⁶, Queen Mary University of London¹⁷, University of Copenhagen¹⁸, Brigham and Women's Hospital¹⁹

28 Nov 2019-Nature

TL;DR: A genome-wide association study of mosaic loss of chromosome Y (LOY) in UK Biobank participants identifies 156 genetic determinants of LOY, showing that LOY is associated with cancer and non-haematological health outcomes and supporting the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues.

Abstract: Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism [1–5], yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.

157 citations

Journal Article•DOI•

UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts.

Alex Diaz-Papkovich¹, Luke Anderson-Trocmé¹, Chief Ben-Eghan¹, Simon Gravel¹•Institutions (1)

McGill University¹

01 Nov 2019-PLOS Genetics

TL;DR: Uniform manifold approximation and projection (UMAP), a non-linear dimension reduction tool, is applied to three well-studied genotype datasets, revealing overlooked subpopulations within the American Hispanic population, fine-scale relationships between geography, genotypes, and phenotypes in the UK population, and cryptic structure in the Thousand Genomes Project data.

Abstract: Human populations feature both discrete and continuous patterns of variation. Current analysis approaches struggle to jointly identify these patterns because of modelling assumptions, mathematical constraints, or numerical challenges. Here we apply uniform manifold approximation and projection (UMAP), a non-linear dimension reduction tool, to three well-studied genotype datasets and discover overlooked subpopulations within the American Hispanic population, fine-scale relationships between geography, genotypes, and phenotypes in the UK population, and cryptic structure in the Thousand Genomes Project data. This approach is well-suited to the influx of large and diverse data and opens new lines of inquiry in population-scale datasets.

157 citations

Journal Article•DOI•

Renal compartment–specific genetic variation analyses identify new pathways in chronic kidney disease

Chengxiang Qiu¹, Shizheng Huang¹, Jihwan Park¹, YoSon Park¹, Yi-An Ko¹, Matthew J. Seasock¹, Joshua S. Bryer¹, Xiang Xi Xu², Wen-Chao Song¹, Matthew Palmer³, Jon Hill⁴, Paolo Guarnieri⁴, Julie Hawkins⁴, Carine M. Boustany-Kari⁴, Steven S. Pullen⁴, Christopher D. Brown¹, Katalin Susztak¹•Institutions (4)

University of Pennsylvania¹, University of Miami², Hospital of the University of Pennsylvania³, Boehringer Ingelheim⁴

01 Oct 2018-Nature Medicine

TL;DR: Kidney compartment–specific eQTL analysis goes beyond GWAS to reveal causal genes and pathways involved in renal disease development; reducing Dab2 expression in renal tubules protected mice from CKD.

Abstract: Chronic kidney disease (CKD), a condition in which the kidneys are unable to clear waste products, affects 700 million people globally. Genome-wide association studies (GWASs) have identified sequence variants for CKD; however, the biological basis of these GWAS results remains poorly understood. To address this issue, we created an expression quantitative trait loci (eQTL) atlas for the glomerular and tubular compartments of the human kidney. Through integrating the CKD GWAS with eQTL, single-cell RNA sequencing and regulatory region maps, we identified novel genes for CKD. Putative causal genes were enriched for proximal tubule expression and endolysosomal function, where DAB2, an adaptor protein in the TGF-β pathway, formed a central node. Functional experiments confirmed that reducing Dab2 expression in renal tubules protected mice from CKD. In conclusion, compartment-specific eQTL analysis is an important avenue for the identification of novel genes and cellular pathways involved in CKD development and thus potential new opportunities for its treatment.

157 citations

Journal Article•DOI•

Evidence that RNA Viruses Drove Adaptive Introgression between Neanderthals and Modern Humans.

David Enard¹, Dmitri A. Petrov²•Institutions (2)

University of Arizona¹, Stanford University²

04 Oct 2018-Cell

TL;DR: It is found that long, frequent (and more likely adaptive) segments of Neanderthal ancestry in modern humans are enriched for proteins that interact with viruses (VIPs), and that VIPs that interact specifically with RNA viruses were more likely to belong to introgressed segments in modern Europeans.

156 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
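
The core operation described in this abstract, finding overlaps between genomic features, can be illustrated with a small sketch. This is not BEDTools' implementation (which, per the abstract, is written in C++); the `Interval` type and `intersect` function are hypothetical names introduced here, and the sketch assumes BED-style coordinates with a 0-based start and an exclusive end.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A genomic feature in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlapping region of two features, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    # With half-open coordinates, features that merely touch share no bases.
    return Interval(a.chrom, start, end) if start < end else None


feature = Interval("chr1", 100, 200)
annotation = Interval("chr1", 150, 250)
overlap = intersect(feature, annotation)
print(overlap)
```

A pairwise check like this is enough for a handful of features; comparing large datasets efficiently, as the abstract describes, needs indexing or sorted sweeps that this sketch does not attempt.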

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
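
To make the format description concrete, the sketch below splits one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a minimal illustration and not part of VCFtools, whose API the abstract describes as Perl; the function name `parse_vcf_line` and the sample record are assumptions made for this example, and per-sample genotype columns and header lines are ignored.

```python
# The eight fixed, tab-separated columns of a VCF data line.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> dict:
    """Parse the fixed columns of one VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])       # 1-based reference position
    record["ALT"] = record["ALT"].split(",")  # one or more alternate alleles

    # INFO is a semicolon-separated list of KEY=VALUE pairs and bare flags.
    info = {}
    for item in record["INFO"].split(";"):
        if item == ".":  # "." marks a missing value
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    record["INFO"] = info
    return record


# Illustrative record, not taken from the 1000 Genomes call set.
sample_line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
rec = parse_vcf_line(sample_line)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
```

INFO values are left as strings here because their types are declared in the VCF header, which this sketch does not read.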

10,164 citations