A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
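The headline figure in the abstract ("over 88 million variants") can be checked against the three per-class counts it lists. The following is a minimal sketch in Python that uses only the rounded numbers quoted above; the variable names and the `share` helper are illustrative and do not come from the paper.

```python
# Variant counts as quoted (rounded) in the abstract.
snps = 84_700_000                # single nucleotide polymorphisms
indels = 3_600_000               # short insertions/deletions
structural_variants = 60_000     # structural variants

total = snps + indels + structural_variants


def share(count: int) -> float:
    """Return the percentage of all reported variants that `count` represents."""
    return 100 * count / total


print(f"total variants: {total:,}")  # 88,360,000, i.e. "over 88 million"
for name, count in [
    ("SNPs", snps),
    ("indels", indels),
    ("structural variants", structural_variants),
]:
    print(f"{name}: {share(count):.2f}% of reported variants")
```

Because the inputs are rounded, the percentages are only approximate; the point is that SNPs account for the overwhelming majority of the catalogued variants.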

Citations

Journal Article

Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago.

Carina M. Schlebusch¹,², Helena Malmström¹,², Torsten Günther², Per Sjödin², Alexandra Coutinho², Hanna Edlund², Arielle R. Munters², Mário Vicente², Maryna Steyn³, Himla Soodyall⁴, Marlize Lombard¹,⁵, Mattias Jakobsson¹,²,⁶•Institutions (6)

University of Johannesburg¹, Uppsala University², University of the Witwatersrand³, National Health Laboratory Service⁴, Stellenbosch University⁵, Science for Life Laboratory⁶

03 Nov 2017-Science

TL;DR: The first modern human population divergence time is estimated to be between 350,000 and 260,000 years ago, which increases the deepest divergence among modern humans, coinciding with anatomical developments of archaic humans into modern humans, as represented in the local fossil record.

Abstract: Southern Africa is consistently placed as a potential region for the evolution of Homo sapiens. We present genome sequences, up to 13x coverage, from seven ancient individuals from KwaZulu-Natal, South Africa. The remains of three Stone Age hunter-gatherers (about 2000 years old) were genetically similar to current-day southern San groups, and those of four Iron Age farmers (300 to 500 years old) were genetically similar to present-day Bantu-language speakers. We estimate that all modern-day Khoe-San groups have been influenced by 9 to 30% genetic admixture from East Africans/Eurasians. Using traditional and new approaches, we estimate the first modern human population divergence time to between 350,000 and 260,000 years ago. This estimate increases the deepest divergence among modern humans, coinciding with anatomical developments of archaic humans into modern humans, as represented in the local fossil record.

296 citations

Journal Article

The impact of structural variation on human gene expression

Colby Chiang¹, Alexandra J. Scott¹, Joe R. Davis², Emily K. Tsang², Xin Li², Yungil Kim³, Tarik Hadzic¹, Farhan N. Damani³, Liron Ganel¹, Stephen B. Montgomery², Alexis Battle³, Donald F. Conrad¹, Ira M. Hall¹•Institutions (3)

Washington University in St. Louis¹, Stanford University², Johns Hopkins University³

03 Apr 2017-Nature Genetics

TL;DR: It is estimated that SVs are causal at 3.5–6.8% of eQTLs—a substantially higher fraction than prior estimates—and that expression-altering SVs have larger effect sizes than do SNVs and indels.

Abstract: Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs, a substantially higher fraction than prior estimates, and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.

296 citations

Journal Article

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

Mashaal Sohail¹,², Robert Maier¹,², Andrea Ganna, Alex Bloemendal¹,², Alicia R. Martin¹,², Michael C. Turchin³, Charleston W. K. Chiang⁴, Joel N. Hirschhorn¹,²,⁵, Mark J. Daly, Nick Patterson¹,², Benjamin M. Neale¹,², Iain Mathieson⁶, David Reich¹,², Shamil R. Sunyaev¹,²,⁷•Institutions (7)

Harvard University¹, Broad Institute², Brown University³, University of Southern California⁴, Boston Children's Hospital⁵, University of Pennsylvania⁶, Brigham and Women's Hospital⁷

21 Mar 2019-eLife

TL;DR: It is shown that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification and that population-level differences should be interpreted with caution.

Abstract: Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

294 citations

Journal Article

A Quarter Century of APOE and Alzheimer’s Disease: Progress to Date and the Path Forward

Michael E. Belloy¹, Valerio Napolioni¹, Michael D. Greicius¹•Institutions (1)

Stanford University¹

06 Mar 2019-Neuron

TL;DR: This review ranges across a variety of APOE-related pathologies, touching on evolutionary genetics and risk mitigation by ethnicity and sex, and addresses one of the most fundamental questions pertaining to APOE4 and AD: does APOE4 increase AD risk via a loss or gain of function?

293 citations

Posted Content

Multi-platform discovery of haplotype-resolved structural variation in human genomes

David Porubsky¹, Victor Guryev², Diana C.J. Spierings², Peter M. Lansdorp³,⁴•Institutions (4)

University of Washington¹, University of Groningen², University of California, San Francisco³, Drug Abuse Resistance Education⁴

23 Sep 2017-bioRxiv

TL;DR: A suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms are applied to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner.

Abstract: The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared to previous reports, including from the 1000 Genomes Project. We also discovered 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now be used as a gold standard for the scientific community and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.

292 citations

Cites background from "A global reference for human geneti..."

...Because a substantial fraction of human genetic variation occurs in regions of segmental duplication (Bailey and Eichler 2006), which are often missing from de novo assemblies (Chaisson et al. 2015), we compared the variation detected in regions of segmental duplication through read-depth to the…...

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
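
The VCF abstract above describes a text format that stores each variant together with its annotations. As a rough illustration, the sketch below splits one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not VCFtools and does not use its API; the `VcfRecord` type, the `parse_vcf_line` function and the example line are all hypothetical, and the record values are made up for demonstration. Header lines, genotype columns and compressed or indexed files are out of scope here.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    """The eight fixed columns of a VCF data line (illustrative type)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: List[str]
    qual: str
    filter_status: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":  # "." marks a missing value
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Keys without "=" are flags; store them as True.
            info_dict[key] = value if sep else True

    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","),
                     qual, filter_status, info_dict)


# Made-up record for demonstration only; not taken from any real call set.
example = "1\t12345\trs0000\tA\tG,T\t100\tPASS\tAC=2,1;AN=10;DB"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.alt, record.info)
```

For real data sets a maintained VCF library or tool is the safer choice, since production files are typically compressed, indexed and carry per-sample genotype columns that this sketch ignores.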