A global reference for human genetic variation.

doi:10.1038/NATURE15393

Home
/
Papers
/
A global reference for human genetic variation.

Journal Article•DOI•

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.

read less

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Genomic landscape of human diversity across Madagascar.

[...]

Denis Pierron¹, Margit Heiske¹, Harilanto Razafindrazaka¹, Ignace Rakoto², Nelly Ranaivo Rabetokotany², Bodo Ravololomanga², Lucien M.A. Rakotozafy², Mireille Mialy Rakotomalala², Michel Razafiarivony², Bako Rasoarifetra², Miakabola Andriamampianina Raharijesy², Lolona Razafindralambo², Ramilisonina², Fulgence Fanony², Sendra Lejamble³, Olivier Thomas³, Ahmed Mohamed Abdallah³, Christophe Rocher³, Amal Arachiche³, Laure Tonaso¹, Veronica Pereda-Loth¹, Stéphanie Schiavinato¹, Nicolas Brucato¹, François-Xavier Ricaut¹, Pradiptajati Kusuma⁴, Pradiptajati Kusuma¹, Pradiptajati Kusuma⁵, Herawati Sudoyo⁴, Herawati Sudoyo⁵, Shengyu Ni⁶, Anne Boland, Jean-François Deleuze, Philippe Beaujard, Philippe Grange⁷, Sander Adelaar⁸, Mark Stoneking⁶, Jean-Aimé Rakotoarisoa², Chantal Radimilahy², Thierry Letellier¹ - Show less +35 more•Institutions (8)

University of Toulouse¹, University of Antananarivo², University of Bordeaux³, University of Indonesia⁴, Eijkman Institute for Molecular Biology⁵, Max Planck Society⁶, University of La Rochelle⁷, University of Melbourne⁸

08 Aug 2017-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A common Bantu and Austronesian descent is found for all Malagasy individuals with a limited paternal contribution from Europe and the Middle East, suggesting independent colonization of Madagascar from Africa and Asia rather than settlement by an already admixed population.

...read moreread less

Abstract: Although situated ∼400 km from the east coast of Africa, Madagascar exhibits cultural, linguistic, and genetic traits from both Southeast Asia and Eastern Africa. The settlement history remains contentious; we therefore used a grid-based approach to sample at high resolution the genomic diversity (including maternal lineages, paternal lineages, and genome-wide data) across 257 villages and 2,704 Malagasy individuals. We find a common Bantu and Austronesian descent for all Malagasy individuals with a limited paternal contribution from Europe and the Middle East. Admixture and demographic growth happened recently, suggesting a rapid settlement of Madagascar during the last millennium. However, the distribution of African and Asian ancestry across the island reveals that the admixture was sex biased and happened heterogeneously across Madagascar, suggesting independent colonization of Madagascar from Africa and Asia rather than settlement by an already admixed population. In addition, there are geographic influences on the present genomic diversity, independent of the admixture, showing that a few centuries is sufficient to produce detectable genetic structure in human populations.

...read moreread less

81 citations

Journal Article•DOI•

Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans.

[...]

Kimberly F. McManus¹, Angela M. Taravella², Brenna M. Henn², Carlos Bustamante¹, Martin Sikora¹, Omar E. Cornejo³, Omar E. Cornejo¹ - Show less +3 more•Institutions (3)

Stanford University¹, Stony Brook University², Washington State University³

10 Mar 2017-PLOS Genetics

TL;DR: Se sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human genomes, are used to perform a fine-scale investigation of the evolutionary history of DARC and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ≠Khomani San hunter-gatherers share the same FY*A haplotypes.

...read moreread less

Abstract: The human DARC (Duffy antigen receptor for chemokines) gene encodes a membrane-bound chemokine receptor crucial for the infection of red blood cells by Plasmodium vivax, a major causative agent of malaria. Of the three major allelic classes segregating in human populations, the FY*O allele has been shown to protect against P. vivax infection and is at near fixation in sub-Saharan Africa, while FY*B and FY*A are common in Europe and Asia, respectively. Due to the combination of strong geographic differentiation and association with malaria resistance, DARC is considered a canonical example of positive selection in humans. Despite this, details of the timing and mode of selection at DARC remain poorly understood. Here, we use sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human genomes, to perform a fine-scale investigation of the evolutionary history of DARC. We estimate the time to most recent common ancestor (TMRCA) of the most common FY*O haplotype to be 42 kya (95% CI: 34-49 kya). We infer the FY*O null mutation swept to fixation in Africa from standing variation with very low initial frequency (0.1%) and a selection coefficient of 0.043 (95% CI:0.011-0.18), which is among the strongest estimated in the human genome. We estimate the TMRCA of the FY*A mutation in non-Africans to be 57 kya (95% CI: 48-65 kya) and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ≠Khomani San hunter-gatherers share the same FY*A haplotypes. We test multiple models of admixture that may account for this observation and reject recent Asian or European admixture as the cause.

...read moreread less

81 citations

Journal Article•DOI•

Dense and accurate whole-chromosome haplotyping of individual genomes.

[...]

David Porubsky¹, David Porubsky², Shilpa Garg³, Shilpa Garg¹, Ashley D. Sanders⁴, Ashley D. Sanders⁵, Jan O. Korbel⁴, Victor Guryev², Peter M. Lansdorp⁵, Peter M. Lansdorp⁶, Peter M. Lansdorp², Tobias Marschall³, Tobias Marschall¹ - Show less +9 more•Institutions (6)

Max Planck Society¹, University Medical Center Groningen², Saarland University³, European Bioinformatics Institute⁴, BC Cancer Agency⁵, University of British Columbia⁶

03 Nov 2017-Nature Communications

TL;DR: An integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing is introduced.

...read moreread less

Abstract: The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.

...read moreread less

81 citations

Journal Article•DOI•

Predicting the Landscape of Recombination Using Deep Learning

[...]

Jeffrey R. Adrion¹, Jared Galloway¹, Andrew D. Kern¹•Institutions (1)

University of Oregon¹

01 Jun 2020-Molecular Biology and Evolution

TL;DR: ReLERNN is described, a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility.

...read moreread less

Abstract: Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here, we describe recombination landscape estimation using recurrent neural networks (ReLERNN), a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, although largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.

...read moreread less

80 citations

Journal Article•DOI•

Interindividual variability in response to continuous theta-burst stimulation in healthy adults.

[...]

Ali Jannati¹, Gabrielle Block¹, Lindsay M. Oberman², Alexander Rotenberg³, Alvaro Pascual-Leone¹, Alvaro Pascual-Leone⁴ - Show less +2 more•Institutions (4)

Beth Israel Deaconess Medical Center¹, Brown University², Boston Children's Hospital³, Autonomous University of Barcelona⁴

01 Nov 2017-Clinical Neurophysiology

TL;DR: Data-driven cluster analysis can identify healthy subpopulations with distinct cTBS responses that are influenced by genetics and the roles of brain-derived neurotrophic factor (BDNF) and apolipoprotein E (APOE) polymorphisms in the cT BS response.

...read moreread less

80 citations

Cites methods from "A global reference for human geneti..."

...…tests were used to compare the minor allele frequencies of rs6265 (BDNF), rs429358 (APOE), and rs7412 (APOE) SNPs against the corresponding frequencies in the admixed American (AMR) population in the 1000 Genomes Project (1KGP) (Auton et al., 2015), i.e., 0.1527, 0.1037, and 0.0476, respectively....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
…
167
168
169
170
171
172
173
…
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Basic Local Alignment Search Tool

[...]

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹ - Show less +1 more•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

...read moreread less

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

[...]

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹ - Show less +5 more•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

...read moreread less

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

...read moreread less

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

[...]

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

...read moreread less

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing webbased methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

...read moreread less

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

[...]

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

...read moreread less

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

...read moreread less

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

[...]

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹ - Show less +8 more•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

...read moreread less

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

...read moreread less

10,164 citations