A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Citations

Journal Article•DOI•

Graphtyper enables population-scale genotyping using pangenome graphs

Hannes P. Eggertsson¹,², Hakon Jonsson¹, Snaedis Kristmundsdottir¹,³, Eirikur Hjartarson¹, Birte Kehr¹, Gisli Masson¹, Florian Zink¹, Kristjan E. Hjorleifsson¹, Aslaug Jonasdottir¹, Adalbjorg Jonasdottir¹, Ingileif Jonsdottir¹,⁴, Daniel F. Gudbjartsson¹,², Páll Melsted¹,², Kari Stefansson¹,⁴, Bjarni V. Halldorsson¹,³•Institutions (4)

Amgen¹, University of Iceland², Reykjavík University³, RMIT University⁴

25 Sep 2017-Nature Genetics

TL;DR: Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths.

Abstract: A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.

172 citations

Journal Article•DOI•

Item-level analyses reveal genetic heterogeneity in neuroticism.

Mats Nagel, Kyoko Watanabe¹, Sven Stringer¹, Danielle Posthuma¹, Sophie van der Sluis•Institutions (1)

VU University Amsterdam¹

02 Mar 2018-Nature Communications

TL;DR: Item-level GWAS demonstrate that the items used to measure neuroticism are genetically heterogeneous, and that biological understanding can be gained by studying them in genetically more homogeneous clusters.

Abstract: Genome-wide association studies (GWAS) of psychological traits are generally conducted on (dichotomized) sums of items or symptoms (e.g., case-control status), and not on the individual items or symptoms themselves. We conduct large-scale GWAS on 12 neuroticism items and observe notable and replicable variation in genetic signal between items. Within samples, genetic correlations among the items range between 0.38 and 0.91 (mean rg = .63), indicating genetic heterogeneity in the full item set. Meta-analyzing the two samples, we identify 255 genome-wide significant independent genomic regions, of which 138 are item-specific. Genetic analyses and genetic correlations with 33 external traits support genetic differences between the items. Hierarchical clustering analysis identifies two genetically homogeneous item clusters denoted depressed affect and worry. We conclude that the items used to measure neuroticism are genetically heterogeneous, and that biological understanding can be gained by studying them in genetically more homogeneous clusters.

172 citations

Journal Article•DOI•

Moderate-to-severe asthma in individuals of European ancestry: a genome-wide association study

Nick Shrine¹, Michael A. Portelli², Catherine John¹, María Soler Artigas¹, Neil Bennett¹, Robert J. Hall², Jon Lewis², Amanda P. Henry², Charlotte K. Billington², Azaz Ahmad², Richard Packer¹, Dominick E. Shaw², Zara Pogson³, Andrew M. Fogarty³, Tricia M. McKeever³, Amisha Singapuri¹, Liam G Heaney⁴, Adel H. Mansur⁵, Rekha Chaudhuri⁶, Neil C. Thomson⁶, John W. Holloway⁷, Gabrielle A. Lockett⁷, Peter H. Howarth⁷, Ratko Djukanovic⁷, Jenny Hankinson⁸, Robert Niven⁸, Angela Simpson⁸, Kian Fan Chung⁹, Peter J. Sterk¹⁰, John D Blakey¹¹, Ian M. Adcock⁹, Sile Hu¹², Yike Guo¹², Ma'en Obeidat¹³, Don D. Sin¹³, Maarten van den Berge¹⁴, David C. Nickle¹⁵, Yohan Bossé¹⁶, Martin D. Tobin¹,², Ian P. Hall², Christopher E. Brightling², Louise V. Wain¹,², Ian Sayers²•Institutions (16)

University of Leicester¹, National Institute for Health Research², University of Nottingham³, Queen's University Belfast⁴, University of Birmingham⁵, University of Glasgow⁶, University of Southampton⁷, Manchester Academic Health Science Centre⁸, National Institutes of Health⁹, University of Amsterdam¹⁰, Sir Charles Gairdner Hospital¹¹, Imperial College London¹², University of British Columbia¹³, University Medical Center Groningen¹⁴, Merck & Co.¹⁵, Laval University¹⁶

01 Jan 2019-The Lancet Respiratory Medicine

TL;DR: Substantial shared genetic architecture is found between mild and moderate-to-severe asthma, and candidate causal genes identified at the associated loci provide increased insight into this difficult-to-treat population.

171 citations

Journal Article•DOI•

An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.

Slavé Petrovski¹,², Jamie L. Todd³,⁴, Michael T. Durheim³,⁴, Quanli Wang², Jason W. Chien, Francine L. Kelly⁴, Courtney W. Frankel⁴, Caroline Mebane², Zhong Ren², Joshua Bridgers², Thomas J. Urban⁵, Colin D. Malone², Ashley Finlen Copeland⁴, Christie Brinkley⁴, Andrew S. Allen⁴, Thomas G. O'Riordan, John G. McHutchison, Scott M. Palmer³,⁴, David Goldstein²•Institutions (5)

University of Melbourne¹, Columbia University Medical Center², Durham University³, Duke University⁴, University of North Carolina at Chapel Hill⁵

01 Jul 2017-American Journal of Respiratory and Critical Care Medicine

TL;DR: Whole-exome sequencing data identified TERT, RTEL1, and PARN, three telomere-related genes previously implicated in familial pulmonary fibrosis, as significant contributors to sporadic IPF, supporting the idea that telomere dysfunction is involved in IPF pathogenesis.

Abstract: Rationale: Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. Objectives: The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. Methods: We performed a case–control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. Measurements and Main Results: We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects d...

170 citations

Journal Article•DOI•

Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation.

Julien Jouganous¹, Will Long¹, Aaron P. Ragsdale¹, Simon Gravel¹•Institutions (1)

McGill University¹

01 Jul 2017-Genetics

TL;DR: A tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations is proposed.

Abstract: Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.

168 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations