A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
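
The variant counts quoted in the abstract can be checked with simple arithmetic. The sketch below sums the three variant classes and reports each class's share of the total. It uses only the figures stated above; the variable names are illustrative and do not come from the paper.

```python
# Variant counts as quoted in the abstract; the dictionary name is illustrative.
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(variant_counts.values())

# Share of each class as a percentage of the total, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in variant_counts.items()}

print(f"total variants: {total:,}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The three classes sum to 88.36 million, which is consistent with the abstract's "over 88 million variants".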

Citations

Journal Article•DOI•

Discovery and genotyping of structural variation from long-read haploid genome sequence data

John Huddleston¹, Mark Chaisson¹, Karyn Meltz Steinberg², Wes Warren², Kendra Hoekzema¹, David Gordon¹, Tina A. Graves-Lindsay², Katherine M. Munson¹, Zev N. Kronenberg¹, Laura Vives¹, Paul Peluso³, Matthew Boitano³, Chen-Shin Chin³, Jonas Korlach³, Richard K. Wilson⁴, Evan E. Eichler¹•Institutions (4)

University of Washington¹, Washington University in St. Louis², Pacific Biosciences³, University of Pittsburgh⁴

01 May 2017-Genome Research

TL;DR: Interestingly, when the authors repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, it is found that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV, indicating that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

Abstract: In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

318 citations

Journal Article•DOI•

Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations

Henry R. Kranzler¹,², Hang Zhou¹,³, Rachel L. Kember¹,², Rachel Vickers Smith¹,⁴, Amy C. Justice¹,³, Scott M. Damrauer¹,², Philip S. Tsao⁵,⁶, Derek Klarin⁷, Aris Baras, Jeffrey S. Reid, John D. Overton, Daniel J. Rader², Zhongshan Cheng¹,³, Janet P. Tate¹,³, William C. Becker¹,³, John Concato¹,³, Ke Xu¹,³, Renato Polimanti¹,³, Hongyu Zhao³, Joel Gelernter¹,³•Institutions (7)

Veterans Health Administration¹, University of Pennsylvania², Yale University³, University of Louisville⁴, VA Palo Alto Healthcare System⁵, Stanford University⁶, Harvard University⁷

02 Apr 2019-Nature Communications

TL;DR: It is concluded that, although heavy drinking is a key risk factor for AUD, it is not a sufficient cause of the disorder and a total of 18 associated loci are identified.

Abstract: Alcohol consumption level and alcohol use disorder (AUD) diagnosis are moderately heritable traits. We conduct genome-wide association studies of these traits using longitudinal Alcohol Use Disorder Identification Test-Consumption (AUDIT-C) scores and AUD diagnoses in a multi-ancestry Million Veteran Program sample (N = 274,424). We identify 18 genome-wide significant loci: 5 associated with both traits, 8 associated with AUDIT-C only, and 5 associated with AUD diagnosis only. Polygenic Risk Scores (PRS) for both traits are associated with alcohol-related disorders in two independent samples. Although a significant genetic correlation reflects the overlap between the traits, genetic correlations for 188 non-alcohol-related traits differ significantly for the two traits, as do the phenotypes associated with the traits’ PRS. Cell type group partitioning heritability enrichment analyses also differentiate the two traits. We conclude that, although heavy drinking is a key risk factor for AUD, it is not a sufficient cause of the disorder. The genetic underpinnings of alcohol use disorder and consumption are incompletely understood. Here, the authors perform GWAS for Alcohol Use Disorder (AUD) Identification Test-Consumption scores and AUD diagnosis from electronic health records of 274,424 individuals and identify a total of 18 associated loci.

317 citations

Journal Article•DOI•

Structure, function, and genetics of lipoprotein (a)

Konrad Schmidt¹, Asma Noureen¹, Florian Kronenberg¹, Gerd Utermann¹•Institutions (1)

Innsbruck Medical University¹

13 Apr 2016-Journal of Lipid Research

TL;DR: This review summarizes present knowledge of the structure, function, and genetics of Lp(a) with emphasis on the molecular and population genetics of the Lp(a)/LPA trait, as well as aspects of genetic epidemiology.

317 citations

Journal Article•DOI•

The role of metabolism (and the microbiome) in defining the clinical efficacy of dietary flavonoids

Aedin Cassidy¹, Anne Marie Minihane¹•Institutions (1)

University of East Anglia¹

01 Jan 2017-The American Journal of Clinical Nutrition

TL;DR: This review will focus on the current knowledge for the main subclasses of flavonoids, including anthocyanins, flavonols, flavan-3-ols, and flavanones, for which there is growing evidence from prospective studies of beneficial effects on health.

316 citations

Cites background from "A global reference for human geneti..."

...1000 Genome Consortium, which was published in October 2015 (64), indicated that there are typically 88 million variants in a human genome, and with knowledge that the penetrance of...

Journal Article•DOI•

Reduced signal for polygenic adaptation of height in UK Biobank.

Jeremy J. Berg¹, Arbel Harpak¹,², Nasa Sinnott-Armstrong², Anja Moltke Joergensen³, Hakhamanesh Mostafavi¹, Yair Field², Evan A. Boyle², Xinjun Zhang⁴, Fernando Racimo³, Jonathan K. Pritchard², Graham Coop⁴•Institutions (4)

Columbia University¹, Stanford University², University of Copenhagen³, University of California, Davis⁴

21 Mar 2019-eLife

TL;DR: A new analysis based on the UK Biobank, a large, independent dataset, finds that the signals of selection using UKB effect estimates are strongly attenuated or absent and the conclusion of strong polygenic adaptation now lacks support.

Abstract: Several recent papers have reported strong signals of selection on European polygenic height scores. These analyses used height effect estimates from the GIANT consortium and replication studies. Here, we describe a new analysis based on the UK Biobank (UKB), a large, independent dataset. We find that the signals of selection using UKB effect estimates are strongly attenuated or absent. We also provide evidence that previous analyses were confounded by population stratification. Therefore, the conclusion of strong polygenic adaptation now lacks support. Moreover, these discrepancies highlight (1) that methods for correcting for population stratification in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of differences in polygenic scores between populations should be treated with caution until these issues are better understood. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

314 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as an indexer, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]
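
To make the format concrete: a SAM alignment line carries eleven mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The following is a minimal sketch of reading one such line, not SAMtools' own implementation. The class and function names and the example record are made up for illustration, and optional tag fields are ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields (class name is illustrative)."""
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tag fields after column 11 are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def is_unmapped(rec: SamRecord) -> bool:
    # FLAG bit 0x4 marks an unmapped read.
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec: SamRecord) -> bool:
    # FLAG bit 0x10 marks a read aligned to the reverse strand.
    return bool(rec.flag & 0x10)


# Hypothetical record, invented for this example.
example = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_unmapped(rec), is_reverse_strand(rec))
```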

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
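
The core operation described here, finding overlaps between two sets of genomic features, can be sketched in a few lines. This is a naive pairwise comparison for illustration only and is not how BEDTools is implemented. It assumes BED-style coordinates (0-based start, exclusive end), and all names and intervals are made up.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(features: List[Interval],
              annotations: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (feature, annotation) pair that overlaps.

    Quadratic in the input sizes; fine for a demonstration, too slow for
    genome-scale data.
    """
    return [(f, g) for f in features for g in annotations if overlaps(f, g)]


# Hypothetical intervals, invented for this example.
features = [Interval("chr1", 100, 200), Interval("chr1", 300, 400)]
annotations = [
    Interval("chr1", 150, 250),  # overlaps the first feature
    Interval("chr1", 200, 300),  # only touches both features, so no overlap
    Interval("chr2", 100, 200),  # different chromosome
]

hits = intersect(features, annotations)
print(hits)
```

Because the end coordinate is exclusive, intervals that merely abut (one ends at 200, the next starts at 200) are not reported as overlapping.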

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
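
As an illustration of the format, each VCF data line begins with eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sketch below parses one such line in plain Python. It is not VCFtools code; the function names and the example record are hypothetical, and the SNP test is a simplification that ignores symbolic and missing alleles.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line (genotype columns ignored)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_map = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position on the reference
        "id": var_id,
        "ref": ref,
        "alts": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": filt,
        "info": info_map,
    }


def is_snp(record: dict) -> bool:
    """Simplified check: REF and every ALT allele are a single base.

    Assumes plain nucleotide alleles; symbolic alleles such as <DEL> or a
    missing ALT (".") are not handled here.
    """
    return len(record["ref"]) == 1 and all(len(a) == 1 for a in record["alts"])


# Hypothetical records, invented for this example.
snp = parse_vcf_line("1\t1000\t.\tA\tG\t50\tPASS\tAF=0.25;DB")
indel = parse_vcf_line("1\t2000\t.\tA\tAC\t50\tPASS\tAF=0.10")
print(is_snp(snp), is_snp(indel))
```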

10,164 citations