Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
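The abstract's ">1% frequency" threshold for common variants can be illustrated with a minimal sketch (not project code; the genotype encoding and toy sample are invented for illustration):

```python
# Illustrative sketch: classifying a variant as "common" by allele frequency,
# as in the >1% threshold described in the abstract.

def allele_frequency(genotypes):
    """Frequency of the alternate allele across diploid genotypes.

    genotypes: list of (a1, a2) tuples with alleles coded 0 (ref) or 1 (alt).
    """
    alleles = [a for g in genotypes for a in g]
    return sum(alleles) / len(alleles)

# 2,504 individuals would give 5,008 haplotypes; here a toy sample of 5.
sample = [(0, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
af = allele_frequency(sample)
print(af)         # 0.3
print(af > 0.01)  # True: would count as a common variant
```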
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores.
Abstract: The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
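The simplest method in the comparison, p-value thresholding (the "pT" half of pT+clump), can be sketched as a dosage-weighted sum of GWAS effect sizes over SNPs passing a cutoff. This is an illustrative toy, not any of the tested implementations; the SNP names, effect sizes and threshold are invented:

```python
# Hypothetical sketch of p-value thresholding for a polygenic score:
# score = sum over SNPs with p < threshold of (effect size * allele dosage).

def polygenic_score(dosages, gwas, p_threshold):
    """dosages: {snp: alt-allele count (0/1/2) for one individual}
    gwas: {snp: (beta, pvalue)} summary statistics."""
    return sum(beta * dosages[snp]
               for snp, (beta, p) in gwas.items()
               if p < p_threshold and snp in dosages)

gwas = {"rs1": (0.20, 1e-8), "rs2": (-0.10, 1e-5), "rs3": (0.05, 0.3)}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
print(polygenic_score(dosages, gwas, 1e-6))  # 0.4 (only rs1 passes)
```

The methods compared in the study differ mainly in how they shrink the raw GWAS effect sizes before this summation step.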

73 citations

Journal ArticleDOI
TL;DR: Evidence is provided for a relaxation of recent selective forces acting on this gene, and a revised hypothesis for the origins of the present-day worldwide distribution of TAS2R38 haplotypes is proposed.
Abstract: The ability to taste phenylthiocarbamide (PTC) and 6-n-propylthiouracil (PROP) is a polymorphic trait mediated by the TAS2R38 bitter taste receptor gene. It has long been hypothesized that global genetic diversity at this locus evolved under pervasive pressures from balancing natural selection. However, recent high-resolution population genetic studies of TAS2Rs suggest that demographic events have played a critical role in the evolution of these genes. We here utilized the largest TAS2R38 database yet analyzed, consisting of 5,589 individuals from 105 populations, to examine natural selection, haplotype frequencies and linkage disequilibrium to estimate the effects of both selection and demography on contemporary patterns of variation at this locus. We found signs of an ancient balancing selection acting on this gene but no post Out-Of-Africa departures from neutrality, implying that the current observed patterns of variation can be predominantly explained by demographic, rather than selective events. In addition, we found signatures of ancient selective forces acting on different African TAS2R38 haplotypes. Collectively our results provide evidence for a relaxation of recent selective forces acting on this gene and a revised hypothesis for the origins of the present-day worldwide distribution of TAS2R38 haplotypes.

73 citations

Journal ArticleDOI
Taedong Yun1, Helen Li1, Pi-Chuan Chang1, Michael F. Lin, Andrew Carroll1, Cory Y. McLean1 
TL;DR: In this paper, an open-source cohort-calling method that uses the highly-accurate caller DeepVariant and scalable merging tool GLnexus is introduced, using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios.
Abstract: Motivation Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. Results We introduce an open-source cohort-calling method that uses the highly-accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimized the method across a range of cohort sizes, sequencing methods, and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently-generated GATK Best Practices pipeline. Availability and implementation We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-sourced, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. Supplementary information Supplementary data are available at Bioinformatics online.
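The Mendelian-consistency metric used to tune the pipeline can be illustrated with a toy check (a sketch of the idea, not the paper's implementation): a child's biallelic genotype is consistent if one allele can come from the mother and the other from the father.

```python
# Sketch of a Mendelian-consistency check for a father-mother-child trio.
from itertools import permutations

def mendelian_consistent(child, mother, father):
    """Genotypes are unordered allele pairs, e.g. (0, 1)."""
    return any(a in mother and b in father
               for a, b in permutations(child, 2))

print(mendelian_consistent((0, 1), (0, 0), (1, 1)))  # True
print(mendelian_consistent((1, 1), (0, 0), (1, 1)))  # False: mother has no alt allele
```

Counting violations of this rule across many trio sites gives a reference-free proxy for callset error rates.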

73 citations

Journal ArticleDOI
TL;DR: The estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes.
Abstract: Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ∼120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidron, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes—including A00, the highly divergent basal haplogroup. We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) (95% confidence interval [CI]: 447–806 kya). This is ∼2.1 (95% CI: 1.7–2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.
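The abstract's point estimates can be cross-checked with simple arithmetic: if the Neandertal-modern Y TMRCA is ∼588 kya and that is ∼2.1 times the TMRCA of A00 and other extant lineages, the implied A00 TMRCA is ∼280 kya (point estimates only; the paper reports wide confidence intervals around both figures).

```python
# Back-of-the-envelope check of the abstract's numbers.
neandertal_tmrca_kya = 588
ratio = 2.1
a00_tmrca_kya = neandertal_tmrca_kya / ratio
print(round(a00_tmrca_kya))  # 280
```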

73 citations

Journal ArticleDOI
TL;DR: In this article, Mendelian randomization (MR) was used to evaluate associations between polyunsaturated (PUFA), monounsaturated (MUFA) and saturated fatty acids (SFAs) and colorectal cancer (CRC) risk.

73 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools, as discussed by the authors, implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]
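The tab-delimited layout the abstract refers to can be illustrated with a minimal parser for the eleven mandatory fields of a SAM alignment line (a sketch for illustration, not SAMtools itself; the example read is invented):

```python
# Minimal illustration of the SAM format's tab-delimited alignment records.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ",
              "CIGAR", "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_sam_line(line):
    parts = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    return rec

line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tFFFFFFFF"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])  # chr1 100 8M
```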

45,957 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) files, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
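The core operation the abstract describes, intersecting two sets of genomic intervals, can be sketched in a few lines (a toy illustration, not BEDTools; BED intervals are 0-based, half-open `[start, end)`, and the example intervals are invented):

```python
# Toy version of interval intersection over BED-style records.
def intersect(a, b):
    """Yield (chrom, start, end) overlaps between two BED-style interval lists."""
    for chrom_a, s_a, e_a in a:
        for chrom_b, s_b, e_b in b:
            # Half-open intervals overlap iff each starts before the other ends.
            if chrom_a == chrom_b and s_a < e_b and s_b < e_a:
                yield (chrom_a, max(s_a, s_b), min(e_a, e_b))

peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
genes = [("chr1", 150, 550)]
print(list(intersect(peaks, genes)))
# [('chr1', 150, 200), ('chr1', 500, 550)]
```

Real implementations avoid this quadratic scan with sorted sweeps or interval trees, which is what makes them practical on sequencing-scale data.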

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
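The fixed-field layout the abstract describes can be illustrated with a minimal parser for one VCF data line (a sketch for illustration, not VCFtools or its Perl API; the example record is invented):

```python
# Minimal illustration of the eight fixed fields of a VCF data line,
# including the semicolon-separated key=value encoding of the INFO column.
VCF_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_vcf_line(line):
    rec = dict(zip(VCF_FIELDS, line.rstrip("\n").split("\t")[:8]))
    rec["POS"] = int(rec["POS"])
    rec["INFO"] = dict(kv.split("=", 1) if "=" in kv else (kv, True)
                       for kv in rec["INFO"].split(";"))
    return rec

line = "chr1\t12345\trs100\tA\tG\t50\tPASS\tAF=0.01;DB"
rec = parse_vcf_line(line)
print(rec["POS"], rec["ALT"], rec["INFO"]["AF"])  # 12345 G 0.01
```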

10,164 citations