A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Citations

Journal Article•DOI•

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

James J. Lee¹, Robbee Wedow², Aysu Okbay³, Edward Kong⁴, Omeed Maghzian⁴, Meghan Zacher⁴, Tuan Anh Nguyen-Viet⁵, Peter Bowers⁴, Julia Sidorenko⁶,⁷, Richard Karlsson Linnér⁸,³, Mark Alan Fontana⁹,⁵, Tushar Kundu⁵, Chanwook Lee⁴, Hui Li⁴, Ruoxi Li⁵, Rebecca Royer⁵, Pascal Timshel¹⁰,¹¹, Raymond K. Walters⁴,¹², Emily A. Willoughby¹, Loic Yengo⁶, Maris Alver⁷, Yanchun Bao¹³, David W. Clark¹⁴, Felix R. Day¹⁵, Nicholas A. Furlotte, Peter K. Joshi¹⁶,¹⁴, Kathryn E. Kemper⁶, Aaron Kleinman, Claudia Langenberg¹⁵, Reedik Mägi⁷, Joey W. Trampush⁵, Shefali S. Verma¹⁷, Yang Wu⁶, Max Lam, Jing Hua Zhao¹⁵, Zhili Zheng⁶,¹⁸, Jason D. Boardman², Harry Campbell¹⁴, Jeremy Freese¹⁹, Kathleen Mullan Harris²⁰, Caroline Hayward¹⁴, Pamela Herd¹³,²¹, Meena Kumari¹³, Todd Lencz²²,²³, Jian'an Luan¹⁵, Anil K. Malhotra²²,²³, Andres Metspalu⁷, Lili Milani⁷, Ken K. Ong¹⁵, John R. B. Perry¹⁵, David J. Porteous¹⁴, Marylyn D. Ritchie¹⁷, Melissa C. Smart¹⁴, Blair H. Smith²⁴, Joyce Y. Tung, Nicholas J. Wareham¹⁵, James F. Wilson¹⁴, Jonathan P. Beauchamp²⁵, Dalton Conley²⁶, Tõnu Esko⁷, Steven F. Lehrer²⁷,²⁸,²⁹, Patrik K. E. Magnusson³⁰, Sven Oskarsson³¹, Tune H. Pers¹¹,¹⁰, Matthew R. Robinson⁶,³², Kevin Thom³³, Chelsea Watson⁵, Christopher F. Chabris¹⁷, Michelle N. Meyer¹⁷, David Laibson⁴, Jian Yang⁶, Magnus Johannesson³⁴, Philipp Koellinger³,⁸, Patrick Turley⁴,¹², Peter M. Visscher⁶, Daniel J. Benjamin⁵,²⁸, David Cesarini³³,²⁸•Institutions (34)

University of Minnesota¹, University of Colorado Boulder², VU University Amsterdam³, Harvard University⁴, University of Southern California⁵, University of Queensland⁶, University of Tartu⁷, Erasmus University Rotterdam⁸, Hospital for Special Surgery⁹, Statens Serum Institut¹⁰, University of Copenhagen¹¹, Broad Institute¹², University of Essex¹³, University of Edinburgh¹⁴, University of Cambridge¹⁵, University Hospital of Lausanne¹⁶, Geisinger Health System¹⁷, Wenzhou Medical College¹⁸, Stanford University¹⁹, University of North Carolina at Chapel Hill²⁰, University of Wisconsin-Madison²¹, Hofstra University²², The Feinstein Institute for Medical Research²³, University of Dundee²⁴, University of Toronto²⁵, Princeton University²⁶, Queen's University²⁷, National Bureau of Economic Research²⁸, New York University Shanghai²⁹, Karolinska Institutet³⁰, Uppsala University³¹, University of Lausanne³², New York University³³, Stockholm School of Economics³⁴

23 Jul 2018-Nature Genetics

TL;DR: A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance, which substantially increases the utility of polygenic scores as tools in research.

Abstract: Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11-13% of the variance in educational attainment and 7-10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.

1,658 citations

Journal Article•DOI•

Pan-cancer analysis of whole genomes

Peter J. Campbell¹, Gad Getz², Jan O. Korbel³, Joshua M. Stuart⁴ +1329 more•Institutions (238)

06 Feb 2020-Nature

TL;DR: The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.

Abstract: Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1,2,3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10,11,12,13,14,15,16,17,18].

1,600 citations

Posted Content•DOI•

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek¹, Konrad J. Karczewski¹, Eric Vallabh Minikel¹, Kaitlin E. Samocha¹, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria¹, James S. Ware², Andrew J. Hill¹, Beryl B. Cummings¹, Taru Tukiainen¹, Daniel P. Birnbaum¹, Jack A. Kosmicki¹, Laramie E. Duncan¹, Karol Estrada¹, Fengmei Zhao¹, James Zou², Emma Pierce-Hoffman¹, David Neil Cooper³, Mark A. DePristo², Ron Do⁴, Jason Flannick², Menachem Fromer¹, Laura D. Gauthier², Jackie Goldstein¹, Namrata Gupta², Daniel P. Howrigan¹, Adam Kiezun², Mitja I. Kurki², Ami Levy Moonshine², Pradeep Natarajan², Lorena Orozco, Gina M. Peloso², Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Douglas M. Ruderfer⁴, Khalid Shakir², Peter D. Stenson³, Christine Stevens², Brett Thomas¹, Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won², Dongmei Yu², David Altshuler², Diego Ardissino, Michael Boehnke⁵, John Danesh⁶, Roberto Elosua, Jose C. Florez², Stacey Gabriel², Gad Getz², Christina M. Hultman⁷, Sekar Kathiresan², Markku Laakso⁸, Steven A. McCarroll², Mark I. McCarthy⁹, Dermot P.B. McGovern¹⁰, Ruth McPherson¹¹, Benjamin M. Neale¹, Aarno Palotie¹², Shaun Purcell⁴, Danish Saleheen¹³, Jeremiah M. Scharf², Pamela Sklar⁴, Patrick F. Sullivan¹⁴, Jaakko Tuomilehto¹², Hugh Watkins⁹, James G. Wilson¹⁵, Mark J. Daly¹, Daniel G. MacArthur¹•Institutions (15)

Harvard University¹, Broad Institute², Cardiff University³, Icahn School of Medicine at Mount Sinai⁴, University of Michigan⁵, University of Cambridge⁶, Karolinska Institutet⁷, University of Eastern Finland⁸, University of Oxford⁹, Cedars-Sinai Medical Center¹⁰, University of Ottawa¹¹, University of Helsinki¹², University of Pennsylvania¹³, University of North Carolina at Chapel Hill¹⁴, University of Mississippi Medical Center¹⁵

30 Oct 2015-bioRxiv

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.

1,552 citations

Journal Article•DOI•

Phased diploid genome assembly with single-molecule real-time sequencing

Chen-Shan Chin¹, Paul Peluso¹, Fritz J. Sedlazeck², Maria Nattestad³, Gregory T. Concepcion¹, Alicia Clum⁴, Christopher Dunn¹, Ronan C. O'Malley⁵, Rosa Figueroa-Balderas⁶, Abraham Morales-Cruz⁶, Grant R. Cramer⁷, Massimo Delledonne⁸, Chongyuan Luo⁵, Joseph R. Ecker⁵, Dario Cantu⁶, David R. Rank¹, Michael C. Schatz²,³•Institutions (8)

Pacific Biosciences¹, Johns Hopkins University², Cold Spring Harbor Laboratory³, Joint Genome Institute⁴, Salk Institute for Biological Studies⁵, University of California, Davis⁶, University of Nevada, Reno⁷, University of Verona⁸

01 Dec 2016-Nature Methods

TL;DR: The open-source FALCON and FALCON-Unzip algorithms are introduced to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes.

Abstract: While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

1,490 citations

Journal Article•DOI•

Clinical use of current polygenic risk scores may exacerbate health disparities.

Alicia R. Martin¹, Masahiro Kanai, Yoichiro Kamatani², Yukinori Okada³, Benjamin M. Neale⁴,¹, Mark J. Daly•Institutions (4)

Harvard University¹, Kyoto University², Osaka University³, Broad Institute⁴

29 Mar 2019-Nature Genetics

TL;DR: To realize the full and equitable potential of polygenic risk scores, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.

Abstract: Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation of PRS is that those available today are several times more accurate in individuals of European ancestry than other ancestries. This disparity is an inescapable consequence of Eurocentric biases in genome-wide association studies, thus highlighting that (unlike clinical biomarkers and prescription drugs, which may individually work better in some populations but do not ubiquitously perform far better in European populations) clinical uses of PRS today would systematically afford greater improvement for European-descent populations. Early diversifying efforts show promise in leveling this vast imbalance, even when non-European sample sizes are considerably smaller than the largest studies to date. To realize the full and equitable potential of PRS, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.

1,472 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: BEDTools is a software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. It allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
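The overlap search that the abstract describes can be illustrated with a short sketch. This is not BEDTools' implementation, only a minimal Python example of the underlying task, assuming BED-style coordinates (0-based start, exclusive end); the names `Interval` and `find_overlaps` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates (assumed).
Interval = Tuple[str, int, int]


def find_overlaps(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every pair (x, y), x from a and y from b, sharing at least one base."""
    # Group the second feature set by chromosome and sort each group by start.
    by_chrom: Dict[str, List[Interval]] = {}
    for iv in b:
        by_chrom.setdefault(iv[0], []).append(iv)
    for ivs in by_chrom.values():
        ivs.sort(key=lambda iv: iv[1])

    hits: List[Tuple[Interval, Interval]] = []
    for x in a:
        for y in by_chrom.get(x[0], []):
            if y[1] >= x[2]:
                break  # sorted by start, so no later y can overlap x
            if y[2] > x[1]:
                hits.append((x, y))
    return hits


peaks = [("chr1", 100, 200), ("chr2", 50, 60)]
genes = [("chr1", 150, 250), ("chr1", 200, 300), ("chr1", 10, 100), ("chr2", 55, 56)]
for peak, gene in find_overlaps(peaks, genes):
    print(peak, "overlaps", gene)
```

Adjacent features such as `("chr1", 10, 100)` and `("chr1", 100, 200)` do not count as overlapping under the half-open convention. The scan is linear per query, which is adequate for a demonstration; tools built for large datasets rely on indexing schemes that this sketch does not attempt.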

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
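To make the format concrete, here is a minimal Python sketch that reads one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and roughly labels the variant type. It does not use VCFtools or its Perl API; `VariantRecord`, `parse_vcf_line` and `classify` are hypothetical names, the sample lines are illustrative rather than real 1000 Genomes records, and the classification rule is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VariantRecord:
    chrom: str
    pos: int              # 1-based position on the reference
    ref: str
    alts: List[str]       # ALT may list several comma-separated alleles
    info: Dict[str, str]  # INFO key/value pairs; flag keys map to ""


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse the fixed columns of a single VCF data line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return VariantRecord(chrom, int(pos), ref, alt.split(","), info_dict)


def classify(record: VariantRecord) -> str:
    """Rough label; a simplification, not how any particular tool decides."""
    if any(a.startswith("<") for a in record.alts):
        return "symbolic"  # e.g. <DEL>, commonly used for structural variants
    if len(record.ref) == 1 and all(len(a) == 1 for a in record.alts):
        return "SNP"
    if any(len(a) != len(record.ref) for a in record.alts):
        return "indel"
    return "other"


rec = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB")
print(rec.chrom, rec.pos, classify(rec), rec.info["AF"])
```

Real files are usually compressed and indexed, as the abstract notes, and carry header lines plus per-sample genotype columns; an established VCF library is the better choice for anything beyond a quick look.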

10,164 citations