A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ + 514 more authors (90 institutions)

01 Oct 2015, Nature (Nature Publishing Group), Vol. 526, Issue 7571, pp. 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
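The abstract describes variants phased onto haplotypes and reports coverage for variants with a frequency above 1%. As a minimal sketch on toy data (not project data), the Python below splits VCF-style phased genotypes such as `0|1` into their two haplotype alleles and computes the alternate-allele frequency used for a ">1%" cut-off. The function names, the threshold constant and the sample genotypes are hypothetical, and missing calls are assumed absent.

```python
# Minimal sketch: alternate-allele frequency from phased VCF-style genotypes.
# Assumptions (hypothetical): integer allele codes, "|" as the phase separator,
# and no missing (".") alleles. Toy data only.

COMMON_THRESHOLD = 0.01  # the ">1%" frequency cut-off quoted in the abstract


def split_phased(gt: str) -> tuple[int, int]:
    """Return the two haplotype alleles of a phased genotype such as '0|1'."""
    first, second = gt.split("|")
    return int(first), int(second)


def alt_allele_frequency(genotypes: list[str]) -> float:
    """Fraction of haplotypes carrying any non-reference allele (code != 0)."""
    haplotypes = [allele for gt in genotypes for allele in split_phased(gt)]
    return sum(allele != 0 for allele in haplotypes) / len(haplotypes)


def is_common(genotypes: list[str]) -> bool:
    """True when the alternate-allele frequency exceeds the 1% threshold."""
    return alt_allele_frequency(genotypes) > COMMON_THRESHOLD


if __name__ == "__main__":
    # 50 hypothetical individuals = 100 haplotypes; 4 + 2 = 6 carry the alt allele.
    cohort = ["0|0"] * 45 + ["0|1"] * 4 + ["1|1"]
    print(alt_allele_frequency(cohort))
    print(is_common(cohort))
```

Counting per haplotype rather than per individual is what makes a homozygous `1|1` call contribute two alternate alleles.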

Citations

Journal Article

The ExAC browser: displaying reference data information from over 60 000 exomes

Konrad J. Karczewski¹, Ben Weisburd¹, Brett Thomas¹˒², Matthew Solomonson¹˒², Douglas M. Ruderfer³, David H. Kavanagh³, Tymor Hamamsy³, Monkol Lek¹˒², Kaitlin E. Samocha¹˒², Beryl B. Cummings¹˒², Daniel P. Birnbaum¹˒², Mark J. Daly¹˒², Daniel G. MacArthur¹˒²

Harvard University¹, Broad Institute², Icahn School of Medicine at Mount Sinai³

04 Jan 2017, Nucleic Acids Research

TL;DR: The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications, and provides a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant.

Abstract: Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 individuals in the Exome Aggregation Consortium (ExAC). The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, we provide a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. This browser is open-source, freely available at http://exac.broadinstitute.org, and has already been used extensively by clinical laboratories worldwide.
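A variant display that shows population frequency typically derives it from allele counts. The sketch below assumes per-population alternate-allele counts (AC) and allele numbers (AN) for a single variant, with invented population labels and numbers, and computes per-population frequencies plus a pooled frequency. It illustrates the arithmetic only and is not the ExAC browser's code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-population counts for ONE variant: population -> (AC, AN),
# where AC = alternate-allele count and AN = number of called alleles.
# Labels and numbers are invented for illustration; they are not ExAC data.
COUNTS: Dict[str, Tuple[int, int]] = {
    "pop_A": (12, 8000),
    "pop_B": (0, 2000),
    "pop_C": (30, 6000),
}


def population_frequencies(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Optional[float]]:
    """Allele frequency AC/AN per population; None when no alleles were called."""
    return {pop: (ac / an if an else None) for pop, (ac, an) in counts.items()}


def pooled_frequency(counts: Dict[str, Tuple[int, int]]) -> Optional[float]:
    """Overall frequency as total AC / total AN (not the mean of per-population values)."""
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    return total_ac / total_an if total_an else None


if __name__ == "__main__":
    for pop, freq in population_frequencies(COUNTS).items():
        print(pop, freq)
    print("pooled", pooled_frequency(COUNTS))
```

Pooling counts before dividing weights each population by how many alleles were actually called, which matters when coverage differs between groups.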

518 citations

Cites background from "A global reference for human geneti..."

...Recently, large reference datasets, such as those from the 1000 Genomes Project Consortium (1), Exome Sequencing Project (ESP) (2) and Exome Aggregation Consortium (ExAC) (3), have become publicly available for the benefit of the biomedical community....

Journal Article

Integrated Proteogenomic Characterization of HBV-Related Hepatocellular Carcinoma

Qiang Gao¹, Hongwen Zhu², Liangqing Dong¹, Weiwei Shi, Ran Chen², Zhijian Song, Chen Huang³, Junqiang Li, Xiaowei Dong, Yanting Zhou², Qian Liu², Lijie Ma¹, Xiaoying Wang¹, Jian Zhou¹, Yansheng Liu⁴, Emily S. Boja, Ana I. Robles, Weiping Ma⁵, Pei Wang⁵, Yize Li⁶, Li Ding⁶, Bo Wen³, Bing Zhang³, Henry Rodriguez, Daming Gao², Hu Zhou², Jia Fan¹

Fudan University¹, Chinese Academy of Sciences², Baylor College of Medicine³, Yale University⁴, Icahn School of Medicine at Mount Sinai⁵, Washington University in St. Louis⁶

03 Oct 2019, Cell

TL;DR: The first proteogenomic characterization of hepatitis B virus-related hepatocellular carcinoma using paired tumor and adjacent liver tissues from 159 patients provides a valuable resource that significantly expands the knowledge of HBV-related HCC and may eventually benefit clinical practice.

509 citations

Journal Article

Telomere-to-telomere assembly of a complete human X chromosome

Karen H. Miga¹, Sergey Koren², Arang Rhie², Mitchell R. Vollger³, Ariel Gershman⁴, Andrey Bzikadze⁵, Shelise Brooks², Edmund Howe⁶, David Porubsky³, Glennis A. Logsdon³, Valerie A. Schneider², Tamara A. Potapova⁶, Jonathan Wood⁷, William Chow⁷, Joel Armstrong¹, Jeanne Fredrickson³, Evgenia Pak², Kristof Tigyi¹, Milinn Kremitzki⁸, Christopher Markovic⁸, Valerie Maduro², Amalia Dutra², Gerard G. Bouffard², Alexander M. Chang², Nancy F. Hansen², Amy B. Wilfert³, Françoise Thibaud-Nissen², Anthony D. Schmitt, Jon Matthew Belton, Siddarth Selvaraj, Megan Y. Dennis⁹, Daniela C. Soto⁹, Ruta Sahasrabudhe⁹, Gulhan Kaya⁹, Josh Quick¹⁰, Nicholas J. Loman¹⁰, Nadine Holmes¹¹, Matthew Loose¹¹, Urvashi Surti¹², Rosa Ana Risques³, Tina A. Graves Lindsay⁸, Robert S. Fulton⁸, Ira M. Hall⁸, Benedict Paten¹, Kerstin Howe⁷, Winston Timp⁴, Alice Young², James C. Mullikin², Pavel A. Pevzner⁵, Jennifer L. Gerton⁶, Beth A. Sullivan¹³, Evan E. Eichler³, Adam M. Phillippy²

University of California, Santa Cruz¹, National Institutes of Health², University of Washington³, Johns Hopkins University⁴, University of California, San Diego⁵, Stowers Institute for Medical Research⁶, Wellcome Trust Sanger Institute⁷, Washington University in St. Louis⁸, University of California, Davis⁹, University of Birmingham¹⁰, University of Nottingham¹¹, University of Pittsburgh¹², Duke University¹³

03 Sep 2020, Nature

TL;DR: High-coverage, ultra-long-read nanopore sequencing is used to create a new human genome assembly that improves on the coverage and accuracy of the current reference (GRCh38) and includes the gap-free, telomere-to-telomere sequence of the X chromosome.

Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist (1, 2). Here we present a human genome assembly that surpasses the continuity of GRCh38 (2), along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome (3), we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

502 citations

Journal Article

Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression.

Benjamin J. Schmiedel¹, Divya Singh¹, Ariel Madrigal¹, Alan G. Valdovino-Gonzalez¹, Brandie White¹, Jose Zapardiel-Gonzalo¹, Brendan Ha¹, Gökmen Altay¹, Jason A. Greenbaum¹, Graham McVicker², Grégory Seumois¹, Anjana Rao¹, Mitchell Kronenberg¹, Bjoern Peters¹, Pandurangan Vijayanand¹˒³˒⁴

La Jolla Institute for Allergy and Immunology¹, Salk Institute for Biological Studies², University of California, San Diego³, University of Southampton⁴

29 Nov 2018, Cell

TL;DR: The DICE project identified cis-eQTLs for a total of 12,254 unique genes, which represent 61% of all protein-coding genes expressed in these cell types, and found that biological sex is associated with major differences in immune cell gene expression in a highly cell-specific manner.

499 citations

Cites background from "A global reference for human geneti..."

...Genomic surveys of individuals from multiple populations have revealed significant genetic heterogeneity, with over 80 million autosomal single nucleotide polymorphisms (SNPs), including 8 million common variants (Auton et al., 2015)....

Journal Article

Multiscale Analysis of Independent Alzheimer's Cohorts Finds Disruption of Molecular, Genetic, and Clinical Networks by Human Herpesvirus.

Benjamin Readhead, Jean-Vianney Haure-Mirande¹, Cory C. Funk², Matthew A. Richards², Paul Shannon², Vahram Haroutunian¹˒³, Mary Sano¹˒³, Winnie S. Liang⁴, Noam D. Beckmann¹, Nathan D. Price², Eric M. Reiman, Eric E. Schadt¹, Michelle E. Ehrlich, Sam Gandy, Joel T. Dudley

Icahn School of Medicine at Mount Sinai¹, Institute for Systems Biology², Veterans Health Administration³, Translational Genomics Research Institute⁴

11 Jul 2018, Neuron

TL;DR: This study constructs multiscale networks of the late-onset AD-associated virome, elucidates networks linking molecular, clinical, and neuropathological features with viral activity, and indicates that viral activity constitutes a general feature of AD.

495 citations

Cites methods from "A global reference for human geneti..."

...Common variants were imputed using IMPUTE2 (Howie et al., 2009, 2011) using 1000 Genomes Phase 3 reference genotypes (1000 Genomes Project Consortium et al., 2015)....

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990, Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
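The MSP score is the highest score of any pair of equal-length, ungapped segments taken from two sequences. BLAST approximates it heuristically by extending short word hits; the sketch below instead computes it exhaustively, scanning every diagonal with a Kadane-style running sum. The +1/-1 match/mismatch scores are an assumption chosen for readability, not BLAST's scoring scheme, and the code is not BLAST's algorithm.

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Exhaustive maximal segment pair (MSP) score of sequences a and b.

    An MSP is the highest-scoring pair of equal-length, ungapped segments.
    Each diagonal (a fixed offset between positions in a and b) is scanned
    with a running sum that restarts whenever it would drop below zero.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)  # start position in a
        j = i - offset      # aligned start position in b
        running = 0
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            running = max(running, 0)  # Kadane: discard a negative-scoring prefix
            best = max(best, running)
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    # "ACGTA" (and "TACGT") occur in both sequences without gaps.
    print(msp_score("ACGTACGT", "TTACGTAA"))  # 5
```

This exhaustive scan costs time proportional to the product of the sequence lengths, which is the cost that BLAST's heuristic is designed to avoid on database-scale searches.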

88,255 citations

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009, Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
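A SAM alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), optionally followed by TAG:TYPE:VALUE fields. The minimal sketch below parses one line, tests two FLAG bits (0x4 unmapped, 0x10 reverse strand) and sums the reference-consuming CIGAR operations (M, D, N, =, X) to get the aligned reference span. The read is hypothetical and the helper names are illustrative; for real data, SAMtools itself is the safer choice.

```python
import re
from typing import NamedTuple

# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list[str]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited SAM alignment line into its mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment ('*' means none)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)


if __name__ == "__main__":
    # Hypothetical reverse-strand read: 5M2I3M1D4M uses 14 read bases, 13 reference bases.
    line = "\t".join(["read1", "16", "chr1", "100", "60", "5M2I3M1D4M", "*", "0", "0",
                      "ACGTACGTACGTAC", "I" * 14, "NM:i:3"])
    rec = parse_sam_line(line)
    end = rec.pos + reference_span(rec.cigar) - 1  # 1-based inclusive end
    print(rec.rname, rec.pos, end, is_reverse(rec), is_unmapped(rec))
```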

45,957 citations

Journal Article

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹

University of Virginia¹

15 Mar 2010, Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
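BED intervals are 0-based and half-open, so two features on the same chromosome overlap when each one starts before the other ends. The sketch below parses the first three BED columns and reports the overlapping portion of each pair, similar in spirit to `bedtools intersect`. It is a deliberately quadratic illustration with hypothetical helper names and toy coordinates, not BEDTools' implementation.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def parse_bed(lines: Iterable[str]) -> list[Interval]:
    """Read the first three columns of BED lines, skipping headers and blanks."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Overlapping portion of every (a, b) pair on the same chromosome.

    Half-open intervals overlap iff each starts before the other ends, so
    intervals that merely touch (end == start) are not reported.
    """
    hits = []
    for x in a:
        for y in b:
            if x.chrom == y.chrom and x.start < y.end and y.start < x.end:
                hits.append(Interval(x.chrom, max(x.start, y.start), min(x.end, y.end)))
    return hits


if __name__ == "__main__":
    genes = parse_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t0\t50"])
    peaks = parse_bed(["chr1\t150\t250", "chr1\t600\t700", "chr2\t10\t20"])
    for hit in intersect(genes, peaks):
        print(hit.chrom, hit.start, hit.end, sep="\t")
```

In the usage example, `chr1:500-600` and `chr1:600-700` share only a boundary, so no overlap is reported for that pair.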

18,858 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012, Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011, Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
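A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample. The sketch below parses a single uncompressed data line into a dict: INFO entries become key/value pairs (flags without `=` map to `True`) and each sample's colon-separated values are keyed by the FORMAT fields. The record and helper names are hypothetical; compressed, indexed files would normally be handled with existing tools such as VCFtools.

```python
from typing import Any, Dict


def parse_info(info: str) -> Dict[str, Any]:
    """INFO column -> dict; flag entries without '=' map to True."""
    if info == ".":
        return {}
    parsed: Dict[str, Any] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse one uncompressed VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    record: Dict[str, Any] = {
        "chrom": cols[0],
        "pos": int(cols[1]),  # 1-based position of the first REF base
        "id": cols[2],
        "ref": cols[3],
        "alt": cols[4].split(","),
        "qual": None if cols[5] == "." else float(cols[5]),
        "filter": cols[6],
        "info": parse_info(cols[7]),
        "samples": [],
    }
    if len(cols) > 9:
        keys = cols[8].split(":")  # FORMAT column, e.g. GT:GQ:DP
        # zip() stops at the shorter list, so trailing sample fields may be omitted.
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record


if __name__ == "__main__":
    line = "\t".join([
        "20", "14370", "rs6054257", "G", "A", "29", "PASS",
        "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
        "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
    ])
    rec = parse_vcf_line(line)
    print(rec["pos"], rec["alt"], rec["info"]["AF"], [s["GT"] for s in rec["samples"]])
```

INFO and sample values are left as strings here; typing them correctly would require reading the `##INFO` and `##FORMAT` header lines.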

10,164 citations