A global reference for human genetic variation.

doi:10.1038/NATURE15393


Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
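
The totals quoted in the abstract can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three rounded figures given above; the dictionary keys and the `share` helper are illustrative names and do not come from the paper.

```python
# Variant counts as reported (rounded) in the abstract.
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(variant_counts.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in variant_counts.items():
    print(f"{label}: {count:,} ({share(count, total):.2f}% of total)")
print(f"total: {total:,}")
```

The sum of the rounded figures is consistent with the abstract's "over 88 million variants"; SNPs make up the large majority of the call set.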

Citations

Journal Article•DOI•

Genetic disease risks can be misestimated across global populations

Michelle S Kim¹, Kane P Patel¹, Andrew K Teng¹, Ali J. Berens¹, Joseph Lachance¹•Institutions (1)

Georgia Institute of Technology¹

14 Nov 2018-Genome Biology

TL;DR: It is found that risk allele frequencies at known disease loci are significantly different for African populations compared to other continents and that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population.

Abstract: Accurate assessment of health disparities requires unbiased knowledge of genetic risks in different populations. Unfortunately, most genome-wide association studies use genotyping arrays and European samples. Here, we integrate whole genome sequence data from global populations, results from thousands of genome-wide association studies (GWAS), and extensive computer simulations to identify how genetic disease risks can be misestimated. In contrast to null expectations, we find that risk allele frequencies at known disease loci are significantly different for African populations compared to other continents. Strikingly, ancestral risk alleles are found at 9.51% higher frequency in Africa, and derived risk alleles are found at 5.40% lower frequency in Africa. By simulating GWAS with different study populations, we find that non-African cohorts yield disease associations that have biased allele frequencies and that African cohorts yield disease associations that are relatively free of bias. We also find empirical evidence that genotyping arrays and SNP ascertainment bias contribute to continental differences in risk allele frequencies. Because of these causes, polygenic risk scores can be grossly misestimated for individuals of African descent. Importantly, continental differences in risk allele frequencies are only moderately reduced if GWAS use whole genome sequences and hundreds of thousands of cases and controls. Finally, comparisons between uncorrected and corrected genetic risk scores reveal the benefits of considering whether risk alleles are ancestral or derived. Our results imply that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population.
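
To make the link between risk allele frequencies and polygenic risk scores concrete, here is a minimal sketch of an additive score. It assumes the common formulation in which a score is the sum of effect size times risk-allele dosage. The function names, effect sizes, and allele frequencies below are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], effects: Sequence[float]) -> float:
    """Additive score: sum of effect size times risk-allele dosage (0, 1 or 2)."""
    if len(dosages) != len(effects):
        raise ValueError("dosages and effects must have the same length")
    return sum(d * b for d, b in zip(dosages, effects))


def expected_score(freqs: Sequence[float], effects: Sequence[float]) -> float:
    """Mean score in a population: the mean dosage at a locus is 2 * p."""
    if len(freqs) != len(effects):
        raise ValueError("freqs and effects must have the same length")
    return sum(2.0 * p * b for p, b in zip(freqs, effects))


# Hypothetical per-allele effect sizes for three loci.
effects = [0.10, 0.20, 0.05]

# Hypothetical risk allele frequencies in two populations.
freqs_pop_a = [0.30, 0.50, 0.10]
freqs_pop_b = [0.40, 0.45, 0.20]

individual = polygenic_risk_score([1, 2, 0], effects)
mean_a = expected_score(freqs_pop_a, effects)
mean_b = expected_score(freqs_pop_b, effects)
print(f"individual score: {individual:.2f}")
print(f"mean score, population A: {mean_a:.2f}")
print(f"mean score, population B: {mean_b:.2f}")
```

Because the mean score depends directly on allele frequencies, systematic frequency differences at the loci included in a score shift the population mean even when the effect sizes are identical.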

135 citations

Journal Article•DOI•

Next-Generation Sequencing and Emerging Technologies.

Kishore R. Kumar¹,²,³, Mark J. Cowley²,⁴, Ryan L. Davis¹,²•Institutions (4)

Royal North Shore Hospital¹, Garvan Institute of Medical Research², Concord Hospital³, University of New South Wales⁴

16 May 2019-Seminars in Thrombosis and Hemostasis

TL;DR: This review provides an updated overview of next-generation sequencing (NGS) and emerging methodologies and describes short-read sequencing approaches, such as sequencing by synthesis, ion semiconductor sequencing, and nanoball sequencing.

Abstract: Genetic sequencing technologies are evolving at a rapid pace with major implications for research and clinical practice. In this review, the authors provide an updated overview of next-generation sequencing (NGS) and emerging methodologies. NGS has tremendously improved sequencing output while being more time and cost-efficient in comparison to Sanger sequencing. The authors describe short-read sequencing approaches, such as sequencing by synthesis, ion semiconductor sequencing, and nanoball sequencing. Third-generation long-read sequencing now promises to overcome many of the limitations of short-read sequencing, such as the ability to reliably resolve repeat sequences and large genomic rearrangements. By combining complementary methods with massively parallel DNA sequencing, a greater insight into the biological context of disease mechanisms is now possible. Emerging methodologies, such as advances in nanopore technology, in situ nucleic acid sequencing, and microscopy-based sequencing, will continue the rapid evolution of this area. These new technologies hold many potential applications for hematological disorders, with the promise of precision and personalized medical care in the future.

135 citations

Journal Article•DOI•

Genotype Imputation from Large Reference Panels.

Sayantan Das¹, Gonçalo R. Abecasis¹, Brian L. Browning²•Institutions (2)

University of Michigan¹, University of Washington²

31 Aug 2018-Annual Review of Genomics and Human Genetics

TL;DR: An overview of genotype imputation is presented and the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals are described.

Abstract: Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.
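
The core idea, filling in untyped sites by borrowing alleles from a matching reference haplotype, can be illustrated with a toy example. This sketch is a deliberately simplified nearest-haplotype lookup. It is not the method used by the imputation tools the review covers, and the function name and data are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_nearest_haplotype(
    target: Sequence[Optional[int]],
    panel: Sequence[Sequence[int]],
) -> List[int]:
    """Fill untyped sites (None) from the panel haplotype that best matches
    the target at its typed sites. Ties go to the first such haplotype."""

    def mismatches(ref: Sequence[int]) -> int:
        return sum(1 for t, r in zip(target, ref) if t is not None and t != r)

    best = min(panel, key=mismatches)
    return [r if t is None else t for t, r in zip(target, best)]


# Hypothetical reference panel: three haplotypes over four sites (0 = ref, 1 = alt).
panel = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

# Array data for one haplotype: the second site was not genotyped.
target = [0, None, 0, 1]

imputed = impute_nearest_haplotype(target, panel)
print(imputed)
```

The example also shows the limitation stated in the abstract: an allele that never appears in the panel at an untyped site can never be produced by this procedure.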

135 citations

Journal Article•DOI•

Proteogenomic characterization of pancreatic ductal adenocarcinoma

Liwei Cao¹, Chen Huang², Daniel Cui Zhou³, Yingwei Hu¹, T. Mamie Lih¹, Sara R. Savage², Karsten Krug⁴, David J. Clark¹, Michael Schnaubelt¹, Lijun Chen¹, Felipe da Veiga Leprevost⁵, Rodrigo Vargas Eguez¹, Weiming Yang¹, Jianbo Pan¹, Bo Wen², Yongchao Dou², Wen Jiang², Yuxing Liao², Zhiao Shi², Nadezhda V. Terekhanova³, Song Cao³, Rita Jui-Hsien Lu³, Yize Li³, Ruiyang Liu³, Houxiang Zhu³, Peter Ronning³, Yige Wu³, Matthew A. Wyczalkowski³, Hariharan Easwaran¹, Ludmila Danilova¹, Arvind Singh Mer⁶, Seungyeul Yoo⁷, Joshua M. Wang, Wenke Liu, Benjamin Haibe-Kains⁶,⁸, Mathangi Thiagarajan⁹, Scott D. Jewell¹⁰, Galen Hostetter¹⁰, Chelsea J. Newton¹⁰, Qing Kay Li¹, Michael H.A. Roehrl¹¹, David Fenyö, Pei Wang⁷, Alexey I. Nesvizhskii⁵, D. R. Mani⁴, Gilbert S. Omenn⁵, Emily S. Boja, Mehdi Mesri, Ana I. Robles, Henry Rodriguez, Oliver F. Bathe¹², Daniel W. Chan¹, Ralph H. Hruban¹, Li Ding³, Bing Zhang², Hui Zhang¹, Mitual Amin, Eunkyung An, Christina Ayad, Thomas L. Bauer, Chet Birger, Michael J. Birrer, Simina M. Boca, William Bocik, Melissa Borucki, Shuang Cai, Steven A. Carr, Sandra Cerda, Huan Chen, Steven Chen, David Chesla, Arul M. Chinnaiyan, Antonio Colaprico, Sandra Cottingham, Magdalena Derejska, Saravana M. Dhanasekaran, Marcin J. Domagalski, Brian J. Druker, Elizabeth R. Duffy, Maureen Dyer, Nathan Edwards, Matthew J. Ellis, Jennifer M. Eschbacher, Alicia Francis, Jesse Francis, Stacey Gabriel, Nikolay Gabrovski, Johanna Gardner, Gad Getz, Michael A. Gillette, Charles A. Goldthwaite, Pamela Grady, Shuai Guo, Pushpa Hariharan, Tara Hiltke, Barbara Hindenach, Katherine A. Hoadley, Jasmine Huang, Corbin D. Jones, Karen A. Ketchum, Christopher R. Kinsinger, Jennifer M. Koziak, Katarzyna Kusnierz, Tao Liu, Jiang Long, David Mallery, Sailaja Mareedu, Ronald Matteotti, Nicollette Maunganidze, Peter B. McGarvey, Parham Minoo, Oxana Paklina, Amanda G. Paulovich, Samuel H. Payne, Olga Potapova, Barbara Pruetz, Liqun Qi, Nancy Roche, Karin D. Rodland, Daniel C. Rohrer, Eric E. Schadt, Alexey Shabunin, Troy Shelton, Yvonne Shutack, Shilpi Singh, Michael Smith, Richard D. Smith, Lori J. Sokoll, James Suh, Ratna R. Thangudu, Shirley Tsang, Ki Sung Um, Dana R. Valley, Negin Vatanian, Wenyi Wang, George D. Wilson, Maciej Wiznerowicz, Zhen Zhang, Grace Zhao•Institutions (12)

Johns Hopkins University¹, Baylor College of Medicine², Washington University in St. Louis³, Massachusetts Institute of Technology⁴, University of Michigan⁵, Princess Margaret Cancer Centre⁶, Mount Sinai Hospital⁷, University of Toronto⁸, Leidos⁹, Van Andel Institute¹⁰, Memorial Sloan Kettering Cancer Center¹¹, University of Calgary¹²

16 Sep 2021-Cell

TL;DR: In this article, a comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues was conducted to understand the underlying molecular alterations that drive PDAC oncogenesis.

135 citations

Journal Article•DOI•

ExpansionHunter: A sequence-graph based tool to analyze variation in short tandem repeat regions.

Egor Dolzhenko¹, Viraj Deshpande¹, Felix Schlesinger¹, Peter Krusche¹, Roman Petrovski¹, Sai Chen¹, Dorothea Emig-Agius¹, Andrew M. Gross¹, Giuseppe Narzisi, Brett Bowman¹, Konrad Scheffler¹, Joke J.F.A. van Vugt², Courtney E. French³, Alba Sanchis-Juan³,⁴, Kristina Ibáñez⁵, Arianna Tucci⁵, Bryan R. Lajoie¹, Jan H. Veldink², Lucy Raymond³, Ryan J. Taft¹, David R. Bentley¹, Michael A. Eberle¹•Institutions (5)

Illumina¹, Utrecht University², NHS Blood and Transplant³, Cambridge University Hospitals NHS Foundation Trust⁴, Queen Mary University of London⁵

01 Nov 2019-Bioinformatics

TL;DR: A new version of Illumina's repeat genotyping software, ExpansionHunter, is introduced that uses a novel sequence-graph method to perform targeted genotyping of a broad class of loci containing repeats.

Abstract: Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. Availability and implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/. Supplementary information: Supplementary data are available at Bioinformatics online.

135 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
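
As an illustration of the format described here, the following sketch parses one SAM alignment line in plain Python. It relies on the layout defined in the SAM specification (eleven mandatory tab-separated fields and bitwise flags 0x4 for unmapped and 0x10 for reverse strand). The `SamRecord` class, the `parse_sam_line` function, and the example read are hypothetical and are not part of SAMtools.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse the eleven mandatory fields; optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Made-up alignment: a 5-base read on the reverse strand at chr1:100.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.is_reverse, record.is_unmapped)
```

For real data, an established parser is the safer choice; this sketch only shows how the text form of a record is laid out.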

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
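
The overlap question at the heart of this abstract can be sketched in a few lines. The example assumes BED-style coordinates (0-based start, exclusive end) and uses a brute-force pairwise comparison for clarity. It is not how BEDTools is implemented, and the `Interval` type, the `intersect` function, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of features from a and b."""
    out = []
    for x in a:
        for y in b:
            # Half-open intervals overlap when each starts before the other ends.
            if x.chrom == y.chrom and x.start < y.end and y.start < x.end:
                out.append(Interval(x.chrom, max(x.start, y.start), min(x.end, y.end)))
    return out


# Made-up features on two chromosomes.
reads = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
genes = [Interval("chr1", 150, 550), Interval("chr2", 100, 200)]

overlaps = intersect(reads, genes)
for iv in overlaps:
    print(iv.chrom, iv.start, iv.end)
```

The pairwise loop is quadratic in the number of features, which is exactly the cost that becomes prohibitive on sequencing-scale datasets and motivates a dedicated tool.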

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
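
To show what a VCF record looks like in practice, the sketch below computes a non-reference allele frequency from the genotype columns of a single data line. It assumes the standard column order from the VCF specification (eight fixed fields, then FORMAT, then one column per sample) and a `GT` entry in FORMAT. The function name and the example record are hypothetical, and the code does not use VCFtools.

```python
import re


def alt_allele_frequency(vcf_line: str) -> float:
    """Fraction of called alleles that are non-reference, from the GT fields."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected 8 fixed fields, FORMAT, and at least one sample")
    gt_index = fields[8].split(":").index("GT")
    alt = total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Alleles are separated by '|' (phased) or '/' (unphased).
        for allele in re.split(r"[|/]", genotype):
            if allele == ".":  # missing call
                continue
            total += 1
            if allele != "0":
                alt += 1
    if total == 0:
        raise ValueError("no called alleles in this record")
    return alt / total


# Made-up record: three phased diploid samples at one A>C site.
example = "1\t12345\t.\tA\tC\t100\tPASS\t.\tGT\t0|1\t1|1\t0|0"
frequency = alt_allele_frequency(example)
print(f"non-reference allele frequency: {frequency:.2f}")
```

At multi-allelic sites this counts every non-reference allele together; a per-allele breakdown would need to compare each genotype index against the ALT column.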

10,164 citations