A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Citations

Journal Article•DOI•

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek, Konrad J. Karczewski¹,², Eric Vallabh Minikel¹,², Kaitlin E. Samocha, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria¹,²,³, James S. Ware, Andrew J. Hill¹,²,⁴, Beryl B. Cummings¹,², Taru Tukiainen¹,², Daniel P. Birnbaum², Jack A. Kosmicki, Laramie E. Duncan¹,², Karol Estrada¹,², Fengmei Zhao¹,², James Zou², Emma Pierce-Hoffman¹,², Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo², Ron Do, Jason Flannick¹,², Menachem Fromer, Laura D. Gauthier², Jackie Goldstein¹,², Namrata Gupta², Daniel P. Howrigan¹,², Adam Kiezun², Mitja I. Kurki¹,², Ami Levy Moonshine², Pradeep Natarajan, Lorena Orozco, Gina M. Peloso¹,², Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Samuel A. Rose², Douglas M. Ruderfer⁸, Khalid Shakir², Peter D. Stenson⁶, Christine Stevens², Brett Thomas¹,², Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won⁹, Dongmei Yu, David Altshuler²,¹⁰, Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly², Roberto Elosua, Jose C. Florez¹,², Stacey Gabriel², Gad Getz¹,², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll¹,², Mark I. McCarthy¹⁶,¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale¹,², Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan¹⁴,²¹, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁶,¹⁷, James G. Wilson²⁴, Mark J. Daly¹,², Daniel G. MacArthur¹,² +103 more•Institutions (24)

Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, University of Oxford¹⁶, Wellcome Trust Centre for Human Genetics¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article•DOI•

The UK Biobank resource with deep phenotyping and genomic data

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova¹,², Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴,⁵, Jared O'Connell⁶, Adrian Cortes¹,⁷, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹ +18 more•Institutions (7)

University of Oxford¹, Procter & Gamble², University of Melbourne³, University of Geneva⁴, Swiss Institute of Bioinformatics⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Deep phenotype and genome-wide genetic data from approximately 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort, and phasing and imputation that increase the number of testable variants to around 96 million.

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal Article•DOI•

Genetic effects on gene expression across human tissues.

Enhancing GTEx (eGTEx) groups¹, NIH Common Fund², NHGRI, Biospecimen Core Resource—VARI, ELSI study, Genome Browser Data Integration Visualization—EBI, Lead analysts, Alexis Battle³, Christopher D. Brown⁴, Barbara E. Engelhardt¹, Stephen B. Montgomery² +7 more•Institutions (4)

Princeton University¹, Stanford University², Johns Hopkins University³, University of Pennsylvania⁴

12 Oct 2017-Nature

TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

3,289 citations

Journal Article•DOI•

Coming of age: ten years of next-generation sequencing technologies

Sara Goodwin¹, John Douglas McPherson², W. Richard McCombie¹•Institutions (2)

Cold Spring Harbor Laboratory¹, University of California, Davis²

01 Jun 2016-Nature Reviews Genetics

TL;DR: These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

Abstract: Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

3,096 citations

Journal Article•DOI•

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Annalisa Buniello¹, Jacqueline A. L. MacArthur¹, Maria Cerezo¹, Laura W. Harris¹, James D. Hayhurst¹, Cinzia Malangone¹, Aoife McMahon¹, Joannella Morales¹, Edward Mountjoy²,³, Elliot Sollis¹, Daniel Suveges¹, Olga Vrousgou¹, Patricia L. Whetzel¹, M. Ridwan Amode¹, Jose A. Guillen¹, Harpreet Singh Riat¹, Stephen J. Trevanion¹, Peggy Hall⁴, Heather Junkins⁴, Paul Flicek¹, Tony Burdett¹, Lucia A. Hindorff⁴, Fiona Cunningham¹, Helen Parkinson¹ +21 more•Institutions (4)

European Bioinformatics Institute¹, University of Oxford², Wellcome Trust Sanger Institute³, National Institutes of Health⁴

08 Jan 2019-Nucleic Acids Research

TL;DR: Data access is improved with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database.

Abstract: The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

2,878 citations

References

Posted Content•

Haplotype-based variant detection from short-read sequencing

Erik Garrison, Gabor T. Marth

17 Jul 2012-arXiv: Genomics

TL;DR: A Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number is developed and its implementation in a haplotype-based variant detector, FreeBayes is described.

Abstract: The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes.

3,460 citations

Journal Article•DOI•

Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Sharon R. Browning¹, Brian L. Browning¹•Institutions (1)

University of Auckland¹

01 Nov 2007-American Journal of Human Genetics

TL;DR: This work presents a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and presents the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals.

Abstract: Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.

2,849 citations

Journal Article•DOI•

An integrated map of structural variation in 2,504 human genomes

Peter H. Sudmant¹, Tobias Rausch, Eugene J. Gardner², Robert E. Handsaker³,⁴, Alexej Abyzov⁵, John Huddleston¹, Yan Zhang⁶, Kai Ye⁷, Goo Jun⁸,⁹, Markus Hsi-Yang Fritz, Miriam K. Konkel¹⁰, Ankit Malhotra, Adrian M. Stütz, Xinghua Shi¹¹, Francesco Paolo Casale¹², Jieming Chen⁶, Fereydoun Hormozdiari¹, Gargi Dayama⁹, Ken Chen¹³, Maika Malig¹, Mark Chaisson¹, Klaudia Walter¹², Sascha Meiers, Seva Kashin³,⁴, Erik Garrison¹⁴, Adam Auton¹⁵, Hugo Y. K. Lam, Xinmeng Jasmine Mu⁴,⁶, Can Alkan¹⁶, Danny Antaki¹⁷, Taejeong Bae⁵, Eliza Cerveira, Peter S. Chines¹⁸, Zechen Chong¹³, Laura Clarke¹², Elif Dal¹⁶, Li Ding⁷, S. Emery⁹, Xian Fan¹³, Madhusudan Gujral¹⁷, Fatma Kahveci¹⁶, Jeffrey M. Kidd⁹, Yu Kong¹⁵, Eric-Wubbo Lameijer¹⁹, Shane A. McCarthy¹², Paul Flicek¹², Richard A. Gibbs²⁰, Gabor T. Marth¹⁴, Christopher E. Mason²¹, Androniki Menelaou²²,²³, Donna M. Muzny²⁴, Bradley J. Nelson¹, Amina Noor¹⁷, Nicholas F. Parrish²⁵, Matthew Pendleton²⁴, Andrew Quitadamo¹¹, Benjamin Raeder, Eric E. Schadt²⁴, Mallory Romanovitch, Andreas Schlattl, Robert Sebra²⁴, Andrey A. Shabalin²⁶, Andreas Untergasser²⁷, Jerilyn A. Walker¹⁰, Min Wang²⁰, Fuli Yu²⁰, Chengsheng Zhang, Jing Zhang⁶, Xiangqun Zheng-Bradley¹², Wanding Zhou¹³, Thomas Zichner, Jonathan Sebat¹⁷, Mark A. Batzer¹⁰, Steven A. McCarroll³,⁴, Ryan E. Mills⁹, Mark Gerstein⁶, Ali Bashir²⁴, Oliver Stegle¹², Scott E. Devine², Charles Lee²⁸, Evan E. Eichler¹, Jan O. Korbel¹² +84 more•Institutions (28)

University of Washington¹, University of Maryland, Baltimore², Harvard University³, Broad Institute⁴, Mayo Clinic⁵, Yale University⁶, Washington University in St. Louis⁷, University of Texas Health Science Center at Houston⁸, University of Michigan⁹, Louisiana State University¹⁰, University of North Carolina at Charlotte¹¹, Wellcome Trust¹², University of Texas MD Anderson Cancer Center¹³, Boston College¹⁴, Yeshiva University¹⁵, Bilkent University¹⁶, University of California, San Diego¹⁷, National Institutes of Health¹⁸, Leiden University¹⁹, Baylor College of Medicine²⁰, Cornell University²¹, University of Oxford²², Utrecht University²³, Icahn School of Medicine at Mount Sinai²⁴, Kyoto University²⁵, Virginia Commonwealth University²⁶, Heidelberg University²⁷, Ewha Womans University²⁸

01 Oct 2015-Nature

TL;DR: In this paper, the authors describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which are constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations.

Abstract: Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

1,971 citations

Journal Article•DOI•

Inference of human population history from individual whole-genome sequences

Heng Li¹, Richard Durbin¹•Institutions (1)

Wellcome Trust Sanger Institute¹

13 Jul 2011-Nature

TL;DR: A more detailed history of human population sizes between approximately ten thousand and a million years ago is presented, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male, a Korean male, three European individuals, and two Yoruba males.

Abstract: The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.

1,943 citations

Journal Article•DOI•

Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads

Kai Ye¹, Marcel H. Schulz², Quan Long³, Rolf Apweiler³, Zemin Ning³ +1 more•Institutions (3)

European Bioinformatics Institute¹, Max Planck Society², Leiden University Medical Center³

01 Nov 2009-Bioinformatics

TL;DR: Pindel, a pattern growth approach, is presented to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads; simulated reads and real data are used to demonstrate the efficiency of the program and the accuracy of its results.

Abstract: Motivation: There is a strong demand in the genomic community to develop effective algorithms to reliably identify genomic variants. Indel detection using next-gen data is difficult and identification of long structural variations is extremely challenging. Results: We present Pindel, a pattern growth approach, to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads. We use both simulated reads and real data to demonstrate the efficiency of the computer program and accuracy of the results. Availability: The binary code and a short user manual can be freely downloaded from http://www.ebi.ac.uk/~kye/pindel/. Contact: k.ye@lumc.nl; zn1@sanger.ac.uk.

1,930 citations