A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
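The headline total in the abstract can be checked against its three components. The snippet below is only an arithmetic sketch using the rounded figures quoted above; the variable names are illustrative and not taken from the paper.

```python
# Rounded variant counts as quoted in the abstract.
snps = 84_700_000               # single nucleotide polymorphisms
indels = 3_600_000              # short insertions/deletions
structural_variants = 60_000    # structural variants

total = snps + indels + structural_variants

# The abstract states "in total over 88 million variants".
print(f"total variants: {total:,}")
print(f"SNP share of all variants: {snps / total:.1%}")
```

The sum of the rounded components is 88.36 million, consistent with "over 88 million"; SNPs make up the large majority of the catalogue.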

Citations

Journal Article•DOI•

Radiogenomics Consortium Genome-Wide Association Study Meta-analysis of Late Toxicity after Prostate Cancer Radiotherapy

Sarah L. Kerns¹, Laura Fachal, Leila Dorling, Gillian C. Barnett², Andrea Baran¹, Derick R. Peterson¹, Michelle Hollenberg³, Ke Hao⁴, Antonio Fabio Di Narzo⁴, Mehmet Eren Ahsen⁴, Gaurav Pandey⁴, Søren M. Bentzen⁵, Michelle C. Janelsins¹, Rebecca Elliott⁶, Paul D.P. Pharoah², Neil G. Burnet⁶, David P. Dearnaley⁷, Sarah L. Gulliford⁷, Emma Hall⁷, Matthew R. Sydes⁸, Miguel E. Aguado-Barrera, Antonio Gómez-Caamaño, Ana Carballo, Paula Peleteiro, Ramón Lobato-Busto, Richard G. Stock, Nelson N. Stone, Harry Ostrer⁴, Nawaid Usmani⁹, Sandeep Singhal¹⁰, Hiroshi Tsuji, Takashi Imai, Shiro Saito¹⁰, Rosalind A. Eeles, Kim DeRuyck¹¹, Matthew Parliament⁹, Alison M. Dunning, Ana Vega¹², Barry S. Rosenstein⁴, Catharine M L West⁶ +36 more•Institutions (12)

University of Rochester Medical Center¹, Cambridge University Hospitals NHS Foundation Trust², University of Rochester³, Icahn School of Medicine at Mount Sinai⁴, University of Maryland Marlene and Stewart Greenebaum Cancer Center⁵, Manchester Academic Health Science Centre⁶, Institute of Cancer Research⁷, University College London⁸, University of Alberta⁹, Columbia University¹⁰, Ghent University Hospital¹¹, University of Santiago de Compostela¹²

01 Feb 2020-Journal of the National Cancer Institute

TL;DR: This study increases the understanding of the architecture of common genetic variants affecting radiotoxicity, points to novel radio-pathogenic mechanisms, and develops risk models for testing in clinical studies.

Abstract: Background: A total of 10%-20% of patients develop long-term toxicity following radiotherapy for prostate cancer. Identification of common genetic variants associated with susceptibility to radiotoxicity might improve risk prediction and inform functional mechanistic studies. Methods: We conducted an individual patient data meta-analysis of six genome-wide association studies (n = 3871) in men of European ancestry who underwent radiotherapy for prostate cancer. Radiotoxicities (increased urinary frequency, decreased urinary stream, hematuria, rectal bleeding) were graded prospectively. We used grouped relative risk models to test associations with approximately 6 million genotyped or imputed variants (time to first grade 2 or higher toxicity event). Variants with two-sided Pmeta less than 5 × 10⁻⁸ were considered statistically significant. Bayesian false discovery probability provided an additional measure of confidence. Statistically significant variants were evaluated in three Japanese cohorts (n = 962). All statistical tests were two-sided. Results: Meta-analysis of the European ancestry cohorts identified three genomic signals: single nucleotide polymorphism rs17055178 with rectal bleeding (Pmeta = 6.2 × 10⁻¹⁰), rs10969913 with decreased urinary stream (Pmeta = 2.9 × 10⁻¹⁰), and rs11122573 with hematuria (Pmeta = 1.8 × 10⁻⁸). Fine-scale mapping of these three regions was used to identify another independent signal (rs147121532) associated with hematuria (Pconditional = 4.7 × 10⁻⁶). Credible causal variants at these four signals lie in gene-regulatory regions, some modulating expression of nearby genes. Previously identified variants showed consistent associations (rs17599026 with increased urinary frequency, rs7720298 with decreased urinary stream, rs1801516 with overall toxicity) in new cohorts. rs10969913 and rs17599026 had similar effects in the photon-treated Japanese cohorts.
Conclusions: This study increases the understanding of the architecture of common genetic variants affecting radiotoxicity, points to novel radio-pathogenic mechanisms, and develops risk models for testing in clinical studies. Further multinational radiogenomics studies in larger cohorts are worthwhile.

71 citations

Journal Article•DOI•

Evolutionary history of Tibetans inferred from whole-genome sequencing.

Hao Hu¹, Nayia Petousi², Gustavo Glusman³, Yao Yu¹, Ryan James Bohlender⁴, Tsewang Tashi⁴, Jonathan M. Downie⁴, Jared C. Roach³, Amy M. Cole⁵, Felipe R. Lorenzo⁴, Alan R. Rogers⁴, Mary E. Brunkow³, Gianpiero L. Cavalleri⁵, Leroy Hood³, Sama M. Alpatty⁶, Josef T. Prchal⁴, Lynn B. Jorde⁴, Peter A. Robbins², Tatum S. Simonson⁶, Chad D. Huff¹ +16 more•Institutions (6)

University of Texas MD Anderson Cancer Center¹, University of Oxford², Institute for Systems Biology³, University of Utah⁴, Royal College of Surgeons in Ireland⁵, University of California, San Diego⁶

27 Apr 2017-PLOS Genetics

TL;DR: A detailed history of demography and natural selection in Tibetans is inferred from whole-genome sequencing, with evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago.

Abstract: The indigenous people of the Tibetan Plateau have been the subject of much recent interest because of their unique genetic adaptations to high altitude. Recent studies have demonstrated that the Tibetan EPAS1 haplotype is involved in high-altitude adaptation and originated in an archaic Denisovan-related population. We sequenced the whole genomes of 27 Tibetans and conducted analyses to infer a detailed history of demography and natural selection of this population. We detected evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago. The CMS test ranked EPAS1 and EGLN1 as the top two positive selection candidates, and in addition identified PTGIS, VDR, and KCTD12 as new candidate genes. The advantageous Tibetan EPAS1 haplotype shared many variants with the Denisovan genome, with an ancient gene tree divergence between the Tibetan and Denisovan haplotypes of about 1 million years ago. With the exception of EPAS1, we observed no evidence of positive selection on Denisovan-like haplotypes.

71 citations

Cites methods from "A global reference for human geneti..."

...To identify additional variants enriched for higher genotyping error rates, we compared 62 genomes that had been sequenced in both CG public genome data[33,68] and 1KG Project[18] Phase I data....

Journal Article•DOI•

Neurobiological functions of transcriptional enhancers

Alexander Nord¹, Anne E. West²•Institutions (2)

University of California, Davis¹, Duke University²

01 Jan 2020-Nature Neuroscience

TL;DR: A primer on transcriptional enhancers in the CNS is offered, using examples of enhancer regulation in the maturing brain and the role of non-coding variation in brain disorders to explain the concepts emerging from functional neurogenomics.

Abstract: Transcriptional enhancers are regulatory DNA elements that underlie the specificity and dynamic patterns of gene expression. Over the past decade, large-scale functional genomics projects have driven transformative progress in our understanding of enhancers. These data have relevance for identifying mechanisms of gene regulation in the CNS, elucidating the function of non-coding regulatory sequences in neurobiology and linking sequence variation within enhancers to genetic risk for neurological and psychiatric disorders. However, the sheer volume and complexity of genomic data present a challenge to interpreting enhancer function in normal and pathogenic neurobiological processes. Here, to advance the application of genome-scale enhancer data, we offer a primer on current models of enhancer function in the CNS, we review how enhancers regulate gene expression across the neuronal lifespan, and we suggest how emerging findings regarding the role of non-coding sequence variation offer opportunities for understanding brain disorders and developing new technologies for neuroscience.

71 citations

Posted Content•DOI•

Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations

Ying Wang¹, Jing Guo¹, Guiyan Ni¹, Jian Yang¹, Jian Yang², Peter M. Visscher¹, Loic Yengo¹ +3 more•Institutions (2)

University of Queensland¹, Wenzhou Medical College²

15 Jan 2020-bioRxiv

TL;DR: A new theory is developed to predict the relative accuracy (RA) of polygenic scores (PGS) across ancestries; in simulations, the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability.

Abstract: Polygenic scores (PGS) have been widely used to predict complex traits and risk of diseases using variants identified from genome-wide association studies (GWASs). To date, most GWASs have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European populations. Here, we develop a new theory to predict the relative accuracy (RA, relative to the accuracy in populations of the same ancestry as the discovery population) of PGS across ancestries. We used simulations and real data from the UK Biobank to evaluate our results. We found across various simulation scenarios that the RA of PGS based on trait-associated SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of SNP effect sizes and heritability. Altogether, we find that LD and MAF differences between ancestries alone explain up to ~70% of the loss of RA when European-based PGS are used in African ancestry populations for traits like body mass index and height. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWASs are mostly shared across continents.

71 citations

Journal Article•DOI•

Recent developments in genetic/genomic medicine.

Rachel Horton¹, Anneke Lucassen¹•Institutions (1)

University of Southampton¹

15 Mar 2019-Clinical Science

TL;DR: This review outlines the ways in which genetic medicine is developing in light of technological advances, including the shifting landscape of treatment options for genetic conditions, which has evolving implications for clinical discussions around previously untreatable disorders.

Abstract: Advances in genetic technology are having a major impact in the clinic, and mean that many perceptions of the role and scope of genetic testing are having to change. Genomic testing brings with it a greater opportunity for diagnosis, or predictions of future diagnoses, but also an increased chance of uncertain or unexpected findings, many of which may have impacts for multiple members of a person’s family. In the past, genetic testing was rarely able to provide rapid results, but the increasing speed and availability of genomic testing is changing this, meaning that genomic information is increasingly influencing decisions around patient care in the acute inpatient setting. The landscape of treatment options for genetic conditions is shifting, which has evolving implications for clinical discussions around previously untreatable disorders. Furthermore, the point of access to testing is changing with increasing provision direct to the consumer outside the formal healthcare setting. This review outlines the ways in which genetic medicine is developing in light of technological advances.

71 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹ +1 more•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹ +5 more•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹ +8 more•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
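To make the format described above concrete, here is a minimal sketch of reading one VCF data line in plain Python. It assumes only the eight fixed tab-separated columns of the format (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The helper name `parse_vcf_line`, the `VcfRecord` type, and the example record are hypothetical illustrations; this is not VCFtools or its Perl API.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alts: List[str]
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # "." marks a missing INFO field
        # Flag-style INFO keys have no "=", so their value is the empty string.
        key, _, value = item.partition("=")
        info_dict[key] = value
    # Multiple alternate alleles are comma-separated in the ALT column.
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), info_dict)


# Hypothetical record, invented for illustration only.
example = "1\t10177\trs_hypothetical\tA\tAC\t100\tPASS\tAF=0.42;VT=INDEL"
rec = parse_vcf_line(example)
print(rec.chrom, rec.pos, rec.ref, rec.alts, rec.info)
```

Real files also carry `##` metadata lines, a `#CHROM` header, and optional per-sample genotype columns, none of which this sketch handles; a maintained parser is the safer choice for actual data.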

10,164 citations