A global reference for human genetic variation.

doi:10.1038/NATURE15393


Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Citations

Journal Article•DOI•

Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis

VA Million Veteran Program¹•Institutions (1)

University of Massachusetts Lowell¹

01 Jul 2020-Nature Genetics

TL;DR: A multi-ancestry meta-analysis investigated the genetic etiology of T2D-related vascular outcomes in the MVP and observed statistical SNP-T2D interactions at 13 variants for outcomes including coronary heart disease (CHD), CKD, PAD and neuropathy; the findings may help to identify potential therapeutic targets for T2D and genomic pathways that link T2D to vascular outcomes.

Abstract: We investigated type 2 diabetes (T2D) genetic susceptibility via multi-ancestry meta-analysis of 228,499 cases and 1,178,783 controls in the Million Veteran Program (MVP), DIAMANTE, Biobank Japan and other studies. We report 568 associations, including 286 autosomal, 7 X-chromosomal and 25 identified in ancestry-specific analyses that were previously unreported. Transcriptome-wide association analysis detected 3,568 T2D associations with genetically predicted gene expression in 687 novel genes; of these, 54 are known to interact with FDA-approved drugs. A polygenic risk score (PRS) was strongly associated with increased risk of T2D-related retinopathy and modestly associated with chronic kidney disease (CKD), peripheral artery disease (PAD) and neuropathy. We investigated the genetic etiology of T2D-related vascular outcomes in the MVP and observed statistical SNP-T2D interactions at 13 variants, including coronary heart disease (CHD), CKD, PAD and neuropathy. These findings may help to identify potential therapeutic targets for T2D and genomic pathways that link T2D to vascular outcomes.

376 citations

Journal Article•DOI•

Genome-wide association study results for educational attainment aid in identifying genetic heterogeneity of schizophrenia

Vikas Bansal¹, Marina Mitjans¹, Casper A.P. Burik²,³, Richard Karlsson Linnér²,³, Aysu Okbay², Cornelius A. Rietveld³, Martin Begemann¹, Stefan Bonn⁴,⁵, Stephan Ripke⁶,⁷,⁸, Ronald de Vlaming², Michel G. Nivard², Hannelore Ehrenreich¹, Philipp Koellinger²,³•Institutions (8)

Max Planck Society¹, VU University Amsterdam², Erasmus University Rotterdam³, German Center for Neurodegenerative Diseases⁴, University of Hamburg⁵, Harvard University⁶, Broad Institute⁷, Charité⁸

06 Aug 2018-Nature Communications

TL;DR: Strong genetic dependence between EA and SZ is found that cannot be explained by chance, linkage disequilibrium, or assortative mating, and multiple genes have pleiotropic effects on both without a systematic pattern of sign concordance.

Abstract: Higher educational attainment (EA) is negatively associated with schizophrenia (SZ). However, recent studies found a positive genetic correlation between EA and SZ. We investigate possible causes of this counterintuitive finding using genome-wide association study results for EA and SZ (N = 443,581) and a replication cohort (1169 controls; 1067 cases) with deeply phenotyped SZ patients. We find strong genetic dependence between EA and SZ that cannot be explained by chance, linkage disequilibrium, or assortative mating. Instead, several genes seem to have pleiotropic effects on EA and SZ, but without a clear pattern of sign concordance. Using EA as a proxy phenotype, we isolate FOXO6 and SLITRK1 as novel candidate genes for SZ. Our results reveal that current SZ diagnoses aggregate over at least two disease subtypes: one part resembles high intelligence and bipolar disorder (BIP), while the other part is a cognitive disorder that is independent of BIP.

366 citations

Journal Article•DOI•

Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection.

Steven Gazal¹, Hilary K. Finucane¹,²,³, Nicholas A. Furlotte, Po-Ru Loh¹,², Pier Francesco Palamara¹,², Xuanyao Liu¹,², Armin P. Schoech¹,², Brendan Bulik-Sullivan², Benjamin M. Neale¹,², Alexander Gusev¹,², Alkes L. Price¹,²•Institutions (3)

Harvard University¹, Broad Institute², Massachusetts Institute of Technology³

11 Sep 2017-Nature Genetics

TL;DR: It is determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs).

Abstract: Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.

365 citations

Journal Article•DOI•

Paediatric genomics: diagnosing rare disease in children.

Caroline F. Wright¹, David R. FitzPatrick², Helen V. Firth³•Institutions (3)

Royal Devon and Exeter Hospital¹, University of Edinburgh², University of Cambridge³

19 Feb 2018-Nature Reviews Genetics

TL;DR: For affected families, a better understanding of the genetic basis of rare disease translates to more accurate prognosis, management, surveillance and genetic advice; stimulates research into new therapies; and enables provision of better support.

Abstract: The majority of rare diseases affect children, most of whom have an underlying genetic cause for their condition. However, making a molecular diagnosis with current technologies and knowledge is often still a challenge. Paediatric genomics is an immature but rapidly evolving field that tackles this issue by incorporating next-generation sequencing technologies, especially whole-exome sequencing and whole-genome sequencing, into research and clinical workflows. This complex multidisciplinary approach, coupled with the increasing availability of population genetic variation data, has already resulted in an increased discovery rate of causative genes and in improved diagnosis of rare paediatric disease. Importantly, for affected families, a better understanding of the genetic basis of rare disease translates to more accurate prognosis, management, surveillance and genetic advice; stimulates research into new therapies; and enables provision of better support.

364 citations

Journal Article•DOI•

Novel genes associated with amyotrophic lateral sclerosis: diagnostic and clinical implications.

Ruth Chia¹, Adriano Chiò², Bryan J. Traynor³•Institutions (3)

National Institutes of Health¹, University of Turin², Johns Hopkins University³

01 Jan 2018-Lancet Neurology

TL;DR: The identification of these seven novel genes has been important in unravelling the molecular mechanisms underlying ALS, and therapeutics targeting these pathways could be useful for a broad group of patients stratified by genotype.

Abstract: Background: The disease course of amyotrophic lateral sclerosis (ALS) is rapid and, because its pathophysiology is unclear, few effective treatments are available. Genetic research aims to understand the underlying mechanisms of ALS and identify potential therapeutic targets. The first gene associated with ALS was SOD1, identified in 1993 and, by early 2014, more than 20 genes had been identified as causative of, or highly associated with, ALS. These genetic discoveries have identified key disease pathways that are therapeutically testable and could potentially lead to the development of better treatments for people with ALS. Recent developments: Since 2014, seven additional genes have been associated with ALS (MATR3, CHCHD10, TBK1, TUBA4A, NEK1, C21orf2, and CCNF), all of which were identified by genome-wide association studies, whole genome studies, or exome sequencing technologies. Each of the seven novel genes codes for proteins associated with one or more molecular pathways known to be involved in ALS. These pathways include dysfunction in global protein homoeostasis resulting from abnormal protein aggregation or a defect in the protein clearance pathway, mitochondrial dysfunction, altered RNA metabolism, impaired cytoskeletal integrity, altered axonal transport dynamics, and DNA damage accumulation due to defective DNA repair. Because these novel genes share common disease pathways with other genes implicated in ALS, therapeutics targeting these pathways could be useful for a broad group of patients stratified by genotype. However, the effects of these novel genes have not yet been investigated in animal models, which will be a key step to translating these findings into clinical practice. Where next? The identification of these seven novel genes has been important in unravelling the molecular mechanisms underlying ALS. However, our understanding of what causes ALS is not complete, and further genetic research will provide additional detail about its causes. Increased genetic knowledge will also identify potential therapeutic targets and could lead to the development of individualised medicine for patients with ALS. These developments will have a direct effect on clinical practice when genome sequencing becomes a routine and integral part of disease diagnosis and management.

364 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
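
The overlap search that the BEDTools abstract above describes comes down to an interval test between pairs of features. The following is a minimal sketch in Python, not the BEDTools implementation (which the abstract says is written in C++). The `Feature` type, the `overlap_length` and `intersect` helpers, and all coordinates are hypothetical and exist only for illustration. It assumes BED-style coordinates, which are 0-based with an exclusive end.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A hypothetical BED-like feature: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlap_length(a: Feature, b: Feature) -> int:
    """Number of bases shared by a and b (0 if they do not overlap)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect(query: List[Feature],
              targets: List[Feature]) -> List[Tuple[Feature, Feature, int]]:
    """Naive all-pairs comparison: fine for small lists, quadratic in general."""
    hits = []
    for q in query:
        for t in targets:
            n = overlap_length(q, t)
            if n > 0:
                hits.append((q, t, n))
    return hits


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    reads = [Feature("chr1", 100, 200), Feature("chr1", 500, 600)]
    genes = [Feature("chr1", 150, 300), Feature("chr2", 100, 200)]
    for q, t, n in intersect(reads, genes):
        print(q, t, n)
```

The all-pairs loop is chosen for readability; at the dataset sizes the abstract mentions, a sorted sweep or an interval index would be the more realistic choice.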

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations
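
To make the VCF description in the abstract above concrete, here is a minimal sketch of reading one data line in Python. It is not VCFtools (which the abstract describes as offering a Perl API) and covers only the eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The `VcfRecord` type, the `parse_vcf_line` and `is_snp` helpers, and the example lines are hypothetical and made up for illustration.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                              # 1-based position on the reference
    ref: str
    alts: List[str]
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")[:8]
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields
    info_map: Dict[str, Union[str, bool]] = {}
    for entry in info.split(";"):
        if entry in ("", "."):            # "." marks a missing value
            continue
        key, sep, value = entry.partition("=")
        info_map[key] = value if sep else True   # flags carry no "=value"
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_map)


def is_snp(rec: VcfRecord) -> bool:
    """Crude classification: reference and every alternate are single bases."""
    return len(rec.ref) == 1 and all(
        len(a) == 1 and a in "ACGT" for a in rec.alts
    )


if __name__ == "__main__":
    # Made-up records for illustration only.
    snp = parse_vcf_line("1\t12345\t.\tA\tG\t100\tPASS\tAF=0.25;DB\n")
    indel = parse_vcf_line("1\t200\t.\tAT\tA\t.\t.\t.")
    print(snp, is_snp(snp))
    print(indel, is_snp(indel))
```

Header lines (those starting with `#`) and per-sample genotype columns are deliberately left out, so real files would need an established parser rather than this sketch.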