A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
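
As a quick arithmetic check on the totals quoted in the abstract, the three variant classes can be summed. This is a minimal sketch that uses only the rounded figures given above; the variable names are illustrative and not taken from the paper.

```python
# Rounded counts as quoted in the abstract (names are illustrative).
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

# Sum of the three classes; the abstract states "over 88 million".
total = sum(variant_counts.values())
print(f"total variants: {total:,}")

# Share of each class within the rounded total.
shares = {name: count / total for name, count in variant_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

With these rounded inputs the sum is 88.36 million, consistent with the "over 88 million" figure, and SNPs account for the large majority of the catalogued variants.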

Citations

Journal Article•DOI•

Characterizing the Major Structural Variant Alleles of the Human Genome

[...]

Peter A. Audano¹, Arvis Sulovari¹, Tina A. Graves-Lindsay², Stuart Cantsilieris¹, Melanie Sorensen¹, AnneMarie E. Welch¹, Max L. Dougherty¹, Bradley J. Nelson¹, Ankeeta Shah³, Susan K. Dutcher², Wesley C. Warren², Vincent Magrini⁴,⁵, Sean McGrath⁵, Yang I. Li³, Richard K. Wilson⁴,⁵, Evan E. Eichler¹•Institutions (5)

University of Washington¹, Washington University in St. Louis², University of Chicago³, Ohio State University⁴, Nationwide Children's Hospital⁵

24 Jan 2019-Cell

TL;DR: A ninefold SV bias toward the last 5 Mbp of human chromosomes is reported with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome.

339 citations

Journal Article•DOI•

Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis

[...]

Ian T. Fiddes¹, Gerrald A. Lodewijk², Meghan Mooring¹, Colleen M. Bosworth¹, Adam D. Ewing¹, Gary L. Mantalas¹, Adam M. Novak¹, Anouk van den Bout², Alex Bishara³, Jimi L. Rosenkrantz¹,⁴, Ryan Lorig-Roach¹, Andrew R. Field¹, Maximilian Haeussler¹, Lotte Russo², Aparna Bhaduri⁵, Tomasz J. Nowakowski⁵, Alex A. Pollen⁵, Max L. Dougherty⁶, Xander Nuttle⁷,⁸, Marie-Claude Addor, Simon Zwolinski, Sol Katzman¹, Arnold R. Kriegstein⁵, Evan E. Eichler⁶, Sofie R. Salama¹,⁴, Frank M. J. Jacobs¹,², David Haussler¹,⁴•Institutions (8)

University of California, Santa Cruz¹, University of Amsterdam², Stanford University³, Howard Hughes Medical Institute⁴, University of California, San Francisco⁵, University of Washington⁶, Harvard University⁷, Broad Institute⁸

31 May 2018-Cell

TL;DR: The emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, accompanied by loss of genomic stability at the 1q21.1 locus and resulting recurrent neurodevelopmental disorders.

334 citations

Journal Article•DOI•

Genetic identification of brain cell types underlying schizophrenia

[...]

Nathan G. Skene¹,², Julien Bryois², Trygve E. Bakken³, Gerome Breen⁴,⁵, James J. Crowley⁶, Helena Gaspar⁴,⁵, Paola Giusti-Rodríguez⁶, Rebecca D. Hodge³, Jeremy A. Miller³, Ana B. Muñoz-Manchado², Michael Conlon O'Donovan⁷, Michael J. Owen⁷, Antonio F. Pardiñas⁷, Jesper Ryge⁸, James T.R. Walters⁷, Sten Linnarsson², Ed S. Lein³, Patrick F. Sullivan²,⁶, Jens Hjerling-Leffler²•Institutions (8)

UCL Institute of Neurology¹, Karolinska Institutet², Allen Institute for Brain Science³, King's College London⁴, National Institute for Health Research⁵, University of North Carolina at Chapel Hill⁶, Cardiff University⁷, École Polytechnique Fédérale de Lausanne⁸

21 May 2018-Nature Genetics

TL;DR: It is found that the common-variant genomic results consistently mapped to pyramidal cells, medium spiny neurons (MSNs) and certain interneurons, but far less consistently to embryonic, progenitor or glial cells.

Abstract: With few exceptions, the marked advances in knowledge about the genetic basis of schizophrenia have not converged on findings that can be confidently used for precise experimental modeling. By applying knowledge of the cellular taxonomy of the brain from single-cell RNA sequencing, we evaluated whether the genomic loci implicated in schizophrenia map onto specific brain cell types. We found that the common-variant genomic results consistently mapped to pyramidal cells, medium spiny neurons (MSNs) and certain interneurons, but far less consistently to embryonic, progenitor or glial cells. These enrichments were due to sets of genes that were specifically expressed in each of these cell types. We also found that many of the diverse gene sets previously associated with schizophrenia (genes involved in synaptic function, those encoding mRNAs that interact with FMRP, antipsychotic targets, etc.) generally implicated the same brain cell types. Our results suggest a parsimonious explanation: the common-variant genetic results for schizophrenia point at a limited set of neurons, and the gene sets point to the same cells. The genetic risk associated with MSNs did not overlap with that of glutamatergic pyramidal cells and interneurons, suggesting that different cell types have biologically distinct roles in schizophrenia.

331 citations

Journal Article•DOI•

Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis.

[...]

Miriam S. Udler, Jaegil Kim¹, Marcin von Grotthuss¹, Sílvia Bonàs-Guarch², Joanne B. Cole¹,³, Joshua Chiou⁴, Michael Boehnke⁵, Markku Laakso⁶,⁷, Gil Atzmon⁸, Benjamin Glaser, Josep M. Mercader⁴, Kyle J. Gaulton¹,⁹, Jason Flannick¹, Gad Getz, Jose C. Florez•Institutions (9)

Broad Institute¹, Barcelona Supercomputing Center², Harvard University³, University of California, San Diego⁴, University of Eastern Finland⁵, University of Haifa⁶, Albert Einstein College of Medicine⁷, Hebrew University of Jerusalem⁸, Boston Children's Hospital⁹

21 Sep 2018-PLOS Medicine

TL;DR: The approach identifies salient T2D genetically anchored and physiologically informed pathways, and supports the use of genetics to deconstruct T2D heterogeneity.

Abstract: Background: Type 2 diabetes (T2D) is a heterogeneous disease for which (1) disease-causing pathways are incompletely understood and (2) subclassification may improve patient management. Unlike other biomarkers, germline genetic markers do not change with disease progression or treatment. In this paper, we test whether a germline genetic approach informed by physiology can be used to deconstruct T2D heterogeneity. First, we aimed to categorize genetic loci into groups representing likely disease mechanistic pathways. Second, we asked whether the novel clusters of genetic loci we identified have any broad clinical consequence, as assessed in four separate subsets of individuals with T2D. Methods and findings: In an effort to identify mechanistic pathways driven by established T2D genetic loci, we applied Bayesian nonnegative matrix factorization (bNMF) clustering to genome-wide association study (GWAS) results for 94 independent T2D genetic variants and 47 diabetes-related traits. We identified five robust clusters of T2D loci and traits, each with distinct tissue-specific enhancer enrichment based on analysis of epigenomic data from 28 cell types. Two clusters contained variant-trait associations indicative of reduced beta cell function, differing from each other by high versus low proinsulin levels. The three other clusters displayed features of insulin resistance: obesity mediated (high body mass index [BMI] and waist circumference [WC]), "lipodystrophy-like" fat distribution (low BMI, adiponectin, and high-density lipoprotein [HDL] cholesterol, and high triglycerides), and disrupted liver lipid metabolism (low triglycerides). Increased cluster genetic risk scores were associated with distinct clinical outcomes, including increased blood pressure, coronary artery disease (CAD), and stroke. We evaluated the potential for clinical impact of these clusters in four studies containing individuals with T2D (Metabolic Syndrome in Men Study [METSIM], N = 487; Ashkenazi, N = 509; Partners Biobank, N = 2,065; UK Biobank [UKBB], N = 14,813). Individuals with T2D in the top genetic risk score decile for each cluster reproducibly exhibited the predicted cluster-associated phenotypes, with approximately 30% of all individuals assigned to just one cluster top decile. Limitations of this study include that the genetic variants used in the cluster analysis were restricted to those associated with T2D in populations of European ancestry. Conclusion: Our approach identifies salient T2D genetically anchored and physiologically informed pathways, and supports the use of genetics to deconstruct T2D heterogeneity. Classification of patients by these genetic pathways may offer a step toward genetically informed T2D patient management.

330 citations

Journal Article•DOI•

Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure.

[...]

Sonia Shah¹, Albert Henry², Carolina Roselli³, Honghuang Lin⁴ +164 more•Institutions (58)

09 Jan 2020-Nature Communications

TL;DR: Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension.

Abstract: Heart failure (HF) is a leading cause of morbidity and mortality worldwide. A small proportion of HF cases are attributable to monogenic cardiomyopathies and existing genome-wide association studies (GWAS) have yielded only limited insights, leaving the observed heritability of HF largely unexplained. We report results from a GWAS meta-analysis of HF comprising 47,309 cases and 930,014 controls. Twelve independent variants at 11 genomic loci are associated with HF, all of which demonstrate one or more associations with coronary artery disease (CAD), atrial fibrillation, or reduced left ventricular function, suggesting shared genetic aetiology. Functional analysis of non-CAD-associated loci implicates genes involved in cardiac development (MYOZ1, SYNPO2L), protein homoeostasis (BAG3), and cellular senescence (CDKN1A). Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension. These findings extend our knowledge of the pathways underlying HF and may inform new therapeutic strategies.

326 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

[...]

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

[...]

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

[...]

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
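
The core task named in this abstract, finding overlaps between two sets of genomic features, can be illustrated with a small sketch. It is not BEDTools' implementation or API: the `Interval` type, the `overlaps` and `intersect` helpers, and the sample coordinates are all hypothetical, and the only assumption carried over from the BED format is its 0-based, half-open coordinate convention.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(features_a, features_b):
    """Return every (a, b) pair that overlaps; naive scan per chromosome."""
    by_chrom = defaultdict(list)
    for b in features_b:
        by_chrom[b.chrom].append(b)
    pairs = []
    for a in features_a:
        for b in by_chrom[a.chrom]:
            if overlaps(a, b):
                pairs.append((a, b))
    return pairs


# Hypothetical features: three reads and two annotated regions.
reads = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 50, 80),
]
genes = [Interval("chr1", 150, 550), Interval("chr2", 80, 120)]

hits = intersect(reads, genes)
for read, gene in hits:
    print(read, "overlaps", gene)
```

The per-chromosome scan is quadratic in the worst case, which is acceptable for a handful of features but not for sequencing-scale data; that scaling problem is the motivation the abstract gives for purpose-built tools. The `chr2` read ends exactly where the `chr2` region starts, so under half-open coordinates the two do not overlap.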

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

[...]

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

[...]

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
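
To make the format described in this abstract concrete, the sketch below splits one VCF data line into its eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a plain-Python illustration, not the VCFtools Perl API: `parse_vcf_line` is a hypothetical helper, the sample record is illustrative, and header lines, per-sample genotype columns and typed INFO values are deliberately left out.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                                 # 1-based position
        "id": None if var_id == "." else var_id,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),     # comma-separated alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,                               # values kept as strings
    }


# Illustrative record: a G>A SNP on chromosome 20.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
record = parse_vcf_line(example)
print(record["chrom"], record["pos"], record["ref"], record["alt"], record["info"])
```

INFO values are left as strings because their types are declared in the file header, which this sketch does not read. Real files are usually compressed and indexed, as the abstract notes, so a dedicated library is the better choice for production use.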

10,164 citations