A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
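As a quick sanity check on the figures quoted in the abstract, the three variant classes can be summed to confirm the "over 88 million" total. This is a minimal sketch that uses only the rounded counts stated above; the dictionary keys are illustrative labels, not identifiers from the project's data release.

```python
# Variant counts as quoted in the abstract (rounded figures).
counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

# Sum the three classes to compare against the "over 88 million" claim.
total = sum(counts.values())

for kind, n in counts.items():
    print(f"{kind:>20}: {n:>11,} ({n / total:.2%} of total)")
print(f"{'total':>20}: {total:>11,}")
```

With these rounded inputs the classes add up to 88.36 million, consistent with the abstract's wording, and SNPs account for the large majority of the catalogued variants.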

Citations

Journal Article•DOI•

Genome-wide profiling of heritable and de novo STR variations

Thomas Willems, Dina Zielinski, Jie Yuan¹, Assaf Gordon, Melissa Gymrek², Yaniv Erlich¹•Institutions (2)

Columbia University¹, Harvard University²

01 Jun 2017-Nature Methods

TL;DR: HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, is described and a genome-wide analysis and validation of de novo STR mutations are reported.

Abstract: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, it has proven problematic to genotype STRs from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, and we report a genome-wide analysis and validation of de novo STR mutations. HipSTR is freely available at https://hipstr-tool.github.io/HipSTR.

208 citations

Journal Article•DOI•

Harnessing genomic information for livestock improvement

Michel Georges¹, Carole Charlier¹, Ben J. Hayes²•Institutions (2)

University of Liège¹, University of Queensland²

01 Mar 2019-Nature Reviews Genetics

TL;DR: Genomic information of increasing complexity (including genomic, epigenomic, transcriptomic and microbiome data), combined with technological advances for its cost-effective collection and use, will make a major contribution to tackling the looming food crisis.

Abstract: The world demand for animal-based food products is anticipated to increase by 70% by 2050. Meeting this demand in a way that has a minimal impact on the environment will require the implementation of advanced technologies, and methods to improve the genetic quality of livestock are expected to play a large part. Over the past 10 years, genomic selection has been introduced in several major livestock species and has more than doubled genetic progress in some. However, additional improvements are required. Genomic information of increasing complexity (including genomic, epigenomic, transcriptomic and microbiome data), combined with technological advances for its cost-effective collection and use, will make a major contribution.

208 citations

Journal Article•DOI•

Genome-wide association meta-analysis highlights light-induced signaling as a driver for refractive error

CREAM Consortium¹, UK Biobank Eye and Vision Consortium¹•Institutions (1)

Erasmus University Rotterdam¹

01 Jun 2018-Nature Genetics

TL;DR: The notion that refractive errors are caused by a light-dependent retina-to-sclera signaling cascade is supported and potential pathobiological molecular drivers are delineated.

Abstract: Refractive errors, including myopia, are the most frequent eye disorders worldwide and an increasingly common cause of blindness. This genome-wide association meta-analysis in 160,420 participants and replication in 95,505 participants increased the number of established independent signals from 37 to 161 and showed high genetic correlation between Europeans and Asians (>0.78). Expression experiments and comprehensive in silico analyses identified retinal cell physiology and light processing as prominent mechanisms, and also identified functional contributions to refractive-error development in all cell types of the neurosensory retina, retinal pigment epithelium, vascular endothelium and extracellular matrix. Newly identified genes implicate novel mechanisms such as rod-and-cone bipolar synaptic neurotransmission, anterior-segment morphology and angiogenesis. Thirty-one loci resided in or near regions transcribing small RNAs, thus suggesting a role for post-transcriptional regulation. Our results support the notion that refractive errors are caused by a light-dependent retina-to-sclera signaling cascade and delineate potential pathobiological molecular drivers.

207 citations

Posted Content•DOI•

Phased Diploid Genome Assembly with Single Molecule Real-Time Sequencing

Chen-Shan Chin¹, Paul Peluso¹, Fritz J. Sedlazeck², Maria Nattestad³, Gregory T. Concepcion¹, Alicia Clum⁴, Christopher Dunn¹, Ronan C. O'Malley⁵, Rosa Figueroa-Balderas⁶, Abraham Morales-Cruz⁶, Grant R. Cramer⁷, Massimo Delledonne⁸, Chongyuan Luo⁵, Joseph R. Ecker⁵, Dario Cantu⁶, David R. Rank¹, Michael C. Schatz²•Institutions (8)

Pacific Biosciences¹, Johns Hopkins University², Cold Spring Harbor Laboratory³, Joint Genome Institute⁴, Salk Institute for Biological Studies⁵, University of California, Davis⁶, University of Nevada, Reno⁷, University of Verona⁸

03 Jun 2016-bioRxiv

TL;DR: The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches, and enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.

Abstract: While genome assembly projects have been successful in a number of haploid or inbred species, one of the current main challenges is assembling non-inbred or rearranged heterozygous genomes. To address this critical need, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble Single Molecule Real-Time (SMRT®) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples, including an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata that have challenged short-read assembly approaches. The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches. The phased diploid assembly enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.

205 citations

Journal Article•DOI•

Dysregulation of the epigenetic landscape of normal aging in Alzheimer's disease.

Raffaella Nativio¹, Greg Donahue¹, Amit Berson¹, Yemin Lan¹, Alexandre Amlie-Wolf¹, Ferit Tuzer², Jon B. Toledo¹, Sager J. Gosai¹, Brian D. Gregory¹, Claudio Torres², John Q. Trojanowski¹, Li-San Wang¹, F. Brad Johnson¹, Nancy M. Bonini¹, Shelley L. Berger¹•Institutions (2)

University of Pennsylvania¹, Drexel University²

05 Mar 2018-Nature Neuroscience

TL;DR: By comparing the genome-wide profile of H4K16ac in AD with younger and elderly controls, the authors propose a mechanism for how age is a risk factor for AD: a histone modification, whose accumulation is associated with aging, is dysregulated in AD.

Abstract: Aging is the strongest risk factor for Alzheimer’s disease (AD), although the underlying mechanisms remain unclear. The chromatin state, in particular through the mark H4K16ac, has been implicated in aging and thus may play a pivotal role in age-associated neurodegeneration. Here we compare the genome-wide enrichment of H4K16ac in the lateral temporal lobe of AD individuals against both younger and elderly cognitively normal controls. We found that while normal aging leads to H4K16ac enrichment, AD entails dramatic losses of H4K16ac in the proximity of genes linked to aging and AD. Our analysis highlights the presence of three classes of AD-related changes with distinctive functional roles. Furthermore, we discovered an association between the genomic locations of significant H4K16ac changes with genetic variants identified in prior AD genome-wide association studies and with expression quantitative trait loci. Our results establish the basis for an epigenetic link between aging and AD.

205 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
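To make the idea of a "generic alignment format" concrete, the sketch below splits one SAM alignment line into its eleven mandatory tab-separated fields. It is an illustration only and does not use SAMtools; the `SamRecord` type, the `parse_sam_line` helper and the example read are hypothetical names and data introduced here.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory fields of a SAM alignment line, in file order."""
    qname: str   # read (query) name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tag fields after column 11 are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Hypothetical single-end read aligned to chr1 at position 100.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
is_unmapped = bool(record.flag & 0x4)  # flag bit 0x4 marks an unmapped read
print(record.rname, record.pos, record.cigar, is_unmapped)
```

For real data a maintained parser is the safer choice, since header lines, optional tags and the binary BAM encoding are all outside the scope of this sketch.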

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: BEDTools is a software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format; it allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
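The core operation described here, searching for overlaps between two sets of genomic features, can be sketched in a few lines. The example below is a naive all-pairs comparison, not the method BEDTools itself uses, and it assumes BED-style coordinates (0-based start, exclusive end). The `Interval` type and both helper functions are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if none or different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect(features_a: List[Interval],
              features_b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (a, b) pair that shares at least one base. O(n*m) for clarity."""
    return [(a, b) for a in features_a for b in features_b if overlap_bp(a, b) > 0]


set_a = [Interval("chr1", 100, 200), Interval("chr2", 50, 60)]
set_b = [Interval("chr1", 150, 250), Interval("chr1", 200, 300)]

pairs = intersect(set_a, set_b)
for a, b in pairs:
    print(a, b, overlap_bp(a, b))
```

Because the end coordinate is exclusive, `chr1:100-200` and `chr1:200-300` are adjacent rather than overlapping. The quadratic loop is only practical for small inputs; datasets of the size the abstract mentions need sorted or indexed interval structures.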

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
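As a rough illustration of how a VCF record carries a variant "together with rich annotations", the sketch below parses the eight fixed columns of a single data line and unpacks the INFO column into a dictionary. It does not use VCFtools or its Perl API; `parse_vcf_line` is a hypothetical helper, and the example record (including its ID and INFO values) is made up for demonstration.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed VCF columns; genotype columns, if any, are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flag keys.
    info_dict: Dict[str, Any] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                       # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                 # one or more alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


# Made-up record: an insertion with an allele-frequency annotation and a flag.
example = "1\t10177\texample_id\tA\tAC\t100\tPASS\tAF=0.42;DB"
variant = parse_vcf_line(example)
print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"], variant["info"])
```

Header lines (those starting with `#`) and per-sample genotype columns are deliberately left out; compressed and indexed files, as described in the abstract, call for a dedicated library.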

10,164 citations