A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
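
The variant counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the rounded figures given above (84.7 million SNPs, 3.6 million indels, 60,000 structural variants); the variable names are illustrative and the shares are only as precise as that rounding.

```python
# Rounded variant counts as quoted in the abstract.
snps = 84_700_000
indels = 3_600_000
structural_variants = 60_000

# The abstract reports "in total over 88 million variants".
total = snps + indels + structural_variants

# Share of each variant class in the total call set.
shares = {
    "SNPs": snps / total,
    "indels": indels / total,
    "structural variants": structural_variants / total,
}

print(f"total variants: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The sum comes to 88,360,000, consistent with the "over 88 million" stated in the abstract, and SNPs make up the large majority of the catalogue.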

Citations

Journal Article•DOI•

Telomere length and genetic variant associations with interstitial lung disease progression and survival

Chad A. Newton¹, Justin M. Oldham², Brett Ley³, Vikram Anand¹, Ayodeji Adegunsoye⁴, Gabrielle Y. Liu³, Kiran Batra¹, Jose R. Torrealba¹, Julia Kozlitina¹, Craig S. Glazer¹, Mary E. Strek⁴, Paul J. Wolters³, Imre Noth⁵, Christine Kim Garcia¹•Institutions (5)

University of Texas Southwestern Medical Center¹, University of California, Davis², University of California, San Francisco³, University of Chicago⁴, University of Virginia⁵

01 Apr 2019-European Respiratory Journal

TL;DR: Leukocyte telomere length and MUC5B minor allele frequency are similar for IPAF and the combined CTD-ILD group; however, the associations between these genomic markers and clinical outcomes are different for these two types of ILD.

Abstract: Leukocyte telomere length (LTL), MUC5B rs35705950 and TOLLIP rs5743890 have been associated with idiopathic pulmonary fibrosis (IPF). In this observational cohort study, we assessed the associations between these genomic markers and outcomes of survival and rate of disease progression in patients with interstitial pneumonia with autoimmune features (IPAF, n=250) and connective tissue disease-associated interstitial lung disease (CTD-ILD, n=248). IPF (n=499) was used as a comparator. The LTL of IPAF and CTD-ILD patients (mean age-adjusted log-transformed T/S of −0.05±0.29 and −0.04±0.25, respectively) is longer than that of IPF patients (−0.17±0.32). For IPAF patients, LTL [...] LTL and MUC5B MAF have different associations with lung function progression and survival for IPAF and CTD-ILD.

107 citations

Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls

Jason Flannick, Josep M. Mercader, Christian Fuchsberger, Miriam S. Udler +160 more

01 Jan 2019

TL;DR: The authors used exome-sequencing analyses of a large cohort of patients with type 2 diabetes and control individuals without diabetes, drawn from five ancestries, to identify gene-level associations of rare variants with type 2 diabetes.

Abstract: Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000–185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.

107 citations

Journal Article•DOI•

SNPnexus: a web server for functional annotation of human genome sequence variation (2020 update).

Jorge Oscanoa¹, Lavanya Sivapalan¹, Emanuela Gadaleta¹, Abu Z. Dayem Ullah¹, Nicholas R. Lemoine¹, Claude Chelala¹•Institutions (1)

Queen Mary University of London¹

02 Jul 2020-Nucleic Acids Research

TL;DR: The scope for data annotation has been substantially expanded to enhance biological interpretations of queried variants and this includes the addition of pathway analysis for the identification of enriched biological pathways and molecular processes.

Abstract: SNPnexus is a web-based annotation tool for the analysis and interpretation of both known and novel sequencing variations. Since its last release, SNPnexus has received continual updates to expand the range and depth of annotations provided. SNPnexus has undergone a complete overhaul of the underlying infrastructure to accommodate faster computational times. The scope for data annotation has been substantially expanded to enhance biological interpretations of queried variants. This includes the addition of pathway analysis for the identification of enriched biological pathways and molecular processes. We have further expanded the range of user directed annotation fields available for the study of cancer sequencing data. These new additions facilitate investigations into cancer driver variants and targetable molecular alterations within input datasets. New user directed filtering options have been coupled with the addition of interactive graphical and visualization tools. These improvements streamline the analysis of variants derived from large sequencing datasets for the identification of biologically and clinically significant subsets in the data. SNPnexus is the most comprehensible web-based application currently available and this new set of updates ensures that it remains a state-of-the-art tool for researchers. SNPnexus is freely available at https://www.snp-nexus.org.

107 citations

Journal Article•DOI•

The importance of p53 pathway genetics in inherited and somatic cancer genomes

Giovanni Stracquadanio¹, Xuting Wang², Marsha D. Wallace¹, Anna M. Grawenda¹, Ping Zhang¹, Juliet Hewitt¹, Jorge Zeron-Medina³, Francesc Castro-Giner⁴, Ian Tomlinson⁴, Colin R. Goding¹, Kamil J. Cygan⁵, William G. Fairbrother⁵, Laurent F. Thomas⁶, Pål Sætrom⁶, Federica Gemignani⁷, Stefano Landi⁷, Benjamin Schuster-Böckler¹, Douglas A. Bell², Gareth L. Bond¹•Institutions (7)

Ludwig Institute for Cancer Research¹, National Institutes of Health², Hebron University³, Wellcome Trust Centre for Human Genetics⁴, Brown University⁵, Norwegian University of Science and Technology⁶, University of Pisa⁷

01 Apr 2016-Nature Reviews Cancer

TL;DR: Using newly abundant genomic data, it is demonstrated that commonly inherited genetic variants in the p53 pathway also affect the incidence of a broad range of cancers more than variants in other pathways.

Abstract: Decades of research have shown that mutations in the p53 stress response pathway affect the incidence of diverse cancers more than mutations in other pathways. However, most evidence is limited to somatic mutations and rare inherited mutations. Using newly abundant genomic data, we demonstrate that commonly inherited genetic variants in the p53 pathway also affect the incidence of a broad range of cancers more than variants in other pathways. The cancer-associated single nucleotide polymorphisms (SNPs) of the p53 pathway have strikingly similar genetic characteristics to well-studied p53 pathway cancer-causing somatic mutations. Our results enable insights into p53-mediated tumour suppression in humans and into p53 pathway-based cancer surveillance and treatment strategies.

107 citations

Journal Article•DOI•

Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences.

Fanny Pouyet¹,², Simon Aeschbacher¹,²,³, Alexandre Thiéry¹,², Laurent Excoffier¹,²•Institutions (3)

University of Bern¹, Swiss Institute of Bioinformatics², University of Zurich³

23 Aug 2018-eLife

TL;DR: High-quality human genomic data is used to show that purifying selection at linked sites and GC-biased gene conversion together affect as much as 95% of the variants of the genome, and identifies a set of SNPs that are mostly unaffected by BGS or gBGC.

Abstract: Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.

107 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, including indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: BEDTools is a software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats; it allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
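
To make the format described in this abstract concrete, here is a minimal Python sketch that splits one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not part of VCFtools, whose API is Perl. The `VariantRecord` class, the `parse_vcf_line` function and the example record are all hypothetical and chosen for illustration only. Header lines, genotype columns and type conversion of INFO values are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# The eight fixed, tab-separated columns of a VCF data line.
FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


@dataclass
class VariantRecord:
    """Hypothetical container for the fixed fields of one VCF data line."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: str
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse the fixed columns of a single VCF data line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(FIXED_COLUMNS):
        raise ValueError(f"expected at least {len(FIXED_COLUMNS)} columns, got {len(fields)}")
    chrom, pos, variant_id, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of key=value pairs or bare flags.
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    # ALT may list several alternate alleles separated by commas.
    return VariantRecord(chrom, int(pos), variant_id, ref, alt.split(","), qual, flt, info_dict)


# Invented example record, for illustration only.
example = "1\t12345\trsExample\tA\tG,T\t50\tPASS\tAF=0.25;DB"
record = parse_vcf_line(example)
print(record)
```

Production work would normally rely on an established VCF library rather than hand-written parsing. The sketch is only meant to show how a single line carries position, alleles and annotations together.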

10,164 citations