A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
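
As a quick arithmetic check on the counts quoted in the abstract, the three variant classes can be summed. This is a minimal sketch that uses only the rounded figures above, so the total is approximate; the variable and function names are illustrative and not taken from the paper.

```python
# Rounded counts quoted in the abstract, in millions of variants.
variant_counts_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000
}

total_millions = sum(variant_counts_millions.values())


def share(kind: str) -> float:
    """Fraction of all reported variants that belong to one class."""
    return variant_counts_millions[kind] / total_millions


print(f"total: {total_millions:.2f} million")
for kind in variant_counts_millions:
    print(f"{kind}: {share(kind):.1%}")
```

The sum of the rounded figures is consistent with the abstract's "over 88 million variants", and SNPs account for the large majority of them.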

Citations

Journal Article•DOI•

Next-generation genotype imputation service and methods.

Sayantan Das¹, Lukas Forer², Sebastian Schönherr², Carlo Sidore¹,³, Adam E. Locke¹, Alan Kwong¹, Scott I. Vrieze⁴, Emily Y. Chew⁵, Shawn Levy, Matt McGue⁶, David Schlessinger⁵, Dwight Stambolian⁷, Po-Ru Loh⁸, William G. Iacono⁶, Anand Swaroop⁵, Laura J. Scott¹, Francesco Cucca³, Florian Kronenberg², Michael Boehnke¹, Gonçalo R. Abecasis¹, Christian Fuchsberger¹,²,⁹•Institutions (9)

University of Michigan¹, Innsbruck Medical University², University of Sassari³, University of Colorado Boulder⁴, National Institutes of Health⁵, University of Minnesota⁶, University of Pennsylvania⁷, Harvard University⁸, University of Lübeck⁹

01 Oct 2016-Nature Genetics

TL;DR: Improvements to imputation machinery are described that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools.

Abstract: Christian Fuchsberger, Goncalo Abecasis and colleagues describe a new web-based imputation service that enables rapid imputation of large numbers of samples and allows convenient access to large reference panels of sequenced individuals. Their state space reduction provides a computationally efficient solution for genotype imputation with no loss in imputation accuracy.

2,556 citations

Journal Article•DOI•

The MR-Base platform supports systematic causal inference across the human phenome

Gibran Hemani¹, Jie Zheng¹, Benjamin Elsworth¹, Kaitlin H Wade¹, Valeriia Haberland¹, Denis Baird¹, Charles Laurin¹, Stephen Burgess², Jack Bowden¹, Ryan Langdon¹, Vanessa Y Tan¹, James Yarmolinsky¹, Hashem A Shihab¹, Nicholas J. Timpson¹, David M. Evans¹,³, Caroline L Relton¹, Richard M. Martin¹, George Davey Smith¹, Tom R. Gaunt¹, Philip C Haycock¹•Institutions (3)

Medical Research Council¹, University of Cambridge², University of Queensland³

30 May 2018-eLife

TL;DR: MR-Base is a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR, and includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions.

Abstract: Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base ( http://www.mrbase.org ): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.

2,520 citations

Journal Article•DOI•

Twelve years of SAMtools and BCFtools.

Petr Danecek¹, James K. Bonfield¹, Jennifer Liddle¹, John Marshall², Valeriu Ohan¹, Martin O. Pollard¹, Andrew Whitwham¹, Thomas M. Keane³, Shane A. McCarthy¹, Robert L. Davies¹, Heng Li⁴•Institutions (4)

Wellcome Trust Sanger Institute¹, University of Glasgow², European Bioinformatics Institute³, Harvard University⁴

01 Feb 2021-GigaScience

TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.

Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Journal Article•DOI•

Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases.

Marie Verbanck¹, Chia-Yen Chen², Benjamin M. Neale²,³, Ron Do¹•Institutions (3)

Icahn School of Medicine at Mount Sinai¹, Harvard University², Broad Institute³

23 Apr 2018-Nature Genetics

TL;DR: The MR-PRESSO test detects and corrects horizontal pleiotropy in multi-instrument Mendelian randomization (MR) analyses. Horizontal pleiotropy introduced distortions in the MR causal estimates that ranged on average from –131% to 201%, and simulations show that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments.

Abstract: Horizontal pleiotropy occurs when the variant has an effect on disease outside of its effect on the exposure in Mendelian randomization (MR). Violation of the ‘no horizontal pleiotropy’ assumption can cause severe bias in MR. We developed the Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) test to identify horizontal pleiotropic outliers in multi-instrument summary-level MR testing. We showed using simulations that the MR-PRESSO test is best suited when horizontal pleiotropy occurs in <50% of instruments. Horizontal pleiotropy was detectable in over 48% of significant causal relationships in MR.

2,362 citations

Journal Article•DOI•

ClinVar: improving access to variant interpretations and supporting evidence.

Melissa J. Landrum¹, Jennifer M. Lee¹, Mark L. Benson¹, Garth Brown¹, Chen Chao¹, Shanmuga Chitipiralla¹, Baoshan Gu¹, Jennifer Hart¹, Douglas W. Hoffman¹, Wonhee Jang¹, Karen Karapetyan¹, Kenneth S. Katz¹, Chunlei Liu¹, Zenith Maddipatla¹, A. J. Malheiro¹, Kurt McDaniel¹, Michael Ovetsky¹, George R. Riley¹, George Zhou¹, J. Bradley Holmes¹, Brandi L. Kattman¹, Donna Maglott¹•Institutions (1)

National Institutes of Health¹

04 Jan 2018-Nucleic Acids Research

TL;DR: ClinVar continues to make improvements to its search and retrieval functions.

Abstract: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease, maintained at the National Institutes of Health. Interpretations of the clinical significance of variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups. ClinVar aggregates data by variant-disease pairs, and by variant (or set of variants). Data aggregated by variant are accessible on the website, in an improved set of variant call format files and as a new comprehensive XML report. ClinVar recently started accepting submissions that are focused primarily on providing phenotypic information for individuals who have had genetic testing. Submissions may come from clinical providers providing their own interpretation of the variant ('provider interpretation') or from groups such as patient registries that primarily provide phenotypic information from patients ('phenotyping only'). ClinVar continues to make improvements to its search and retrieval functions. Several new fields are now indexed for more precise searching, and filters allow the user to narrow down a large set of search results.

2,345 citations

Cites background from "A global reference for human geneti..."

...(ii) The former AF INFO tag was split into three tags, one for each source of allele frequency data: AF_ESP for GO-ESP [https://esp.gs.washington.edu/drupal/]; AF_EXAC for the ExAC Consortium (6); and AF_TGP for the 1000 Genomes Project (7)....
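
The excerpt above describes one allele-frequency INFO tag per data source. As a minimal sketch of how such tags might be read, the snippet below pulls the three values out of a VCF INFO string, using the tag names as ClinVar VCF files write them, with underscores (AF_ESP, AF_EXAC, AF_TGP). The function name `allele_frequencies` and the example INFO string are hypothetical, and the frequency values are made up for illustration.

```python
from typing import Dict, Optional

# One tag per allele-frequency source, as listed in the excerpt.
AF_TAGS = {
    "AF_ESP": "GO-ESP",
    "AF_EXAC": "ExAC Consortium",
    "AF_TGP": "1000 Genomes Project",
}


def allele_frequencies(info: str) -> Dict[str, Optional[float]]:
    """Return the allele frequency per source from a VCF INFO string.

    Sources whose tag is absent, or set to the VCF missing value ".",
    map to None.
    """
    fields = {}
    for item in info.split(";"):
        if "=" in item:  # flag-style entries carry no value
            key, value = item.split("=", 1)
            fields[key] = value
    return {
        source: float(fields[tag]) if fields.get(tag, ".") != "." else None
        for tag, source in AF_TAGS.items()
    }


# Hypothetical INFO string; the numbers are invented for this example.
example_info = "AF_ESP=0.0012;AF_TGP=0.0008;CLNSIG=Benign"
print(allele_frequencies(example_info))
```

Keeping the sources in separate tags means a record can carry a frequency from one project and none from another, which is why the sketch returns None instead of a default value.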

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
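
The core question the abstract describes, whether two genomic features overlap, can be sketched in a few lines. The snippet below is a naive pairwise comparison written for illustration only; it is not the algorithm BEDTools uses, and the names `Interval`, `parse_bed_line`, `overlaps` and `intersect`, along with the example coordinates, are assumptions made for this example. It relies on the BED convention of 0-based, half-open intervals.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed_line(line: str) -> Interval:
    """Read the first three tab-separated BED columns."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(features: List[Interval],
              annotations: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive O(n*m) scan over all pairs; fine for small inputs only."""
    return [(f, g) for f in features for g in annotations if overlaps(f, g)]


# Made-up coordinates for illustration.
reads = [parse_bed_line(s) for s in
         ("chr1\t100\t200", "chr1\t300\t400", "chr2\t100\t200")]
genes = [parse_bed_line(s) for s in
         ("chr1\t150\t350", "chr1\t400\t500")]

pairs = intersect(reads, genes)
for read, gene in pairs:
    print(read, "overlaps", gene)
```

Because the intervals are half-open, the read ending at 400 and the gene starting at 400 are adjacent but do not overlap. A tool built for the massive datasets the abstract mentions would need an indexed or sorted approach instead of this all-pairs scan.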

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

The ENCODE Project Consortium (principal investigators¹, NHGRI groups², data production leads³, lead analysts³)•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
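
To make the abstract's description of VCF concrete, the sketch below parses the eight fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and labels each alternate allele as a SNP, insertion or deletion. It does not use VCFtools or its Perl API; the names `VcfRecord`, `parse_vcf_line` and `classify` are hypothetical, the length-based classification rule is a simplification, and the example records are invented for illustration.

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int          # 1-based position on the reference
    variant_id: str
    ref: str
    alts: List[str]   # ALT column split on commas
    info: str


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    cols = line.rstrip("\n").split("\t")[:8]
    chrom, pos, variant_id, ref, alt, _qual, _filter, info = cols
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","), info)


def classify(ref: str, alt: str) -> str:
    """Simplified allele classification based on allele lengths."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic/structural"  # e.g. <DEL> or breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


# Invented records for illustration.
example_lines = [
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
    "20\t1234567\t.\tGTC\tG,GTCT\t50\tPASS\tDP=9",
]

for line in example_lines:
    record = parse_vcf_line(line)
    kinds = [classify(record.ref, alt) for alt in record.alts]
    print(record.chrom, record.pos, record.ref, record.alts, kinds)
```

A single line can carry several alternate alleles, as the second record shows, so classification is done per allele and not per line.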

10,164 citations