A global reference for human genetic variation.

doi:10.1038/NATURE15393

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
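
The variant totals quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures given above, so the sum is approximate rather than an exact call-set size, and the variable names are illustrative.

```python
# Rounded variant counts as quoted in the abstract (not exact call-set sizes).
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(variant_counts.values())

# Share of each variant class in the rounded total.
shares = {name: count / total for name, count in variant_counts.items()}

print(f"total variants: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The rounded figures sum to 88,360,000, consistent with the abstract's "over 88 million variants".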

Citations

Journal Article

Estimating the human mutation rate from autozygous segments reveals population differences in human mutational processes

Vagheesh M. Narasimhan¹, Raheleh Rahbari¹, Aylwyn Scally², Arthur Wuster¹,³, Dan Mason⁴, Yali Xue¹, John Wright⁴, Richard C. Trembath⁵, Eamonn R. Maher², David A. van Heel⁶, Adam Auton⁷, Matthew E. Hurles¹, Chris Tyler-Smith¹, Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of Cambridge², Genentech³, National Health Service⁴, King's College London⁵, Queen Mary University of London⁶, Albert Einstein College of Medicine⁷

21 Aug 2017-Nature Communications

TL;DR: A multi-generational estimate from the autozygous segment in a non-European population that gives insight into the contribution of post-zygotic mutations and population-specific mutational processes is presented.

Abstract: Heterozygous mutations within homozygous sequences descended from a recent common ancestor offer a way to ascertain de novo mutations across multiple generations. Using exome sequences from 3222 British-Pakistani individuals with high parental relatedness, we estimate a mutation rate of (1.45 ± 0.05) × 10⁻⁸ per base pair per generation in autosomal coding sequence, with a corresponding non-crossover gene conversion rate of (8.75 ± 0.05) × 10⁻⁶ per base pair per generation. This is at the lower end of exome mutation rates previously estimated in parent–offspring trios, suggesting that post-zygotic mutations contribute little to the human germ-line mutation rate. We find frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in a 5ʹ CCG 3ʹ to 5ʹ CTG 3ʹ context in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations. Estimates of human mutation rates differ substantially based on the approach. Here, the authors present a multi-generational estimate from the autozygous segment in a non-European population that gives insight into the contribution of post-zygotic mutations and population-specific mutational processes.

92 citations

Journal Article

Actionable Activating Oncogenic ERBB2/HER2 Transmembrane and Juxtamembrane Domain Mutations

Kanika Bajaj Pahuja¹, Thong T. Nguyen¹, Bijay S. Jaiswal¹, Kumar Prabhash², Tarjani M. Thaker³, Kate Senger¹, Subhra Chaudhuri¹, Noelyn M. Kljavin¹, Aju Antony, Sameer Phalke, Prasanna Kumar, Marco Mravic³, Eric Stawiski¹, Derek Vargas, Steffen Durinck¹, Ravi Gupta, Arati Khanna-Gupta, Sally E. Trabucco⁴, Ethan Sokol⁴, Ryan J. Hartmaier⁴, Ashish Singh⁵, Anuradha Chougule², Vaishakhi Trivedi², Amit Dutt⁶, Vijay Patil², Amit Joshi², Vanita Noronha², James Ziai¹, S D Banavali², Vedam L. Ramprasad, William F. DeGrado³, Raphael Bueno⁷, Natalia Jura³, Somasekar Seshagiri¹

Genentech¹, Tata Memorial Hospital², University of California, San Francisco³, Foundation Medicine⁴, Christian Medical College & Hospital⁵, Homi Bhabha National Institute⁶, Brigham and Women's Hospital⁷

12 Nov 2018-Cancer Cell

TL;DR: Structural modeling and analysis showed that the TMD/JMD mutations function by improving the active dimer interface or stabilizing an activating conformation, and it was found that HER2 G660D employed asymmetric kinase dimerization for activation and signaling.

92 citations

Journal Article

Large-Scale Identification of Common Trait and Disease Variants Affecting Gene Expression.

Mads E. Hauberg, Wen Zhang¹, Claudia Giambartolomei¹, Oscar Franzén¹, David L. Morris², Timothy J. Vyse², Arno Ruusalepp³,⁴, Menachem Fromer, Solveig K. Sieberts¹, Jessica S. Johnson¹, Douglas M. Ruderfer¹, Hardik Shah¹, Lambertus Klei, Kristen K. Dang, Thanneer M. Perumal, Benjamin A. Logsdon, Milind Mahajan, Lara M. Mangravite, Laurent Essioux, Hiroyoshi Toyoshiba, Raquel E. Gur, Chang-Gyu Hahn, David A. Lewis, Vahram Haroutunian, Mette A. Peters, Barbara K. Lipska, Joseph D. Buxbaum, Keisuke Hirai, Enrico Domenici, Bernie Devlin, Pamela Sklar¹, Eric E. Schadt¹, Johan Björkegren, Panos Roussos¹,⁵

Icahn School of Medicine at Mount Sinai¹, King's College London², University of Tartu³, Tartu University Hospital⁴, Veterans Health Administration⁵

01 Jun 2017-American Journal of Human Genetics

TL;DR: To facilitate interpretation, a lexicon of how common trait-associated genetic variants alter gene expression in various tissues is provided as the online database GWAS2Genes.

Abstract: Genome-wide association studies (GWASs) have identified a multitude of genetic loci involved with traits and diseases. However, it is often unclear which genes are affected in such loci and whether the associated genetic variants lead to increased or decreased gene function. To mitigate this, we integrated associations of common genetic variants in 57 GWASs with 24 studies of expression quantitative trait loci (eQTLs) from a broad range of tissues by using a Mendelian randomization approach. We discovered a total of 3,484 instances of gene-trait-associated changes in expression at a false-discovery rate < 0.05. These genes were often not closest to the genetic variant and were primarily identified in eQTLs derived from pathophysiologically relevant tissues. For instance, genes with expression changes associated with lipid traits were mostly identified in the liver, and those associated with cardiovascular disease were identified in arterial tissue. The affected genes additionally point to biological processes implicated in the interrogated traits, such as the interleukin-27 pathway in rheumatoid arthritis. Further, comparing trait-associated gene expression changes across traits suggests that pleiotropy is a widespread phenomenon and points to specific instances of both agonistic and antagonistic pleiotropy. For instance, expression of SNX19 and ABCB9 is positively correlated with both the risk of schizophrenia and educational attainment. To facilitate interpretation, we provide this lexicon of how common trait-associated genetic variants alter gene expression in various tissues as the online database GWAS2Genes.

91 citations

Journal Article

Multi-omics differentially classify disease state and treatment outcome in pediatric Crohn’s disease

Gavin M. Douglas¹, Richard Hansen, Casey M. A. Jones¹, Katherine A. Dunn¹, André M. Comeau¹, Joseph P. Bielawski¹, Rachel Tayler, Emad M. El-Omar², Richard K. Russell, Georgina L. Hold², Morgan G. I. Langille¹, Johan Van Limbergen¹

Dalhousie University¹, University of New South Wales²

15 Jan 2018-Microbiome

TL;DR: It is demonstrated for the first time that useful predictors of CD treatment response can be produced from shotgun MGS sequencing of biopsy samples despite the complications related to large proportions of host DNA.

Abstract: Crohn’s disease (CD) has an unclear etiology, but there is growing evidence of a direct link with a dysbiotic microbiome. Many gut microbes have previously been associated with CD, but these have mainly been confounded with patients’ ongoing treatments. Additionally, most analyses of CD patients’ microbiomes have focused on microbes in stool samples, which yield different insights than profiling biopsy samples. We sequenced the 16S rRNA gene (16S) and carried out shotgun metagenomics (MGS) from the intestinal biopsies of 20 treatment-naive CD and 20 control pediatric patients. We identified the abundances of microbial taxa and inferred functional categories within each dataset. We also identified known human genetic variants from the MGS data. We then used a machine learning approach to determine the classification accuracy when these datasets, collapsed to different hierarchical groupings, were used independently to classify patients by disease state and by CD patients’ response to treatment. We found that 16S-identified microbes could classify patients with higher accuracy in both cases. Based on follow-ups with these patients, we identified which microbes and functions were best for predicting disease state and response to treatment, including several previously identified markers. By combining the top features from all significant models into a single model, we could compare the relative importance of these predictive features. We found that 16S-identified microbes are the best predictors of CD state whereas MGS-identified markers perform best for classifying treatment response. We demonstrate for the first time that useful predictors of CD treatment response can be produced from shotgun MGS sequencing of biopsy samples despite the complications related to large proportions of host DNA. 
The top predictive features that we identified in this study could be useful for building an improved classifier for CD and treatment response based on sufferers’ microbiome in the future. The BISCUIT project is funded by a Clinical Academic Fellowship from the Chief Scientist Office (Scotland)—CAF/08/01.

91 citations

Journal Article

Placenta and appetite genes GDF15 and IGFBP7 are associated with hyperemesis gravidarum.

Marlena S. Fejzo¹,², Olga V. Sazonova, J. Fah Sathirapongsasuti, Ingileif B. Hallgrímsdóttir³, Vladimir Vacic, Kimber MacGibbon, Frederic Paik Schoenberg¹, Nicholas Mancuso¹, Dennis J. Slamon¹, Patrick M. Mullin²

University of California, Los Angeles¹, University of Southern California², Amgen³

21 Mar 2018-Nature Communications

TL;DR: A genome-wide association study for binary (HG) and ordinal (severity of nausea and vomiting) phenotypes of pregnancy complications identifies genetic associations at two loci implicating the genes GDF15 and IGFBP7, providing insights into the genetic risk factors contributing to the disease.

Abstract: Hyperemesis gravidarum (HG), severe nausea and vomiting of pregnancy, occurs in 0.3-2% of pregnancies and is associated with maternal and fetal morbidity. The cause of HG remains unknown, but familial aggregation and results of twin studies suggest that understanding the genetic contribution is essential for comprehending the disease etiology. Here, we conduct a genome-wide association study (GWAS) for binary (HG) and ordinal (severity of nausea and vomiting) phenotypes of pregnancy complications. Two loci, chr19p13.11 and chr4q12, are genome-wide significant (p < 5 × 10⁻⁸) in both association scans and are replicated in an independent cohort. The genes implicated at these two loci are GDF15 and IGFBP7 respectively, both known to be involved in placentation, appetite, and cachexia. While proving the causal roles of GDF15 and IGFBP7 in nausea and vomiting of pregnancy requires further study, this GWAS provides insights into the genetic risk factors contributing to the disease.

91 citations

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
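
To make the format description more concrete, here is a minimal sketch of reading one VCF data line in Python. It is not VCFtools' code or API: the helper names `parse_vcf_line` and `classify` are hypothetical, the example record is invented, and the classification by allele length is a deliberate simplification. Only the eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) are taken from the VCF specification.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alts: List[str]
    qual: str
    filter: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_map: Dict[str, str] = {}
    if info != ".":  # "." marks a missing INFO field
        for field in info.split(";"):
            # Flag entries such as "DB" carry no "=value" part.
            key, _, value = field.partition("=")
            info_map[key] = value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), qual, flt, info_map)


def classify(record: VcfRecord) -> str:
    """Rough variant class from allele shapes (a simplification)."""
    if any(a.startswith("<") for a in record.alts):
        return "structural (symbolic allele)"
    if all(len(a) == len(record.ref) == 1 for a in record.alts):
        return "SNP"
    return "indel or other"


# Invented example record; the values are illustrative only.
example = "1\t12345\t.\tG\tA\t50\tPASS\tDP=20;AF=0.25;DB\n"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.ref, record.alts, classify(record))
```

Real files also carry `##` meta-information lines, a `#CHROM` header line and optional per-sample genotype columns, all of which this sketch ignores.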

10,164 citations