A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin and 514 more authors (90 institutions)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
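As a quick consistency check, the three variant classes listed in the abstract (with the 60,000 structural variants expressed in millions) sum to the stated total of over 88 million:

```latex
\underbrace{84.7}_{\text{SNPs}} + \underbrace{3.6}_{\text{indels}} + \underbrace{0.06}_{\text{SVs}} = 88.36 \ \text{million}
```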

Citations

Journal Article

Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution.

Reedik Mägi¹, Momoko Horikoshi², Tamar Sofer³, Anubha Mahajan², Hidetoshi Kitajima², Nora Franceschini⁴, Mark I. McCarthy²,⁵, Andrew P. Morris

University of Tartu¹, Wellcome Trust Centre for Human Genetics², University of Washington³, University of North Carolina at Chapel Hill⁴, University of Oxford⁵

15 Sep 2017-Human Molecular Genetics

TL;DR: A novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry; simulations demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis.

Abstract: Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, a novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression to model allelic effects as a function of axes of genetic variation, derived from a matrix of mean pairwise allele frequency differences between GWAS, and implemented in the MR-MEGA software. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
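The two ingredients named in this abstract, a matrix of mean pairwise allele frequency differences between GWAS and a regression of allelic effects on an axis of genetic variation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MR-MEGA software: it takes "difference" to mean the absolute difference averaged over shared SNPs, uses a single axis whose per-GWAS values are supplied directly rather than derived from the matrix, and fits only an inverse-variance weighted least-squares line, without MR-MEGA's association or heterogeneity tests. Both function names are hypothetical.

```python
from itertools import combinations


def mean_af_difference_matrix(freqs):
    """Pairwise GWAS distance: mean absolute allele-frequency difference
    over SNPs (an assumed definition, see the lead-in).

    freqs[k][v] is the allele frequency of SNP v in GWAS k; every GWAS
    must report the same SNPs in the same order.
    """
    n = len(freqs)
    dist = [[0.0] * n for _ in range(n)]
    for k, j in combinations(range(n), 2):
        diffs = [abs(a - b) for a, b in zip(freqs[k], freqs[j])]
        dist[k][j] = dist[j][k] = sum(diffs) / len(diffs)
    return dist


def weighted_meta_regression(betas, ses, axis):
    """Fit beta_k = b0 + b1 * x_k by inverse-variance weighted least squares.

    betas, ses: per-GWAS effect estimates and standard errors at one SNP.
    axis: per-GWAS coordinate on one axis of genetic variation (given).
    A non-zero b1 means the allelic effect changes along that axis.
    """
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, axis)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, betas)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, axis, betas))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, axis))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

Weighting by the inverse squared standard error lets the more precise GWAS dominate the fit, which is the usual choice in fixed-effects meta-analysis as well.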

142 citations

Journal Article

Host genetic variation and its microbiome interactions within the Human Microbiome Project

Raivo Kolde¹, Eric A. Franzosa¹,², Gholamali Rahnavard¹,², Andrew Brantley Hall², Hera Vlamakis², Christine Stevens², Mark J. Daly¹,², Ramnik J. Xavier¹,²,³, Curtis Huttenhower¹,²

Harvard University¹, Broad Institute², Massachusetts Institute of Technology³

29 Jan 2018-Genome Medicine

TL;DR: The authors performed whole-genome sequencing of 298 donors from the Human Microbiome Project (HMP) healthy cohort study to accompany existing deep characterization of their microbiomes at various body sites.

Abstract: Despite the increasing recognition that microbial communities within the human body are linked to health, we have an incomplete understanding of the environmental and molecular interactions that shape the composition of these communities. Although host genetic factors play a role in these interactions, these factors have remained relatively unexplored given the requirement for large population-based cohorts in which both genotyping and microbiome characterization have been performed. We performed whole-genome sequencing of 298 donors from the Human Microbiome Project (HMP) healthy cohort study to accompany existing deep characterization of their microbiomes at various body sites. This analysis yielded an average sequencing depth of 32x, with which we identified 27 million (M) single nucleotide variants and 2.3 M insertions-deletions. Taxonomic composition and functional potential of the microbiome covaried significantly with genetic principal components in the gastrointestinal tract and oral communities, but not in the nares or vaginal microbiota. Example associations included validation of known associations between FUT2 secretor status, as well as a variant conferring hypolactasia near the LCT gene, with Bifidobacterium longum abundance in stool. The associations of microbial features with both high-level genetic attributes and single variants were specific to particular body sites, highlighting the opportunity to find unique genetic mechanisms controlling microbiome properties in the microbial communities from multiple body sites. This study adds deep sequencing of host genomes to the body-wide microbiome sequences already extant from the HMP healthy cohort, creating a unique, versatile, and well-controlled reference for future studies seeking to identify host genetic modulators of the microbiome.

142 citations

Journal Article

Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity.

Janie F. Shelton, Anjali J. Shastri, Chelsea Ye, Catherine H. Weldon, Teresa Filshtein-Sonmez, Daniella Coker, Antony Symons, Jorge Esparza-Gordillo, Stella Aslibekyan, Adam Auton

22 Apr 2021-Nature Genetics

TL;DR: Based on a study of 1,051,032 23andMe research participants, the authors report genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization.

Abstract: COVID-19 presents with a wide range of severity, from asymptomatic in some individuals to fatal in others. Based on a study of 1,051,032 23andMe research participants, we report genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization. Using trans-ancestry genome-wide association studies, we identified a strong association between blood type and COVID-19 diagnosis, as well as a gene-rich locus on chromosome 3p21.31 that is more strongly associated with outcome severity. Hospitalization risk factors include advancing age, male sex, obesity, lower socioeconomic status, non-European ancestry and preexisting cardiometabolic conditions. While non-European ancestry was a significant risk factor for hospitalization after adjusting for sociodemographics and preexisting health conditions, we did not find evidence that these two primary genetic associations explain risk differences between populations for severe COVID-19 outcomes.

141 citations

Journal Article

SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population

Adam Ameur¹, Johan Dahlberg¹, Pall I. Olason¹, Francesco Vezzi¹, Rose-Marie Karlsson², Marcel Martin¹, Johan Viklund¹, Andreas Kähäri¹, Pär Lundin¹, Huiwen Che¹, Jessada Thutkawkorapin², Jesper Eisfeldt², Samuel Lampa¹,³, Mats Dahlberg¹, Jonas Hagberg¹, Niclas Jareborg¹, Ulrika Liljedahl¹, Inger Jonasson¹, Åsa Johansson¹, Lars Feuk¹, Joakim Lundeberg¹,⁴, Ann-Christine Syvänen¹, Sverker Lundin⁴, Daniel Nilsson², Björn Nystedt¹, Patrik K. E. Magnusson², Ulf Gyllensten¹

Science for Life Laboratory¹, Karolinska Institutet², Uppsala University³, Royal Institute of Technology⁴

23 Aug 2017-European Journal of Human Genetics

TL;DR: This paper describes the SweGen data set, a comprehensive map of genetic variation in the Swedish population that serves as a basic resource for clinical genetics laboratories and for sequencing-based association studies by providing genetic variant frequencies in a cohort that is well matched to national patient cohorts.

Abstract: Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for seq ...

141 citations

Journal Article

microRNAs in the Same Clusters Evolve to Coordinately Regulate Functionally Related Genes

Yirong Wang¹, Junjie Luo¹, Hong Zhang¹, Jian Lu¹

Peking University¹

28 Apr 2016-Molecular Biology and Evolution

TL;DR: The authors demonstrate functional co-adaptation between new and old miRNAs in the miR-17–92 cluster and suggest that positive Darwinian selection might be the driving force underlying the formation and evolution of miRNA clustering.

Abstract: MicroRNAs (miRNAs) are endogenously expressed small noncoding RNAs. The genomic locations of animal miRNAs are significantly clustered in discrete loci. We found that duplication and de novo formation were important mechanisms for creating miRNA clusters and that the clustered miRNAs tend to be evolutionarily conserved. We proposed a "functional co-adaptation" model to explain how clustering helps newly emerged miRNAs survive and develop functions. We presented evidence that the abundances of miRNAs in the same clusters were highly correlated and that those miRNAs exerted cooperative repressive effects on target genes in human tissues. By transfecting miRNAs into human and fly cells and extensively profiling the transcriptome alteration with deep sequencing, we further demonstrated the functional co-adaptation between new and old miRNAs in the miR-17-92 cluster. Our population genomic analysis suggests that positive Darwinian selection might be the driving force underlying the formation and evolution of miRNA clustering. Our model provides novel insights into the mechanisms and evolutionary significance of miRNA clustering.

141 citations

References

Journal Article

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
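The maximal segment pair (MSP) score can be made concrete with an exhaustive sketch: scan every diagonal of the two sequences and keep the best-scoring ungapped run, using Kadane's maximum-subarray rule along each diagonal. BLAST itself avoids this O(mn) scan by seeding from short word hits and extending them, and it scores with substitution matrices; the +1/-1 match/mismatch scores and the function name below are illustrative assumptions.

```python
def msp_score(a, b, match=1, mismatch=-1):
    """Maximal segment pair score: the highest score of any pair of
    equal-length, ungapped segments of a and b.

    Exhaustive O(len(a) * len(b)) scan; the scores are illustrative.
    """
    best = 0
    # Diagonal d pairs a[i] with b[i - d]; d runs over every possible offset.
    for d in range(-(len(b) - 1), len(a)):
        run = 0
        for i in range(max(0, d), min(len(a), len(b) + d)):
            run += match if a[i] == b[i - d] else mismatch
            run = max(run, 0)  # Kadane: drop a prefix whose score went negative
            best = max(best, run)
    return best
```

For example, `msp_score("TTACGT", "ACG")` finds the shared segment `ACG` on a single diagonal and scores it 3.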

88,255 citations

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
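A minimal sketch of reading one SAM alignment line: the first 11 tab-separated columns are the mandatory fields of the SAM specification, FLAG is a bit field, and the CIGAR operations M, D, N, = and X consume reference bases. The helper names and the short flag labels are hypothetical; for production work a maintained library such as pysam (which wraps htslib) is the safer choice.

```python
import re

# The 11 mandatory columns of a SAM alignment line, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Bit meanings of the FLAG field (the labels are our own shorthand).
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line):
    """Split one alignment line into a dict of mandatory fields."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    rec["FLAG"] = int(rec["FLAG"])
    rec["POS"] = int(rec["POS"])  # 1-based leftmost mapping position
    rec["flags"] = {name for bit, name in FLAG_BITS.items() if rec["FLAG"] & bit}
    rec["tags"] = cols[11:]  # optional TAG:TYPE:VALUE columns, left unparsed
    return rec


def reference_span(cigar):
    """Reference bases covered by an alignment: M, D, N, = and X consume
    the reference; I, S, H and P do not. '*' means CIGAR is unavailable."""
    if cigar == "*":
        return 0
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")
```

With these helpers, the 1-based inclusive end of an alignment is `rec["POS"] + reference_span(rec["CIGAR"]) - 1`; for a record at position 100 with CIGAR `8M2I4M1D3M`, the span is 8 + 4 + 1 + 3 = 16 bases, ending at 115.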

45,957 citations

Journal Article

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
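The core overlap question BEDTools answers, where features in one set overlap features in another, can be sketched as below. The sketch assumes BED's 0-based, half-open coordinates (so intervals that merely touch do not overlap) and reports the overlapping sub-interval for each overlapping pair, similar in spirit to the default output of `bedtools intersect`. The function and its tuple inputs are illustrative, and the per-feature scan is quadratic in the worst case, unlike the indexed or sorted-sweep approaches a production tool uses.

```python
from collections import defaultdict


def intersect(a_features, b_features):
    """Return the overlapping sub-interval for every overlapping (A, B) pair.

    Features are (chrom, start, end) tuples in BED coordinates:
    0-based start, exclusive end, so [100, 200) and [200, 300) do not overlap.
    """
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_features:
        b_by_chrom[chrom].append((start, end))
    for intervals in b_by_chrom.values():
        intervals.sort()  # sort by start so the scan below can stop early

    out = []
    for chrom, a_start, a_end in a_features:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            if b_start >= a_end:
                break  # this and every later B feature starts after A ends
            if b_end > a_start:  # overlap: b_start < a_end and b_end > a_start
                out.append((chrom, max(a_start, b_start), min(a_end, b_end)))
    return out
```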

18,858 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

The ENCODE Project Consortium (principal investigators, NHGRI groups, data production leads, lead analysts)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
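A minimal sketch of two everyday VCF operations: splitting the INFO column into key/value pairs (a bare key is a Flag-type field) and computing the alternate-allele frequency from the GT sample field, where `|` separates phased and `/` unphased alleles. The function names are hypothetical, and the sketch counts any non-zero allele as alternate, so multi-allelic sites are lumped together rather than reported per ALT allele.

```python
import re


def parse_info(info):
    """INFO is ';'-separated KEY=VALUE pairs or bare flags; '.' means empty."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # a bare key is a Flag field
    return out


def alt_allele_frequency(line):
    """Fraction of called alleles that are non-reference, from the GT field.

    Returns None when every genotype is missing.
    """
    cols = line.rstrip("\n").split("\t")
    gt_index = cols[8].split(":").index("GT")  # FORMAT column
    alleles = []
    for sample in cols[9:]:
        gt = sample.split(":")[gt_index]
        # Split on phased '|' or unphased '/'; skip missing '.' calls.
        alleles += [a for a in re.split(r"[|/]", gt) if a != "."]
    if not alleles:
        return None
    return sum(a != "0" for a in alleles) / len(alleles)
```

For a record with genotypes `0|0`, `1|0` and `1/1`, three of six called alleles are non-reference, giving a frequency of 0.5.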

10,164 citations