A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
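The variant counts quoted in the abstract can be tallied to check the "over 88 million" total. The following is a minimal sketch; the variable names are illustrative, and the figures are the rounded values from the abstract rather than exact counts from the released call set.

```python
# Rounded variant counts as quoted in the abstract (not exact call-set totals).
snps = 84_700_000
indels = 3_600_000
structural_variants = 60_000

total = snps + indels + structural_variants

# Share of each variant class in the rounded total.
shares = {
    "SNPs": snps / total,
    "indels": indels / total,
    "structural variants": structural_variants / total,
}

print(f"total variants: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

With these rounded inputs the sum comes to 88,360,000, consistent with the abstract's "over 88 million".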

Citations

Journal Article•DOI•

A genomic perspective on HLA evolution

Diogo Meyer¹, Vitor R. C. Aguiar¹, Bárbara Domingues Bitarello², Débora Y. C. Brandt¹,³, Kelly Nunes¹•Institutions (3)

University of São Paulo¹, Max Planck Society², University of California, Berkeley³

01 Jan 2018-Immunogenetics

TL;DR: The authors argue that genomic datasets, in particular those generated by next-generation sequencing at the population scale, are transforming the understanding of HLA evolution, and show that genomewide data can be used to perform robust and powerful tests for selection, capable of identifying both positive and balancing selection at HLA genes.

Abstract: Several decades of research have convincingly shown that classical human leukocyte antigen (HLA) loci bear signatures of natural selection. Despite this conclusion, many questions remain regarding the type of selective regime acting on these loci, the time frame at which selection acts, and the functional connections between genetic variability and natural selection. In this review, we argue that genomic datasets, in particular those generated by next-generation sequencing (NGS) at the population scale, are transforming our understanding of HLA evolution. We show that genomewide data can be used to perform robust and powerful tests for selection, capable of identifying both positive and balancing selection at HLA genes. Importantly, these tests have shown that natural selection can be identified at both recent and ancient timescales. We discuss how findings from genomewide association studies impact the evolutionary study of HLA genes, and how genomic data can be used to survey adaptive change involving interaction at multiple loci. We discuss the methodological developments which are necessary to correctly interpret genomic analyses involving the HLA region. These developments include adapting the NGS analysis framework so as to deal with the highly polymorphic HLA data, as well as developing tools and theory to search for signatures of selection, quantify differentiation, and measure admixture within the HLA region. Finally, we show that high throughput analysis of molecular phenotypes for HLA genes—namely transcription levels—is now a feasible approach and can add another dimension to the study of genetic variation.

138 citations

Journal Article•DOI•

Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population.

Ken Suzuki¹,², Masato Akiyama³, Kazuyoshi Ishigaki, Masahiro Kanai⁴, Jun Hosoe², Nobuhiro Shojima², Atsushi Hozawa⁵, Aya Kadota⁶, Kiyonori Kuriki⁷, Mariko Naito⁸,⁹, Kozo Tanno¹⁰, Yasushi Ishigaki¹⁰, Makoto Hirata², Koichi Matsuda², Nakao Iwata¹¹, Masashi Ikeda¹¹, Norie Sawada, Taiki Yamaji, Motoki Iwasaki, Shiro Ikegawa, Shiro Maeda¹², Yoshinori Murakami², Kenji Wakai⁸, Shoichiro Tsugane, Makoto Sasaki¹⁰, Masayuki Yamamoto⁵, Yukinori Okada¹, Michiaki Kubo, Yoichiro Kamatani¹³, Momoko Horikoshi, Toshimasa Yamauchi², Takashi Kadowaki²,¹⁴•Institutions (14)

Osaka University¹, University of Tokyo², Kyushu University³, Harvard University⁴, Tohoku University⁵, Shiga University of Medical Science⁶, University of Shizuoka⁷, Nagoya University⁸, Hiroshima University⁹, Iwate Medical University¹⁰, Fujita Health University¹¹, University of the Ryukyus¹², Kyoto University¹³, Teikyo University¹⁴

01 Mar 2019-Nature Genetics

TL;DR: Genome-wide association analyses identify 28 new susceptibility loci for type 2 diabetes in the Japanese population, including missense variants in genes related to pancreatic acinar cells (GP2) and insulin secretion (GLP1R).

Abstract: To understand the genetics of type 2 diabetes in people of Japanese ancestry, we conducted a meta-analysis of four genome-wide association studies (GWAS; 36,614 cases and 155,150 controls of Japanese ancestry). We identified 88 type 2 diabetes–associated loci (P 0.6) with the lead variants. Among the 28 missense variants, three previously unreported variants had distinct minor allele frequency (MAF) spectra between people of Japanese and European ancestry (MAFJPN > 0.05 versus MAFEUR < 0.01), including missense variants in genes related to pancreatic acinar cells (GP2) and insulin secretion (GLP1R). Transethnic comparisons of the molecular pathways identified from the GWAS results highlight both ethnically shared and heterogeneous effects of a series of pathways on type 2 diabetes (for example, monogenic diabetes and beta cells). Genome-wide association analyses identify 28 new susceptibility loci for type 2 diabetes in the Japanese population. Transethnic comparisons highlight the key role of beta cell dysfunction in type 2 diabetes across different ancestry groups.

138 citations

Journal Article•DOI•

An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank.

Stephen M. Smith¹, Gwenaëlle Douaud¹, Winfield Chen², Taylor Hanayik¹, Fidel Alfaro-Almagro¹, Kevin Sharp, Lloyd T. Elliott²•Institutions (2)

University of Oxford¹, Simon Fraser University²

19 Apr 2021-Nature Neuroscience

TL;DR: In this article, the authors presented a new open resource of genome-wide association study summary statistics, using the 2020 data release, almost tripling the discovery sample size, including the X chromosome and new classes of imaging-derived phenotypes.

Abstract: UK Biobank is a major prospective epidemiological study, including multimodal brain imaging, genetics and ongoing health outcomes. Previously, we published genome-wide associations of 3,144 brain imaging-derived phenotypes, with a discovery sample of 8,428 individuals. Here we present a new open resource of genome-wide association study summary statistics, using the 2020 data release, almost tripling the discovery sample size. We now include the X chromosome and new classes of imaging-derived phenotypes (subcortical volumes and tissue contrast). Previously, we found 148 replicated clusters of associations between genetic variants and imaging phenotypes; in this study, we found 692, including 12 on the X chromosome. We describe some of the newly found associations, focusing on the X chromosome and autosomal associations involving the new classes of imaging-derived phenotypes. Our novel associations implicate, for example, pathways involved in the rare X-linked STAR (syndactyly, telecanthus and anogenital and renal malformations) syndrome, Alzheimer's disease and mitochondrial disorders.

138 citations

Posted Content•DOI•

A method for genome-wide genealogy estimation for thousands of samples

Leo Speidel¹, Marie Forest², Sinan Shi¹, Simon Myers¹•Institutions (2)

University of Oxford¹, Université du Québec à Montréal²

14 Feb 2019-bioRxiv

TL;DR: This work developed a method, Relate, scaling to > 10,000 sequences while simultaneously estimating branch lengths, mutational ages, and variable historical population sizes, as well as allowing for data errors, to allow more powerful inferences of natural selection.

Abstract: Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We developed a method, Relate, scaling to > 10,000 sequences while simultaneously estimating branch lengths, mutational ages, and variable historical population sizes, as well as allowing for data errors. Application to 1000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events, unique to that continent. Our approach allows more powerful inferences of natural selection than previously possible. We identify multiple novel regions under strong positive selection, and multi-allelic traits including hair colour, BMI, and blood pressure, showing strong evidence of directional selection, varying among human groups.

138 citations

Journal Article•DOI•

Advances in therapeutic peptides targeting G protein-coupled receptors

Anthony P. Davenport¹, Conor C. G. Scully, Chris de Graaf, Alastair J. H. Brown, Janet J. Maguire¹•Institutions (1)

University of Cambridge¹

19 Mar 2020-Nature Reviews Drug Discovery

TL;DR: A review of peptide drugs targeting G protein-coupled receptors (GPCRs) is presented in this paper, with a focus on evolving strategies to improve pharmacokinetic and pharmacodynamic properties.

Abstract: Dysregulation of peptide-activated pathways causes a range of diseases, fostering the discovery and clinical development of peptide drugs. Many endogenous peptides activate G protein-coupled receptors (GPCRs) — nearly 50 GPCR peptide drugs have been approved to date, most of them for metabolic disease or oncology, and more than 10 potentially first-in-class peptide therapeutics are in the pipeline. The majority of existing peptide therapeutics are agonists, which reflects the currently dominant strategy of modifying the endogenous peptide sequence of ligands for peptide-binding GPCRs. Increasingly, novel strategies are being employed to develop both agonists and antagonists, to both introduce chemical novelty and improve drug-like properties. Pharmacodynamic improvements are evolving to allow biasing ligands to activate specific downstream signalling pathways, in order to optimize efficacy and reduce side effects. In pharmacokinetics, modifications that increase plasma half-life have been revolutionary. Here, we discuss the current status of the peptide drugs targeting GPCRs, with a focus on evolving strategies to improve pharmacokinetic and pharmacodynamic properties. Many G protein-coupled receptors (GPCRs) have endogenous peptide agonists, and modifying the sequence of these peptides has led to some successful therapeutics. In this Review, Davenport and colleagues discuss strategies to generate effective GPCR-targeted peptide therapeutics by introducing chemical novelty, extending plasma half-life, improving a therapeutic’s drug-like properties or generating biased ligands. These approaches could overcome some of the challenges in developing peptide therapeutics.

137 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
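The "searching for overlaps between features" task described in this abstract can be illustrated with a small plain-Python sketch. This is not BEDTools code and does not use its API: the `Interval` type, the `intersect` function and the sample coordinates are all hypothetical, and the nested loop is a naive approach chosen for readability rather than speed. It assumes BED-style coordinates (0-based start, exclusive end).

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair (x in a, y in b).

    Naive O(len(a) * len(b)) comparison, for illustration only.
    """
    out = []
    for x in a:
        for y in b:
            if x.chrom != y.chrom:
                continue  # features on different chromosomes never overlap
            start = max(x.start, y.start)
            end = min(x.end, y.end)
            if start < end:  # non-empty half-open interval
                out.append(Interval(x.chrom, start, end))
    return out


# Invented example features.
reads = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 100, 200),
]
genes = [Interval("chr1", 150, 550)]

hits = intersect(reads, genes)
for hit in hits:
    print(hit.chrom, hit.start, hit.end, sep="\t")
```

Here the two chr1 reads each overlap the gene partially, while the chr2 read is skipped. Production tools avoid the pairwise comparison by indexing or sorting the features first.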

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, Nhgri groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
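To make the format described in this abstract concrete, here is a minimal plain-Python sketch that parses one tab-separated VCF data line. It is not VCFtools code and does not use its Perl API: `VcfRecord` and `parse_vcf_line` are hypothetical names, the sample line is invented, and the sketch assumes the standard eight fixed columns followed by a FORMAT column whose first key is GT.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    info: Dict[str, str]
    genotypes: List[str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse a single VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_map: Dict[str, str] = {}
    for item in info.split(";"):
        if item == ".":
            continue  # '.' marks a missing INFO field
        key, _, value = item.partition("=")
        info_map[key] = value

    # Sample columns start after FORMAT; keep only the leading GT subfield.
    genotypes = [sample.split(":")[0] for sample in fields[9:]]
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_map, genotypes)


# Invented record: an insertion with two phased samples ('|' separates haplotypes).
line = "1\t10177\t.\tA\tAC\t100\tPASS\tAF=0.25;DB\tGT\t1|0\t0|0\n"
record = parse_vcf_line(line)

# Count copies of the first alternate allele across all haplotypes.
alt_count = sum(gt.split("|").count("1") for gt in record.genotypes)
print(record.chrom, record.pos, record.ref, record.alts, alt_count)
```

The `|` separator in the genotype strings marks phased calls, which is how haplotypes such as those released by the 1000 Genomes Project are represented; unphased genotypes use `/` instead and would need separate handling.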

10,164 citations