A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Citations

Journal Article•DOI•

Signatures of archaic adaptive introgression in present-day human populations

Fernando Racimo¹, Davide Marnetto², Emilia Huerta-Sanchez³•Institutions (3)

University of California, Berkeley¹, University of Turin², University of California, Merced³

18 Oct 2016-Molecular Biology and Evolution

TL;DR: It is found that the number and allelic frequencies of sites that are uniquely shared between archaic humans and specific present-day populations are particularly useful for detecting adaptive introgression.

Abstract: Comparisons of DNA from archaic and modern humans show that these groups interbred, and in some cases received an evolutionary advantage from doing so. This process, adaptive introgression, may lead to a faster rate of adaptation than is predicted from models with mutation and selection alone. Within the last couple of years, a series of studies have identified regions of the genome that are likely examples of adaptive introgression. In many cases, once a region was ascertained as being introgressed, commonly used statistics based on both haplotype and allele frequency information were employed to test for positive selection. Introgression by itself, however, changes both the haplotype structure and the distribution of allele frequencies, thus confounding traditional tests for detecting positive selection. Therefore, patterns generated by introgression alone may lead to false inferences of positive selection. Here we explore models involving both introgression and positive selection to investigate the behavior of various statistics under adaptive introgression. In particular, we find that the number and allelic frequencies of sites that are uniquely shared between archaic humans and specific present-day populations are particularly useful for detecting adaptive introgression. We then examine the 1000 Genomes dataset to characterize the landscape of uniquely shared archaic alleles in human populations. Finally, we identify regions that were likely subject to adaptive introgression and discuss some of the most promising candidate genes located in these regions.

162 citations

Cites background or methods from "A global reference for human geneti..."

...Indeed, population structure analyses of the 1000 Genomes samples suggest that Peruvians have the largest amount of Native American ancestry (Auton et al. 2015) and show a bottleneck with a lack of recent population growth, which could explain this pattern....
[...]
...We then apply these statistics to real human genomic data from phase 3 of the 1000 Genomes Project (Auton et al. 2015), to detect AI in human populations, and find candidate genes....
[...]
...We used each of the non-African panels in the 1000 Genomes Project phase 3 data (Auton et al. 2015) as the “target” panel (B), and chose the outgroup panel (A) to be the combination of all African populations (YRI, LWK, GWD, MSL, and ESN), excluding admixed African-Americans....
[...]
...Candidate Regions for Adaptive Introgression. To identify adaptively introgressed regions of the genome, we computed U_{A,B,C,D}(w, x, y, z) and Q95_{A,B,C,D}(w, y, z) in 40 kb nonoverlapping windows along the genome, using the low-coverage sequencing data from phase 3 of the 1000 Genomes Project (Auton et al. 2015)....
[...]
...By scanning the present-day human genomes from phase 3 of the 1000 Genomes Project (Auton et al. 2015) using these and other summary statistics, we were able to recapitulate previous AI findings (like the TLR [Dannemann et al. 2016; Deschamps et al. 2016] and OAS regions [Mendez et al. 2013]) as well as identify new candidate regions for AI in Eurasia (like the LIPA gene and the FAP/IFIH1 region)....
[...]
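
The windowed scan described in the excerpts above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a simplified three-panel, U-like count (sites whose allele frequency is below w in an outgroup panel A, above x in a target panel B, and equal to y in an archaic panel C). The function names, the input tuple layout, and the thresholds are all hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 40_000  # 40 kb, as in the excerpt above


def window_index(pos, size=WINDOW_SIZE):
    """Return the 0-based index of the non-overlapping window holding a 1-based position."""
    return (pos - 1) // size


def u_like_count_by_window(sites, w, x, y, size=WINDOW_SIZE):
    """Count qualifying sites per window.

    `sites` is an iterable of (pos, freq_a, freq_b, freq_c) tuples, where the
    frequencies are allele frequencies in the outgroup (A), target (B) and
    archaic (C) panels. This layout is an assumption made for illustration.
    """
    counts = defaultdict(int)
    for pos, freq_a, freq_b, freq_c in sites:
        if freq_a < w and freq_b > x and freq_c == y:
            counts[window_index(pos, size)] += 1
    return dict(counts)


# Made-up example sites on a single chromosome.
example_sites = [
    (1, 0.0, 0.5, 1.0),
    (40_000, 0.0, 0.3, 1.0),
    (40_001, 0.0, 0.6, 1.0),
    (80_001, 0.2, 0.9, 1.0),  # too common in the outgroup, so it is skipped
]
example_counts = u_like_count_by_window(example_sites, w=0.01, x=0.2, y=1.0)
```

Windows with unusually high counts would then be the candidates to inspect further; how "unusually high" is judged is outside this sketch.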

Journal Article•DOI•

Histone Lysine Methylases and Demethylases in the Landscape of Human Developmental Disorders

Víctor Faundes¹,², William G. Newman, Laura Bernardini³, Natalie Canham⁴, Jill Clayton-Smith, Bruno Dallapiccola, Sally J. Davies⁵, Michelle Demos⁶, Amy Goldman, Harinder Gill, Rachel Horton⁷, Bronwyn Kerr, Dhavendra Kumar⁵, Anna Lehman, Shane McKee⁸, Jenny Morton⁹, Michael Parker⁹, Julia Rankin, Lisa Robertson¹⁰, I. Karen Temple⁷, Siddharth Banka•Institutions (10)

University of Manchester¹, University of Chile², Casa Sollievo della Sofferenza³, Northwick Park Hospital⁴, University Hospital of Wales⁵, University of British Columbia⁶, Princess Anne Hospital⁷, Belfast City Hospital⁸, Boston Children's Hospital⁹, Wellcome Trust¹⁰

04 Jan 2018-American Journal of Human Genetics

TL;DR: The results demonstrate that systematic clinically oriented pathway-based analysis of genomic data can accelerate the discovery of rare genetic disorders.

Abstract: Histone lysine methyltransferases (KMTs) and demethylases (KDMs) underpin gene regulation. Here we demonstrate that variants causing haploinsufficiency of KMTs and KDMs are frequently encountered in individuals with developmental disorders. Using a combination of human variation databases and existing animal models, we determine 22 KMTs and KDMs as additional candidates for dominantly inherited developmental disorders. We show that KMTs and KDMs that are associated with, or are candidates for, dominant developmental disorders tend to have a higher level of transcription, longer canonical transcripts, more interactors, and a higher number and more types of post-translational modifications than other KMT and KDMs. We provide evidence to firmly associate KMT2C, ASH1L, and KMT5B haploinsufficiency with dominant developmental disorders. Whereas KMT2C or ASH1L haploinsufficiency results in a predominantly neurodevelopmental phenotype with occasional physical anomalies, KMT5B mutations cause an overgrowth syndrome with intellectual disability. We further expand the phenotypic spectrum of KMT2B-related disorders and show that some individuals can have severe developmental delay without dystonia at least until mid-childhood. Additionally, we describe a recessive histone lysine-methylation defect caused by homozygous or compound heterozygous KDM5B variants and resulting in a recognizable syndrome with developmental delay, facial dysmorphism, and camptodactyly. Collectively, these results emphasize the significance of histone lysine methylation in normal human development and the importance of this process in human developmental disorders. Our results demonstrate that systematic clinically oriented pathway-based analysis of genomic data can accelerate the discovery of rare genetic disorders.

162 citations

Journal Article•DOI•

Germline selection shapes human mitochondrial DNA diversity.

Wei Wei¹, Salih Tuna¹, Michael J. Keogh¹, Katherine R. Smith, Timothy J. Aitman²,³, PL Beales⁴,⁵, David L.H. Bennett⁶, Daniel P. Gale⁷, M.A.K. Bitner-Glindzicz⁴,⁵, Graeme C.M. Black⁸,⁹, Paul Brennan¹⁰,¹¹, Perry M. Elliott⁷,¹², Frances Flinter¹³, R A Floto¹,¹⁴,¹⁵, Henry Houlden¹⁶, Melita Irving¹³, Ania Koziell¹³,¹⁷, Eamonn R. Maher¹, Hugh S. Markus¹, Nicholas W. Morrell¹, William G. Newman⁸,⁹, Irene Roberts¹⁸, John A. Sayer¹⁰,¹¹, K.G.C. Smith¹, Jenny C. Taylor¹⁸, Hugh Watkins¹⁸, A. R. Webster¹⁹,²⁰, A.O.M. Wilkie⁶,¹⁸, Catherine Williamson¹⁷,²¹, Sofie Ashford¹, Christopher J. Penkett¹, Kathleen Stirrups¹, Augusto Rendon¹, Willem H. Ouwehand, John Bradley, F L Raymond¹, Mark J. Caulfield²², Ernest Turro¹, Patrick F. Chinnery¹•Institutions (22)

University of Cambridge¹, Imperial College London², University of Edinburgh³, Great Ormond Street Hospital for Children NHS Foundation Trust⁴, UCL Institute of Child Health⁵, John Radcliffe Hospital⁶, University College London⁷, St Mary's Hospital⁸, University of Manchester⁹, Newcastle University¹⁰, Newcastle upon Tyne Hospitals NHS Foundation Trust¹¹, St Bartholomew's Hospital¹², Guy's and St Thomas' NHS Foundation Trust¹³, Papworth Hospital¹⁴, Cambridge University Hospitals NHS Foundation Trust¹⁵, UCL Institute of Neurology¹⁶, King's College London¹⁷, University of Oxford¹⁸, Moorfields Eye Hospital¹⁹, UCL Institute of Ophthalmology²⁰, Imperial College Healthcare²¹, Queen Mary University of London²²

24 May 2019-Science

TL;DR: The characteristics of mtDNA in the human population are shaped by selective forces acting on heteroplasmy within the female germ line and are influenced by the nuclear genetic background, as indicated by population genetic evidence that selection shapes the evolving mtDNA phylogeny.

Abstract: Approximately 2.4% of the human mitochondrial DNA (mtDNA) genome exhibits common homoplasmic genetic variation. We analyzed 12,975 whole-genome sequences to show that 45.1% of individuals from 1526 mother-offspring pairs harbor a mixed population of mtDNA (heteroplasmy), but the propensity for maternal transmission differs across the mitochondrial genome. Over one generation, we observed selection both for and against variants in specific genomic regions; known variants were more likely to be transmitted than previously unknown variants. However, new heteroplasmies were more likely to match the nuclear genetic ancestry as opposed to the ancestry of the mitochondrial genome on which the mutations occurred, validating our findings in 40,325 individuals. Thus, human mtDNA at the population level is shaped by selective forces within the female germ line under nuclear genetic control, which ensures consistency between the two independent genetic lineages.

162 citations

Journal Article•DOI•

Mapping and Characterization of Structural Variation in 17,795 Human Genomes

Haley J. Abel¹, David E. Larson¹, Allison A. Regier¹, Colby Chiang¹, Indraniel Das¹, Krishna L. Kanchi¹, Ryan M. Layer², Benjamin M. Neale³,⁴, William J Salerno⁵, Catherine Reeves, Steven Buyske⁶, NHGRI Centers for Common Disease Genomics⁶, Tara C. Matise⁶, Donna M. Muzny⁵, Michael C. Zody, Eric S. Lander³,⁴,⁷, Susan K. Dutcher¹, Nathan O. Stitziel¹, Ira M. Hall¹•Institutions (7)

Washington University in St. Louis¹, University of Colorado Boulder², Broad Institute³, Harvard University⁴, Baylor College of Medicine⁵, Rutgers University⁶, Massachusetts Institute of Technology⁷

02 Jul 2020-Nature

TL;DR: A scalable pipeline is used to map and characterize structural variants in 17,795 deeply sequenced human genomes to create the largest, to the authors' knowledge, whole-genome-sequencing-based structural variant resource so far and infer the dosage sensitivity of genes and noncoding elements.

Abstract: A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.

162 citations

Journal Article•DOI•

Extensive heterogeneity in somatic mutation and selection in the human bladder

Andrew R. J. Lawson¹, Federico Abascal¹, Tim H. H. Coorens¹, Yvette Hooks¹, Laura O’Neill¹, Calli Latimer¹, Keiran Raine¹, Mathijs A. Sanders¹,², Anne Y. Warren³, Krishnaa T. Mahbubani⁴, Bethany Bareham⁴, Tim Butler¹, Luke M. R. Harvey¹, Alex Cagan¹, Andrew Menzies¹, Luiza Moore¹,³, Alexandra Colquhoun³, William Turner³, Benjamin Thomas⁵,⁶, Vincent Gnanapragasam⁴, Nicholas Williams¹, Doris Rassl⁴, Harald Vöhringer⁷, Sonia Zumalave⁸, Jyoti Nangalia¹, Jose M. C. Tubio⁸,⁹, Moritz Gerstung⁷, Kourosh Saeb-Parsy⁴, Michael R. Stratton¹, Peter J. Campbell¹,⁴, Thomas J. Mitchell¹,³, Inigo Martincorena¹•Institutions (9)

Wellcome Trust Sanger Institute¹, Erasmus University Medical Center², Cambridge University Hospitals NHS Foundation Trust³, University of Cambridge⁴, University of Melbourne⁵, Royal Melbourne Hospital⁶, European Bioinformatics Institute⁷, University of Santiago de Compostela⁸, University of Vigo⁹

02 Oct 2020-Science

TL;DR: A rich landscape of mutational processes and selection in normal urothelium with large heterogeneity across clones and individuals is revealed, which suggests differential exposure to mutagens in the urine.

Abstract: The extent of somatic mutation and clonal selection in the human bladder remains unknown. We sequenced 2097 bladder microbiopsies from 20 individuals using targeted (n = 1914 microbiopsies), whole-exome (n = 655), and whole-genome (n = 88) sequencing. We found widespread positive selection in 17 genes. Chromatin remodeling genes were frequently mutated, whereas mutations were absent in several major bladder cancer genes. There was extensive interindividual variation in selection, with different driver genes dominating the clonal landscape across individuals. Mutational signatures were heterogeneous across clones and individuals, which suggests differential exposure to mutagens in the urine. Evidence of APOBEC mutagenesis was found in 22% of the microbiopsies. Sequencing multiple microbiopsies from five patients with bladder cancer enabled comparisons with cancer-free individuals and across histological features. This study reveals a rich landscape of mutational processes and selection in normal urothelium with large heterogeneity across clones and individuals.

162 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as an indexer, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
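
The core question in the abstract above, whether two genomic features overlap, can be illustrated with a short sketch. This is plain Python written for illustration only; it is not BEDTools code and does not reflect its algorithm or command-line interface. It relies on the BED convention of 0-based, half-open intervals; the tuple layout and function names are hypothetical.

```python
def overlaps(a, b):
    """Return True if two BED-style intervals (chrom, start, end) share at least one base.

    BED coordinates are 0-based and half-open, so an interval ending at 200
    does not overlap one starting at 200.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features_a, features_b):
    """Naive all-against-all comparison; fine for tiny inputs, too slow for real datasets."""
    return [(a, b) for a in features_a for b in features_b if overlaps(a, b)]


# Made-up features for illustration.
reads = [("chr1", 100, 200), ("chr2", 100, 200)]
genes = [("chr1", 150, 250), ("chr1", 200, 300)]
hits = intersect(reads, genes)
```

The quadratic loop is kept for clarity; a practical tool would sort or index the features first, which is the kind of efficiency concern the abstract raises.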

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]
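
To make the format described above concrete, here is a minimal sketch of reading one VCF data line in plain Python. It is not VCFtools code and does not use its Perl API. It relies only on the eight fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function names are hypothetical, the example lines are made up, and the SNP check is deliberately naive.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of a single VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # multiple alternate alleles are comma-separated
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # keys without "=" are flags
    record["INFO"] = info
    return record


def is_snp(record):
    """Naive check: the reference and every alternate allele are one base long."""
    return len(record["REF"]) == 1 and all(len(alt) == 1 for alt in record["ALT"])


# Made-up records for illustration.
snp = parse_vcf_line("chr1\t100\t.\tA\tG,T\t50\tPASS\tAF=0.25;DB\n")
indel = parse_vcf_line("chr1\t200\t.\tA\tAC\t.\tPASS\t.\n")
```

Header lines (those starting with `#`) and per-sample genotype columns are ignored here; a real parser would need to handle both.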

10,164 citations