A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
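As a quick check on the figures quoted in the abstract, the three variant classes can be summed to confirm the "over 88 million" total. This is only a sketch using the rounded counts given above; the variable names are illustrative and do not come from the paper.

```python
# Rounded variant counts as quoted in the abstract.
snps = 84_700_000
indels = 3_600_000
structural_variants = 60_000

total = snps + indels + structural_variants
snp_share = snps / total

print(f"total variants: {total:,}")   # total variants: 88,360,000
print(f"SNP share: {snp_share:.1%}")  # SNP share: 95.9%
```

Because the inputs are rounded, the result only shows that the abstract's totals are mutually consistent, not the exact call-set size.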

Citations

Journal Article•DOI•

Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire

Daniel N. Harris¹, Wei Song¹, Amol C. Shetty¹, Kelly S. Levano, Omar Cáceres, Carlos Padilla, Victor Borda², David Tarazona, Omar Trujillo, Cesar Sanchez, Michael D. Kessler¹, Marco Galarza, Silvia Capristano, Harrison Montejo, Pedro O. Flores-Villanueva, Eduardo Tarazona-Santos², Timothy D. O’Connor¹, Heinner Guio•Institutions (2)

University of Maryland, Baltimore¹, Universidade Federal de Minas Gerais²

10 Jul 2018-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is found that the Native American populations possess distinct ancestral divisions, whereas the mestizo groups were admixtures of multiple Native American communities that occurred before and during the Inca Empire and Spanish rule.

Abstract: Native Americans from the Amazon, Andes, and coastal geographic regions of South America have a rich cultural heritage but are genetically understudied, therefore leading to gaps in our knowledge of their genomic architecture and demographic history. In this study, we sequence 150 genomes to high coverage combined with an additional 130 genotype array samples from Native American and mestizo populations in Peru. The majority of our samples possess greater than 90% Native American ancestry, which makes this the most extensive Native American sequencing project to date. Demographic modeling reveals that the peopling of Peru began ∼12,000 y ago, consistent with the hypothesis of the rapid peopling of the Americas and Peruvian archeological data. We find that the Native American populations possess distinct ancestral divisions, whereas the mestizo groups were admixtures of multiple Native American communities that occurred before and during the Inca Empire and Spanish rule. In addition, the mestizo communities also show Spanish introgression largely following Peruvian Independence, nearly 300 y after Spain conquered Peru. Further, we estimate migration events between Peruvian populations from all three geographic regions with the majority of between-region migration moving from the high Andes to the low-altitude Amazon and coast. As such, we present a detailed model of the evolutionary dynamics which impacted the genomes of modern-day Peruvians and a Native American ancestry dataset that will serve as a beneficial resource to addressing the underrepresentation of Native American ancestry in sequencing studies.

110 citations

Cites background or methods from "A global reference for human geneti..."

...American journal of human genetics 93(2):278-288....
[...]
...The median number of variants is 2 for all Peruvian populations, including PEL from 1000 genomes (2), as well as the Asian population CHB....
[...]
...Supplemental Information Methods ADMIXTURE Analysis Using the final combined dataset, we extracted all HGDP Native American individuals genotyped on the Human Origins Array (1), our samples, and the YRI, CEU, CHB, CLM, MXL, PEL, and PUR 1000 Genomes Project samples (2)....
[...]

Journal Article•DOI•

Molecular Genetic Anatomy and Risk Profile of Hirschsprung's Disease.

Joseph M. Tilghman¹, Albee Y. Ling², Tychele N. Turner³, Maria X. Sosa⁴, Niklas Krumm², Sumantra Chatterjee⁴, Ashish Kapoor²,⁴, Bradley P. Coe², Khanh-Dung H. Nguyen³,⁴, Namrata Gupta³, Stacey Gabriel³, Evan E. Eichler², Courtney Berrios⁴, Aravinda Chakravarti⁴,⁵•Institutions (5)

Johns Hopkins University School of Medicine¹, University of Washington², Broad Institute³, Johns Hopkins University⁴, New York University⁵

10 Apr 2019-The New England Journal of Medicine

TL;DR: Among the patients in this study, Hirschsprung's disease arose from common noncoding variants, rare coding variants, and copy‐number variants affecting genes involved in enteric neural‐crest cell fate that exacerbate the widespread genetic susceptibility associated with RET.

Abstract: Background Hirschsprung’s disease, or congenital aganglionosis, is a developmental disorder of the enteric nervous system and is the most common cause of intestinal obstruction in neonates...

110 citations

Journal Article•DOI•

Erythrogene: a database for in-depth analysis of the extensive variation in 36 blood group systems in the 1000 Genomes Project

Mattias Möller¹, Magnus Jöud¹, Jill R. Storry¹, Martin L. Olsson¹•Institutions (1)

Lund University¹

27 Dec 2016-Blood Advances

TL;DR: A large-scale investigation of blood group genotypes obtained by NGS in a multiethnic cohort had been lacking; the established database deepens knowledge of blood group polymorphism globally and provides a long-sought platform for future research.

110 citations

Journal Article•DOI•

3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome.

Shu Tadaka¹, Fumiki Katsuoka¹, Masao Ueki¹, Kaname Kojima¹, Satoshi Makino¹, Sakae Saito¹, Akihito Otsuki¹, Chinatsu Gocho¹, Mika Sakurai-Yageta¹, Inaho Danjoh¹, Ikuko N. Motoike¹, Yumi Yamaguchi-Kabata¹, Matsuyuki Shirota¹, Seizo Koshiba¹, Masao Nagasaki¹, Naoko Minegishi¹, Atsushi Hozawa¹, Shinichi Kuriyama¹, Atsushi Shimizu², Jun Yasuda¹, Nobuo Fuse¹, Gen Tamiya¹, Masayuki Yamamoto¹, Kengo Kinoshita•Institutions (2)

Tohoku University¹, Iwate Medical University²

18 Jun 2019-Human genome variation

TL;DR: A new database provides information on the frequency of genetic variations within 3552 Japanese individuals, and facilitates comparisons with other populations, and is the first large-scale panel providing the frequencies of variants present on the X chromosome and on the mitochondria in the Japanese population.

Abstract: The first step towards realizing personalized healthcare is to catalog the genetic variations in a population. Since the dissemination of individual-level genomic information is strictly controlled, it will be useful to construct population-level allele frequency panels with easy-to-use interfaces. In the Tohoku Medical Megabank Project, we sequenced nearly 4000 individuals from a Japanese population and constructed an allele frequency panel of 3552 individuals after removing related samples. The panel is called the 3.5KJPNv2. It was constructed by using a standard pipeline including the 1KGP and gnomAD algorithms to reduce technical biases and to allow comparisons to other populations. Our database is the first large-scale panel providing the frequencies of variants present on the X chromosome and on the mitochondria in the Japanese population. All the data are available on our original database at https://jmorp.megabank.tohoku.ac.jp.

109 citations

Journal Article•DOI•

Rare variant phasing and haplotypic expression from RNA sequencing with phASER

Stephane E. Castel, Pejman Mohammadi¹, Wendy K Chung¹, Yufeng Shen¹, Tuuli Lappalainen¹•Institutions (1)

Columbia University¹

08 Sep 2016-Nature Communications

TL;DR: phASER is a fast and accurate method for variant phasing from RNA-seq and genome sequencing data, which can be used for genome interpretation and analysis of allelic activity.

Abstract: Genome interpretation and analysis of allelic activity require appropriate haplotype phasing. Here the authors present phASER, a fast and accurate method for variant phasing from RNA-seq and genome sequencing data.

109 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
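To make the maximal segment pair (MSP) score mentioned in the BLAST entry concrete, the sketch below computes it exhaustively for two short sequences: it scans every diagonal and keeps the best-scoring ungapped pair of equal-length segments. This is a brute-force illustration, not BLAST's heuristic, which only approximates this quantity. The function name and the match/mismatch scores (+5/-4) are assumptions chosen for the example.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Exhaustive MSP score: best ungapped local segment pair of a and b."""
    best = 0
    # Each offset selects one diagonal of the (a, b) comparison matrix.
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        running = 0
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            # Restart the segment whenever its score would drop below zero.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
    return best


score = msp_score("GGACGTGG", "TTACGTTT")
print(score)  # 20: the shared ungapped segment "ACGT" scores 4 * 5
```

The scan costs time proportional to the product of the two sequence lengths, which is the cost a rapid search tool aims to avoid on large databases.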

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations
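For readers unfamiliar with the SAM format described in the entry above, here is a minimal sketch of reading one alignment line. It relies only on the format's eleven mandatory tab-separated fields and the 0x10 flag bit for reverse-strand alignments. The example read is invented, and `SamRecord`, `parse_sam_line` and `is_reverse_strand` are hypothetical names, not part of SAMtools.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def is_reverse_strand(rec: SamRecord) -> bool:
    # Flag bit 0x10 marks a read aligned to the reverse strand.
    return bool(rec.flag & 0x10)


# Invented example line, for illustration only.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_reverse_strand(rec))  # chr1 100 True
```

Header lines, which begin with `@`, are not handled here; a real pipeline would more likely use an established SAM/BAM library than parse lines by hand.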

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
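The overlap search that the BEDTools entry above describes can be illustrated with a small sketch. It assumes BED-style coordinates (0-based start, end exclusive) and uses a naive all-pairs comparison; this is not how BEDTools is implemented, and `Interval`, `overlap` and `intersect` are hypothetical names with invented example data.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect(features: List[Interval],
              annotations: List[Interval]) -> List[Tuple[Interval, Interval, int]]:
    # Naive O(n * m) scan; adequate for small inputs only.
    hits = []
    for f in features:
        for g in annotations:
            n = overlap(f, g)
            if n > 0:
                hits.append((f, g, n))
    return hits


reads = [Interval("chr1", 100, 200), Interval("chr1", 300, 400),
         Interval("chr2", 100, 200)]
genes = [Interval("chr1", 150, 350), Interval("chr1", 400, 500)]
hits = intersect(reads, genes)
print(len(hits))  # 2
```

Note that `chr1:300-400` and `chr1:400-500` merely touch and share no bases under half-open coordinates, so they are not reported. Large datasets call for sorted or indexed intervals rather than this quadratic loop.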

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net Contact: [email protected]

10,164 citations
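To illustrate the VCF layout described in the entry above, the following sketch parses a single VCF data line and estimates the alternate-allele frequency from its genotypes. It assumes the standard fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) followed by FORMAT and per-sample fields, with `|` separating phased alleles and `/` unphased ones. The record, its ID and the function names are invented for the example and are not part of VCFtools.

```python
import re
from typing import Dict, List


def parse_vcf_line(line: str) -> Dict[str, object]:
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    # FORMAT (column 9) names the colon-separated keys in each sample field.
    gt_index = fields[8].split(":").index("GT")
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "genotypes": genotypes,
    }


def alt_allele_frequency(genotypes: List[str]) -> float:
    """Fraction of called alleles that are non-reference."""
    alleles: List[str] = []
    for gt in genotypes:
        alleles.extend(a for a in re.split(r"[|/]", gt) if a != ".")
    if not alleles:
        return 0.0
    return sum(a != "0" for a in alleles) / len(alleles)


# Invented record with three phased diploid samples.
line = "1\t12345\trs_example\tA\tAC\t100\tPASS\tAC=3\tGT\t0|1\t1|1\t0|0"
record = parse_vcf_line(line)
freq = alt_allele_frequency(record["genotypes"])
print(record["pos"], record["alt"], freq)  # 12345 ['AC'] 0.5
```

Header lines (starting with `#`) and multi-allelic frequency breakdowns are left out to keep the sketch short.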