A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

doi:10.1038/35057149

Journal Article•DOI•

Ravi Sachidanandam, David Weissman, Steven Schmidt, Jerzy M. Kakol, Lincoln Stein, Gabor T. Marth, Steve Sherry, James C. Mullikin, Beverley J. Mortimore, David Willey, Sarah E. Hunt, Charlotte G. Cole, Penny Coggill, Catherine M. Rice, Zemin Ning, Jane Rogers, David R. Bentley, Pui-Yan Kwok, Elaine R. Mardis, Raymond T. Yeh, Brian Schultz, Lisa Cook, Ruth Davenport, Michael Dante, Lucinda Fulton, LaDeana W. Hillier, Robert H. Waterston, John Douglas McPherson, Brian Gilman, Stephen F. Schaffner, William J. Van Etten, David Reich, John M. Higgins, Mark J. Daly, Brendan Blumenstiel, Jennifer Baldwin, Nicole Stange-Thomann, Michael C. Zody, Lauren Linton, Eric S. Lander, David Altshuler

15 Feb 2001-Nature (Nature Publishing Group)-Vol. 409, Iss: 6822, pp 928-933

TL;DR: This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

Abstract: We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
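
As a rough consistency check on the figures quoted in the abstract, the SNP count and average spacing can be multiplied to see how much sequence they imply. The sketch below is only back-of-envelope arithmetic on the abstract's own numbers; the function names are illustrative and not taken from the paper.

```python
# Back-of-envelope arithmetic using only figures quoted in the abstract.
SNP_COUNT = 1_420_000   # SNPs in the map
KB_PER_SNP = 1.9        # average spacing on available sequence, in kilobases
EXONIC_SNPS = 60_000    # estimated SNPs falling within exons


def implied_sequence_gb(snp_count: int, kb_per_snp: float) -> float:
    """Sequence length in gigabases implied by a SNP count and average spacing."""
    bases = snp_count * kb_per_snp * 1_000  # kilobases -> bases
    return bases / 1e9


def exonic_fraction(exonic: int, total: int) -> float:
    """Share of all mapped SNPs that are estimated to lie in exons."""
    return exonic / total


if __name__ == "__main__":
    print(f"{implied_sequence_gb(SNP_COUNT, KB_PER_SNP):.2f} Gb of sequence implied")
    print(f"{exonic_fraction(EXONIC_SNPS, SNP_COUNT):.1%} of SNPs estimated to be exonic")
```

On these numbers the implied available sequence is about 2.7 Gb, and roughly 4% of the mapped SNPs are estimated to fall in exons.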

Citations

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article•DOI•

The Human Genome Browser at UCSC

W. James Kent¹, Charles W. Sugnet¹, Terrence S. Furey¹, Krishna M. Roskin¹, Tom H. Pringle, Alan M. Zahler¹, and David Haussler¹•Institutions (1)

University of California, Santa Cruz¹

01 Jun 2002-Genome Research

TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.

Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations

Journal Article•DOI•

A Map of Human Genome Variation From Population-Scale Sequencing

Gonçalo R. Abecasis¹, David Altshuler²,³, Adam Auton⁴, Lisa D. Brooks⁵, Richard Durbin⁶, Richard A. Gibbs⁷, Matthew E. Hurles⁶, Gil McVean⁴•Institutions (7)

University of Michigan¹, Broad Institute², Harvard University³, University of Oxford⁴, Johns Hopkins University⁵, Wellcome Trust Sanger Institute⁶, Baylor College of Medicine⁷

28 Oct 2010-Nature

TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. This paper presents results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal Article•DOI•

The International HapMap Project

John W. Belmont¹, Paul Hardenbol, Thomas D. Willis, Fuli Yu¹, Huanming Yang², Lan Yang Ch'Ang, Wei Huang³, Bin Liu², Yan Shen³, Paul K.H. Tam⁴, Lap-Chee Tsui⁴, Mary M.Y. Waye⁵, Jeffrey Tze Fei Wong⁶, Changqing Zeng², Qingrun Zhang², Mark S. Chee⁷, Luana Galver⁷, Semyon Kruglyak⁷, Sarah S. Murray⁷, Arnold Oliphant⁷, Alexandre Montpetit⁸, Fanny Chagnon⁸, Vincent Ferretti⁸, Martin Leboeuf⁸, Michael S. Phillips⁸, Andrei Verner⁸, Shenghui Duan⁹, Denise L. Lind¹⁰, Raymond D. Miller⁹, John P. Rice⁹, Nancy L. Saccone⁹, Patricia Taillon-Miller⁹, Ming Xiao¹⁰, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley¹¹, Sarah E. Hunt¹¹, Don Powell¹¹, Houcan Zhang¹², Ichiro Matsuda¹³, Yoshimitsu Fukushima¹⁴, Darryl Macer¹⁵, Eiko Suda¹⁵, Charles N. Rotimi¹⁶, Clement Adebamowo¹⁷, Toyin Aniagwu¹⁷, Patricia A. Marshall¹⁸, Olayemi Matthew¹⁷, Chibuzor Nkwodimmah¹⁷, Charmaine D.M. Royal¹⁶, Mark Leppert¹⁹, Missy Dixon¹⁹, Fiona Cunningham²⁰, Ardavan Kanani²⁰, Gudmundur A. Thorisson²⁰, Peter E. Chen²¹, David J. Cutler²¹, Carl S. Kashuk²¹, Peter Donnelly²², Jonathan Marchini²², Gilean McVean²², Simon Myers²², Lon R. Cardon²², Andrew P. Morris²², Bruce S. Weir²³, James C. Mullikin²⁴, Michael Feolo²⁴, Mark J. Daly²⁵, Renzong Qiu²⁶, Alastair Kent, Georgia M. Dunston¹⁶, Kazuto Kato²⁷, Norio Niikawa²⁸, Jessica Watkin²⁹, Richard A. Gibbs¹, Erica Sodergren¹, George M. Weinstock¹, Richard K. Wilson⁹, Lucinda Fulton⁹, Jane Rogers¹¹, Bruce W. Birren²⁵, Hua Han², Hongguang Wang, Martin Godbout³⁰, John C. Wallenburg⁸, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins²⁴, Lisa D. Brooks²⁴, Jean E. McEwen²⁴, Mark S. Guyer²⁴, Elke Jordan³¹, Jane Peterson²⁴, Jack Spiegel²⁴, Lawrence M. Sung³², Lynn F. Zacharia²⁴, Karen Kennedy²⁹, Michael Dunn²⁹, Richard Seabrook²⁹, Mark Shillito, Barbara Skene²⁹, John Stewart²⁹, David Valle²¹, Ellen Wright Clayton³³, Lynn B. Jorde¹⁹, Aravinda Chakravarti²¹, Mildred K. Cho³⁴, Troy Duster³⁵,³⁶, Morris W. Foster³⁷, Maria Jasperse³⁸, Bartha Maria Knoppers³⁹, Pui-Yan Kwok¹⁰, Julio Licinio⁴⁰, Jeffrey C. Long⁴¹, Pilar N. Ossorio⁴², Vivian Ota Wang³³, Charles N. Rotimi¹⁶, Patricia Spallone²⁹,⁴³, Sharon F. Terry⁴⁴, Eric S. Lander²⁵, Eric H. Lai⁴⁵, Deborah A. Nickerson⁴⁶, Gonçalo R. Abecasis⁴¹, David Altshuler⁴⁷, Michael Boehnke⁴¹, Panos Deloukas¹¹, Julie A. Douglas⁴¹, Stacey Gabriel²⁵, Richard R. Hudson⁴⁸, Thomas J. Hudson⁸, Leonid Kruglyak⁴⁹, Yusuke Nakamura⁵⁰, Robert L. Nussbaum²⁴, Stephen F. Schaffner²⁵, Stephen T. Sherry²⁴, Lincoln Stein²⁰, Toshihiro Tanaka•Institutions (50)

Baylor College of Medicine¹, Chinese Academy of Sciences², Chinese National Human Genome Center³, University of Hong Kong⁴, The Chinese University of Hong Kong⁵, Hong Kong University of Science and Technology⁶, Illumina⁷, McGill University⁸, Washington University in St. Louis⁹, University of California, San Francisco¹⁰, Wellcome Trust Sanger Institute¹¹, Beijing Normal University¹², Health Sciences University of Hokkaido¹³, Shinshu University¹⁴, University of Tsukuba¹⁵, Howard University¹⁶, University of Ibadan¹⁷, Case Western Reserve University¹⁸, University of Utah¹⁹, Cold Spring Harbor Laboratory²⁰, Johns Hopkins University²¹, University of Oxford²², North Carolina State University²³, National Institutes of Health²⁴, Massachusetts Institute of Technology²⁵, Chinese Academy of Social Sciences²⁶, Kyoto University²⁷, Nagasaki University²⁸, Wellcome Trust²⁹, Genome Canada³⁰, Foundation for the National Institutes of Health³¹, University of Maryland, Baltimore³², Vanderbilt University³³, Stanford University³⁴, University of California, Berkeley³⁵, New York University³⁶, University of Oklahoma³⁷, University of New Mexico³⁸, Université de Montréal³⁹, University of California, Los Angeles⁴⁰, University of Michigan⁴¹, University of Wisconsin-Madison⁴², London School of Economics and Political Science⁴³, Genetic Alliance⁴⁴, GlaxoSmithKline⁴⁵, University of Washington⁴⁶, Harvard University⁴⁷, University of Chicago⁴⁸, Fred Hutchinson Cancer Research Center⁴⁹, University of Tokyo⁵⁰

18 Dec 2003-Nature

TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.

Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

5,926 citations

Journal Article•DOI•

The Structure of Haplotype Blocks in the Human Genome

Stacey Gabriel¹, Stephen F. Schaffner¹, Huy Nguyen¹, Jamie Moore¹, Jessica Roy¹, Brendan Blumenstiel¹, John M. Higgins¹, Matthew DeFelice¹, Amy L. Lochner¹, Maura Faggart¹, Shau Neen Liu-Cordero¹, Charles N. Rotimi², Adebowale Adeyemo³, Richard S. Cooper⁴, Ryk Ward⁵, Eric S. Lander¹, Mark J. Daly¹, David Altshuler¹,⁶•Institutions (6)

Massachusetts Institute of Technology¹, Howard University², University of Ibadan³, Loyola University Chicago⁴, University of Oxford⁵, Harvard University⁶

21 Jun 2002-Science

TL;DR: It is shown that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed.

Abstract: Haplotype-based methods offer a powerful approach to disease gene mapping, based on the association between causal mutations and the ancestral haplotypes on which they arose. As part of The SNP Consortium Allele Frequency Projects, we characterized haplotype patterns across 51 autosomal regions (spanning 13 megabases of the human genome) in samples from Africa, Europe, and Asia. We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The boundaries of blocks and specific haplotypes they contain are highly correlated across populations. We demonstrate that such haplotype frameworks provide substantial statistical power in association studies of common genetic variation across each region. Our results provide a foundation for the construction of a haplotype map of the human genome, facilitating comprehensive genetic association studies of human disease.

5,634 citations

References

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

22,269 citations

Journal Article•DOI•

The Future of Genetic Studies of Complex Human Diseases

Neil Risch¹, Kathleen R. Merikangas²•Institutions (2)

Stanford University¹, Yale University²

13 Sep 1996-Science

TL;DR: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult. In this Perspective, Risch and Merikangas propose that this goal can best be accomplished by combining the power of the human genome project with association studies.

Abstract: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult. In their Perspective, Risch and Merikangas propose that we can best accomplish this goal by combining the power of the human genome project with association studies, a method for determining the basis of a genetic disease.

5,143 citations

Journal Article•DOI•

A greedy algorithm for aligning DNA sequences.

Zheng Zhang¹, Scott Schwartz, Lukas Wagner, Webb Miller•Institutions (1)

Pennsylvania State University¹

01 Feb 2000-Journal of Computational Biology

TL;DR: A new greedy alignment algorithm is introduced with particularly good performance and it is shown that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data.

Abstract: For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

4,628 citations

Journal Article•DOI•

Evolutionary relationship of dna sequences in finite populations

Fumio Tajima¹•Institutions (1)

University of Texas Health Science Center at Houston¹

01 Oct 1983-Genetics

TL;DR: These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors.

Abstract: With the aim of analyzing and interpreting data on DNA polymorphism obtained by DNA sequencing or restriction enzyme technique, a mathematical theory on the expected evolutionary relationship among DNA sequences (nucleons) sampled is developed under the assumption that the evolutionary change of nucleons is determined solely by mutation and random genetic drift. The statistical property of the number of nucleotide differences between randomly chosen nucleons and that of heterozygosity or nucleon diversity is investigated using this theory. These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors. Therefore, increasing sample size does not help reduce the variance significantly. The distribution of sample allele (nucleomorph) frequencies is also studied, and it is shown that a small number of samples are sufficient in order to know the distribution pattern.

3,038 citations

"A map of human genome sequence vari..." refers background in this paper

...The time to the most recent common ancestor at a particular stretch of DNA is variable, and represents the opportunity for sequence divergence; thus, the expected pattern of heterozygosity is more heterogeneous than if every locus shared the same histor...
Journal Article•DOI•

Large-Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the Human Genome

David G. Wang¹, Jian-Bing Fan¹,², Chia-Jen Siao¹,², Anthony Berno¹,², Peter M. Young¹,², Ron Sapolsky¹,², Ghassan Ghandour¹,², Nancy Perkins¹,², Ellen Winchester¹,², Jessica B. Spencer¹,², Leonid Kruglyak¹,², Lincoln Stein¹,², Linda Hsie¹,², Thodoros Topaloglou¹,², Earl Hubbell¹,², Elizabeth M. Robinson¹,², Michael P. Mittmann¹,², Macdonald S. Morris¹,², Naiping Shen¹,², Dan Kilburn¹,², John D. Rioux¹,², Chad Nusbaum¹,², Steve Rozen¹,², Thomas J. Hudson¹,², Robert J. Lipshutz¹,², Mark S. Chee¹,², Eric S. Lander¹,²•Institutions (2)

Massachusetts Institute of Technology¹, Affymetrix²

15 May 1998-Science

TL;DR: A large-scale survey for SNPs was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips, and a genetic map was constructed showing the location of 2227 candidate SNPs.

Abstract: Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome, and they provide powerful tools for a variety of medical genetic studies. In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed showing the location of 2227 of these SNPs. Prototype genotyping chips were developed that allow simultaneous genotyping of 500 SNPs. The results provide a characterization of human diversity at the nucleotide level and demonstrate the feasibility of large-scale identification of human SNPs.

2,383 citations