scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

TL;DR: This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
Abstract: We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.
Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations

Journal ArticleDOI
28 Oct 2010-Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype as mentioned in this paper, and the results of the pilot phase of the project, designed to develop and compare different strategies for genomewide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal ArticleDOI
John W. Belmont1, Paul Hardenbol, Thomas D. Willis, Fuli Yu1, Huanming Yang2, Lan Yang Ch'Ang, Wei Huang3, Bin Liu2, Yan Shen3, Paul K.H. Tam4, Lap-Chee Tsui4, Mary M.Y. Waye5, Jeffrey Tze Fei Wong6, Changqing Zeng2, Qingrun Zhang2, Mark S. Chee7, Luana Galver7, Semyon Kruglyak7, Sarah S. Murray7, Arnold Oliphant7, Alexandre Montpetit8, Fanny Chagnon8, Vincent Ferretti8, Martin Leboeuf8, Michael S. Phillips8, Andrei Verner8, Shenghui Duan9, Denise L. Lind10, Raymond D. Miller9, John P. Rice9, Nancy L. Saccone9, Patricia Taillon-Miller9, Ming Xiao10, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley11, Sarah E. Hunt11, Don Powell11, Houcan Zhang12, Ichiro Matsuda13, Yoshimitsu Fukushima14, Darryl Macer15, Eiko Suda15, Charles N. Rotimi16, Clement Adebamowo17, Toyin Aniagwu17, Patricia A. Marshall18, Olayemi Matthew17, Chibuzor Nkwodimmah17, Charmaine D.M. Royal16, Mark Leppert19, Missy Dixon19, Fiona Cunningham20, Ardavan Kanani20, Gudmundur A. Thorisson20, Peter E. Chen21, David J. Cutler21, Carl S. Kashuk21, Peter Donnelly22, Jonathan Marchini22, Gilean McVean22, Simon Myers22, Lon R. Cardon22, Andrew P. Morris22, Bruce S. Weir23, James C. Mullikin24, Michael Feolo24, Mark J. Daly25, Renzong Qiu26, Alastair Kent, Georgia M. Dunston16, Kazuto Kato27, Norio Niikawa28, Jessica Watkin29, Richard A. Gibbs1, Erica Sodergren1, George M. Weinstock1, Richard K. Wilson9, Lucinda Fulton9, Jane Rogers11, Bruce W. Birren25, Hua Han2, Hongguang Wang, Martin Godbout30, John C. Wallenburg8, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins24, Lisa D. Brooks24, Jean E. McEwen24, Mark S. Guyer24, Elke Jordan31, Jane Peterson24, Jack Spiegel24, Lawrence M. Sung32, Lynn F. Zacharia24, Karen Kennedy29, Michael Dunn29, Richard Seabrook29, Mark Shillito, Barbara Skene29, John Stewart29, David Valle21, Ellen Wright Clayton33, Lynn B. Jorde19, Aravinda Chakravarti21, Mildred K. Cho34, Troy Duster35, Troy Duster36, Morris W. Foster37, Maria Jasperse38, Bartha Maria Knoppers39, Pui-Yan Kwok10, Julio Licinio40, Jeffrey C. Long41, Pilar N. Ossorio42, Vivian Ota Wang33, Charles N. Rotimi16, Patricia Spallone29, Patricia Spallone43, Sharon F. Terry44, Eric S. Lander25, Eric H. Lai45, Deborah A. Nickerson46, Gonçalo R. Abecasis41, David Altshuler47, Michael Boehnke41, Panos Deloukas11, Julie A. Douglas41, Stacey Gabriel25, Richard R. Hudson48, Thomas J. Hudson8, Leonid Kruglyak49, Yusuke Nakamura50, Robert L. Nussbaum24, Stephen F. Schaffner25, Stephen T. Sherry24, Lincoln Stein20, Toshihiro Tanaka 
18 Dec 2003-Nature
TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.
Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

5,926 citations

Journal ArticleDOI
21 Jun 2002-Science
TL;DR: It is shown that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed.
Abstract: Haplotype-based methods offer a powerful approach to disease gene mapping, based on the association between causal mutations and the ancestral haplotypes on which they arose. As part of The SNP Consortium Allele Frequency Projects, we characterized haplotype patterns across 51 autosomal regions (spanning 13 megabases of the human genome) in samples from Africa, Europe, and Asia. We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The boundaries of blocks and specific haplotypes they contain are highly correlated across populations. We demonstrate that such haplotype frameworks provide substantial statistical power in association studies of common genetic variation across each region. Our results provide a foundation for the construction of a haplotype map of the human genome, facilitating comprehensive genetic association studies of human disease.

5,634 citations

References
More filters
Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
13 Sep 1996-Science
TL;DR: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult as mentioned in this paper, and Risch and Merikangas proposed that they can best accomplish this goal by combining the power of the human genome project with association studies.
Abstract: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult. In their Perspective, Risch and Merikangas propose that we can best accomplish this goal by combining the power of the human genome project with association studies, a method for determining the basis of a genetic disease.

5,143 citations

Journal ArticleDOI
TL;DR: A new greedy alignment algorithm is introduced with particularly good performance and it is shown that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data.
Abstract: For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

4,628 citations

Journal ArticleDOI
01 Oct 1983-Genetics
TL;DR: These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors.
Abstract: With the aim of analyzing and interpreting data on DNA polymorphism obtained by DNA sequencing or restriction enzyme technique, a mathematical theory on the expected evolutionary relationship among DNA sequences (nucleons) sampled is developed under the assumption that the evolutionary change of nucleons is determined solely by mutation and random genetic drift. The statistical property of the number of nucleotide differences between randomly chosen nucleons and that of heterozygosity or nucleon diversity is investigated using this theory. These studies indicate that the estimates of the average number of nucleotide differences and nucleon diversity have a large variance, and a large part of this variance is due to stochastic factors. Therefore, increasing sample size does not help reduce the variance significantly The distribution of sample allele (nucleomorph) frequencies is also studied, and it is shown that a small number of samples are sufficient in order to know the distribution pattern.

3,038 citations


"A map of human genome sequence vari..." refers background in this paper

  • ...The time to the most recent common ancestor at a particular stretch of DNA is variable, and represents the opportunity for sequence divergence; thus, the expected pattern of heterozygosity is more heterogeneous than if every locus shared the same histor...

    [...]

Journal ArticleDOI
15 May 1998-Science
TL;DR: A large-scale survey for SNPs was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips, and a genetic map was constructed showing the location of 2227 candidate SNPs.
Abstract: Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome, and they provide powerful tools for a variety of medical genetic studies. In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed showing the location of 2227 of these SNPs. Prototype genotyping chips were developed that allow simultaneous genotyping of 500 SNPs. The results provide a characterization of human diversity at the nucleotide level and demonstrate the feasibility of large-scale identification of human SNPs.

2,383 citations


"A map of human genome sequence vari..." refers background in this paper

  • ...SNPs occur (on average) every 1,000–2,000 bases when two human chromosomes are compare...

    [...]