Large-Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the Human Genome
15 May 1998, Science (American Association for the Advancement of Science), Vol. 280, Iss. 5366, pp. 1077-1082
TL;DR: In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips, and a genetic map was constructed showing the location of 2227 of the 3241 candidate SNPs identified.
Abstract: Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome, and they provide powerful tools for a variety of medical genetic studies. In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed showing the location of 2227 of these SNPs. Prototype genotyping chips were developed that allow simultaneous genotyping of 500 SNPs. The results provide a characterization of human diversity at the nucleotide level and demonstrate the feasibility of large-scale identification of human SNPs.
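The abstract reports the survey outcome but not the comparison step behind "candidate SNPs." A minimal sketch of that step, assuming gap-free sequences from several individuals already aligned to equal length (the function names and the "N" convention for uncalled bases are illustrative assumptions, not the authors' pipeline, which combined gel-based sequencing with chip hybridization), flags columns where two or more called bases differ. It also turns the reported totals into an average spacing.

```python
from typing import List


def candidate_snp_positions(aligned: List[str]) -> List[int]:
    """Return 0-based columns at which aligned sequences disagree.

    Assumes equal-length, gap-free sequences over A/C/G/T, with 'N'
    marking an uncalled base that is ignored at that column.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    positions = []
    for i in range(length):
        alleles = {seq[i].upper() for seq in aligned if seq[i].upper() != "N"}
        if len(alleles) > 1:  # two or more distinct called bases
            positions.append(i)
    return positions


def bases_per_snp(surveyed_bp: int, n_snps: int) -> float:
    """Average spacing between candidate SNPs in the surveyed sequence."""
    return surveyed_bp / n_snps


print(candidate_snp_positions(["ACGTA", "ACTTA", "ACGNA"]))  # [2]
# Figures from the abstract: 2.3 Mb surveyed, 3241 candidates, 2227 mapped.
print(round(bases_per_snp(2_300_000, 3241)))  # 710
print(f"{2227 / 3241:.1%} of candidates placed on the map")
```

On these figures, the survey found roughly one candidate SNP per 710 bp examined, and about two thirds of the candidates were mapped.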
Citations
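TL;DR: Primer3 is a computer program that suggests PCR primers, single sequencing primers, and hybridization probes, weighing factors such as melting temperature, length, GC content, 3′ stability, secondary structure, and the likelihood of primer-dimer formation.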
Abstract: Designing PCR and sequencing primers is an essential activity for molecular biologists around the world. This chapter assumes acquaintance with the principles and practice of PCR, as outlined in, for example, refs. 1–4. Primer3 is a computer program that suggests PCR primers for a variety of applications, for example to create STSs (sequence tagged sites) for radiation hybrid mapping (5), or to amplify sequences for single nucleotide polymorphism discovery (6). Primer3 can also select single primers for sequencing reactions and can design oligonucleotide hybridization probes. In selecting oligos for primers or hybridization probes, Primer3 can consider many factors. These include oligo melting temperature, length, GC content, 3′ stability, estimated secondary structure, the likelihood of annealing to or amplifying undesirable sequences (for example interspersed repeats), the likelihood of primer-dimer formation between two copies of the same primer, and the accuracy of the source sequence. In the design of primer pairs, Primer3 can consider product size and melting temperature, the likelihood of primer-dimer formation between the two primers in the pair, the difference between primer melting temperatures, and primer location relative to particular regions of interest or to be avoided.
16,407 citations
Cites background from "Large-Scale Identification, Mapping..."
...Primer3 is a computer program that suggests PCR primers for a variety of applications, for example to create STSs (sequence tagged sites) for radiation hybrid mapping (5), or to amplify sequences for single nucleotide polymorphism discovery (6)....
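The Primer3 abstract lists several per-oligo properties without showing how they are computed. The sketch below checks three of them for a single primer: GC content, a melting temperature, and G/C count at the 3′ end. It is not Primer3's method. Primer3 uses nearest-neighbor thermodynamics, and here the much cruder Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C) stands in for it. All thresholds in `screen_primer` are hypothetical values chosen for illustration, not Primer3 defaults.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule, Tm ~= 2(A+T) + 4(G+C) degrees C; only a rough guide for short oligos."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def three_prime_gc(primer: str, window: int = 5) -> int:
    """Number of G/C bases among the last `window` bases (the 3' end)."""
    return sum(base in "GC" for base in primer.upper()[-window:])


def screen_primer(primer, length=(18, 27), gc=(0.40, 0.60), tm=(52, 64), max_3p_gc=3):
    """Return the names of the checks a primer fails (hypothetical thresholds)."""
    failures = []
    if not length[0] <= len(primer) <= length[1]:
        failures.append("length")
    if not gc[0] <= gc_content(primer) <= gc[1]:
        failures.append("gc_content")
    if not tm[0] <= wallace_tm(primer) <= tm[1]:
        failures.append("tm")
    if three_prime_gc(primer) > max_3p_gc:  # too G/C-rich a 3' end can misprime
        failures.append("3prime_gc")
    return failures


primer = "ATGCGTACGTTAGCCTAGGC"
print(gc_content(primer), wallace_tm(primer), three_prime_gc(primer))  # 0.55 62 3
print(screen_primer(primer))  # []
```

Pair-level checks that the abstract mentions, such as primer-dimer likelihood and Tm difference between the two primers, would sit on top of per-primer filters like these.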
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
12,098 citations
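The Celera abstract expresses pairwise diversity as "1 bp per 1250," which is a per-site difference rate of 1/1250 = 0.0008. A minimal sketch of that measurement, assuming two haploid sequences already aligned without gaps and skipping sites where either base is uncalled ("N" is an illustrative convention), counts mismatches over comparable sites. The synthetic pair below is constructed to hit the reported rate exactly; it is not real data.

```python
def pairwise_difference_rate(hap_a: str, hap_b: str) -> float:
    """Fraction of comparable sites at which two aligned haploid sequences differ.

    Sites where either sequence carries 'N' are excluded from the denominator.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(hap_a.upper(), hap_b.upper()):
        if a == "N" or b == "N":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Synthetic 2500-bp pair with two differences: 2 / 2500 = 1 / 1250.
hap_a = "A" * 2500
hap_b = "C" + "A" * 1248 + "C" + "A" * 1250
print(pairwise_difference_rate(hap_a, hap_b))  # 0.0008
```

A genome-wide figure like the one reported averages this rate over many aligned regions, which is why the abstract stresses marked heterogeneity across the genome.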
Baylor College of Medicine; Chinese Academy of Sciences; Chinese National Human Genome Center; University of Hong Kong; The Chinese University of Hong Kong; Hong Kong University of Science and Technology; Illumina; McGill University; Washington University in St. Louis; University of California, San Francisco; Wellcome Trust Sanger Institute; Beijing Normal University; Health Sciences University of Hokkaido; Shinshu University; University of Tsukuba; Howard University; University of Ibadan; Case Western Reserve University; University of Utah; Cold Spring Harbor Laboratory; Johns Hopkins University; University of Oxford; North Carolina State University; National Institutes of Health; Massachusetts Institute of Technology; Chinese Academy of Social Sciences; Kyoto University; Nagasaki University; Wellcome Trust; Genome Canada; Foundation for the National Institutes of Health; University of Maryland, Baltimore; Vanderbilt University; Stanford University; New York University; University of California, Berkeley; University of Oklahoma; University of New Mexico; Université de Montréal; University of California, Los Angeles; University of Michigan; University of Wisconsin-Madison; London School of Economics and Political Science; Genetic Alliance; GlaxoSmithKline; University of Washington; Harvard University; University of Chicago; Fred Hutchinson Cancer Research Center; University of Tokyo
TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.
Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.
5,926 citations
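The HapMap abstract refers to "the degree of association" between variants without naming a measure. Two standard pairwise linkage-disequilibrium statistics for biallelic loci are Lewontin's D′ and r². The sketch below computes both from the frequency of one haplotype (AB) and the two allele frequencies. The inputs are assumed to be estimated already (for example from phased haplotypes). Which statistics the consortium actually reported is not stated in this abstract.

```python
from typing import Tuple


def ld_statistics(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independence
    # Lewontin's normalization: the largest |D| allowed by the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print(ld_statistics(0.5, 0.5, 0.5))   # (0.25, 1.0, 1.0): complete association
print(ld_statistics(0.25, 0.5, 0.5))  # (0.0, 0.0, 0.0): independent loci
print(ld_statistics(0.2, 0.3, 0.4))
```

D′ reaches 1 whenever one of the four haplotypes is absent, while r² reaches 1 only when the two loci are perfect proxies for each other. That is why r² is the natural measure when one SNP is used to stand in for another.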
TL;DR: The performance of the genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
Abstract: A dense set of single nucleotide polymorphisms (SNPs) covering the genome and an efficient method to assess SNP genotypes are expected to be available in the near future. An outstanding question is how to use these technologies efficiently to identify genes affecting liability to complex disorders. To achieve this goal, we propose a statistical method that has several optimal properties: It can be used with case control data and yet, like family-based designs, controls for population heterogeneity; it is insensitive to the usual violations of model assumptions, such as cases failing to be strictly independent; and, by using Bayesian outlier methods, it circumvents the need for Bonferroni correction for multiple tests, leading to better performance in many settings while still constraining risk for false positives. The performance of our genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
3,130 citations
Additional excerpts
...SNP) throughout the human genome (Collins et al., 1998; Wang et al., 1998)....
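The genomic control abstract does not show the core adjustment. The commonly used form estimates an inflation factor λ from many 1-df association statistics as their median divided by the median of a χ² distribution with 1 degree of freedom (about 0.455), then divides every statistic by λ. The sketch below implements only that step. It leaves out the Bayesian outlier component the abstract describes. Flooring λ at 1 is a common convention, assumed here.

```python
from statistics import NormalDist, median
from typing import List, Tuple

# Median of a chi-square distribution with 1 df equals the square of the
# standard normal 75th percentile, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats: List[float]) -> Tuple[float, List[float]]:
    """Estimate the inflation factor lambda and deflate 1-df chi-square statistics."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    # Under no stratification the median statistic should be near CHI2_1DF_MEDIAN;
    # a larger median suggests genome-wide inflation (e.g. population structure).
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in chi2_stats]


lam, adjusted = genomic_control([0.2, 0.9, 1.4, 3.1, 12.0])
print(round(lam, 3), [round(s, 2) for s in adjusted])
```

Because λ comes from the median over many markers, a handful of truly associated SNPs barely moves it, while inflation shared by all markers is removed.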
TL;DR: This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
Abstract: We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
2,908 citations
Cites background from "Large-Scale Identification, Mapping..."
...SNPs occur (on average) every 1,000–2,000 bases when two human chromosomes are compare...
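The SNP map abstract reports that 85% of exons lie within 5 kb of the nearest SNP. A minimal sketch of that kind of summary, assuming sorted SNP coordinates and exons given as closed intervals on one chromosome (the coordinates below are made up), uses binary search to find the closest SNP to each exon. The last line is a consistency check on the abstract's own numbers. At one SNP per 1.9 kb, 1.42 million SNPs imply about 2.7 Gb of available sequence.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def distance_to_nearest_snp(snps: Sequence[int], start: int, end: int) -> Optional[int]:
    """Distance (bp) from the interval [start, end] to the closest SNP.

    `snps` must be sorted ascending. Returns 0 if a SNP falls inside the
    interval and None if there are no SNPs at all.
    """
    i = bisect_left(snps, start)  # index of the first SNP at or after `start`
    if i < len(snps) and snps[i] <= end:
        return 0
    candidates = []
    if i < len(snps):
        candidates.append(snps[i] - end)  # nearest SNP to the right
    if i > 0:
        candidates.append(start - snps[i - 1])  # nearest SNP to the left
    return min(candidates) if candidates else None


def fraction_within(snps: Sequence[int], exons: Sequence[Tuple[int, int]], max_dist: int = 5000) -> float:
    """Fraction of exons whose nearest SNP is at most `max_dist` bp away."""
    dists = [distance_to_nearest_snp(snps, s, e) for s, e in exons]
    hits = sum(d is not None and d <= max_dist for d in dists)
    return hits / len(exons)


snps = [100, 5000, 20000]  # hypothetical SNP positions
exons = [(4000, 4500), (5000, 5100), (30000, 30100)]  # hypothetical exons
print([distance_to_nearest_snp(snps, s, e) for s, e in exons])  # [500, 0, 10000]
print(fraction_within(snps, exons))
print(round(1.42e6 * 1.9e3 / 1e9, 1), "Gb implied by the reported density")
```

Sorting once and using `bisect_left` keeps each lookup logarithmic in the number of SNPs, which matters at the scale of a million-SNP map.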
References
TL;DR: It is reported here how modern photolithographic techniques can be used to facilitate sequence analysis by generating miniaturized arrays of densely packed oligonucleotide probes, which can then be applied to parallel DNA hybridization analysis, directly yielding sequence information.
Abstract: In many areas of molecular biology there is a need to rapidly extract and analyze genetic information; however, current technologies for DNA sequence analysis are slow and labor intensive. We report here how modern photolithographic techniques can be used to facilitate sequence analysis by generating miniaturized arrays of densely packed oligonucleotide probes. These probe arrays, or DNA chips, can then be applied to parallel DNA hybridization analysis, directly yielding sequence information. In a preliminary experiment, a 1.28 × 1.28 cm array of 256 different octanucleotides was produced in 16 chemical reaction cycles, requiring 4 hr to complete. The hybridization pattern of fluorescently labeled oligonucleotide targets was then detected by epifluorescence microscopy. The fluorescence signals from complementary probes were 5-35 times stronger than those with single or double base-pair hybridization mismatches, demonstrating specificity in the identification of complementary sequences. This method should prove to be a powerful tool for rapid investigations in human genetics and diagnostics, pathogen detection, and DNA molecular recognition.
1,753 citations
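The photolithography abstract gives counts (256 probes, 16 cycles) without the combinatorial logic that relates them. In light-directed synthesis, each cycle exposes one mask and couples one base to every illuminated site. Varying k positions over all four bases therefore takes 4k cycles and yields 4^k distinct sequences, and for k = 4 that gives 16 cycles and 256 sequences. The abstract does not say how its octanucleotide layout was arranged, so treating the array as four variable positions is an assumption made only to show that the arithmetic matches. The simulation below tracks what each site has grown after every mask.

```python
from itertools import product
from typing import List, Tuple

BASES = "ACGT"


def light_directed_synthesis(k: int) -> Tuple[int, List[str]]:
    """Simulate combinatorial synthesis of all 4**k sequences at k variable positions.

    One cycle = one mask exposure that deprotects every site whose target has
    `base` at `position`, followed by coupling of that base at those sites.
    """
    targets = ["".join(p) for p in product(BASES, repeat=k)]
    grown = [""] * len(targets)
    cycles = 0
    for position in range(k):
        for base in BASES:
            cycles += 1
            for site, target in enumerate(targets):
                if target[position] == base:  # site is illuminated, so it couples `base`
                    grown[site] += base
    return cycles, grown


cycles, probes = light_directed_synthesis(4)
print(cycles, len(set(probes)))  # 16 256
```

The cycle count grows linearly with probe length while the number of distinct probes grows exponentially, which is what makes dense arrays practical.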
TL;DR: A physical map has been constructed of the human genome containing 15,086 sequence-tagged sites (STSs), with an average spacing of 199 kilobases, anchored by the radiation hybrid and genetic maps.
Abstract: A physical map has been constructed of the human genome containing 15,086 sequence-tagged sites (STSs), with an average spacing of 199 kilobases. The project involved assembly of a radiation hybrid map of the human genome containing 6193 loci and incorporated a genetic linkage map of the human genome containing 5264 loci. This information was combined with the results of STS-content screening of 10,850 loci against a yeast artificial chromosome library to produce an integrated map, anchored by the radiation hybrid and genetic maps. The map provides radiation hybrid coverage of 99 percent and physical coverage of 94 percent of the human genome. The map also represents an early step in an international project to generate a transcript map of the human genome, with more than 3235 expressed sequences localized. The STSs in the map provide a scaffold for initiating large-scale sequencing of the human genome.
814 citations
TL;DR: In the first clinical application of high-density oligonucleotide array sequencing, the sequences of 167 viral isolates from 102 patients have been determined, and the DNA sequence of USA HIV-1 clade B proteases was found to be extremely variable.
Abstract: Naturally occurring mutations in HIV-1-infected patients have important implications for therapy and the outcome of clinical studies. However, little is known about the prevalence of mutations that confer resistance to HIV-1 protease inhibitors in isolates derived from patients naive for such inhibitors. In the first clinical application of high-density oligonucleotide array sequencing, the sequences of 167 viral isolates from 102 patients have been determined. The DNA sequence of USA HIV-1 clade B proteases was found to be extremely variable and 47.5% of the 99 amino acid positions varied. This level of amino acid diversity is greater than that previously known for all worldwide HIV-1 clades combined (40%). Many of the amino acid changes that are known to contribute to drug resistance occurred as natural polymorphisms in isolates from patients who had never received protease inhibitors.
678 citations
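The HIV-1 protease abstract summarizes diversity as the share of the 99 amino acid positions that varied (47.5%, roughly 47 positions). A minimal sketch of that count, assuming protein sequences that are already aligned without gaps, marks a position as variable when more than one residue appears there. The fragments below are hypothetical inputs, not isolates from the study.

```python
from typing import List


def variable_positions(aligned: List[str]) -> List[int]:
    """0-based positions at which aligned protein sequences carry more than one residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


isolates = ["PQITL", "PQVTL", "PHITL"]  # hypothetical aligned fragments
var = variable_positions(isolates)
print(var, f"{len(var) / len(isolates[0]):.1%}")  # [1, 2] 40.0%
print(round(0.475 * 99))  # 47 of the 99 protease positions
```

This measure counts a position as variable even if only one isolate differs, so it rises with the number of isolates sampled. That is worth keeping in mind when comparing it with the 40% figure quoted for all clades combined.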
TL;DR: Fourteen of fifteen patient samples with known mutations were accurately diagnosed, and no false positive mutations were identified in 20 control samples, suggesting DNA chip-based assays may provide a valuable new technology for high-throughput, cost-efficient detection of genetic alterations.
Abstract: The ability to scan a large gene rapidly and accurately for all possible heterozygous mutations in large numbers of patient samples will be critical for the future of medicine. We have designed high-density arrays consisting of over 96,600 oligonucleotides 20 nucleotides (nt) in length to screen for a wide range of heterozygous mutations in the 3.45-kilobase (kb) exon 11 of the hereditary breast and ovarian cancer gene BRCA1. Reference and test samples were co-hybridized to these arrays and differences in hybridization patterns quantitated by two-colour analysis. Fourteen of fifteen patient samples with known mutations were accurately diagnosed, and no false positive mutations were identified in 20 control samples. Eight single nucleotide polymorphisms were also readily detected. DNA chip-based assays may provide a valuable new technology for high-throughput, cost-efficient detection of genetic alterations.
665 citations
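The BRCA1 abstract says that differences between co-hybridized reference and test samples were "quantitated by two-colour analysis," but it gives no scoring rule. One simple way to express such a comparison, offered here as an assumption rather than the authors' procedure, is a per-probe log2 ratio of test to reference intensity, flagging probes whose ratio departs from zero by more than a threshold. The default threshold of 1.0 (a two-fold change) and the intensities in the example are hypothetical.

```python
import math
from typing import List, Sequence


def flag_probes(test: Sequence[float], reference: Sequence[float], threshold: float = 1.0) -> List[int]:
    """Indices of probes whose |log2(test / reference)| exceeds `threshold`.

    Intensities are assumed to be background-corrected and positive.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must cover the same probes")
    flagged = []
    for i, (t, r) in enumerate(zip(test, reference)):
        if t <= 0 or r <= 0:
            raise ValueError("intensities must be positive")
        if abs(math.log2(t / r)) > threshold:  # signal shifted in the test channel
            flagged.append(i)
    return flagged


print(flag_probes([100, 40, 210], [100, 100, 100]))  # [1, 2]
```

Scoring each probe against its own reference signal in the same hybridization cancels much of the probe-to-probe variation in affinity, which is the usual motivation for a two-colour design.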
TL;DR: How polymorphic and densely spaced biallelic markers need to be for extraction of most of the inheritance information from human pedigrees is examined, and a map of 700–900 moderately polymorphic biallelic markers is concluded to be equivalent to the current 300–400 microsatellite marker sets.
Abstract: Improvements in genetic mapping techniques have driven recent progress in human genetics. The use of single nucleotide polymorphisms (SNPs) as biallelic genetic markers offers the promise of rapid, highly automated genotyping. As maps of SNPs and the techniques for genotyping them are being developed, it is important to consider what properties such maps must have in order for them to be useful for linkage studies. I examine how polymorphic and densely spaced biallelic markers need to be for extraction of most of the inheritance information from human pedigrees, and compare maps of biallelics with today's genome-scanning sets of microsatellite markers. I conclude that a map of 700-900 moderately polymorphic biallelic markers is equivalent, and a map of 1,500-3,000 superior, to the current 300-400 microsatellite marker sets.
532 citations
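The linkage-mapping abstract implies that each biallelic marker carries less information than a microsatellite, so more of them are needed. A simple way to see why, using expected heterozygosity (1 − Σpᵢ²) rather than the inheritance-information measure the abstract refers to, is that a biallelic marker cannot exceed 0.5, while a marker with many common alleles can approach 1. The allele frequencies below are illustrative, not values from the paper.

```python
import math
from typing import Sequence


def expected_heterozygosity(allele_freqs: Sequence[float]) -> float:
    """1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    if not math.isclose(sum(allele_freqs), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


print(expected_heterozygosity([0.5, 0.5]))   # 0.5, the maximum for a biallelic marker
print(expected_heterozygosity([0.2, 0.8]))   # a less polymorphic SNP
print(expected_heterozygosity([0.125] * 8))  # 0.875, eight equally common alleles
```

Because even an ideal SNP is heterozygous in at most half of individuals, a denser biallelic map is needed to reach the same informativeness as a sparser microsatellite set, which is consistent with the marker counts in the abstract.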