
Showing papers by "Ying Hu" published in 2000


Journal Article
TL;DR: The National Cancer Institute's CGAP-GAI (Cancer Genome Anatomy Project Genome Annotation Initiative) group has identified 10,243 SNPs by examining publicly available EST chromatograms and has produced a set of comprehensive SNP maps showing single nucleotide polymorphisms in genes expressed in breast, colon, kidney, liver, lung, or prostate tissue.
Abstract: SNPs (Single-Nucleotide Polymorphisms) are the most common form of DNA variation in humans. These variants occur at an estimated frequency of one per 1000 to 2000 base pairs (Cooper et al. 1995; Kwok et al. 1996; Wang et al. 1998; Cargill et al. 1999; Halushka et al. 1999), making it possible in principle to identify a genetic marker in every gene. A collection of tens or hundreds of thousands of SNPs would serve as a valuable resource for the discovery of genetic factors affecting disease susceptibility and resistance. These markers can be used in association studies that assay how alleles of candidate disease loci correlate with particular diseases (Lander and Schork 1994; Lander 1996; Risch and Merikangas 1996). Likewise, an extensive collection of SNPs will be useful for identifying genetic variants involved in drug metabolism (Meyer and Zanger 1997); this information will enable clinicians to determine which pharmacological agent is most effective for treating a given patient's condition, as well as which compounds are least likely to produce an adverse reaction. Because of their abundance, SNPs are the marker of choice for constructing high-resolution genetic maps used for linkage analysis (Lander and Schork 1994; Kruglyak 1997; Zhao et al. 1998) and positional cloning (Collins 1995). High-density genetic maps are essential for studying complex traits such as predisposition to hypertension, diabetes, or asthma or susceptibility to infectious diseases such as malaria or acquired immune deficiency syndrome. Dense SNP-based maps also will prove valuable for loss-of-heterozygosity studies (Cavenee et al. 1983), which have played a critical role in deciphering the genetic changes involved in cancer initiation and progression. 
Understanding the genetic events that lead from immortalization to metastasis will improve cancer diagnosis and may reveal common genetic changes in apparently unrelated tumor types, thereby suggesting new therapies for certain forms of cancer. Several large-scale SNP detection projects have been undertaken in recent years. The first, performed at the Whitehead Institute, was based on the hybridization of genomic PCR (Polymerase Chain Reaction) products to DNA oligonucleotide arrays (Wang et al. 1998). The Whitehead collection contains 3241 putative SNPs, 2227 of which have been placed on genetic maps. An alternative approach, examining high-throughput genomic sequence for nucleotide variants, was used by Taillon-Miller et al. (1998) to identify 153 potential SNPs in 200.6 kilobases of sequence from chromosomes 5, 7, and 13. More recently, SNP mining strategies based on the analysis of ESTs (Expressed Sequence Tags) have been described (Buetow et al. 1999; Picoult-Newberg et al. 1999). Because the high error rate in EST sequences (∼1%) makes it difficult to distinguish true genetic variants from sequencing artifacts, both Buetow et al. and Picoult-Newberg et al. used the basecalling program Phred (Ewing and Green 1998; Ewing et al. 1998) and the sequence assembly program Phrap (http://genome.washington.edu) to directly analyze EST sequencing traces. The two groups used different algorithms to filter out false-positives and validate predicted SNPs.
The goal of the National Cancer Institute's Cancer Genome Anatomy Project (CGAP) is to provide a comprehensive catalog of molecular differences distinguishing tumorous cells from their normal counterparts. Within CGAP, the Genome Annotation Initiative (CGAP-GAI) group seeks to identify allelic variants of genes involved in cancer initiation and progression. In our most recent round of SNP discovery, we used the SNPpipeline, a set of sequence analysis tools described in Buetow et al. (1999), to identify more than 10,000 high-probability candidate single nucleotide polymorphisms among publicly available EST sequences. Information about this collection of SNPs is accessible via the internet (http://cgap.nci.nih.gov/GAI/). To present these SNPs in a format useful to the human genetics community, we have placed >6800 predicted variants on integrated genetic/physical maps. We have produced maps showing the locations of SNPs in genes expressed in the breast, colon, kidney, liver, lung, or prostate in addition to a comprehensive integrated map. We provide a Java-based SNP viewer that displays sequence polymorphisms in the context of DNA sequence alignments and a search engine that retrieves SNPs by keyword, description, or gene symbol. Each SNP is linked to the extensive annotation maintained by the National Center for Biotechnology Information (NCBI). Our SNP prediction tools are publicly available for noncommercial use.
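The abstract's point that a ∼1% EST error rate swamps true variation can be made concrete with a little arithmetic. The Python sketch below is only an illustration using the figures quoted in the abstract; it is not the SNPpipeline, and the function and constant names are hypothetical. The one outside fact it relies on is the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)

def expected_per_window(rate_per_base: float, window_bp: int = 1000) -> float:
    """Expected number of events in a window, given a per-base rate."""
    return rate_per_base * window_bp

# Figures quoted in the abstract.
EST_ERROR_RATE = 0.01        # ~1% sequencing error in EST reads
SNP_RATE_HIGH = 1 / 1000     # one SNP per 1000 bp
SNP_RATE_LOW = 1 / 2000      # one SNP per 2000 bp

# In 1 kb of raw EST sequence, errors outnumber true SNPs roughly 10-20 to 1.
errors_per_kb = expected_per_window(EST_ERROR_RATE)
snps_per_kb_high = expected_per_window(SNP_RATE_HIGH)
snps_per_kb_low = expected_per_window(SNP_RATE_LOW)

# A 1% error rate corresponds to Phred Q20.
est_phred = error_prob_to_phred(EST_ERROR_RATE)

# Illustrative only: the per-base quality at which the error rate drops
# to the lower SNP frequency (about Q33). Not a threshold from the paper.
break_even_phred = error_prob_to_phred(SNP_RATE_LOW)

# Taillon-Miller et al. (1998): 153 potential SNPs in 200.6 kb.
bp_per_snp_taillon_miller = 200_600 / 153

if __name__ == "__main__":
    print(f"expected errors per kb: {errors_per_kb:.1f}")
    print(f"expected SNPs per kb:   {snps_per_kb_low:.1f} to {snps_per_kb_high:.1f}")
    print(f"EST error rate as Phred score: Q{est_phred:.0f}")
    print(f"break-even Phred score: Q{break_even_phred:.1f}")
    print(f"Taillon-Miller density: 1 SNP per {bp_per_snp_taillon_miller:.0f} bp")
```

On these numbers, the density reported by Taillon-Miller et al. falls inside the one-per-1000-to-2000-bp range cited at the start of the abstract, while raw EST errors are expected an order of magnitude more often than true SNPs. That gap is consistent with the abstract's account of why both groups analyzed the sequencing traces directly with Phred.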

47 citations