Journal Article

In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations

31 Dec 2008 - Genomics & Informatics (Korea Genome Organization) - Vol. 6, Iss. 4, pp. 166-172
TL;DR: This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function and introduces a combinatorial approach that uses machine learning algorithms to improve prediction performance.
Abstract: A multitude of protein-coding sequence variations (CVs) in the human genome have been revealed as a result of major initiatives, including the Human Variome Project, the 1000 Genomes Project, and the International Cancer Genome Consortium. This naturally has led to debate over how to accurately assess the functional consequences of CVs, because predicting the functional effects of CVs and their relevance to disease phenotypes is becoming increasingly important. This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function. We also introduce a combinatorial approach that uses machine learning algorithms to improve prediction performance.
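
The abstract does not say which machine learning algorithm the combinatorial approach uses, so the following is only a minimal sketch of the general idea of combining several predictors. It assumes three hypothetical tools that each give a binary call (1 = deleterious, 0 = neutral), weights each tool by the log-odds of its accuracy on a labeled set, and takes a weighted vote. The data, function names, and the weighting scheme are illustrative assumptions, not the article's method.

```python
import math


def fit_weights(train_calls, labels):
    """Weight each tool by the log-odds of its accuracy on labeled variants."""
    n_tools = len(train_calls[0])
    weights = []
    for j in range(n_tools):
        correct = sum(1 for calls, y in zip(train_calls, labels) if calls[j] == y)
        # Laplace smoothing keeps the log-odds finite at 0% or 100% accuracy.
        acc = (correct + 1) / (len(labels) + 2)
        weights.append(math.log(acc / (1 - acc)))
    return weights


def combine(calls, weights):
    """Return 1 (deleterious) if the weighted vote is positive, else 0 (neutral)."""
    score = sum(w if c == 1 else -w for c, w in zip(calls, weights))
    return 1 if score > 0 else 0


# Toy data invented for illustration: rows are variants, columns are the
# binary calls of three hypothetical predictors.
train_calls = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 0, 0)]
labels = [1, 1, 0, 0, 1, 0]

weights = fit_weights(train_calls, labels)
predictions = [combine(c, weights) for c in train_calls]
```

A tool that is wrong more often than right receives a negative weight, so its call is effectively inverted rather than discarded. A real evaluation would score the combined predictor on variants held out from the weight fitting.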

Citations
Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: (AACR Centennial Conference: Translational Cancer Medicine, Nov 4-8, 2007; Singapore; PL02-05) All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are "passengers". There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional "driver" mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations

References
Journal Article
Paul Burton, David Clayton, Lon R. Cardon, Nicholas John Craddock, +192 more
07 Jun 2007 - Nature
TL;DR: This study has demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest.
Abstract: There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10^-5 and 5 × 10^-7) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

9,244 citations


"In Silico Functional Assessment of ..." refers background in this paper

  • ...The Swiss-Prot, dbSNP, and HapMap databases provide fundamental information on neutral polymorphisms....

    [...]

  • ...The Swiss-Prot/TrEMBL databases and PSI-BLAST were used for sequence alignment....

    [...]

  • ...Polymorphism and mutation databases

    | Database | Recent release date* | Data type | Features | Website |
    | --- | --- | --- | --- | --- |
    | OMIM (Hamosh et al., 2005) | Updated daily | Deleterious mutations | Full-text descriptions of published disease-causing variations | http://www.ncbi.nlm.nih.gov/omim |
    | HGMD (Stenson et al., 2008) | Sept 2008 | Deleterious mutations | Comprehensive collection of published disease-causing variations | http://www.hgmd.cf.ac.uk |
    | LSDB in HGVS | Nov 2008 | Deleterious mutations | Specialized collection of a particular gene or locus | http://www.hgvs.org/dblist/glsdb.html |
    | Swiss-Prot (Yip et al., 2004) | Nov 2008 | Deleterious mutations and neutral polymorphisms | Well-summarized list of variations and corresponding proteins | http://www.expasy.org/cgi-bin/lists?humsavar.txt |
    | dbSNP (Sherry et al., 2001) | Apr 2008 | Neutral and (few) deleterious SNPs | Broad collections of SNPs regardless of clinical associations (clinically associated SNPs linked to source sites) | http://www.ncbi.nlm.nih.gov/projects/SNP |
    | dbGaP (Mailman et al., 2007) | Nov 2008 | Deleterious or phenotype-affecting SNPs | Collections of SNPs affecting clinical phenotypes or nonclinical traits | http://www.ncbi.nlm.nih.gov/gap |
    | HapMap (Frazer et al., 2007) | Oct 2008 | Neutral SNPs and (very few) deleterious SNPs | Collections of SNPs of 270 individuals randomly selected from African, Asian, and European populations | http://www.hapmap.org |

    *Accessed Nov 2008....

    [...]


  • ...Random forest was trained with various features, such as structural information, sequence conservation, and SIFT prediction using a dataset from Swiss-Prot....

    [...]

Journal Article
TL;DR: The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
Abstract: In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T. Sherry, M. Ward and K. Sirotkin (1999) Genome Res., 9, 677-679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. The complete contents of dbSNP can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.

6,449 citations

Journal Article
TL;DR: SIFT is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study and can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms.
Abstract: Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

5,318 citations

Journal Article
18 Oct 2007 - Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r^2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r^2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations


"In Silico Functional Assessment of ..." refers background in this paper

  • ...…advancements in sequencing technologies, several studies have reported a number of sequence variations in certain cancers (Sjoblom et al., 2006; Greenman et al., 2007; Campbell et al., 2008; Jones et al., 2008), in which mutational patterns have differed greatly between patients with the same…...

    [...]

Journal Article
26 Sep 2008 - Science
TL;DR: It is found that pancreatic cancers contain an average of 63 genetic alterations, the majority of which are point mutations, which defined a core set of 12 cellular signaling pathways and processes that were each genetically altered in 67 to 100% of the tumors.
Abstract: There are currently few therapeutic options for patients with pancreatic cancer, and new insights into the pathogenesis of this lethal disease are urgently needed. Toward this end, we performed a comprehensive genetic analysis of 24 pancreatic cancers. We first determined the sequences of 23,219 transcripts, representing 20,661 protein-coding genes, in these samples. Then, we searched for homozygous deletions and amplifications in the tumor DNA by using microarrays containing probes for approximately 10^6 single-nucleotide polymorphisms. We found that pancreatic cancers contain an average of 63 genetic alterations, the majority of which are point mutations. These alterations defined a core set of 12 cellular signaling pathways and processes that were each genetically altered in 67 to 100% of the tumors. Analysis of these tumors' transcriptomes with next-generation sequencing-by-synthesis technologies provided independent evidence for the importance of these pathways and processes. Our data indicate that genetically altered core pathways and regulatory processes only become evident once the coding regions of the genome are analyzed in depth. Dysregulation of these core pathways and processes through mutation can explain the major features of pancreatic tumorigenesis.

3,721 citations


"In Silico Functional Assessment of ..." refers background in this paper

  • ...Using recent advancements in sequencing technologies, several studies have reported a number of sequence variations in certain cancers (Sjoblom et al., 2006; Greenman et al., 2007; Campbell et al., 2008; Jones et al., 2008), in which mutational patterns have differed greatly between patients with the same disease....

    [...]
