scispace - formally typeset
Search or ask a question
Book ChapterDOI

Rough sets for selection of functionally diverse genes from microarray data

19 Dec 2011-pp 477-484
TL;DR: The performance of the rough set based gene selection algorithm, along with a comparison with other gene selection methods, is studied using the predictive accuracy of K-nearest neighbor rule and support vector machine on two cancer and one arthritis microarray data sets.
Abstract: Selection of reliable genes from a huge gene expression data containing high intergene correlation is essential to carry out a diagnostic test and successful treatment. In this regard, a rough set based gene selection algorithm is reported, which selects a set of genes by maximizing the relevance and significance of the selected genes. A gene ontology-based similarity measure is proposed to analyze the functional diversity of the selected genes. It also helps to analyze the effectiveness of different gene selection methods. The performance of the rough set based gene selection algorithm, along with a comparison with other gene selection methods, is studied using the predictive accuracy of K-nearest neighbor rule and support vector machine on two cancer and one arthritis microarray data sets. An important finding is that the rough set based gene selection algorithm selects more functionally diverse set of genes than the existing algorithms.
References
More filters
Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Book
Vladimir Vapnik1
01 Jan 1995
TL;DR: Setting of the learning problem consistency of learning processes bounds on the rate of convergence ofLearning processes controlling the generalization ability of learning process constructing learning algorithms what is important in learning theory?
Abstract: Setting of the learning problem consistency of learning processes bounds on the rate of convergence of learning processes controlling the generalization ability of learning processes constructing learning algorithms what is important in learning theory?.

40,147 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprosessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprosessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations

Journal ArticleDOI
TL;DR: A method that assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements is described, suggesting that this repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.
Abstract: Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes while accounting for the enormous number of genes. We describe a method, Significance Analysis of Microarrays (SAM), that assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements. For genes with scores greater than an adjustable threshold, SAM uses permutations of the repeated measurements to estimate the percentage of genes identified by chance, the false discovery rate (FDR). When the transcriptional response of human cells to ionizing radiation was measured by microarrays, SAM identified 34 genes that changed at least 1.5-fold with an estimated FDR of 12%, compared with FDRs of 60 and 84% by using conventional methods of analysis. Of the 34 genes, 19 were involved in cell cycle regulation and 3 in apoptosis. Surprisingly, four nucleotide excision repair genes were induced, suggesting that this repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.

12,102 citations