Proceedings Article

Rough set based gene selection algorithm for microarray sample classification

01 Dec 2010-pp 7-13
TL;DR: A rough set based gene selection algorithm is presented that selects the set of genes by maximizing the relevance and significance of the genes, which are calculated based on the theory of rough sets.
Abstract: Gene selection from microarray data is an important issue for gene expression based classification and to carry out a diagnostic test. In this regard, a rough set based gene selection algorithm is presented. It selects the set of genes by maximizing the relevance and significance of the genes, which are calculated based on the theory of rough sets. Using the predictive accuracy of K-nearest neighbor rule and support vector machine, the performance of the proposed algorithm, along with a comparison with other related methods, is studied on five cancer and two arthritis microarray data sets. Promising performance was achieved by the proposed gene selection algorithm with relevant and significant genes from microarray data set in a reasonable time.
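The abstract does not spell out how relevance and significance are computed, so the following is only a minimal sketch under stated assumptions. It uses the standard rough-set dependency degree (the fraction of samples whose equivalence class under the chosen genes contains a single class label), takes the relevance of a gene to be its dependency on its own, and takes the significance of a candidate gene to be the gain in dependency when it joins the already selected set. The function names, the greedy ordering (significance first, relevance as tie-breaker), the toy data, and the assumption that expression values are already discretized are all illustrative and are not taken from the paper.

```python
from collections import defaultdict


def dependency(rows, labels, genes):
    """Rough-set dependency degree of the class labels on `genes`.

    Samples are grouped into equivalence classes by their (discretized)
    values on `genes`; a class belongs to the positive region when all
    of its samples share one label.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[g] for g in genes)].append(i)
    positive = sum(
        len(members)
        for members in blocks.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def select_genes(rows, labels, k):
    """Greedily pick up to k genes (hypothetical selection rule).

    Each step prefers the gene with the largest significance, i.e. the
    gain in dependency over the current selection, and breaks ties by
    relevance, then by lower gene index.
    """
    remaining = set(range(len(rows[0])))
    selected = []
    while remaining and len(selected) < k:
        base = dependency(rows, labels, selected)

        def score(g):
            relevance = dependency(rows, labels, [g])
            significance = dependency(rows, labels, selected + [g]) - base
            return (significance, relevance, -g)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, already-discretized data: 6 samples x 3 genes (illustrative only).
# Gene 1 duplicates gene 0; gene 2 is uninformative alone but resolves
# the samples that gene 0 cannot separate.
TOY_ROWS = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
    [2, 2, 0],
    [2, 2, 1],
]
TOY_LABELS = ["A", "A", "B", "B", "A", "B"]
```

On the toy data, `select_genes(TOY_ROWS, TOY_LABELS, 2)` picks gene 0 and then gene 2, skipping the redundant gene 1 because it adds no dependency beyond gene 0. In an evaluation like the one the abstract describes, the selected columns would then be passed to a K-nearest neighbor or support vector machine classifier.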
Citations
01 Jan 2002

9,314 citations

Journal Article
TL;DR: This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely, conflicting criteria, evaluation criteria and criteria importance, and multicriteria decision-making (MCDM) analysis techniques were proposed as effective recommended solutions in the methodological aspect.
Abstract: This study aims to systematically review prior research on the evaluation and benchmarking of automated acute leukaemia classification tasks. The review depends on three reliable search engines: ScienceDirect, Web of Science and IEEE Xplore. A research taxonomy developed for the review considers a wide perspective for automated detection and classification of acute leukaemia research and reflects the usage trends in the evaluation criteria in this field. The developed taxonomy consists of three main research directions in this domain. The taxonomy involves two phases. The first phase includes all three research directions. The second one demonstrates all the criteria used for evaluating acute leukaemia classification. The final set of studies includes 83 investigations, most of which focused on enhancing the accuracy and performance of detection and classification through proposed methods or systems. Few efforts were made to undertake the evaluation issues. According to the final set of articles, three groups of articles represented the main research directions in this domain: 56 articles highlighted the proposed methods, 22 articles involved proposals for system development and 5 papers centred on evaluation and comparison. The other taxonomy side included 16 main and sub-evaluation and benchmarking criteria. This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely, conflicting criteria, evaluation criteria and criteria importance. It also determines the weakness of benchmarking tools. To solve these issues, multicriteria decision-making (MCDM) analysis techniques were proposed as effective recommended solutions in the methodological aspect. This methodological aspect involves a proposed decision support system based on MCDM for evaluation and benchmarking to select suitable multiclass classification models for acute leukaemia. The said support system is examined and has three sequential phases. Phase One presents the identification procedure and process for establishing a decision matrix based on a crossover of evaluation criteria and acute leukaemia multiclass classification models. Phase Two describes the decision matrix development for the selection of acute leukaemia classification models based on the integrated Best and worst method (BWM) and VIKOR. Phase Three entails the validation of the proposed system.

85 citations

Journal Article
TL;DR: A systematic review of literature related to the detection and classification of acute leukaemia is aimed to help emphasise current research opportunities and thus extend and create additional research fields.

56 citations

Journal Article
TL;DR: The KDD is an automated process of knowledge discovery from the original data that consists of many steps like data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation.
Abstract: Data mining is a process of knowledge discovery. The KDD is an automated process of knowledge discovery from the original data. The KDD consists of many steps like data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation. Among these steps, data selection is particularly important for selecting the relevant features and removing the irrelevant attributes. Classification is one of the data mining techniques used to discover the unknown class. The different classification methods in data mining are Bayesian classification (statistical classifier), decision tree induction, rule based classification (IF THEN rule), classification using back propagation (neural network algorithm), support vector machine, classification using association rule, k-nearest neighbor classifiers, case-based reasoning classifiers, rough set approach, genetic algorithm, and fuzzy set approach.

50 citations

Journal Article
TL;DR: An extensive experimental comparison of the proposed method and other methods using four different classifiers and 22 different medical data sets confirms that the proposed MFFS strategy yields promising results on feature selection and classification accuracy for the medical data mining field of research.

36 citations


Cites methods from "Rough set based gene selection algo..."

  • ...A new feature selection method based on rough set theory has been proposed by Paul and Maji [23]....


References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
  • Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
  • Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
  • Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations

Journal Article
15 Oct 1999-Science
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations


"Rough set based gene selection algo..." refers background in this paper

  • ...In functional genomics, the gene expression data is widely used for gene selection, clustering and classification of samples into cancer versus normal or in different types or subtypes of cancer [1]....
