Rough set based gene selection algorithm for microarray sample classification

doi:10.1109/ICM2CS.2010.5706710

Proceedings article

Sushmita Paul¹, Pradipta Maji¹

Indian Statistical Institute¹

01 Dec 2010, pp. 7-13

TL;DR: A rough set based gene selection algorithm is presented that selects the set of genes by maximizing the relevance and significance of the genes, which are calculated based on the theory of rough sets.
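
The TL;DR names two rough set quantities, relevance and significance, but this page does not give Paul and Maji's exact formulas. The following is therefore only a minimal sketch under stated assumptions: expression values are already discretized, the relevance of a gene is the standard rough set dependency degree γ of the class labels on that gene (the fraction of samples in the positive region), and the significance of a candidate with respect to an already-selected gene is the increase in dependency the candidate adds to that gene. Genes are picked greedily by relevance plus average significance. The scoring rule, the toy data and all function names (`partition`, `dependency`, `significance`, `select_genes`) are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: rows are samples, columns are genes whose expression
# levels are ASSUMED to be discretized already (e.g. low/medium/high -> 0/1/2).
Table = Sequence[Tuple[int, ...]]


def partition(rows: Table, genes: Sequence[int]) -> List[List[int]]:
    """Equivalence classes U/IND(genes): samples with identical values on `genes`."""
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[g] for g in genes)].append(i)
    return list(blocks.values())


def dependency(rows: Table, labels: Sequence[int], genes: Sequence[int]) -> float:
    """gamma_genes(D) = |POS_genes(D)| / |U|.

    A sample lies in the positive region when every sample in its equivalence
    class has the same class label, i.e. its class is certain given `genes`.
    """
    positive = sum(len(block) for block in partition(rows, genes)
                   if len({labels[i] for i in block}) == 1)
    return positive / len(rows)


def significance(rows: Table, labels: Sequence[int],
                 genes: Sequence[int], gene: int) -> float:
    """sigma_genes(D, gene) = gamma_genes(D) - gamma_{genes without gene}(D)."""
    rest = [g for g in genes if g != gene]
    return dependency(rows, labels, genes) - dependency(rows, labels, rest)


def select_genes(rows: Table, labels: Sequence[int], n_genes: int) -> List[int]:
    """Greedy forward selection under the ASSUMED relevance + significance score.

    The first gene maximizes relevance alone; each later gene maximizes its
    relevance plus its mean significance when paired with each selected gene.
    Ties go to the lower gene index because max() keeps the first maximum.
    """
    candidates = list(range(len(rows[0])))
    selected: List[int] = []
    while candidates and len(selected) < n_genes:
        def score(g: int) -> float:
            relevance = dependency(rows, labels, [g])
            if not selected:
                return relevance
            total = sum(significance(rows, labels, [g, s], g) for s in selected)
            return relevance + total / len(selected)
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Gene 1 induces the same partition as gene 0 (redundant); gene 2 is less
# relevant on its own but separates samples 3 and 4, which gene 0 cannot.
ROWS = [
    (0, 1, 0), (0, 1, 0), (0, 1, 1), (1, 2, 1),
    (1, 2, 2), (2, 0, 2), (2, 0, 1), (2, 0, 2),
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(select_genes(ROWS, LABELS, 2))  # [0, 2]
```

In this toy table genes 0 and 1 have equal relevance (0.75), yet the redundant gene 1 adds no significance once gene 0 is chosen, so the less relevant but complementary gene 2 is selected second.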

Abstract: Gene selection from microarray data is an important issue for gene expression based classification and for carrying out diagnostic tests. In this regard, a rough set based gene selection algorithm is presented. It selects the set of genes by maximizing the relevance and significance of the genes, which are calculated based on the theory of rough sets. Using the predictive accuracy of the K-nearest neighbor rule and the support vector machine, the performance of the proposed algorithm is studied on five cancer and two arthritis microarray data sets and compared with that of other related methods. The proposed algorithm achieved promising performance, selecting relevant and significant genes from the microarray data sets in reasonable time.
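
The abstract evaluates the selected genes by the predictive accuracy of the K-nearest neighbor rule (alongside a support vector machine). The page does not state the validation protocol or the distance measure, so the sketch below assumes leave-one-out cross-validation, Euclidean distance and majority voting, and it omits the SVM half entirely. The function names `knn_predict` and `loo_accuracy` and the expression matrix are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                query: Sequence[float], k: int) -> int:
    """Majority vote among the k training samples nearest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, most_common() keeps first-seen order, so the label whose
    # nearest member is closest to the query wins.
    return votes.most_common(1)[0][0]


def loo_accuracy(x: Sequence[Sequence[float]], y: Sequence[int],
                 genes: Sequence[int], k: int = 1) -> float:
    """Leave-one-out K-NN accuracy using only the expression columns in `genes`."""
    proj = [[row[g] for g in genes] for row in x]
    correct = 0
    for i in range(len(proj)):
        # Hold out sample i and classify it from the remaining samples.
        train_x = proj[:i] + proj[i + 1:]
        train_y = list(y[:i]) + list(y[i + 1:])
        if knn_predict(train_x, train_y, proj[i], k) == y[i]:
            correct += 1
    return correct / len(proj)


# Hypothetical expression matrix: gene 0 separates the classes, gene 2 does not.
X = [
    [1.0, 5.0, 0.3], [1.2, 4.0, 0.9], [0.9, 6.0, 0.1],
    [3.1, 5.5, 0.2], [2.9, 4.5, 0.8], [3.2, 5.2, 0.4],
]
Y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(loo_accuracy(X, Y, [0], k=1))  # 1.0
    print(loo_accuracy(X, Y, [2], k=1))  # 0.0
```

Restricting the classifier to the columns returned by a gene selection step is what lets the same accuracy measure compare different selection methods on equal terms.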

Citations

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal article

Systematic Review of an Automated Multiclass Detection and Classification System for Acute Leukaemia in Terms of Evaluation and Benchmarking, Open Challenges, Issues and Methodological Aspects

M. A. Alsalem¹, A. A. Zaidan¹, B. B. Zaidan¹, M. Hashim¹, Osamah Shihab Albahri¹, Ahmed Shihab Albahri¹, Ali Hadi¹, K. I. Mohammed¹

Sultan Idris University of Education¹

19 Sep 2018, Journal of Medical Systems

TL;DR: This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely conflicting criteria, evaluation criteria and criteria importance, and proposes multicriteria decision-making (MCDM) analysis techniques as effective solutions in the methodological aspect.

Abstract: This study aims to systematically review prior research on the evaluation and benchmarking of automated acute leukaemia classification tasks. The review draws on three reliable search engines: ScienceDirect, Web of Science and IEEE Xplore. A research taxonomy developed for the review takes a wide perspective on research into the automated detection and classification of acute leukaemia and reflects the usage trends of evaluation criteria in this field. The developed taxonomy consists of three main research directions in this domain and involves two phases. The first phase includes all three research directions; the second demonstrates all the criteria used for evaluating acute leukaemia classification. The final set of studies includes 83 investigations, most of which focused on enhancing the accuracy and performance of detection and classification through proposed methods or systems. Few efforts addressed evaluation issues. According to the final set of articles, three groups of articles represented the main research directions in this domain: 56 articles highlighted proposed methods, 22 articles involved proposals for system development and 5 papers centred on evaluation and comparison. The other side of the taxonomy included 16 main and sub-criteria for evaluation and benchmarking. This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely conflicting criteria, evaluation criteria and criteria importance. It also identifies the weaknesses of benchmarking tools. To solve these issues, multicriteria decision-making (MCDM) analysis techniques are proposed as effective solutions in the methodological aspect. This methodological aspect involves a proposed decision support system based on MCDM for evaluation and benchmarking, used to select suitable multiclass classification models for acute leukaemia. The proposed support system has three sequential phases. Phase One presents the identification procedure and process for establishing a decision matrix based on a crossover of evaluation criteria and acute leukaemia multiclass classification models. Phase Two describes the development of the decision matrix for selecting acute leukaemia classification models based on the integrated best-worst method (BWM) and VIKOR. Phase Three entails the validation of the proposed system.

85 citations

Journal article

A review of the automated detection and classification of acute leukaemia: Coherent taxonomy, datasets, validation and performance measurements, motivation, open challenges and recommendations.

M. A. Alsalem¹, A. A. Zaidan¹, B. B. Zaidan¹, M. Hashim¹, H.T. Madhloom¹, N.D. Azeez¹, S. Alsyisuf²

Sultan Idris University of Education¹, Management and Science University²

01 May 2018, Computer Methods and Programs in Biomedicine

TL;DR: This systematic review of the literature on the detection and classification of acute leukaemia aims to highlight current research opportunities and thus to extend existing research fields and create new ones.

56 citations

Journal article

Analysis of Feature Selection Algorithms on Classification: A Survey

S. Vanaja, K. Ramesh Kumar

18 Jun 2014, International Journal of Computer Applications

TL;DR: KDD (knowledge discovery in databases) is an automated process of knowledge discovery from the original data that consists of many steps, such as data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation.

Abstract: Data mining is a process of knowledge discovery. KDD is an automated process of knowledge discovery from the original data. KDD consists of many steps, such as data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation. Among these steps, data selection is particularly important for selecting relevant features and removing irrelevant attributes. Classification is one of the data mining techniques; it is used to determine the unknown class of a record. The classification methods used in data mining include Bayesian classification (statistical classifier), decision tree induction, rule-based classification (IF-THEN rules), classification using backpropagation (neural network algorithm), support vector machines, classification using association rules, k-nearest neighbor classifiers, case-based reasoning classifiers, the rough set approach, genetic algorithms and the fuzzy set approach.

50 citations

Journal article

Multi Filtration Feature Selection (MFFS) to improve discriminatory ability in clinical data set

S. Sasikala¹, S. Appavu alias Balamurugan², S. Geetha³

Anna University¹, College of Information Technology², Thiagarajar College of Engineering³

01 Jul 2016, Applied Computing and Informatics

TL;DR: An extensive experimental comparison of the proposed method with other methods, using four different classifiers and 22 different medical data sets, confirms that the proposed MFFS strategy yields promising results on feature selection and classification accuracy in the field of medical data mining.

36 citations

Cites methods from "Rough set based gene selection algo..."

...A new feature selection method based on rough set theory has been proposed by Paul and Maji [23]....

References

Book

The Nature of Statistical Learning Theory

Vladimir Vapnik¹

Bell Labs¹

01 Jan 1995

Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Book

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal article

Pattern Classification and Scene Analysis.

Ulf Grenander, Richard O. Duda, Peter E. Hart

01 Sep 1974, Journal of the American Statistical Association

14,948 citations

Book

Pattern classification and scene analysis

Richard O. Duda, Peter E. Hart

01 Jan 1973

Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations

Journal article

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Todd R. Golub¹,², Donna K. Slonim¹, Pablo Tamayo¹, Christine Huard¹, Michelle Gaasenbeek¹, Jill P. Mesirov¹, Hilary A. Coller¹, Mignon L. Loh², James R. Downing³, Michael A. Caligiuri⁴, Clara D. Bloomfield⁴, Eric S. Lander¹

Massachusetts Institute of Technology¹, Harvard University², St. Jude Children's Research Hospital³, Ohio State University⁴

15 Oct 1999, Science

TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case; the results suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations

Cited for background in "Rough set based gene selection algo..."

...In functional genomics, the gene expression data is widely used for gene selection, clustering and classification of samples into cancer versus normal or in different types or subtypes of cancer [1]....