Book Chapter DOI

Simultaneous Gene Selection and Cancer Classification Using a Hybrid Intelligent Water Drop Approach

TL;DR: The results, evaluated on three cancer datasets, demonstrate that the genes selected by the IWD technique yield classification accuracies comparable to previously reported algorithms.
Abstract: Computational analysis of gene expression data is extremely difficult due to the existence of a huge number of genes and a comparatively small number of samples (a limited number of patients). It is therefore of significant importance to provide a subset of the most informative genes to a learning algorithm for constructing robust prediction models. In this study, we propose a hybrid Intelligent Water Drop (IWD) - Support Vector Machines (SVM) algorithm, with weighted gene ranking as a heuristic, for simultaneous gene subset selection and cancer prediction. Our results, evaluated on three cancer datasets, demonstrate that the genes selected by the IWD technique yield classification accuracies comparable to previously reported algorithms.
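To make the proposed hybrid concrete, below is a minimal Python sketch of IWD-based gene subset selection scored by SVM cross-validation accuracy. It is a sketch under stated assumptions, not the authors' implementation: the soil and velocity constants, the fixed subset size, and the use of scikit-learn's SVC are all illustrative choices.

```python
# Minimal IWD + SVM gene-selection sketch (illustrative, not the paper's code).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def iwd_gene_selection(X, y, n_select=20, n_drops=10, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    soil = np.full(n_genes, 1000.0)       # initial soil on every gene "node"
    a, b, c = 1.0, 0.01, 1.0              # assumed IWD update constants
    best_subset, best_fit = None, -np.inf

    for _ in range(n_iters):
        for _ in range(n_drops):
            velocity, subset = 100.0, []
            available = np.ones(n_genes, dtype=bool)
            for _ in range(n_select):
                cand = np.flatnonzero(available)
                f = 1.0 / (0.01 + np.maximum(soil[cand], 0.0))  # prefer low soil
                g = rng.choice(cand, p=f / f.sum())
                subset.append(g)
                available[g] = False
                velocity += a / (b + c * soil[g] ** 2)          # drop speeds up
                dsoil = a / (b + c * (1.0 / velocity) ** 2)
                soil[g] = 0.9 * soil[g] - 0.1 * dsoil           # local soil update
            # fitness: 5-fold cross-validated accuracy of a linear SVM
            fit = cross_val_score(SVC(kernel="linear"), X[:, subset], y, cv=5).mean()
            if fit > best_fit:
                best_subset, best_fit = list(subset), fit
        # global update: erode soil along the best path found so far
        soil[best_subset] = 0.9 * soil[best_subset] - 0.1 * best_fit
    return best_subset, best_fit
```

Each water drop builds a gene subset probabilistically, preferring genes with little soil; subsets that score well erode their genes' soil further, biasing later drops toward informative genes.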


Citations
Journal Article DOI
TL;DR: Results indicate that the MRMC-IWD model can satisfactorily solve optimization problems using the divide-and-conquer strategy and, by employing a local search method, is able not only to balance exploration and exploitation but also to converge towards the optimal solutions.

16 citations

Journal Article DOI
TL;DR: In this paper, the authors proposed a hybrid feature selection method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. In each IWD iteration, one of three local search (LS) algorithms is applied: Tabu Search (TS), a novel LS algorithm (NLSA), or hill climbing (HC); a correlation coefficient filter serves as the heuristic undesirability (HUD) for next-node selection in the original IWD algorithm.
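To make the wrapper-plus-local-search idea concrete, here is a small, hypothetical Python sketch of the hill climbing (HC) step that such a method could apply to the subset produced by each IWD iteration. The swap neighbourhood and the fitness signature are assumptions for illustration, not the cited paper's code.

```python
# Hill-climbing refinement of a gene subset (illustrative sketch).
def hill_climb(subset, all_genes, fitness, max_moves=50):
    """Greedily swap one selected gene for one unselected gene while it helps.

    subset:    list of currently selected gene indices
    all_genes: iterable of all candidate gene indices
    fitness:   callable mapping a subset to a score (higher is better)
    """
    best, best_fit = list(subset), fitness(subset)
    for _ in range(max_moves):
        improved = False
        for i, g_out in enumerate(best):
            for g_in in all_genes:
                if g_in in best:
                    continue
                cand = best[:i] + [g_in] + best[i + 1:]  # swap g_out -> g_in
                f = fitness(cand)
                if f > best_fit:
                    best, best_fit, improved = cand, f, True
                    break
            if improved:
                break
        if not improved:   # no improving swap: local optimum reached
            break
    return best, best_fit
```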

1 citation

References
Journal Article DOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
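As a quick illustration of using LIBSVM from Python, the snippet below goes through scikit-learn's SVC, which is implemented on top of LIBSVM; the synthetic dataset and hyperparameter values are placeholders rather than anything from the article.

```python
# Calling LIBSVM via scikit-learn's SVC (a thin wrapper around LIBSVM).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # C-SVC with an RBF kernel
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```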

40,826 citations


"Simultaneous Gene Selection and Can..." refers methods in this paper

  • ...Three such datasets were obtained from the Kent Ridge Biomedical datasets repository [8] and the libSVM repository [7]....


  • ...In particular, SVM with recursive feature elimination (RFE) was used by Vapnik et al [7] for gene selection and achieved notably high accuracy levels....


  • ...Support Vector Machines (SVM) were introduced by Vapnik et al [5-6] and successively extended by a number of other researchers....


Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.

  • Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
  • Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
  • Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal Article DOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially, and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Simultaneous Gene Selection and Can..." refers methods in this paper

  • ...The heuristic information for each individual gene is obtained by calculating the weighted sum of the IG, CS and CFS scores which were obtained using the WEKA[4] data mining library....

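To illustrate the heuristic quoted above: the paper obtains IG, CS, and CFS scores with WEKA and combines them as a weighted sum. The Python sketch below approximates IG with mutual information and CS with the chi-squared statistic from scikit-learn; CFS is omitted for brevity, and the weights are assumed values, not the paper's.

```python
# Weighted-sum gene ranking heuristic (approximation of the WEKA-based scores).
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.preprocessing import minmax_scale

def weighted_gene_scores(X, y, w_ig=0.5, w_cs=0.5):
    ig = mutual_info_classif(X, y)        # information-gain analogue
    cs, _ = chi2(np.abs(X), y)            # chi2 requires non-negative features
    # normalise both score vectors to [0, 1] before combining
    return w_ig * minmax_scale(ig) + w_cs * minmax_scale(cs)
```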

Proceedings Article DOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
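The margin-maximisation objective described here is commonly written as the following optimisation problem (hard-margin form shown for brevity; the soft-margin variant adds slack terms):

```latex
% Maximise the margin 2/||w|| by minimising ||w||^2, subject to every
% training pattern (x_i, y_i), y_i in {-1,+1}, lying on the correct side.
\min_{w,\,b}\; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge 1,
\qquad i = 1, \dots, n.
% The solution is a linear combination of the supporting patterns:
% w = \sum_i \alpha_i y_i x_i, with \alpha_i > 0 only for support vectors.
```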

11,211 citations

Journal Article DOI
TL;DR: A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving and a number of "no free lunch" (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class.
Abstract: A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of "no free lunch" (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed include time-varying optimization problems and a priori "head-to-head" minimax distinctions between optimization algorithms, distinctions that result despite the NFL theorems' enforcing of a type of uniformity over all algorithms.
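Stated informally, the core NFL result says that for any two algorithms a_1 and a_2, performance summed over all possible objective functions f is identical; the formulation below follows Wolpert and Macready's notation, where d^y_m is the sequence of cost values observed after m evaluations:

```latex
% No-free-lunch theorem: summed over all objective functions f, any two
% algorithms a_1 and a_2 induce the same distribution of cost histories.
\sum_{f} P\left( d^{y}_{m} \mid f, m, a_{1} \right)
  = \sum_{f} P\left( d^{y}_{m} \mid f, m, a_{2} \right)
```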

10,771 citations


"Simultaneous Gene Selection and Can..." refers methods in this paper

  • ...According to the No-Free-Lunch Theorem [11], all metaheuristic based approaches report the same performance results when averaged over all possible objective functions....
