Journal ArticleDOI

Particle swarm optimisation for feature selection in classification

01 May 2014 - Applied Soft Computing, Vol. 18, pp. 261-276
TL;DR: Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with a smaller number of features and higher classification performance than using all features.
Abstract: In classification, feature selection is an important data pre-processing technique, but it is a difficult problem due mainly to the large search space. Particle swarm optimisation (PSO) is an efficient evolutionary computation technique. However, the traditional personal best and global best updating mechanism in PSO limits its performance for feature selection, and the potential of PSO for feature selection has not been fully investigated. This paper proposes three new initialisation strategies and three new personal best and global best updating mechanisms in PSO to develop novel feature selection approaches with the goals of maximising the classification performance, minimising the number of features and reducing the computational time. The proposed initialisation strategies and updating mechanisms are compared with the traditional initialisation and the traditional updating mechanism. The most promising initialisation strategy and updating mechanism are then combined to form a new approach, PSO(4-2), which is compared with two traditional feature selection methods and two PSO-based methods. Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with fewer features and higher classification performance than using all features. PSO(4-2) outperforms the two traditional methods and the two PSO-based algorithms in terms of computational time, number of features and classification performance. Its superior performance is due mainly to the proposed initialisation strategy, which combines the advantages of forward selection and backward selection to reduce the number of features and the computational time, and to the new updating mechanism, which overcomes the limitations of traditional updating mechanisms by taking the number of features into account.
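As a rough illustration of the kind of PSO wrapper the paper studies, the sketch below evolves continuous particle positions in [0, 1], thresholds them into feature masks, and scores each mask with a fitness that weighs classification error against subset size. It is a minimal sketch rather than the authors' PSO(4-2): the 5-NN classifier, the 0.6 selection threshold, the weight alpha and the swarm parameters are illustrative assumptions.

    # Minimal PSO wrapper for feature selection (illustrative sketch only).
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def fitness(position, X, y, threshold=0.6, alpha=0.9):
        """Weighted sum of classification error and fraction of features kept."""
        mask = position > threshold
        if not mask.any():                   # empty subset: worst possible fitness
            return 1.0
        err = 1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                    X[:, mask], y, cv=5).mean()
        return alpha * err + (1 - alpha) * mask.mean()

    def pso_feature_selection(X, y, n_particles=30, iters=100,
                              w=0.7298, c1=1.49618, c2=1.49618, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        pos = rng.random((n_particles, d))   # positions in [0, 1]^d
        vel = np.zeros((n_particles, d))
        pbest = pos.copy()
        pbest_fit = np.array([fitness(p, X, y) for p in pos])
        gbest = pbest[pbest_fit.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, d))
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)
            for i, p in enumerate(pos):      # update personal and global bests
                f = fitness(p, X, y)
                if f < pbest_fit[i]:
                    pbest[i], pbest_fit[i] = p.copy(), f
            gbest = pbest[pbest_fit.argmin()].copy()
        return gbest > 0.6                   # boolean mask of selected features

The paper's contributions replace the uniform initialisation and the plain best-updating steps above with initialisation strategies inspired by forward and backward selection and with updating mechanisms that take the number of features into account.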


Citations
Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms.
Abstract: Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research.

1,237 citations

Journal ArticleDOI
TL;DR: This study discusses several frequently-used evaluation measures for feature selection, and surveys supervised, unsupervised, and semi-supervised feature selection methods, which are widely applied in machine learning problems, such as classification and clustering.

1,057 citations

Journal ArticleDOI
TL;DR: A new wrapper feature selection approach based on the Whale Optimization Algorithm is proposed, which studies the effect of using Tournament and Roulette Wheel selection mechanisms, instead of a random operator, in the search for optimal feature subsets for classification purposes.

534 citations

Journal ArticleDOI
TL;DR: In this paper, semi-supervised feature selection methods are fully investigated and two taxonomies of these methods are presented, based on two different perspectives that represent the hierarchical structure of semi-supervised feature selection methods.

371 citations

Journal ArticleDOI
TL;DR: Binary variants of the ant lion optimizer (ALO) are proposed and used to select the optimal feature subset for classification purposes in wrapper mode, and prove the capability of the proposed binary algorithms to search the feature space for optimal feature combinations regardless of the initialization and the stochastic operators used.

357 citations


Cites background from "Particle swarm optimisation for feature selection in classification"

  • ...But, when the best feature subset contains a relatively large number of features, backward selection has a larger chance to obtain the best solution [41]....


References
Proceedings ArticleDOI
06 Aug 2002
TL;DR: A concept for the optimization of nonlinear functions using particle swarm methodology is introduced, and the evolution of several paradigms is outlined, and an implementation of one of the paradigm is discussed.
Abstract: A concept for the optimization of nonlinear functions using particle swarm methodology is introduced. The evolution of several paradigms is outlined, and an implementation of one of the paradigms is discussed. Benchmark testing of the paradigm is described, and applications, including nonlinear function optimization and neural network training, are proposed. The relationships between particle swarm optimization and both artificial life and genetic algorithms are described.
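In its canonical form (with the inertia weight w introduced in later refinements of the 1995 algorithm), each particle i updates its velocity and position as

    v_i(t+1) = w * v_i(t) + c1 * r1 * (pbest_i - x_i(t)) + c2 * r2 * (gbest - x_i(t))
    x_i(t+1) = x_i(t) + v_i(t+1)

where r1 and r2 are random numbers drawn uniformly from [0, 1], c1 and c2 are acceleration constants, pbest_i is the best position particle i has visited, and gbest is the best position found by the whole swarm.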

35,104 citations

Journal ArticleDOI
TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
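This simplex method survives as the Nelder-Mead algorithm, the NSM initialiser referenced in the citation contexts below; a quick way to experiment with it is through SciPy (the Rosenbrock objective and the starting point here are arbitrary illustrative choices):

    # Minimising a test function with the Nelder-Mead simplex method.
    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(x):
        return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                            + (1.0 - x[:-1]) ** 2))

    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
    print(result.x, result.fun)   # converges close to the minimum at (1, 1)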

27,271 citations


"Particle swarm optimisation for fea..." refers methods in this paper

  • ...Since NSM is slow and can be applied only to the problems with low dimensionality, this initialisation method may not be appropriate for feature selection problems, where the dimensionality is typically large....


  • ...Parsopoulos and Vrahatis [33] use the Nonlinear Simplex method (NSM) [34] to generate initial particles in PSO....


  • ...Experiments show that the NSM based initialisation can improve the performance of PSO on 14 benchmark functions....


Book
28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

01 Jan 2007

17,341 citations

Journal ArticleDOI
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
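The simplest family covered by that survey, univariate feature ranking (a filter method), can be reproduced in a few lines with scikit-learn; the mutual-information criterion, the dataset, and k=10 are arbitrary choices for this sketch:

    # Univariate feature ranking: score each feature independently, keep top k.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_breast_cancer(return_X_y=True)
    selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
    X_reduced = selector.transform(X)          # keeps the 10 highest-scoring features
    print(selector.get_support(indices=True))  # indices of the selected features

Unlike the PSO wrapper sketched earlier, such a ranking ignores interactions between features, which is exactly the limitation that multivariate and search-based selection methods aim to address.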

14,509 citations


"Particle swarm optimisation for fea..." refers background in this paper

  • ...However, irrelevant and redundant features are not useful for classification and they may even reduce the classification performance due to the large search space, which is termed “the curse of dimensionality” [1, 2]....


  • ...The size of the search space increases exponentially with respect to the number of available features in the dataset [1]....


  • ...There could be two-way or multi-way interactions among features [1, 5]....


  • ...Although many different search techniques have been applied to feature selection, most of these algorithms still suffer from the problems of stagnation in local optima or being computationally expensive [2, 1, 4]....
