
Evolutionary Population Dynamics and Grasshopper Optimization approaches for feature selection problems

Abstract
Searching for the optimal subset of features is known to be a challenging problem in the feature selection process. To deal with the difficulties involved in this problem, a robust and reliable optimization algorithm is required. In this paper, the Grasshopper Optimization Algorithm (GOA) is employed as a search strategy to design a wrapper-based feature selection method. The GOA is a recent population-based metaheuristic that mimics the swarming behaviors of grasshoppers. In this work, an efficient optimizer based on the simultaneous use of the GOA, selection operators, and Evolutionary Population Dynamics (EPD) is proposed in the form of four different strategies to mitigate the premature convergence and stagnation drawbacks of the conventional GOA. In the first two approaches, one of the top three agents and a randomly generated one are selected to reposition a solution from the worst half of the population. In the third and fourth approaches, to give low-fitness solutions a chance to reform the population, Roulette Wheel Selection (RWS) and Tournament Selection (TS) are utilized to select the guiding agent from the first half. The proposed GOA_EPD approaches are employed to tackle various feature selection tasks and are benchmarked on 22 UCI datasets. The comprehensive results and various comparisons reveal that the EPD has a remarkable impact on the efficacy of the GOA, and that using the selection mechanism enhances the capability of the proposed approach to outperform other optimizers and find the best solutions with improved convergence trends. Furthermore, the comparative experiments demonstrate the superiority of the proposed approaches when compared to other similar methods in the literature.


ACCEPTED MANUSCRIPT
Majdi Mafarja (a), Ibrahim Aljarah (b), Ali Asghar Heidari (c), Abdelaziz I. Hammouri (d), Hossam Faris (b), Ala’ M. Al-Zoubi (b), Seyedali Mirjalili (e)
(a) Department of Computer Science, Birzeit University, Birzeit, Palestine
mmafarja@birzeit.edu, mmafarjeh@gmail.com
(b) King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
{hossam.faris,i.aljarah}@ju.edu.jo, alaah14@gmail.com
(c) School of Surveying and Geospatial Engineering, University of Tehran, Tehran, Iran
as_heidari@ut.ac.ir
(d) Department of Computer Information Systems, Al-Balqa Applied University, Al-Salt, Jordan
Aziz@bau.edu.jo
(e) School of Information and Communication Technology, Griffith University, Nathan, Brisbane, QLD 4111, Australia
seyedali.mirjalili@griffithuni.edu.au
Preprint submitted to Knowledge-Based Systems December 30, 2017
Keywords: Grasshopper Optimization Algorithm, GOA, Feature Selection,
Classification, Metaheuristics, Evolutionary Population Dynamics, Binary
1. Introduction
The proliferation of information-system applications has complicated the task of extracting useful information from collected data [1, 2]. Data mining plays the main role in extracting useful knowledge from the collected datasets [3, 4], which may contain irrelevant and redundant data. Feature selection (FS) is one of the major preprocessing phases and aims to exclude irrelevant or redundant data from the dataset being processed [5, 6].
FS methods can be broadly categorized into three main classes: supervised [7], unsupervised [8], and semi-supervised methods [9]. Supervised FS requires class labels to select proper features and is used for classification problems. Unsupervised FS does not require class labels and is used for clustering tasks. Semi-supervised methods are applied when only part of the data is labeled.
There are several supervised, semi-supervised, and unsupervised FS algorithms in the literature. To name a few, correlation-based feature selection (CFS) [7], the fast correlation-based filter (FCBF) [10], and wavelet power spectrum (Spectrum) [11] are examples of supervised techniques. Non-negative spectral learning and sparse regression-based dual-graph regularized (NSSRD) feature selection is one of the latest unsupervised techniques, proposed by Shang et al. in 2017 [8]. The subspace learning-based graph regularized (SGFS) technique and self-representation based dual-graph regularized feature selection clustering (DFSC) are also well-established FS techniques, proposed by Shang et al. in 2016 [12, 13]. Feature selection via spectral analysis and forward feature selection [9, 14] are examples of semi-supervised FS.
The FS process can be accomplished in four major steps [15]: subset generation, subset assessment, ending criterion, and validation. From the evaluation perspective, FS methods can be divided into two groups based on the selection strategy: wrapper-based and filter-based. In filter-based methods, the selection of a subset is performed independently of the learning algorithm (e.g., classification). The merit of a feature or a subset of features is estimated with regard to specific characteristics of the data [16]. Examples of filter models include Chi-Square [17], Information Gain (IG) [18], Gain Ratio [19], and ReliefF [20]. In wrapper-based methods, the goodness of a subset is evaluated using a learning algorithm [21]. Examples of wrapper models include the LVW algorithm [22] and a neural network-based method [23].
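To make the wrapper idea concrete, the following minimal sketch scores a candidate subset by running a learner on it. It is an illustration under stated assumptions, not the evaluation procedure of this paper: the learner is a 1-nearest-neighbour classifier assessed by leave-one-out accuracy, and the function name `wrapper_score` and the toy data are hypothetical.

```python
from math import dist


def wrapper_score(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask[j] == 1.

    Assumption for illustration: 1-NN with Euclidean distance is the learner.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    # Project every sample onto the selected features.
    Z = [[row[j] for j in cols] for row in X]
    hits = 0
    for i, z in enumerate(Z):
        # Nearest neighbour among all other samples.
        nearest = min((k for k in range(len(Z)) if k != i),
                      key=lambda k: dist(z, Z[k]))
        hits += y[nearest] == y[i]
    return hits / len(Z)


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

print(wrapper_score(X, y, [1, 0]))  # informative feature only
print(wrapper_score(X, y, [0, 1]))  # noisy feature only
```

On this toy data the subset containing only the informative feature is classified perfectly, while the noisy feature alone misleads the classifier. A filter method, by contrast, would rank the features from data statistics without ever calling the classifier.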
Subset generation is a search process that selects a subset of items from the initial set using a complete search, a heuristic search, or a random search [15, 24, 25]. The complete search generates all possible subsets to select the best one. If the dataset includes $n$ features, then $2^n$ subsets will be generated and assessed, which is computationally expensive for larger datasets. Random search is another possible policy for selecting the attributes. It searches for the next feature subset randomly [26]. The main drawback of the random search strategy is that it may perform as a complete search in the worst case [5, 27].
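The cost of the complete search can be seen in a short sketch that enumerates every binary feature mask. The names `exhaustive_best` and `toy_score` are hypothetical, and the scoring function is an arbitrary stand-in for a real subset evaluator. The sketch skips the empty subset, so it evaluates $2^n - 1$ masks.

```python
from itertools import product


def exhaustive_best(n_features, score):
    """Score every non-empty binary mask and return (best_mask, n_evaluated)."""
    best_mask, best_score, evaluated = None, float("-inf"), 0
    for mask in product((0, 1), repeat=n_features):
        if not any(mask):
            continue  # skip the empty subset
        evaluated += 1
        s = score(mask)
        if s > best_score:
            best_mask, best_score = mask, s
    return best_mask, evaluated


def toy_score(mask):
    """Hypothetical evaluator: features 0 and 2 help, every kept feature costs 1."""
    return 2 * mask[0] + 2 * mask[2] - sum(mask)


best, n_evaluated = exhaustive_best(4, toy_score)
print(best, n_evaluated)

# The number of candidate subsets doubles with every additional feature.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

With only 30 features the enumeration already exceeds one billion subsets, which is why the heuristic strategies discussed next are preferred for larger datasets.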
An alternative to the previous two strategies is the heuristic search. Heuristic search can be described as a ‘depth first’ search guided by heuristics. According to Talbi [27], metaheuristic search methods can be defined as “upper level general methodologies (templates) that can be used as guiding strategies in designing underlying heuristics to solve specific optimization problems”. Various metaheuristics such as the Grey Wolf Optimizer (GWO) [28, 29], Whale Optimization Algorithm (WOA) [30], Ant Lion Optimization (ALO) [31], Firefly Algorithm (FA) [32], Particle Swarm Optimization (PSO) [33], and Ant Colony Optimization (ACO) [34] may demonstrate superior efficiency in tackling feature selection problems when compared to exact methods [35, 36]. Metaheuristic algorithms have shown improved results and efficiency in many real-life applications such as path planning [37], clustering [38], and power dispatch [39]. For example, E.S. Ali et al. applied the ALO to find the best location and sizing of renewable distributed generations [40]. Wu et al. utilized the WOA for path planning of solar-powered UAV [37]. Faris et al. also reviewed the recent variants and applications of the GWO [41]. The history of metaheuristics is presented in [42].
The GOA is a recent nature-inspired population-based metaheuristic algorithm [43], proposed by Saremi et al. in 2017, that mimics the idealized swarming behavior of grasshoppers in nature. This algorithm can deliver improved results and efficiency on global unconstrained/constrained optimization and various real-life tasks. The basic GOA has been applied to identify the best parameters of proton exchange membrane fuel cell (PEMFC) stacks, and the results showed the viability of the GOA-based algorithm in dealing with the steady-state and dynamic models [44]. In 2017, Wu et al. [45] proposed a dynamic GOA for optimizing the distributed trajectory of UAVs in urban environments. They showed that this algorithm can attain enhanced results and satisfactory trajectories. Tharwat et al. [46] developed a modified multi-objective GOA (MOGOA) with an external archive for constrained and unconstrained problems. Mirjalili et al. [47] also developed the basic multi-objective GOA and showed that it can tackle several benchmark problems effectively, with better performance in terms of the accuracy of the Pareto optimal solutions and their distribution.
Although metaheuristic algorithms do not guarantee finding the best solution in all runs, they can find relatively accurate solutions in a reasonable time [27, 48]. Metaheuristics can be classified into two main families: single-solution-based and population-based algorithms [27]. In the former class (e.g., Simulated Annealing), one solution is manipulated and transformed during the search process, while a set of solutions is evolved in the latter class (e.g., PSO). Single-solution-based algorithms show more exploitative behavior, which means intensifying the search around a candidate solution, whereas population-based algorithms are more explorative or show a mix of both behaviors; exploration means searching different regions of the space [27]. When designing a metaheuristic algorithm, these two criteria should be taken into account. Excessive exploration decreases the quality of results and leads to poor convergence, so the optimizer fails to find the target global optimum. Excessive exploitation, however, may cause the optimizer to be trapped in local optima (LO).
Evolutionary algorithms (EAs) are long-established metaheuristics inspired by natural processes [49, 50]. Genetic algorithms (GA), by J. H. Holland [51], and evolutionary programming, by L. Fogel et al. [52], are two different kinds of EA. In recent years, many EAs have been proposed to tackle optimization problems, especially in the field of feature selection [53, 54, 55]. Ant Colony (AntRSAR) and Genetic Algorithm (GenRSAR) are two EAs that were proposed by Jensen and Shen [56, 57] and applied to FS problems. A chaos-based genetic FS method (CGFSO) has been proposed in [58]. Two hybrid approaches have been proposed: between the GA and Simulated Annealing (SA) in [59], and between the GA and the Record to Record algorithm in [60]. A Scatter Search-based approach (SSAR) proposed by Jue et al. [61] is another EA-based FS method. The Ant Lion Optimizer (ALO), a recent well-regarded metaheuristic proposed by S. Mirjalili in [62], was utilized as the search mechanism in a wrapper FS method in [63, 64]. A chaotic ALO approach was proposed for FS in [65]. The GWO, another recent population-based optimizer [29], has been successfully employed in several applications such as the tuning of fuzzy control systems [66]. It has been applied to FS problems [67, 68] as well. Recently, a new wrapper-based FS algorithm that uses a hybrid of the Whale Optimization Algorithm (WOA) and SA as a search method was proposed in [69].
EAs are modeled to mimic the evolution of individuals from their initial states to become better adapted to some objectives imposed upon them. These paradigms apply evolutionary operators (mutation and recombination in GA, or the pheromone updating rules of ACO) to individuals chosen by a selection mechanism (e.g., random, tournament, or roulette wheel selection) to generate offspring. However, these operators affect and manipulate individuals rather than the whole population. Evolutionary Population Dynamics (EPD) is another evolutionary operator that manipulates the whole population rather than individuals [70]. Using this operator with EAs omits the worst individuals from the population, rather than improving the best individuals in the population (e.g., recombination in GA). Extremal optimization (EO) [71] is a metaheuristic algorithm that works based on the idea of EPD. The EO algorithm has been used in many research fields with much success [72, 73, 74]. The EPD operator is the main feature that enhanced the performance of this algorithm [28].
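The interplay between selection mechanisms and an EPD-style population update can be sketched as follows. This is a rough illustration under stated assumptions, not the GOA_EPD formulation: fitness is assumed to be non-negative with higher values being better, solutions are binary feature masks, and the repositioning rule (copying the guide and flipping each bit with probability `flip_prob`) is a hypothetical stand-in. All function names are illustrative.

```python
import random


def roulette_wheel(fitness, rng):
    """Pick an index with probability proportional to its non-negative fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))  # degenerate case: pick uniformly
    r = rng.uniform(0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1  # guard against floating-point round-off


def tournament(fitness, rng, k=2):
    """Pick the fittest of k distinct, randomly drawn indices."""
    k = min(k, len(fitness))
    contenders = rng.sample(range(len(fitness)), k)
    return max(contenders, key=lambda i: fitness[i])


def epd_step(pop, fitness, select, rng, flip_prob=0.1):
    """Replace each agent in the worst half with a perturbed copy of a guide
    chosen from the best half by the given selection operator."""
    order = sorted(range(len(pop)), key=lambda i: fitness[i], reverse=True)
    half = len(pop) // 2
    best = order[:half]
    best_fit = [fitness[i] for i in best]
    new_pop = [list(p) for p in pop]
    for i in order[half:]:
        guide = pop[best[select(best_fit, rng)]]
        # Assumed repositioning rule: copy the guide, flipping bits at random.
        new_pop[i] = [b ^ (rng.random() < flip_prob) for b in guide]
    return new_pop


rng = random.Random(42)
pop = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]]
fit = [0.9, 0.1, 0.8, 0.2]
print(epd_step(pop, fit, roulette_wheel, rng))
print(epd_step(pop, fit, tournament, rng))
```

Roulette wheel selection gives every member of the better half a chance proportional to its fitness, whereas tournament selection only compares a few randomly drawn candidates, so the two operators trade off selection pressure and diversity differently.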
This paper presents an efficient GOA-based optimizer in which EPD and selection operators are used to improve the efficacy of the basic GOA in dealing with FS tasks. In this work, we have made the following key contributions:
The significant merits of the EPD operator motivated our attempts to