Domain of Competence of XCS Classifier System in Complexity Measurement Space

Ester Bernadó-Mansilla and Tin Kam Ho
Journal article, 01 Feb 2005, Vol. 9, Iss. 1, pp. 82-104
Domain of Competence of XCS
Classifier System
in Complexity Measurement Space
Ester Bernadó-Mansilla
and Tin Kam Ho
Computer Engineering Department
Enginyeria i Arquitectura La Salle, Ramon Llull University
Quatre Camins, 2. 08022 Barcelona, Spain
E-mail: esterb@salleurl.edu Tel: +34 932 902 433
Computing Sciences Research Center, Bell Laboratories, Lucent Technologies
700 Mountain Avenue, 2C-425
Murray Hill, NJ 07974-0636 USA
E-mail: tkh@research.bell-labs.com Tel: +1 908 582 5989
Abstract
The XCS classifier system has recently shown a high degree of competence on a variety of data mining
problems. But to what kind of problems XCS is well and poorly suited is seldom understood, especially
for real-world classification problems. The major inconvenience has been attributed to the difficulty of
determining the intrinsic characteristics of real-world classification problems. This paper investigates the
domain of competence of XCS by means of a methodology that characterizes the complexity of a classi-
fication problem by a set of geometrical descriptors. In a study of 392 classification problems along with
their complexity characterization, we are able to identify difficult and easy domains for XCS. We focus
on XCS with hyperrectangle codification, which has been predominantly used for real-attributed domains.
The results show high correlations between XCS’s performance and measures of length of class boundaries,
compactness of classes and non-linearities of decision boundaries. We also compare the relative performance
of XCS with other traditional classifier schemes. Besides confirming the high degree of competence of XCS
in these problems, we are able to relate the behavior of the different classifier schemes to the geometrical
complexity of the problem. Moreover, the results highlight certain regions of the complexity measurement
space where a classifier scheme excels, establishing a first step towards determining the best classifier
scheme for a given classification problem.

Index Terms
Learning classifier systems, geometrical complexity, genetic algorithms, pattern recognition, machine
learning, classification.
I. INTRODUCTION
XCS [1], [2] is a classifier system that combines reinforcement learning [3] and genetic algorithms
(GA) [4], [5] to evolve a set of rules representing the target concept. XCS descends from the lineage of
learning classifier systems (LCS), which were first introduced by Holland [4], [6], [7]. Its success and
robust performance in a variety of domains establish XCS as one of the major developments of learning
classifier systems.
Recent investigations of XCS have been focused on a variety of aspects, with the common goal of
improving our understanding of the system. The result has often been better performance and wider
applicability in several domains. Some of these studies investigate XCS from an algorithmic point of
view. Some remarkable efforts in this direction are [8], [9]. They investigate the dynamics of the different
components of XCS, and how these components interact to evolve optimal rules that represent the target
concept in an accurate and compact way. Other studies focus on certain components of the algorithm,
such as the deletion algorithms [10], the definition of fitness [11], or the knowledge representation [12]–
[14]. Most of these studies are restricted to artificially designed problems, because they are easy to analyze
and allow control of their degree of complexity to some extent.
Other types of investigations focus on the applicability of XCS to real-world domains. In [15], XCS is
applied to a data mining problem with a high degree of performance, both in terms of accuracy rate
and explanatory capabilities. This study is extended in [16] with a varied set of data mining problems,
where XCS appears to perform competitively with respect to other classifier techniques, such as nearest
neighbors and decision trees. These studies provide a comparison of accuracy rates between XCS and
other classifier schemes¹ and show that XCS is competitive. However, this information is of limited use,
as there is a lack of a deeper understanding of the general behavior of XCS in real-world problems,
i.e., what types of problems XCS is well or poorly suited to, and the reasons for the success or failure
of XCS compared to other classifier schemes. Moreover, it is well known that no classifier
scheme globally dominates in every domain. Instead, there might be some types of problems where
a particular classifier scheme excels [17]. Therefore, a major issue of current research is to identify the
domains of competence of a particular classifier scheme. One of the major obstacles in this kind of
¹ In the context of XCS, a classifier is a rule and a set of associated parameters. XCS evolves a set of such classifiers. The term
classifier is also used by the pattern recognition community to refer to the whole system that classifies (e.g., nearest neighbor
classifier, linear classifier, etc.). In Section II we use this term following the terminology used in XCS. In the rest of the paper the
term classifier is used in the sense of the whole system, unless properly indicated.

investigation is the difficulty of characterizing the differences between various real-world problems, and
relating the classifier’s behavior to such differences.
This paper is in line with recent efforts in the pattern recognition community to characterize the behavior
of a classifier system related to the features of the problem. We identify the features of a problem that
are most relevant to classification accuracy as measurements of the problem’s geometrical complexity.
The paper analyzes the domain of applicability of XCS in such a measurement space. In particular, the
paper addresses the following issues:
1) What is the complexity of a real-world classification problem? We will analyze the sources of difficulty
in a classification problem. We will focus on the description of the geometrical complexity of the
problem, such as the degree of clustering of the points of the same class, the proximity between the
classes, and other factors that are critical for classification accuracy. We emphasize that we consider
only those descriptors that can be extracted directly from the dataset. Therefore the analysis can be
applied to arbitrary real-world data without reliance on an assumption of the generating model.
2) Given a real-world classification problem characterized by a set of complexity descriptors, is XCS well suited
for that problem? That is, given a certain classification problem, is XCS applicable? Will XCS be
able to extract an accurate knowledge representation? This issue has special relevance when we
apply XCS to a real-world classification problem, where the intrinsic difficulties of the problem are
often unknown, and are difficult to separate from the inadequacy of the classification algorithm.
With the characterization of a real-world classification problem by a set of complexity descriptors
that are not directly dependent on the classifier, we are able to investigate the relation between
XCS’s performance and problem complexity. This study will identify which kind of problems XCS
is particularly well suited and poorly suited to. In order to infer conclusions which show the general
tendency of XCS in real-world problems, we will use an extended set of 392 problems. This gives
better coverage on the variety of problems than some previous studies on XCS’s performance [16]
that relied on a small set of problems (about 20 typically) or only artificial ones [18].
3) Given a classification problem, what is the best suited classifier scheme? As observed previously, there
is not an outstanding classifier scheme that dominates in all sorts of classification problems. There
are certain types of classification problems for which particular kinds of classifiers are best suited.
Thus, a central issue is to identify the good matches between classifier schemes and problems. We
will analyze XCS’s performance in comparison to other learning algorithms, and relate the results to
the complexity of the problem. Previous studies could hardly explain why XCS was better or worse
than a particular classifier, and in which cases this happened. Relating the respective performances
to the complexity characterization of each problem will give us a handle in understanding what
kind of problems are best suited for a particular classifier and, even more important, why.
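As a concrete illustration of the kind of geometrical descriptor such a methodology relies on, the sketch below computes a leave-one-out nearest-neighbor error. This is not one of the paper's exact measures, but it is closely related to descriptors of boundary length and class compactness: when many points have a nearest neighbor of a different class, the class boundary is long or interleaved. The function name and the toy data are our own.

```python
import math

def nn_error_fraction(points, labels):
    """Leave-one-out 1-NN error: the fraction of points whose nearest
    neighbor (Euclidean distance) carries a different class label.
    High values suggest long or complex class boundaries."""
    n = len(points)
    errors = 0
    for i in range(n):
        best_d, best_j = float("inf"), -1
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_d, best_j = d, j
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / n

# Two well-separated clusters: no point's nearest neighbor crosses classes.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labs = [0, 0, 1, 1]
print(nn_error_fraction(pts, labs))  # -> 0.0
```

A descriptor like this can be computed directly from any labeled dataset, which is exactly the property the paper requires: no assumption about the generating model.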
This study is aimed at a better understanding of XCS’s behavior in real-world classification problems.

Detecting where XCS has difficulties may lead to improvement of the method. The study will also lay
the basis for characterizing problems in a space of complexity metrics, and for identifying
what kinds of classifiers are more appropriate for certain regions of the measurement space.
The rest of this paper is structured as follows. First, we present a brief overview of XCS, describing
how the different components are designed to achieve the learning goals. Next, since XCS’s performance
depends both on the algorithmic components and the knowledge representation, we analyze the knowl-
edge representation used in domains with real valued attributes. We investigate some cases where XCS
encounters different levels of difficulty due to the geometry of the problem. Section IV analyzes the
sources of difficulty of real-world classification problems, and proposes several measures that represent
different aspects of the problem complexity. Next, we characterize the behavior of XCS with respect to the
complexity of the problem, and identify regions in the measurement space where the easiest problems and
the most difficult problems for XCS are located. Section VI compares XCS with other classifier schemes,
trying to determine the domains of competence of each classifier scheme. Finally, we present our main
conclusions and discuss future work.
II. DESCRIPTION OF XCS
XCS represents the knowledge extracted from the problem in a set of rules. This ruleset is incrementally
evaluated by means of interacting with the environment, through a reinforcement learning scheme, and is
improved by a search mechanism based on a genetic algorithm. The following is a brief description of
XCS. Although XCS is applicable to single-step and multi-step problems, we restrict our description of
XCS to single-step tasks like classification problems within the scope of this paper. For more details, the
reader is referred to [1] and [2] for an introduction of XCS, and [19] for an algorithmic description.
A. Representation
XCS evolves a population [P] of classifiers where each classifier has a rule and a set of associated
parameters estimating the quality of the rule. Each rule consists of a condition part and an action part:
condition → action. The condition specifies the set of input states where the classifier can be applied.
For binary inputs, the condition is usually represented in the ternary alphabet {0, 1, #}^ℓ, where ℓ is the
length of the input string. In this case, a condition c = c₁c₂⋯cℓ matches an input example x = x₁x₂⋯xℓ
if and only if, for every position i, cᵢ = # or cᵢ = xᵢ. The symbol #, called don't care, allows the formation of generalizations
in the rule's condition. The action part of the rule specifies the action or class that the classifier proposes
when its condition is satisfied. It is coded as an integer.
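The matching rule above is mechanical enough to state in a few lines of code. This is a minimal sketch, not the paper's implementation; the function name is ours.

```python
def matches(condition, example):
    """A ternary condition matches a binary input string iff every
    position is either the don't-care symbol '#' or equal to the
    corresponding input bit."""
    return all(c == '#' or c == x for c, x in zip(condition, example))

print(matches("1#0", "110"))  # True: '#' covers the middle bit
print(matches("1#0", "011"))  # False: first and last bits differ
```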
Three main parameters estimate the quality of each classifier: a) the payoff prediction p, an estimate
of the payoff that the classifier will receive if its condition matches the input and its action is selected;
b) the prediction error ε, which estimates the average error between the classifier's prediction and the
received payoff; and c) the fitness F, an estimate of the accuracy of the payoff prediction. There are other
parameters qualifying each classifier, such as: the experience of the classifier (denoted as exp), the average
size of the action sets where the classifier has participated (as), the time step of the last application of
the genetic algorithm (ts), and the number of actual micro-classifiers this macroclassifier² represents, called
numerosity (num). These are described in the following.
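The parameters listed above can be collected in a single record per classifier. The sketch below is illustrative only: the field names follow the paper's notation, but the initial values are assumptions (XCS initializes them from user-set constants).

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str          # ternary string over {0, 1, '#'}
    action: int             # proposed class, coded as an integer
    p: float = 10.0         # payoff prediction (initial value illustrative)
    epsilon: float = 0.0    # prediction error
    F: float = 0.01         # fitness: accuracy of the payoff prediction
    exp: int = 0            # experience (number of parameter updates)
    as_size: float = 1.0    # average size of action sets it joined
    ts: int = 0             # time step of last GA application
    num: int = 1            # numerosity: micro-classifiers represented

cl = Classifier(condition="01#", action=1)
```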
B. Performance Component
At each time step, an input x is presented to the system. Given x, the system builds a match set [M],
which is formed by all the classifiers in [P] whose conditions are satisfied by the input example. If the
number of actions represented in [M] is less than a threshold θ_mna, then covering is triggered. Covering
creates new classifiers with a condition matching the current input and an action selected randomly
from those not present in [M]. From the resulting match set, an action must be selected and sent to
the environment. For this purpose, a payoff prediction P(a) is computed for each action a in [M]. P(a)
estimates the payoff that the system will receive if action a is chosen. It is computed as a fitness-weighted
average of the predictions of all classifiers proposing that action. The system chooses the winning action
based on these prediction values. The chosen action determines the action set [A], which consists of all
the classifiers in [M] advocating this action.
In classification, the winning action is usually selected using either pure explore mode or pure exploit
mode. In pure explore mode, the action is selected randomly. This makes sense during training, i.e., when
the system is learning the consequences of all possible actions for a given input. In pure exploit mode,
the action is selected deterministically according to the highest prediction. This is used in test, that is,
when the system classifies new unseen instances based on the knowledge it has acquired.
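The performance cycle just described, match set, fitness-weighted prediction array, and explore/exploit action selection, can be sketched as follows. This is a simplified illustration (covering is omitted, and the tiny `_C` stand-in class and demo values are ours), not the full algorithm.

```python
import random

def form_match_set(population, x, matches):
    """[M]: all classifiers in [P] whose condition matches input x."""
    return [cl for cl in population if matches(cl.condition, x)]

def prediction_array(match_set):
    """P(a): fitness-weighted average of the predictions of all
    classifiers in [M] proposing action a."""
    num, den = {}, {}
    for cl in match_set:
        num[cl.action] = num.get(cl.action, 0.0) + cl.p * cl.F
        den[cl.action] = den.get(cl.action, 0.0) + cl.F
    return {a: num[a] / den[a] for a in num}

def select_action(pa, explore):
    """Pure explore: random action (training).
    Pure exploit: action with the highest prediction (test)."""
    if explore:
        return random.choice(list(pa))
    return max(pa, key=pa.get)

class _C:  # minimal stand-in with only the fields used above
    def __init__(self, condition, action, p, F):
        self.condition, self.action, self.p, self.F = condition, action, p, F

pop = [_C("1#", 0, 100.0, 0.9), _C("#1", 1, 40.0, 0.5), _C("11", 0, 80.0, 0.3)]
M = form_match_set(pop, "11",
                   lambda c, x: all(ci in ('#', xi) for ci, xi in zip(c, x)))
pa = prediction_array(M)          # P(0) ≈ 95.0, P(1) = 40.0
best = select_action(pa, explore=False)  # exploit picks action 0
```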
C. Reinforcement Component
Once the action is sent to the environment, the environment returns a reward R, which is used to update
the parameters of the classifiers in [A]. First, the prediction of each classifier is adjusted as follows:

p ← p + β (R − p)    (1)

where β (0 < β ≤ 1) is the learning rate. Next, the prediction error is updated:

ε ← ε + β (|R − p| − ε)    (2)

Then, the classifier's accuracy is computed as an inverse function of the classifier's error:

κ = 1                  if ε < ε₀
κ = α (ε / ε₀)^(−ν)    otherwise    (3)
² Classifiers in XCS are macroclassifiers, i.e., each classifier represents num micro-classifiers having identical conditions and actions
[19].
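A minimal sketch of the update rules in Eqs. (1)-(3). The constants β, α, ε₀, and ν are user-set parameters of XCS; the values below are illustrative only. Descriptions differ on whether the error update uses the pre- or post-update prediction; this sketch uses the pre-update value.

```python
BETA, ALPHA, EPS0, NU = 0.2, 0.1, 10.0, 5.0  # illustrative parameter values

def update_classifier(p, eps, reward):
    """Widrow-Hoff style updates from Eqs. (1)-(3):
    p   <- p   + beta * (R - p)                        (1)
    eps <- eps + beta * (|R - p| - eps)                (2)
    kappa = 1 if eps < eps0 else alpha*(eps/eps0)**-nu (3)
    """
    eps_new = eps + BETA * (abs(reward - p) - eps)  # uses pre-update p
    p_new = p + BETA * (reward - p)
    kappa = 1.0 if eps_new < EPS0 else ALPHA * (eps_new / EPS0) ** -NU
    return p_new, eps_new, kappa

# One update step: a classifier predicting 50 receives a reward of 100.
p, eps, kappa = update_classifier(50.0, 5.0, 100.0)
```

With these numbers the prediction moves toward the reward (p = 60), the error grows (ε = 14 > ε₀), and the accuracy κ drops below 1, which in turn lowers the fitness the GA sees.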

References
D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
J. H. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
L. Breiman, "Bagging predictors," Machine Learning, 1996.