
A Fuzzy Association Rule-Based Classification Model for High-Dimensional Problems With Genetic Rule Selection and Lateral Tuning

01 Oct 2011-IEEE Transactions on Fuzzy Systems (IEEE)-Vol. 19, Iss: 5, pp 857-872


IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 19, NO. 5, OCTOBER 2011 857
A Fuzzy Association Rule-Based Classification
Model for High-Dimensional Problems With
Genetic Rule Selection and Lateral Tuning
Jesús Alcalá-Fdez, Rafael Alcalá, and Francisco Herrera, Member, IEEE
Abstract—The inductive learning of fuzzy rule-based classifi-
cation systems suffers from exponential growth of the fuzzy rule
search space when the number of patterns and/or variables be-
comes high. This growth makes the learning process more difficult
and, in most cases, it leads to problems of scalability (in terms of the
time and memory consumed) and/or complexity (with respect to
the number of rules obtained and the number of variables included
in each rule). In this paper, we propose a fuzzy association rule-
based classification method for high-dimensional problems, which
is based on three stages to obtain an accurate and compact fuzzy
rule-based classifier with a low computational cost. This method
limits the order of the associations in the association rule extrac-
tion and considers the use of subgroup discovery, which is based
on an improved weighted relative accuracy measure to preselect
the most interesting rules before a genetic postprocessing process
for rule selection and parameter tuning. The results obtained over
26 real-world datasets of different sizes and with different numbers
of variables demonstrate the effectiveness of the proposed approach.
Index Terms—Associative classification, classification, data min-
ing, fuzzy association rules, genetic algorithms (GAs), genetic fuzzy
rule selection, high-dimensional problems.
I. INTRODUCTION
FUZZY rule-based classification systems (FRBCSs) [1], [2]
are useful and well-known tools in the machine learning
framework, since they can provide an interpretable model for
the end user [3]–[6]. There are many real applications in which
FRBCSs have been employed, including anomaly intrusion detection [7]
and image processing [8], among others. In most of these
areas, the available or useful data consist of a high number of
patterns (instances or examples) and/or variables. In this situa-
tion, the inductive learning of FRBCSs suffers from exponential
growth of the fuzzy rule search space. This growth makes the
learning process more difficult, and in most cases, it leads to
problems of scalability (in terms of the time and memory consumed)
and/or complexity (with respect to the number of rules obtained and
the number of variables included in each rule) [9], [10].

Manuscript received September 30, 2010; revised February 2, 2011; accepted
April 12, 2011. Date of publication April 29, 2011; date of current version
October 10, 2011. This work was supported in part by the Spanish Ministry of
Education and Science under Grant TIN2008-06681-C06-01.
The authors are with the Department of Computer Science and Artificial
Intelligence, Research Center on Information and Communications Technology
(CITIC-UGR), University of Granada, 18071 Granada, Spain (e-mail:
jalcala@decsai.ugr.es; alcala@decsai.ugr.es; herrera@decsai.ugr.es).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2011.2147794
Association discovery is one of the most common data min-
ing techniques that are used to extract interesting knowledge
from large datasets [11]. Much effort has been made to use its
advantages for classification under the name of associative clas-
sification [12]–[19]. Association discovery aims to find interest-
ing relationships between the different items in a database [20],
while classification aims to discover a model from training data
that can be used to predict the class of test patterns [21]. Both
association discovery and classification rules mining are essen-
tial in practical data mining applications [11], [22], and their
integration could result in greater savings and convenience for
the user.
A typical associative classification system is constructed in
two stages:
1) discovering the association rules inherent in a database;
2) selecting a small set of relevant association rules to con-
struct a classifier.
In order to enhance the interpretability of the obtained clas-
sification rules and to avoid unnatural boundaries in the parti-
tioning of the attributes, different studies have been presented
to obtain classification systems, which is based on fuzzy asso-
ciation rules [23]–[28]. For instance, in [24], the authors have
made use of a genetic algorithm (GA) [29], [30] to automat-
ically determine minimum support and confidence thresholds,
mining for each chromosome a fuzzy rule set for classification
by means of an algorithm, which is based on the Apriori al-
gorithm [31], and adjusting the fuzzy confidence of these rules
with the approach that was proposed by Nozaki et al. in [32].
Consequently, this approach can only be used for small prob-
lems since its computational cost is very high when we consider
problems that consist of a high number of patterns and/or vari-
ables. On the other hand, in [25], the authors used an algorithm
that is based on the Apriori algorithm to mine association rules
only up to a certain level and to select the K most confident
ones for each class among them, in order to finally employ a ge-
netic rule-selection method that obtains a classifier from them.
However, many patterns may be left uncovered if we only consider
the confidence measure to select the candidate rules.
In this paper, we present a fuzzy association rule-based classi-
fication method for high-dimensional problems (FARC-HD) to
obtain an accurate and compact fuzzy rule-based classifier with
a low computational cost. This method is based on the following
three stages:
1063-6706/$26.00 © 2011 IEEE

1) Fuzzy association rule extraction for classification: A
search tree is employed to list all possible frequent fuzzy
item sets and to generate fuzzy association rules for
classification, limiting the depth of the branches in or-
der to find a small number of short (i.e., simple) fuzzy
rules.
2) Candidate rule prescreening: Even though the order of the
associations is limited in the association rule extraction,
the number of rules generated can be very large. In order
to decrease the computational cost of the genetic postpro-
cessing stage, we consider the use of subgroup discovery
based on an improved weighted relative accuracy measure
(wWRAcc′) to preselect the most interesting rules by
means of a pattern weighting scheme [33].
3) Genetic rule selection and lateral tuning: Finally, we
make use of GAs to select and tune a compact set of
fuzzy association rules with high classification accuracy
in order to consider the known positive synergy that both
techniques present (selection and tuning). Several works
have successfully combined the selection of rules with
the tuning of membership functions (MFs) within the
same process [34], [35], taking advantage of the possi-
bility of different coding schemes that GAs provide. The
successful application of GAs to identify fuzzy systems
has led to the so-called genetic fuzzy systems (GFSs)
[36]–[38].
In order to assess the performance of the proposed approach,
we have used 26 real-world datasets with a number of vari-
ables ranging from 4 to 90 and a number of patterns ranging
from 150 to 19 020. We have developed the following studies.
First, we have shown the results that are obtained from com-
parison with three other GFSs [38]. Second, we have compared
the performance of our approach with two approaches to obtain
fuzzy associative classifiers. Third, we have shown the results
that are obtained from the comparison with four other classical
approaches for associative classification and with the C4.5 deci-
sion tree [39]. Furthermore, in these studies, we have made use
of some nonparametric statistical tests for pairwise and multi-
ple comparison [40]–[43] of the performance of these classifiers.
Then, we have shown a study on the influence of the depth of
the trees and the number of evaluations in the genetic selection
and tuning process. Finally, we have analyzed the scalability of
the proposed approach.
This paper is arranged as follows. Section II introduces the
type of rules, rule weights, and inference model, which are used,
and the basic definitions for fuzzy association rules and asso-
ciative classification. Section III describes in detail each stage
of the proposed approach. Section IV presents the experimental
setup. Section V shows and discusses the results that are ob-
tained on 26 real-world datasets. Finally, in Section VI, some
concluding remarks are made.
II. PRELIMINARIES
In this section, we first describe FRBCSs. Then, we introduce
the basic definitions for fuzzy association rules. Finally, we
present fuzzy association rules for classification.
A. Fuzzy Rule-Based Classification Systems
Any classification problem consists of $N$ training patterns, i.e.,
$x_p = (x_{p1}, \ldots, x_{pm})$, $p = 1, 2, \ldots, N$, from $S$ classes,
where $x_{pi}$ is the $i$th attribute value ($i = 1, 2, \ldots, m$) of the
$p$th training pattern. In this paper, we use fuzzy rules of the
following form for our classifier:

Rule $R_j$: IF $x_1$ is $A_{j1}$ and $\cdots$ and $x_m$ is $A_{jm}$
THEN Class $= C_j$ with $RW_j$

where $R_j$ is the label of the $j$th rule, $x = (x_1, \ldots, x_m)$ is an
$m$-dimensional pattern vector, $A_{ji}$ is an antecedent fuzzy set,
$C_j$ is a class label, and $RW_j$ is the rule weight.
The rule weight of each fuzzy rule $R_j$ has a great effect on the
performance of fuzzy rule-based classifiers [44]. Different
specifications of the rule weight have been proposed and examined
in the literature. In [45], we can find some heuristic methods
for rule weight specification. In this paper, we employ the most
common one, i.e., the fuzzy confidence value or certainty factor
(CF) [46]:

$$RW_j = CF_j = \frac{\sum_{x_p \in \text{Class}\,C_j} \mu_{A_j}(x_p)}{\sum_{p=1}^{N} \mu_{A_j}(x_p)} \qquad (1)$$
where $\mu_{A_j}(x_p)$ is the matching degree of the pattern $x_p$ with the
antecedent part of the fuzzy rule $R_j$. We use the fuzzy reasoning
method of the weighted vote or additive combination [46] to
classify new patterns by the rule base (RB). With this method,
each fuzzy rule casts a vote for its consequent class. The total
strength of the vote for each class is computed as follows:

$$V_{\text{Class}_h}(x_p) = \sum_{R_j \in RB;\; C_j = h} \mu_{A_j}(x_p) \cdot CF_j, \qquad h = 1, 2, \ldots, S. \qquad (2)$$
The new pattern $x_p$ is classified as the class with the maximum
total strength of the vote. If multiple class labels have the same
maximum value for $x_p$, or no fuzzy rule is compatible with $x_p$,
this pattern is classified as the class with the most patterns in the
training data.
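As an illustration of (1) and (2), a minimal Python sketch of the certainty-factor rule weight and the weighted-vote inference follows. All function names, membership functions, and the toy data below are our own assumptions for illustration, not part of the paper:

```python
# Sketch of the certainty-factor rule weight (1) and the weighted-vote
# inference (2).  Names and data structures are illustrative assumptions.

def matching_degree(antecedent, pattern):
    """Product T-norm over the membership degrees of the antecedent."""
    m = 1.0
    for attr, mf in antecedent.items():
        m *= mf(pattern[attr])
    return m

def certainty_factor(antecedent, consequent_class, patterns, labels):
    """Eq. (1): matching mass of class-C_j patterns over total matching mass."""
    num = sum(matching_degree(antecedent, x)
              for x, y in zip(patterns, labels) if y == consequent_class)
    den = sum(matching_degree(antecedent, x) for x in patterns)
    return num / den if den > 0 else 0.0

def classify(rule_base, pattern, default_class):
    """Eq. (2): each rule votes mu_Aj(x) * CF_j for its consequent class."""
    votes = {}
    for antecedent, cls, cf in rule_base:
        votes[cls] = votes.get(cls, 0.0) + matching_degree(antecedent, pattern) * cf
    if not votes or max(votes.values()) == 0.0:
        return default_class          # no compatible rule: majority class
    return max(votes, key=votes.get)
```

Ties between classes would also fall back to the default (majority) class in a full implementation; the sketch omits that detail.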
B. Fuzzy Association Rules
Association rules are used to represent and identify depen-
dences between items in a database [11], [20]. They are expres-
sions of the type $A \Rightarrow B$, where $A$ and $B$ are sets of items,
and $A \cap B = \emptyset$. This means that if all the items in $A$ exist in a
transaction, then all the items in $B$ with a high probability are
also in the transaction, and $A$ and $B$ should have no common
items [31]. There are many previous studies to mine association
rules that are focused on databases with binary or discrete val-
ues; however, data in real-world applications usually consist of
quantitative values. Designing data mining algorithms, which
are able to deal with various types of data, presents a challenge
to workers in this research field.
Fuzzy set theory has been used more and more frequently in
data mining because of its simplicity and similarity to human
reasoning [1]. The use of fuzzy sets to describe associations

Fig. 1. Attributes and linguistic terms for the attributes $X_1$ and $X_2$.
between data extends the types of relationships that may be
represented, facilitates the interpretation of rules in linguistic
terms, and avoids unnatural boundaries in the partitioning of
the attribute domains. For this reason, in recent years, different
studies have proposed methods to mine fuzzy association rules
from quantitative data [47]–[54].
Let us consider a simple database $T$ with two attributes ($X_1$
and $X_2$) and three linguistic terms with their associated MFs
(see Fig. 1). Based on this definition, a simple example of a fuzzy
association rule is: $X_1$ is Middle $\Rightarrow$ $X_2$ is High.
Support and confidence are the most common measures of
interest of an association rule. These measures can be defined
for fuzzy association rules as follows:
$$\text{Support}(A \Rightarrow B) = \frac{\sum_{x_p \in T} \mu_{AB}(x_p)}{|N|} \qquad (3)$$

$$\text{Confidence}(A \Rightarrow B) = \frac{\sum_{x_p \in T} \mu_{AB}(x_p)}{\sum_{x_p \in T} \mu_{A}(x_p)} \qquad (4)$$

where $|N|$ is the number of transactions in $T$, $\mu_A(x_p)$ is the
matching degree of the transaction $x_p$ with the antecedent part of
the rule, and $\mu_{AB}(x_p)$ is the matching degree of the transaction
$x_p$ with the antecedent and consequent of the rule.
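A short sketch of how (3) and (4) can be computed with the product T-norm as the conjunction. The membership functions and all names below are illustrative assumptions:

```python
# Sketch of fuzzy support (3) and confidence (4) over a transaction set T.
# mu_ab / mu_a are the joint and antecedent matching-degree functions.

def fuzzy_support(mu_ab, transactions):
    """Support(A => B): joint matching mass over the number of transactions."""
    return sum(mu_ab(x) for x in transactions) / len(transactions)

def fuzzy_confidence(mu_ab, mu_a, transactions):
    """Confidence(A => B): joint matching mass over antecedent matching mass."""
    den = sum(mu_a(x) for x in transactions)
    return sum(mu_ab(x) for x in transactions) / den if den > 0 else 0.0
```

For the rule "$X_1$ is Middle $\Rightarrow$ $X_2$ is High" one would take $\mu_A(x) = \text{middle}(x_1)$ and $\mu_{AB}(x) = \text{middle}(x_1) \cdot \text{high}(x_2)$.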
C. Fuzzy Association Rules for Classification
Over the past few years, different studies have proposed meth-
ods to obtain fuzzy association rule-based classifiers [23]–[28].
The task of classification is to find a set of rules in order to iden-
tify the classes of undetermined patterns. A fuzzy association
rule can be considered to be a classification rule if the antecedent
contains fuzzy item sets, and the consequent part contains only
one class label (C = {C
1
,...,C
j
,...,C
S
}). A fuzzy associa-
tive classification rule, i.e., A C
j
, could be measured directly
in terms of support and confidence as follows:
$$\text{Support}(A \Rightarrow C_j) = \frac{\sum_{x_p \in \text{Class}\,C_j} \mu_{A}(x_p)}{|N|} \qquad (5)$$

$$\text{Confidence}(A \Rightarrow C_j) = \frac{\sum_{x_p \in \text{Class}\,C_j} \mu_{A}(x_p)}{\sum_{x_p \in T} \mu_{A}(x_p)}. \qquad (6)$$
III. FUZZY ASSOCIATION RULE-BASED CLASSIFIER FOR
HIGH-DIMENSIONAL PROBLEMS
In this section, we will describe our proposal to obtain a fuzzy
association rule-based classifier for high-dimensional problems.
This method is based on the following three stages:
1) Fuzzy association rule extraction for classification: A
search tree is employed to list all the possible frequent
fuzzy item sets and to generate fuzzy association rules for
classification.
2) Candidate rule prescreening: A rule evaluation criterion
is used to preselect candidate fuzzy association rules.
3) Genetic rule selection and lateral tuning: The best coop-
erative rules are selected and tuned by means of a GA,
considering the positive synergy between both techniques
within the same process.
Finally, we add a default rule considering the class with the
most patterns in the training data. In the following, we will
introduce the three mentioned stages, explaining in detail all their
characteristics (see Sections III-A–C) and presenting a flowchart of
the algorithm (see Section III-D).
A. Stage 1. Fuzzy Association Rule Extraction for Classification
To generate the RB, we employ a search tree to list all the
possible fuzzy item sets of a class. The root or level 0 of a search
tree is an empty set. All attributes are assumed to have an order
(in our case, the order of appearance in the training data), and
the one-item sets that correspond to the attributes are listed in
the first level of the search tree according to their order. If an
attribute has $j$ possible outcomes ($q_j$ linguistic terms for each
quantitative attribute), it will have $j$ one-item sets that are listed
in the first level. The children of a one-item node for an attribute
$A$ are the two-item sets that include the one-item set of attribute
$A$ and a one-item set for another attribute behind attribute $A$ in
the order, and so on. If an attribute has $j > 2$ possible outcomes,
it can be replaced by $j$ binary variables to ensure that no more
than one of these $j$ binary attributes can appear in the same node
in a search tree. An example with two attributes $V_1$ and $V_2$ with
two linguistic terms $L$ and $H$ is detailed in Fig. 2.
An item set with a support higher than the minimum support
is a frequent item set. If the support of an n-item set in a node J
is less than the minimum support, it does not need to be extended
more because the support of any item set in a node in the subtree,
which is led by node J, will also be less than the minimum sup-
port. Likewise, if a candidate item set generates a classification
rule with confidence higher than the maximum confidence, this
rule has reached the quality level that is demanded by the user,
and it is again unnecessary to extend it further. These properties
greatly reduce the number of nodes needed for searching.
The fuzzy support of an item set can be calculated as follows:

$$\text{Support}(A) = \frac{\sum_{x_p \in T} \mu_{A}(x_p)}{|N|} \qquad (7)$$

where $\mu_A(x_p)$ is the matching degree of the pattern $x_p$ with the
item set. The matching degree $\mu_A(x_p)$ of $x_p$ to the different
fuzzy regions is computed by the use of a conjunction operator, in
our case, the product T-norm.
Once all frequent fuzzy item sets have been obtained, the
candidate fuzzy association rules for classification can be gen-
erated, setting the frequent fuzzy item sets in the antecedent of
the rules and the corresponding class in the consequent. This
process is repeated for each class.
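The depth-limited tree search of this stage can be sketched as follows. This is a simplified illustration: the data structures and names are our assumptions, and the confidence-based early stop mentioned above is omitted for brevity:

```python
# Simplified sketch of Stage 1: depth-first, depth-limited enumeration of
# frequent fuzzy item sets following a fixed attribute order, pruned by the
# minimum support (supports can only shrink deeper in a subtree).
# All names and data structures are illustrative assumptions.

def support(itemset, patterns):
    """Eq. (7)-style fuzzy support with the product T-norm as conjunction."""
    total = 0.0
    for x in patterns:
        m = 1.0
        for attr, _label, mf in itemset:
            m *= mf(x[attr])
        total += m
    return total / len(patterns)

def frequent_itemsets(terms, patterns, min_sup, max_depth):
    """terms: {attribute: [(label, mf), ...]}, attributes in a fixed order."""
    attrs = list(terms)
    found = []

    def extend(itemset, next_attr):
        if len(itemset) >= max_depth:
            return                                    # Depth_max reached
        for i in range(next_attr, len(attrs)):        # keep attribute order
            for label, mf in terms[attrs[i]]:         # one term per attribute
                cand = itemset + [(attrs[i], label, mf)]
                if support(cand, patterns) >= min_sup:
                    found.append(cand)
                    extend(cand, i + 1)               # prune infrequent subtrees

    extend([], 0)
    return found
```

Candidate classification rules are then formed by placing each frequent item set in the antecedent and the corresponding class in the consequent.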

Fig. 2. Search tree for two quantitative attributes $V_1$ and $V_2$ with two linguistic terms $L$ and $H$.
The number of frequent fuzzy item sets that are extracted de-
pends directly on the minimum support. The minimum support
is usually calculated considering the total number of patterns in
the dataset; however, the number of patterns for each class in
a dataset can be different. For this reason, our algorithm determines
the minimum support of each class from the distribution of the
classes over the dataset. Thus, the minimum support for class $C_j$
is defined as

$$\text{MinimumSupport}_{C_j} = \text{minSup} \cdot f_{C_j} \qquad (8)$$

where minSup is the minimum support determined by the expert,
and $f_{C_j}$ is the pattern ratio of the class $C_j$.
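The per-class threshold amounts to scaling the expert's minimum support by each class's pattern ratio; a tiny sketch (the function name is an assumption):

```python
# Sketch of eq. (8): per-class minimum support = minSup * f_Cj, where f_Cj
# is the fraction of training patterns belonging to class Cj.
from collections import Counter

def class_min_supports(min_sup, labels):
    counts = Counter(labels)
    n = len(labels)
    return {c: min_sup * counts[c] / n for c in counts}
```

A class with few patterns thus gets a proportionally lower support threshold, so its rules are not crowded out by the majority classes.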
In this stage, we can generate a large number of candidate
fuzzy association rules for classification. It is, however, very
difficult for human users to handle such a large number of generated
fuzzy rules and to intuitively understand long fuzzy rules with many
antecedent conditions. For this reason, we only generate short fuzzy
rules with a small number of antecedent conditions. Thus, the depth
of the trees is limited to a fixed value $\text{Depth}_{\max}$ that is
determined by an expert.
B. Stage 2. Candidate Rule Prescreening
In the previous stage, we can generate a large number of
candidate rules. In order to decrease the computational costs of
stage 3, we consider the use of subgroup discovery to preselect
the most interesting rules from the RB, which are obtained in
the previous stage by means of a pattern weighting scheme
[33]. This scheme treats the patterns in such a way that covered
positive patterns are not deleted when the current best rule is
selected. Instead, each time a rule is selected, the algorithm
stores a count i for each pattern of how many times (with how
many of the selected rules) the pattern has been covered.
Weights of positive patterns covered by the selected rule decrease
according to the formula $w(e_j, i) = \frac{1}{i+1}$. In the first
iteration, all target class patterns are assigned the same weight,
i.e., $w(e_j, 0) = 1$, while in the following iterations the
contributions of patterns are inversely proportional to their
coverage by previously selected rules. This way, the patterns that
are already covered by one or more selected rules decrease their
weights, while uncovered target class patterns, whose weights have
not been decreased, will have a greater chance of being covered in
the following iterations. Covered patterns are completely eliminated
when they have been covered more than $k_t$ times.
TABLE I
FIVE PATTERNS IN THIS EXAMPLE
Thus, in each iteration of the process, the rules are ordered
according to a rule evaluation criterion from best to worst. The
best rule is selected, covered patterns are reweighted, and the
procedure repeats these steps until one of the stopping criteria
is satisfied: either all patterns have been covered more than $k_t$
times, or there are no more rules in the RB. This process is to be
repeated for each class.
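The weighted covering loop just described can be sketched as follows, where `criterion` stands for any rule-evaluation measure (e.g., a weighted relative accuracy); all names here are illustrative assumptions:

```python
# Sketch of the Stage-2 weighted covering loop: repeatedly take the best
# rule under an evaluation criterion, down-weight the patterns it covers as
# w(e, i) = 1/(i + 1), and eliminate patterns covered more than k_t times.
# Function and variable names are illustrative assumptions.

def select_rules(rules, covers, n_patterns, criterion, k_t):
    """covers[r]: indices of the patterns rule r covers (degree > 0)."""
    counts = [0] * n_patterns           # i: times each pattern was covered
    weights = [1.0] * n_patterns        # w(e, 0) = 1 for every pattern
    remaining = list(rules)
    selected = []
    while remaining and any(c <= k_t for c in counts):
        best = max(remaining, key=lambda r: criterion(r, weights))
        remaining.remove(best)
        selected.append(best)
        for p in covers[best]:
            counts[p] += 1
            # eliminated once covered more than k_t times, else 1/(i + 1)
            weights[p] = 0.0 if counts[p] > k_t else 1.0 / (counts[p] + 1)
    return selected
```

Because later iterations see reduced weights for already-covered patterns, the criterion naturally steers the selection toward rules covering the still-uncovered part of the class.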
wWRAcc′ was used to evaluate the quality of interval rules
in APRIORI-SD [33]. This measure was defined as follows:

$$wWRAcc'(A \Rightarrow C_j) = \frac{n'(A)}{N'} \cdot \left( \frac{n'(A \cdot C_j)}{n'(A)} - \frac{n(C_j)}{N} \right) \qquad (9)$$

where $N'$ is the sum of the weights of all patterns, $n'(A)$
is the sum of the weights of all covered patterns, $n'(A \cdot C_j)$
is the sum of the weights of all correctly covered patterns,
$n(C_j)$ is the number of patterns of class $C_j$, and $N$ is the
number of all patterns. For instance, let us consider a simple
database with two attributes $X_1$ and $X_2$, two classes $C_1$ and
$C_2$, and five training patterns. Table I shows the five training
patterns and their weights in the $p$th iteration of the process. In
this iteration, the wWRAcc′ value of a simple rule, i.e.,
$R$ = IF $X_1$ is $[0.0, 5.0[$ and $X_2$ is $[5.0, 10.0]$ $\Rightarrow C_1$,
is calculated as follows:

$$wWRAcc'(R) = \frac{1.0 + 0.5}{1.0 + 1.0 + 0.0 + 1.0 + 0.5} \cdot \left( \frac{1.0 + 0.5}{1.0 + 0.5} - \frac{2}{5} \right) = 0.257.$$
We have modified this measure to enable the handling of
fuzzy rules. The new measure is defined as follows:

$$wWRAcc''(A \Rightarrow C_j) = \frac{n''(A \cdot C_j)}{n'(C_j)} \cdot \left( \frac{n''(A \cdot C_j)}{n''(A)} - \frac{n(C_j)}{N} \right) \qquad (10)$$

where $n''(A)$ is the sum of the products of the weights of all
covered patterns by their matching degrees with the antecedent
part of the rule, $n''(A \cdot C_j)$ is the sum of the products of the
weights of all correctly covered patterns by their matching degrees
with the antecedent part of the rule, and $n'(C_j)$ is the sum of the
weights of patterns of class $C_j$. Moreover, the first term in the
definition of wWRAcc′ has been replaced by $\frac{n''(A \cdot C_j)}{n'(C_j)}$
to reward rules that cover uneliminated patterns of class $C_j$.
Let us consider three linguistic terms for the attributes $X_1$ and
$X_2$ (see Fig. 1). Based on this definition, a simple example of a
fuzzy association rule for classification is: $R$ = IF $X_1$ is Low
and $X_2$ is High $\Rightarrow C_1$. This rule covers the training
patterns in Table I with degrees (ID1, 1.0), (ID2, 0.0), (ID3, 0.0),
(ID4, 0.0), and (ID5, 0.5). In this situation, the wWRAcc″ value
of this rule is calculated as follows:

$$wWRAcc''(R) = \frac{1.0 \cdot 1.0 + 0.5 \cdot 0.5}{1.0 + 0.5} \cdot \left( \frac{1.0 \cdot 1.0 + 0.5 \cdot 0.5}{1.0 \cdot 1.0 + 0.5 \cdot 0.5} - \frac{2}{5} \right) = 0.5.$$

This measure can obtain positive or negative values in the interval
$[-1.0, 1.0]$. A rule with a wWRAcc″ value near to 1 may be
more useful for the classification.
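The wWRAcc″ computation above can be reproduced in a few lines. The vector names are our assumptions; the data in the usage note are the weights and matching degrees of the Table I example:

```python
# Sketch of the fuzzy wWRAcc'' measure (10).  'weights' are the current
# pattern weights, 'mu' the matching degrees with the rule antecedent, and
# 'is_cj' flags the patterns of the rule's class.  Names are assumptions.

def wwracc2(weights, mu, is_cj):
    n2_acj = sum(w * m for w, m, c in zip(weights, mu, is_cj) if c)  # n''(A.Cj)
    n2_a = sum(w * m for w, m in zip(weights, mu))                   # n''(A)
    n1_cj = sum(w for w, c in zip(weights, is_cj) if c)              # n'(Cj)
    return (n2_acj / n1_cj) * (n2_acj / n2_a - sum(is_cj) / len(weights))
```

With the Table I weights (1.0, 1.0, 0.0, 1.0, 0.5), the matching degrees (1.0, 0.0, 0.0, 0.0, 0.5), and the class-$C_1$ flags (1, 0, 0, 0, 1), this returns 0.5, matching the worked example.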
C. Stage 3. Rule Selection and Lateral Tuning
We consider the use of GAs to select and tune a compact set
of fuzzy association rules with high classification accuracy from
the RB, which are obtained in the previous stage. We consider
the approach that is proposed in [35], where rules are based
on the linguistic two-tuple representation [55]. This represen-
tation allows the lateral displacement of the labels considering
only one parameter (symbolic translation parameter), which in-
volves a simplification of the tuning search space that eases the
derivation of optimal models, particularly, when it is combined
with a rule selection within the same process enabling it to take
advantage of the positive synergy that both techniques present.
This way, this process to contextualize the MFs enables them
to achieve a better covering degree while maintaining the orig-
inal shapes, which results in accuracy improvements without a
significant loss in the interpretability of the fuzzy labels. The
symbolic translation parameter of a linguistic term is a number
within the interval $[-0.5, 0.5)$ that expresses the domain of a
label when it is moving between its two lateral labels. Let us
consider a set of labels $S$ representing a fuzzy partition.
Formally, we have the pair $(S_i, \alpha_i)$, $S_i \in S$,
$\alpha_i \in [-0.5, 0.5)$. An example is illustrated in Fig. 3,
where we show the symbolic translation of a label that is
represented by the pair $(S_2, -0.3)$.
Let us consider the simple problem presented in the previous
section. Based on this definition, examples of a classic rule and a
linguistic two-tuple represented rule are as follows.

Classic Rule:
IF $X_1$ is Low and $X_2$ is Middle THEN Class is $C_1$

Two-Tuple Representation:
IF $X_1$ is (Low, 0.1) and $X_2$ is (Middle, 0.3) THEN Class is $C_1$.

Fig. 3. Symbolic translation of a linguistic label and lateral
displacement of the involved MF. (a) Symbolic translation of a
linguistic term. (b) Lateral displacement of an MF.
In [35], two different rule representation approaches were
proposed: a global approach and a local approach. In our partic-
ular case, the tuning is applied to the level of linguistic partitions
(global approach). This way, the pair ($X_i$, label) takes the same
$\alpha$ value in all the rules where it is considered, i.e., a global
collection of two tuples is considered by all the fuzzy rules. For
example, $X_1$ is (High, 0.3) will present the same value for
those rules in which the pair "$X_1$ is High" was initially
considered. This proposal decreases the tuning problem complexity,
ered. This proposal decreases the tuning problem complexity,
greatly easing the derivation of optimal models. Another im-
portant issue is that from the parameters α that are applied to
each label, we could obtain the equivalent triangular MFs, by
which an FRBCS that is based on linguistic two tuples could
be represented as a classical Mamdani FRBCS. Notice that the
class label and RW of the rule are not modified.
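As an illustration of the lateral displacement, a sketch for uniformly spaced triangular partitions follows; the uniform-partition layout and all names are our assumptions:

```python
# Sketch of the two-tuple lateral displacement: label S_i of a uniform
# triangular partition over [lo, hi] is shifted laterally by alpha in
# [-0.5, 0.5) times the distance between adjacent label centers.
# The uniform-partition layout and names are illustrative assumptions.

def displaced_triangle(i, alpha, n_labels, lo, hi):
    """Vertices (a, b, c) of label S_i after the lateral displacement."""
    step = (hi - lo) / (n_labels - 1)      # distance between label centers
    center = lo + (i + alpha) * step       # whole MF moves by alpha * step
    return (center - step, center, center + step)

def membership(x, abc):
    """Triangular membership for the (possibly displaced) label."""
    a, b, c = abc
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
```

For five labels over [0, 1], displacing the middle label by $\alpha = -0.3$ moves its peak from 0.5 to 0.425 while keeping its triangular shape, which is exactly why the tuned system can still be read as a classical Mamdani FRBCS.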
In the following, the main characteristics of the genetic ap-
proach that combines rule selection and lateral tuning are pre-
sented: genetic model, codification and initial gene pool, chro-
mosome evaluation, crossover operator, and restarting approach.
1) CHC Genetic Model: The approach that is proposed in
[35] considers the use of a specific GA, the CHC algorithm [56].

Citations
More filters
01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: The definition and basic properties of the different types of fuzzy sets that have appeared up to now in the literature are reviewed and the relationships between them are analyzed.
Abstract: In this paper, we review the definition and basic properties of the different types of fuzzy sets that have appeared up to now in the literature. We also analyze the relationships between them and enumerate some of the applications in which they have been used.

386 citations


Cites methods from "A Fuzzy Association Rule-Based Clas..."

  • ...The experimental results presented in [128] show that the approach using IVFSs (named IVTURS) improves the results of two state-of-the-art fuzzy classifiers like FARC-HD [2] and FURIA [77]....

    [...]

Journal ArticleDOI
TL;DR: The state-of-the-art feature selection schemes reported in the field of computational intelligence are reviewed to reveal the inadequacies of existing approaches in keeping pace with the emerging phenomenon of Big Dimensionality.
Abstract: The world continues to generate quintillion bytes of data daily, leading to the pressing needs for new efforts in dealing with the grand challenges brought by Big Data. Today, there is a growing consensus among the computational intelligence communities that data volume presents an immediate challenge pertaining to the scalability issue. However, when addressing volume in Big Data analytics, researchers in the data analytics community have largely taken a one-sided study of volume, which is the "Big Instance Size" factor of the data. The flip side of volume which is the dimensionality factor of Big Data, on the other hand, has received much lesser attention. This article thus represents an attempt to fill in this gap and places special focus on this relatively under-explored topic of "Big Dimensionality", wherein the explosion of features (variables) brings about new challenges to computational intelligence. We begin with an analysis on the origins of Big Dimensionality. The evolution of feature dimensionality in the last two decades is then studied using popular data repositories considered in the data analytics and computational intelligence research communities. Subsequently, the state-of-the-art feature selection schemes reported in the field of computational intelligence are reviewed to reveal the inadequacies of existing approaches in keeping pace with the emerging phenomenon of Big Dimensionality. Last but not least, the "curse and blessing of Big Dimensionality" are delineated and deliberated.

226 citations

Journal ArticleDOI
TL;DR: The most recent components added to KEEL 3.0 are described, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery, which greatly improve the versatility of KEEL to deal with more modern data mining problems.
Abstract: This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to performdata management, design of multiple kind of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms’ results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.

218 citations


Cites background or methods from "A Fuzzy Association Rule-Based Clas..."

  • ...Several approaches have been designed to tackle this problem, which can be divided into two main alternatives: (1) internal approaches that create new algorithms or modify existing ones to take the class-imbalance problem into consideration and (2) external approaches that pre-process the data in order to diminish the effect of their class imbalance....


  • ...(1) Under-Sampling and Over-Sampling Models, (2) Imbal-...


  • ...The data flow and results from the methods and statistical techniques are shown in Figure 2 (2)....


  • ...0, highlighting its main features: (1) data management section, including import/export, visualisations, editions and partitioning of data; (2) GUI for statistical analysis, with the statistical procedures described in 15,21; (3) Flexible experiments configuration, showing an example of experiment illustrated by means of a flowchart, involving four data preprocessing algorithms (Chi2 discretizer, Mutual Information feature selection, SVM missing values imputation and Iterative Partitioning noise filter), three classifiers (CART, C4....


Journal ArticleDOI
TL;DR: A methodology based on GFS and OVO in the framework of IDS with the use of Genetic Fuzzy Systems within a pairwise learning framework, which improves precision for rare attack events and achieves a better separability between a "normal activity" and the different attack types.
Abstract: A methodology based on GFS and OVO in the framework of IDS is proposed. Linguistic labels enable a smoother borderline and allow higher interpretability. The divide-and-conquer learning scheme improves precision for rare attack events. Several performance metrics show the goodness of this approach on KDDCUP'99. Our results excel the state-of-the-art of GFS for IDS and the C4.5 decision tree. Security policies of information systems and networks are designed to maintain the integrity of both the confidentiality and availability of the data for their trusted users. However, a number of malicious users analyze the vulnerabilities of these systems in order to gain unauthorized access or to compromise the quality of service. For this reason, Intrusion Detection Systems have been designed to monitor the system and trigger alerts whenever they find a suspicious event. Optimal Intrusion Detection Systems are those that achieve a high attack detection rate together with a small number of false alarms. However, cyber attacks present many different characteristics that make them hard to identify properly with simple statistical methods. For this reason, Data Mining techniques, and especially those based on Computational Intelligence, have been used to implement robust and accurate Intrusion Detection Systems. In this paper, we consider the use of Genetic Fuzzy Systems within a pairwise learning framework for the development of such a system. The advantages of this approach are twofold: first, the use of fuzzy sets, and especially linguistic labels, enables a smoother borderline between the concepts and allows a higher interpretability of the rule set.
Second, the divide-and-conquer learning scheme, in which we contrast all possible pairs of classes, improves the precision for the rare attack events, as it obtains a better separability between "normal activity" and the different attack types. The goodness of our methodology is supported by a complete experimental study, in which we contrast the quality of our results versus the state-of-the-art of Genetic Fuzzy Systems for intrusion detection and the C4.5 decision tree.
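The one-vs-one (OVO) pairwise decomposition described above can be sketched as follows. The nearest-centroid binary learner here is only a stand-in for the paper's Genetic Fuzzy System, and all names are illustrative.

```python
# Sketch of one-vs-one (OVO) decomposition: train one binary learner per
# pair of classes, then let the pairwise winners vote on the prediction.
# The nearest-centroid learner is an illustrative stand-in.
from itertools import combinations

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_ovo(X, y):
    """One centroid pair per pair of classes (the 'binary classifiers')."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        ca = centroid([x for x, c in zip(X, y) if c == a])
        cb = centroid([x for x, c in zip(X, y) if c == b])
        models[(a, b)] = (ca, cb)
    return models

def predict_ovo(models, x):
    """Each pairwise model votes for its closer class; majority wins."""
    votes = {}
    for (a, b), (ca, cb) in models.items():
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        winner = a if da <= db else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Toy traffic with three classes (labels are hypothetical).
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["normal", "normal", "dos", "dos", "probe", "probe"]
models = train_ovo(X, y)
print(predict_ovo(models, [0, 0.5]))  # → normal
```

The point of the decomposition, as the abstract notes, is that each binary subproblem only has to separate two classes, which helps with rare attack types.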

181 citations


Cites background or methods from "A Fuzzy Association Rule-Based Clas..."

  • ...Fernández, Bustince, & Herrera, 2013) and it is especially well-suited for high-dimensional problems (Alcalá-Fdez et al., 2011), such as the one we are addressing in IDS....


  • ...…fuzzy classifier has shown a robust behaviour for different classification scenarios (López, Fernández, & Herrera, 2013; Sanz, Fernández, Bustince, & Herrera, 2013) and it is especially well-suited for high-dimensional problems (Alcalá-Fdez et al., 2011), such as the one we are addressing in IDS....


  • ...Specifically, for the evaluation of the goodness of our IDS proposal, we will contrast the experimental results versus the standard FARC-HD algorithm and several GFS approaches that have been developed for misuse detection....


  • ...Furthermore, the baseline classifier used, i.e., FARC-HD, excels among the state-of-the-art algorithms, as it is well-suited for high-dimensional problems....


  • ...According to the former, in Section 3.1 we will first describe the features of the FARC-HD algorithm, which has been selected as baseline technique....


References
Book
01 Sep 1988
TL;DR: In this article, the authors present the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields, including computer programming and mathematics.
Abstract: From the Publisher: This book brings together - in an informal and tutorial fashion - the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields. Major concepts are illustrated with running examples, and major algorithms are illustrated by Pascal computer programs. No prior knowledge of GAs or genetics is assumed, and only a minimum of computer programming and mathematics background is required.
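The genetic-algorithm loop the book teaches (selection, crossover, mutation) can be sketched on the classic OneMax problem; the book's own examples are in Pascal, and all parameter values below are illustrative, not the book's.

```python
# Minimal generational genetic algorithm on OneMax (maximize the number
# of 1-bits): tournament selection, single-point crossover, bit-flip
# mutation. Parameter values are illustrative defaults.
import random

def onemax(bits):
    return sum(bits)

def run_ga(n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # binary tournament selection
            a, b = rng.sample(pop, 2)
            return a if onemax(a) >= onemax(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=onemax)

best = run_ga()
print(onemax(best))  # converges to (near) the all-ones string
```

The same loop underlies the genetic rule selection and lateral tuning stage of the article this page indexes, with rule subsets and membership-function displacements in place of bitstrings.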

52,797 citations

Book
01 Jan 1975
TL;DR: Name of the founding work in the area of adaptation, which aims to mimic biological optimization; some (non-GA) branches of AI are also discussed.
Abstract: Name of founding work in the area. Adaptation is key to survival and evolution. Evolution implicitly optimizes organisms. AI wants to mimic biological optimization: survival of the fittest; exploration and exploitation; niche finding; robustness across changing environments (mammals vs. dinosaurs); self-regulation, self-repair and self-reproduction. Artificial Intelligence, some definitions: "Making computers do what they do in the movies"; "Making computers do what humans (currently) do best"; "Giving computers common sense; letting them make simple decisions" (do as I want, not what I say); "Anything too new to be pigeonholed". Adaptation and modification are at the root of intelligence. Some (non-GA) branches of AI: expert systems (rule-based deduction).

32,573 citations


"A Fuzzy Association Rule-Based Clas..." refers methods in this paper

  • ...In order to enhance the interpretability of the obtained classification rules and to avoid unnatural boundaries in the partitioning of the attributes, different studies have been presented to obtain classification systems that are based on fuzzy association rules [23]–[28]....


Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
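The pattern-finding step described above hinges on an attribute-selection criterion; the sketch below computes the information gain that ID3-style tree inducers use to pick a split (C4.5 itself refines this to the gain ratio). The toy attribute and labels are illustrative.

```python
# Sketch of the information-gain split criterion used by ID3-style
# decision-tree inducers (C4.5 normalizes it into the gain ratio).
import math

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting `rows` (dicts) on nominal `attr`."""
    base = entropy(labels)
    parts = {}
    for r, c in zip(rows, labels):
        parts.setdefault(r[attr], []).append(c)
    n = len(labels)
    return base - sum(len(p) / n * entropy(p) for p in parts.values())

rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # → 1.0 (a perfect split)
```

A gain of 1.0 means the split fully separates the classes; the tree grower picks the attribute with the highest score at each node.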

21,674 citations


"A Fuzzy Association Rule-Based Clas..." refers result in this paper

  • ...Third, we have shown the results that are obtained from the comparison with four other classical approaches for associative classification and with the C4.5 decision tree [39]....


Journal ArticleDOI
TL;DR: In this paper, a simple and widely accepted multiple test procedure of the sequentially rejective type is presented, i.e. hypotheses are rejected one at a time until no further rejections can be done.
Abstract: This paper presents a simple and widely applicable multiple test procedure of the sequentially rejective type, i.e. hypotheses are rejected one at a time until no further rejections can be done. It is shown that the test has a prescribed level of significance protection against error of the first kind for any combination of true hypotheses. The power properties of the test and a number of possible applications are also discussed.
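The sequentially rejective procedure described above (commonly known as the Holm-Bonferroni method) can be sketched in a few lines: sort the p-values ascending, compare the i-th smallest against alpha/(m - i), and stop at the first failure.

```python
# Sketch of the sequentially rejective (Holm-Bonferroni) multiple test
# procedure: reject hypotheses one at a time, from smallest p-value up,
# until one fails its adjusted threshold; then stop.

def holm(p_values, alpha=0.05):
    """Return a reject/accept flag per hypothesis, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Threshold tightens from alpha/m down to alpha as ranks advance.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis survives, all later ones do too
    return reject

print(holm([0.01, 0.04, 0.03, 0.005]))  # → [True, False, False, True]
```

This is the post-hoc test commonly paired with nonparametric comparisons of classifiers, which is why it appears in the reference list of the indexed article.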

20,459 citations