Book ChapterDOI

A Fuzzy-Rough Approach for Case Base Maintenance

02 Aug 2001-pp 118-130
TL;DR: A heuristic algorithm is used, i.e., a fuzzy-rough algorithm in the process of simplifying fuzzy rules, which has many significant advantages, such as rapid speed of training and matching, generating a family of fuzzy rules which is approximately simplest.
Abstract: This paper proposes a fuzzy-rough method of maintaining Case-Based Reasoning (CBR) systems. The methodology is mainly based on the idea that a large case library can be transformed to a small case library together with a group of adaptation rules, which take the form of fuzzy rules generated by the rough set technique. In paper [1], we have proposed a methodology for case base maintenance which used a fuzzy decision tree induction to discover the adaptation rules; in this paper, we focus on using a heuristic algorithm, i.e., a fuzzy-rough algorithm [2] in the process of simplifying fuzzy rules. This heuristic, regarded as a new fuzzy learning algorithm, has many significant advantages, such as rapid speed of training and matching, generating a family of fuzzy rules which is approximately simplest. By applying such a fuzzy-rough learning algorithm to the adaptation mining phase, the complexity of case base maintenance is reduced, and the adaptation knowledge is more compact and effective. The effectiveness of the method is demonstrated experimentally using two sets of testing data, and we also compare the maintenance results of using fuzzy ID3, in [1], and the fuzzy-rough approach, as in this paper.

Summary (1 min read)

1 Introduction

  • At present, large-scale CBR systems are becoming more popular, with case-library sizes ranging from thousands [3][4] to millions of cases [5].
  • Large case library sizes raise problems of case retrieval efficiency, and many CBR researchers pay more attention to the problem of Case Base Maintenance (CBM).
  • Anand et al. [9] proposed to use data mining techniques for mining adaptation knowledge, and maintaining CBR systems.

2.1 Phase One - Learning Feature Weights

  • The smaller the evaluation value, the better the corresponding features.
  • Thus the authors would like to find the weights such that the evaluation function attains its minimum.
  • When all the weights take the value 1, the similarity measure is denoted by $SM_{pq}^{(1)}$.
  • Select the parameter α and the learning rate η.

2.2 Phase Two - Partitioning the Case Library into Several Clusters

  • This section attempts to partition the case library into several clusters by using the weighted distance metric with the weights learned in section 2.1.
  • Since the features are considered to be real-valued, many methods, such as K-Means clustering [15] and Kohonen’s self-organizing network [16], can be used to partition the case library.
  • In order to compare the fuzzy decision tree and fuzzy-rough approaches in mining adaptation rules, the authors take the similarity matrix clustering method in [1].

2.4 Selecting Representative Cases

  • This phase aims to select representative cases from each cluster according to the adaptation rules obtained in phase three.
  • Instead of deletion, [1] proposes a selection strategy which makes use of Smyth’s concepts of coverage and reachability, with some changes (called ε-coverage and ε-reachability respectively).

3 Experimental Analysis

  • This section presents the experimental analysis of their approach on a real-world problem, i.e., the rice taste (RT) problem.
  • 56 cases are deleted using the fuzzy-rough approach, while only 39 cases are deleted using the fuzzy decision tree method.
  • So the overall selection result based on the adaptation rules generated by the fuzzy-rough method is better than that based on the rules generated by the fuzzy decision tree.


A Fuzzy-Rough Approach for Case Base Maintenance
Guoqing Cao, Simon Shiu and Xizhao Wang
Department of Computing, Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
{csgqcao, csckshiu, csxzwang}@comp.polyu.edu.hk
Abstract. This paper proposes a fuzzy-rough method of maintaining Case-Based Reasoning (CBR) systems. The methodology is mainly based on the idea that a large case library can be transformed to a small case library together with a group of adaptation rules, which take the form of fuzzy rules generated by the rough set technique. In paper [1], we have proposed a methodology for case base maintenance which used a fuzzy decision tree induction to discover the adaptation rules; in this paper, we focus on using a heuristic algorithm, i.e., a fuzzy-rough algorithm [2], in the process of simplifying fuzzy rules. This heuristic, regarded as a new fuzzy learning algorithm, has many significant advantages, such as rapid speed of training and matching, generating a family of fuzzy rules which is approximately simplest. By applying such a fuzzy-rough learning algorithm to the adaptation mining phase, the complexity of case base maintenance is reduced, and the adaptation knowledge is more compact and effective. The effectiveness of the method is demonstrated experimentally using two sets of testing data, and we also compare the maintenance results of using fuzzy ID3, in [1], and the fuzzy-rough approach, as in this paper.
1 Introduction
At present, large-scale CBR systems are becoming more popular, with case-library sizes ranging from thousands [3][4] to millions of cases [5]. Large case library sizes raise problems of case retrieval efficiency, and many CBR researchers pay more attention to the problem of Case Base Maintenance (CBM). According to Leake and Wilson [6], case base maintenance is the process of refining a CBR system's case base to improve the system's performance. That is, “case base maintenance implements policies for revising the organization or contents (representation, domain content, accounting information, or implementation) of the case base, in order to facilitate future reasoning for a particular set of performance objectives”.
How should we maintain large case-based reasoning systems? In the past,
researchers have done much work in this area. Smyth and Keane [7] suggested a
competence-preserving deletion approach. They put forward the concept of
competence (or coverage), being the range of target problems that a given system can
solve and also a fundamental evaluation criterion of CBR system performance. Smyth
and McKenna [8] also presented a new model of case competence, and demonstrated
a way in which the proposed model of competence can be used to assist case authors.

Anand et al. [9] proposed to use data mining techniques for mining adaptation
knowledge, and maintaining CBR systems.
Recently, Richter proposed the notion of knowledge containers [10][11], and it
quickly became the standard paradigm for representation of the structural elements in
CBR systems. Simon et al. established a methodology that could be used to transfer
case knowledge to adaptation knowledge [1]. The methodology integrated identifying
salient features, distinguishing different concepts, learning adaptation knowledge,
computing case competence, and selecting representative cases together into a
framework of CBM. Fuzzy set theory, as proposed by L.A. Zadeh [12], and rough set theory allow the utilization of uncertain knowledge by means of fuzzy linguistic terms and their membership functions, which reflect human understanding of the problem [13]. The rough set theory proposed by Z. Pawlak [14] enables us to find
relationships between data without any additional information such as prior
probability, only requiring knowledge representation as a set of if-then rules [13]. In
this paper, we propose a new method of adaptation knowledge discovery, integrating
rough set theory and fuzzy set theory to transfer the case knowledge to adaptation
knowledge. This fuzzy-rough approach has many significant advantages, such as
rapid speed of training and matching, generating a family of fuzzy rules which is
approximately simplest. By applying such a fuzzy-rough learning algorithm to the phase of mining adaptation rules, the cost and complexity of case base maintenance are reduced; more importantly, the adaptation knowledge is more compact, effective and easier to use.
2 Methodology for CBM using Fuzzy-Rough Approach
In this paper, we use the framework of case base maintenance in [1] to carry out our CBM process. The procedure for maintaining a case base from scratch, as proposed in [1], consists of four phases: firstly, an approach to learning feature weights automatically is used to evaluate the importance of different features in a given case base; secondly, clustering of cases is carried out to identify different concepts in the case base using the acquired feature knowledge; thirdly, adaptation rules are mined for each concept, using fuzzy decision trees in [1], whereas in this paper we apply a fuzzy-rough approach; finally, a selection strategy based on the concepts of ε-coverage and ε-reachability is used to select representative cases.

In the following sub-sections, we briefly introduce phases 1, 2 and 4 of the methodology proposed in [1], and describe our approach to phase 3 in detail.
2.1 Phase One - Learning Feature Weights
In this section, a feature evaluation function is defined. The smaller the evaluation value, the better the corresponding features. Thus we would like to find the weights such that the evaluation function attains its minimum. The task of minimizing the evaluation function with respect to the weights is performed using a gradient descent technique. We formulate this optimization problem as follows.

For a given collection of feature weights $w_j \in [0, 1]$, $j = 1, \ldots, n$, and a pair of cases $e_p$ and $e_q$, equation (1) defines a weighted distance measure $d_{pq}^{(w)}$ and equation (2) defines a similarity measure $SM_{pq}^{(w)}$:

$$d_{pq}^{(w)} = d^{(w)}(e_p, e_q) = \Big( \sum_{j=1}^{n} w_j^2 (x_{pj} - x_{qj})^2 \Big)^{1/2} = \Big( \sum_{j=1}^{n} w_j^2 \chi_j^2 \Big)^{1/2} \qquad (1)$$

where $\chi_j^2 = (x_{pj} - x_{qj})^2$. When all the weights are equal to 1, the distance metric defined above degenerates to the Euclidean measure, denoted by $d_{pq}^{(1)}$, or $d_{pq}$ for short.

$$SM_{pq}^{(w)} = \frac{1}{1 + \alpha\, d_{pq}^{(w)}} \qquad (2)$$

where $\alpha$ is a positive parameter. When all the weights take the value 1, the similarity measure is denoted by $SM_{pq}^{(1)}$.

A feature evaluation index $E$ is defined as

$$E(w) = \frac{2}{N(N-1)} \sum_{p} \sum_{q<p} \Big[ SM_{pq}^{(w)} \big(1 - SM_{pq}^{(1)}\big) + SM_{pq}^{(1)} \big(1 - SM_{pq}^{(w)}\big) \Big] \qquad (3)$$

where $N$ is the number of cases in the case base.

To minimize equation (3), we use a gradient descent technique. The change in $w_j$ (i.e., $\Delta w_j$) is computed as

$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j}, \qquad j = 1, \ldots, n, \qquad (4)$$

where $\eta$ is the learning rate.
The training algorithm is described as follows:

Step 1. Select the parameter $\alpha$ and the learning rate $\eta$.
Step 2. Initialize $w_j$ with random values in [0, 1].
Step 3. Compute $\Delta w_j$ for each $j$ using equation (4).
Step 4. Update $w_j$ with $w_j + \Delta w_j$ for each $j$.
Step 5. Repeat steps 3 and 4 until convergence, i.e., until the value of $E$ becomes less than or equal to a given threshold or until the number of iterations exceeds a certain predefined number.
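To make the weight-learning phase concrete, the following Python sketch implements equations (1)-(3) and the five training steps above. It is a minimal illustration rather than the authors' implementation: the function and variable names are ours, the case base is assumed to be a numeric matrix X with one row per case, and the gradient of E is approximated by finite differences instead of an analytic derivative.

import numpy as np

def weighted_distance(x_p, x_q, w):
    # Equation (1): weighted Euclidean distance between two cases.
    return np.sqrt(np.sum((w ** 2) * (x_p - x_q) ** 2))

def similarity(x_p, x_q, w, alpha):
    # Equation (2): similarity decreases as the weighted distance grows.
    return 1.0 / (1.0 + alpha * weighted_distance(x_p, x_q, w))

def evaluation_index(X, w, alpha):
    # Equation (3): compares weighted similarities with the unweighted ones.
    X = np.asarray(X, dtype=float)
    N = len(X)
    ones = np.ones(X.shape[1])
    total = 0.0
    for p in range(N):
        for q in range(p):
            sm_w = similarity(X[p], X[q], w, alpha)
            sm_1 = similarity(X[p], X[q], ones, alpha)
            total += sm_w * (1.0 - sm_1) + sm_1 * (1.0 - sm_w)
    return 2.0 * total / (N * (N - 1))

def learn_feature_weights(X, alpha=1.0, eta=0.1, threshold=1e-3,
                          max_iter=200, h=1e-5, seed=0):
    # Steps 1-5 of the training algorithm, using a numerical gradient.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random(X.shape[1])                      # Step 2: random w_j in [0, 1]
    for _ in range(max_iter):
        grad = np.zeros_like(w)
        for j in range(len(w)):                     # Step 3: estimate dE/dw_j
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[j] += h
            w_minus[j] -= h
            grad[j] = (evaluation_index(X, w_plus, alpha)
                       - evaluation_index(X, w_minus, alpha)) / (2.0 * h)
        w = np.clip(w - eta * grad, 0.0, 1.0)       # Step 4: w_j <- w_j + Δw_j
        if evaluation_index(X, w, alpha) <= threshold:   # Step 5: convergence
            break
    return w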
2.2 Phase Two - Partitioning the Case Library into Several Clusters
This section attempts to partition the case library into several clusters by using the weighted distance metric with the weights learned in Section 2.1. Since the features are considered to be real-valued, many methods, such as K-Means clustering [15] and Kohonen’s self-organizing network [16], can be used to partition the case library. However, in this paper, in order to compare the fuzzy decision tree and fuzzy-rough approaches in mining adaptation rules, we take the similarity matrix clustering method in [1].
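The similarity-matrix clustering method of [1] is not reproduced in this excerpt. As a rough, hedged substitute for experimentation, the weighted distance of equation (1) can be reproduced by an ordinary K-Means run after pre-scaling each feature by its learned weight; the sketch below assumes scikit-learn and uses our own hypothetical helper name partition_case_library.

import numpy as np
from sklearn.cluster import KMeans

def partition_case_library(X, w, n_clusters=3, seed=0):
    # Scaling each feature by its weight makes Euclidean K-Means equivalent
    # to clustering under the weighted distance of equation (1).
    Xw = np.asarray(X, dtype=float) * np.asarray(w, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(Xw)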
2.3 Phase Three - Mining Adaptation Rules by Fuzzy-Rough Approach
For each cluster $L = \{e_1, e_2, \ldots, e_m\}$, we denote its cases in the form $e_i = (x_{i1}, x_{i2}, \ldots, x_{in}, c_i)$, where $x_{ij}$ corresponds to the value of feature $F_j$ $(1 \le j \le n)$ and $c_i$ corresponds to the action $(i = 1, \ldots, m)$. Arbitrarily taking a case $e_k$ $(1 \le k \le m)$ in the cluster $L$, a set of vectors, namely $\{f_i \mid f_i \in R^{n+1},\ i = 1, 2, \ldots, m\}$, can be computed in the following way:

$$f_i = e_i - e_k = (x_{i1} - x_{k1},\ x_{i2} - x_{k2},\ \ldots,\ x_{in} - x_{kn},\ c_i - c_k) = \{y_{i1}, y_{i2}, \ldots, y_{in}, u_i\}$$

We attempt to find several adaptation rules with respect to the case $e_k$ $(1 \le k \le m)$ from the set of vectors $\{f_i \mid f_i \in R^{n+1},\ i = 1, 2, \ldots, m\}$ by fuzzy rules.
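A short sketch of this step, under the same assumptions as before (numeric cases, our own function name): each case is represented as its feature vector with the action value appended, and the difference vectors f_i are taken with respect to a chosen pivot case e_k.

import numpy as np

def difference_vectors(cluster, k):
    # cluster: array of shape (m, n + 1); each row is (x_i1, ..., x_in, c_i).
    # Returns f_i = e_i - e_k for every case e_i in the cluster.
    E = np.asarray(cluster, dtype=float)
    return E - E[k]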
Consider a problem of learning from examples in which there are n+1 numerical attributes, $Attr^{(1)}, Attr^{(2)}, \ldots, Attr^{(n)}, Attr^{(n+1)}$ (where $Attr^{(n+1)}$ is the classification attribute). Then $\{f_i \mid i = 1, 2, \ldots, m\}$ can be regarded as m examples described by the n+1 attributes. We first fuzzify these n+1 numerical attributes into linguistic terms.
The number of linguistic terms for each attribute is assumed to be five (which can be enlarged or reduced if needed in a real problem). These five linguistic terms are Negative Big, Negative Small, Zero, Positive Small, and Positive Big, in short NB, NS, ZE, PS and PB respectively. Their membership functions are supposed to have triangular form and are shown in Figure 1. For each attribute (the k-th attribute $Attr^{(k)}$, $1 \le k \le n+1$) with the attribute values $Range(Attr^{(k)}) = \{y_{1k}, y_{2k}, \ldots, y_{mk}\}$, the two parameters in Figure 1, a and b, are defined by

$$a = \frac{\sum_{y \in N} y}{Card(N)} \quad \text{and} \quad b = \frac{\sum_{y \in P} y}{Card(P)} \qquad (5)$$

in which $N = \{y \mid y \in Range(Attr^{(k)}),\ y < 0\}$, $P = Range(Attr^{(k)}) - N$, and $Card(E)$ denotes the cardinality of a crisp set $E$.
Figure 1. Five triangular membership functions (NB, NS, ZE, PS, PB), with breakpoints a, 0 and b on the horizontal axis.
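The sketch below illustrates equation (5) and a triangular fuzzification consistent with the description above. Since Figure 1 is not reproduced in this excerpt, the exact peak positions of NB and PB are an assumption (taken here as 2a and 2b) and the outer sets are given open shoulders; only a, 0 and b come from the paper, and all function names are ours.

import numpy as np

def breakpoints(values):
    # Equation (5): a is the mean of the negative attribute values,
    # b the mean of the remaining (non-negative) ones.
    v = np.asarray(values, dtype=float)
    neg, pos = v[v < 0], v[v >= 0]
    a = neg.mean() if neg.size else v.min()
    b = pos.mean() if pos.size else v.max()
    return a, b

def triangle(x, left, peak, right):
    # Triangular membership; -inf / +inf bounds give shoulder behaviour.
    if x == peak:
        return 1.0
    if x < peak:
        if left == -np.inf:
            return 1.0
        return 0.0 if x <= left else (x - left) / (peak - left)
    if right == np.inf:
        return 1.0
    return 0.0 if x >= right else (right - x) / (right - peak)

def memberships(x, a, b):
    # Degrees of membership of a single value x in NB, NS, ZE, PS, PB.
    peaks = [2 * a, a, 0.0, b, 2 * b]           # assumed peak positions
    labels = ["NB", "NS", "ZE", "PS", "PB"]
    result = {}
    for i, lab in enumerate(labels):
        left = peaks[i - 1] if i > 0 else -np.inf
        right = peaks[i + 1] if i < 4 else np.inf
        result[lab] = triangle(x, left, peaks[i], right)
    return result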
After the process of fuzzification, we transform the crisp cases in the case library
to fuzzy cases successfully. Each fuzzy case is considered to be a fuzzy set defined on
the non-fuzzy label space consisting of all values of attributes, where the non-fuzzy
label space consists of the linguistic terms of each attribute. Consider each fuzzy case
as an initial fuzzy rule. We then apply the rough set technique to these fuzzy rules and
get a subset of those fuzzy rules, which covers all fuzzy cases, and the cardinality of
the subset is approximately minimal. The fuzzy-rough algorithm is divided into three
tasks to be fulfilled [1]: (1) search for a minimal reduct for each initial fuzzy rule; (2) search for a family of minimal reducts for the i-th fuzzy case ($1 \le i \le M$, where M is the number of fuzzy cases) such that each reduct in this family covers the i-th fuzzy case; and (3) search for a subset of those fuzzy rules which covers all fuzzy cases and whose cardinality is minimal.
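The third task, choosing a small subset of fuzzy rules that still covers every fuzzy case, is a set-cover problem. The paper's fuzzy-rough heuristic [2] is not reproduced in this excerpt, so the sketch below uses the standard greedy approximation as a stand-in; the coverage relation (which rules cover which fuzzy cases) is assumed to have been computed already, and the function name is ours.

def approx_minimal_rule_subset(coverage):
    # coverage: dict mapping a rule id to the set of fuzzy-case ids it covers.
    # Greedily picks the rule covering the most still-uncovered cases.
    uncovered = set().union(*coverage.values()) if coverage else set()
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda r: len(coverage[r] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break          # remaining cases are not coverable by any rule
        chosen.append(best)
        uncovered -= gain
    return chosen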
We first introduce the definitions used in the fuzzy-rough approach.
In order to transform the fuzzy data into fuzzy rules, we first introduce the concept of a fuzzy knowledge base. Table 1 is said to be a fuzzy knowledge base, where there are n rows and m attributes $Attr_j$ $(j = 1, 2, \ldots, m)$; the entries $A_{ij}$ $(i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m)$ are all

Citations
Journal ArticleDOI
Guo-Niu Zhu, Jie Hu, Jin Qi, Jin Ma, Yinghong Peng
TL;DR: A hybrid CBR system is proposed by introducing reduction technique in feature selection and cluster analysis in case organization and the results indicate that the research techniques can effectively enhance the performance of the CBR system.

56 citations


Cites background or methods from "A Fuzzy-Rough Approach for Case Bas..."

  • ...Therefore, to develop a CBR with appropriate feature selection and case organization as well as case base visualization is an urgent need to assist product design....


  • ...Once feature selection is finished, the growing hierarchical self-organizing map (GHSOM) is taken as a cluster tool to organize those cases so that the initial case base can be divided into some small subsets with hierarchical structure....


  • ...With the rapid development of CBR, large scale case base is becoming more common, with the number of instances ranging from thousands to millions (Cao et al., 2001)....


  • ...Cao et al. (2001, 2003) proposed a fuzzy-rough method for case library maintenance and adaptation knowledge mining....


Journal ArticleDOI
01 Oct 2010
TL;DR: A novel approach to noise reduction based on local Support Vector Machines (LSVM) which brings the benefits of maximal margin classifiers to bear on noise reduction and is significantly better than other analysed algorithms for real datasets and for artificial datasets perturbed by Gaussian noise and in presence of uneven class densities.
Abstract: To some extent the problem of noise reduction in machine learning has been finessed by the development of learning techniques that are noise-tolerant. However, it is difficult to make instance-based learning noise tolerant and noise reduction still plays an important role in k-nearest neighbour classification. There are also other motivations for noise reduction, for instance the elimination of noise may result in simpler models or data cleansing may be an end in itself. In this paper we present a novel approach to noise reduction based on local Support Vector Machines (LSVM) which brings the benefits of maximal margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all the existing noise reduction techniques are based. Roughly speaking, for each training example an SVM is trained on its neighbourhood and if the SVM classification for the central example disagrees with its actual class there is evidence in favour of removing it from the training set. We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited with our method as well as specific experiments regarding the spam filtering application domain. We present a further evaluation on two artificial datasets where we analyse two different types of noise (Gaussian feature noise and mislabelling noise) and the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other analysed algorithms for real datasets and for artificial datasets perturbed by Gaussian noise and in presence of uneven class densities.

51 citations


Cites background or methods from "A Fuzzy-Rough Approach for Case Bas..."

  • ...Similar approaches have been proposed by Cabailero et al (2005) who creates the edited training data from the lower and upper set approximations and Cao et al (2001) who couples rough sets theory with fuzzy decision tree induction....


  • ...Lorena and Carvalho (2004), for example, found that preprocessing the training data to remove noise resulted in simplifications in induced SVM classifiers and higher comprehensiveness in induced decision tree classifiers....


Journal ArticleDOI
TL;DR: This paper presents a novel methodology to apply fuzzy similarity-based Rough Set algorithm in feature weighting and reduction for CBR system and shows how this algorithm is used in tool selection for die and mold NC machining.
Abstract: Case-based reasoning (CBR) embodied in die and mold NC machining will extend the application of knowledge-based system by utilizing previous cases and experience. However, redundant features may not only dramatically increase the case memory, but also make the case retrieval algorithm more complicated. Additionally, traditional methods of feature weighting limit the development of CBR methodology. This paper presents a novel methodology to apply fuzzy similarity-based Rough Set algorithm in feature weighting and reduction for CBR system. The algorithm is used in tool selection for die and mold NC machining. The proposed method does not need to discretize continuous or real-valued features included in cases, from which can effectively reduce information loss. The weight of feature $a_i$ is computed based on the difference of its dependency defined as $\gamma_A - \gamma_{A - \{a_i\}}$, which also represents the significance of the corresponding feature. If the difference is equal to 0, the feature is considered to be redundant and should be removed. Finally, a case study is also implemented to prove the proposed method.

45 citations

Journal Article
TL;DR: A snapshot of the state of the art of Case Base Maintenance methods is provided, reviewing some important methods of maintaining case based reasoning and introducing a framework for distinguishing these methods and compare and analyze them.
Abstract: The success of a Case Based Reasoning (CBR) system depends on the quality of case data and the speed of the retrieval process that can be expensive in time especially when the number of cases gets large. To guarantee this quality, maintenance the contents of a case base becomes necessarily. As a result, the research area of Case Base Maintenance (CBM) has drawn more and more attention to CBR systems. This paper provides a snapshot of the state of the art, reviewing some important methods of maintaining case based reasoning. We introduce a framework for distinguishing these methods and compare and analyze them. In addition, this paper also presents simulations on data sets from U.C.I repository to show the effectiveness of some CBM methods taking into account the accuracy, the size and the retrieval time of case bases. Our simulation results which are obtained by compared well known reduction techniques show that these CBM methods have good storage reduction ratios, satisfying classification accuracies and short retrieval time. General Terms Machine Intelligence, Case Based Reasoning, Algorithms.

30 citations

Proceedings ArticleDOI
01 Nov 2011
TL;DR: The purpose of the WCOID case base maintenance policy is to reduce both the storage requirements and search time and to focus on balancing case retrieval efficiency and competence for a large size case base.
Abstract: The success of a Case Based Reasoning (CBR) system depends on the quality of case data and the speed of the retrieval process that can be expensive in time especially when the number of cases gets large. To guarantee this quality, maintenance the contents of a case base becomes necessarily. In this paper, we propose a novel case base maintenance (CBM) policy named WCOID - Weighting, Clustering, Outliers and Internal cases Detection, using, in addition to clustering and outliers detection methods, feature weights in the process of improving the competence of our reduced case base. The purpose of our WCOID case base maintenance policy is to reduce both the storage requirements and search time and to focus on balancing case retrieval efficiency and competence for a large size case base. WCOID is mainly based on the idea that a large case base with weighted features is transformed to a small case base.

21 citations


Cites background from "A Fuzzy-Rough Approach for Case Bas..."

  • ...One branch of research has focused on the partitioning of case base which builds an elaborate CB structure and maintains it continuously [1], [5], [6]....


  • ...On the other hand, following a partitioning policy [1], [6], [13] that builds an elaborate case base structure and maintains it continuously....


References
Book
01 Aug 1996
TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
Abstract: A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.

52,705 citations

Book
31 Jul 1981
Pattern Recognition with Fuzzy Objective Function Algorithms

15,662 citations


"A Fuzzy-Rough Approach for Case Bas..." refers methods in this paper

  • ...Since the considered features are considered to be real-valued, many methods, such as K-Means clustering [15] and Kohonen’s self-organizing network [16], can be used to partition the case library....


Book
01 Sep 1996
TL;DR: This book presents a selection of recent progress, issues, and directions for the future of case-based reasoning, and experimentally examines one of the fundamental tenets of CBR, that reasoning from prior experiences improves performance.
Abstract: From the Publisher: This book presents a selection of recent progress, issues, and directions for the future of case-based reasoning. It includes chapters addressing fundamental issues and approaches in indexing and retrieval, situation assessment and similarity assessment, and in case adaptation. Those chapters provide a "case-based" view of key problems and solutions in context of the tasks for which they were developed. It also presents lessons learned about how to design CBR systems and how to apply them to real-world problems. The final chapters include a perspective on the state of the field and the most important directions for future impact. The case studies presented involve a broad sampling of tasks, such as design, education, legal reasoning, planning, decision support, problem-solving, and knowledge navigation. In addition, they experimentally examine one of the fundamental tenets of CBR, that reasoning from prior experiences improves performance. The chapters also address other issues that, while not restricted to CBR per se, have been vigorously attacked by the CBR community, including creative problem-solving, strategic memory search, and opportunistic retrieval. This volume provides a vision of the present, and a challenge for the future, of case-based reasoning research and applications.

935 citations

Journal ArticleDOI
TL;DR: A fuzzy decision tree induction method, which is based on the reduction of classification ambiguity with fuzzy evidence, is developed, which represents classification knowledge more naturally to the way of human thinking and are more robust in tolerating imprecise, conflict, and missing information.

902 citations


"A Fuzzy-Rough Approach for Case Bas..." refers background in this paper

  • ...(Yuan and Shaw [17]) The true degree of fuzzy rule $A \Rightarrow B$ is defined to be $\alpha = \sum_{u \in U} \min(\mu_A(u), \mu_B(u)) \big/ \sum_{u \in U} \mu_A(u)$, where A and B are two fuzzy sets defined on the same universe U. Definition 2....


Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "A fuzzy-rough approach for case base maintenance"?

This paper proposes a fuzzy-rough method of maintaining Case-Based Reasoning (CBR) systems. In paper [1], the authors have proposed a methodology for case base maintenance which used a fuzzy decision tree induction to discover the adaptation rules; in this paper, they focus on using a heuristic algorithm, i.e., a fuzzy-rough algorithm [2], in the process of simplifying fuzzy rules. The effectiveness of the method is demonstrated experimentally using two sets of testing data, and the authors also compare the maintenance results of using fuzzy ID3, in [1], and the fuzzy-rough approach, as in this paper.

Future work includes (1) a large-scale testing of their methodology using different case bases, (2) the refining of the fuzzy-rough algorithms, (3) a comprehensive analysis of the complexity of the case base maintenance and reasoning algorithm in time and space, and (4) future comparison with other methods, such as fuzzy decision tree, C4.5, genetic algorithm and so on.