Journal ArticleDOI

Statistical evaluation of rough set dependency analysis

TL;DR: This paper proposes to enhance RSDA by two simple statistical procedures, both based on randomization techniques, to evaluate the validity of prediction based on the approximation quality of attributes of rough set dependency analysis.
Abstract: Rough set data analysis (RSDA) has recently become a frequently studied symbolic method in data mining. Among other things, it is being used for the extraction of rules from databases; it is, however, not clear from within the methods of rough set analysis whether the extracted rules are valid. In this paper, we propose enhancing RSDA by two simple statistical procedures, both based on randomization techniques, to evaluate the validity of prediction based on the approximation quality of attributes of rough set dependency analysis. The first procedure tests the casualness of a prediction to ensure that the prediction is not based on only a few (casual) observations. The second procedure tests the conditional casualness of an attribute within a prediction rule. The procedures are applied to three data sets, originally published in the context of rough set analysis. We argue that several claims of these analyses need to be modified for lack of validity, and that other possibly significant results were overlooked.

Summary (3 min read)

1 Introduction

  • The methods will be applied to three different data sets.
  • The first data set utilizes rough set analysis to describe patients after highly selective vagotomy (HSV) for duodenal ulcer.
  • The authors show how statistical methods within rough set analysis highlight some of their results in a different way.

2 Rough set data analysis

  • Of particular interest in rough set dependency theory are those sets Q which use the least number of attributes, and still have Q → P.
  • The intersection of all reducts of P is called the core of P.
  • For each R ⊆ Ω let P_R be the partition of U induced by θ_R. Define γ_Q(P) = Σ_{X ∈ P_P} |X_{θ_Q}| / |U| (2.2); γ_Q(P) is the relative frequency of the number of correctly Q-classified elements with respect to the partition induced by P.
  • The larger the difference, the more important one regards the contribution of q.

3.1 Casual dependencies

  • In the sequel the authors consider the case that a rule Q → P was given before performing the data analysis, and not obtained by optimizing the quality of approximation.
  • The latter needs additional treatment and will be discussed briefly in Section 3.5.
  • There is a bijective mapping τ : {θ_{σ(Q)} x : x ∈ U} → {θ_Q x : x ∈ U} which preserves the cardinality of the classes.
  • Standard randomization techniques – for example Manly (1991), Chapter 1 – can now be applied to estimate this probability.
  • To decide whether the given rule is casual under the statistical assumption, the authors have to consider all 720 possible rules {σ(p), σ(q)} → d and their approximation qualities.

3.2 How the randomization procedure works

  • The proposed randomization test procedure is one way to model errors in terms of a statistical approach.
  • Because their approach is aimed at testing the casualness of a rule system – and assuming for a moment that this assumption really holds – the assumption of representativeness is a problem of any analysis in most real-life databases.
  • Any observation within the other six classes of θ_Q was randomly assigned to one of the three classes of θ_P.
  • The percentage of the three rules – which is the true value of the approximation quality γ – is varied over 0.0, 0.1, 0.2 and 0.3. Figure 1 shows the problem of granularity: given N = 10 observations and a true value of γ = 0.0, the expectation of γ̂ is about 0.32; the granularity overshoot vanishes at about N = 40.
  • The power curves of an effect γ > 0.0 show that the randomization test has a reasonable power – at least in the chosen situation.

3.3 Computational considerations

  • It is well known that randomization is a rather expensive procedure, and one might have objections against this technique because of its cost in real life applications.
  • If f(N) is the time complexity for performing the computation of γ, the time complexity of the simulation-based randomization procedure is 1000·f(N).
  • If randomization is too costly for a data set, RSDA itself will not be applicable in this case.
  • Some simple shortcuts, such as a check whether the entropy of the Q partition is near log2(N), may avoid superfluous computation (one possible reading of this check is sketched after this list).
  • For their re-analysis of the published data sets below it was not necessary to speed up the computations.
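
The following sketch (Python; not from the paper, and the tolerance is an invented illustrative value) shows one possible reading of the entropy shortcut: when the entropy of the partition induced by θ_Q is close to its maximum log2(N), nearly every class of θ_Q is a singleton, γ stays close to 1 under every permutation, and the 1000-fold randomization can be skipped as uninformative.

    import math
    from collections import Counter

    def q_partition_entropy(rows, Q):
        """Entropy (in bits) of the partition of U induced by theta_Q;
        `rows` is a list of dicts mapping attribute names to values."""
        counts = Counter(tuple(row[a] for a in Q) for row in rows)
        n = len(rows)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def randomization_worthwhile(rows, Q, tol=0.05):
        """Run the randomization test only when theta_Q is not (nearly) the
        identity, i.e. when the entropy is more than `tol` bits below log2(N)."""
        return q_partition_entropy(rows, Q) < math.log2(len(rows)) - tol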

3.4 Conditional casual attributes

  • In rough set analysis, the decline of the approximation quality when omitting one attribute is usually used to determine whether an attribute within a minimal determining set is of high value for the prediction.
  • This approach does not take into account that the decline of approximation quality may be due to chance.
  • Assume that an additional attribute r is conceptualized in three different ways, among them a fine-grained measure r1 using 8 categories and a medium-grained description r2 using 4 categories.
  • Therefore the authors cannot trust the rules derived from the description {q, r1} → p, because the attribute r1 is exchangeable with any randomly generated attribute s = σ(r1).
  • Whereas the statistical evaluation of the additional predictive power of the three chosen attributes differs, the analysis of the decline of the approximation quality tells us nothing about these differences (a sketch of such a conditional test follows this list).
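
A minimal sketch of such a conditional test (Python; our reading of the procedure, not code from the paper, so the exact null model may differ in detail): only the values of the additional attribute r are permuted, the remaining predictors Q and the decision attribute p stay fixed, and we record how often the randomized attribute predicts p at least as well as the real one.

    import random
    from collections import defaultdict

    def partition(rows, attrs):
        """Classes of theta_Q for the attribute list `attrs`."""
        classes = defaultdict(set)
        for i, row in enumerate(rows):
            classes[tuple(row[a] for a in attrs)].add(i)
        return list(classes.values())

    def gamma(rows, Q, P):
        """Approximation quality gamma_Q(P), as in equation (2.2)."""
        q_classes, p_classes = partition(rows, Q), partition(rows, P)
        hit = sum(len(qc) for qc in q_classes if any(qc <= pc for pc in p_classes))
        return hit / len(rows)

    def conditional_casualness(rows, Q, r, p, n_perm=1000, seed=1):
        """Estimate how often a permuted copy sigma(r) of the extra attribute r
        predicts p (together with Q) at least as well as r itself does."""
        rng = random.Random(seed)
        gamma_obs = gamma(rows, Q + [r], [p])
        r_values = [row[r] for row in rows]
        hits = 0
        for _ in range(n_perm):
            shuffled = r_values[:]
            rng.shuffle(shuffled)
            permuted = [{**row, r: v} for row, v in zip(rows, shuffled)]
            if gamma(permuted, Q + [r], [p]) >= gamma_obs:
                hits += 1
        return hits / n_perm  # large values: r is conditionally casual within {Q, r} -> p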

3.5 Cross validation of learned dependencies

  • If rough set analysis is used to learn the best subset ofΩ to determineP , a simple randomization procedure is not sufficient, because it does not reflect the optimization of the learning procedure.
  • Within the test subset the same procedure can be used to validate the chosen attributes.
  • If the test procedure does not show a significant result, there are too few rules which can be used to predict the decision attributes from the learned attributes.
  • Note that these rules need not be the same as those in the learning subset.
  • If the additional attribute is conditional casual, the hypothesis that the rules in both sets of objects are identical should be kept.

4.1 Duodenal ulcer data

  • All data used in this paper are obtainable from ftp://luce.psycho.uni-osnabrueck.de/.
  • Pawlak et al. (1986) found – using rough set analysis – that the attribute set R, consisting of 3: Duration of disease, 4: Complication, 5: Basic HCl concentration, 6: Basic Vol. of gastric juice, 9: Stimulated HCl concentration, 10: Stimulated Vol. of gastric juice, suffices to predict attribute 12 (“Visick grading”).
  • The attribute set discussed in Pawlak et al. (1986) was based on a reduct searching procedure.
  • In order to discuss the cross validation procedure, the authors split the data set into 2 subsets containing 61 cases each.
  • Furthermore, the result suggests a reduction of the number of attributes withinR, because all attributes are conditional casual.

4.2 Earthquake data

  • In Teghem & Benjelloun (1992), the authors search for premonitory factors for earthquakes by emphasizing gas geochemistry.
  • The partition attribute (attribute 16) was the seismic activity on 155 days measured on the Richter scale.
  • The other attributes were radon concentration measured at 8 different locations (attributes 1-8) and 7 measures of climatic factors (attributes 9-15).
  • A problem with the information system was that it has an empty core with respect to attribute 16, and that an evaluation of some reducts turned out to be difficult.
  • The statistical evaluation of some of the information systems proposed by Teghem & Benjelloun (1992) gives us additional insights (Tab. 6).

5 Conclusion

  • Gathering evidence in procedures of Artificial Intelligence should not be based upon casual observations.
  • The authors' approach shows how – in principle – a system using rough set dependency analysis can defend itself against randomness.
  • The reanalysis of three published data sets shows that there is an urgent need for such a technique: parts of the claimed results using the first two data sets are invalidated, some promising dependencies are overlooked, and, as the authors show using the data of Study 1, their proposed cross-validation technique offers a new horizon for interpretation.
  • Concerning Study 3, the conclusions of the authors are validated.



Statistical Evaluation of Rough Set Dependency Analysis
Ivo Düntsch¹
School of Information and Software Engineering
University of Ulster
Newtownabbey, BT 37 0QB, N. Ireland
I.Duentsch@ulst.ac.uk

Günther Gediga¹
FB Psychologie / Methodenlehre
Universität Osnabrück
49069 Osnabrück, Germany
gg@Luce.Psycho.Uni-Osnabrueck.DE
and
Institut für semantische Informationsverarbeitung
Universität Osnabrück

December 12, 1996

¹ Equal authorship implied

Summary
Rough set data analysis (RSDA) has recently become a frequently studied symbolic method in data mining. Among other things, it is being used for the extraction of rules from databases; it is, however, not clear from within the methods of rough set analysis whether the extracted rules are valid.
In this paper, we propose enhancing RSDA by two simple statistical procedures, both based on randomization techniques, to evaluate the validity of prediction based on the approximation quality of attributes of rough set dependency analysis. The first procedure tests the casualness of a prediction to ensure that the prediction is not based on only a few (casual) observations. The second procedure tests the conditional casualness of an attribute within a prediction rule.
The procedures are applied to three data sets, originally published in the context of rough set analysis. We argue that several claims of these analyses need to be modified for lack of validity, and that other possibly significant results were overlooked.
Keywords: Rough sets, dependency analysis, statistical evaluation, validation, randomization test

1 Introduction
Rough set analysis, an emerging technology in artificial intelligence (Pawlak et al. (1995)), has been compared with statistical models, see for example Wong et al. (1986), Krusińska et al. (1992a) or Krusińska et al. (1992b). One area of application of rough set theory is the extraction of rules from databases; these rules are then sometimes claimed to be useful for future decision making or prediction of events. However, if such a rule is based on only a few observations, its usefulness for prediction is arguable (see also Krusińska et al. (1992a), p. 253 in this context).
The aim of this paper is to employ statistical methods which are compatible with the rough set philosophy to evaluate the “prediction quality” of rough set dependency analysis. The methods will be applied to three different data sets:

  • The first set was published in Pawlak et al. (1986) and Słowiński & Słowiński (1990). It utilizes rough set analysis to describe patients after highly selective vagotomy (HSV) for duodenal ulcer. The statistical validity of the conclusions will be discussed.

  • The second example is the discussion of earthquake data published by Teghem & Charlet (1992). The main reason why we use this example is that it demonstrates the applicability of our approach in the situation when the prediction success is perfect in terms of rough analysis.

  • The third example is used by Teghem & Benjelloun (1992) to compare statistical and rough set methods. We show how statistical methods within rough set analysis highlight some of their results in a different way.
2 Rough set data analysis
A major area of application of rough set theory is the study of dependencies among attributes of information systems. An information system S = ⟨U, Ω, (V_q)_{q ∈ Ω}, f⟩ consists of

1. A set U of objects,
2. A finite set Ω of attributes,
3. For each q ∈ Ω a set V_q of attribute values,
4. An information function f : U × Ω → V, where V = ⋃_{q ∈ Ω} V_q, with f(x, q) ∈ V_q for all x ∈ U, q ∈ Ω.

We think of the descriptor f(x, q) as the value which object x takes at attribute q.
With each Q ⊆ Ω we associate an equivalence relation θ_Q on U by

    x ≡ y (θ_Q)  ⟺  f(x, q) = f(y, q) for all q ∈ Q.

If x ∈ U, then θ_Q x is the equivalence class of θ_Q containing x.

Intuitively, x ≡ y (θ_Q) if the objects x and y are indiscernible with respect to the values of their attributes from Q. If X ⊆ U, then the lower approximation of X by Q,

    X_{θ_Q} = ⋃ { θ_Q x : θ_Q x ⊆ X },

is the set of all correctly classified elements of X with respect to θ_Q, i.e. with the information available from the attributes given in Q.
Suppose that P, Q ⊆ Ω. We say that P is dependent on Q – written as Q → P – if every class of θ_P is a union of classes of θ_Q. In other words, the classification of U induced by θ_P can be expressed by the classification induced by θ_Q.
In order to simplify notation we shall in the sequel usually write Q → p instead of Q → {p}, and θ_p instead of θ_{{p}}.
Each dependency Q → P leads to a set of rules as follows: Suppose that Q = {q_0, …, q_n} and P = {p_0, …, p_k}. For each set {t_0, …, t_n} where t_i ∈ V_{q_i} there is a uniquely determined set {s_0, …, s_k} with s_i ∈ V_{p_i} such that

    (∀x ∈ U)[ (f(x, q_0) = t_0 ∧ ⋯ ∧ f(x, q_n) = t_n)  →  (f(x, p_0) = s_0 ∧ ⋯ ∧ f(x, p_k) = s_k) ].     (2.1)
Of particular interest in rough set dependency theory are those sets Q which use the least number of attributes, and still have Q → P. A set with this property is called a minimal determining set for P. In other words, a set Q is minimal determining for P, if Q → P, and R ↛ P for all R ⊊ Q.
If such a Q is a subset of P we call Q a reduct of P. It is not hard to see that each P has a reduct, though this need not be unique. The intersection of all reducts of P is called the core of P. Unless P has only one reduct, the core of P is not itself a reduct.
For each R ⊆ Ω let P_R be the partition of U induced by θ_R. Define

    γ_Q(P) = Σ_{X ∈ P_P} |X_{θ_Q}| / |U|.     (2.2)

γ_Q(P) is the relative frequency of the number of correctly Q-classified elements with respect to the partition induced by P. It is usually interpreted in rough set analysis as a measurement of the prediction success of a set of inference rules based on value combinations of Q and value combinations of P of the form given in (2.1). The prediction success is perfect if γ_Q(P) = 1; in this case, Q → P.
Suppose that Q is a reduct of P, so that Q → P, and Q \ {q} ↛ P for any q ∈ Q. In rough set theory, the impact of attribute q on the fact that Q → P is usually measured by the drop of the approximation function γ from 1 to γ_{Q\{q}}(P): the larger the difference, the more important one regards the contribution of q. We shall show below that this interpretation needs to be taken with care in some cases, and additional statistical evidence may be needed.
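
To make these definitions concrete, here is a minimal sketch (Python; not part of the original paper, and the toy table and attribute names are invented for illustration) that computes the classes of θ_Q, checks which of them lie inside a class of θ_P, and returns γ_Q(P) as in (2.2).

    from collections import defaultdict
    from fractions import Fraction

    # Invented toy information system: one dict per object in U,
    # keys are attribute names, values are attribute values.
    table = [
        {"q1": 0, "q2": 1, "d": "a"},
        {"q1": 0, "q2": 1, "d": "a"},
        {"q1": 1, "q2": 0, "d": "b"},
        {"q1": 1, "q2": 0, "d": "a"},
        {"q1": 1, "q2": 1, "d": "b"},
    ]

    def partition(rows, attrs):
        """Classes of theta_Q: objects with identical values on `attrs`."""
        classes = defaultdict(set)
        for i, row in enumerate(rows):
            classes[tuple(row[a] for a in attrs)].add(i)
        return list(classes.values())

    def gamma(rows, Q, P):
        """Approximation quality gamma_Q(P) of equation (2.2): the fraction of
        objects whose theta_Q class lies inside a single class of theta_P."""
        q_classes = partition(rows, Q)
        p_classes = partition(rows, P)
        in_lower_approx = sum(
            len(qc) for qc in q_classes if any(qc <= pc for pc in p_classes)
        )
        return Fraction(in_lower_approx, len(rows))

    print(gamma(table, ["q1", "q2"], ["d"]))   # 3/5 for this toy table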

3 Casual rules and randomization analysis
3.1 Casual dependencies
In the sequel we consider the case that a rule Q → P was given before performing the data analysis, and not obtained by optimizing the quality of approximation. The latter needs additional treatment and will be discussed briefly in Section 3.5.
Suppose that θ_Q is the identity relation id_U on U. Then θ_Q ⊆ θ_P for all P ⊆ Ω, i.e. Q → P for all P ⊆ Ω. Furthermore, each class of θ_Q consists of exactly one element, and therefore, any rule Q → P is based on exactly one observation. We call such a rule deterministic casual.
If a rule is not deterministic casual, it nevertheless may be based on a few observations only, and thus, its prediction quality could be limited; such rules may be called casual. Therefore, the need arises for a statistical procedure which tests the casualness of a rule based on mechanisms of rough set analysis.
Assume that the information system is the realization of a random process in which the attribute values of Q and P are realized independently of each other. If no additional information is present, it may be assumed that the attribute value combinations within Q and P are fixed and the matching of the Q, P combinations is drawn at random.
Let σ be a permutation of U, and Q ⊆ Ω. We define a new information function f_{σ(Q)} by

    f_{σ(Q)}(x, r) = f(σ(x), r),  if r ∈ Q,
    f_{σ(Q)}(x, r) = f(x, r),     otherwise,

and let γ_{σ(Q)}(P) be the approximation quality of the prediction of P by Q in the new information system. Note that the structure of the equivalence relation θ_{σ(Q)} determined by Q in the revised system is the same as that of the original θ_Q. In other words, there is a bijective mapping

    τ : {θ_{σ(Q)} x : x ∈ U} → {θ_Q x : x ∈ U}

which preserves the cardinality of the classes. In particular, if θ_Q is the identity on U, so is θ_{σ(Q)}. It follows that for a rule Q → p with θ_Q = id_U, we have γ_{σ(Q)}(p) = 1 as well for all permutations σ of U.
The distribution of the prediction success is given by the set

    R_{P,Q} = { γ_{σ(Q)}(P) : σ a permutation of U }.
Let H be the null hypothesis; we have to estimate the position of the observed approximation quality γ_obs = γ_Q(P) in the set R_{P,Q}, i.e. to estimate the probability p(γ_R ≥ γ_obs | H). Standard randomization techniques – for example Manly (1991), Chapter 1 – can now be applied to estimate this probability.
If p(γ_R ≥ γ_obs | H) is low – conventionally, in the upper 5% region – the assumption of randomness can be rejected; otherwise, if

    p(γ_R ≥ γ_obs | H) > 0.05,

we call the rule (random) casual.
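
A straightforward Monte Carlo version of this test can be sketched as follows (Python; not code from the paper, and the helper functions repeat the γ computation so that the sketch runs on its own). Each iteration applies a random permutation σ to the Q-part of the table while the P-part stays in place, recomputes γ_{σ(Q)}(P), and the fraction of permutations with γ_{σ(Q)}(P) ≥ γ_obs estimates p(γ_R ≥ γ_obs | H); 1000 permutations mirror the figure used in Section 3.3.

    import random
    from collections import defaultdict

    def partition(rows, attrs):
        """Classes of theta_Q: objects with identical values on `attrs`."""
        classes = defaultdict(set)
        for i, row in enumerate(rows):
            classes[tuple(row[a] for a in attrs)].add(i)
        return list(classes.values())

    def gamma(rows, Q, P):
        """Approximation quality gamma_Q(P), as in equation (2.2)."""
        q_classes, p_classes = partition(rows, Q), partition(rows, P)
        hit = sum(len(qc) for qc in q_classes if any(qc <= pc for pc in p_classes))
        return hit / len(rows)

    def casualness_test(rows, Q, P, n_perm=1000, seed=1):
        """Estimate p(gamma_R >= gamma_obs | H) via random permutations sigma of U."""
        rng = random.Random(seed)
        gamma_obs = gamma(rows, Q, P)
        idx = list(range(len(rows)))
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(idx)
            # f_sigma(Q): object x receives the Q-values of sigma(x); P-values stay put.
            permuted = [
                {**row, **{q: rows[j][q] for q in Q}}
                for row, j in zip(rows, idx)
            ]
            if gamma(permuted, Q, P) >= gamma_obs:
                hits += 1
        return gamma_obs, hits / n_perm

    # If the estimated probability exceeds 0.05, the rule Q -> P is called
    # (random) casual; otherwise the assumption of randomness can be rejected.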

Citations

Proceedings ArticleDOI
15 Apr 2013
TL;DR: A novel approach based on Bijective soft sets for the generation of classification rules from the data set is presented and the generated rules are compared to the well-known decision tree classifier algorithm and Naïve bayes.
Abstract: Classification is one of the main issues in Data Mining Research fields. The classification difficulties in medical area frequently classify medical dataset based on the result of medical diagnosis or description of medical treatment by the medical specialist. The Extensive amounts of information and data warehouse in medical databases need the development of specialized tools for storing, retrieving, investigation, and effectiveness usage of stored knowledge and data. Intelligent methods such as neural networks, fuzzy sets, decision trees, and expert systems are, slowly but steadily, applied in the medical fields. Recently, Bijective soft set theory has been proposed as a new intelligent technique for the discovery of data dependencies, data reduction, classification and rule generation from databases. In this paper, we present a novel approach based on Bijective soft sets for the generation of classification rules from the data set. Investigational results from applying the Bijective soft set analysis to the set of data samples are given and evaluated. In addition, the generated rules are also compared to the well-known decision tree classifier algorithm and Naive bayes. The learning illustrates that the theory of Bijective soft set seems to be a valuable tool for inductive learning and provides a valuable support for building expert systems.

36 citations

Journal ArticleDOI
TL;DR: An improvement to Pawlak's model is discussed and a new attribute dependency function is presented based on decision-relative discernibility matrices and measures how many times condition attributes are used to determine the decision value by referring to the matrix.

35 citations


Cites background from "Statistical evaluation of rough set..."

  • ...Düntsch and Gediga [9] pointed out that Pawlak’s model, as defined above, is inadequate....


  • ...Despite all these models, the problem discussed in [9] is no closer to being solved....


  • ...However, as Düntsch and Gediga [9] pointed out, Pawlak’s model is inadequate in the computation of the dependency degree....


Journal ArticleDOI
TL;DR: A simple way to improve the statistical strength of rules obtained by rough set data analysis is suggested by identifying attribute values and investigating the resulting information system, to reduce the granularity within attributes without assuming external structural information.

32 citations


Cites background from "Statistical evaluation of rough set..."

  • ...Assume w.l.o.g. that f_q(x) = v, f_q(y) = w, and v ≠ w....

  • ...There is a long standing tradition (for example [1,9]) to distinguish between symmetric and asymmetric binary attributes....

Journal ArticleDOI
TL;DR: A sequence of papers and conference contributions have developed the components of a non‐invasive method of data analysis, which is based on the RSDA principle, but is not restricted to “classical” RSDA applications.
Abstract: Rough set data analysis (RSDA), introduced by Pawlak, has become a much researched method of knowledge discovery with over 1200 publications to date. One feature which distinguishes RSDA from other data analysis methods is that, in its original form, it gathers all its information from the given data, and does not make external model assumptions as all statistical and most machine learning methods (including decision tree procedures) do. The price which needs to be paid for the parsimony of this approach, however, is that some statistical backup is required, for example, to deal with random influences to which the observed data may be subjected. In supplementing RSDA by such meta-procedures care has to be taken that the same non-invasive principles are applied. In a sequence of papers and conference contributions, we have developed the components of a non-invasive method of data analysis, which is based on the RSDA principle, but is not restricted to “classical” RSDA applications. In this article, we present for the first time in a unified way the foundation and tools of such rough information analysis. © 2001 John Wiley & Sons, Inc.

32 citations


Cites background or methods from "Statistical evaluation of rough set..."

  • ...Details and more examples can be found in [15]....

  • ...On the other hand, the statistical rough set analysis of [15] presented in the next section shows that there is no evidence that this dependency is not due to chance....

  • ...In [15] we have developed two procedures, both based on randomization techniques, which evaluate the validity of prediction based on the approximation quality of attributes of rough set dependency analysis....

References
Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.

37,183 citations

Journal ArticleDOI

14,009 citations


"Statistical evaluation of rough set..." refers methods in this paper

  • ...Teghem & Charlet (1992) use the famous Iris data first published by Fisher (1936) to show the applicability of rough set dependency analysis for problems normally treated by discriminant analysis....


Journal ArticleDOI
TL;DR: This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.
Abstract: Rough set theory, introduced by Zdzislaw Pawlak in the early 1980s [11, 12], is a new mathematical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.

7,185 citations

Book
01 Jan 1980

1,999 citations

Book
01 Jan 1991
TL;DR: This book discusses the construction of tests in non-standard situations: testing for randomness of species co-occurrences on islands, examining time change in niche overlap, probing multivariate data with random skewers, and other examples.
Abstract: Part 1 Randomization tests and confidence intervals: the idea of a randomization test examples of a randomization test aspects of randomization testing raised by the examples confidence intervals from randomization. Part 2 Monte Carlo and other computer intensive methods: Monte Carlo tests jackknifing bootstrapping bootstrap tests of significance and confidence intervals. Part 3 Some general considerations: power determining how many randomizations are needed determining a randomization distribution exactly the computer generation of pseudo-random numbers generating random permutations. Part 4 One and two sample tests: the paired comparisons design the one sample randomization test the two sample randomization test the comparison of two samples on multiple measurements. Part 5 Analysis of variance: one factor analysis of variance Bartlett's test for constant variance examples of more complicated types of analysis of variance discussion computer program. Part 6 Regrssion analysis: simple regression testing for a non-zero beta value confidence limits for beta multiple linear regression randomizing X variable values. Part 7 Distance matrices and spatial data: testing for association between distance matrices Mantel's test determining significance by sampling randomization distribution confidence limits for a matrix regression coefficient problems involving more than two matrices. Part 8 Other analyses on spatial data: the study of spatial point patterns Mead's randomization test a test based on nearest neighbour distances testing for an association between two point patterns the Besag-Diggle test tests using distances between points. Part 9 Time series: randomization and time series randomization tests for serial correlation randomization tests for trend randomization tests for periodicity irregularly spaced series tests on times of occurence discussion of procedures for irregular series bootstrap and Monte Carlo tests. Part 10 Multivariate data: univariate and multivariate tests sample means and covariance matrices comparison on sample means vectors chi-squared analyses for count data principal component analysis and other one sample methods discriminate function analysis. Part 11 Ad hoc methods: the construction of tests in non-standard situations testing for randomness of species co-occurences on islands examining time change in niche ovelap probing multivariate data with random skewers other examples. Part 12 Conclusion: randomization methods bootstrap and Monte Carlo methods.

1,705 citations

Frequently Asked Questions (1)
Q1. What are the contributions mentioned in the paper "Statistical evaluation of rough set dependency analysis"?

In this paper, the authors employ statistical methods which are compatible with the rough set philosophy to evaluate the "prediction quality" of rough set dependency analysis.