# Statistical evaluation of rough set dependency analysis

## Summary

### 1 Introduction

- Rough set analysis, an emerging technology in artificial intelligence (Pawlak et al., 1995), has been compared with statistical models; see for example Wong et al. (1986), Krusińska et al. (1992a), or Krusińska et al. (1992b).
- The methods are applied to three different data sets.
- The first set was published in Pawlak et al. (1986) and Słowiński & Słowiński (1990).
- It utilizes rough set analysis to describe patients after highly selective vagotomy (HSV) for duodenal ulcer.
- The authors show how statistical methods within rough set analysis highlight some of their results in a different way.

### 2 Rough set data analysis

- An information system S = 〈U, Ω, V_q, f〉_{q∈Ω} consists of a finite set U of objects, a finite set Ω of attributes, a value set V_q for each attribute q ∈ Ω, and an information function f which assigns to each object x ∈ U and attribute q a value f(x, q) ∈ V_q.
- Of particular interest in rough set dependency theory are those sets Q which use the least number of attributes and still have Q → P.
- The intersection of all reducts of P is called the core of P.
- For each R ⊆ Ω, let 𝔓_R be the partition of U induced by θ_R. Define

  γ_Q(P) = Σ_{X ∈ 𝔓_P} |X_{θ_Q}| / |U|,  (2.2)

  where X_{θ_Q} is the lower approximation of X with respect to θ_Q. γ_Q(P) is the relative frequency of the number of correctly Q-classified elements with respect to the partition induced by P.
- The larger the difference, the more important one regards the contribution ofq.
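The approximation quality γ of (2.2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the toy information system below is invented for demonstration.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object indices into equivalence classes by their value tuple
    on the given attributes (the partition induced by theta_attrs)."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def gamma(table, Q, P):
    """Approximation quality gamma_Q(P): the fraction of objects whose
    Q-class lies entirely inside one P-class (i.e. is in some lower
    approximation), as in equation (2.2)."""
    q_blocks = partition(table, Q)
    p_blocks = partition(table, P)
    correct = sum(len(b) for b in q_blocks
                  if any(b <= p for p in p_blocks))
    return correct / len(table)

# toy information system: rows are objects, columns are attributes 0..2
table = [
    (0, 1, 'a'),
    (0, 1, 'a'),
    (0, 0, 'b'),
    (1, 0, 'b'),
    (1, 0, 'a'),
    (1, 0, 'a'),
]
print(gamma(table, Q=[0, 1], P=[2]))  # → 0.5
```

The third Q-class {3, 4, 5} mixes both P-classes, so only 3 of the 6 objects are correctly Q-classified.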

### 3.1 Casual dependencies

- In the sequel the authors consider the case that a rule Q → P was given *before* performing the data analysis, and not obtained by optimizing the quality of approximation.
- The latter needs additional treatment and will be discussed briefly in Section 3.5.
- The reference distribution is built from permutations σ : U → U, which preserve the cardinality of the classes.
- Standard randomization techniques – for example Manly (1991), Chapter 1 – can now be applied to estimate this probability.
- To decide whether the given rule is casual under the statistical assumption, the authors have to consider all 720 possible rules {σ(p), σ(q)} → d and their approximation qualities.
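The permutation scheme above can be sketched as a simulation-based randomization test. This is an illustrative reconstruction, not the paper's code: the predictor columns are randomly re-paired with the decision column, and the p-value estimates how often a random pairing reaches the observed approximation quality. The toy data are invented.

```python
import random
from collections import defaultdict

def gamma(rows, Q, P):
    # approximation quality as in Section 2: fraction of objects whose
    # Q-class falls entirely inside a single P-class
    qb, pb = defaultdict(set), defaultdict(set)
    for i, r in enumerate(rows):
        qb[tuple(r[a] for a in Q)].add(i)
        pb[tuple(r[a] for a in P)].add(i)
    ok = sum(len(b) for b in qb.values() if any(b <= p for p in pb.values()))
    return ok / len(rows)

def randomization_p(rows, Q, P, trials=1000, seed=0):
    """Estimate P(gamma_{sigma(Q)}(P) >= observed gamma) over random
    permutations sigma of U (class cardinalities are preserved)."""
    observed = gamma(rows, Q, P)
    rng = random.Random(seed)
    n, k = len(rows), len(Q)
    hits = 0
    for _ in range(trials):
        perm = rng.sample(range(n), n)          # a random permutation of U
        # re-pair permuted Q-columns with the original P-columns
        shuffled = [tuple(rows[perm[i]][a] for a in Q) +
                    tuple(rows[i][a] for a in P) for i in range(n)]
        if gamma(shuffled, list(range(k)), list(range(k, k + len(P)))) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# toy data: attribute 0 determines attribute 1 perfectly
rows = [(0, 'x')] * 5 + [(1, 'y')] * 5
p_value = randomization_p(rows, Q=[0], P=[1])
print(p_value)
```

A small p-value indicates that the observed rule quality is unlikely to be casual.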

### 3.2 How the randomization procedure works

- The proposed randomization test procedure is one way to model errors in terms of a statistical approach.
- Their approach is aimed at testing the casualness of a rule system; the assumption of representativeness, by contrast, is a problem for any analysis of most real-life data bases.
- Any observation within the other six classes of θ_Q was randomly assigned to one of the three classes of θ_P.
- The percentage of the three rules, which is the true value of the approximation quality γ, was varied over 0.0, 0.1, 0.2, and 0.3. Figure 1 shows the problem of granularity: given N = 10 observations and a true value of γ = 0.0, the expectation of γ̂ is about 0.32; the granularity overshoot vanishes at about N = 40.
- The power curves for an effect γ > 0.0 show that the randomization test has reasonable power, at least in the chosen situation.
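The granularity overshoot can be reproduced qualitatively with a small Monte Carlo sketch (an illustration under assumed settings, not the paper's simulation design: class counts and repetition numbers are invented). When N is small relative to the number of Q-classes, many classes are tiny and pure by chance, so γ̂ is biased upward even when the true γ is 0.

```python
import random
from collections import defaultdict

def gamma_hat(q_labels, p_labels):
    # fraction of objects whose Q-class is pure w.r.t. P (as in Section 2)
    blocks = defaultdict(list)
    for q, p in zip(q_labels, p_labels):
        blocks[q].append(p)
    pure = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return pure / len(q_labels)

def expected_gamma(N, n_q=9, n_p=3, reps=2000, seed=1):
    """Monte Carlo estimate of E[gamma_hat] when P-classes are assigned
    completely at random, i.e. the true approximation quality is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        q = [rng.randrange(n_q) for _ in range(N)]
        p = [rng.randrange(n_p) for _ in range(N)]
        total += gamma_hat(q, p)
    return total / reps

for N in (10, 40, 160):
    print(N, round(expected_gamma(N), 2))  # overshoot shrinks as N grows
```

The exact bias depends on the class structure, but the downward trend with growing N matches the behaviour described for Figure 1.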

### 3.3 Computational considerations

- It is well known that randomization is a rather expensive procedure, and one might object to this technique because of its cost in real-life applications.
- If f(N) is the time complexity of computing γ, the time complexity of the simulation-based randomization procedure is 1000·f(N).
- If randomization is too costly for a data set, then RSDA itself will not be applicable either.
- Some simple shortcuts, such as checking whether the entropy of the Q partition is near log₂(N), may avoid superfluous computation.
- For their re-analysis of the published data sets below it was not necessary to speed up the computations.
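The entropy shortcut can be sketched as follows (a minimal illustration; the tolerance parameter is an assumption, not from the paper). If H(Q) is close to log₂(N), almost every Q-class is a singleton, so γ̂ is near 1 for any permutation and the randomization test carries no information.

```python
import math
from collections import Counter

def q_entropy(q_labels):
    """Shannon entropy (in bits) of the partition induced by Q."""
    n = len(q_labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(q_labels).values())

def near_maximal(q_labels, tol=0.1):
    # hypothetical tolerance: treat H(Q) within `tol` bits of log2(N)
    # as "nearly maximal" and skip the expensive randomization test
    n = len(q_labels)
    return q_entropy(q_labels) >= math.log2(n) - tol

labels = list(range(8))        # 8 distinct Q-classes among 8 objects
print(near_maximal(labels))    # entropy = log2(8) = 3 bits → True
```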

### 3.4 Conditional casual attributes

- In rough set analysis, the decline of the approximation quality when omitting one attribute is usually used to determine whether an attribute within a minimal determining set is of high value for the prediction.
- This approach does not take into account that the decline of approximation quality may be due to chance.
- Assume that an additional attribute r is conceptualized in three different ways: a fine grained measure r1 using 8 categories, a medium grained description r2 using 4 categories.
- Therefore the authors cannot trust the rules derived from the description {q, r1} → p, because the attribute r1 is exchangeable with any randomly generated attribute s = σ(r1).
- Whereas the statistical evaluation of the additional predictive power of the three chosen attributes differs, the analysis of the decline of the approximation quality tells us nothing about these differences.
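A conditional test of this kind can be sketched by permuting only the candidate attribute r while the established predictors q stay fixed (an illustrative reconstruction, not the authors' code; the toy data are invented). A p-value near 1 means r adds no more predictive power than a random column.

```python
import random
from collections import defaultdict

def gamma(rows, Q, P):
    # approximation quality as in Section 2
    qb, pb = defaultdict(set), defaultdict(set)
    for i, r in enumerate(rows):
        qb[tuple(r[a] for a in Q)].add(i)
        pb[tuple(r[a] for a in P)].add(i)
    ok = sum(len(b) for b in qb.values() if any(b <= p for p in pb.values()))
    return ok / len(rows)

def conditional_casual_p(rows, q_attrs, r_attr, p_attrs, trials=1000, seed=0):
    """Estimate P(gamma({q, sigma(r)} -> p) >= gamma({q, r} -> p)),
    permuting only column r_attr while the q-columns stay fixed."""
    observed = gamma(rows, q_attrs + [r_attr], p_attrs)
    rng = random.Random(seed)
    n = len(rows)
    hits = 0
    for _ in range(trials):
        perm = rng.sample(range(n), n)
        shuffled = [row[:r_attr] + (rows[perm[i]][r_attr],) + row[r_attr + 1:]
                    for i, row in enumerate(rows)]
        if gamma(shuffled, q_attrs + [r_attr], p_attrs) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# toy data: attribute 0 already determines attribute 2; attribute 1 is noise
rows = [(0, i % 2, 'x') for i in range(5)] + [(1, i % 2, 'y') for i in range(5)]
print(conditional_casual_p(rows, q_attrs=[0], r_attr=1, p_attrs=[2]))  # → 1.0
```

Since q alone already determines p, every refinement by a permuted r also reaches γ = 1, so r is diagnosed as conditionally casual.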

### 3.5 Cross validation of learned dependencies

- If rough set analysis is used to learn the best subset of Ω to determine P, a simple randomization procedure is not sufficient, because it does not reflect the optimization of the learning procedure.
- Within the test subset the same procedure can be used to validate the chosen attributes.
- If the test procedure does not show a significant result, there are too few rules which can be used to predict the decision attributes from the learned attributes.
- Note that these rules need not be the same as those in the learning subset!
- If the additional attribute is conditional casual, the hypothesis that the rules in both sets of objects are identical should be kept.
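The split-and-revalidate procedure can be sketched as follows (a minimal stand-in under invented data; the brute-force subset search merely substitutes for the reduct search used in RSDA): learn the best attribute subset on one half, then evaluate it on the held-out half, where the randomization test of Section 3.1 would be applied.

```python
import random
from itertools import combinations
from collections import defaultdict

def gamma(rows, Q, P):
    # approximation quality as in Section 2
    qb, pb = defaultdict(set), defaultdict(set)
    for i, r in enumerate(rows):
        qb[tuple(r[a] for a in Q)].add(i)
        pb[tuple(r[a] for a in P)].add(i)
    ok = sum(len(b) for b in qb.values() if any(b <= p for p in pb.values()))
    return ok / len(rows)

def learn_best_subset(rows, candidates, P, size):
    """Brute-force search for the attribute subset of a given size with the
    highest approximation quality (stand-in for a reduct search)."""
    return max(combinations(candidates, size),
               key=lambda Q: gamma(rows, list(Q), P))

def cross_validate(rows, candidates, P, size, seed=42):
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    half = len(idx) // 2
    learn = [rows[i] for i in idx[:half]]
    test = [rows[i] for i in idx[half:]]
    Q = learn_best_subset(learn, candidates, P, size)
    # on the held-out half, the randomization test of Section 3.1 would now
    # be applied to gamma(test, Q, P) rather than to the learning half
    return Q, gamma(test, list(Q), P)

# toy data: attribute 0 determines the decision attribute 3; 1 and 2 are noise
rows = [(i % 2, i % 3, i % 5, 'a' if i % 2 == 0 else 'b') for i in range(20)]
best, g_test = cross_validate(rows, candidates=[0, 1, 2], P=[3], size=1)
print(best, g_test)
```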

### 4.1 Duodenal ulcer data

- All data used in this paper are obtainable from ftp://luce.psycho.uni-osnabrueck.de/.
- Pawlak et al. (1986) obtained, using rough set analysis, that the attribute set R, consisting of
  - 3: Duration of disease
  - 4: Complication
  - 5: Basic HCl concentration
  - 6: Basic Vol. of gastric juice
  - 9: Stimulated HCl concentration
  - 10: Stimulated Vol. of gastric juice

  suffices to predict attribute 12 ("Visick grading").
- The attribute set discussed in Pawlak et al. (1986) was based on a reduct searching procedure.
- In order to discuss the cross validation procedure, the authors split the data set into 2 subsets containing 61 cases each.
- Furthermore, the result suggests a reduction of the number of attributes withinR, because all attributes are conditional casual.

### 4.2 Earthquake data

- In Teghem & Benjelloun (1992), the authors search for premonitory factors for earthquakes by emphasizing gas geochemistry.
- The partition attribute (attribute 16) was the seismic activity on 155 days measured on the Richter scale.
- The other attributes were radon concentration measured at 8 different locations (attributes 1-8) and 7 measures of climatic factors (attributes 9-15).
- A problem with the information system was that it had an empty core with respect to attribute 16, and that the evaluation of some reducts turned out to be difficult.
- The statistical evaluation of some of the information systems proposed by Teghem & Benjelloun (1992) gives us additional insights (Tab. 6).

### 4.3 Rough set analysis of Fisher’s Iris Data

- Teghem & Charlet (1992) use the famous Iris data first published by Fisher (1936) to show the applicability of rough set dependency analysis for problems normally treated by discriminant analysis.
- The set U consists of 150 flowers characterized by five attributes: 1. Petal length, 2. Petal width, 3. Sepal length, 4. Sepal width, and 5. the species, which serves as the partition attribute.
- Table 7 validates the argument that only the attribute set {3, 4} should be used to predict the partition attribute.

### 5 Conclusion

- Gathering evidence in procedures of Artificial Intelligence should not be based upon casual observations.
- The authors' approach shows how, in principle, a system using rough set dependency analysis can defend itself against randomness.
- The re-analysis of three published data sets shows that there is an urgent need for such a technique: parts of the claimed results for the first two data sets are invalidated, some promising dependencies were overlooked, and, as the authors show using the data of Study 1, their proposed cross-validation technique offers a new horizon for interpretation.
- Concerning Study 3, the conclusions of the authors are validated.
