Open Access Proceedings Article

Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier.

Pedro Domingos, Michael Pazzani
pp. 105-112
Abstract
The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and can be optimal even when this assumption is violated by a wide margin. The key to this finding lies in the distinction between classification and probability estimation: correct classification can be achieved even when the probability estimates used contain large errors. We show that the previously-assumed region of optimality of the SBC is a second-order infinitesimal fraction of the actual one. This is followed by the derivation of several necessary and several sufficient conditions for the optimality of the SBC. For example, the SBC is optimal for learning arbitrary conjunctions and disjunctions, even though they violate the independence assumption. The paper also reports empirical evidence of the SBC's competitive performance in domains containing substantial degrees of attribute dependence.
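A minimal sketch of the abstract's central point, on an invented two-attribute example (not taken from the paper): when one attribute simply duplicates another, naive Bayes double-counts the evidence and its posterior estimates become badly miscalibrated, yet the argmax decision, and hence the classification, is unchanged.

```python
# Hypothetical toy model: two binary attributes with x2 == x1 always,
# so class-conditional independence is violated as badly as possible.
# Assumed parameters: P(C=1) = 0.5, P(x=1 | C=1) = 0.8, P(x=1 | C=0) = 0.3.

p_c1 = 0.5
p_x_given_c = {1: 0.8, 0: 0.3}  # P(x=1 | class)

def true_posterior(x):
    # The duplicated attribute carries no extra information, so the
    # true posterior is that of a single-attribute model.
    num = p_c1 * (p_x_given_c[1] if x else 1 - p_x_given_c[1])
    den = num + (1 - p_c1) * (p_x_given_c[0] if x else 1 - p_x_given_c[0])
    return num / den

def naive_bayes_posterior(x):
    # Naive Bayes multiplies the same likelihood twice: the estimate is
    # pushed toward 0 or 1, a large probability-estimation error.
    num = p_c1 * (p_x_given_c[1] if x else 1 - p_x_given_c[1]) ** 2
    den = num + (1 - p_c1) * (p_x_given_c[0] if x else 1 - p_x_given_c[0]) ** 2
    return num / den

for x in (0, 1):
    t, e = true_posterior(x), naive_bayes_posterior(x)
    print(f"x1=x2={x}: true P(C=1|x)={t:.3f}, naive estimate={e:.3f}, "
          f"same decision: {(t > 0.5) == (e > 0.5)}")
```

Running this prints estimates of 0.877 against a true 0.727, and 0.075 against a true 0.222: large probability errors, identical decisions, which is exactly the classification/probability-estimation distinction the abstract draws.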


Citations
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Journal ArticleDOI

Wrappers for feature subset selection

TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and domain; the wrapper approach is compared to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
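A hedged sketch of the wrapper idea (not Kohavi and John's actual system): greedy forward selection in which every candidate subset is scored by the target learner itself, represented here by a placeholder `cv_score`; the feature names and utilities below are invented.

```python
def wrapper_forward_select(features, cv_score):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion most improves the learner's estimated accuracy."""
    selected, best = [], cv_score([])
    while True:
        candidates = [(cv_score(selected + [f]), f)
                      for f in features if f not in selected]
        if not candidates:
            break
        score, f = max(candidates)
        if score <= best:        # no candidate improves the estimate: stop
            break
        selected, best = selected + [f], score
    return selected, best

# Placeholder scorer standing in for cross-validated accuracy of the
# induction algorithm; features "a" and "c" help, "b" is pure noise.
toy_utility = {"a": 0.10, "b": 0.00, "c": 0.05}
cv_score = lambda subset: 0.70 + sum(toy_utility[f] for f in subset)

print(wrapper_forward_select(["a", "b", "c"], cv_score))  # -> (['a', 'c'], ~0.85)
```

The defining design choice is that `cv_score` re-runs the induction algorithm on each candidate subset, which is what separates wrappers from filter methods such as Relief.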
Journal ArticleDOI

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
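A small illustration of why the assessment-metric question matters (made-up counts, not from the survey): under heavy imbalance, plain accuracy rewards a classifier that never predicts the minority class, while precision, recall, and F1 expose it.

```python
def metrics(tp, fp, fn, tn):
    # Standard confusion-matrix summaries.
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# 1000 examples, 10 positives: "always negative" vs. a modest real detector.
for name, counts in [("always-negative", (0, 0, 10, 990)),
                     ("detector", (7, 20, 3, 970))]:
    acc, prec, rec, f1 = metrics(*counts)
    print(f"{name:15s} acc={acc:.3f} precision={prec:.3f} "
          f"recall={rec:.3f} F1={f1:.3f}")
```

The trivial classifier reaches 99% accuracy with zero recall on the rare class, while the detector's lower accuracy hides far more useful minority-class behavior.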
Journal ArticleDOI

Bayesian Network Classifiers

TL;DR: Tree Augmented Naive Bayes (TAN) is singled out: it outperforms naive Bayes, yet at the same time maintains the computational simplicity and robustness that characterize naive Bayes.
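A rough sketch of the structure-learning step usually described for TAN (this is not the authors' code, and parameter estimation and classification are omitted): attribute pairs are weighted by conditional mutual information given the class, and a maximum spanning tree over those weights gives each attribute at most one attribute parent besides the class. The dataset is invented.

```python
from collections import Counter
from itertools import combinations
from math import log

# Made-up rows of (x1, x2, x3, class).
data = [(0, 0, 1, 'a'), (0, 0, 0, 'a'), (1, 1, 1, 'b'),
        (1, 1, 0, 'b'), (0, 1, 1, 'a'), (1, 0, 0, 'b')]

def cond_mutual_info(i, j):
    # Empirical I(Xi; Xj | C) in nats from raw counts.
    n = len(data)
    pxyc = Counter((r[i], r[j], r[-1]) for r in data)
    pxc = Counter((r[i], r[-1]) for r in data)
    pyc = Counter((r[j], r[-1]) for r in data)
    pc = Counter(r[-1] for r in data)
    return sum((nxyc / n) * log(nxyc * pc[c] / (pxc[x, c] * pyc[y, c]))
               for (x, y, c), nxyc in pxyc.items())

# Prim-style maximum spanning tree over attributes 0..2.
attrs, in_tree, edges = {0, 1, 2}, {0}, []
weights = {(i, j): cond_mutual_info(i, j) for i, j in combinations(range(3), 2)}
while in_tree != attrs:
    w, i, j = max((w, i, j) for (i, j), w in weights.items()
                  if (i in in_tree) != (j in in_tree))
    edges.append((i, j, round(w, 3)))
    in_tree |= {i, j}
print(edges)  # each kept edge gives an attribute its tree parent in TAN
```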
Book

Simple Heuristics That Make Us Smart

TL;DR: Fast and frugal heuristics are simple rules for making decisions with realistic mental resources; they can enable both living organisms and artificial systems to make smart choices, classifications, and predictions by employing bounded rationality.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
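As a hedged illustration of the core learning method such decision-tree systems start from, the sketch below computes plain information gain for a split on invented weather-style data; C4.5 itself goes further, using the gain ratio and handling continuous attributes, missing values, and pruning.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy (bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    # Entropy of the class labels minus the weighted entropy after
    # splitting on every observed value of `attr`.
    labels = [r["class"] for r in rows]
    split = Counter(r[attr] for r in rows)
    remainder = sum((cnt / len(rows)) *
                    entropy([r["class"] for r in rows if r[attr] == v])
                    for v, cnt in split.items())
    return entropy(labels) - remainder

# Invented examples in the style of the classic play-tennis data.
rows = [{"outlook": "sunny",    "windy": False, "class": "no"},
        {"outlook": "sunny",    "windy": True,  "class": "no"},
        {"outlook": "rain",     "windy": False, "class": "yes"},
        {"outlook": "rain",     "windy": True,  "class": "no"},
        {"outlook": "overcast", "windy": False, "class": "yes"}]
for attr in ("outlook", "windy"):
    print(attr, round(information_gain(rows, attr), 3))  # outlook scores higher
```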
Book ChapterDOI

Supervised and unsupervised discretization of continuous features

TL;DR: Binning, an unsupervised discretization method, is compared to entropy-based and purity-based methods, which are supervised algorithms, and it is found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method.
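A compact sketch of the two families being compared, on invented data (the full entropy-based method recurses and uses an MDL stopping criterion, both omitted here): equal-width binning ignores the labels entirely, while the supervised method places the cut point where class impurity drops most.

```python
from collections import Counter
from math import log2

values = [1.0, 1.2, 1.9, 2.1, 6.0, 6.2, 7.5, 8.0]
labels = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

def equal_width_bins(xs, k):
    # Unsupervised: k - 1 interior cut points, labels never consulted.
    lo, hi = min(xs), max(xs)
    w = (hi - lo) / k
    return [lo + i * w for i in range(1, k)]

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * log2(c / n) for c in Counter(ys).values())

def best_entropy_cut(xs, ys):
    # Supervised: pick the boundary minimizing weighted class entropy.
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = [y for _, y in pairs[:i]], [y for _, y in pairs[i:]]
        cost = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or cost < best[0]:
            best = (cost, cut)
    return best[1]

print("equal-width cuts:", equal_width_bins(values, 4))   # [2.75, 4.5, 6.25]
print("entropy-based cut:", best_entropy_cut(values, labels))  # 4.05, a pure split
```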
Journal ArticleDOI

Pattern Classification and Scene Analysis

TL;DR: In this article, a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Book ChapterDOI

Rule Induction with CN2: Some Recent Improvements

TL;DR: Improvements to the CN2 algorithm are described, including the use of the Laplacian error estimate as an alternative evaluation function, and it is shown how unordered as well as ordered rules can be generated.
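A small worked example of the Laplacian estimate in its usual rule-induction form: with p positive and n negative examples covered by a rule and k classes, the estimated accuracy is (p + 1) / (p + n + k), which keeps rules with tiny coverage from looking perfect.

```python
def laplace_accuracy(p, n, k=2):
    # Laplace-corrected accuracy estimate for a rule covering
    # p positive and n negative examples in a k-class problem.
    return (p + 1) / (p + n + k)

# A rule covering 2 examples (both positive) vs. one covering 50 (45 positive):
print(laplace_accuracy(2, 0))   # 0.75   (raw accuracy would be 1.0)
print(laplace_accuracy(45, 5))  # ~0.885 (raw accuracy 0.9)
```

The correction reverses the ranking a raw-accuracy evaluation function would give, preferring the well-supported rule over the lucky narrow one.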