Open Access Proceedings Article

The Optimality of Naive Bayes.

TLDR
A sufficient condition for the optimality of naive Bayes is presented and proved, in which dependence between attributes does exist, and evidence is provided that dependences among attributes may cancel each other out.
Abstract
Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of naive Bayes in classification? In this paper, we propose a novel explanation for the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution plays the crucial role: how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of naive Bayes. Further, we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes in which dependence between attributes does exist, providing evidence that dependences among attributes may cancel each other out. In addition, we explore when naive Bayes works well.

Naive Bayes and Augmented Naive Bayes

Classification is a fundamental issue in machine learning and data mining. In classification, the goal of a learning algorithm is to construct a classifier given a set of training examples with class labels. Typically, an example E is represented by a tuple of attribute values (x1, x2, ..., xn), where xi is the value of attribute Xi. Let C represent the classification variable, and let c be the value of C. In this paper, we assume that there are only two classes: + (the positive class) or − (the negative class). A classifier is a function that assigns a class label to an example. From the probability perspective, according to Bayes' rule, the probability of an example E = (x1, x2, ..., xn) being of class c is

p(c|E) = p(E|c) p(c) / p(E).

E is classified as the class C = + if and only if

f_b(E) = p(C = +|E) / p(C = −|E) ≥ 1,   (1)

where f_b(E) is called a Bayesian classifier. Assume that all attributes are independent given the value of the class variable; that is,

p(E|c) = p(x1, x2, ..., xn | c) = ∏ p(xi | c), where the product runs over i = 1, ..., n.
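To make Equation (1) and the independence assumption concrete, here is a minimal Python sketch (not from the paper) of a naive Bayesian classifier over discrete attributes: it estimates p(c) and p(xi|c) by counting and classifies an example as + whenever the ratio f_b(E) is at least 1. The Laplace smoothing constant alpha and the helper names train and f_b are illustrative assumptions, not part of the original text.

```python
# Minimal sketch of a naive Bayesian classifier over discrete attributes.
# Assumed, not from the paper: Laplace smoothing (alpha) and the helper
# names train() and f_b().
from collections import Counter, defaultdict

def train(examples, labels):
    """Estimate class priors and per-attribute value counts from labelled tuples."""
    class_counts = Counter(labels)                                    # counts for p(c)
    value_counts = {c: defaultdict(Counter) for c in class_counts}    # [c][i][x] -> count
    domains = defaultdict(set)                                        # [i] -> distinct values of X_i
    for example, c in zip(examples, labels):
        for i, x in enumerate(example):
            value_counts[c][i][x] += 1
            domains[i].add(x)
    return class_counts, value_counts, domains

def f_b(example, class_counts, value_counts, domains, alpha=1.0):
    """Return f_b(E) = p(C=+|E) / p(C=-|E); E is labelled + iff f_b(E) >= 1.
    p(E) cancels in the ratio, so it never needs to be estimated."""
    total = sum(class_counts.values())
    score = {}
    for c in ("+", "-"):
        s = class_counts[c] / total                                   # prior p(c)
        for i, x in enumerate(example):
            # conditional independence assumption: multiply smoothed p(x_i | c)
            s *= (value_counts[c][i][x] + alpha) / (class_counts[c] + alpha * len(domains[i]))
        score[c] = s
    return score["+"] / score["-"]

# Tiny usage example with two binary attributes.
examples = [(1, 1), (1, 0), (0, 0), (0, 1)]
labels = ["+", "+", "-", "-"]
model = train(examples, labels)
print(f_b((1, 1), *model))  # 3.0 here, so (1, 1) is classified as +
```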



Citations
Journal Article

A Survey on Human Activity Recognition using Wearable Sensors

TL;DR: The state of the art in HAR based on wearable sensors is surveyed, and a two-level taxonomy organized by the learning approach and the response time is proposed.
Book

Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library

TL;DR: Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter.
Journal Article

Machine Learning for Internet of Things Data Analysis: A Survey

TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.
Book

Authorship Attribution

TL;DR: This review shows that the authorship attribution discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.
Journal Article

Facing the cold start problem in recommender systems

TL;DR: This paper proposes a model where widely known classification algorithms in combination with similarity techniques and prediction mechanisms provide the necessary means for retrieving recommendations in RSs, and adopts the widely known dataset provided by the GroupLens research group.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Proceedings Article

An analysis of Bayesian classifiers

TL;DR: An average-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks, is presented, and the behavioral implications of the analysis are explored through predicted learning curves for artificial domains.
Journal Article

On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality

TL;DR: In this article, it was shown that the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves, and that certain types of (very high) bias can be canceled by low variance to produce accurate classification.
Journal Article

Idiot's Bayes—Not So Stupid After All?

TL;DR: In this article, the authors examine the evidence for this, both empirical, as observed in real data applications, and theoretical, summarising explanations for why this simple rule might be effective.
Proceedings Article

Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier.

TL;DR: The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences, as mentioned in this paper.