Open Access Proceedings Article

The Optimality of Naive Bayes.

TLDR
A sufficient condition for the optimality of naive Bayes is presented and proved, in which dependence between attributes does exist, and evidence is provided that dependences among attributes may cancel each other out.
Abstract
Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of naive Bayes in classification? In this paper, we propose a novel explanation for the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution plays the crucial role: how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of naive Bayes. Further, we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes in which dependence between attributes does exist, providing evidence that dependences among attributes may cancel each other out. In addition, we explore when naive Bayes works well.

Naive Bayes and Augmented Naive Bayes

Classification is a fundamental issue in machine learning and data mining. In classification, the goal of a learning algorithm is to construct a classifier given a set of training examples with class labels. Typically, an example E is represented by a tuple of attribute values (x1, x2, ..., xn), where xi is the value of attribute Xi. Let C represent the classification variable, and let c be the value of C. In this paper, we assume that there are only two classes: + (the positive class) or − (the negative class). A classifier is a function that assigns a class label to an example. From the probability perspective, according to Bayes' rule, the probability of an example E = (x1, x2, ..., xn) being of class c is

p(c|E) = p(E|c) p(c) / p(E).

E is classified as the class C = + if and only if

f_b(E) = p(C = +|E) / p(C = −|E) ≥ 1,   (1)

where f_b(E) is called a Bayesian classifier. Assume that all attributes are independent given the value of the class variable; that is,

p(E|c) = p(x1, x2, ..., xn | c) = ∏ p(xi | c), where the product runs over i = 1, ..., n.
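To make Equation (1) and the independence assumption concrete, here is a minimal Python sketch (not from the paper) of a naive Bayesian classifier over discrete attributes: it estimates p(c) and p(xi|c) by counting and classifies an example as + whenever the ratio f_b(E) is at least 1. The Laplace smoothing constant alpha and the helper names train and f_b are illustrative assumptions, not part of the original text.

```python
# Minimal sketch of a naive Bayesian classifier over discrete attributes.
# Assumed, not from the paper: Laplace smoothing (alpha) and the helper
# names train() and f_b().
from collections import Counter, defaultdict

def train(examples, labels):
    """Estimate class priors and per-attribute value counts from labelled tuples."""
    class_counts = Counter(labels)                                    # counts for p(c)
    value_counts = {c: defaultdict(Counter) for c in class_counts}    # [c][i][x] -> count
    domains = defaultdict(set)                                        # [i] -> distinct values of X_i
    for example, c in zip(examples, labels):
        for i, x in enumerate(example):
            value_counts[c][i][x] += 1
            domains[i].add(x)
    return class_counts, value_counts, domains

def f_b(example, class_counts, value_counts, domains, alpha=1.0):
    """Return f_b(E) = p(C=+|E) / p(C=-|E); E is labelled + iff f_b(E) >= 1.
    p(E) cancels in the ratio, so it never needs to be estimated."""
    total = sum(class_counts.values())
    score = {}
    for c in ("+", "-"):
        s = class_counts[c] / total                                   # prior p(c)
        for i, x in enumerate(example):
            # conditional independence assumption: multiply smoothed p(x_i | c)
            s *= (value_counts[c][i][x] + alpha) / (class_counts[c] + alpha * len(domains[i]))
        score[c] = s
    return score["+"] / score["-"]

# Tiny usage example with two binary attributes.
examples = [(1, 1), (1, 0), (0, 0), (0, 1)]
labels = ["+", "+", "-", "-"]
model = train(examples, labels)
print(f_b((1, 1), *model))  # 3.0 here, so (1, 1) is classified as +
```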



Citations
Journal Article

A Survey on Human Activity Recognition using Wearable Sensors

TL;DR: The state of the art in HAR based on wearable sensors is surveyed, and a two-level taxonomy organized by the learning approach and the response time is proposed.
Book

Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library

TL;DR: Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter.
Journal Article

Machine Learning for Internet of Things Data Analysis: A Survey

TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.
Book

Authorship Attribution

TL;DR: This review shows that the authorship attribution discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.
Journal Article

Facing the cold start problem in recommender systems

TL;DR: This paper proposes a model where widely known classification algorithms in combination with similarity techniques and prediction mechanisms provide the necessary means for retrieving recommendations in RSs, and adopts the widely known dataset provided by the GroupLens research group.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Proceedings Article

An analysis of Bayesian classifiers

TL;DR: An average-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks, is presented, and the behavioral implications of the analysis are explored through predicted learning curves for artificial domains.
Journal Article

On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality

TL;DR: In this article, it was shown that the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves, and that certain types of (very high) bias can be canceled by low variance to produce accurate classification.
Journal Article

Idiot's Bayes—Not So Stupid After All?

TL;DR: In this article, the authors examine the evidence for this, both empirical, as observed in real data applications, and theoretical, summarising explanations for why this simple rule might be effective.
Proceedings Article

Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier.

TL;DR: The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences, as mentioned in this paper.