Open Access Proceedings Article

Estimating continuous distributions in Bayesian classifiers

George H. John, Pat Langley
pp. 338-345
TLDR
In this paper, the authors apply statistical methods for nonparametric density estimation to a naive Bayesian classifier, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian, versus using nonparametric kernel density estimation.
Abstract
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
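The comparison described in the abstract is easy to make concrete. Below is a minimal sketch (not the authors' code) of the two per-class, per-feature conditional-density estimators in a naive Bayes classifier, using NumPy/SciPy; the function names are illustrative, and the KDE bandwidth is simply SciPy's default (Scott's rule), which may differ from the paper's choice.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def fit_naive_bayes(X, y, estimator="gaussian"):
    """Fit 1-D densities p(x_j | c) for each class c and feature j.

    estimator: "gaussian" models each conditional with a single normal;
               "kde" uses a nonparametric kernel density estimate.
    """
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    densities = {}
    for c in classes:
        Xc = X[y == c]
        feats = []
        for j in range(X.shape[1]):
            col = Xc[:, j]
            if estimator == "gaussian":
                mu, sigma = col.mean(), col.std(ddof=1) + 1e-9
                feats.append(lambda x, mu=mu, s=sigma: norm.pdf(x, mu, s))
            else:
                # Kernel estimate; bandwidth is SciPy's default rule,
                # an assumption rather than the paper's exact setting.
                kde = gaussian_kde(col)
                feats.append(lambda x, k=kde: k.evaluate(np.atleast_1d(x))[0])
        densities[c] = feats
    return classes, priors, densities

def predict(x, classes, priors, densities):
    """Return argmax_c of log p(c) + sum_j log p(x_j | c)."""
    scores = []
    for c in classes:
        logp = np.log(priors[c])
        for j, f in enumerate(densities[c]):
            logp += np.log(f(x[j]) + 1e-300)  # guard against log(0)
        scores.append(logp)
    return classes[int(np.argmax(scores))]
```

Swapping `estimator="gaussian"` for `estimator="kde"` is the whole experimental contrast: the abstract credits the kernel variant with the large error reductions on several data sets. The 1e-300 floor is only a numerical guard, not part of the method.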


Citations
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Bayesian Network Classifiers

TL;DR: Tree Augmented Naive Bayes (TAN) is singled out: it outperforms naive Bayes, yet at the same time maintains the computational simplicity and robustness that characterize naive Bayes.
Proceedings Article

A detailed analysis of the KDD CUP 99 data set

TL;DR: A new data set is proposed, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.
Journal Article

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

TL;DR: The Bayesian classifier is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption, and will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain.
Journal Article

Do we need hundreds of classifiers to solve real world classification problems?

TL;DR: The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
References
Book

Density estimation for statistics and data analysis

TL;DR: Covers the kernel method for multivariate data, three important methods, and density estimation in action.
Journal Article

Generalized Additive Models.

Journal Article

A Bayesian Method for the Induction of Probabilistic Networks from Data

TL;DR: This paper presents a Bayesian method for constructing probabilistic networks from databases, focusing on constructing Bayesian belief networks, and extends the basic method to handle missing data and hidden variables.
Journal Article

Modern Applied Statistics with S-Plus.

W. N. Venables, B. D. Ripley
01 Dec 1996