Open Access Proceedings Article

Estimating continuous distributions in Bayesian classifiers

George H. John, Pat Langley
pp. 338-345
TLDR
In this paper, the authors apply statistical methods for nonparametric density estimation to a naive Bayesian classifier, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian, versus using nonparametric kernel density estimation.
Abstract
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
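The comparison described in the abstract is easy to make concrete. Below is a minimal sketch (not the authors' code) of the two per-class, per-feature conditional-density estimators in a naive Bayes classifier, using NumPy/SciPy; the function names are illustrative, and the KDE bandwidth is simply SciPy's default (Scott's rule), which may differ from the paper's choice.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def fit_naive_bayes(X, y, estimator="gaussian"):
    """Fit 1-D densities p(x_j | c) for each class c and feature j.

    estimator: "gaussian" models each conditional with a single normal;
               "kde" uses a nonparametric kernel density estimate.
    """
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    densities = {}
    for c in classes:
        Xc = X[y == c]
        feats = []
        for j in range(X.shape[1]):
            col = Xc[:, j]
            if estimator == "gaussian":
                mu, sigma = col.mean(), col.std(ddof=1) + 1e-9
                feats.append(lambda x, mu=mu, s=sigma: norm.pdf(x, mu, s))
            else:
                # Kernel estimate; bandwidth is SciPy's default rule,
                # an assumption rather than the paper's exact setting.
                kde = gaussian_kde(col)
                feats.append(lambda x, k=kde: k.evaluate(np.atleast_1d(x))[0])
        densities[c] = feats
    return classes, priors, densities

def predict(x, classes, priors, densities):
    """Return argmax_c of log p(c) + sum_j log p(x_j | c)."""
    scores = []
    for c in classes:
        logp = np.log(priors[c])
        for j, f in enumerate(densities[c]):
            logp += np.log(f(x[j]) + 1e-300)  # guard against log(0)
        scores.append(logp)
    return classes[int(np.argmax(scores))]
```

Swapping `estimator="gaussian"` for `estimator="kde"` is the whole experimental contrast: the abstract credits the kernel variant with the large error reductions on several data sets. The 1e-300 floor is only a numerical guard, not part of the method.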


Citations
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Bayesian Network Classifiers

TL;DR: Tree Augmented Naive Bayes (TAN) is singled out: it outperforms naive Bayes, yet at the same time maintains the computational simplicity and robustness that characterize naive Bayes.
Proceedings Article

A detailed analysis of the KDD CUP 99 data set

TL;DR: A new data set is proposed, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.
Journal Article

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

TL;DR: The Bayesian classifier is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption, and will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain.
Journal Article

Do we need hundreds of classifiers to solve real world classification problems?

TL;DR: The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
References
Book

Density estimation for statistics and data analysis

TL;DR: Covers the kernel method for multivariate data, three important methods, and density estimation in action.
Journal Article

Generalized Additive Models.

Journal Article

A Bayesian Method for the Induction of Probabilistic Networks from Data

TL;DR: This paper presents a Bayesian method for constructing probabilistic networks from databases, focusing on constructing Bayesian belief networks, and extends the basic method to handle missing data and hidden variables.
Journal Article

Modern Applied Statistics with S-Plus.

W. N. Venables, B. D. Ripley
01 Dec 1996