Open Access Proceedings Article
Estimating continuous distributions in Bayesian classifiers
George H. John, Pat Langley
pp. 338–345
TL;DR: In this paper, the authors use statistical methods for nonparametric density estimation in a naive Bayesian classifier, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian, and using nonparametric kernel density estimation.

Abstract: When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
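The two estimators the abstract compares can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel and the common 1/√n bandwidth heuristic, and the class name `NaiveBayes` and its methods are invented for this example.

```python
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian density; used both as a class-conditional model and as a kernel."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

class NaiveBayes:
    """Naive Bayes for continuous features.

    mode="gaussian": fit one Gaussian per (class, feature) pair.
    mode="kernel":   nonparametric estimate -- place a Gaussian kernel on every
                     training value, with bandwidth h = 1/sqrt(n) (a common
                     heuristic; the paper's exact choice may differ).
    """
    def __init__(self, mode="gaussian"):
        self.mode = mode

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        self.data_ = {c: X[y == c] for c in self.classes_}
        return self

    def _log_likelihood(self, x, c):
        Xc = self.data_[c]
        total = 0.0
        for j in range(Xc.shape[1]):
            col = Xc[:, j]
            if self.mode == "gaussian":
                # Single-Gaussian estimate of p(x_j | c)
                p = gauss(x[j], col.mean(), col.std() + 1e-9)
            else:
                # Kernel density estimate: average of kernels centered
                # on each training value
                h = 1.0 / np.sqrt(len(col))
                p = gauss(x[j], col, h).mean()
            total += np.log(p + 1e-300)  # guard against log(0)
        return total

    def predict(self, X):
        preds = []
        for x in X:
            scores = {c: np.log(self.priors_[c]) + self._log_likelihood(x, c)
                      for c in self.classes_}
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```

On unimodal, roughly normal features the two modes behave similarly; the kernel estimator's advantage appears when a class-conditional distribution is multimodal or skewed, where a single Gaussian is a poor fit.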
Citations
Book
Data Mining: Practical Machine Learning Tools and Techniques
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article
Bayesian Network Classifiers
TL;DR: Tree Augmented Naive Bayes (TAN) is singled out, which outperforms naive Bayes, yet at the same time maintains the computational simplicity and robustness that characterize naive Bayes.
Proceedings Article
A detailed analysis of the KDD CUP 99 data set
TL;DR: A new data set is proposed, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.
Journal Article
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
TL;DR: The Bayesian classifier is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption, and will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain.
Journal Article
Do we need hundreds of classifiers to solve real world classification problems?
TL;DR: The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Book
Density estimation for statistics and data analysis
TL;DR: Covers the kernel method for multivariate data, three important density estimation methods, and density estimation in action.
Journal Article
A Bayesian Method for the Induction of Probabilistic Networks from Data
TL;DR: This paper presents a Bayesian method for constructing probabilistic networks from databases, focusing on constructing Bayesian belief networks, and extends the basic method to handle missing data and hidden variables.