Open Access Proceedings Article
A Bayesian Approach to Filtering Junk E-Mail
TLDR
This work examines methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream, and shows the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
Abstract:
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream. By casting this problem in a decision-theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
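The decision-theoretic idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Bernoulli naive Bayes model as the probabilistic learner, one made-up domain-specific feature (a run of exclamation marks), and illustrative cost values. All function names, the `"junk"`/`"legit"` labels, and the toy data are hypothetical. Under these assumptions, a message is flagged as junk only when the posterior probability of junk exceeds `cost_fp / (cost_fp + cost_fn)`, so a higher cost for losing legitimate mail raises the bar for filtering.

```python
import math
from collections import Counter

def extract_features(message):
    """Binary features: lower-cased tokens plus one hand-crafted flag."""
    feats = set(message.lower().split())
    if "!!!" in message:  # hypothetical domain-specific cue, not from the paper
        feats.add("<MANY_EXCLAMATIONS>")
    return feats

def train(messages, labels):
    """Estimate Bernoulli naive Bayes parameters with Laplace smoothing."""
    docs = [extract_features(m) for m in messages]
    vocab = set().union(*docs)
    n = Counter(labels)  # number of training messages per class
    counts = {c: Counter() for c in n}
    for feats, c in zip(docs, labels):
        counts[c].update(feats)
    prior = {c: n[c] / len(labels) for c in n}
    cond = {c: {f: (counts[c][f] + 1) / (n[c] + 2) for f in vocab} for c in n}
    return prior, cond, vocab

def posterior_junk(message, model):
    """Return P(junk | message); assumes a class labelled "junk" exists."""
    prior, cond, vocab = model
    feats = extract_features(message)
    log_joint = {}
    for c in prior:
        total = math.log(prior[c])
        for f in vocab:
            p = cond[c][f]
            total += math.log(p if f in feats else 1.0 - p)
        log_joint[c] = total
    top = max(log_joint.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint["junk"] - top) / norm

def junk_threshold(cost_fp, cost_fn):
    """Posterior above which filtering has lower expected cost than keeping.

    cost_fp: cost of filtering a legitimate message (assumed value).
    cost_fn: cost of letting a junk message through (assumed value).
    """
    return cost_fp / (cost_fp + cost_fn)

def is_junk(message, model, cost_fp=9.0, cost_fn=1.0):
    """Flag a message only if the posterior clears the cost-based threshold."""
    return posterior_junk(message, model) > junk_threshold(cost_fp, cost_fn)
```

A caller would build a model with `train(messages, labels)` and then call `is_junk(text, model, cost_fp=...)`. Raising `cost_fp` makes the filter more conservative, which is the intended effect of a differential misclassification cost.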
Citations
Book
Data Mining: Practical Machine Learning Tools and Techniques
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Proceedings Article
Hierarchical Attention Networks for Document Classification
TL;DR: Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.
Proceedings Article
A comparison of event models for naive Bayes text classification
Andrew McCallum, Kamal Nigam
TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Journal Article
Text Classification from Labeled and Unlabeled Documents using EM
TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
References
Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas
TL;DR: The authors examine the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.
Book
The Nature of Statistical Learning Theory
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Book
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Journal Article
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill