Open Access · Posted Content
A Sequential Algorithm for Training Text Classifiers
David D. Lewis, William A. Gale
TLDR
An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task, where it reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
Abstract:
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
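The abstract names uncertainty sampling but does not spell out the procedure, so the following is only a minimal sketch of one common reading of the idea: train a probabilistic classifier on the labeled examples, ask a human for the label of the unlabeled example whose predicted probability is closest to 0.5, and repeat. The tiny Bernoulli naive Bayes model, the function names (`train`, `prob_positive`, `uncertainty_sampling`), and the `oracle` callback standing in for the human labeler are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny Bernoulli naive Bayes model on (text, label) pairs, label in {0, 1}."""
    docs = {0: 0, 1: 0}                      # number of documents per class
    word_docs = {0: Counter(), 1: Counter()}  # documents per class containing each word
    for text, label in labeled:
        docs[label] += 1
        word_docs[label].update(set(text.split()))
    vocab = set(word_docs[0]) | set(word_docs[1])
    return docs, word_docs, vocab


def prob_positive(model, text):
    """Return the model's estimate of P(label = 1 | text)."""
    docs, word_docs, vocab = model
    total = docs[0] + docs[1]
    words = set(text.split())
    log_score = {}
    for c in (0, 1):
        score = math.log((docs[c] + 1) / (total + 2))  # smoothed class prior
        for w in vocab:
            p = (word_docs[c][w] + 1) / (docs[c] + 2)  # smoothed P(word present | class)
            score += math.log(p if w in words else 1 - p)
        log_score[c] = score
    # Clamp the log-odds so math.exp cannot overflow on long documents.
    diff = max(-700.0, min(700.0, log_score[0] - log_score[1]))
    return 1 / (1 + math.exp(diff))


def uncertainty_sampling(seed, pool, oracle, n_queries):
    """Label up to n_queries pool items, always choosing the least certain one.

    seed:   initial list of (text, label) pairs
    pool:   unlabeled texts
    oracle: hypothetical callback returning the true label for a text
    """
    labeled = list(seed)
    pool = list(pool)
    queried = []
    for _ in range(min(n_queries, len(pool))):
        model = train(labeled)
        # Least certain = predicted probability nearest 0.5.
        pick = min(pool, key=lambda t: abs(prob_positive(model, t) - 0.5))
        pool.remove(pick)
        labeled.append((pick, oracle(pick)))
        queried.append(pick)
    return train(labeled), queried
```

Retraining from scratch on every round keeps the sketch short; a practical system would likely label a small batch per round to amortize the training cost.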
Citations
Active Learning Literature Survey
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Proceedings Article
A comparison of event models for naive bayes text classification
Andrew McCallum, Kamal Nigam
TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Journal Article
Support vector machine active learning with applications to text classification
Simon Tong, Daphne Koller
TL;DR: Experimental results show that employing the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
Journal Article
Text Classification from Labeled and Unlabeled Documents using EM
TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
Book Chapter
Naive (Bayes) at forty: the independence assumption in information retrieval
TL;DR: The naive Bayes classifier, currently experiencing a renaissance in machine learning, has long been a core technique in information retrieval, and some of the variations used for text retrieval and classification are reviewed.