A sequential algorithm for training text classifiers
David D. Lewis, William A. Gale
- pp. 3-12
Abstract:
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
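The selection step behind the uncertainty sampling method described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function name `uncertainty_sample` and the use of distance from a 0.5 class probability as the uncertainty measure are illustrative assumptions for the binary-classification case.

```python
def uncertainty_sample(probs, k):
    """Pick the k pool examples the current classifier is least sure about,
    i.e., those whose predicted positive-class probability is closest to 0.5.
    These are the examples a human would be asked to label next."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Predicted positive-class probabilities for five unlabeled pool documents.
probs = [0.95, 0.48, 0.10, 0.52, 0.80]
print(uncertainty_sample(probs, 2))  # → [1, 3]
```

In a full sequential loop, the classifier would be retrained after each batch of newly labeled examples and the pool rescored, which is what lets the method reach a given effectiveness with far fewer labeled documents than random sampling.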
Citations
Journal ArticleDOI
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Active Learning Literature Survey
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Proceedings Article
A comparison of event models for naive bayes text classification
Andrew McCallum, Kamal Nigam
TL;DR: It is found that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Journal ArticleDOI
Support vector machine active learning with applications to text classification
Simon Tong, Daphne Koller
TL;DR: Experimental results are presented showing that employing the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
Journal ArticleDOI
Text Classification from Labeled and Unlabeled Documents using EM
TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
References
Book
Generalized Linear Models
Peter McCullagh, John A. Nelder
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and Gamma (variance components).
Book
Pattern classification and scene analysis
Richard O. Duda, Peter E. Hart
TL;DR: In this article, a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Journal ArticleDOI
Queries and Concept Learning
TL;DR: This work considers the problem of using queries to learn an unknown concept, and several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries.
Proceedings ArticleDOI
Query by committee
TL;DR: A query-by-committee algorithm, in which a committee of students is trained on the same data set, is analyzed, and it is suggested that asymptotically finite information gain may be an important characteristic of good query algorithms.
Journal ArticleDOI
Improving Generalization with Active Learning
TL;DR: A formalism for active concept learning called selective sampling is described and it is shown how it may be approximately implemented by a neural network.