Open Access Proceedings Article

A sequential algorithm for training text classifiers

David D. Lewis, +1 more
pp. 3-12
TL;DR: In this article, an algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task; it reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
Abstract
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
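The paper is described here only in prose, but the selection rule behind uncertainty sampling is compact enough to sketch. The Python below is an illustrative reconstruction rather than the authors' code: `train`, `predict_proba`, and `oracle_label` are placeholder callables standing in for the statistical classifier and the human annotator, and the loop simply keeps spending annotation effort on the documents whose predicted class probability is closest to 0.5.

```python
import random

def uncertainty_sampling(pool, oracle_label, train, predict_proba,
                         seed_size=10, batch_size=4, rounds=20):
    """Generic uncertainty-sampling loop (illustrative sketch).

    pool          -- list of unlabeled documents
    oracle_label  -- callable standing in for the human annotator
    train         -- callable: list of (doc, label) pairs -> model
    predict_proba -- callable: (model, doc) -> P(relevant | doc)
    """
    pool = list(pool)
    random.shuffle(pool)

    # Start from a small randomly labeled seed set.
    labeled = [(d, oracle_label(d)) for d in pool[:seed_size]]
    unlabeled = pool[seed_size:]

    for _ in range(rounds):
        model = train(labeled)
        # Rank the pool by uncertainty: probability closest to 0.5 first.
        unlabeled.sort(key=lambda d: abs(predict_proba(model, d) - 0.5))
        # Only the most uncertain documents are sent to the annotator.
        queried, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((d, oracle_label(d)) for d in queried)

    return train(labeled)
```

The savings reported in the abstract come from this concentration of labeling effort: each round is spent on the documents the current classifier finds hardest, rather than on a random sample.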


Citations
Journal Article

Machine learning in automated text categorization

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and examines in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.

Active Learning Literature Survey

Burr Settles
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Proceedings Article

A comparison of event models for naive bayes text classification

TL;DR: It is found that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
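To make the distinction in that summary concrete, here is a rough sketch of how a document is scored under the two event models, with add-one smoothing and class priors omitted; the vocabulary and counts are invented for the example and do not come from the cited paper.

```python
import math
from collections import Counter

def multinomial_log_score(doc_tokens, class_token_counts, vocab):
    """Multinomial model: each token occurrence is a draw from the
    class's word distribution (add-one smoothing, prior omitted)."""
    total = sum(class_token_counts.values())
    return sum(
        math.log((class_token_counts.get(w, 0) + 1) / (total + len(vocab)))
        for w in doc_tokens)

def bernoulli_log_score(doc_tokens, class_doc_freq, n_class_docs, vocab):
    """Multi-variate Bernoulli model: every vocabulary word is a binary
    present/absent event, so absent words also contribute to the score."""
    present = set(doc_tokens)
    score = 0.0
    for w in vocab:
        p = (class_doc_freq.get(w, 0) + 1) / (n_class_docs + 2)
        score += math.log(p if w in present else 1.0 - p)
    return score

# Invented toy counts for a hypothetical "sports" class.
vocab = {"ball", "team", "vote", "election"}
doc = ["team", "team", "ball"]
print(multinomial_log_score(doc, Counter({"team": 30, "ball": 20}), vocab))
print(bernoulli_log_score(doc, {"team": 8, "ball": 6, "vote": 1}, 10, vocab))
```

Note how the multinomial score depends on how often each token appears, while the Bernoulli score iterates over the whole vocabulary, which is one reason their behavior diverges as vocabulary size grows.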
Journal Article

Support vector machine active learning with applications to text classification

TL;DR: Presents experimental results showing that employing the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
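The selection heuristic usually associated with that line of work queries the unlabeled documents lying closest to the current SVM decision boundary. A minimal sketch under that assumption, where `decision_value` is a placeholder for a trained model's signed decision function:

```python
def closest_to_margin(unlabeled_docs, decision_value, k=5):
    """Pick the k unlabeled documents nearest the SVM hyperplane.

    decision_value(doc) is assumed to return the signed score f(x) of a
    trained SVM; |f(x)| near zero marks the least certain documents.
    """
    return sorted(unlabeled_docs, key=lambda d: abs(decision_value(d)))[:k]
```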
Journal Article

Text Classification from Labeled and Unlabeled Documents using EM

TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the basic EM algorithm that improve classification accuracy under these conditions.
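A generic version of the labeled-plus-unlabeled EM loop that summary refers to can be sketched as follows; `train_weighted` and `predict_proba` are placeholder callables, and the fractional (soft) labels in the E-step are one common way to use the unlabeled pool, not necessarily the cited paper's exact formulation.

```python
def em_with_unlabeled(labeled, unlabeled, train_weighted, predict_proba,
                      iterations=10):
    """Semi-supervised EM sketch for a binary text classifier.

    labeled        -- list of (doc, label) pairs with label in {0, 1}
    unlabeled      -- list of docs without labels
    train_weighted -- callable: list of (doc, label, weight) -> model
    predict_proba  -- callable: (model, doc) -> P(label = 1 | doc)
    """
    hard = [(d, y, 1.0) for d, y in labeled]
    model = train_weighted(hard)          # initial model: labeled data only

    for _ in range(iterations):
        # E-step: split each unlabeled document fractionally between classes.
        soft = []
        for d in unlabeled:
            p = predict_proba(model, d)
            soft.append((d, 1, p))
            soft.append((d, 0, 1.0 - p))
        # M-step: retrain on the labeled plus fractionally labeled documents.
        model = train_weighted(hard + soft)

    return model
```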
References
Book

Generalized Linear Models

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Book

Pattern classification and scene analysis

TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Journal Article

Queries and Concept Learning

TL;DR: This work considers the problem of using queries to learn an unknown concept, and several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries.
Proceedings Article

Query by committee

TL;DR: It is suggested that asymptotically finite information gain may be an important characteristic of good query algorithms, studied here in a setting in which a committee of students is trained on the same data set.
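A small sketch of the committee idea follows, using bootstrap resampling to build the committee and vote entropy as the disagreement measure; both choices are illustrative assumptions rather than details taken from the cited paper.

```python
import math
import random
from collections import Counter

def vote_entropy(votes):
    """Disagreement measure: entropy of the committee's label votes."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def query_by_committee(labeled, unlabeled, train, predict,
                       committee_size=5, k=4):
    """Query-by-committee sketch.

    train   -- callable: list of (doc, label) pairs -> model
    predict -- callable: (model, doc) -> predicted label
    """
    # Build the committee from bootstrap resamples of the labeled data.
    committee = [train(random.choices(labeled, k=len(labeled)))
                 for _ in range(committee_size)]
    # Score each unlabeled document by how much the members disagree.
    scored = [(vote_entropy([predict(m, d) for m in committee]), d)
              for d in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]
```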
Journal Article

Improving Generalization with Active Learning

TL;DR: A formalism for active concept learning called selective sampling is described and it is shown how it may be approximately implemented by a neural network.