A Sequential Algorithm for Training Text Classifiers

David D. Lewis, +1 more

- 24 Jul 1994 -

arXiv: Computation and Language

TLDR

An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. It reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.

Abstract:

The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
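
The abstract names uncertainty sampling but does not spell out the procedure, so the following is only a minimal sketch of the general idea: repeatedly train a probabilistic classifier on the labeled data, ask a human to label the pool item the classifier is least sure about, and retrain. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional classifier in `fit`, the helper names `most_uncertain` and `uncertainty_sampling`, the `oracle` callback standing in for the human labeler, and the choice of one query per round.

```python
import math


def fit(labeled):
    """Toy probabilistic classifier on 1-D features (illustrative only).

    Returns a function x -> P(class 1 | x), modeled as a logistic curve
    centred midway between the two class means. `labeled` must contain
    at least one example of each class.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: 1.0 / (1.0 + math.exp(-(x - midpoint)))


def most_uncertain(model, pool):
    """Return the pool item whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(model(x) - 0.5))


def uncertainty_sampling(seed, pool, oracle, rounds):
    """Grow the labeled set by querying the least certain item each round.

    seed   : initial list of (x, label) pairs covering both classes
    pool   : unlabeled items
    oracle : hypothetical stand-in for the human labeler, x -> 0 or 1
    rounds : maximum number of labels to request
    """
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)            # retrain on everything labeled so far
        x = most_uncertain(model, pool)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # ask for exactly one new label
    return fit(labeled), labeled
```

Calling `uncertainty_sampling(seed, pool, oracle, rounds=3)` with a two-item seed returns the final model together with the labeled set, whose later entries are the queried items in the order they were selected. In this sketch the queries cluster near the current decision boundary, which is the intuition behind spending labeling effort only where the classifier is unsure.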

Citations

Active Learning Literature Survey

Burr Settles

TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.

Proceedings Article

A comparison of event models for naive bayes text classification

Andrew McCallum, +1 more

TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.

Journal Article

Support vector machine active learning with applications to text classification

Simon Tong, +1 more

- 01 Mar 2002 -

Journal of Machine Learning Research

TL;DR: Experimental results showing that employing the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings are presented.

Journal Article

Text Classification from Labeled and Unlabeled Documents using EM

Kamal Nigam, +3 more

- 01 May 2000 -

Machine Learning

TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.

Book Chapter

Naive (Bayes) at forty: the independence assumption in information retrieval

David D. Lewis

TL;DR: The naive Bayes classifier, currently experiencing a renaissance in machine learning, has long been a core technique in information retrieval, and some of the variations used for text retrieval and classification are reviewed.

References

Journal Article

Pattern Classification and Scene Analysis.

Ulf Grenander, +2 more

- 01 Sep 1974 -

Journal of the American Statistical Asso...

Book

Generalized Linear Models

John H. Schuenemeyer, +2 more

Proceedings Article

Query by committee

H. S. Seung, +2 more

TL;DR: An algorithm called query by committee is proposed, in which a committee of students is trained on the same data set; it is suggested that asymptotically finite information gain may be an important characteristic of good query algorithms.

Journal Article

Generalization as search

Tom M. Mitchell

- 01 Mar 1982 -

Artificial Intelligence

TL;DR: The problem of concept learning, or forming a general description of a class of objects given a set of examples and non-examples, is viewed here as a search problem.

Journal Article

The condensed nearest neighbor rule (Corresp.)

Peter E. Hart

- 01 May 1968 -

IEEE Transactions on Information Theory
