Open Access Proceedings Article

An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data

Katrin Tomanek, +2 more
pp. 486-495
TLDR
This paper addresses the issue of whether a corpus annotated by means of AL can be re-used to train classifiers different from the ones employed during AL, supplying alternative feature sets as well.
Abstract
We consider the impact Active Learning (AL) has on effective and efficient text corpus annotation, and report reduction rates for annotation effort of up to 72%. We also address the issue of whether a corpus annotated by means of AL (using a particular classifier and a particular feature set) can be re-used to train classifiers different from the ones employed by AL, supplying alternative feature sets as well. Finally, we report on our experience with the AL paradigm under real-world conditions, i.e., the annotation of large-scale document corpora for the life sciences.
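To make the annotation workflow concrete, here is a minimal pool-based active learning sketch in Python: a classifier is retrained on the labeled pool, the least-confident unlabeled instances are routed to a (simulated) human annotator, and the loop repeats. The classifier, feature representation, selection strategy, and batch size are placeholder assumptions for illustration, not the exact setup used in the paper.

```python
# Minimal pool-based active learning sketch (illustrative assumptions only:
# the classifier, features, selection strategy, and batch size are NOT the
# paper's exact setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "corpus": 2-D feature vectors stand in for real sentence
# features; y_true stands in for the gold annotations held by the annotator.
X_pool = rng.normal(size=(500, 2))
y_true = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

def oracle(i):
    """Stand-in for the human annotator: returns the gold label of instance i."""
    return int(y_true[i])

labeled = list(range(10))                       # small annotated seed set
unlabeled = list(range(10, len(X_pool)))
y_known = {i: oracle(i) for i in labeled}

model = LogisticRegression()
for _ in range(20):                             # 20 annotation rounds
    model.fit(X_pool[labeled], [y_known[i] for i in labeled])
    # Least-confidence selection: query the instances whose most probable
    # label has the lowest predicted probability.
    probs = model.predict_proba(X_pool[unlabeled])
    confidence = probs.max(axis=1)
    batch = np.argsort(confidence)[:5]          # 5 queries per round
    for pos in sorted(batch, reverse=True):
        idx = unlabeled.pop(pos)
        y_known[idx] = oracle(idx)              # manual annotation happens here
        labeled.append(idx)

print(f"annotated {len(labeled)} of {len(X_pool)} instances")
```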


Citations

Active Learning Literature Survey

Burr Settles
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
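The query strategy frameworks this survey reviews can be summarized by a few uncertainty scores. The sketch below shows the three most common ones; the function names are mine, and `probs` is assumed to be an array of predicted class probabilities from any probabilistic classifier.

```python
# Three common uncertainty-sampling scores from the active learning
# literature; `probs` is an (n_instances, n_classes) array of predicted
# class probabilities.
import numpy as np

def least_confidence(probs):
    # 1 minus the probability of the most likely label (higher = more uncertain).
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # Gap between the two most probable labels (smaller = more uncertain).
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy_score(probs):
    # Shannon entropy of the predicted distribution (higher = more uncertain).
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)
```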
Book

Active Learning

Burr Settles
TL;DR: Active learning as discussed by the authors is a general approach that allows a machine learning algorithm to choose the data from which it learns by posing "queries", usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator) that already understands the nature of the problem.
Journal Article

Methodological Review: What can natural language processing do for clinical decision support?

TL;DR: This review focuses on the recently renewed interest in the development of fundamental NLP methods, on advances in NLP systems for clinical decision support (CDS), and on current solutions to the challenges posed by distinct sublanguages, intended user groups, and support goals.

A literature survey of active machine learning in the context of natural language processing

TL;DR: Active learning has been successfully applied to a number of natural language processing tasks, such as information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation.

From Theories to Queries: Active Learning in Practice

TL;DR: This article surveys recent work in active learning aimed at making it more practical for real-world use, and reviews some of the issues facing active learning in real, ongoing learning systems and data annotation projects.
References
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
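For readers who want to try a CRF on a sequence-labeling task like the ones compared in this reference, the sketch below uses the third-party sklearn-crfsuite package with deliberately minimal token features; it is one possible implementation, not the estimation algorithms presented in the cited work, and the toy sentence and tags are invented.

```python
# CRF sequence-labeling sketch via the third-party sklearn-crfsuite package
# (an illustration; not the cited paper's parameter estimation algorithms).
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: one tokenized sentence with entity tags.
train_sents = [["IL-2", "binds", "the", "IL-2", "receptor"]]
train_tags = [["B-protein", "O", "O", "B-protein", "I-protein"]]
X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, train_tags)
print(crf.predict(X_train))   # predicted tag sequence for each sentence
```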
Proceedings Article

Combining labeled and unlabeled data with co-training

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
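A compressed co-training sketch follows: two classifiers, each trained on its own feature view, confidently pseudo-label unlabeled examples for one another. The view split, the Gaussian naive Bayes learners, the synthetic data, and the batch sizes are assumptions made purely for illustration.

```python
# Compressed co-training sketch: two learners on two feature views exchange
# confident pseudo-labels (view split, learners, and batch sizes are
# illustrative assumptions, not those of the cited work).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)        # hidden gold labels

view1, view2 = X[:, :2], X[:, 2:]              # two distinct feature views
labeled = list(range(20))                      # small labeled seed set
unlabeled = list(range(20, len(X)))
y_known = {i: int(y[i]) for i in labeled}

for _ in range(10):
    yl = [y_known[i] for i in labeled]
    clf1 = GaussianNB().fit(view1[labeled], yl)
    clf2 = GaussianNB().fit(view2[labeled], yl)
    newly = []
    # Each classifier pseudo-labels its 3 most confident unlabeled examples.
    for clf, view in ((clf1, view1), (clf2, view2)):
        probs = clf.predict_proba(view[unlabeled])
        for pos in np.argsort(-probs.max(axis=1))[:3]:
            idx = unlabeled[pos]
            if idx not in y_known:
                y_known[idx] = int(probs[pos].argmax())
                newly.append(idx)
    labeled.extend(newly)
    unlabeled = [i for i in unlabeled if i not in y_known]

print(f"{len(labeled)} examples labeled (seed + pseudo-labels)")
```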
Proceedings Article

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER): it provided data sets and an evaluation method, and the paper gives a general overview of the systems that participated in the task and their performance.
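Since the shared task distributes its data in a simple column format, a small reader like the one below is often all that is needed to load it; the four-column layout with the NE tag last and the -DOCSTART- separator lines are assumptions based on the CoNLL-2003 distribution.

```python
# Reader for CoNLL-2003-style column data: one token per line
# ("word POS chunk NE-tag"), blank lines separating sentences, and
# -DOCSTART- lines marking document boundaries (format assumed from the
# shared-task distribution).
def read_conll(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            cols = line.split()
            tokens.append(cols[0])          # surface token
            tags.append(cols[-1])           # NE tag is the last column
    if tokens:
        sentences.append((tokens, tags))
    return sentences
```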
Journal Article

A maximum entropy approach to natural language processing

TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented, and an efficient implementation of this approach is described, using several problems in natural language processing as examples.
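A maximum entropy classifier over word features is equivalent to multinomial logistic regression, so a modern sketch can lean on scikit-learn; the toy documents and labels below are invented for illustration, and this is not the maximum-likelihood estimation procedure described in the cited paper.

```python
# Maximum entropy (multinomial logistic regression) text classifier sketch
# using scikit-learn; toy data invented for illustration, not the estimation
# procedure described in the cited paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the gene expression increased",
    "stock prices fell sharply",
    "protein binding was observed",
    "the market rallied today",
]
labels = ["bio", "finance", "bio", "finance"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)
print(model.predict(["enzyme levels and gene activity"]))   # likely 'bio' given "gene"
```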