Open Access Proceedings Article

An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data

Katrin Tomanek, +2 more
pp. 486-495
TLDR
This paper addresses the issue of whether a corpus annotated by means of AL can be re-used to train classifiers different from the ones employed during AL, supplying alternative feature sets as well.
Abstract
We consider the impact Active Learning (AL) has on effective and efficient text corpus annotation, and report reduction rates for annotation effort of up to 72%. We also address the issue of whether a corpus annotated by means of AL (using a particular classifier and a particular feature set) can be re-used to train classifiers different from the ones employed by AL, supplying alternative feature sets as well. Finally, we report on our experience with the AL paradigm under real-world conditions, i.e., the annotation of large-scale document corpora for the life sciences.
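To make the annotation workflow concrete, here is a minimal pool-based active learning sketch in Python: a classifier is retrained on the labeled pool, the least-confident unlabeled instances are routed to a (simulated) human annotator, and the loop repeats. The classifier, feature representation, selection strategy, and batch size are placeholder assumptions for illustration, not the exact setup used in the paper.

```python
# Minimal pool-based active learning sketch (illustrative assumptions only:
# the classifier, features, selection strategy, and batch size are NOT the
# paper's exact setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "corpus": 2-D feature vectors stand in for real sentence
# features; y_true stands in for the gold annotations held by the annotator.
X_pool = rng.normal(size=(500, 2))
y_true = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

def oracle(i):
    """Stand-in for the human annotator: returns the gold label of instance i."""
    return int(y_true[i])

labeled = list(range(10))                       # small annotated seed set
unlabeled = list(range(10, len(X_pool)))
y_known = {i: oracle(i) for i in labeled}

model = LogisticRegression()
for _ in range(20):                             # 20 annotation rounds
    model.fit(X_pool[labeled], [y_known[i] for i in labeled])
    # Least-confidence selection: query the instances whose most probable
    # label has the lowest predicted probability.
    probs = model.predict_proba(X_pool[unlabeled])
    confidence = probs.max(axis=1)
    batch = np.argsort(confidence)[:5]          # 5 queries per round
    for pos in sorted(batch, reverse=True):
        idx = unlabeled.pop(pos)
        y_known[idx] = oracle(idx)              # manual annotation happens here
        labeled.append(idx)

print(f"annotated {len(labeled)} of {len(X_pool)} instances")
```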


Citations

Active Learning Literature Survey

Burr Settles
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
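The query strategy frameworks this survey reviews can be summarized by a few uncertainty scores. The sketch below shows the three most common ones; the function names are mine, and `probs` is assumed to be an array of predicted class probabilities from any probabilistic classifier.

```python
# Three common uncertainty-sampling scores from the active learning
# literature; `probs` is an (n_instances, n_classes) array of predicted
# class probabilities.
import numpy as np

def least_confidence(probs):
    # 1 minus the probability of the most likely label (higher = more uncertain).
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # Gap between the two most probable labels (smaller = more uncertain).
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy_score(probs):
    # Shannon entropy of the predicted distribution (higher = more uncertain).
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)
```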
Book

Active Learning

Burr Settles
TL;DR: Active learning as discussed by the authors is a general approach that allows a machine learning algorithm to choose the data from which it learns by posing "queries", usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator) that already understands the nature of the problem.
Journal Article

Methodological Review: What can natural language processing do for clinical decision support?

TL;DR: This review focuses on the recently renewed interest in the development of fundamental NLP methods, on advances in NLP systems for clinical decision support (CDS), and on current solutions to the challenges posed by distinct sublanguages, intended user groups, and support goals.

A literature survey of active machine learning in the context of natural language processing

TL;DR: Active learning has been successfully applied to a number of natural language processing tasks, such as information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation.

From Theories to Queries: Active Learning in Practice

TL;DR: This article surveys recent work in active learning aimed at making it more practical for real-world use, and reviews some of the issues facing active learning in real, ongoing learning systems and data annotation projects.
References
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
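For readers who want to try a CRF on a sequence-labeling task like the ones compared in this reference, the sketch below uses the third-party sklearn-crfsuite package with deliberately minimal token features; it is one possible implementation, not the estimation algorithms presented in the cited work, and the toy sentence and tags are invented.

```python
# CRF sequence-labeling sketch via the third-party sklearn-crfsuite package
# (an illustration; not the cited paper's parameter estimation algorithms).
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: one tokenized sentence with entity tags.
train_sents = [["IL-2", "binds", "the", "IL-2", "receptor"]]
train_tags = [["B-protein", "O", "O", "B-protein", "I-protein"]]
X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, train_tags)
print(crf.predict(X_train))   # predicted tag sequence for each sentence
```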
Proceedings Article

Combining labeled and unlabeled data with co-training

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
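A compressed co-training sketch follows: two classifiers, each trained on its own feature view, confidently pseudo-label unlabeled examples for one another. The view split, the Gaussian naive Bayes learners, the synthetic data, and the batch sizes are assumptions made purely for illustration.

```python
# Compressed co-training sketch: two learners on two feature views exchange
# confident pseudo-labels (view split, learners, and batch sizes are
# illustrative assumptions, not those of the cited work).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)        # hidden gold labels

view1, view2 = X[:, :2], X[:, 2:]              # two distinct feature views
labeled = list(range(20))                      # small labeled seed set
unlabeled = list(range(20, len(X)))
y_known = {i: int(y[i]) for i in labeled}

for _ in range(10):
    yl = [y_known[i] for i in labeled]
    clf1 = GaussianNB().fit(view1[labeled], yl)
    clf2 = GaussianNB().fit(view2[labeled], yl)
    newly = []
    # Each classifier pseudo-labels its 3 most confident unlabeled examples.
    for clf, view in ((clf1, view1), (clf2, view2)):
        probs = clf.predict_proba(view[unlabeled])
        for pos in np.argsort(-probs.max(axis=1))[:3]:
            idx = unlabeled[pos]
            if idx not in y_known:
                y_known[idx] = int(probs[pos].argmax())
                newly.append(idx)
    labeled.extend(newly)
    unlabeled = [i for i in unlabeled if i not in y_known]

print(f"{len(labeled)} examples labeled (seed + pseudo-labels)")
```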
Proceedings Article

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER): it provided data sets and an evaluation method, and the paper gives a general overview of the systems that participated in the task and their performance.
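Since the shared task distributes its data in a simple column format, a small reader like the one below is often all that is needed to load it; the four-column layout with the NE tag last and the -DOCSTART- separator lines are assumptions based on the CoNLL-2003 distribution.

```python
# Reader for CoNLL-2003-style column data: one token per line
# ("word POS chunk NE-tag"), blank lines separating sentences, and
# -DOCSTART- lines marking document boundaries (format assumed from the
# shared-task distribution).
def read_conll(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            cols = line.split()
            tokens.append(cols[0])          # surface token
            tags.append(cols[-1])           # NE tag is the last column
    if tokens:
        sentences.append((tokens, tags))
    return sentences
```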
Journal Article

A maximum entropy approach to natural language processing

TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented, and an efficient implementation of this approach is described, using several problems in natural language processing as examples.
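A maximum entropy classifier over word features is equivalent to multinomial logistic regression, so a modern sketch can lean on scikit-learn; the toy documents and labels below are invented for illustration, and this is not the maximum-likelihood estimation procedure described in the cited paper.

```python
# Maximum entropy (multinomial logistic regression) text classifier sketch
# using scikit-learn; toy data invented for illustration, not the estimation
# procedure described in the cited paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the gene expression increased",
    "stock prices fell sharply",
    "protein binding was observed",
    "the market rallied today",
]
labels = ["bio", "finance", "bio", "finance"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)
print(model.predict(["enzyme levels and gene activity"]))   # likely 'bio' given "gene"
```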