Open Access Proceedings Article

Interactive information extraction with constrained conditional random fields

TL;DR: This work applies constrained Viterbi decoding, which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user, and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted.
Abstract
Information Extraction methods can be used to automatically "fill in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have achieved low error rates but invariably make a number of errors. The goal of an interactive information extraction system is to assist the user in filling in database fields while giving the user confidence in the integrity of the data. The user is presented with an interactive interface that allows both the rapid verification of automatic field assignments and the correction of errors. In cases where there are multiple errors, our system takes into account user corrections and immediately propagates these constraints, such that other fields are often corrected automatically. Linear-chain conditional random fields (CRFs) have been shown to perform well for information extraction and other language modelling tasks due to their ability to capture arbitrary, overlapping features of the input in a Markov model. We apply this framework with two extensions: a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted. Both of these mechanisms are incorporated in a novel user interface for form filling that is intuitive and speeds the entry of data, providing a 23% reduction in error due to automated corrections.
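The constrained Viterbi idea described in the abstract restricts decoding to label sequences consistent with the fields the user has fixed, so one correction can propagate to neighbouring fields. A minimal sketch of this mechanism for a generic linear-chain scorer follows; the function name, score matrices, and toy values are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def constrained_viterbi(emissions, transitions, constraints=None):
    """Viterbi decoding for a linear-chain model, with optional hard
    constraints pinning the label at user-corrected positions.

    emissions:   (T, L) array of per-position label scores
    transitions: (L, L) array, score of moving from label i to label j
    constraints: dict {position: required_label_index}
    """
    constraints = constraints or {}
    T, L = emissions.shape
    NEG = -1e9  # effectively -infinity; masks disallowed labels

    def masked(t):
        # At a constrained position, every label except the
        # user-specified one is pushed to NEG, so no optimal path
        # can pass through a forbidden label there.
        if t in constraints:
            scores = np.full(L, NEG)
            scores[constraints[t]] = emissions[t, constraints[t]]
            return scores
        return emissions[t].copy()

    delta = masked(0)                       # best score ending in each label
    backptr = np.zeros((T, L), dtype=int)   # argmax predecessors
    for t in range(1, T):
        cand = delta[:, None] + transitions  # (L, L): prev label x next label
        backptr[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + masked(t)

    # Backtrace the best path that satisfies all constraints.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

With zero transition scores and emissions favouring label 0 at the first two positions, forcing position 1 to label 1 changes only the constrained position while the rest of the path is re-decoded around it, which is the propagation behaviour the paper exploits.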



Citations
Proceedings Article

ArnetMiner: extraction and mining of academic social networks

TL;DR: The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.
Proceedings Article

Hidden Conditional Random Fields for Gesture Recognition

TL;DR: This paper derives a discriminative sequence model with a hidden state structure, and demonstrates its utility both in a detection and in a multi-way classification formulation.
Journal Article

Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields

TL;DR: The proposed system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons, and shows significant improvements over existing techniques.
Report

Reducing labeling effort for structured prediction tasks

TL;DR: A new active learning paradigm is proposed which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate, which can vary widely in structured prediction tasks.
Journal Article

Active learning for logistic regression: an evaluation

TL;DR: The variance reduction method known in experimental design circles as 'A-optimality' is re-derived, and comparisons are run against different variations of the most widely used heuristic schemes to discover which methods work best for different classes of problems and why.
References
Journal Article

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Proceedings Article

Shallow parsing with conditional random fields

TL;DR: This work shows how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model.
Proceedings Article

Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons

TL;DR: This work has shown that conditionally-trained models, such as conditional maximum entropy models, handle the inter-dependent features of greedy sequence modeling in NLP well.