
Showing papers on "Conditional random field" published in 2002


Proceedings ArticleDOI
Michael Collins
06 Jul 2002
TL;DR: Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
Abstract: We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.

2,221 citations
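The training procedure described above, Viterbi decoding of each training sentence followed by simple additive updates on mistakes, can be sketched in a few lines. Below is a minimal illustrative Python sketch, not the paper's implementation: the feature templates are toy ones, the averaged/voted variants analysed in the paper are omitted, and all helper names are our own.

```python
from collections import defaultdict

def features(sentence, tags):
    """Toy feature map: emission (tag, word) and transition (prev_tag, tag) counts."""
    counts = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(sentence, tags):
        counts[("emit", tag, word)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts

def viterbi_decode(weights, sentence, tagset):
    """Highest-scoring tag sequence under `weights` for the feature map above."""
    prev_scores = {"<s>": 0.0}
    backptrs = []
    for word in sentence:
        scores, back = {}, {}
        for tag in tagset:
            best_prev, best = None, float("-inf")
            for prev, pscore in prev_scores.items():
                s = (pscore
                     + weights.get(("trans", prev, tag), 0.0)
                     + weights.get(("emit", tag, word), 0.0))
                if s > best:
                    best, best_prev = s, prev
            scores[tag], back[tag] = best, best_prev
        prev_scores = scores
        backptrs.append(back)
    # Follow back-pointers from the best final tag.
    tag = max(prev_scores, key=prev_scores.get)
    out = [tag]
    for back in reversed(backptrs[1:]):
        tag = back[tag]
        out.append(tag)
    return list(reversed(out))

def perceptron_train(data, tagset, epochs=10):
    """Perceptron-style training: decode each example, additive update on mistakes."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, gold in data:
            pred = viterbi_decode(weights, sentence, tagset)
            if pred != gold:
                # Reward features of the gold sequence, penalise those of the prediction.
                for f, c in features(sentence, gold).items():
                    weights[f] += c
                for f, c in features(sentence, pred).items():
                    weights[f] -= c
    return weights
```

Note that the paper's reported results rely on averaging the parameters over updates, which this sketch does not do.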


Book ChapterDOI
TL;DR: This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems, including sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks.
Abstract: Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.

698 citations
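As a concrete example of the simplest family surveyed above, the sliding window approach recasts sequence labelling as ordinary per-token classification over a fixed context window, after which any standard classifier can be applied. The snippet below is a generic illustration under our own assumed data format, not code from the chapter.

```python
def sliding_window_examples(sentences, window=2, pad="<PAD>"):
    """Turn sequence labelling into per-token classification:
    each example is the tokens in a +/- `window` context around position i."""
    X, y = [], []
    for tokens, tags in sentences:
        padded = [pad] * window + list(tokens) + [pad] * window
        for i, tag in enumerate(tags):
            X.append(tuple(padded[i:i + 2 * window + 1]))
            y.append(tag)
    return X, y

# Example: one toy sentence with POS-style tags.
data = [(["the", "dog", "barks"], ["DET", "NOUN", "VERB"])]
X, y = sliding_window_examples(data, window=1)
# X[1] == ("the", "dog", "barks"), y[1] == "NOUN"
```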


Proceedings Article
07 Aug 2002
TL;DR: In this article, an efficient feature induction method for CRFs is presented, based on the principle of iteratively constructing feature conjunctions that would significantly increase conditional log-likelihood if added to the model.
Abstract: Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with this freedom, however, an important question remains: what features should be used? This paper presents an efficient feature induction method for CRFs. The method is founded on the principle of iteratively constructing feature conjunctions that would significantly increase conditional log-likelihood if added to the model. Automated feature induction enables not only improved accuracy and dramatic reduction in parameter count, but also the use of larger cliques, and more freedom to liberally hypothesize atomic input variables that may be relevant to a task. The method applies to linear-chain CRFs, as well as to more arbitrary CRF structures, such as Relational Markov Networks, where it corresponds to learning clique templates, and can also be understood as supervised structure learning. Experimental results on named entity extraction and noun phrase segmentation tasks are presented.

459 citations
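The core of the induction method is scoring candidate feature conjunctions by the gain in conditional log-likelihood they would contribute, tuning only the candidate's weight while the current model is held fixed. The sketch below illustrates that gain computation for a per-position conditional model, a deliberate simplification of the paper's linear-chain and clique-template setting; the array shapes, grid search, and function names are our own assumptions.

```python
import numpy as np

def induction_gain(g, X, Y, labels, p_current, lambdas=np.linspace(-2.0, 2.0, 41)):
    """Estimated conditional log-likelihood gain from adding feature g(x, y),
    tuning only g's weight on a grid while holding the current model fixed.
    p_current[i, k] = current model's p(labels[k] | X[i])."""
    g_vals = np.array([[g(x, lab) for lab in labels] for x in X], dtype=float)
    g_obs = np.array([g(x, y) for x, y in zip(X, Y)], dtype=float)
    best = 0.0
    for lam in lambdas:
        # Change in log-likelihood as a function of the single new weight lam.
        log_z = np.log((p_current * np.exp(lam * g_vals)).sum(axis=1))
        delta = (lam * g_obs - log_z).sum()
        best = max(best, delta)
    return best

def induce_features(candidates, X, Y, labels, p_current, k=5):
    """Greedy induction step: rank candidate conjunctions by estimated gain, keep the top k."""
    scored = sorted(candidates,
                    key=lambda g: induction_gain(g, X, Y, labels, p_current),
                    reverse=True)
    return scored[:k]
```

In the full method this step alternates with retraining the model, so each round's candidates are scored against the latest parameters.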


01 Jan 2002
TL;DR: This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model for labelling and segmenting sequential data, and hypothesises that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs.
Abstract: This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model [31] for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs. Experiments run on a subset of a well-known text chunking data set [28] confirm that this is indeed the case. This is a highly promising result, indicating that such parameter estimation techniques make CRFs a practical and efficient choice for labelling sequential data, as well as a theoretically sound and principled probabilistic framework.

164 citations
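The thesis's central comparison, general numerical optimisation against iterative scaling, comes down to handing the penalised conditional log-likelihood and its gradient (empirical minus model-expected feature counts) to an off-the-shelf optimiser such as L-BFGS. The sketch below shows that pattern for a per-position conditional model; a real linear-chain CRF would compute the expected counts with the forward-backward algorithm. The data layout and names are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def objective(w, F, Y, sigma2=10.0):
    """Penalised negative conditional log-likelihood and its gradient.
    F[i, k] is the feature vector f(x_i, label_k); Y[i] is the gold label index."""
    scores = F @ w                                   # (n, n_labels)
    scores -= scores.max(axis=1, keepdims=True)      # stabilise the softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(Y)
    log_lik = np.sum(np.log(probs[np.arange(n), Y]))
    # Gradient of the log-likelihood: empirical counts minus model-expected counts.
    empirical = F[np.arange(n), Y].sum(axis=0)
    expected = np.einsum("nk,nkd->d", probs, F)
    # Gaussian prior (weight decay), as commonly used when training CRFs.
    nll = -log_lik + w @ w / (2 * sigma2)
    grad = -(empirical - expected) + w / sigma2
    return nll, grad

def train(F, Y, n_features):
    """General-purpose numerical optimisation (L-BFGS) in place of iterative scaling."""
    w0 = np.zeros(n_features)
    result = minimize(objective, w0, args=(F, Y), jac=True, method="L-BFGS-B")
    return result.x
```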


Proceedings Article
01 Jan 2002
TL;DR: The proposed sequence boosting algorithm offers an interesting alternative to methods based on HMMs and the more recently proposed Conditional Random Fields and is attractive both conceptually and computationally.
Abstract: This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function. The proposed method combines many of the advantages of boosting schemes with the efficiency of dynamic programming methods and is attractive both conceptually and computationally. In addition, we also discuss alternative approaches based on the Hamming loss for label sequences. The sequence boosting algorithm offers an interesting alternative to methods based on HMMs and the more recently proposed Conditional Random Fields. Application areas for the presented technique range from natural language processing and information extraction to computational biology. We include experiments on named entity recognition and part-of-speech tagging which demonstrate the validity and competitiveness of our approach.

59 citations
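A sequence rank loss of the kind referred to above can be written as an exponential loss over all incorrect label sequences relative to the gold one. The toy sketch below computes it by brute-force enumeration, which is feasible only for very short sequences; the efficiency point in the abstract is that the corresponding sums and updates can be carried out with dynamic programming instead. The data format and scoring interface here are assumptions, not the paper's.

```python
from itertools import product
import math

def sequence_rank_loss(score, data, labels):
    """Exponential sequence rank loss: every incorrect label sequence contributes
    exp(score(x, y) - score(x, gold)), so sequences that outrank the gold one
    dominate the loss.

    score: callable (x, label_sequence) -> float
    data:  iterable of (x, gold_label_sequence) pairs
    """
    total = 0.0
    for x, gold in data:
        gold_score = score(x, gold)
        for y in product(labels, repeat=len(x)):
            if list(y) != list(gold):
                total += math.exp(score(x, list(y)) - gold_score)
    return total
```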