Open Access Proceedings Article

Efficient Higher-Order CRFs for Morphological Tagging

TL;DR
This work presents an approximated conditional random field using coarse-to-fine decoding and early updating that yields fast and accurate morphological taggers across six languages with different morphological properties, and shows that across languages higher-order models give significant improvements over 1st-order models.
Abstract
Training higher-order conditional random fields is prohibitive for huge tag sets. We present an approximated conditional random field using coarse-to-fine decoding and early updating. We show that our implementation yields fast and accurate morphological taggers across six languages with different morphological properties and that across languages higher-order models give significant improvements over 1st-order models.
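The coarse-to-fine idea lends itself to a short illustration. Below is a minimal Python sketch, not the authors' implementation: a cheap 0-order model prunes the tag set at each position, and exact decoding then runs only over the surviving tags. The paper prunes with marginal probabilities at increasing model orders; this sketch uses a simpler top-k cut, and the fine model here is 1st-order for brevity. All names (`unary_scores`, `trans_scores`, `keep`) are illustrative assumptions.

```python
import numpy as np

def coarse_to_fine_tags(unary_scores, keep=4):
    """Coarse pass: keep only the `keep` best tags per position
    according to a cheap 0-order (unary) model."""
    # unary_scores: (n_positions, n_tags) array of log-scores
    return [np.argsort(-s)[:keep] for s in unary_scores]

def pruned_viterbi(unary_scores, trans_scores, candidates):
    """Fine pass: 1st-order Viterbi restricted to the pruned tag
    lattice. A real higher-order model would condition on longer
    tag histories; the pruning logic stays the same."""
    n = len(unary_scores)
    # best[i][t] = (score, backpointer) for tag t at position i
    best = [{t: (unary_scores[0][t], None) for t in candidates[0]}]
    for i in range(1, n):
        layer = {}
        for t in candidates[i]:
            prev = {p: best[-1][p][0] + trans_scores[p][t]
                    for p in candidates[i - 1]}
            p_best = max(prev, key=prev.get)
            layer[t] = (prev[p_best] + unary_scores[i][t], p_best)
        best.append(layer)
    # Backtrace from the best final tag.
    t = max(best[-1], key=lambda tag: best[-1][tag][0])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return path[::-1]
```

Early updating, the other ingredient named in the abstract, applies during training: as I read it, when the gold tag sequence falls out of the pruned lattice, the parameter update is made at that point rather than after a full decode.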


Citations
Proceedings Article

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

TL;DR: A model for constructing vector representations of words by composing characters using bidirectional LSTMs that requires only a single vector per character type and a fixed set of parameters for the compositional model, which yields state-of-the-art results in language modeling and part-of-speech tagging.
Posted Content

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
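As a rough sketch of the compositional idea, here is a minimal PyTorch module. It is illustrative, not the authors' code: the paper combines the final forward and backward LSTM states with learned projection matrices, whereas this sketch simply concatenates them, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    """Compose a word vector from its characters with a BiLSTM,
    in the spirit of a character-to-word model."""
    def __init__(self, n_chars, char_dim=32, word_dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, word_dim // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, char_ids):          # char_ids: (batch, word_len)
        x = self.emb(char_ids)            # (batch, word_len, char_dim)
        _, (h, _) = self.lstm(x)          # h: (2, batch, word_dim // 2)
        # Concatenate the final forward and backward hidden states.
        return torch.cat([h[0], h[1]], dim=-1)   # (batch, word_dim)

# Usage: one embedding per character type, shared across all words.
word = torch.tensor([[3, 7, 7, 2]])       # character ids of one word
vec = CharToWord(n_chars=100)(word)       # -> shape (1, 64)
```

The compactness claim falls out directly: the parameter count depends on the character inventory and the LSTM size, not on the vocabulary size.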
Proceedings Article

Recurrent Neural Network Grammars

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12-17, 2016.
Proceedings Article

What do neural machine translation models learn about morphology?

TL;DR: This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.
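The extrinsic evaluation can be pictured as a probing classifier: freeze the trained NMT model, then predict POS or morphological tags from its hidden representations. A hedged sketch, with `encoder_states` and `pos_tags` as assumed inputs rather than the paper's setup:

```python
from sklearn.linear_model import LogisticRegression

def probe(encoder_states, pos_tags):
    """Train a simple classifier to predict POS tags from frozen
    encoder states; its accuracy indicates how much morphological
    information the representations encode.
    Shapes assumed: encoder_states (n_tokens, d), pos_tags (n_tokens,)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(encoder_states, pos_tags)
    return clf.score(encoder_states, pos_tags)  # use held-out data in practice
```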
Posted Content

Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs

TL;DR: Extensions to a continuous-state dependency parsing method make it applicable to morphologically rich languages by replacing lookup-based word representations with representations constructed from the orthographic forms of the words, also using LSTMs.
References
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
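For context, a linear-chain CRF defines a globally normalized distribution over tag sequences, so the estimation algorithms the paper discusses need both the score of a sequence and the log partition function, the latter computable with the forward algorithm. A minimal hand-rolled sketch of the model family, not the paper's estimation code:

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_likelihood(unary, trans, tags):
    """Log-likelihood of a tag sequence under a linear-chain CRF
    with unary log-potentials `unary` (n, T) and a transition
    matrix `trans` (T, T). Illustrative names and shapes."""
    n = unary.shape[0]
    # Score of the observed tag sequence.
    score = unary[0, tags[0]]
    for i in range(1, n):
        score += trans[tags[i - 1], tags[i]] + unary[i, tags[i]]
    # Log partition function via the forward algorithm.
    alpha = unary[0]
    for i in range(1, n):
        alpha = unary[i] + logsumexp(alpha[:, None] + trans, axis=0)
    return score - logsumexp(alpha)
```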
Report

Building a Large Annotated Corpus of English: The Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Proceedings Article

Feature-rich part-of-speech tagging with a cyclic dependency network

TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.
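The first idea, conditioning on both preceding and following tag contexts, is easy to picture as feature templates. A hypothetical sketch, not the paper's feature set:

```python
def tag_context_features(words, tags, i):
    """Feature templates for position i that look at the preceding
    and the following tag; the cyclic dependency network scores all
    positions jointly, so right-context tags are available."""
    left = tags[i - 1] if i > 0 else "<s>"
    right = tags[i + 1] if i + 1 < len(tags) else "</s>"
    return {
        f"w={words[i]}": 1.0,          # lexical feature
        f"t-1={left}": 1.0,            # preceding tag context
        f"t+1={right}": 1.0,           # following tag context
        f"suffix3={words[i][-3:]}": 1.0,
    }
```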
Proceedings Article

Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

TL;DR: Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
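The perceptron training step is simple enough to sketch: decode with the current weights, and on a mistake move the weights toward the gold sequence's features and away from the prediction's. `features` and `decode` are assumed helpers, not from the paper:

```python
def perceptron_update(weights, features, x, gold_tags, decode):
    """One structured-perceptron update for sequence tagging.
    `features(x, tags)` returns a sparse dict of feature counts;
    `decode(weights, x)` returns the best tag sequence (e.g. via
    Viterbi). Both are hypothetical helpers for this sketch."""
    pred_tags = decode(weights, x)
    if pred_tags != gold_tags:
        for f, v in features(x, gold_tags).items():
            weights[f] = weights.get(f, 0.0) + v   # reward gold features
        for f, v in features(x, pred_tags).items():
            weights[f] = weights.get(f, 0.0) - v   # penalize predicted ones
    return weights
```

In practice the averaged variant of this update is used, which stabilizes the learned weights.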