Open Access Proceedings Article (DOI)

Minimum Risk Annealing for Training Log-Linear Models

TL;DR: This work proposes training log-linear combinations of models for dependency parsing and for machine translation, and describes techniques for optimizing nonlinear functions such as precision or the BLEU metric.
Abstract
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing.
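The annealing idea in the abstract can be sketched in a few lines: define a distribution over hypotheses from the model's log-linear scores, compute the expected loss (risk) under it, and sharpen the distribution with a temperature-like parameter so the risk approaches 1-best loss. This is a minimal illustration under assumed toy values, not the authors' implementation; the hypothesis scores, losses, and the form p_gamma(h) proportional to exp(gamma * score(h)) are illustrative assumptions.

```python
import numpy as np

def annealed_expected_loss(scores, losses, gamma):
    """Expected loss under p_gamma(h) proportional to exp(gamma * score(h))."""
    logp = gamma * scores
    logp -= logp.max()          # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum()                # normalize to a distribution over hypotheses
    return float(p @ losses)    # risk = sum_h p_gamma(h) * loss(h)

# Toy hypothesis list: log-linear model scores and their task losses.
scores = np.array([2.0, 1.5, 0.5])
losses = np.array([0.3, 0.1, 0.9])

# As gamma grows, the distribution concentrates on the 1-best hypothesis
# (score 2.0, loss 0.3), so the risk approaches the 1-best loss.
for gamma in [0.5, 1.0, 5.0, 50.0]:
    risk = annealed_expected_loss(scores, losses, gamma)
```

Unlike the piecewise-constant 1-best loss, this risk is smooth in the model scores at any finite gamma, which is what makes gradient-based optimization possible before the final sharpening.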



Citations
Proceedings Article

Batch Tuning Strategies for Statistical Machine Translation

TL;DR: It is found that a simple and efficient batch version of MIRA performs at least as well as training online, and consistently outperforms other options.
Proceedings Article (DOI)

Lattice-based Minimum Error Rate Training for Statistical Machine Translation

TL;DR: A novel algorithm is presented that allows for efficiently constructing and representing the exact error surface of all translations that are encoded in a phrase lattice and is used to train the feature function weights of a phrase-based statistical machine translation system.
Proceedings Article (DOI)

Minimum Risk Training for Neural Machine Translation

TL;DR: This paper proposes minimum risk training for end-to-end NMT, which optimizes model parameters directly with respect to evaluation metrics and achieves significant improvements over maximum likelihood estimation on a state-of-the-art NMT system.
Proceedings Article (DOI)

Online Large-Margin Training of Syntactic and Structural Translation Features

TL;DR: This work explores the use of the MIRA algorithm of Crammer et al. as an alternative to MERT and shows that by parallel processing and exploiting more of the parse forest, it can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost.
Proceedings Article (DOI)

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation

TL;DR: This toolkit uses synchronous context-free grammars (SCFGs) for statistical machine translation and achieves state-of-the-art performance on the WMT09 French-English translation task.
References
Proceedings Article (DOI)

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Proceedings Article (DOI)

Statistical phrase-based translation

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
Proceedings Article (DOI)

Minimum Error Rate Training in Statistical Machine Translation

TL;DR: It is shown that significantly better results can often be obtained if the final evaluation criterion is taken directly into account as part of the training procedure.