Open Access Proceedings Article (DOI)

Minimum Risk Annealing for Training Log-Linear Models

TL;DR: This work proposes training log-linear combinations of models for dependency parsing and for machine translation, and describes techniques for optimizing nonlinear functions such as precision or the BLEU metric.
Abstract
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing.
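The annealing idea in the abstract can be sketched in a few lines: define a distribution over hypotheses from the model's log-linear scores, compute the expected loss (risk) under it, and sharpen the distribution with a temperature-like parameter so the risk approaches 1-best loss. This is a minimal illustration under assumed toy values, not the authors' implementation; the hypothesis scores, losses, and the form p_gamma(h) proportional to exp(gamma * score(h)) are illustrative assumptions.

```python
import numpy as np

def annealed_expected_loss(scores, losses, gamma):
    """Expected loss under p_gamma(h) proportional to exp(gamma * score(h))."""
    logp = gamma * scores
    logp -= logp.max()          # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum()                # normalize to a distribution over hypotheses
    return float(p @ losses)    # risk = sum_h p_gamma(h) * loss(h)

# Toy hypothesis list: log-linear model scores and their task losses.
scores = np.array([2.0, 1.5, 0.5])
losses = np.array([0.3, 0.1, 0.9])

# As gamma grows, the distribution concentrates on the 1-best hypothesis
# (score 2.0, loss 0.3), so the risk approaches the 1-best loss.
for gamma in [0.5, 1.0, 5.0, 50.0]:
    risk = annealed_expected_loss(scores, losses, gamma)
```

Unlike the piecewise-constant 1-best loss, this risk is smooth in the model scores at any finite gamma, which is what makes gradient-based optimization possible before the final sharpening.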



Citations
Proceedings Article

Batch Tuning Strategies for Statistical Machine Translation

TL;DR: It is found that a simple and efficient batch version of MIRA performs at least as well as training online, and consistently outperforms other options.
Proceedings Article (DOI)

Lattice-based Minimum Error Rate Training for Statistical Machine Translation

TL;DR: A novel algorithm is presented that allows for efficiently constructing and representing the exact error surface of all translations that are encoded in a phrase lattice and is used to train the feature function weights of a phrase-based statistical machine translation system.
Proceedings Article (DOI)

Minimum Risk Training for Neural Machine Translation

TL;DR: This paper proposes minimum risk training for end-to-end NMT, which optimizes model parameters directly with respect to evaluation metrics and achieves significant improvements over maximum likelihood estimation on a state-of-the-art NMT system.
Proceedings Article (DOI)

Online Large-Margin Training of Syntactic and Structural Translation Features

TL;DR: This work explores the use of the MIRA algorithm of Crammer et al. as an alternative to MERT and shows that by parallel processing and exploiting more of the parse forest, it can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost.
Proceedings Article (DOI)

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation

TL;DR: This toolkit uses synchronous context-free grammars (SCFGs) for statistical machine translation and achieves state-of-the-art performance on the WMT09 French-English translation task.
References
Proceedings Article (DOI)

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Proceedings Article (DOI)

Statistical phrase-based translation

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
Proceedings Article (DOI)

Minimum Error Rate Training in Statistical Machine Translation

TL;DR: It is shown that significantly better results can often be obtained if the final evaluation criterion is taken directly into account as part of the training procedure.