Open Access Journal Article

Context Gates for Neural Machine Translation

TL;DR
The authors propose context gates that dynamically control the ratios at which source and target contexts contribute to the generation of target words, enhancing both the adequacy and fluency of NMT through more careful control of the information flow from contexts.
Abstract
In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation, while target contexts affect its fluency. Intuitively, generation of a content word should rely more on the source context and generation of a function word should rely more on the target context. Due to the lack of effective control over the influence of source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates, which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.
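As a rough illustration of the idea in the abstract, the sketch below shows how such a gate could be computed and used to blend the two context streams. It is a minimal NumPy sketch assuming a GRU-style decoder; the parameter names (W_z, U_z, C_z, etc.) and the exact way the gated sum feeds the decoder are assumptions, not the paper's precise formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(prev_word_emb, prev_state, source_context, p):
    """Element-wise gate z in (0, 1): how much the source context (adequacy)
    versus the target-side context (fluency) should contribute."""
    z = sigmoid(p["W_z"] @ prev_word_emb
                + p["U_z"] @ prev_state
                + p["C_z"] @ source_context)
    return z

def gated_decoder_input(prev_word_emb, prev_state, source_context, p):
    """Blend the two information streams before the decoder update:
    z scales the source context, (1 - z) scales the target-side inputs."""
    z = context_gate(prev_word_emb, prev_state, source_context, p)
    target_part = (1.0 - z) * (p["W"] @ prev_word_emb + p["U"] @ prev_state)
    source_part = z * (p["C"] @ source_context)
    return target_part + source_part  # fed into the decoder's nonlinearity
```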



Citations
Proceedings Article

Globally Coherent Text Generation with Neural Checklist Models

TL;DR: The neural checklist model is presented: a recurrent neural network that models global coherence by storing and updating an agenda of text strings that should be mentioned somewhere in the output; it demonstrates high coherence with greatly improved semantic coverage of the agenda.
Proceedings Article

Visualizing and Understanding Neural Machine Translation

TL;DR: This work proposes to use layer-wise relevance propagation (LRP) to compute the contribution of each contextual word to arbitrary hidden states in the attention-based encoder-decoder framework and shows that visualization with LRP helps to interpret the internal workings of NMT and analyze translation errors.
Proceedings Article

Exploiting Cross-Sentence Context for Neural Machine Translation

TL;DR: This article proposes a cross-sentence context-aware approach and investigates the influence of historical contextual information on the performance of neural machine translation (NMT) in Chinese-English translation.
Journal Article

Learning to Remember Translation History with a Continuous Cache

TL;DR: The authors propose to augment NMT models with a very lightweight cache-like memory network that stores recent hidden representations as translation history; the probability distribution over generated words is updated online based on the translation history retrieved from the memory.
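A minimal sketch of how such a cache-like memory could work, assuming the keys are recent context/hidden vectors, the values are the words generated with them, and the cache distribution is interpolated with the NMT distribution via a fixed weight lam; these choices are illustrative assumptions, not the cited paper's exact design.

```python
import numpy as np
from collections import deque

class ContinuousCache:
    def __init__(self, size=200):
        self.keys = deque(maxlen=size)    # recent hidden/context vectors
        self.values = deque(maxlen=size)  # word ids generated with them

    def add(self, key, word_id):
        self.keys.append(key)
        self.values.append(word_id)

    def cache_distribution(self, query, vocab_size, temperature=1.0):
        """Turn similarity between the query and cached keys into a
        probability distribution over the vocabulary."""
        probs = np.zeros(vocab_size)
        if not self.keys:
            return probs
        sims = np.array([query @ k for k in self.keys]) / temperature
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()
        for w, word_id in zip(weights, self.values):
            probs[word_id] += w
        return probs

def mix(model_probs, cache_probs, lam=0.2):
    # Interpolate the NMT distribution with the cache distribution.
    return (1.0 - lam) * model_probs + lam * cache_probs
```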
Proceedings Article

Modeling Source Syntax for Neural Machine Translation

TL;DR: The authors propose three different kinds of encoders to incorporate source syntax into NMT: 1) a Parallel RNN encoder that learns word and label annotation vectors in parallel, 2) a Hierarchical RNN encoder that learns both word and label annotation vectors in a two-level hierarchy, and 3) a Mixed RNN encoder that learns over sequences in which words and syntactic labels are interleaved.
References
Journal Article

Long Short-Term Memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
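For reference, the sketch below shows a single step of the now-standard LSTM formulation (including the forget gate, which was added after the original 1997 paper); parameter shapes and names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4d x input), U (4d x d) and b (4d,) stack the
    input, forget, output and candidate parameters; the cell state c is
    the 'constant error carousel' that keeps gradients from vanishing."""
    d = h_prev.shape[0]
    gates = W @ x + U @ h_prev + b
    i = sigmoid(gates[0:d])        # input gate
    f = sigmoid(gates[d:2*d])      # forget gate
    o = sigmoid(gates[2*d:3*d])    # output gate
    g = np.tanh(gates[3*d:4*d])    # candidate cell update
    c = f * c_prev + i * g         # additive cell update
    h = o * np.tanh(c)
    return h, c
```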
Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
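A minimal sketch of corpus-level BLEU as described above (modified n-gram precision combined with a brevity penalty), assuming one reference per hypothesis and pre-tokenized input; real evaluations should use a standard implementation such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
            # "Modified" precision: clip each n-gram count by its reference count.
            matches[n-1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n-1] += sum(hyp_ngrams.values())
    precisions = [m / t if t else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty discourages overly short translations.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```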
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and the paper proposes to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
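A minimal sketch of the additive "soft-search" (attention) scoring described above; the parameter names (W_a, U_a, v_a) follow common usage and are assumptions rather than the paper's exact notation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_search(decoder_state, encoder_states, W_a, U_a, v_a):
    """Score every source annotation against the current decoder state and
    return their weighted sum as the context vector for the next word."""
    # encoder_states: (src_len, hidden); decoder_state: (hidden,)
    scores = np.array([v_a @ np.tanh(W_a @ decoder_state + U_a @ h)
                       for h in encoder_states])
    weights = softmax(scores)           # soft alignment probabilities
    context = weights @ encoder_states  # (hidden,)
    return context, weights
```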
Proceedings Article

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors use a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.