Open Access · Proceedings Article

On the Properties of Neural Machine Translation: Encoder--Decoder Approaches

TLDR
In this paper, a gated recursive convolutional neural network is proposed, and an analysis of neural machine translation shows that translation quality is relatively high on short sentences without unknown words but degrades rapidly as sentence length and the number of unknown words increase; the proposed network is also found to learn the grammatical structure of a sentence automatically.
Abstract
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. Neural machine translation models often consist of an encoder and a decoder: the encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. We show that neural machine translation performs relatively well on short sentences without unknown words, but that its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns the grammatical structure of a sentence automatically.
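To make the encoder-decoder pipeline concrete, here is a minimal sketch of a GRU-based encoder-decoder in PyTorch. This illustrates the general architecture only, not the paper's implementation; the layer sizes and vocabulary handling are placeholder assumptions.

```python
# Minimal GRU encoder-decoder sketch (illustrative only; hyperparameters
# and vocabulary handling are assumptions, not the paper's setup).
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder compresses the variable-length source into a fixed-length state.
        _, h = self.encoder(self.src_emb(src))
        # Decoder generates the target conditioned on that single vector.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)  # per-step logits over the target vocabulary
```

The single hidden state h is the fixed-length representation; its limited capacity is consistent with the length degradation the paper reports.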


Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it powers many applications, including natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
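Both versions of this paper describe the same soft-search idea: the decoder scores every encoder state and attends to a weighted mixture of them instead of a single fixed vector. A minimal sketch of such additive scoring, with dimensions chosen purely for illustration:

```python
# Sketch of additive attention over encoder states (illustrative dimensions).
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, hid_dim=512, att_dim=256):
        super().__init__()
        self.W = nn.Linear(hid_dim, att_dim)   # projects the decoder state
        self.U = nn.Linear(hid_dim, att_dim)   # projects the encoder states
        self.v = nn.Linear(att_dim, 1)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hid_dim); enc_states: (batch, src_len, hid_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1)
                                   + self.U(enc_states))).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)        # soft search over source
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights
```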
Posted Content

Empirical evaluation of gated recurrent neural networks on sequence modeling

TL;DR: Advanced recurrent units that implement a gating mechanism, the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks, and the GRU is found to be comparable to the LSTM.
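For reference, a single GRU step written out in NumPy to show the gating mechanism; weight shapes are assumptions, and the gate convention follows one common GRU formulation:

```python
# One GRU step, written out to expose the gating mechanism (NumPy sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h)              # update gate: how much to refresh
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate: how much history to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # blend old state with candidate
```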
Proceedings Article

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
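The contextualized representation is built as a task-weighted combination of the language model's layer outputs. A hedged sketch of that scalar mixing (the layer count and tensor shapes are assumptions):

```python
# Sketch of ELMo-style scalar mixing of biLM layers (shapes are assumptions).
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, num_layers=3):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))  # softmax-normalized weights
        self.gamma = nn.Parameter(torch.ones(1))        # task-specific scale

    def forward(self, layers):
        # layers: list of (batch, seq_len, dim) tensors, one per biLM layer
        w = torch.softmax(self.s, dim=0)
        return self.gamma * sum(wi * li for wi, li in zip(w, layers))
```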
References
Proceedings Article

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
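Jointly maximizing the conditional probability of the target given the source reduces, in practice, to minimizing a per-token cross-entropy. A sketch reusing the illustrative EncoderDecoder class above (padding index 0 is an assumption):

```python
# Sketch: maximize log p(target | source) via token-level cross-entropy.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=0)  # assume index 0 is padding

def nll_loss(model, src, tgt_in, tgt_out):
    logits = model(src, tgt_in)  # (batch, tgt_len, vocab)
    return criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
```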
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content

Sequence to Sequence Learning with Neural Networks

TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences that made the optimization problem easier.
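The reversal trick is a one-line preprocessing step; a toy sketch (token-ID lists are an assumption about the input format):

```python
# Sketch: reverse each source sentence before feeding it to the encoder,
# shortening the distance between early source and early target words.
def reverse_sources(batch):
    # batch: list of token-ID lists, one per source sentence
    return [list(reversed(sent)) for sent in batch]

print(reverse_sources([[1, 2, 3], [4, 5]]))  # [[3, 2, 1], [5, 4]]
```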
Posted Content

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler · 22 Dec 2012
TL;DR: ADADELTA, a novel per-dimension learning-rate method for gradient descent, is presented; it dynamically adapts over time using only first-order information and adds minimal computational overhead beyond vanilla stochastic gradient descent.
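The update itself is compact enough to sketch in a few lines of NumPy; rho = 0.95 and eps = 1e-6 are the commonly cited defaults from the paper, though any modern framework ships its own implementation:

```python
# Sketch of the ADADELTA per-dimension update (NumPy).
import numpy as np

def adadelta_step(x, grad, Eg2, Edx2, rho=0.95, eps=1e-6):
    Eg2 = rho * Eg2 + (1 - rho) * grad**2                  # decaying avg of grad^2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad  # per-dimension step
    Edx2 = rho * Edx2 + (1 - rho) * dx**2                  # decaying avg of update^2
    return x + dx, Eg2, Edx2
```

Note that the step size is the ratio of two running RMS estimates, so no global learning rate needs to be hand-tuned.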
Proceedings Article

Statistical phrase-based translation

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
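Those "relatively simple means" amount to counting: phrase pairs extracted from word alignments are scored by relative frequency. A toy sketch of that scoring step (phrase extraction itself is omitted):

```python
# Toy sketch: score extracted phrase pairs by relative frequency,
# phi(f|e) = count(e, f) / count(e).
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    # phrase_pairs: list of (e_phrase, f_phrase) tuples from word alignments
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for e, _ in phrase_pairs)
    return {(e, f): c / e_counts[e] for (e, f), c in pair_counts.items()}
```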