Open Access Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Abstract
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
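
As a rough illustration of the (soft-)search described in the abstract, the NumPy sketch below computes additive alignment scores over the source annotations and the resulting context vector; the parameter names (W_a, U_a, v_a) and toy dimensions are assumptions for illustration, not the paper's exact parameterization.

```python
# Minimal NumPy sketch of additive (soft-)attention over source annotations.
# Names and sizes are illustrative assumptions, not the paper's exact setup.
import numpy as np

def soft_alignment(s_prev, H, W_a, U_a, v_a):
    """Return alignment weights and context vector.

    s_prev : (n,)     previous decoder hidden state
    H      : (T, 2n)  annotations of the T source words (e.g. from a bidirectional RNN)
    """
    # Alignment scores e_j = v_a^T tanh(W_a s_prev + U_a h_j)
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a      # shape (T,)
    # Softmax over source positions -> soft alignment weights
    a = np.exp(e - e.max())
    a /= a.sum()
    # Context vector: expected annotation under the alignment weights
    c = a @ H                                           # shape (2n,)
    return a, c

# Toy usage with random parameters
n, T = 4, 6
rng = np.random.default_rng(0)
s_prev = rng.normal(size=n)
H = rng.normal(size=(T, 2 * n))
W_a = rng.normal(size=(n, n))
U_a = rng.normal(size=(n, 2 * n))
v_a = rng.normal(size=n)
weights, context = soft_alignment(s_prev, H, W_a, U_a, v_a)
print(weights.round(3), context.shape)
```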


Citations
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content

Sequence to Sequence Learning with Neural Networks

TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
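
A minimal sketch of the fixed-vector encoder-decoder scheme both entries above describe, using a plain tanh RNN cell in NumPy rather than a deep LSTM; all names, sizes, and the toy data are illustrative assumptions.

```python
# Sketch of a fixed-vector encoder-decoder (no attention), with source reversal.
# A plain tanh RNN cell stands in for the deep LSTM; everything is illustrative.
import numpy as np

def rnn_step(x, h, W_x, W_h):
    return np.tanh(x @ W_x + h @ W_h)

def encode(source, params):
    """Compress the whole (reversed) source sequence into one fixed vector."""
    W_x, W_h = params
    h = np.zeros(W_h.shape[0])
    for x in reversed(source):         # reversing the source, as in the TL;DR above,
        h = rnn_step(x, h, W_x, W_h)   # introduces short-term source-target dependencies
    return h                           # the single fixed-length summary vector

def decode(h, steps, params):
    """Unroll the decoder from the fixed vector alone."""
    W_x, W_h, W_out = params
    y = np.zeros(W_x.shape[0])
    outputs = []
    for _ in range(steps):
        h = rnn_step(y, h, W_x, W_h)
        y = h @ W_out
        outputs.append(y)
    return outputs

d_in, d_h, d_out, T = 8, 16, 8, 5
rng = np.random.default_rng(1)
enc_params = (rng.normal(size=(d_in, d_h)) * 0.1, rng.normal(size=(d_h, d_h)) * 0.1)
dec_params = (rng.normal(size=(d_out, d_h)) * 0.1, rng.normal(size=(d_h, d_h)) * 0.1,
              rng.normal(size=(d_h, d_out)) * 0.1)
source = [rng.normal(size=d_in) for _ in range(T)]
summary = encode(source, enc_params)
print(len(decode(summary, 4, dec_params)))
```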
Proceedings ArticleDOI

Effective Approaches to Attention-based Neural Machine Translation

TL;DR: Two approaches are examined: a global one that always attends to all source words and a local one that looks only at a subset of source words at a time; both are shown to be effective on the WMT translation tasks between English and German in both directions.
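
A hedged sketch contrasting the global and local attention variants described above; the dot-product score, window size D, and Gaussian weighting loosely follow the paper's local-p idea but are assumptions here, not its exact formulation.

```python
# Global vs. local attention, sketched with a simple dot-product score.
# Window size, Gaussian weighting, and all dimensions are assumptions.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def global_attention(h_t, H):
    """Attend to every source hidden state."""
    return softmax(H @ h_t)

def local_attention(h_t, H, p_t, D=2):
    """Attend only to a window of width 2D+1 around the aligned position p_t."""
    scores = H @ h_t
    positions = np.arange(len(H))
    window = np.abs(positions - p_t) <= D
    scores = np.where(window, scores, -np.inf)   # mask positions outside the window
    a = softmax(scores)
    # Favor positions near p_t with a Gaussian, loosely following the local-p variant
    a *= np.exp(-((positions - p_t) ** 2) / (2 * (D / 2) ** 2))
    return a / a.sum()

rng = np.random.default_rng(2)
H = rng.normal(size=(7, 4))    # source hidden states
h_t = rng.normal(size=4)       # current target hidden state
print(global_attention(h_t, H).round(3))
print(local_attention(h_t, H, p_t=3).round(3))
```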
Posted Content

Attention Is All You Need

TL;DR: A simple new network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it also generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
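
A minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer mentioned above; single head, no masking or learned projections, purely illustrative.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one head, no masking.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (len_q, d_v)

rng = np.random.default_rng(3)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(7, 8))
V = rng.normal(size=(7, 8))
print(scaled_dot_product_attention(Q, K, V).shape)        # (5, 8)
```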
Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
References
Proceedings Article

Maxout Networks

TL;DR: A simple new model called maxout is defined, designed both to facilitate optimization by dropout and to improve the accuracy of dropout's fast approximate model averaging technique.
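
A quick sketch of the maxout unit described above: the activation is the elementwise maximum over k learned linear feature maps; the sizes and names are illustrative assumptions.

```python
# Maxout unit: take the elementwise max over k linear pieces. Sizes are illustrative.
import numpy as np

def maxout(x, W, b):
    """x: (d_in,), W: (k, d_in, d_out), b: (k, d_out) -> (d_out,)."""
    z = np.einsum('i,kio->ko', x, W) + b   # k linear feature maps
    return z.max(axis=0)                   # elementwise max over the k pieces

rng = np.random.default_rng(4)
x = rng.normal(size=6)
W = rng.normal(size=(3, 6, 4))             # k=3 pieces, mapping 6 -> 4
b = rng.normal(size=(3, 4))
print(maxout(x, W, b).shape)               # (4,)
```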
Proceedings ArticleDOI

Hybrid speech recognition with Deep Bidirectional LSTM

TL;DR: The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates, although the improvement in word error rate over the deep network is modest despite a large increase in frame-level accuracy.
Book

Statistical Machine Translation

Philipp Koehn
TL;DR: This introductory text to statistical machine translation (SMT) provides the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish; the companion website provides open-source corpora and toolkits.
Proceedings Article

Recurrent Continuous Translation Models

TL;DR: A class of probabilistic continuous translation models called Recurrent Continuous Translation Models is introduced; these models are based purely on continuous representations for words, phrases, and sentences and do not rely on alignments or phrasal translation units.
Posted Content

Sequence Transduction with Recurrent Neural Networks

TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
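
A hedged sketch of the transducer idea in this entry: a transcription network summarizes input frames, a prediction network summarizes the output history, and a joint softmax over the labels plus a blank symbol couples the two; the network internals are replaced by random features here, and all names and sizes are assumptions.

```python
# Joint distribution of an RNN-style transducer, sketched with random features
# standing in for the transcription and prediction networks. Purely illustrative.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def joint_distribution(f_t, g_u, W_f, W_g, b):
    """Pr(k | t, u) over K labels plus a blank, from frame state f_t and label state g_u."""
    return softmax(W_f @ f_t + W_g @ g_u + b)

rng = np.random.default_rng(5)
K = 4                                    # output alphabet size (blank is index K)
f_t = rng.normal(size=8)                 # transcription-network state at input frame t
g_u = rng.normal(size=8)                 # prediction-network state after u output labels
W_f = rng.normal(size=(K + 1, 8))
W_g = rng.normal(size=(K + 1, 8))
b = np.zeros(K + 1)
print(joint_distribution(f_t, g_u, W_f, W_g, b).round(3))
```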