Open AccessJournal Article
On using monolingual corpora in neural machine translation
Caglar Gulcehre,Orhan Firat,Kelvin Xu,Kyunghyun Cho,Loïc Barrault,Huei-Chi Lin,Fethi Bougares,Holger Schwenk,Yoshua Bengio +8 more
TLDR
This work investigates how to leverage abundant monolingual corpora for neural machine translation to improve results for En-Fr and En-De translation and extends to high resource languages such as Cs-En and De-En.Abstract:
Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hierarchical baseline, we obtain up to $1.96$ BLEU improvement on the low-resource language pair Turkish-English, and $1.59$ BLEU on the focused domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high resource languages such as Cs-En and De-En where we obtain an improvement of $0.39$ and $0.47$ BLEU scores over the neural machine translation baselines, respectively.read more
Citations
More filters
Proceedings ArticleDOI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
TL;DR: This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Swichboard 300h tasks, outperforming all prior work.
Proceedings ArticleDOI
Improving Neural Machine Translation Models with Monolingual Data
TL;DR: The authors used target-side monolingual data for NMT and obtained state-of-the-art performance for several NMT tasks, while only using parallel data for training.
Proceedings Article
Attention-based models for speech recognition
TL;DR: The authors proposed a location-aware attention mechanism for the TIMET phoneme recognition task, which achieved an improved 18.7% phoneme error rate (PER) on utterances which are roughly as long as the ones it was trained on.
Posted Content
Attention-Based Models for Speech Recognition
TL;DR: The attention-mechanism is extended with features needed for speech recognition and a novel and generic method of adding location-awareness to the attention mechanism is proposed to alleviate the issue of high phoneme error rate.
Proceedings ArticleDOI
End-to-end attention-based large vocabulary speech recognition
TL;DR: This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
References
More filters
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Posted Content
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
TL;DR: In this article, the adaptive estimates of lower-order moments are used for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimate of lowerorder moments.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings ArticleDOI
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation
Kyunghyun Cho,Bart van Merriënboer,Caglar Gulcehre,Dzmitry Bahdanau,Fethi Bougares,Holger Schwenk,Yoshua Bengio,Yoshua Bengio,Yoshua Bengio +8 more
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.