An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. We also compare synthetic data with genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT’14 English-German test set.
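The contrast between greedy search and sampling described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the distribution `next_word_probs` and the helpers `greedy_pick` and `sample_pick` are hypothetical, and a real back-translation model would condition each step on the target sentence and the words generated so far.

```python
import random

# Hypothetical next-word distribution from a back-translation model (toy values).
next_word_probs = {"house": 0.6, "home": 0.3, "building": 0.1}


def greedy_pick(dist):
    """Greedy search: always return the single most probable word."""
    return max(dist, key=dist.get)


def sample_pick(dist, rng):
    """Sampling: draw a word in proportion to its probability."""
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)
greedy_outputs = {greedy_pick(next_word_probs) for _ in range(1000)}
sampled_outputs = {sample_pick(next_word_probs, rng) for _ in range(1000)}

# Greedy search yields one distinct output; sampling spreads over the distribution.
print(len(greedy_outputs), len(sampled_outputs))
```

Greedy selection returns the mode every time, so repeated decoding adds no variety to the synthetic source side, whereas sampling also produces the less probable alternatives.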

Understanding Back-Translation at Scale

My Taiwan, Seeing the Homeland of the Heart = 2009 Lin Pang-Soong Art and Design Exhibition

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, achieving better or comparable performance relative to strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix.
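The replacement strategy described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the code from the paper's appendix: it assumes a fixed per-token replacement probability (`replace_prob`), and the function name `switchout_simplified` and the toy vocabularies are hypothetical.

```python
import random


def switchout_simplified(tokens, vocab, replace_prob, rng):
    """Replace each token, with probability replace_prob, by a different
    word drawn uniformly from vocab (simplified assumption)."""
    out = []
    for tok in tokens:
        candidates = [w for w in vocab if w != tok]
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


rng = random.Random(0)
src_vocab = ["the", "cat", "sat", "dog", "ran"]       # toy source vocabulary
tgt_vocab = ["die", "Katze", "saß", "Hund", "lief"]   # toy target vocabulary

# Both sides of a sentence pair are corrupted, each with its own vocabulary.
noisy_src = switchout_simplified(["the", "cat", "sat"], src_vocab, 0.2, rng)
noisy_tgt = switchout_simplified(["die", "Katze", "saß"], tgt_vocab, 0.2, rng)
print(noisy_src, noisy_tgt)
```

Sentence length is preserved because words are only replaced, never inserted or deleted.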

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation in English-Romanian and match performance on English-German, redefining the state-of-the-art on the former.
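A minimal sketch of the tagging idea in this abstract, under stated assumptions: the tag string `<BT>` and the helper names `tag_back_translated` and `build_training_corpus` are hypothetical choices for illustration, and sentence pairs are represented as plain `(source, target)` string tuples.

```python
BT_TAG = "<BT>"  # hypothetical reserved token marking a synthetic source


def tag_back_translated(source_sentence, tag=BT_TAG):
    """Prepend the tag to a back-translated (synthetic) source sentence."""
    return f"{tag} {source_sentence}"


def build_training_corpus(genuine_pairs, back_translated_pairs, tag=BT_TAG):
    """Combine genuine bitext (left untouched) with tagged synthetic pairs."""
    corpus = list(genuine_pairs)
    corpus.extend(
        (tag_back_translated(src, tag), tgt) for src, tgt in back_translated_pairs
    )
    return corpus


genuine = [("ein Haus", "a house")]
synthetic = [("eine Katze", "a cat")]  # source side produced by back-translation
for src, tgt in build_training_corpus(genuine, synthetic):
    print(src, "|||", tgt)
```

Only the synthetic source side is modified; the target side and the genuine pairs stay as they are, so the model can tell the two data sources apart from the first token.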

Tagged Back-Translation

A prerequisite for training corpus-based machine translation (MT) systems – either Statistical MT (SMT) or Neural MT (NMT) – is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be used in combination with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only quite rarely, back-translation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus – both as a separate standalone dataset as well as combined with human-generated parallel data – affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.

Investigating Backtranslation in Neural Machine Translation

In this paper, we provide a preliminary comparison of statistical machine translation (SMT) and neural machine translation (NMT) for English→Irish in the fixed domain of public administration. We discuss the challenges for SMT and NMT of a less-resourced language such as Irish, and show that while an out-of-the-box NMT system may not fare quite as well as our tailor-made domain-specific SMT system, the future may still be promising for EN→GA NMT.

SMT versus NMT: Preliminary comparisons for Irish

Data selection is the process of selecting a subset of parallel data for training machine translation (MT) systems, so that 1) resources for training might be reduced, 2) trained models could perform better than those trained with the whole corpus, and/or 3) trained models are more tailored to specific domains. It has been shown that for statistical MT (SMT), data selection improves MT performance significantly. In this study, we reviewed three data selection approaches for MT, namely Term Frequency–Inverse Document Frequency, Cross-Entropy Difference, and Feature Decay Algorithm, and conducted experiments on Neural Machine Translation (NMT) with the data selected by each of the three approaches. The results showed that for NMT systems, data selection also improved performance, though the gain was not as large as for SMT systems.
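As an illustration of one of the three approaches named above, cross-entropy difference scores a sentence by its per-word cross-entropy under an in-domain language model minus its cross-entropy under a general-domain one, with lower scores indicating more in-domain text. The sketch below assumes add-one-smoothed unigram models and toy corpora purely for brevity; the function names are hypothetical and the study's actual language models are not specified here.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (word counts, total token count) for a whitespace-tokenised corpus."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    tokens = sentence.split()
    log_prob = sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return -log_prob / max(len(tokens), 1)


def cross_entropy_difference(sentence, in_domain, general, vocab_size):
    """Lower values mean the sentence looks more like the in-domain corpus."""
    return (cross_entropy(sentence, in_domain, vocab_size)
            - cross_entropy(sentence, general, vocab_size))


in_domain_corpus = ["the patient received the drug", "the drug reduced the fever"]
general_corpus = ["the team won the match", "the match ended in a draw",
                  "the weather was cold today"]
vocab = {t for s in in_domain_corpus + general_corpus for t in s.split()}
vocab_size = len(vocab) + 1  # one extra slot for unseen words

in_lm = train_unigram(in_domain_corpus)
gen_lm = train_unigram(general_corpus)
candidates = ["the team won the match", "the drug helped the patient"]
ranked = sorted(candidates,
                key=lambda s: cross_entropy_difference(s, in_lm, gen_lm, vocab_size))
print(ranked)  # most in-domain candidate first
```

Selecting the top-ranked sentences then yields the in-domain training subset.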

Extracting In-domain Training Corpora for Neural Machine Translation Using Data Selection Methods

Neural Machine Translation (NMT) systems require a lot of data to be competitive. For this reason, data selection techniques are used only for fine-tuning systems that have been trained with larger amounts of data. In this work we aim to use Feature Decay Algorithms (FDA) data selection techniques not only to fine-tune a system but also to build a complete system with less data. Our findings reveal that it is possible to find a subset of sentence pairs that outperforms the full training corpus by 1.11 BLEU points when used for training a German-English NMT system.
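The general idea behind feature decay selection can be sketched as a greedy loop in which each selected sentence lowers the value of the features it covers, so later picks favour features that are still uncovered. The version below is a simplified assumption rather than the configuration used in this work: it uses unigram features taken from a small seed set, an initial value of 1.0 per feature, and an exponential decay factor of 0.5. The name `fda_select` and all data are hypothetical.

```python
def fda_select(candidates, seed_sentences, num_select, decay=0.5):
    """Greedily pick sentences whose tokens best cover the seed features,
    decaying a feature's value each time a selected sentence contains it."""
    value = {tok: 1.0 for s in seed_sentences for tok in s.split()}

    def score(sentence):
        # Length-normalised sum of the current feature values.
        tokens = sentence.split()
        return sum(value.get(t, 0.0) for t in tokens) / max(len(tokens), 1)

    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < num_select:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        for tok in set(best.split()):
            if tok in value:
                value[tok] *= decay  # covered features become less valuable
    return selected


seed = ["the bank raised interest rates"]
pool = ["the bank raised rates", "the bank raised rates again",
        "interest rates fell", "cats sleep all day"]
print(fda_select(pool, seed, num_select=2))
```

In this toy run the near-duplicate second sentence is passed over in favour of the one that covers the still-unseen feature "interest", which is the behaviour the decay is meant to encourage.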

Alberto Poncelas

Papers

Investigating Backtranslation in Neural Machine Translation

SMT versus NMT: Preliminary comparisons for Irish

Extracting In-domain Training Corpora for Neural Machine Translation Using Data Selection Methods

Feature Decay Algorithms for Neural Machine Translation