
Showing papers by "Rico Sennrich published in 2017"


Proceedings ArticleDOI
07 Apr 2017
TL;DR: Nematus is a toolkit for Neural Machine Translation that prioritizes high translation accuracy, usability, and extensibility and was used to build top-performing submissions to shared translation tasks at WMT and IWSLT.
Abstract: We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

368 citations


Proceedings ArticleDOI
07 Apr 2017
TL;DR: This paper revisits bilingual pivoting in the context of neural machine translation and presents a paraphrasing model based purely on neural networks, which represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input.
Abstract: Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by “pivoting” over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches.

246 citations
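The bilingual pivoting that this model builds on can be summarized as marginalizing over translations in a shared pivot language: p(e2 | e1) ≈ Σ_f p(e2 | f) · p(f | e1). A minimal sketch of that computation over a k-best list, assuming hypothetical `en_to_fr.kbest` and `fr_to_en.score` NMT model interfaces (the paper's actual contribution is a single neural paraphrasing model trained via pivoting, not this two-step scoring):

```python
import math

def paraphrase_score(e1, e2, en_to_fr, fr_to_en, k=10):
    """Approximate log p(e2 | e1) by pivoting through a k-best list of
    French translations f of e1. kbest() is assumed to yield
    (translation, log-probability) pairs; score() a log-probability."""
    total = 0.0
    for f, logp_f in en_to_fr.kbest(e1, k):
        total += math.exp(logp_f + fr_to_en.score(f, e2))
    return math.log(total) if total > 0 else float('-inf')
```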


Posted Content
TL;DR: The University of Edinburgh's submissions to the WMT17 shared news translation and biomedical translation tasks are described; novelties this year include the use of deep architectures, layer normalization, and more compact models due to weight tying and improvements in BPE segmentations.
Abstract: This paper describes the University of Edinburgh's submissions to the WMT17 shared news translation and biomedical translation tasks. We participated in 12 translation directions for news, translating between English and Czech, German, Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted systems for English to Czech, German, Polish and Romanian. Our systems are neural machine translation systems trained with Nematus, an attentional encoder-decoder. We follow our setup from last year and build BPE-based models with parallel and back-translated monolingual training data. Novelties this year include the use of deep architectures, layer normalization, and more compact models due to weight tying and improvements in BPE segmentations. We perform extensive ablative experiments, reporting on the effectiveness of layer normalization, deep architectures, and different ensembling techniques.

145 citations
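The BPE segmentation mentioned here is built by repeatedly merging the most frequent adjacent symbol pair in a character-split vocabulary. A toy learner following the algorithm published in Sennrich et al. (2016); the vocabulary and merge count are illustrative:

```python
import collections
import re

def get_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, v_in):
    """Merge every occurrence of the given symbol pair into one symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in v_in.items()}

# toy vocabulary: words split into characters, with an end-of-word marker
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(10):  # number of merge operations (illustrative)
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
    print(best)
```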


Proceedings ArticleDOI
08 Sep 2017
TL;DR: This paper uses an attentional encoder-decoder architecture for the WMT17 shared news translation and biomedical translation tasks, and reports extensive ablative experiments on the effectiveness of layer normalization, deep architectures, and different ensembling techniques.
Abstract: This paper describes the University of Edinburgh's submissions to the WMT17 shared news translation and biomedical translation tasks. We participated in 12 translation directions for news, translating between English and Czech, German, Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted systems for English to Czech, German, Polish and Romanian. Our systems are neural machine translation systems trained with Nematus, an attentional encoder-decoder. We follow our setup from last year and build BPE-based models with parallel and back-translated monolingual training data. Novelties this year include the use of deep architectures, layer normalization, and more compact models due to weight tying and improvements in BPE segmentations. We perform extensive ablative experiments, reporting on the effectiveness of layer normalization, deep architectures, and different ensembling techniques.

138 citations


Proceedings ArticleDOI
08 Sep 2017
TL;DR: While a baseline NMT system disambiguates frequent word senses quite reliably, the annotation with both sense labels and lexical chains improves the neural models’ performance on rare word senses.
Abstract: Word sense disambiguation is necessary in translation because different word senses often have different translations. Neural machine translation models learn different senses of words as part of an end-to-end translation task, and their capability to perform word sense disambiguation has so far not been quantified. We exploit the fact that neural translation models can score arbitrary translations to design a novel cross-lingual word sense disambiguation task that is tailored towards evaluating neural machine translation models. We present a test set of 7,200 lexical ambiguities for German → English, and 6,700 for German → French, and report baseline results. With 70% of lexical ambiguities correctly disambiguated, we find that word sense disambiguation remains a challenging problem for neural machine translation, especially for rare word senses. To improve word sense disambiguation in neural machine translation, we experiment with two methods to integrate sense embeddings. In a first approach we pass sense embeddings as additional input to the neural machine translation system. For the second experiment, we extract lexical chains based on sense embeddings from the document and integrate this information into the NMT model. While a baseline NMT system disambiguates frequent word senses quite reliably, the annotation with both sense labels and lexical chains improves the neural models’ performance on rare word senses.

111 citations
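The evaluation exploits the property that an NMT model can score any candidate translation. A minimal sketch of the accuracy computation, assuming a hypothetical `model.score(source, translation)` that returns a log-probability:

```python
def wsd_accuracy(model, testset):
    """testset: list of (source, reference, contrastive) triples, where the
    contrastive translation replaces the ambiguous word's translation with a
    wrong sense. A sense counts as correctly disambiguated when the model
    scores the reference above the contrastive variant."""
    correct = sum(1 for src, ref, contrastive in testset
                  if model.score(src, ref) > model.score(src, contrastive))
    return correct / len(testset)
```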


Proceedings ArticleDOI
01 Apr 2017
TL;DR: This article proposes a novel method to assess how well NMT systems model specific linguistic phenomena such as agreement over long distances, the production of novel words, and the faithful translation of polarity.
Abstract: Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming. Neural machine translation has the attractive property that it can produce scores for arbitrary translations, and we propose a novel method to assess how well NMT systems model specific linguistic phenomena such as agreement over long distances, the production of novel words, and the faithful translation of polarity. The core idea is that we measure whether a reference translation is more probable under an NMT model than a contrastive translation which introduces a specific type of error. We present LingEval97, a large-scale data set of 97,000 contrastive translation pairs based on the WMT English->German translation task, with errors automatically created with simple rules. We report results for a number of systems, and find that recently introduced character-level NMT systems perform better at transliteration than models with byte-pair encoding (BPE) segmentation, but perform more poorly at morphosyntactic agreement and at translating discontiguous units of meaning.

99 citations
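The contrastive translations are derived from the reference with simple automatic rules. A hedged sketch of one such rule for polarity errors (the data set's actual rules cover more error types and are more careful than this):

```python
def polarity_contrastive(reference_tokens):
    """Introduce a polarity error by deleting the German negation particle
    'nicht'; returns None when the rule does not apply."""
    if 'nicht' not in reference_tokens:
        return None
    tokens = list(reference_tokens)
    tokens.remove('nicht')  # removes the first occurrence only
    return tokens
```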


Proceedings ArticleDOI
08 Sep 2017
TL;DR: This paper proposes a novel BiDeep RNN architecture that combines deep transition RNNs and stacked RNNs to increase model depth, and obtains the best improvements on the English-to-German WMT news translation dataset.
Abstract: It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.

98 citations
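The BiDeep architecture composes the two notions of depth: each layer of a stacked RNN is itself a deep transition cell applying several GRU transitions per timestep. A minimal PyTorch sketch of this composition, not the Nematus implementation (depths and sizes are illustrative):

```python
import torch
import torch.nn as nn

class DeepTransitionCell(nn.Module):
    """One recurrent step built from several GRU transitions: the first
    transition sees the external input, the rest only refine the state."""
    def __init__(self, input_size, hidden_size, transition_depth=2):
        super().__init__()
        self.first = nn.GRUCell(input_size, hidden_size)
        self.rest = nn.ModuleList(nn.GRUCell(hidden_size, hidden_size)
                                  for _ in range(transition_depth - 1))

    def forward(self, x, h):
        h = self.first(x, h)
        for cell in self.rest:
            # transition cells receive no external input; feed zeros
            h = cell(h.new_zeros(h.size(0), cell.input_size), h)
        return h

class BiDeepRNN(nn.Module):
    """Stack of deep transition cells: combined depth = stack x transition."""
    def __init__(self, input_size, hidden_size, stack_depth=2, transition_depth=2):
        super().__init__()
        sizes = [input_size] + [hidden_size] * (stack_depth - 1)
        self.layers = nn.ModuleList(
            DeepTransitionCell(s, hidden_size, transition_depth) for s in sizes)
        self.hidden_size = hidden_size

    def forward(self, inputs):  # inputs: (seq_len, batch, input_size)
        states = [inputs.new_zeros(inputs.size(1), self.hidden_size)
                  for _ in self.layers]
        outputs = []
        for x in inputs:  # one timestep at a time
            for i, layer in enumerate(self.layers):
                states[i] = layer(x if i == 0 else states[i - 1], states[i])
            outputs.append(states[-1])
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```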


Proceedings ArticleDOI
11 Sep 2017
TL;DR: In this paper, the authors investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset.
Abstract: We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2-regularization towards an out-of-domain prior. In addition, we introduce tuneout, a novel regularization technique inspired by dropout. We apply these techniques, alone and in combination, to neural machine translation, obtaining improvements on IWSLT datasets for English→German and English→Russian. We also investigate the amounts of in-domain training data needed for domain adaptation in NMT, and find a logarithmic relationship between the amount of training data and gain in BLEU score.

89 citations
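Among the regularizers discussed, L2-regularization towards an out-of-domain prior is the simplest to state: instead of penalizing parameter magnitude, it penalizes distance from the out-of-domain model's parameters. A minimal PyTorch-style sketch (the penalty weight is illustrative; tuneout itself instead applies dropout to the parameter difference and is not implemented here):

```python
def l2_to_prior(model, prior_params, weight=1e-4):
    """L2 penalty towards the out-of-domain parameters prior_params (a list
    aligned with model.parameters()), added to the in-domain training loss."""
    penalty = sum(((p - p0.detach()) ** 2).sum()
                  for p, p0 in zip(model.parameters(), prior_params))
    return weight * penalty
```

During fine-tuning, the total loss is the in-domain cross-entropy plus this penalty, which pulls the adapted model back towards the general-domain solution.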


Posted Content
TL;DR: A model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding is proposed.
Abstract: In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.

68 citations
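A pairwise ranking loss of the general kind described here pushes matching image-sentence pairs above mismatched ones by a margin, in both retrieval directions. A symmetric max-margin sketch in PyTorch over a batch similarity matrix (the paper's asymmetric variant differs; the margin is illustrative):

```python
import torch

def symmetric_ranking_loss(sim, margin=0.1):
    """sim: (n, n) image-sentence similarity matrix with matching pairs on
    the diagonal. Penalizes any negative scored within `margin` of its
    positive, for both sentence retrieval and image retrieval."""
    n = sim.size(0)
    pos = sim.diag().view(n, 1)
    cost_sent = (margin + sim - pos).clamp(min=0)     # rank sentences per image
    cost_img = (margin + sim - pos.t()).clamp(min=0)  # rank images per sentence
    mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    return (cost_sent.masked_fill(mask, 0).sum()
            + cost_img.masked_fill(mask, 0).sum()) / n
```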


Proceedings ArticleDOI
11 Sep 2017
TL;DR: This article proposed a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding.
Abstract: In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.

59 citations


Proceedings ArticleDOI
11 Sep 2017
TL;DR: The authors introduce syntactic information in the form of CCG supertags in the decoder by interleaving the target supertags with the word sequence, which improves machine translation quality more than multitask training.
Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German→English, a high-resource pair, and for Romanian→English, a low-resource pair, as well as several syntactic phenomena including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target-syntax with adding source-side dependency labels in the embedding layer, we obtain a total improvement of 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
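The interleaving scheme amounts to simple target-side preprocessing: each word is paired with its CCG supertag in a single output sequence, so one decoder predicts both. A sketch (tag-before-word order is an assumption here):

```python
def interleave_supertags(words, supertags):
    """Turn parallel word and supertag sequences into one interleaved target
    sequence, e.g. ['NP/N', 'the', 'N', 'cat'] for 'the cat'."""
    assert len(words) == len(supertags)
    out = []
    for tag, word in zip(supertags, words):
        out.extend([tag, word])
    return out
```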

Posted Content
TL;DR: In this article, a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings ("docstrings"), generated by scraping open source repositories on GitHub, is introduced, along with baseline results for the code documentation and code generation tasks obtained by neural machine translation.
Abstract: Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings ("docstrings") generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained by neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas.
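Pairs of this kind can be extracted from Python sources with the standard library's ast module. A minimal sketch (the paper's actual scraping and filtering pipeline is more involved):

```python
import ast

def extract_pairs(source):
    """Yield (function_source, docstring) pairs from one Python file's text.
    Requires Python 3.8+ for ast.get_source_segment."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield ast.get_source_segment(source, node), doc
```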

Posted Content
TL;DR: This work introduces syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence, and shows that explicitly modeling target-syntax improves machine translation quality for German->English and for Romanian->English.
Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair, as well as several syntactic phenomena including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target-syntax with adding source-side dependency labels in the embedding layer, we obtain a total improvement of 0.9 BLEU for German->English and 1.2 BLEU for Romanian->English.

Proceedings Article
07 Jul 2017
TL;DR: A large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (“docstrings”) generated by scraping open source repositories on GitHub is introduced.
Abstract: Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (“docstrings”) generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained by neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas.

11 Sep 2017
TL;DR: Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation, but show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths.
Abstract: This paper reports on a comparative evaluation of phrase-based statistical machine translation (PBSMT) and neural machine translation (NMT) for four language pairs, using the PET interface to compare educational domain output from both systems using a variety of metrics, including automatic evaluation as well as human rankings of adequacy and fluency, error-type markup, and post-editing (technical and temporal) effort, performed by professional translators. Our results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved in NMT compared to PBSMT. This evaluation was conducted as part of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality machine translation of educational data.

07 Apr 2017
TL;DR: This work presents LingEval97, a large-scale data set of 97,000 contrastive translation pairs based on the WMT English->German translation task, with errors automatically created with simple rules, and finds that recently introduced character-level NMT systems perform better at transliteration than models with byte-pair encoding (BPE) segmentation, but perform more poorly at morphosyntactic agreement and at translating discontiguous units of meaning.
Abstract: Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming. Neural machine translation has the attractive property that it can produce scores for arbitrary translations, and we propose a novel method to assess how well NMT systems model specific linguistic phenomena such as agreement over long distances, the production of novel words, and the faithful translation of polarity. The core idea is that we measure whether a reference translation is more probable under an NMT model than a contrastive translation which introduces a specific type of error. We present LingEval97, a large-scale data set of 97,000 contrastive translation pairs based on the WMT English->German translation task, with errors automatically created with simple rules. We report results for a number of systems, and find that recently introduced character-level NMT systems perform better at transliteration than models with byte-pair encoding (BPE) segmentation, but perform more poorly at morphosyntactic agreement and at translating discontiguous units of meaning.

Posted Content
03 Feb 2017
TL;DR: This work introduces syntactic information in the form of CCG supertags either in the source as an extra feature in the embedding, or in the target, by interleaving the target supertags with the word sequence.
Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling source or target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags either in the source as an extra feature in the embedding, or in the target, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling syntax improves machine translation quality for English-German, a high-resource pair, and for English-Romanian, a low-resource pair, as well as several syntactic phenomena including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training.

Posted Content
TL;DR: This work investigates techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset, and introduces tuneout, a novel regularization technique inspired by dropout.
Abstract: We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2-regularization towards an out-of-domain prior. In addition, we introduce tuneout, a novel regularization technique inspired by dropout. We apply these techniques, alone and in combination, to neural machine translation, obtaining improvements on IWSLT datasets for English->German and English->Russian. We also investigate the amounts of in-domain training data needed for domain adaptation in NMT, and find a logarithmic relationship between the amount of training data and gain in BLEU score.

Posted Content
TL;DR: Nematus, as discussed by the authors, is a toolkit for Neural Machine Translation that prioritizes high translation accuracy, usability, and extensibility, and has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT.
Abstract: We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

Posted Content
TL;DR: A new evaluation method based on an idiom-specific blacklist of literal translations is introduced, motivated by the insight that the occurrence of any blacklisted words in the translation output indicates a likely translation error.
Abstract: Idiom translation is a challenging problem in machine translation because the meaning of idioms is non-compositional, and a literal (word-by-word) translation is likely to be wrong. In this paper, we focus on evaluating the quality of idiom translation of MT systems. We introduce a new evaluation method based on an idiom-specific blacklist of literal translations, motivated by the insight that the occurrence of any blacklisted words in the translation output indicates a likely translation error. We introduce a dataset, CIBB (Chinese Idioms Blacklists Bank), and perform an evaluation of a state-of-the-art Chinese-English neural MT system. Our evaluation confirms that a sizable number of idioms in our test set are mistranslated (46.1%), that literal translation error is a common error type, and that our blacklist method is effective at identifying literal translation errors.
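The blacklist method reduces to a membership test per idiom. A sketch (tokenization and casing are simplified relative to the paper's setup):

```python
def has_literal_translation_error(output, blacklist):
    """Return True if any blacklisted literal translation of the idiom
    appears in the MT output, signalling a likely translation error."""
    tokens = set(output.lower().split())
    return any(word.lower() in tokens for word in blacklist)
```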

Posted Content
TL;DR: This work describes and evaluates several existing approaches to introduce depth in neural machine translation, and introduces a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs.
Abstract: It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.

Proceedings Article
21 Nov 2017
TL;DR: The authors introduce a new evaluation method based on an idiom-specific blacklist of literal translations, motivated by the insight that the occurrence of any blacklisted words in the translation output indicates a likely translation error.
Abstract: Idiom translation is a challenging problem in machine translation because the meaning of idioms is non-compositional, and a literal (word-by-word) translation is likely to be wrong. In this paper, we focus on evaluating the quality of idiom translation of MT systems. We introduce a new evaluation method based on an idiom-specific blacklist of literal translations, motivated by the insight that the occurrence of any blacklisted words in the translation output indicates a likely translation error. We introduce a dataset, CIBB (Chinese Idioms Blacklists Bank), and perform an evaluation of a state-of-the-art Chinese-English neural MT system. Our evaluation confirms that a sizable number of idioms in our test set are mistranslated (46.1%), that literal translation error is a common error type, and that our blacklist method is effective at identifying literal translation errors.

Proceedings ArticleDOI
07 Apr 2017
TL;DR: The first prototype of the SUMMA Platform is presented: an integrated platform for multilingual media monitoring that contains a rich suite of low-level and high-level natural language processing technologies.
Abstract: We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

Posted Content
TL;DR: The authors investigated the performance of multi-encoder NMT models trained on subtitles for English to French and found that decoding the concatenation of the previous and current sentence leads to good performance.
Abstract: For machine translation to tackle discourse phenomena, models must have access to extra-sentential linguistic context. There has been recent interest in modelling context in neural machine translation (NMT), but models have been principally evaluated with standard automatic metrics, poorly adapted to evaluating discourse phenomena. In this article, we present hand-crafted, discourse test sets, designed to test the models' ability to exploit previous source and target sentences. We investigate the performance of recently proposed multi-encoder NMT models trained on subtitles for English to French. We also explore a novel way of exploiting context from the previous sentence. Despite gains using BLEU, multi-encoder models give limited improvement in the handling of discourse phenomena: 50% accuracy on our coreference test set and 53.5% for coherence/cohesion (compared to a non-contextual baseline of 50%). A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance, and our novel strategy of multi-encoding and decoding of two sentences leads to the best performance (72.5% for coreference and 57% for coherence/cohesion), highlighting the importance of target-side context.
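The concatenation strategy that performs well here is pure preprocessing: prepend the previous source sentence, separated by a special token, and keep only the current sentence's translation after decoding. A sketch (the separator token name is hypothetical):

```python
BREAK = '<concat>'  # hypothetical sentence-boundary token

def build_contextual_source(prev_sentence, cur_sentence):
    """'2-to-1' decoding input: previous plus current source sentence; the
    model is trained to produce only the current sentence's translation
    (in the 2-to-2 variant, the part after the predicted boundary token
    is kept instead)."""
    return f'{prev_sentence} {BREAK} {cur_sentence}'
```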


01 Dec 2017
TL;DR: This paper describes the joint submission of Samsung Research and Development, Warsaw, Poland, and the University of Edinburgh to the IWSLT MT task for TED talks, and demonstrates the effectiveness of the different techniques that were applied via ablation studies.

01 May 2017
TL;DR: A comparative human evaluation of phrase-based SMT and NMT output in the educational domain, across four language pairs and a variety of metrics, shows a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths.
Abstract: Massive open online courses have been growing rapidly in size and impact. TraMOOC aims at developing high-quality translation of all types of text genre included in MOOCs from English into eleven European and BRIC languages that are hard to translate into and have weak MT support. In TraMOOC, we have developed machine translation prototypes for 11 target languages, from English into German, Italian, Portuguese, Dutch, Bulgarian, Greek, Polish, Czech, Croatian, Russian, and Chinese. The translation systems are based on phrase-based SMT and neural machine translation. The latter has achieved state-of-the-art performance in recent evaluation campaigns (Bojar, 2016). We use the Nematus toolkit (Sennrich, 2017) for training; the translation server is based on the amuNMT toolkit (Junczys-Dowmunt et al., 2016). The translation systems have been adapted to MOOC texts via fine-tuning of the model parameters on in-domain training data to maximize translation quality on this domain. We have also completed a comparative human evaluation of phrase-based SMT and NMT for four language pairs to compare educational domain output from both systems using a variety of metrics. These include automatic evaluation, human rankings of adequacy and fluency, error-type markup, and technical and temporal post-editing effort. The results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. However, results are mixed for some error categories. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved when using NMT in this study, suggesting that NMT may not show an enormous improvement over SMT when used in a production scenario. We have subsequently prepared data and a slightly amended quality evaluation methodology to apply to all TraMOOC NMT systems later in 2017. (TraMOOC is an H2020 Innovation Action project funded by the European Commission (H2020-ICT-2014-1-ICT-172014/644333), running from February 2015 to February 2018; for more details, visit http://www.tramooc.eu.)

Proceedings Article
01 Apr 2017
TL;DR: This tutorial will cover a basic theoretical introduction to NMT, discuss the components of state-of-the-art systems, and provide practical advice for building NMT systems.
Abstract: Neural Machine Translation (NMT) has achieved new breakthroughs in machine translation in recent years. It has dominated recent shared translation tasks in machine translation research, and is also being quickly adopted in industry. The technical differences between NMT and the previously dominant phrase-based statistical approach require that practitioners learn new best practices for building MT systems, ranging from different hardware requirements, new techniques for handling rare words and monolingual data, to new opportunities in continued learning and domain adaptation. This tutorial is aimed at researchers and users of machine translation interested in working with NMT. The tutorial will cover a basic theoretical introduction to NMT, discuss the components of state-of-the-art systems, and provide practical advice for building NMT systems.