
Showing papers on "Rule-based machine translation" published in 2017


Journal ArticleDOI
TL;DR: This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.
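The only preprocessing change this approach requires can be sketched in a few lines; the helper name below is illustrative, and the "<2xx>" token format follows the convention described in the abstract.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token (e.g. "<2fr>") telling the shared multilingual
    model which target language to produce; the rest of the NMT pipeline is unchanged."""
    return f"<2{target_lang}> {source_sentence}"

# The same English sentence routed to French or to German by the single model.
print(add_target_token("Hello, how are you?", "fr"))  # "<2fr> Hello, how are you?"
print(add_target_token("Hello, how are you?", "de"))  # "<2de> Hello, how are you?"
```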

1,288 citations


Journal ArticleDOI
TL;DR: A neural machine translation model that maps a source character sequence to a target character sequence without any segmentation is introduced, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities.
Abstract: Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of BLEU score and human judgment.
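A minimal sketch of the encoder idea described above: embed characters, run a 1-D convolution, and max-pool over time so the source representation is shorter and training speed stays comparable to subword models. Layer sizes and kernel widths here are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """Toy character-level encoder: character embeddings, a 1-D convolution,
    then max-pooling over time to shorten the source representation."""
    def __init__(self, n_chars=200, emb=64, hidden=128, kernel=5, pool=4):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, hidden, kernel_size=kernel, padding=kernel // 2)
        self.pool = nn.MaxPool1d(pool)  # reduces the sequence length ~4x

    def forward(self, char_ids):                    # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)    # (batch, emb, seq_len)
        x = torch.relu(self.conv(x))
        return self.pool(x).transpose(1, 2)         # (batch, seq_len // pool, hidden)

enc = CharConvEncoder()
out = enc(torch.randint(0, 200, (2, 100)))          # two sentences of 100 characters
print(out.shape)                                    # torch.Size([2, 25, 128])
```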

445 citations


Proceedings ArticleDOI
TL;DR: This article proposes a data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts, which improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.
Abstract: The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
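A much-simplified sketch of the augmentation idea: place a rare word (and its translation) into an existing sentence pair at an aligned position, producing a new synthetic pair. The real method additionally uses language models to pick positions where the substitution remains plausible; the function and data below are illustrative only.

```python
import random

def augment_rare(src_tokens, tgt_tokens, alignment, rare_pairs, p=0.5):
    """Toy targeted augmentation: with probability p, replace one aligned
    (src, tgt) word pair with a rare translation pair, yielding a synthetic
    sentence pair that places the rare word in a fresh context.
    `alignment` maps source positions to target positions;
    `rare_pairs` is a list of (rare_src_word, rare_tgt_word) tuples."""
    src, tgt = list(src_tokens), list(tgt_tokens)
    if rare_pairs and alignment and random.random() < p:
        i, j = random.choice(list(alignment.items()))
        src[i], tgt[j] = random.choice(rare_pairs)
    return src, tgt

new_src, new_tgt = augment_rare(
    ["the", "cat", "sleeps"], ["le", "chat", "dort"],
    alignment={1: 1}, rare_pairs=[("otter", "loutre")], p=1.0)
print(new_src, new_tgt)   # ['the', 'otter', 'sleeps'] ['le', 'loutre', 'dort']
```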

181 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: A neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment is proposed.
Abstract: Translating in real-time, a.k.a.simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.
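The core control flow is an agent that alternates READ (consume the next source token) and WRITE (emit a target token from the pre-trained NMT model). The sketch below uses a fixed buffering heuristic in place of the paper's learned agent, and `translate_prefix` is a stand-in for the NMT environment; both are assumptions for illustration.

```python
def simultaneous_translate(source_tokens, translate_prefix, wait=3):
    """Toy READ/WRITE loop for simultaneous translation. `translate_prefix`
    stands in for a pre-trained NMT model that, given the source read so far
    and the target produced so far, returns the next target token (or None).
    The paper learns when to READ vs. WRITE; here a fixed heuristic WRITEs
    once `wait` unconsumed source tokens are buffered."""
    read, output, i = [], [], 0
    while True:
        if i < len(source_tokens) and len(read) - len(output) < wait:
            read.append(source_tokens[i]); i += 1            # READ action
        else:
            token = translate_prefix(read, output)           # WRITE action
            if token is None:                                 # model wants more context
                if i < len(source_tokens):
                    read.append(source_tokens[i]); i += 1     # fall back to READ
                else:
                    break                                     # source exhausted: stop
            else:
                output.append(token)
    return output

# Dummy stand-in model: "translate" by echoing source tokens one at a time.
dummy = lambda read, out: read[len(out)] if len(out) < len(read) else None
print(simultaneous_translate(["a", "b", "c", "d"], dummy, wait=2))  # ['a', 'b', 'c', 'd']
```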

168 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: By training grammars without nonterminal labels, it is found that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.
Abstract: Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted head rules, albeit with some important differences). By training grammars without nonterminal labels, we find that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.

166 citations


Journal ArticleDOI
TL;DR: Based on the syntax and semantics of virtual linguistic terms (VLTs), this work suggests that VLTs could be a possible alternative for solving some current challenges of qualitative information fusion in decision making.

162 citations


Proceedings ArticleDOI
12 Feb 2017
TL;DR: The authors proposed a hybrid model, called NMT+RNNG, that learns to parse and translate by combining the recurrent neural network grammar into the attention-based neural machine translation.
Abstract: There has been relatively little attention to incorporating linguistic prior to neural machine translation. Much of the previous work was further constrained to considering linguistic prior on the source side. In this paper, we propose a hybrid model, called NMT+RNNG, that learns to parse and translate by combining the recurrent neural network grammar into the attention-based neural machine translation. Our approach encourages the neural machine translation model to incorporate linguistic prior during training, and lets it translate on its own afterward. Extensive experiments with four language pairs show the effectiveness of the proposed NMT+RNNG.

159 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work builds a globally optimized neural model for end-to-end relation extraction, proposing novel LSTM features in order to better learn context representations, and presents a novel method to integrate syntactic information to facilitate global learning.
Abstract: Neural networks have shown promising results for relation extraction. State-of-the-art models cast the task as an end-to-end problem, solved incrementally using a local classifier. Yet previous work using statistical models have demonstrated that global optimization can achieve better performances compared to local classification. We build a globally optimized neural model for end-to-end relation extraction, proposing novel LSTM features in order to better learn context representations. In addition, we present a novel method to integrate syntactic information to facilitate global learning, yet requiring little background on syntactic grammars thus being easy to extend. Experimental results show that our proposed model is highly effective, achieving the best performances on two standard benchmarks.

155 citations


Posted Content
TL;DR: The authors take a step towards generating natural language with a GAN objective alone, presenting quantitative results on generating sentences from context-free and probabilistic context-free grammars and qualitative language modeling results, while noting that adversarial text generation is not yet commensurate with the progress made in generating images and still lags far behind likelihood-based methods.
Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.
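One simple reading of "no gradient estimators" is that the generator emits a sequence of softmax distributions over the vocabulary and the discriminator consumes those continuous vectors directly, so ordinary backpropagation works end to end. The sketch below illustrates that wiring only; architecture sizes and training details are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

vocab, seq_len, noise_dim = 1000, 20, 64

class Generator(nn.Module):
    """Maps noise to per-position softmax distributions over the vocabulary;
    because the output stays continuous, gradients flow back from the
    discriminator without REINFORCE-style estimators."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, seq_len * vocab))
    def forward(self, z):
        logits = self.net(z).view(-1, seq_len, vocab)
        return torch.softmax(logits, dim=-1)

class Discriminator(nn.Module):
    """Scores a sequence of vocabulary distributions (real sentences would be
    fed as one-hot vectors, generated ones as soft distributions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(seq_len * vocab, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

g, d = Generator(), Discriminator()
fake = g(torch.randn(4, noise_dim))
print(d(fake).shape)   # torch.Size([4, 1])
```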

155 citations


Proceedings ArticleDOI
01 Dec 2017
TL;DR: This paper presents a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition, based on the hybrid attention/connectionist temporal classification (CTC) architecture.
Abstract: End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity, which we fully exploit in this paper, to build a monolithic multilingual ASR system with a language-independent neural network architecture. We present a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition. The model is based on our hybrid attention/connectionist temporal classification (CTC) architecture, which has previously been shown to achieve state-of-the-art performance in several ASR benchmarks. Here we augment its set of output symbols to include the union of character sets appearing in all the target languages. These include Roman and Cyrillic alphabets, Arabic numbers, simplified Chinese, and Japanese Kanji/Hiragana/Katakana characters (5,500 characters in all). This allows training of a single multilingual model, whose parameters are shared across all the languages. The model can jointly identify the language and recognize the speech, automatically formatting the recognized text in the appropriate character set. The experiments, which used speech databases composed of Wall Street Journal (English), Corpus of Spontaneous Japanese, HKUST Mandarin CTS, and Voxforge (German, Spanish, French, Italian, Dutch, Portuguese, Russian), demonstrate comparable/superior performance relative to language-dependent end-to-end ASR systems.
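The vocabulary-construction step is simple enough to sketch: take the union of the character inventories of all training languages (plus special symbols) so a single output layer covers every language. The special tokens and toy corpora below are illustrative assumptions, not the paper's exact symbol set.

```python
def build_multilingual_charset(corpora, specials=("<blank>", "<sos>", "<eos>")):
    """Union of character inventories across languages, for the shared grapheme
    output layer of a single multilingual end-to-end ASR model."""
    charset = set()
    for lang, sentences in corpora.items():
        for sentence in sentences:
            charset.update(sentence)
    # Deterministic index assignment: specials first, then sorted characters.
    symbols = list(specials) + sorted(charset)
    return {sym: idx for idx, sym in enumerate(symbols)}

corpora = {"en": ["hello world"], "ja": ["こんにちは"], "ru": ["привет"]}
vocab = build_multilingual_charset(corpora)
print(len(vocab))   # number of shared output symbols
```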

139 citations


Posted Content
TL;DR: A neural machine translation architecture that models the surrounding text in addition to the source sentence leads to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus.
Abstract: We propose a neural machine translation architecture that models the surrounding text in addition to the source sentence. These models lead to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus. We also discover that attention-based neural machine translation is well suited for pronoun prediction and compares favorably with other approaches that were specifically designed for this task.

Journal ArticleDOI
TL;DR: This work combines scores from a neural language model trained only on target-side monolingual data with a neural machine translation model, and also fuses the hidden states of the two models, obtaining up to 2 BLEU improvement over hierarchical and phrase-based baselines on the low-resource language pair Turkish–English.
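The hidden-state fusion needs access to both networks' internals, but the score-combination half of the idea (often called shallow fusion) can be sketched directly: at each decoding step the translation and language-model log-probabilities are interpolated before expanding the beam. The weight β and the toy distributions below are illustrative, not values from the paper.

```python
import numpy as np

def shallow_fusion_step(nmt_log_probs, lm_log_probs, beta=0.3):
    """Combine per-token log-probabilities from the translation model and a
    target-side language model trained only on monolingual data; the argmax
    (or beam expansion) is taken over the fused scores."""
    fused = nmt_log_probs + beta * lm_log_probs
    return int(np.argmax(fused)), fused

nmt = np.log(np.array([0.5, 0.3, 0.2]))   # translation model's next-token distribution
lm = np.log(np.array([0.2, 0.6, 0.2]))    # monolingual LM's next-token distribution
print(shallow_fusion_step(nmt, lm)[0])     # index of the fused best token
```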

Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is found that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems.
Abstract: We aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation paradigm. To that end, we conduct a multifaceted evaluation in which we compare outputs produced by state-of-the-art neural machine translation and phrase-based machine translation systems for 9 language directions across a number of dimensions. Specifically, we measure the similarity of the outputs, their fluency and amount of reordering, the effect of sentence length and performance across different error categories. We find out that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems. Neural machine translation systems are also more accurate at producing inflected forms, but they perform poorly when translating very long sentences.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The NMT’s internal embedding of the source sentence is exploited, and sentence embedding similarity is used to select sentences that are close to in-domain data, substantially improving NMT performance.
Abstract: Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.
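A minimal sketch of the selection step, assuming the sentence embeddings have already been extracted from the NMT encoder: score general-domain candidates by cosine similarity to the centroid of the in-domain embeddings and keep the closest ones for adaptation. The function name, `top_k`, and the random data are illustrative.

```python
import numpy as np

def select_pseudo_in_domain(candidate_embs, in_domain_embs, top_k=1000):
    """Rank general-domain sentences by cosine similarity of their sentence
    embeddings to the in-domain centroid and keep the closest ones."""
    centroid = in_domain_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cand @ centroid
    return np.argsort(-scores)[:top_k]

idx = select_pseudo_in_domain(np.random.randn(5000, 512),
                              np.random.randn(200, 512), top_k=100)
print(idx[:5])   # indices of the most in-domain-like candidate sentences
```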

Book ChapterDOI
19 Apr 2017
TL;DR: RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar, and represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.
Abstract: Automatic Machine Learning is a growing area of machine learning that has a similar objective to the area of hyper-heuristics: to automatically recommend optimized pipelines, algorithms or appropriate parameters to specific tasks without much dependency on user knowledge. The background knowledge required to solve the task at hand is actually embedded into a search mechanism that builds personalized solutions to the task. Following this idea, this paper proposes RECIPE (REsilient ClassifIcation Pipeline Evolution), a framework based on grammar-based genetic programming that builds customized classification pipelines. The framework is flexible enough to receive different grammars and can be easily extended to other machine learning tasks. RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, such as generating invalid individuals, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar. The f-measure results obtained by RECIPE are compared to those of two state-of-the-art methods, and shown to be as good as or better than those previously reported in the literature. RECIPE represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.

Posted Content
TL;DR: The authors present the results of the second shared task on multimodal machine translation and multilingual image description: the multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets, while the multilingual image description task was changed so that only the image is given at test time.
Abstract: We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive.

Journal ArticleDOI
TL;DR: The first attention-based neural MT model for multi-way, multilingual translation is proposed; it outperforms strong conventional statistical machine translation systems on Turkish-English and Uzbek-English by incorporating the resources of other language pairs.

Proceedings ArticleDOI
31 May 2017
TL;DR: A simple baseline is introduced that addresses the discrete output space problem without relying on gradient estimators and is able to achieve state-of-the-art results on a Chinese poem generation dataset.
Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

Posted Content
TL;DR: This work explores six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search and shows both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work proposes to use lexical formality models to control the formality level of machine translation output and demonstrates the effectiveness of this approach in empirical evaluations, as measured by automatic metrics and human assessments.
Abstract: Stylistic variations of language, such as formality, carry speakers’ intention beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach in empirical evaluations, as measured by automatic metrics and human assessments.
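One possible way to apply a lexical formality model is to re-rank an n-best list of translations so that hypotheses whose predicted formality is close to the requested level are preferred; this sketch is only an illustration of the general idea and is not claimed to be the paper's exact mechanism. The weight `alpha`, the toy formality scorer, and the data are all assumptions.

```python
def rerank_by_formality(nbest, formality_score, target_level, alpha=1.0):
    """Pick the hypothesis that trades off translation model score against
    distance of its predicted formality from the requested level.
    `nbest` is a list of (hypothesis, model_score); `formality_score`
    maps text to a scalar formality estimate."""
    def combined(item):
        hyp, score = item
        return score - alpha * abs(formality_score(hyp) - target_level)
    return max(nbest, key=combined)[0]

nbest = [("hey what's up", -1.0), ("hello, how are you doing", -1.2)]
toy_formality = lambda s: 0.9 if "hello" in s else 0.1    # placeholder model
print(rerank_by_formality(nbest, toy_formality, target_level=1.0))
```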

Proceedings ArticleDOI
13 Aug 2017
TL;DR: A generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), is introduced for semantic modeling of text, seamlessly integrating the merits of convolutional and recurrent neural network structures in extracting different aspects of linguistic information and thus strengthening the semantic understanding power of the new framework.
Abstract: In this paper, we introduce a generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), for semantic modeling of text, seamlessly integrating the merits of both convolutional and recurrent neural network structures in extracting different aspects of linguistic information and thus strengthening the semantic understanding power of the new framework. Based on conv-RNN, we also propose a novel sentence classification model and an attention-based answer selection model, strengthening performance on sentence matching and classification respectively. We validate the proposed models on a very wide variety of data sets, including two challenging answer selection (AS) tasks and five benchmark datasets for sentence classification (SC). To the best of our knowledge, this is by far the most complete set of comparison results in both AS and SC. We empirically show the superior performance of conv-RNN on these challenging tasks and benchmark datasets and also summarize insights on the performance of other state-of-the-art methods.
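A minimal sketch of a convolutional + recurrent hybrid for sentence modeling, in the spirit of the framework described above: a 1-D convolution extracts local n-gram features, a bidirectional GRU reads the resulting feature sequence, and the final states feed a classifier. Layer sizes and the absence of attention are simplifications, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvRNN(nn.Module):
    """Toy conv -> RNN sentence classifier illustrating the hybrid design."""
    def __init__(self, vocab=10000, emb=128, conv_ch=128, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, conv_ch, kernel_size=3, padding=1)
        self.rnn = nn.GRU(conv_ch, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq_len, conv_ch)
        _, h = self.rnn(x)                             # h: (2, batch, hidden)
        return self.out(torch.cat([h[0], h[1]], dim=-1))

model = ConvRNN()
print(model(torch.randint(0, 10000, (4, 30))).shape)   # torch.Size([4, 2])
```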

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A novel NMT model with source dependency representation is proposed to improve translation performance, especially on long sentences; empirical results show that the method achieves 1.6 BLEU improvement on average over a strong NMT system.
Abstract: Source dependency information has been successfully introduced into statistical machine translation. However, there are only a few preliminary attempts for Neural Machine Translation (NMT), such as concatenating representations of a source word and its dependency label together. In this paper, we propose a novel NMT with source dependency representation to improve translation performance of NMT, especially on long sentences. Empirical results on the NIST Chinese-to-English translation task show that our method achieves 1.6 BLEU improvements on average over a strong NMT system.
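The simple baseline mentioned in the abstract, concatenating each source word's embedding with an embedding of its dependency label, is easy to sketch; the paper's full dependency representation is richer than this, and the dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class WordWithDepEmbedding(nn.Module):
    """Each source token's embedding is concatenated with an embedding of its
    dependency relation label (e.g. nsubj, dobj); the combined vector would
    feed the NMT encoder in place of the plain word embedding."""
    def __init__(self, vocab=30000, n_labels=50, word_dim=512, label_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, word_dim)
        self.label_emb = nn.Embedding(n_labels, label_dim)

    def forward(self, word_ids, label_ids):            # both (batch, seq_len)
        return torch.cat([self.word_emb(word_ids),
                          self.label_emb(label_ids)], dim=-1)

emb = WordWithDepEmbedding()
print(emb(torch.randint(0, 30000, (2, 10)),
          torch.randint(0, 50, (2, 10))).shape)        # torch.Size([2, 10, 576])
```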

Journal ArticleDOI
TL;DR: This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: It is demonstrated that linguistically informed target word segmentation is better suited for NMT, leading to improved translation quality on the order of magnitude of +0.5 BLEU and −0.9 TER for a medium-scale English→German translation task.
Abstract: For efficiency considerations, state-of-the-art neural machine translation (NMT) requires the vocabulary to be restricted to a limited-size set of several thousand symbols. This is highly problematic when translating into inflected or compounding languages. A typical remedy is the use of subword units, where words are segmented into smaller components. Byte pair encoding, a purely corpus-based approach, has proved effective recently. In this paper, we investigate word segmentation strategies that incorporate more linguistic knowledge. We demonstrate that linguistically informed target word segmentation is better suited for NMT, leading to improved translation quality on the order of magnitude of +0.5 BLEU and −0.9 TER for a medium-scale English→German translation task. Our work is important in that it shows that linguistic knowledge can be used to improve NMT results over results based only on the language-agnostic byte pair encoding vocabulary reduction technique.

Journal ArticleDOI
TL;DR: A probability based method is proposed to measure the consensus degree between individual and collective overall random preferences based on the concept of stochastic dominance, which also takes both the importance weights and the fuzzy majority into account.
Abstract: This paper focuses on qualitative multi-attribute group decision making (MAGDM) with linguistic information in terms of single linguistic terms and/or flexible linguistic expressions. To do so, we propose a new linguistic decision rule based on the concepts of random preference and stochastic dominance, by a probability based interpretation of weight information. The importance weights and the concept of fuzzy majority are incorporated into both the multi-attribute and collective decision rule by the so-called weighted ordered weighted averaging operator with the input parameters expressed as probability distributions over a linguistic term set. Moreover, a probability based method is proposed to measure the consensus degree between individual and collective overall random preferences based on the concept of stochastic dominance, which also takes both the importance weights and the fuzzy majority into account. As such, our proposed approaches are based on the ordinal semantics of linguistic terms and voting statistics. By this, on one hand, the strict constraint of the uniform linguistic term set in linguistic decision making can be released; on the other hand, the difference and variation of individual opinions can be captured. The proposed approaches can deal with qualitative MAGDM with single linguistic terms and flexible linguistic expressions. Two application examples taken from the literature are used to illuminate the proposed techniques by comparisons with existing studies. The results show that our proposed approaches are comparable with existing studies.

Journal ArticleDOI
TL;DR: This work proposes an approach to building a neural machine translation system with no supervised resources (i.e., no parallel corpora), using multimodal embedded representations over texts and images with multimedia as the "pivot", and finds that an end-to-end model that simultaneously optimizes both the rank loss in the multimodal encoders and the cross-entropy loss in the decoders performs best.
Abstract: We propose an approach to build a neural machine translation system with no supervised resources (i.e., no parallel corpora) using multimodal embedded representation over texts and images. Based on the assumption that text documents are often likely to be described with other multimedia information (e.g., images) somewhat related to the content, we try to indirectly estimate the relevance between two languages. Using multimedia as the "pivot", we project all modalities into one common hidden space where samples belonging to similar semantic concepts should come close to each other, whatever the observed space of each sample is. This modality-agnostic representation is the key to bridging the gap between different modalities. Putting a decoder on top of it, our network can flexibly draw the outputs from any input modality. Notably, in the testing phase, we need only source language texts as the input for translation. In experiments, we tested our method on two benchmarks to show that it can achieve reasonable translation performance. We compared and investigated several possible implementations and found that an end-to-end model that simultaneously optimized both rank loss in multimodal encoders and cross-entropy loss in decoders performed the best.
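The rank loss that aligns modalities in the shared space can be sketched with a standard bidirectional margin formulation: matching (text, image) pairs should score higher than mismatched ones by at least a margin. This is a common formulation and may differ in detail from the paper's loss; the margin, dimensions, and random data are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(text_emb, image_emb, margin=0.2):
    """Margin-based rank loss over a batch of aligned (text, image) pairs,
    encouraging matching pairs to be closer in the common space than any
    mismatched pair within the batch."""
    text = F.normalize(text_emb, dim=1)
    image = F.normalize(image_emb, dim=1)
    scores = text @ image.t()                      # (batch, batch) similarity matrix
    positives = scores.diag().unsqueeze(1)         # matching pairs on the diagonal
    cost_img = (margin + scores - positives).clamp(min=0)      # text anchor vs. other images
    cost_txt = (margin + scores - positives.t()).clamp(min=0)  # image anchor vs. other texts
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_img.masked_fill(mask, 0).mean() + cost_txt.masked_fill(mask, 0).mean()

loss = pairwise_rank_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```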

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work proposes to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation.
Abstract: The performance of Neural Machine Translation (NMT) models relies heavily on the availability of sufficient amounts of parallel data, and an efficient and effective way of leveraging the vastly available amounts of monolingual data has yet to be found. We propose to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation. The decoder predicts the next target word through two channels, a target-side language model on the lowest layer, and an attentional recurrent model which is conditioned on the source representation. This architecture allows joint training on both large amounts of monolingual and moderate amounts of bilingual data to improve NMT performance. Initial results in the news domain for three language pairs show moderate but consistent improvements over a baseline trained on bilingual data only.

Posted Content
TL;DR: Two solutions to the problem of mistranslating rare words in neural machine translation are explored, arguing that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately.
Abstract: We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
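The first fix described in the abstract can be sketched as an output layer in which both the context vector and every output word embedding are rescaled to a constant norm before the inner product, so frequent words cannot win simply by having larger embeddings. The norm value r and the layer sizes below are placeholder hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixedNormOutput(nn.Module):
    """Output layer with fixed-norm context and word vectors: logits are the
    inner products of vectors rescaled to norm r."""
    def __init__(self, hidden=512, vocab=30000, r=5.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab, hidden) * 0.01)
        self.r = r

    def forward(self, context):                              # (batch, hidden)
        w = self.r * F.normalize(self.weight, dim=1)         # fixed-norm word vectors
        c = self.r * F.normalize(context, dim=1)             # fixed-norm context
        return c @ w.t()                                     # (batch, vocab) logits

layer = FixedNormOutput()
print(layer(torch.randn(2, 512)).shape)                      # torch.Size([2, 30000])
```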

Proceedings ArticleDOI
08 Feb 2017
TL;DR: This paper proposes an end-to-end model that jointly learns a latent graph parser as part of the encoder of an attention-based NMT model, so that the parser is optimized according to the translation objective.
Abstract: This paper presents a novel neural machine translation model which jointly learns translation and source-side latent graph representations of sentences. Unlike existing pipelined approaches using syntactic parsers, our end-to-end model learns a latent graph parser as part of the encoder of an attention-based neural machine translation model, and thus the parser is optimized according to the translation objective. In experiments, we first show that our model compares favorably with state-of-the-art sequential and pipelined syntax-based NMT models. We also show that the performance of our model can be further improved by pre-training it with a small amount of treebank annotations. Our final ensemble model significantly outperforms the previous best models on the standard English-to-Japanese translation dataset.

Proceedings ArticleDOI
01 Aug 2017
TL;DR: It is found that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders, rather than generators.
Abstract: Image captioning has evolved into a core task for Natural Language Generation and has also proved to be an important testbed for deep learning approaches to handling multimodal representations. Most contemporary approaches rely on a combination of a convolutional network to handle image features, and a recurrent network to encode linguistic information. The latter is typically viewed as the primary “generation” component. Beyond this high-level characterisation, a CNN+RNN model supports a variety of architectural designs. The dominant model in the literature is one in which visual features encoded by a CNN are “injected” as part of the linguistic encoding process, driving the RNN’s linguistic choices. By contrast, it is possible to envisage an architecture in which visual and linguistic features are encoded separately, and merged at a subsequent stage. In this paper, we address two related questions: (1) Is direct injection the best way of combining multimodal information, or is a late merging alternative better for the image captioning task? (2) To what extent should a recurrent network be viewed as actually generating, rather than simply encoding, linguistic information?
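A minimal sketch of the "merge" design discussed above: the RNN encodes only the word sequence, and the CNN image features are combined with the RNN states just before the output layer, so the visual signal never enters the recurrent loop (in the "inject" design the image vector would instead be fed into the RNN, e.g. as its initial state). Dimensions and the lack of an actual CNN are simplifications.

```python
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    """'Merge' captioner: linguistic encoding by the LSTM, visual features
    merged with the LSTM output before word prediction."""
    def __init__(self, vocab=10000, emb=256, hidden=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, word_ids, image_feats):        # (batch, seq), (batch, img_dim)
        h, _ = self.rnn(self.embed(word_ids))        # linguistic encoding only
        img = torch.relu(self.img_proj(image_feats))
        img = img.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, img], dim=-1)) # merge, then predict next words

model = MergeCaptioner()
print(model(torch.randint(0, 10000, (2, 7)), torch.randn(2, 2048)).shape)
# torch.Size([2, 7, 10000])
```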