
Showing papers on "Rule-based machine translation" published in 2017


Journal ArticleDOI
TL;DR: This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.
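The only preprocessing change this approach requires can be sketched in a few lines; the helper name below is illustrative, and the "<2xx>" token format follows the convention described in the abstract.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token (e.g. "<2fr>") telling the shared multilingual
    model which target language to produce; the rest of the NMT pipeline is unchanged."""
    return f"<2{target_lang}> {source_sentence}"

# The same English sentence routed to French or to German by the single model.
print(add_target_token("Hello, how are you?", "fr"))  # "<2fr> Hello, how are you?"
print(add_target_token("Hello, how are you?", "de"))  # "<2de> Hello, how are you?"
```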

1,288 citations


Journal ArticleDOI
TL;DR: A neural machine translation model that maps a source character sequence to a target character sequence without any segmentation is introduced, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities.
Abstract: Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of BLEU score and human judgment.
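A minimal sketch of the encoder idea described above: embed characters, run a 1-D convolution, and max-pool over time so the source representation is shorter and training speed stays comparable to subword models. Layer sizes and kernel widths here are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """Toy character-level encoder: character embeddings, a 1-D convolution,
    then max-pooling over time to shorten the source representation."""
    def __init__(self, n_chars=200, emb=64, hidden=128, kernel=5, pool=4):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, hidden, kernel_size=kernel, padding=kernel // 2)
        self.pool = nn.MaxPool1d(pool)  # reduces the sequence length ~4x

    def forward(self, char_ids):                    # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)    # (batch, emb, seq_len)
        x = torch.relu(self.conv(x))
        return self.pool(x).transpose(1, 2)         # (batch, seq_len // pool, hidden)

enc = CharConvEncoder()
out = enc(torch.randint(0, 200, (2, 100)))          # two sentences of 100 characters
print(out.shape)                                    # torch.Size([2, 25, 128])
```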

445 citations


Proceedings ArticleDOI
TL;DR: This article proposes a data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts, which improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.
Abstract: The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
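A much-simplified sketch of the augmentation idea: place a rare word (and its translation) into an existing sentence pair at an aligned position, producing a new synthetic pair. The real method additionally uses language models to pick positions where the substitution remains plausible; the function and data below are illustrative only.

```python
import random

def augment_rare(src_tokens, tgt_tokens, alignment, rare_pairs, p=0.5):
    """Toy targeted augmentation: with probability p, replace one aligned
    (src, tgt) word pair with a rare translation pair, yielding a synthetic
    sentence pair that places the rare word in a fresh context.
    `alignment` maps source positions to target positions;
    `rare_pairs` is a list of (rare_src_word, rare_tgt_word) tuples."""
    src, tgt = list(src_tokens), list(tgt_tokens)
    if rare_pairs and alignment and random.random() < p:
        i, j = random.choice(list(alignment.items()))
        src[i], tgt[j] = random.choice(rare_pairs)
    return src, tgt

new_src, new_tgt = augment_rare(
    ["the", "cat", "sleeps"], ["le", "chat", "dort"],
    alignment={1: 1}, rare_pairs=[("otter", "loutre")], p=1.0)
print(new_src, new_tgt)   # ['the', 'otter', 'sleeps'] ['le', 'loutre', 'dort']
```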

181 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: A neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment is proposed.
Abstract: Translating in real-time, a.k.a.simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.
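The core control flow is an agent that alternates READ (consume the next source token) and WRITE (emit a target token from the pre-trained NMT model). The sketch below uses a fixed buffering heuristic in place of the paper's learned agent, and `translate_prefix` is a stand-in for the NMT environment; both are assumptions for illustration.

```python
def simultaneous_translate(source_tokens, translate_prefix, wait=3):
    """Toy READ/WRITE loop for simultaneous translation. `translate_prefix`
    stands in for a pre-trained NMT model that, given the source read so far
    and the target produced so far, returns the next target token (or None).
    The paper learns when to READ vs. WRITE; here a fixed heuristic WRITEs
    once `wait` unconsumed source tokens are buffered."""
    read, output, i = [], [], 0
    while True:
        if i < len(source_tokens) and len(read) - len(output) < wait:
            read.append(source_tokens[i]); i += 1            # READ action
        else:
            token = translate_prefix(read, output)           # WRITE action
            if token is None:                                 # model wants more context
                if i < len(source_tokens):
                    read.append(source_tokens[i]); i += 1     # fall back to READ
                else:
                    break                                     # source exhausted: stop
            else:
                output.append(token)
    return output

# Dummy stand-in model: "translate" by echoing source tokens one at a time.
dummy = lambda read, out: read[len(out)] if len(out) < len(read) else None
print(simultaneous_translate(["a", "b", "c", "d"], dummy, wait=2))  # ['a', 'b', 'c', 'd']
```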

168 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: By training grammars without nonterminal labels, it is found that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.
Abstract: Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted head rules, albeit with some important differences). By training grammars without nonterminal labels, we find that phrasal representations depend minimally on nonterminals, providing support for the endocentricity hypothesis.

166 citations


Journal ArticleDOI
TL;DR: Based on the syntax and semantics of virtual linguistic terms (VLTs), this work suggests that VLTs could be a possible alternative for solving some current challenges of qualitative information fusion in decision making.

162 citations


Proceedings ArticleDOI
12 Feb 2017
TL;DR: The authors proposed a hybrid model, called NMT+RNNG, that learns to parse and translate by combining the recurrent neural network grammar into the attention-based neural machine translation.
Abstract: There has been relatively little attention to incorporating linguistic prior to neural machine translation. Much of the previous work was further constrained to considering linguistic prior on the source side. In this paper, we propose a hybrid model, called NMT+RNNG, that learns to parse and translate by combining the recurrent neural network grammar into the attention-based neural machine translation. Our approach encourages the neural machine translation model to incorporate linguistic prior during training, and lets it translate on its own afterward. Extensive experiments with four language pairs show the effectiveness of the proposed NMT+RNNG.

159 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work builds a globally optimized neural model for end-to-end relation extraction, proposing novel LSTM features in order to better learn context representations, and presents a novel method to integrate syntactic information to facilitate global learning.
Abstract: Neural networks have shown promising results for relation extraction. State-of-the-art models cast the task as an end-to-end problem, solved incrementally using a local classifier. Yet previous work using statistical models have demonstrated that global optimization can achieve better performances compared to local classification. We build a globally optimized neural model for end-to-end relation extraction, proposing novel LSTM features in order to better learn context representations. In addition, we present a novel method to integrate syntactic information to facilitate global learning, yet requiring little background on syntactic grammars thus being easy to extend. Experimental results show that our proposed model is highly effective, achieving the best performances on two standard benchmarks.

155 citations


Posted Content
TL;DR: The authors take a step towards generating natural language with a GAN objective alone, presenting quantitative results on generating sentences from context-free and probabilistic context-free grammars and qualitative language modeling results, while noting that adversarial text generation is not yet commensurate with the progress made in generating images and still lags far behind likelihood-based methods.
Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.
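One simple reading of "no gradient estimators" is that the generator emits a sequence of softmax distributions over the vocabulary and the discriminator consumes those continuous vectors directly, so ordinary backpropagation works end to end. The sketch below illustrates that wiring only; architecture sizes and training details are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

vocab, seq_len, noise_dim = 1000, 20, 64

class Generator(nn.Module):
    """Maps noise to per-position softmax distributions over the vocabulary;
    because the output stays continuous, gradients flow back from the
    discriminator without REINFORCE-style estimators."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, seq_len * vocab))
    def forward(self, z):
        logits = self.net(z).view(-1, seq_len, vocab)
        return torch.softmax(logits, dim=-1)

class Discriminator(nn.Module):
    """Scores a sequence of vocabulary distributions (real sentences would be
    fed as one-hot vectors, generated ones as soft distributions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(seq_len * vocab, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

g, d = Generator(), Discriminator()
fake = g(torch.randn(4, noise_dim))
print(d(fake).shape)   # torch.Size([4, 1])
```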

155 citations


Proceedings ArticleDOI
01 Dec 2017
TL;DR: This paper presents a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition, based on the hybrid attention/connectionist temporal classification (CTC) architecture.
Abstract: End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity, which we fully exploit in this paper, to build a monolithic multilingual ASR system with a language-independent neural network architecture. We present a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition. The model is based on our hybrid attention/connectionist temporal classification (CTC) architecture, which has previously been shown to achieve state-of-the-art performance in several ASR benchmarks. Here we augment its set of output symbols to include the union of character sets appearing in all the target languages. These include Roman and Cyrillic alphabets, Arabic numbers, simplified Chinese, and Japanese Kanji/Hiragana/Katakana characters (5,500 characters in all). This allows training of a single multilingual model, whose parameters are shared across all the languages. The model can jointly identify the language and recognize the speech, automatically formatting the recognized text in the appropriate character set. The experiments, which used speech databases composed of Wall Street Journal (English), Corpus of Spontaneous Japanese, HKUST Mandarin CTS, and Voxforge (German, Spanish, French, Italian, Dutch, Portuguese, Russian), demonstrate comparable/superior performance relative to language-dependent end-to-end ASR systems.
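The vocabulary-construction step is simple enough to sketch: take the union of the character inventories of all training languages (plus special symbols) so a single output layer covers every language. The special tokens and toy corpora below are illustrative assumptions, not the paper's exact symbol set.

```python
def build_multilingual_charset(corpora, specials=("<blank>", "<sos>", "<eos>")):
    """Union of character inventories across languages, for the shared grapheme
    output layer of a single multilingual end-to-end ASR model."""
    charset = set()
    for lang, sentences in corpora.items():
        for sentence in sentences:
            charset.update(sentence)
    # Deterministic index assignment: specials first, then sorted characters.
    symbols = list(specials) + sorted(charset)
    return {sym: idx for idx, sym in enumerate(symbols)}

corpora = {"en": ["hello world"], "ja": ["こんにちは"], "ru": ["привет"]}
vocab = build_multilingual_charset(corpora)
print(len(vocab))   # number of shared output symbols
```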

139 citations


Posted Content
TL;DR: A neural machine translation architecture that models the surrounding text in addition to the source sentence leads to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus.
Abstract: We propose a neural machine translation architecture that models the surrounding text in addition to the source sentence. These models lead to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus. We also discover that attention-based neural machine translation is well suited for pronoun prediction and compares favorably with other approaches that were specifically designed for this task.

Journal ArticleDOI
TL;DR: This work combines scores from a neural language model trained only on target-side monolingual data with a neural machine translation model, and also fuses the hidden states of the two models, obtaining up to 2 BLEU improvement over hierarchical and phrase-based baselines on the low-resource language pair Turkish–English.
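The hidden-state fusion needs access to both networks' internals, but the score-combination half of the idea (often called shallow fusion) can be sketched directly: at each decoding step the translation and language-model log-probabilities are interpolated before expanding the beam. The weight β and the toy distributions below are illustrative, not values from the paper.

```python
import numpy as np

def shallow_fusion_step(nmt_log_probs, lm_log_probs, beta=0.3):
    """Combine per-token log-probabilities from the translation model and a
    target-side language model trained only on monolingual data; the argmax
    (or beam expansion) is taken over the fused scores."""
    fused = nmt_log_probs + beta * lm_log_probs
    return int(np.argmax(fused)), fused

nmt = np.log(np.array([0.5, 0.3, 0.2]))   # translation model's next-token distribution
lm = np.log(np.array([0.2, 0.6, 0.2]))    # monolingual LM's next-token distribution
print(shallow_fusion_step(nmt, lm)[0])     # index of the fused best token
```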

Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is found that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems.
Abstract: We aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation paradigm. To that end, we conduct a multifaceted evaluation in which we compare outputs produced by state-of-the-art neural machine translation and phrase-based machine translation systems for 9 language directions across a number of dimensions. Specifically, we measure the similarity of the outputs, their fluency and amount of reordering, the effect of sentence length and performance across different error categories. We find out that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems. Neural machine translation systems are also more accurate at producing inflected forms, but they perform poorly when translating very long sentences.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The NMT’s internal embedding of the source sentence is exploited, and sentence embedding similarity is used to select sentences that are close to in-domain data, substantially improving NMT performance.
Abstract: Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.
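A minimal sketch of the selection step, assuming the sentence embeddings have already been extracted from the NMT encoder: score general-domain candidates by cosine similarity to the centroid of the in-domain embeddings and keep the closest ones for adaptation. The function name, `top_k`, and the random data are illustrative.

```python
import numpy as np

def select_pseudo_in_domain(candidate_embs, in_domain_embs, top_k=1000):
    """Rank general-domain sentences by cosine similarity of their sentence
    embeddings to the in-domain centroid and keep the closest ones."""
    centroid = in_domain_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cand @ centroid
    return np.argsort(-scores)[:top_k]

idx = select_pseudo_in_domain(np.random.randn(5000, 512),
                              np.random.randn(200, 512), top_k=100)
print(idx[:5])   # indices of the most in-domain-like candidate sentences
```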

Book ChapterDOI
19 Apr 2017
TL;DR: RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar, and represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.
Abstract: Automatic Machine Learning is a growing area of machine learning that has a similar objective to the area of hyper-heuristics: to automatically recommend optimized pipelines, algorithms or appropriate parameters to specific tasks without much dependency on user knowledge. The background knowledge required to solve the task at hand is actually embedded into a search mechanism that builds personalized solutions to the task. Following this idea, this paper proposes RECIPE (REsilient ClassifIcation Pipeline Evolution), a framework based on grammar-based genetic programming that builds customized classification pipelines. The framework is flexible enough to receive different grammars and can be easily extended to other machine learning tasks. RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, such as generating invalid individuals, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar. The f-measure results obtained by RECIPE are compared to those of two state-of-the-art methods, and shown to be as good as or better than those previously reported in the literature. RECIPE represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.

Posted Content
TL;DR: The authors present the results of the second shared task on multimodal machine translation and multilingual image description: the multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets, while the multilingual image description task was changed so that only the image is given at test time.
Abstract: We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive.

Journal ArticleDOI
TL;DR: The first attention-based neural MT model for multi-way, multilingual translation is proposed; it outperforms strong conventional statistical machine translation systems on Turkish-English and Uzbek-English by incorporating the resources of other language pairs.

Proceedings ArticleDOI
31 May 2017
TL;DR: A simple baseline is introduced that addresses the discrete output space problem without relying on gradient estimators and is able to achieve state-of-the-art results on a Chinese poem generation dataset.
Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

Posted Content
TL;DR: This work explores six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search and shows both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work proposes to use lexical formality models to control the formality level of machine translation output and demonstrates the effectiveness of this approach in empirical evaluations, as measured by automatic metrics and human assessments.
Abstract: Stylistic variations of language, such as formality, carry speakers’ intention beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach in empirical evaluations, as measured by automatic metrics and human assessments.
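One possible way to apply a lexical formality model is to re-rank an n-best list of translations so that hypotheses whose predicted formality is close to the requested level are preferred; this sketch is only an illustration of the general idea and is not claimed to be the paper's exact mechanism. The weight `alpha`, the toy formality scorer, and the data are all assumptions.

```python
def rerank_by_formality(nbest, formality_score, target_level, alpha=1.0):
    """Pick the hypothesis that trades off translation model score against
    distance of its predicted formality from the requested level.
    `nbest` is a list of (hypothesis, model_score); `formality_score`
    maps text to a scalar formality estimate."""
    def combined(item):
        hyp, score = item
        return score - alpha * abs(formality_score(hyp) - target_level)
    return max(nbest, key=combined)[0]

nbest = [("hey what's up", -1.0), ("hello, how are you doing", -1.2)]
toy_formality = lambda s: 0.9 if "hello" in s else 0.1    # placeholder model
print(rerank_by_formality(nbest, toy_formality, target_level=1.0))
```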

Proceedings ArticleDOI
13 Aug 2017
TL;DR: A generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), is introduced for semantic modeling of text, seamlessly integrating the merits of convolutional and recurrent neural network structures in extracting different aspects of linguistic information and thus strengthening the semantic understanding power of the new framework.
Abstract: In this paper, we introduce a generic hybrid inference framework, the Convolutional Recurrent Neural Network (conv-RNN), for semantic modeling of text, seamlessly integrating the merits of both convolutional and recurrent neural network structures in extracting different aspects of linguistic information and thus strengthening the semantic understanding power of the new framework. Based on conv-RNN, we also propose a novel sentence classification model and an attention-based answer selection model, strengthening performance on sentence matching and classification respectively. We validate the proposed models on a very wide variety of data sets, including two challenging answer selection (AS) tasks and five benchmark datasets for sentence classification (SC). To the best of our knowledge, this is by far the most complete set of comparison results in both AS and SC. We empirically show the superior performance of conv-RNN on these challenging tasks and benchmark datasets and also summarize insights on the performance of other state-of-the-art methods.
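A minimal sketch of a convolutional + recurrent hybrid for sentence modeling, in the spirit of the framework described above: a 1-D convolution extracts local n-gram features, a bidirectional GRU reads the resulting feature sequence, and the final states feed a classifier. Layer sizes and the absence of attention are simplifications, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvRNN(nn.Module):
    """Toy conv -> RNN sentence classifier illustrating the hybrid design."""
    def __init__(self, vocab=10000, emb=128, conv_ch=128, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, conv_ch, kernel_size=3, padding=1)
        self.rnn = nn.GRU(conv_ch, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq_len, conv_ch)
        _, h = self.rnn(x)                             # h: (2, batch, hidden)
        return self.out(torch.cat([h[0], h[1]], dim=-1))

model = ConvRNN()
print(model(torch.randint(0, 10000, (4, 30))).shape)   # torch.Size([4, 2])
```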

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A novel NMT model with source dependency representation is proposed to improve translation performance, especially on long sentences; empirical results show that the method achieves 1.6 BLEU improvement on average over a strong NMT system.
Abstract: Source dependency information has been successfully introduced into statistical machine translation. However, there are only a few preliminary attempts for Neural Machine Translation (NMT), such as concatenating representations of a source word and its dependency label together. In this paper, we propose a novel NMT with source dependency representation to improve translation performance of NMT, especially on long sentences. Empirical results on the NIST Chinese-to-English translation task show that our method achieves 1.6 BLEU improvements on average over a strong NMT system.
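The simple baseline mentioned in the abstract, concatenating each source word's embedding with an embedding of its dependency label, is easy to sketch; the paper's full dependency representation is richer than this, and the dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class WordWithDepEmbedding(nn.Module):
    """Each source token's embedding is concatenated with an embedding of its
    dependency relation label (e.g. nsubj, dobj); the combined vector would
    feed the NMT encoder in place of the plain word embedding."""
    def __init__(self, vocab=30000, n_labels=50, word_dim=512, label_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, word_dim)
        self.label_emb = nn.Embedding(n_labels, label_dim)

    def forward(self, word_ids, label_ids):            # both (batch, seq_len)
        return torch.cat([self.word_emb(word_ids),
                          self.label_emb(label_ids)], dim=-1)

emb = WordWithDepEmbedding()
print(emb(torch.randint(0, 30000, (2, 10)),
          torch.randint(0, 50, (2, 10))).shape)        # torch.Size([2, 10, 576])
```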

Journal ArticleDOI
TL;DR: This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: It is demonstrated that linguistically informed target word segmentation is better suited for NMT, leading to improved translation quality on the order of magnitude of +0.5 BLEU and −0.9 TER for a medium-scale English→German translation task.
Abstract: For efficiency considerations, state-of-the-art neural machine translation (NMT) requires the vocabulary to be restricted to a limited-size set of several thousand symbols. This is highly problematic when translating into inflected or compounding languages. A typical remedy is the use of subword units, where words are segmented into smaller components. Byte pair encoding, a purely corpus-based approach, has proved effective recently. In this paper, we investigate word segmentation strategies that incorporate more linguistic knowledge. We demonstrate that linguistically informed target word segmentation is better suited for NMT, leading to improved translation quality on the order of magnitude of +0.5 BLEU and −0.9 TER for a medium-scale English→German translation task. Our work is important in that it shows that linguistic knowledge can be used to improve NMT results over results based only on the language-agnostic byte pair encoding vocabulary reduction technique.

Journal ArticleDOI
TL;DR: A probability based method is proposed to measure the consensus degree between individual and collective overall random preferences based on the concept of stochastic dominance, which also takes both the importance weights and the fuzzy majority into account.
Abstract: This paper focuses on qualitative multi-attribute group decision making (MAGDM) with linguistic information in terms of single linguistic terms and/or flexible linguistic expressions. To do so, we propose a new linguistic decision rule based on the concepts of random preference and stochastic dominance, by a probability based interpretation of weight information. The importance weights and the concept of fuzzy majority are incorporated into both the multi-attribute and collective decision rule by the so-called weighted ordered weighted averaging operator with the input parameters expressed as probability distributions over a linguistic term set. Moreover, a probability based method is proposed to measure the consensus degree between individual and collective overall random preferences based on the concept of stochastic dominance, which also takes both the importance weights and the fuzzy majority into account. As such, our proposed approaches are based on the ordinal semantics of linguistic terms and voting statistics. By this, on one hand, the strict constraint of the uniform linguistic term set in linguistic decision making can be released; on the other hand, the difference and variation of individual opinions can be captured. The proposed approaches can deal with qualitative MAGDM with single linguistic terms and flexible linguistic expressions. Two application examples taken from the literature are used to illuminate the proposed techniques by comparisons with existing studies. The results show that our proposed approaches are comparable with existing studies.

Journal ArticleDOI
TL;DR: This work proposes an approach to building a neural machine translation system with no supervised resources (i.e., no parallel corpora), using multimodal embedded representations over texts and images with multimedia as the "pivot", and finds that an end-to-end model that simultaneously optimizes both the rank loss in the multimodal encoders and the cross-entropy loss in the decoders performs best.
Abstract: We propose an approach to build a neural machine translation system with no supervised resources (i.e., no parallel corpora) using multimodal embedded representation over texts and images. Based on the assumption that text documents are often likely to be described with other multimedia information (e.g., images) somewhat related to the content, we try to indirectly estimate the relevance between two languages. Using multimedia as the "pivot", we project all modalities into one common hidden space where samples belonging to similar semantic concepts should come close to each other, whatever the observed space of each sample is. This modality-agnostic representation is the key to bridging the gap between different modalities. Putting a decoder on top of it, our network can flexibly draw the outputs from any input modality. Notably, in the testing phase, we need only source language texts as the input for translation. In experiments, we tested our method on two benchmarks to show that it can achieve reasonable translation performance. We compared and investigated several possible implementations and found that an end-to-end model that simultaneously optimized both rank loss in multimodal encoders and cross-entropy loss in decoders performed the best.
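The rank loss that aligns modalities in the shared space can be sketched with a standard bidirectional margin formulation: matching (text, image) pairs should score higher than mismatched ones by at least a margin. This is a common formulation and may differ in detail from the paper's loss; the margin, dimensions, and random data are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(text_emb, image_emb, margin=0.2):
    """Margin-based rank loss over a batch of aligned (text, image) pairs,
    encouraging matching pairs to be closer in the common space than any
    mismatched pair within the batch."""
    text = F.normalize(text_emb, dim=1)
    image = F.normalize(image_emb, dim=1)
    scores = text @ image.t()                      # (batch, batch) similarity matrix
    positives = scores.diag().unsqueeze(1)         # matching pairs on the diagonal
    cost_img = (margin + scores - positives).clamp(min=0)      # text anchor vs. other images
    cost_txt = (margin + scores - positives.t()).clamp(min=0)  # image anchor vs. other texts
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_img.masked_fill(mask, 0).mean() + cost_txt.masked_fill(mask, 0).mean()

loss = pairwise_rank_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```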

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work proposes to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation.
Abstract: The performance of Neural Machine Translation (NMT) models relies heavily on the availability of sufficient amounts of parallel data, and an efficient and effective way of leveraging the vastly available amounts of monolingual data has yet to be found. We propose to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation. The decoder predicts the next target word through two channels, a target-side language model on the lowest layer, and an attentional recurrent model which is conditioned on the source representation. This architecture allows joint training on both large amounts of monolingual and moderate amounts of bilingual data to improve NMT performance. Initial results in the news domain for three language pairs show moderate but consistent improvements over a baseline trained on bilingual data only.

Posted Content
TL;DR: Two solutions to the problem of mistranslating rare words in neural machine translation are explored, arguing that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately.
Abstract: We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
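The first fix described in the abstract can be sketched as an output layer in which both the context vector and every output word embedding are rescaled to a constant norm before the inner product, so frequent words cannot win simply by having larger embeddings. The norm value r and the layer sizes below are placeholder hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixedNormOutput(nn.Module):
    """Output layer with fixed-norm context and word vectors: logits are the
    inner products of vectors rescaled to norm r."""
    def __init__(self, hidden=512, vocab=30000, r=5.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab, hidden) * 0.01)
        self.r = r

    def forward(self, context):                              # (batch, hidden)
        w = self.r * F.normalize(self.weight, dim=1)         # fixed-norm word vectors
        c = self.r * F.normalize(context, dim=1)             # fixed-norm context
        return c @ w.t()                                     # (batch, vocab) logits

layer = FixedNormOutput()
print(layer(torch.randn(2, 512)).shape)                      # torch.Size([2, 30000])
```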

Proceedings ArticleDOI
08 Feb 2017
TL;DR: This paper proposes an end-to-end model that jointly learns a latent graph parser as part of the encoder of an attention-based NMT model, so that the parser is optimized according to the translation objective.
Abstract: This paper presents a novel neural machine translation model which jointly learns translation and source-side latent graph representations of sentences. Unlike existing pipelined approaches using syntactic parsers, our end-to-end model learns a latent graph parser as part of the encoder of an attention-based neural machine translation model, and thus the parser is optimized according to the translation objective. In experiments, we first show that our model compares favorably with state-of-the-art sequential and pipelined syntax-based NMT models. We also show that the performance of our model can be further improved by pre-training it with a small amount of treebank annotations. Our final ensemble model significantly outperforms the previous best models on the standard English-to-Japanese translation dataset.

Proceedings ArticleDOI
01 Aug 2017
TL;DR: It is found that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders, rather than generators.
Abstract: Image captioning has evolved into a core task for Natural Language Generation and has also proved to be an important testbed for deep learning approaches to handling multimodal representations. Most contemporary approaches rely on a combination of a convolutional network to handle image features, and a recurrent network to encode linguistic information. The latter is typically viewed as the primary “generation” component. Beyond this high-level characterisation, a CNN+RNN model supports a variety of architectural designs. The dominant model in the literature is one in which visual features encoded by a CNN are “injected” as part of the linguistic encoding process, driving the RNN’s linguistic choices. By contrast, it is possible to envisage an architecture in which visual and linguistic features are encoded separately, and merged at a subsequent stage. In this paper, we address two related questions: (1) Is direct injection the best way of combining multimodal information, or is a late merging alternative better for the image captioning task? (2) To what extent should a recurrent network be viewed as actually generating, rather than simply encoding, linguistic information?
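A minimal sketch of the "merge" design discussed above: the RNN encodes only the word sequence, and the CNN image features are combined with the RNN states just before the output layer, so the visual signal never enters the recurrent loop (in the "inject" design the image vector would instead be fed into the RNN, e.g. as its initial state). Dimensions and the lack of an actual CNN are simplifications.

```python
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    """'Merge' captioner: linguistic encoding by the LSTM, visual features
    merged with the LSTM output before word prediction."""
    def __init__(self, vocab=10000, emb=256, hidden=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, word_ids, image_feats):        # (batch, seq), (batch, img_dim)
        h, _ = self.rnn(self.embed(word_ids))        # linguistic encoding only
        img = torch.relu(self.img_proj(image_feats))
        img = img.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, img], dim=-1)) # merge, then predict next words

model = MergeCaptioner()
print(model(torch.randint(0, 10000, (2, 7)), torch.randn(2, 2048)).shape)
# torch.Size([2, 7, 10000])
```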