
Showing papers on "Rule-based machine translation published in 2016"


Posted Content
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
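To make the beam-search rescoring concrete: the paper gives explicit formulas for the length normalization lp(Y) = (5 + |Y|)^α / 6^α and the coverage penalty cp(X;Y) = β · Σ_i log(min(Σ_j p_ij, 1.0)). The minimal Python sketch below reproduces them; the α and β defaults here are illustrative, not the paper's tuned values.

```python
import math

def gnmt_score(log_prob, target_len, attn, alpha=0.6, beta=0.2):
    """Rescore a beam hypothesis with GNMT-style length
    normalization and coverage penalty.

    log_prob  : total log P(Y|X) of the hypothesis
    target_len: |Y|, length of the candidate translation
    attn      : attn[i][j] = attention paid to source word i when
                emitting target word j (assumed strictly positive)
    """
    # Length normalization: longer outputs are not unfairly
    # penalized by their summed log-probabilities.
    lp = ((5.0 + target_len) ** alpha) / (6.0 ** alpha)
    # Coverage penalty: reward hypotheses whose attention has
    # touched every source word with total mass close to 1.
    cp = beta * sum(math.log(min(sum(row), 1.0)) for row in attn)
    return log_prob / lp + cp
```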

5,737 citations


Proceedings Article
05 Dec 2016
TL;DR: Experiments show that dual-NMT works very well on English↔French translation; in particular, by learning from monolingual data, it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
Abstract: While neural machine translation (NMT) has made good progress over the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation dual-NMT. Experiments show that dual-NMT works very well on English↔French translation; in particular, by learning from monolingual data (with 10% bilingual data for warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
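The closed-loop feedback signal can be sketched in a few lines. In the Python fragment below, translate_ab, score_ba and lm_b are hypothetical callables standing in for the primal model, the dual model's reconstruction scorer, and the target-side language model; the reward they produce is what the policy-gradient updates would use.

```python
def dual_learning_reward(s, translate_ab, score_ba, lm_b, alpha=0.5):
    """One feedback round of the dual-learning game (sketch).

    s           : a monolingual sentence in language A
    translate_ab: primal model A->B, returns a candidate translation
    score_ba    : dual model B->A, returns log P(s | mid), i.e. how
                  well the original sentence is reconstructed
    lm_b        : language model of B, returns a log-likelihood
    """
    mid = translate_ab(s)          # primal step: A -> B
    r_lm = lm_b(mid)               # is the intermediate output fluent B?
    r_rec = score_ba(mid, s)       # can the dual model recover s?
    # The total reward mixes fluency and reconstruction; both models
    # are then updated by policy gradient using this signal.
    return alpha * r_lm + (1.0 - alpha) * r_rec
```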

559 citations


Posted Content
TL;DR: This paper presents the first attempts at building a multilingual Neural Machine Translation framework under a unified approach, in which the information shared among languages can be helpful in the translation of individual language pairs, and points out a novel way to make use of monolingual data with Neural Machine Translation.
Abstract: In this paper, we present our first attempts at building a multilingual Neural Machine Translation framework under a unified approach. We are then able to employ attention-based NMT for many-to-many multilingual translation tasks. Our approach does not require any special treatment of the network architecture and allows us to learn a minimal number of free parameters in a standard way of training. Our approach has shown its effectiveness in an under-resourced translation scenario with considerable improvements of up to 2.6 BLEU points. In addition, the approach has achieved interesting and promising results when applied to translation tasks where there is no direct parallel corpus between the source and target languages.

314 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: Two approaches are proposed to make full use of the source-side monolingual data in NMT: using a self-learning algorithm to generate synthetic large-scale parallel data for NMT training, and a multi-task learning framework using two NMTs to predict the translation and the reordered source-side monolingual sentences simultaneously.
Abstract: Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently become a new paradigm. Researchers have proven that the target-side monolingual data can greatly enhance the decoder model of NMT. However, the source-side monolingual data is not fully explored, although it should be useful for strengthening the encoder model of NMT, especially when the parallel corpus is far from sufficient. In this paper, we propose two approaches to make full use of the source-side monolingual data in NMT. The first approach employs a self-learning algorithm to generate synthetic large-scale parallel data for NMT training. The second approach applies a multi-task learning framework, using two NMTs to predict the translation and the reordered source-side monolingual sentences simultaneously. Extensive experiments demonstrate that the proposed methods obtain significant improvements over a strong attention-based NMT baseline.
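The first of the two approaches (self-learning) amounts to a short data-construction loop. Here is a hedged Python sketch; baseline_translate stands in for an NMT system trained on the available bitext, and the mixing policy is illustrative.

```python
def build_synthetic_corpus(mono_src, baseline_translate, real_pairs):
    """Self-learning sketch: translate source-side monolingual
    sentences with an existing model and mix the synthetic pairs
    with the real bitext for retraining the encoder.

    mono_src          : iterable of source-language sentences
    baseline_translate: assumed callable, an NMT system trained on
                        the existing parallel corpus
    real_pairs        : list of (src, tgt) human-translated pairs
    """
    synthetic = [(s, baseline_translate(s)) for s in mono_src]
    # Train on the union; in practice the synthetic portion may be
    # sub-sampled or down-weighted relative to the real bitext.
    return real_pairs + synthetic
```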

304 citations


Proceedings ArticleDOI
12 Aug 2016
TL;DR: This paper generalizes the embedding layer of the encoder in the attentional encoder-decoder architecture to support the inclusion of arbitrary features, in addition to the baseline word feature, and finds that linguistic input features improve model quality according to three metrics: perplexity, BLEU and CHRF3.
Abstract: Neural machine translation has recently achieved impressive results, while using little in the way of external linguistic information. In this paper we show that the strong learning capability of neural MT models does not make linguistic features redundant; they can be easily incorporated to provide further improvements in performance. We generalize the embedding layer of the encoder in the attentional encoder-decoder architecture to support the inclusion of arbitrary features, in addition to the baseline word feature. We add morphological features, part-of-speech tags, and syntactic dependency labels as input features to English↔German and English→Romanian neural machine translation systems. In experiments on WMT16 training and test sets, we find that linguistic input features improve model quality according to three metrics: perplexity, BLEU and CHRF3. An open-source implementation of our neural MT system is available, as are sample files and configurations.
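The generalized embedding layer is simple to express: one lookup table per feature, concatenated. The PyTorch sketch below is a plausible reading of that design; the vocabulary sizes and per-feature dimensions are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    """Encoder embedding layer generalized to arbitrary input
    features (word, POS tag, dependency label, morphology): each
    feature gets its own table and the vectors are concatenated."""

    def __init__(self, sizes=(50000, 50, 46, 200),
                 dims=(480, 12, 10, 10)):
        super().__init__()
        self.tables = nn.ModuleList(
            [nn.Embedding(v, d) for v, d in zip(sizes, dims)])

    def forward(self, features):
        # features: LongTensor of shape [batch, seq_len, n_features]
        parts = [table(features[..., i])
                 for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)   # [batch, seq, sum(dims)]
```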

301 citations


Book ChapterDOI
08 Oct 2016
TL;DR: An end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information is proposed; it can produce quality segmentation output from a natural language expression and outperforms baseline methods by a large margin.
Abstract: In this paper we approach the novel problem of segmenting an image based on a natural language expression. This is different from traditional semantic segmentation over a predefined set of semantic classes, as, e.g., the phrase “two men sitting on the right bench” requires segmenting only the two people on the right bench and no one standing or sitting on another bench. Previous approaches suitable for this task were limited to a fixed set of categories and/or rectangular regions. To produce pixelwise segmentation for the language expression, we propose an end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information. In our model, a recurrent neural network is used to encode the referential expression into a vector representation, and a fully convolutional network is used to extract a spatial feature map from the image and output a spatial response map for the target object. We demonstrate on a benchmark dataset that our model can produce quality segmentation output from the natural language expression, and outperforms baseline methods by a large margin.
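The joint visual-linguistic step can be pictured as tiling the expression vector over the image feature map and classifying each location. The PyTorch sketch below is one plausible form of that fusion, not the paper's exact architecture; `classifier` is assumed to be a 1x1 convolution over the concatenated channels.

```python
import torch

def fuse_language_and_image(lang_vec, feat_map, classifier):
    """Sketch: tile the LSTM-encoded expression over the FCN
    feature map, concatenate along channels, and score every
    spatial location to get a response map for the target object.

    lang_vec  : [batch, d_lang] sentence embedding from the RNN
    feat_map  : [batch, d_img, H, W] spatial features from the FCN
    classifier: assumed module, e.g. nn.Conv2d(d_img + d_lang, 1, 1)
    """
    b, d, h, w = feat_map.shape
    tiled = lang_vec[:, :, None, None].expand(b, lang_vec.size(1), h, w)
    fused = torch.cat([feat_map, tiled], dim=1)
    return classifier(fused)          # [batch, 1, H, W], pre-sigmoid
```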

276 citations


Proceedings ArticleDOI
01 Aug 2016
TL;DR: This work develops hybrid models that process the text using both convolutional and recurrent neural networks, combining the merits of both structures in extracting linguistic information, to address passage answer selection.
Abstract: Passage-level question answer matching is a challenging task since it requires effective representations that capture the complex semantic relations between questions and answers. In this work, we propose a series of deep learning models to address passage answer selection. To match passage answers to questions while accommodating their complex semantic relations, unlike most previous work that utilizes a single deep learning structure, we develop hybrid models that process the text using both convolutional and recurrent neural networks, combining the merits of both structures in extracting linguistic information. Additionally, we develop a simple but effective attention mechanism for constructing better answer representations according to the input question, which is imperative for modeling long answer sequences. The results on two public benchmark datasets, InsuranceQA and TREC-QA, show that our proposed models outperform a variety of strong baselines.
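The "simple but effective attention mechanism" can be sketched as question-conditioned pooling over the answer's token states. The fragment below uses a generic bilinear form chosen for illustration; the paper's exact parameterization may differ.

```python
import torch
import torch.nn.functional as F

def attentive_answer_pooling(q_vec, a_hidden, W):
    """Weight each answer token's hidden state by its relevance to
    the question before pooling, so long answers keep the parts
    that matter for the question.

    q_vec   : [batch, d] pooled question representation
    a_hidden: [batch, len, d] per-token answer states (CNN/RNN output)
    W       : [d, d] learned bilinear parameter (assumption)
    """
    scores = torch.einsum('bld,de,be->bl', a_hidden, W, q_vec)
    weights = F.softmax(scores, dim=1)                    # [batch, len]
    return (weights.unsqueeze(-1) * a_hidden).sum(dim=1)  # [batch, d]
```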

266 citations


Proceedings ArticleDOI
12 Aug 2016
TL;DR: This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language.
Abstract: This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be translated to a target language, (optionally) with additional cues from the corresponding image, and (ii) a description generation task, in which a target language description needs to be generated for an image, (optionally) with additional cues from source language descriptions of the same image. In this first edition of the shared task, 16 systems were submitted for the translation task and seven for the image description task, from a total of 10 teams.

263 citations


Proceedings Article
10 Dec 2016
TL;DR: This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language text during learning or decoding; relaxing the need for source language transcription would drastically change the data collection methodology in speech translation.
Abstract: This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language text during learning or decoding. Relaxing the need for source language transcription would drastically change the data collection methodology in speech translation, especially in under-resourced scenarios.

256 citations


Journal ArticleDOI
TL;DR: This study provides a connection between two different models based on linguistic 2-tuples, proves the equivalence of the linguistic computational models for handling ULTSs, and proposes a novel CW methodology in which hesitant fuzzy linguistic term sets (HFLTSs) can be constructed based on ULTSs using a numerical scale.

208 citations


Posted Content
TL;DR: The proposed multi-way, multilingual neural machine translation approach enables a single neural translation model to translate between multiple languages, with a number of parameters that grows only linearly with the number of languages.
Abstract: We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters that grows only linearly with the number of languages. This is made possible by having a single attention mechanism that is shared across all language pairs. We train the proposed multi-way, multilingual model on ten language pairs from WMT'15 simultaneously and observe clear performance improvements over models trained on only one language pair. In particular, we observe that the proposed model significantly improves the translation quality of low-resource language pairs.

Posted Content
TL;DR: A novel finetuning algorithm for the recently introduced multi-way, multilingual neural machine translation model that enables zero-resource machine translation and performs better than a pivot-based translation strategy while keeping only one additional copy of attention-related parameters.
Abstract: In this paper, we propose a novel finetuning algorithm for the recently introduced multi-way, multilingual neural machine translation model that enables zero-resource machine translation. When used together with novel many-to-one translation strategies, we empirically show that this finetuning algorithm allows the multi-way, multilingual model to translate a zero-resource language pair (1) as well as a single-pair neural translation model trained with up to 1M direct parallel sentences of the same language pair and (2) better than a pivot-based translation strategy, while keeping only one additional copy of attention-related parameters.

Posted Content
TL;DR: The authors introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure, which allow application to both parsing and language modeling, and demonstrate that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese.
Abstract: We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese.

Proceedings Article
Wei He1, Zhongjun He1, Hua Wu1, Haifeng Wang1
12 Feb 2016
TL;DR: The proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks and incorporates statistical machine translation (SMT) features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework.
Abstract: Neural machine translation (NMT) conducts end-to-end translation with a source language encoder and a target language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to restrict its vocabulary to a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all the source words are translated, and it usually favors short translations, resulting in fluent but inadequate output. In order to solve the above problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU points on NIST open test sets.
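The log-linear framework itself is just a weighted sum of feature scores, with the NMT log-probability treated as one feature alongside the SMT features. A minimal sketch, with illustrative feature names and weights:

```python
def log_linear_score(features, weights):
    """Score one candidate translation as a weighted sum of
    feature values, as in the log-linear framework the paper
    uses to combine NMT with SMT features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one candidate translation.
candidate = {
    "nmt_log_prob": -12.3,   # from the NMT decoder
    "tm_log_prob": -8.1,     # SMT translation model
    "lm_log_prob": -20.4,    # n-gram language model
    "word_penalty": 9.0,     # length feature
}
weights = {"nmt_log_prob": 1.0, "tm_log_prob": 0.4,
           "lm_log_prob": 0.3, "word_penalty": -0.1}
print(log_linear_score(candidate, weights))
```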

Posted Content
TL;DR: A novel decoding algorithm is introduced that allows an existing neural machine translation model to begin translating before a full source sentence is received; it differs from previous work on simultaneous translation in that segmentation and translation are done jointly to maximize the translation quality.
Abstract: We investigate the potential of attention-based neural machine translation in simultaneous translation. We introduce a novel decoding algorithm, called simultaneous greedy decoding, that allows an existing neural machine translation model to begin translating before a full source sentence is received. This approach differs from previous work on simultaneous translation in that segmentation and translation are done jointly to maximize the translation quality, and that translating each segment is strongly conditioned on all the previous segments. This paper presents a first step toward building a full simultaneous translation system based on neural machine translation.
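The decoding loop alternates between reading source words and greedily committing target words. The sketch below captures that control flow; `step` (propose the next target word with a confidence flag, given the current source prefix and output so far) is a hypothetical stand-in for the paper's model and wait/commit criterion.

```python
def simultaneous_greedy_decode(source_stream, step, eos="</s>", max_len=200):
    """Simultaneous greedy decoding (sketch): interleave reading
    source words with emitting target words instead of waiting
    for the full sentence."""
    src, out = [], []
    for word in source_stream:
        src.append(word)
        # Commit target words greedily while the model is confident
        # that more source context would not change the next word.
        while len(out) < max_len:
            token, confident = step(src, out)
            if not confident or token == eos:
                break
            out.append(token)
    # Source exhausted: finish the translation unconditionally.
    while len(out) < max_len:
        token, _ = step(src, out)
        if token == eos:
            break
        out.append(token)
    return out
```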

Proceedings ArticleDOI
25 Aug 2016
TL;DR: The AUTOGRAM prototype automatically produced readable and structurally accurate grammars for inputs like URLs, spreadsheets or configuration files; the resulting grammars not only allow simple reverse engineering of input formats, but can also directly serve as input for test generators.
Abstract: Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL http://www.example.com/path/, for instance, the protocol http, the host www.example.com, and the path path would be handled by different functions and stored in different variables. Given a set of sample inputs, we use dynamic tainting to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a context-free grammar that reflects valid input structure. In its evaluation, our AUTOGRAM prototype automatically produced readable and structurally accurate grammars for inputs like URLs, spreadsheets or configuration files. The resulting grammars not only allow simple reverse engineering of input formats, but can also directly serve as input for test generators.
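The aggregation step, mapping tainted character runs to the functions that handled them, can be sketched directly. In the Python fragment below, the character-to-function taint map and the parser function names are hypothetical stand-ins for what the dynamic-tainting phase would produce.

```python
from collections import defaultdict

def aggregate_fragments(char_to_function, input_text):
    """Group maximal runs of consecutive input characters handled
    by the same function into candidate grammar entities, given a
    per-character taint map from the tainting phase (assumed)."""
    fragments = defaultdict(list)
    start = 0
    for i in range(1, len(input_text) + 1):
        if (i == len(input_text)
                or char_to_function[i] != char_to_function[start]):
            fragments[char_to_function[start]].append(input_text[start:i])
            start = i
    return fragments  # function name -> list of lexical fragments

# Example with the paper's URL: protocol/host/path handled by
# different (hypothetical) parser functions.
url = "http://www.example.com/path/"
taint = (["parse_protocol"] * 7 + ["parse_host"] * 15
         + ["parse_path"] * 6)
print(dict(aggregate_fragments(taint, url)))
```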

01 Jan 2016
Book: Grammatical Framework: Programming with Multilingual Grammars (no abstract available).

Posted Content
TL;DR: Aspects of translation speed are investigated, introducing AmuNMT, the authors' efficient neural machine translation decoder, and it is demonstrated that current neural machine translation could already be used for in-production systems when comparing words-per-second ratios.
Abstract: In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions. For ten directions we also include hierarchical phrase-based MT. Experiments are performed for the recently published United Nations Parallel Corpus v1.0 and its large six-way sentence-aligned subcorpus. In the second part of the paper we investigate aspects of translation speed, introducing AmuNMT, our efficient neural machine translation decoder. We demonstrate that current neural machine translation could already be used for in-production systems when comparing words-per-second ratios.

Journal ArticleDOI
01 May 2016
TL;DR: A multi-criteria group decision making (MCGDM) technique based on the fuzzy VIKOR method is developed to solve a CNC machine tool selection problem and a general MCGDM framework is proposed.
Abstract: Highlights: (i) Two algorithms for the VIKOR method based on the fuzzy linguistic approach are developed; (ii) a general MCGDM framework for a machine tool selection problem is presented; (iii) the proposed framework is verified by an example. Computer numerical control (CNC) machines are used for repetitive, difficult and unsafe manufacturing tasks that require a high degree of accuracy. However, when selecting an appropriate CNC machine, multiple criteria need to be considered by multiple decision makers. In this study, a multi-criteria group decision making (MCGDM) technique based on the fuzzy VIKOR method is developed to solve a CNC machine tool selection problem. Linguistic variables represented by triangular fuzzy numbers are used to reflect decision maker preferences for the criteria importance weights and the performance ratings. After the individual preferences are aggregated, or after the separation values are computed, they are then defuzzified. In this paper, two algorithms based on a fuzzy linguistic approach are developed. Based on these two algorithms and the VIKOR method, a general MCGDM framework is proposed. A CNC machine tool selection example illustrates the application of the proposed approach. A comparative study of the two algorithms using the above case study information highlighted the need to combine the ranking results, as both algorithms have distinct characteristics.
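After the fuzzy linguistic ratings are defuzzified, the VIKOR core reduces to the standard S/R/Q computation. A minimal Python sketch with illustrative numbers (three CNC machines, three benefit criteria):

```python
def vikor(matrix, weights, v=0.5):
    """Crisp VIKOR core (sketch), applied after defuzzification as
    the paper describes. `matrix` holds alternatives x criteria
    benefit scores; `weights` are criteria importance weights; v
    trades group utility against individual regret."""
    m = len(matrix[0])
    best = [max(row[j] for row in matrix) for j in range(m)]
    worst = [min(row[j] for row in matrix) for j in range(m)]
    S, R = [], []
    for row in matrix:
        d = [weights[j] * (best[j] - row[j]) / (best[j] - worst[j])
             for j in range(m)]
        S.append(sum(d))   # group utility (weighted sum of gaps)
        R.append(max(d))   # individual regret (worst single gap)
    s_min, s_max = min(S), max(S)
    r_min, r_max = min(R), max(R)
    Q = [v * (S[i] - s_min) / (s_max - s_min)
         + (1 - v) * (R[i] - r_min) / (r_max - r_min)
         for i in range(len(matrix))]
    return S, R, Q  # rank alternatives by ascending Q

# Three machines rated on three criteria (illustrative numbers).
S, R, Q = vikor([[7, 5, 8], [6, 8, 6], [9, 6, 5]], [0.5, 0.3, 0.2])
print(Q)
```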

Posted Content
TL;DR: A multi-source machine translation model is built and trained to maximize the probability of a target English string given French and German sources, reporting increases of up to +4.8 BLEU on top of a very strong attention-based neural translation model.
Abstract: We build a multi-source machine translation model and train it to maximize the probability of a target English string given French and German sources. Using the neural encoder-decoder framework, we explore several combination methods and report increases of up to +4.8 BLEU on top of a very strong attention-based neural translation model.

Posted Content
TL;DR: This paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset, and enhances the objective function of Neural Programmer, a neural network with built-in discrete operations, and applies it on WikiTableQuestions, a natural language question-answering dataset.
Abstract: Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it on WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision of question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive to the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser.

Journal ArticleDOI
01 Dec 2016
TL;DR: This paper proposes the probabilistic linguistic vector-term sets (PLVTSs) to promote the application of multi-granular linguistic information and develops a novel algorithm to tackle multi-attribute group decision making (MAGDM) problems with multiple LESs.
Abstract: Highlights: (i) The concept of the probabilistic linguistic vector-term set (PLVTS) is proposed to consider the score of a linguistic term and its associated change rate simultaneously; (ii) a novel algorithm is developed to aid MAGDM with multiple linguistic evaluation scales, handling large-group decision making with linguistic terms from the patients' side; (iii) the practical guiding significance for the product provider (such as a hospital) is demonstrated. With the rapid information explosion and sharing, recommender systems (RS) play an auxiliary role in assisting Internet users to make decisions, especially on e-service platforms. Normally, the information in this process is related to opinions and preferences, which are usually expressed in a qualitative way, such as linguistic evaluation terms (LETs). However, the LETs may come from different sources such as experts, users, etc., which means the linguistic evaluation scales (LESs) used in this process may differ due to the sources' different backgrounds and levels of knowledge. The diversity and flexibility of these LESs determine the quality of information, and further affect the effectiveness of an RS. In this paper, we focus on improving the accuracy of the multi-granular linguistic recommender system by supporting customers in finding the most eligible items according to their own preferences. We first propose the probabilistic linguistic vector-term sets (PLVTSs) to promote the application of multi-granular linguistic information. Based on the PLVTSs, we then develop a novel algorithm to tackle multi-attribute group decision making (MAGDM) problems with multiple LESs. Furthermore, the effectiveness of the PLVTSs is validated by an illustration of a personalized hospital selection-recommender problem. Finally, we point out some possible research directions regarding the PLVTSs.

Journal ArticleDOI
TL;DR: It is argued that, beyond these random factors, linguistic differences, from sounds to grammars, may also reflect adaptations to different environments in which the languages are learned and used.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: An algorithmic approach is developed that combines the strengths of both machine learning classification and machine translation, the latter of which learns from parallel data and is better at correcting complex mistakes.
Abstract: We focus on two leading state-of-the-art approaches to grammatical error correction – machine learning classification and machine translation. Based on the comparative study of the two learning frameworks and through error analysis of the output of the state-of-the-art systems, we identify key strengths and weaknesses of each of these approaches and demonstrate their complementarity. In particular, the machine translation method learns from parallel data without requiring further linguistic input and is better at correcting complex mistakes. The classification approach possesses other desirable characteristics, such as the ability to easily generalize beyond what was seen in training, the ability to train without human-annotated data, and the flexibility to adjust knowledge sources for individual error types. Based on this analysis, we develop an algorithmic approach that combines the strengths of both methods. We present several systems based on resources used in previous work with a relative improvement of over 20% (and 7.4 F score points) over the previous state-of-the-art.

Posted Content
TL;DR: This paper proposes to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence, and to represent special tokens with typed symbols to facilitate translating words that are not well suited to translation via continuous vectors.
Abstract: We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. This has the consequence that the encoder and decoder recurrent networks in neural machine translation need to spend a substantial amount of their capacity disambiguating source and target words based on the context defined by a source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to translation via continuous vectors. Experiments on En-Fr and En-De reveal that the proposed approaches of contextualization and symbolization significantly improve the translation quality of neural machine translation systems.
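One plausible reading of the contextualization idea: compute a nonlinear bag-of-words summary of the sentence once and use it to gate every word embedding. The PyTorch sketch below illustrates that reading; it is not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class ContextualizedEmbedding(nn.Module):
    """Sketch: a nonlinear bag-of-words summary of the whole source
    sentence modulates each word embedding, helping the encoder
    disambiguate multi-sense vectors early."""

    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.ctx = nn.Linear(dim, dim)

    def forward(self, tokens):                      # [batch, seq]
        e = self.emb(tokens)                        # [batch, seq, dim]
        bow = torch.tanh(self.ctx(e.mean(dim=1)))   # nonlinear BoW summary
        return e * torch.sigmoid(bow).unsqueeze(1)  # gate each word vector
```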

Proceedings ArticleDOI
08 Aug 2016
TL;DR: This paper builds and formulate a semantic space to connect the source and target languages, and applies it to the sequence-to-sequence framework to propose a Knowledge-Based Semantic Embedding (KBSE) method.
Abstract: In this paper, with the help of a knowledge base, we build and formulate a semantic space to connect the source and target languages, and apply it to the sequence-to-sequence framework to propose a Knowledge-Based Semantic Embedding (KBSE) method. In our KBSE method, the source sentence is first mapped into a knowledge-based semantic space, and the target sentence is generated using a recurrent neural network with the internal meaning preserved. Experiments are conducted on two translation tasks, e-commerce data and movie data, and the results show that our proposed method achieves outstanding performance compared with both traditional SMT methods and existing encoder-decoder models.

01 Jan 2016
TL;DR: The use of text simplification as a pre-processing step for statistical machine translation of grammatically complex under-resourced languages can improve grammaticality (fluency) of the translation output and reduce technical post-editing effort.
Abstract: This article explores the use of text simplification as a pre-processing step for statistical machine translation of grammatically complex under-resourced languages. Our experiments on English-to-Serbian translation show that this approach can improve grammaticality (fluency) of the translation output and reduce technical post-editing effort (number of post-edit operations). Furthermore, the use of more aggressive text simplification methods (which do not only simplify the given sentence but also discard irrelevant information thus producing syntactically very simple sentences) also improves meaning preservation (adequacy) of the translation output.

Posted Content
TL;DR: This paper builds a neural posterior approximator conditioned on both the source and the target sides, and equips it with a reparameterization technique to estimate the variational lower bound, showing that the proposed variational neural machine translation achieves significant improvements over vanilla neural machine translation baselines.
Abstract: Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence. In this paper, we propose a variational model to learn this conditional distribution for neural machine translation: a variational encoder-decoder model that can be trained end-to-end. Different from the vanilla encoder-decoder model that generates target translations from hidden representations of source sentences alone, the variational model introduces a continuous latent variable to explicitly model the underlying semantics of source sentences and to guide the generation of target translations. In order to perform efficient posterior inference and large-scale training, we build a neural posterior approximator conditioned on both the source and the target sides, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on both Chinese-English and English-German translation tasks show that the proposed variational neural machine translation achieves significant improvements over the vanilla neural machine translation baselines.
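The training objective is the usual variational lower bound with a reparameterized latent variable. The sketch below writes out the ELBO for a diagonal Gaussian posterior q(z|x,y) and prior p(z|x); the networks producing the means and log-variances, and the decoder term `log_p_y`, are assumed given.

```python
import torch

def vnmt_elbo(log_p_y, mu_q, logvar_q, mu_p, logvar_p):
    """Variational lower bound (sketch):
        ELBO = E_q(z|x,y)[log p(y|x,z)] - KL(q(z|x,y) || p(z|x))
    with diagonal Gaussians q and p; `log_p_y` is the decoder's
    reconstruction term under a reparameterized sample of z."""
    # Closed-form KL between two diagonal Gaussians.
    kl = 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0)
    return log_p_y - kl

def reparameterize(mu, logvar):
    # Sampling trick that keeps gradients flowing to mu/logvar.
    eps = torch.randn_like(mu)
    return mu + (0.5 * logvar).exp() * eps
```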

Posted Content
TL;DR: The authors generalize the embedding layer of the encoder in the attentional encoder-decoder architecture to support the inclusion of arbitrary features, in addition to the baseline word feature, and add morphological features, part-of-speech tags, and syntactic dependency labels as input features to English↔German and English→Romanian NMT systems.
Abstract: Neural machine translation has recently achieved impressive results, while using little in the way of external linguistic information. In this paper we show that the strong learning capability of neural MT models does not make linguistic features redundant; they can be easily incorporated to provide further improvements in performance. We generalize the embedding layer of the encoder in the attentional encoder-decoder architecture to support the inclusion of arbitrary features, in addition to the baseline word feature. We add morphological features, part-of-speech tags, and syntactic dependency labels as input features to English↔German and English→Romanian neural machine translation systems. In experiments on WMT16 training and test sets, we find that linguistic input features improve model quality according to three metrics: perplexity, BLEU and CHRF3. An open-source implementation of our neural MT system is available, as are sample files and configurations.

Posted Content
TL;DR: It is shown that decoding time on CPUs can be reduced by up to 90% and training time by 25% on the WMT15 English-German and WMT16 English-Romanian tasks, with the same or only a negligible change in accuracy.
Abstract: Classical translation models constrain the space of possible outputs by selecting a subset of translation rules based on the input sentence. Recent work on improving the efficiency of neural translation models adopted a similar strategy by restricting the output vocabulary to a subset of likely candidates given the source. In this paper we experiment with context- and embedding-based selection methods and extend previous work by examining speed and accuracy trade-offs in more detail. We show that decoding time on CPUs can be reduced by up to 90% and training time by 25% on the WMT15 English-German and WMT16 English-Romanian tasks, with the same or only a negligible change in accuracy. This brings the time to decode with a state-of-the-art neural translation system to just over 140 msec per sentence on a single CPU core for English-German.
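The selection step itself is cheap: build a per-sentence shortlist of target words and restrict the softmax to it. The sketch below uses a dictionary-based selector for illustration (the paper also examines embedding-based variants); `translation_dict` and `top_frequent` are assumed precomputed resources.

```python
def select_vocabulary(source_tokens, translation_dict,
                      top_frequent, k_per_word=10):
    """Sketch of vocabulary selection: restrict the output softmax
    to target words that are likely given the source, plus a fixed
    shortlist of frequent words.

    translation_dict: source word -> ranked candidate target words
                      (e.g. from word alignments; assumed given)
    top_frequent    : fixed shortlist of frequent target words
    """
    candidates = set(top_frequent)
    for w in source_tokens:
        candidates.update(translation_dict.get(w, [])[:k_per_word])
    # The decoder then computes its softmax only over `candidates`,
    # which is the source of the reported CPU decoding speed-ups.
    return candidates
```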