Proceedings ArticleDOI

Instance Weighting for Neural Machine Translation Domain Adaptation

TL;DR: Two instance weighting techniques, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation, and empirical results show that the proposed methods substantially improve NMT performance.
Abstract: Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to apply directly to Neural Machine Translation (NMT), because NMT is not a linear model. In this paper, two instance weighting techniques, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming the existing baselines by up to 1.6-3.6 BLEU points.
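The sentence-weighting idea can be sketched as a per-sentence scaling of the training loss. The snippet below is a minimal illustration only: the paper's dynamic weight learning strategy is not reproduced, and the function name and toy weights are invented for illustration.

```python
import math

def weighted_nll(sentence_log_probs, weights):
    """Sentence-weighted negative log-likelihood over a batch.

    sentence_log_probs: log p(y|x) for each sentence in the batch
    weights: per-sentence weights (higher = more in-domain)
    """
    return -sum(w * lp for w, lp in zip(weights, sentence_log_probs)) / len(weights)

# Two sentences: an in-domain one (weight 1.0) and an out-of-domain one (weight 0.3);
# the out-of-domain sentence contributes less to the training loss.
loss = weighted_nll([math.log(0.5), math.log(0.1)], [1.0, 0.3])
```

Down-weighting the second term means its gradient contribution shrinks accordingly, which is the basic mechanism instance weighting transfers from SMT to NMT training.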


Citations
Book
23 Jul 2020
TL;DR: A comprehensive treatment of the topic, ranging from introduction to neural networks, computation graphs, description of the currently dominant attentional sequence-to-sequence model, recent refinements, alternative architectures and challenges.
Abstract: Deep learning is revolutionizing how machine translation systems are built today. This book introduces the challenge of machine translation and evaluation, including historical, linguistic, and applied context, then develops the core deep learning methods used for natural language applications. Code examples in Python give readers a hands-on blueprint for understanding and implementing their own machine translation systems. The book also provides extensive coverage of machine learning tricks, issues involved in handling various forms of data, model enhancements, and current challenges and methods for analysis and visualization. Summaries of the current research in the field make this a state-of-the-art textbook for undergraduate and graduate classes, as well as an essential reference for researchers and developers interested in other applications of neural methods in the broader field of human language processing.

239 citations

Proceedings Article
01 Jun 2018
TL;DR: A comprehensive survey of the state-of-the-art domain adaptation techniques for NMT, which leverage both out-of-domain parallel corpora and monolingual corpora for in-domain translation.
Abstract: Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

182 citations


Cites background or methods from "Instance Weighting for Neural Machi..."

  • ...On the other hand, the model centric category focuses on NMT models that are specialized for domain adaptation, which can be either the training objective (Luong and Manning, 2015; Sennrich et al., 2016b; Servan et al., 2016; Freitag and Al-Onaizan, 2016; Wang et al., 2017b; Chen et al., 2017a; Varga, 2017; Dakwale and Monz, 2017; Chu et al., 2017; Miceli Barone et al., 2017), the NMT architecture (Kobus et al....

    [...]

  • ...To address this problem, Wang et al. (2017a) exploit the internal embedding of the source sentence in NMT, and use the sentence embedding similarity to select the sentences that are close to in-domain data from out-of-domain data (Figure 4)....

    [...]

  • ...Figure 5: Instance weighting for NMT (Wang et al., 2017b)....

    [...]

  • ...…and Zong, 2016b; Cheng et al., 2016; Currey et al., 2017; Domhan and Hieber, 2017), synthetic corpora (Sennrich et al., 2016b; Zhang and Zong, 2016b; Park et al., 2017), or parallel corpora (Chu et al., 2017; Sajjad et al., 2017; Britz et al., 2017; Wang et al., 2017a; van der Wees et al., 2017)....

    [...]

  • ...Data Selection As mentioned in the SMT section (Section 3.1), the data selection methods in SMT can improve NMT performance modestly, because their criteria of data selection are not very related to NMT (Wang et al., 2017a)....

    [...]

Proceedings ArticleDOI
01 Sep 2018
TL;DR: This paper proposes a benchmark dataset for machine translation of noisy text, consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations, on the order of 7k-37k sentences per language pair.
Abstract: Noisy or non-standard input text can cause disastrous mistranslations in most modern Machine Translation (MT) systems, and there has been growing research interest in creating noise-robust MT systems. However, as of yet there are no publicly available parallel corpora with naturally occurring noisy inputs and translations, and thus previous work has resorted to evaluating on synthetically created datasets. In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations. We commissioned translations of English comments into French and Japanese, as well as French and Japanese comments into English, on the order of 7k-37k sentences per language pair. We qualitatively and quantitatively examine the types of noise included in this dataset, then demonstrate that existing MT models fail badly on a number of noise-related phenomena, even after performing adaptation on a small training set of in-domain data. This indicates that this dataset can provide an attractive testbed for methods tailored to handling noisy text in MT.

146 citations

Journal ArticleDOI
TL;DR: This paper for the first time summarizes the state-of-the-art cross-domain fault diagnosis research from three viewpoints: research motivations, cross-domain strategies, and application objects, and provides readers a framework for better understanding and identifying the research status, challenges, and future directions of cross-domain fault diagnosis.
Abstract: Data-driven fault diagnosis has been a hot topic in recent years with the development of machine learning techniques. However, the prerequisite that the training data and the test data should follow an identical distribution prevents the conventional data-driven diagnosis methods from being applied to engineering diagnosis problems. To tackle this dilemma, cross-domain fault diagnosis using knowledge transfer strategies has become popular in the past five years. Diagnosis methods based on transfer learning aim to build models that can perform well on target tasks by leveraging knowledge from semantically related but distribution-different source domains. This paper for the first time summarizes the state-of-the-art cross-domain fault diagnosis research. The literature is introduced from three different viewpoints: research motivations, cross-domain strategies, and application objects. In addition, the corresponding open-source fault datasets and several future directions are also presented. The survey provides readers a framework for better understanding and identifying the research status, challenges, and future directions of cross-domain fault diagnosis.

127 citations

Posted Content
TL;DR: This paper surveys the recent advances in transfer adaptation learning methodology and potential benchmarks, and provides researchers a framework for better understanding and identifying the research status, challenges, and future directions of the field.
Abstract: The world we see is ever-changing and it always changes with people, things, and the environment. Domain is referred to as the state of the world at a certain moment. A research problem is characterized as transfer adaptation learning (TAL) when it needs knowledge correspondence between different moments/domains. Conventional machine learning aims to find a model with the minimum expected risk on test data by minimizing the regularized empirical risk on the training data, which, however, supposes that the training and test data share a similar joint probability distribution. TAL aims to build models that can perform tasks of a target domain by learning knowledge from a semantically related but distribution-different source domain. It is an energetic research field of increasing influence and importance, with a rapidly growing publication record. This paper surveys the advances of TAL methodologies in the past decade, and the technical challenges and essential problems of TAL are observed and discussed with deep insights and new perspectives. Broader families of transfer adaptation learning solutions are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation, and adversarial adaptation, which go beyond the early semi-supervised and unsupervised split. The survey helps researchers rapidly but comprehensively understand and identify the research foundation, research status, theoretical limitations, future challenges, and under-studied issues (universality, interpretability, and credibility) to be addressed in the field toward universal representation and safe applications in open-world scenarios.

125 citations


Cites background from "Instance Weighting for Neural Machi..."

  • ...Intuitive Weighting Adaptive tuning [98], [99], [100], [101]...

    [...]

  • ...Instance re-weighting based domain adaptation was first proposed for natural language processing (NLP) [98], [99]....

    [...]

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.
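The core of the metric this paper introduces (BLEU) is modified n-gram precision: candidate n-gram counts are clipped by their counts in the reference. The sketch below shows just that component, omitting the brevity penalty and the geometric mean over n-gram orders; the function name and toy sentences are illustrative.

```python
from collections import Counter

def modified_precision(candidate, reference, n):
    """Modified n-gram precision: each candidate n-gram's count is
    clipped by its count in the reference before computing precision."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = max(sum(cand.values()), 1)
    return clipped / total

cand = "the the the cat".split()
ref = "the cat is here".split()
# "the" occurs 3 times in the candidate but only once in the reference,
# so it is clipped to 1; "cat" matches once -> (1 + 1) / 4 = 0.5
p1 = modified_precision(cand, ref, 1)
```

Clipping is what prevents a degenerate candidate that repeats a common reference word from scoring a perfect unigram precision.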

21,126 citations

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
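The global/local distinction can be sketched with a dot-product score function (one of the scoring variants this paper considers). The code below is a simplified illustration, not the paper's full model: the helper names are invented, and local attention here uses a fixed window rather than the paper's predictive alignment.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def global_attention(query, source_states):
    """Attend over ALL source states using dot-product scores."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in source_states]
    weights = softmax(scores)
    context = [sum(w * state[d] for w, state in zip(weights, source_states))
               for d in range(len(query))]
    return context, weights

def local_attention(query, source_states, center, window):
    """Attend only to states within [center - window, center + window]."""
    lo = max(0, center - window)
    hi = min(len(source_states), center + window + 1)
    return global_attention(query, source_states[lo:hi])

# Three toy 2-dimensional source states and a decoder query
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx, w = global_attention([1.0, 0.0], states)
```

Global attention costs O(source length) per target word; the local variant caps that cost at the window size, which is the efficiency argument the paper makes.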

8,055 citations


"Instance Weighting for Neural Machi..." refers background in this paper

  • ...…directly models the conditional probability p(y|x) of translating a source sentence, x = {x_1, ..., x_n}, to a target sentence, y = {y_1, ..., y_m} (Luong et al., 2015): p(y|x) = ∏_{j=1}^{m} softmax(g(y_j | y_{j−1}, s_j, c_j)) (1), with g being the transformation function that outputs a vocabulary-sized…...

    [...]
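The product-of-softmaxes formulation quoted above can be made concrete with a toy computation: at each target position j, a vocabulary-sized score vector is turned into a distribution, and the probabilities of the actual target words are multiplied together. The helper names and logits below are invented for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    es = [math.exp(l - m) for l in logits]
    s = sum(es)
    return [e / s for e in es]

def sentence_probability(step_logits, target_ids):
    """p(y|x) = prod_j softmax(scores_j)[y_j], where step_logits[j] plays
    the role of the vocabulary-sized output g(y_j | y_{j-1}, s_j, c_j)."""
    p = 1.0
    for logits, y_j in zip(step_logits, target_ids):
        p *= softmax(logits)[y_j]
    return p

# Toy example: 2 target positions over a 3-word vocabulary
logits = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
p = sentence_probability(logits, [0, 1])
```

In practice the sum of log-probabilities is used instead of this raw product to avoid numerical underflow on long sentences.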

Posted Content
TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.
Abstract: We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
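The ADADELTA update can be sketched for a single scalar parameter: it keeps decaying averages of squared gradients and squared updates, and needs no global learning rate. The rho and eps values below follow commonly used defaults; the quadratic objective is an arbitrary toy example, not from the paper.

```python
import math

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA update (Zeiler, 2012) on a scalar parameter.

    state holds E[g^2] (decaying average of squared gradients) and
    E[dx^2] (decaying average of squared updates)."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    delta = -math.sqrt(state["Edx2"] + eps) / math.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta ** 2
    return x + delta

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0
state = {"Eg2": 0.0, "Edx2": 0.0}
x = 1.0
for _ in range(200):
    x = adadelta_step(x, 2 * x, state)
```

The ratio of the two running RMS terms plays the role of a per-parameter learning rate, which is why the cited paper could train each NMT model without manual learning-rate tuning.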

6,189 citations


"Instance Weighting for Neural Machi..." refers methods in this paper

  • ...Each NMT model was trained for 500K batches by using ADADELTA optimizer (Zeiler, 2012)....

    [...]

Proceedings ArticleDOI
25 Jun 2007
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

6,008 citations


"Instance Weighting for Neural Machi..." refers background or methods in this paper

  • ...Further training (Luong and Manning, 2015) can be viewed as a special case of the proposed batch weighting method....

    [...]

  • ...Recently, Chu et al. (2017) make an empirical comparison of NMT further training (Luong and Manning, 2015) and domain control (Kobus et al., 2016), which applied word-level domain features to word embedding layer....

    [...]

  • ...This adaptation corpora settings were the same as those used in (Luong and Manning, 2015)....

    [...]

  • ...In Statistical Machine Translation (SMT), unrelated additional corpora, known as out-ofdomain corpora, have been shown not to benefit some domains and tasks, such as TED-talks and IWSLT tasks (Axelrod et al., 2011; Luong and Manning, 2015)....

    [...]

  • ...There are two methods for model combination of NMT: i) the in-domain model and out-of-domain model can be ensembled (Jean et al., 2015). ii) an NMT further training (fine-tuning) method (Luong and Manning, 2015)....

    [...]

Proceedings Article
01 Jul 2004
TL;DR: The authors describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score for small test sizes of only 300 sentences, which may give us assurances that test result differences are real.
Abstract: If two translation systems differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score. Even for small test sizes of only 300 sentences, our methods may give us assurances that test result differences are real.
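The paired bootstrap procedure can be sketched as follows: resample the test set with replacement many times and count how often system A outscores system B on the resampled sets. In this simplified sketch, per-sentence scores stand in for the set-level BLEU that the paper actually recomputes on each resample, and all names and numbers are illustrative.

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Paired bootstrap resampling (Koehn, 2004), simplified: both systems
    are scored on the SAME resampled sentence indices, and we return the
    fraction of resamples on which system A strictly beats system B."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples

# Toy per-sentence scores for two systems on a 5-sentence test set
a = [0.6, 0.7, 0.5, 0.8, 0.9]
b = [0.5, 0.6, 0.5, 0.7, 0.8]
p_better = paired_bootstrap(a, b)
```

A value of p_better near 1.0 (conventionally above 0.95) is read as a significant difference; pairing the resamples is what cancels out test-set difficulty.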

1,690 citations


"Instance Weighting for Neural Machi..." refers methods in this paper

  • ..., 2002), with the paired bootstrap re-sampling test (Koehn, 2004)....

    [...]