Proceedings ArticleDOI

Instance Weighting for Neural Machine Translation Domain Adaptation

TL;DR: Two instance weighting techniques, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation, and empirical results show that the proposed methods substantially improve NMT performance.
Abstract: Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to apply directly to Neural Machine Translation (NMT), because NMT is not a linear model. In this paper, two instance weighting techniques, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming the existing baselines by up to 1.6-3.6 BLEU points.
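The sentence-weighting idea can be sketched as a per-sentence scaling of the training loss. The snippet below is a minimal illustration only: the paper's dynamic weight learning strategy is not reproduced, and the function name and toy weights are invented for illustration.

```python
import math

def weighted_nll(sentence_log_probs, weights):
    """Sentence-weighted negative log-likelihood over a batch.

    sentence_log_probs: log p(y|x) for each sentence in the batch
    weights: per-sentence weights (higher = more in-domain)
    """
    return -sum(w * lp for w, lp in zip(weights, sentence_log_probs)) / len(weights)

# Two sentences: an in-domain one (weight 1.0) and an out-of-domain one (weight 0.3);
# the out-of-domain sentence contributes less to the training loss.
loss = weighted_nll([math.log(0.5), math.log(0.1)], [1.0, 0.3])
```

Down-weighting the second term means its gradient contribution shrinks accordingly, which is the basic mechanism instance weighting transfers from SMT to NMT training.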


Citations
Book
23 Jul 2020
TL;DR: A comprehensive treatment of the topic, ranging from introduction to neural networks, computation graphs, description of the currently dominant attentional sequence-to-sequence model, recent refinements, alternative architectures and challenges.
Abstract: Deep learning is revolutionizing how machine translation systems are built today. This book introduces the challenge of machine translation and evaluation, including historical, linguistic, and applied context, then develops the core deep learning methods used for natural language applications. Code examples in Python give readers a hands-on blueprint for understanding and implementing their own machine translation systems. The book also provides extensive coverage of machine learning tricks, issues involved in handling various forms of data, model enhancements, and current challenges and methods for analysis and visualization. Summaries of the current research in the field make this a state-of-the-art textbook for undergraduate and graduate classes, as well as an essential reference for researchers and developers interested in other applications of neural methods in the broader field of human language processing.

239 citations

Proceedings Article
01 Jun 2018
TL;DR: A comprehensive survey of the state-of-the-art domain adaptation techniques for NMT, which leverage both out-of-domain parallel corpora and monolingual corpora for in-domain translation.
Abstract: Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

182 citations


Cites background or methods from "Instance Weighting for Neural Machi..."

  • ...On the other hand, the model centric category focuses on NMT models that are specialized for domain adaptation, which can be either the training objective (Luong and Manning, 2015; Sennrich et al., 2016b; Servan et al., 2016; Freitag and Al-Onaizan, 2016; Wang et al., 2017b; Chen et al., 2017a; Varga, 2017; Dakwale and Monz, 2017; Chu et al., 2017; Miceli Barone et al., 2017), the NMT architecture (Kobus et al....

    [...]

  • ...To address this problem, Wang et al. (2017a) exploit the internal embedding of the source sentence in NMT, and use the sentence embedding similarity to select the sentences that are close to in-domain data from out-of-domain data (Figure 4)....

    [...]

  • ...Figure 5: Instance weighting for NMT (Wang et al., 2017b)....

    [...]

  • ...…and Zong, 2016b; Cheng et al., 2016; Currey et al., 2017; Domhan and Hieber, 2017), synthetic corpora (Sennrich et al., 2016b; Zhang and Zong, 2016b; Park et al., 2017), or parallel corpora (Chu et al., 2017; Sajjad et al., 2017; Britz et al., 2017; Wang et al., 2017a; van der Wees et al., 2017)....

    [...]

  • ...Data Selection As mentioned in the SMT section (Section 3.1), the data selection methods in SMT can improve NMT performance modestly, because their criteria of data selection are not very related to NMT (Wang et al., 2017a)....

    [...]

Proceedings ArticleDOI
01 Sep 2018
TL;DR: This paper proposes a benchmark dataset for machine translation of noisy text, consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations, on the order of 7k-37k sentences per language pair.
Abstract: Noisy or non-standard input text can cause disastrous mistranslations in most modern Machine Translation (MT) systems, and there has been growing research interest in creating noise-robust MT systems. However, as of yet there are no publicly available parallel corpora with naturally occurring noisy inputs and translations, and thus previous work has resorted to evaluating on synthetically created datasets. In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations. We commissioned translations of English comments into French and Japanese, as well as French and Japanese comments into English, on the order of 7k-37k sentences per language pair. We qualitatively and quantitatively examine the types of noise included in this dataset, then demonstrate that existing MT models fail badly on a number of noise-related phenomena, even after performing adaptation on a small training set of in-domain data. This indicates that this dataset can provide an attractive testbed for methods tailored to handling noisy text in MT.

146 citations

Journal ArticleDOI
TL;DR: This paper for the first time summarizes the state-of-the-art cross-domain fault diagnosis research from three viewpoints: research motivations, cross-domain strategies, and application objects, and provides readers a framework for better understanding and identifying the research status, challenges, and future directions of cross-domain fault diagnosis.
Abstract: Data-driven fault diagnosis has been a hot topic in recent years with the development of machine learning techniques. However, the prerequisite that the training data and the test data should follow an identical distribution prevents the conventional data-driven diagnosis methods from being applied to engineering diagnosis problems. To tackle this dilemma, cross-domain fault diagnosis using knowledge transfer strategies has become popular in the past five years. Diagnosis methods based on transfer learning aim to build models that can perform well on target tasks by leveraging knowledge from semantically related but distribution-different source domains. This paper for the first time summarizes the state-of-the-art cross-domain fault diagnosis research. The literature is introduced from three different viewpoints: research motivations, cross-domain strategies, and application objects. In addition, the corresponding open-source fault datasets and several future directions are also presented. The survey provides readers a framework for better understanding and identifying the research status, challenges, and future directions of cross-domain fault diagnosis.

127 citations

Posted Content
TL;DR: This paper surveys the recent advances in transfer adaptation learning methodology and potential benchmarks, and provides researchers a framework for better understanding and identifying the research status, challenges, and future directions of the field.
Abstract: The world we see is ever-changing and it always changes with people, things, and the environment. Domain is referred to as the state of the world at a certain moment. A research problem is characterized as transfer adaptation learning (TAL) when it needs knowledge correspondence between different moments/domains. Conventional machine learning aims to find a model with the minimum expected risk on test data by minimizing the regularized empirical risk on the training data, which, however, supposes that the training and test data share a similar joint probability distribution. TAL aims to build models that can perform tasks of a target domain by learning knowledge from a semantically related but distribution-different source domain. It is an energetic research field of increasing influence and importance, with a rapidly growing publication record. This paper surveys the advances of TAL methodologies in the past decade, and the technical challenges and essential problems of TAL are observed and discussed with deep insights and new perspectives. Broader families of transfer adaptation learning solutions are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation, and adversarial adaptation, which go beyond the early semi-supervised and unsupervised split. The survey helps researchers rapidly but comprehensively understand and identify the research foundation, research status, theoretical limitations, future challenges, and under-studied issues (universality, interpretability, and credibility) to be addressed in the field toward universal representation and safe applications in open-world scenarios.

125 citations


Cites background from "Instance Weighting for Neural Machi..."

  • ...Intuitive Weighting Adaptive tuning [98], [99], [100], [101]...

    [...]

  • ...Instance re-weighting based domain adaptation was first proposed for natural language processing (NLP) [98], [99]....

    [...]

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.
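The core of the metric this paper introduces (BLEU) is modified n-gram precision: candidate n-gram counts are clipped by their counts in the reference. The sketch below shows just that component, omitting the brevity penalty and the geometric mean over n-gram orders; the function name and toy sentences are illustrative.

```python
from collections import Counter

def modified_precision(candidate, reference, n):
    """Modified n-gram precision: each candidate n-gram's count is
    clipped by its count in the reference before computing precision."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = max(sum(cand.values()), 1)
    return clipped / total

cand = "the the the cat".split()
ref = "the cat is here".split()
# "the" occurs 3 times in the candidate but only once in the reference,
# so it is clipped to 1; "cat" matches once -> (1 + 1) / 4 = 0.5
p1 = modified_precision(cand, ref, 1)
```

Clipping is what prevents a degenerate candidate that repeats a common reference word from scoring a perfect unigram precision.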

21,126 citations

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
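The global/local distinction can be sketched with a dot-product score function (one of the scoring variants this paper considers). The code below is a simplified illustration, not the paper's full model: the helper names are invented, and local attention here uses a fixed window rather than the paper's predictive alignment.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def global_attention(query, source_states):
    """Attend over ALL source states using dot-product scores."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in source_states]
    weights = softmax(scores)
    context = [sum(w * state[d] for w, state in zip(weights, source_states))
               for d in range(len(query))]
    return context, weights

def local_attention(query, source_states, center, window):
    """Attend only to states within [center - window, center + window]."""
    lo = max(0, center - window)
    hi = min(len(source_states), center + window + 1)
    return global_attention(query, source_states[lo:hi])

# Three toy 2-dimensional source states and a decoder query
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx, w = global_attention([1.0, 0.0], states)
```

Global attention costs O(source length) per target word; the local variant caps that cost at the window size, which is the efficiency argument the paper makes.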

8,055 citations


"Instance Weighting for Neural Machi..." refers background in this paper

  • ...…directly models the conditional probability p(y|x) of translating a source sentence, x = {x_1, ..., x_n}, to a target sentence, y = {y_1, ..., y_m} (Luong et al., 2015): p(y|x) = ∏_{j=1}^{m} softmax(g(y_j | y_{j−1}, s_j, c_j)) (1), with g being the transformation function that outputs a vocabulary-sized…...

    [...]
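The product-of-softmaxes formulation quoted above can be made concrete with a toy computation: at each target position j, a vocabulary-sized score vector is turned into a distribution, and the probabilities of the actual target words are multiplied together. The helper names and logits below are invented for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    es = [math.exp(l - m) for l in logits]
    s = sum(es)
    return [e / s for e in es]

def sentence_probability(step_logits, target_ids):
    """p(y|x) = prod_j softmax(scores_j)[y_j], where step_logits[j] plays
    the role of the vocabulary-sized output g(y_j | y_{j-1}, s_j, c_j)."""
    p = 1.0
    for logits, y_j in zip(step_logits, target_ids):
        p *= softmax(logits)[y_j]
    return p

# Toy example: 2 target positions over a 3-word vocabulary
logits = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
p = sentence_probability(logits, [0, 1])
```

In practice the sum of log-probabilities is used instead of this raw product to avoid numerical underflow on long sentences.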

Posted Content
TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.
Abstract: We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
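The ADADELTA update can be sketched for a single scalar parameter: it keeps decaying averages of squared gradients and squared updates, and needs no global learning rate. The rho and eps values below follow commonly used defaults; the quadratic objective is an arbitrary toy example, not from the paper.

```python
import math

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA update (Zeiler, 2012) on a scalar parameter.

    state holds E[g^2] (decaying average of squared gradients) and
    E[dx^2] (decaying average of squared updates)."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    delta = -math.sqrt(state["Edx2"] + eps) / math.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta ** 2
    return x + delta

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0
state = {"Eg2": 0.0, "Edx2": 0.0}
x = 1.0
for _ in range(200):
    x = adadelta_step(x, 2 * x, state)
```

The ratio of the two running RMS terms plays the role of a per-parameter learning rate, which is why the cited paper could train each NMT model without manual learning-rate tuning.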

6,189 citations


"Instance Weighting for Neural Machi..." refers methods in this paper

  • ...Each NMT model was trained for 500K batches by using ADADELTA optimizer (Zeiler, 2012)....

    [...]

Proceedings ArticleDOI
25 Jun 2007
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

6,008 citations


"Instance Weighting for Neural Machi..." refers background or methods in this paper

  • ...Further training (Luong and Manning, 2015) can be viewed as a special case of the proposed batch weighting method....

    [...]

  • ...Recently, Chu et al. (2017) make an empirical comparison of NMT further training (Luong and Manning, 2015) and domain control (Kobus et al., 2016), which applied word-level domain features to word embedding layer....

    [...]

  • ...This adaptation corpora settings were the same as those used in (Luong and Manning, 2015)....

    [...]

  • ...In Statistical Machine Translation (SMT), unrelated additional corpora, known as out-ofdomain corpora, have been shown not to benefit some domains and tasks, such as TED-talks and IWSLT tasks (Axelrod et al., 2011; Luong and Manning, 2015)....

    [...]

  • ...There are two methods for model combination of NMT: i) the in-domain model and out-of-domain model can be ensembled (Jean et al., 2015). ii) an NMT further training (fine-tuning) method (Luong and Manning, 2015)....

    [...]

Proceedings Article
01 Jul 2004
TL;DR: The authors describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score for small test sizes of only 300 sentences, which may give us assurances that test result differences are real.
Abstract: If two translation systems differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score. Even for small test sizes of only 300 sentences, our methods may give us assurances that test result differences are real.
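The paired bootstrap procedure can be sketched as follows: resample the test set with replacement many times and count how often system A outscores system B on the resampled sets. In this simplified sketch, per-sentence scores stand in for the set-level BLEU that the paper actually recomputes on each resample, and all names and numbers are illustrative.

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Paired bootstrap resampling (Koehn, 2004), simplified: both systems
    are scored on the SAME resampled sentence indices, and we return the
    fraction of resamples on which system A strictly beats system B."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples

# Toy per-sentence scores for two systems on a 5-sentence test set
a = [0.6, 0.7, 0.5, 0.8, 0.9]
b = [0.5, 0.6, 0.5, 0.7, 0.8]
p_better = paired_bootstrap(a, b)
```

A value of p_better near 1.0 (conventionally above 0.95) is read as a significant difference; pairing the resamples is what cancels out test-set difficulty.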

1,690 citations


"Instance Weighting for Neural Machi..." refers methods in this paper

  • ..., 2002), with the paired bootstrap re-sampling test (Koehn, 2004)....

    [...]