
Showing papers by Kevin Duh published in 2019


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work interprets the drop in general-domain performance during continued training as catastrophic forgetting of general-domain knowledge and adapts Elastic Weight Consolidation (EWC)—a machine learning method for learning a new task without forgetting previous tasks—to mitigate it.
Abstract: Continued training is an effective method for domain adaptation in neural machine translation. However, in-domain gains from adaptation come at the expense of general-domain performance. In this work, we interpret the drop in general-domain performance as catastrophic forgetting of general-domain knowledge. To mitigate it, we adapt Elastic Weight Consolidation (EWC)—a machine learning method for learning a new task without forgetting previous tasks. Our method retains the majority of general-domain performance lost in continued training without degrading in-domain performance, outperforming the previous state-of-the-art. We also explore the full range of general-domain performance available when some in-domain degradation is acceptable.

128 citations
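To make the mechanism concrete, here is a minimal sketch of the EWC idea described above, written in PyTorch. The diagonal-Fisher importance estimate and the weighting hyperparameter `lam` are standard EWC choices assumed for illustration, not necessarily the paper's exact setup.

```python
# Minimal EWC sketch (assumption, not the authors' implementation): penalize
# moving parameters away from their general-domain values, weighted by an
# estimate of each parameter's importance (diagonal Fisher information).
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Estimate per-parameter importance on general-domain data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, general_params, fisher, lam=1.0):
    """Penalize movement of important parameters away from general-domain values."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - general_params[n]) ** 2).sum()
    return lam * penalty

# During continued training on in-domain data, the total loss would roughly be:
#   loss = in_domain_loss + ewc_penalty(model, general_params, fisher)
```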


Proceedings ArticleDOI
01 Jul 2019
TL;DR: This work proposes an attention-based model that treats AMR parsing as sequence-to-graph transduction; the parser is aligner-free and can be effectively trained with limited amounts of labeled AMR data.
Abstract: We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. Unlike most AMR parsers that rely on pre-trained aligners, external semantic resources, or data augmentation, our proposed parser is aligner-free, and it can be effectively trained with limited amounts of labeled AMR data. Our experimental results outperform all previously reported SMATCH scores on both AMR 2.0 (76.3% F1 on LDC2017T10) and AMR 1.0 (70.2% F1 on LDC2014T12).

107 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This article introduces a curriculum learning approach to adapt generic NMT models to a specific domain, where samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule.
Abstract: We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.

78 citations
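As a rough illustration of the approach described above, the sketch below groups training samples by a domain-similarity score and feeds progressively larger subsets to training. The scoring function and the particular schedule are assumptions for illustration, not the paper's exact recipe.

```python
# Curriculum sketch (assumed details): order samples by similarity to the
# target domain, split them into groups, and widen the training pool over time.
import numpy as np

def build_curriculum(samples, similarity_scores, num_groups=4):
    """Split training samples into groups ordered by similarity to the domain."""
    order = np.argsort(similarity_scores)[::-1]        # most similar first
    groups = np.array_split(order, num_groups)
    return [[samples[i] for i in g] for g in groups]

def schedule(groups, num_phases=4):
    """Each phase trains on the most-similar groups seen so far (one possible schedule)."""
    for phase in range(num_phases):
        active = [s for g in groups[: phase + 1] for s in g]
        yield phase, active

samples = ["sent pair 1", "sent pair 2", "sent pair 3", "sent pair 4"]
scores = [0.9, 0.1, 0.7, 0.4]   # e.g., a Moore-Lewis style domain-similarity score
for phase, data in schedule(build_curriculum(samples, scores, num_groups=2), num_phases=2):
    print(phase, data)
```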


Proceedings ArticleDOI
01 Nov 2019
TL;DR: This article proposes an attention-based neural transducer that incrementally builds a meaning representation via a sequence of semantic relations and can be effectively trained without relying on a pre-trained aligner.
Abstract: We unify different broad-coverage semantic parsing tasks into a transduction parsing paradigm, and propose an attention-based neural transducer that incrementally builds meaning representation via a sequence of semantic relations. By leveraging multiple attention mechanisms, the neural transducer can be effectively trained without relying on a pre-trained aligner. Experiments separately conducted on three broad-coverage semantic parsing tasks – AMR, SDP and UCCA – demonstrate that our attention-based neural transducer improves the state of the art on both AMR and UCCA, and is competitive with the state of the art on SDP.

72 citations
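One building block such a transducer could use for predicting semantic relations is a biaffine scorer over node representations. The NumPy sketch below is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch: score candidate head attachments between predicted
# nodes with a biaffine function and pick the highest-scoring head per node.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 5, 16
H = rng.standard_normal((num_nodes, dim))     # representations of predicted nodes

U = rng.standard_normal((dim, dim))           # biaffine weight matrix
b = rng.standard_normal(num_nodes)            # per-head bias

# edge_scores[i, j] = score of node j attaching to head i
edge_scores = H @ U @ H.T + b[:, None]
heads = edge_scores.argmax(axis=0)            # greedy head choice per node
print(heads)
```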


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the authors propose a multilingual end-to-end speech translation (ST) model, in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.
Abstract: In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they are applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our codes and the database are publicly available to encourage further research in this emergent multilingual ST topic (available at https://github.com/espnet/espnet).

56 citations
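A common way to realize a universal sequence-to-sequence model for many target languages is to mark the desired output language with a special tag. The snippet below illustrates that idea; it is an assumption about the setup, not necessarily the exact mechanism used in this paper.

```python
# Sketch: prepend a target-language tag so a single sequence-to-sequence model
# can serve one-to-many and many-to-many translation directions.
def add_language_tag(target_tokens, target_lang):
    return [f"<2{target_lang}>"] + target_tokens

print(add_language_tag(["guten", "tag"], "de"))   # ['<2de>', 'guten', 'tag']
```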


Posted Content
TL;DR: This work introduces a curriculum learning approach to adapt generic neural machine translation models to a specific domain; the approach consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.
Abstract: We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.

52 citations


01 Aug 2019
TL;DR: In this paper, the authors conduct a systematic exploration of different numbers of BPE merge operations to understand how this choice interacts with the model architecture, the strategy to build vocabularies, and the language pair.
Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the choice of number of merge operations is generally made by following existing recipes. In this paper, we conduct a systematic exploration on different numbers of BPE merge operations to understand how it interacts with the model architecture, the strategy to build vocabularies and the language pair. Our exploration could provide guidance for selecting proper BPE configurations in the future. Most prominently: we show that for LSTM-based architectures, it is necessary to experiment with a wide range of different BPE operations as there is no typical optimal BPE configuration, whereas for Transformer architectures, smaller BPE size tends to be a typically optimal choice. We urge the community to make prudent choices with subword merge operations, as our experiments indicate that a sub-optimal BPE configuration alone could easily reduce the system performance by 3-4 BLEU points.

47 citations
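A sweep over merge operations can be scripted directly with the subword-nmt package; the sketch below shows one way to do it. The file names and candidate values are placeholders, and each setting would then be used to train and evaluate a separate NMT system.

```python
# Sketch of sweeping the number of BPE merge operations; the paper's point is
# that the best setting depends on the architecture and language pair, so it
# is worth searching rather than copying an existing recipe.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

for merge_ops in [1000, 8000, 32000]:
    with open("train.src", encoding="utf-8") as fin, \
         open(f"codes.{merge_ops}", "w", encoding="utf-8") as fout:
        learn_bpe(fin, fout, merge_ops)          # learn merges from training text
    with open(f"codes.{merge_ops}", encoding="utf-8") as codes:
        bpe = BPE(codes)
    print(merge_ops, bpe.process_line("an example sentence to segment"))
    # ...train an NMT system per setting and compare BLEU on a held-out set
```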


Posted Content
TL;DR: An attention-based neural transducer is proposed that incrementally builds a meaning representation via a sequence of semantic relations; it improves the state of the art on both AMR and UCCA and is competitive with the state of the art on SDP.
Abstract: We unify different broad-coverage semantic parsing tasks under a transduction paradigm, and propose an attention-based neural framework that incrementally builds a meaning representation via a sequence of semantic relations. By leveraging multiple attention mechanisms, the transducer can be effectively trained without relying on a pre-trained aligner. Experiments conducted on three separate broad-coverage semantic parsing tasks – AMR, SDP and UCCA – demonstrate that our attention-based neural transducer improves the state of the art on both AMR and UCCA, and is competitive with the state of the art on SDP.

29 citations


Posted Content
TL;DR: It is experimentally confirmed that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios, and the generalization of multilingual training is also evaluated in a transfer learning scenario with a very low-resource language pair.
Abstract: In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they are applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our codes and the database are publicly available to encourage further research in this emergent multilingual ST topic.

26 citations


01 Aug 2019
TL;DR: A method is introduced for automatically predicting whether translated segments are fluently inadequate, predicting fluency with grammaticality scores and adequacy by augmenting sentence BLEU with a novel Bag-of-Vectors Sentence Similarity (BVSS).
Abstract: With the impressive fluency of modern machine translation output, systems may produce output that is fluent but not adequate (fluently inadequate). We seek to identify these errors and quantify their frequency in MT output of varying quality. To that end, we introduce a method for automatically predicting whether translated segments are fluently inadequate by predicting fluency using grammaticality scores and predicting adequacy by augmenting sentence BLEU with a novel Bag-of-Vectors Sentence Similarity (BVSS). We then apply this technique to analyze the outputs of statistical and neural systems for six language pairs with different levels of translation quality. We find that neural models are consistently more prone to this type of error than traditional statistical models. However, improving the overall quality of the MT system, such as through domain adaptation, reduces these errors.

22 citations
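One plausible reading of a bag-of-vectors sentence similarity is a cosine between averaged word vectors of the hypothesis and the reference. The sketch below follows that interpretation; it is an assumption, not the paper's exact formulation.

```python
# Rough bag-of-vectors similarity sketch: average the word vectors of each
# sentence and compare the two averages with cosine similarity.
import numpy as np

def bag_of_vectors(tokens, embeddings, dim=100):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def bvss(hyp_tokens, ref_tokens, embeddings):
    h = bag_of_vectors(hyp_tokens, embeddings)
    r = bag_of_vectors(ref_tokens, embeddings)
    denom = np.linalg.norm(h) * np.linalg.norm(r)
    return float(h @ r / denom) if denom else 0.0

# A segment could then be flagged "fluently inadequate" when a fluency score
# (e.g., from a grammaticality model) is high but the adequacy score is low.
```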


Posted Content
TL;DR: This work tests the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points, finds it unlikely in this case, and proposes an alternative hypothesis under the lens of data augmentation and regularization.
Abstract: Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points and find it unlikely in our case. Models trained on concatenations of original and "simplified" datasets generalize just as well as baseline SLKD. We then propose an alternative hypothesis under the lens of data augmentation and regularization. We try various augmentation strategies and observe that dropout regularization can become unnecessary. Our methods achieve BLEU gains of 0.7-1.2 on TED Talks.
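The concatenation experiment mentioned above amounts to simple corpus-level data augmentation: training the student on the original references together with the teacher's translations of the same source sentences. A minimal sketch follows; the file names are placeholders.

```python
# Sketch of the concatenation setup tested in the paper: join the original
# parallel data with the teacher-distilled version into one training file.
def concat_corpora(original_path, distilled_path, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        for path in (original_path, distilled_path):
            with open(path, encoding="utf-8") as f:
                out.writelines(f)

concat_corpora("train.orig.tsv", "train.distilled.tsv", "train.augmented.tsv")
```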

Journal ArticleDOI
TL;DR: This work defines the membership inference problem for sequence generation, provides an open dataset based on state-of-the-art machine translation models, and reports initial results on whether these models leak private information against several kinds of membership inference attacks.
Abstract: Data privacy is an important issue for "machine learning as a service" providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model's API, determine whether the sample existed in the model's training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information against several kinds of membership inference attacks.
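As a point of reference, one very simple membership-inference baseline thresholds a per-sample score, such as the similarity between the model's output and the reference. The sketch below illustrates that idea; it is an assumption, not one of the paper's attack models.

```python
# Threshold-attack sketch: flag a sample as "seen in training" when the model
# reproduces its reference unusually well, i.e., above a chosen score threshold.
def threshold_attack(scores, threshold=0.6):
    """scores: per-sample similarity between model output and reference (0-1)."""
    return [score >= threshold for score in scores]

print(threshold_attack([0.95, 0.31, 0.72], threshold=0.6))  # [True, False, True]
```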

01 Aug 2019
TL;DR: A robust document representation is proposed that combines N-best translations and a novel bag-of-phrases output from various ASR/MT systems; results demonstrate that this richer document representation can consistently overcome low translation accuracy for CLIR in low-resource settings.
Abstract: The goal of cross-lingual information retrieval (CLIR) is to find relevant documents written in languages different from that of the query. Robustness to translation errors is one of the main challenges for CLIR, especially in low-resource settings where there is limited training data for building machine translation (MT) systems or bilingual dictionaries. If the test collection contains speech documents, additional errors from automatic speech recognition (ASR) make translation even more difficult. We propose a robust document representation that combines N-best translations and a novel bag-of-phrases output from various ASR/MT systems. We perform a comprehensive empirical analysis on three challenging collections; they consist of Somali, Swahili, and Tagalog speech/text documents to be retrieved by English queries. By comparing various ASR/MT systems with different error profiles, our results demonstrate that a richer document representation can consistently overcome issues in low translation accuracy for CLIR in low-resource settings.
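A rough sketch of the document-representation idea: pool terms from several systems' N-best translations together with a bag of phrases into a single term bag to be indexed. The pooling details below (lowercasing, uniform weighting) are assumptions for illustration.

```python
# Sketch: combine N-best translations from multiple ASR/MT systems with a
# bag of phrases into one term bag that a retrieval engine could index.
from collections import Counter

def combined_representation(nbest_translations, phrase_bag):
    terms = Counter()
    for hyp in nbest_translations:                  # N-best hypotheses from several systems
        terms.update(hyp.lower().split())
    terms.update(p.lower() for p in phrase_bag)     # phrases kept as single units
    return terms

doc = combined_representation(
    ["the market opened today", "market opens today"],
    ["stock market", "opening bell"],
)
print(doc.most_common(3))
```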

Posted Content
24 May 2019
TL;DR: The authors conduct a systematic exploration of different Byte-Pair Encoding (BPE) merge operations to understand how it interacts with the model architecture, the strategy to build vocabularies and the language pair.
Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the choice of number of merge operations is generally made by following existing recipes. In this paper, we conduct a systematic exploration of different BPE merge operations to understand how it interacts with the model architecture, the strategy to build vocabularies and the language pair. Our exploration could provide guidance for selecting proper BPE configurations in the future. Most prominently: we show that for LSTM-based architectures, it is necessary to experiment with a wide range of different BPE operations as there is no typical optimal BPE configuration, whereas for Transformer architectures, smaller BPE size tends to be a typically optimal choice. We urge the community to make prudent choices with subword merge operations, as our experiments indicate that a sub-optimal BPE configuration alone could easily reduce the system performance by 3-4 BLEU points.

Journal ArticleDOI
TL;DR: This work proposes to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with a multi-objective Pareto optimization and makes use of parallel computation on cloud computers.
Abstract: State-of-the-art large vocabulary speech recognition systems consist of several components, including hidden Markov models and deep neural networks. To realize the highest recognition performance, numerous meta-parameters specifying the designs and training setups of these components must be optimized. A prominent obstacle in system development is the laborious effort required by human experts in tuning these meta-parameters. To automate the process, we propose to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with a multi-objective Pareto optimization. As a result of the evolution, the system is optimized for both low word error rate and compact model size. Since the approach requires repeated training and evaluation of recognition systems, which demands large amounts of computation, we make use of parallel computation on cloud computers. Experimental results show the effectiveness of the proposed approach by automatically discovering appropriate configurations for large vocabulary speech recognition systems.
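The multi-objective selection step can be illustrated with a simple Pareto filter over word error rate and model size. The sketch below is only that selection step, not the full evolution strategy used in the paper, and the example numbers are invented.

```python
# Pareto-front sketch: keep configurations not dominated on both objectives
# (lower word error rate and smaller model size are both better).
def pareto_front(configs):
    """configs: list of (name, wer, size); lower is better for both."""
    front = []
    for name, wer, size in configs:
        dominated = any(w <= wer and s <= size and (w < wer or s < size)
                        for _, w, s in configs)
        if not dominated:
            front.append((name, wer, size))
    return front

print(pareto_front([("A", 10.2, 50e6), ("B", 9.8, 80e6), ("C", 10.5, 90e6)]))
# [('A', 10.2, 50000000.0), ('B', 9.8, 80000000.0)] -- C is dominated by A
```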

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work presents the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation, and presents two simple baselines - constrained decoding and continued training - and an improvement to continued training to address overfitting.
Abstract: Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated in statistical machine translation, it is unclear how to best do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. Our data consists of human generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons which are well matched to the reference. We also present two simple baselines - constrained decoding and continued training - and an improvement to continued training to address overfitting.

Posted Content
TL;DR: The authors propose an attention-based model that treats AMR parsing as sequence-to-graph transduction, which can be effectively trained with a limited amount of labeled AMR data.
Abstract: We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. Unlike most AMR parsers that rely on pre-trained aligners, external semantic resources, or data augmentation, our proposed parser is aligner-free, and it can be effectively trained with limited amounts of labeled AMR data. Our experimental results outperform all previously reported SMATCH scores, on both AMR 2.0 (76.3% F1 on LDC2017T10) and AMR 1.0 (70.2% F1 on LDC2014T12).

Proceedings ArticleDOI
01 Aug 2019
TL;DR: The goal was to evaluate the performance of baseline systems on both the official noisy test set as well as news data, in order to ensure that performance gains in the latter did not come at the expense of general-domain performance.
Abstract: We describe the JHU submissions to the French–English, Japanese–English, and English–Japanese Robustness Task at WMT 2019. Our goal was to evaluate the performance of baseline systems on both the official noisy test set and news data, in order to ensure that performance gains in the latter did not come at the expense of general-domain performance. To this end, we built straightforward 6-layer Transformer models and experimented with a handful of variables including subword processing (FR→EN) and a handful of hyperparameter settings (JA↔EN). As expected, our systems performed reasonably.

Proceedings ArticleDOI
01 Aug 2019
TL;DR: Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel, which suggests the various architectures are learning redundant information; future work may focus on encouraging decorrelated learning.
Abstract: Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the various architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.


Posted Content
TL;DR: This paper conducts a systematic exploration of different BPE merge operations to understand how it interacts with the model architecture, the strategy to build vocabularies and the language pair, and could provide guidance for selecting proper BPE configurations in the future.
Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the choice of number of merge operations is generally made by following existing recipes. In this paper, we conduct a systematic exploration on different numbers of BPE merge operations to understand how it interacts with the model architecture, the strategy to build vocabularies and the language pair. Our exploration could provide guidance for selecting proper BPE configurations in the future. Most prominently: we show that for LSTM-based architectures, it is necessary to experiment with a wide range of different BPE operations as there is no typical optimal BPE configuration, whereas for Transformer architectures, smaller BPE size tends to be a typically optimal choice. We urge the community to make prudent choices with subword merge operations, as our experiments indicate that a sub-optimal BPE configuration alone could easily reduce the system performance by 3-4 BLEU points.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work explores under which conditions it is beneficial to perform dialect identification for Arabic neural machine translation versus using a general system for all dialects.
Abstract: When translating diglossic languages such as Arabic, situations may arise where we would like to translate a text but do not know which dialect it is. A traditional approach to this problem is to design dialect identification systems and dialect-specific machine translation systems. However, under the recent paradigm of neural machine translation, shared multi-dialectal systems have become a natural alternative. Here we explore under which conditions it is beneficial to perform dialect identification for Arabic neural machine translation versus using a general system for all dialects.

Posted Content
TL;DR: This work investigates expansions based on word embeddings, DBpedia concept linking, and hypernyms, and shows that they outperform existing state-of-the-art methods on the cross-language question re-ranking shared task.
Abstract: Community question-answering (CQA) platforms have become very popular forums for asking and answering questions daily. While these forums are rich repositories of community knowledge, they present challenges for finding relevant answers and similar questions, due to the open-ended nature of informal discussions. Further, if the platform allows questions and answers in multiple languages, we are faced with the additional challenge of matching cross-lingual information. In this work, we focus on the cross-language question re-ranking shared task, which aims to find existing questions that may be written in different languages. Our contribution is an exploration of query expansion techniques for this problem. We investigate expansions based on word embeddings, DBpedia concept linking, and hypernyms, and show that they outperform existing state-of-the-art methods.
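The word-embedding-based expansion, one of the three strategies listed above, can be sketched as adding each query term's nearest neighbours in an embedding space. The embedding source and the neighbour count below are assumptions for illustration.

```python
# Sketch of embedding-based query expansion: for each query token, add its k
# most similar vocabulary words (by cosine similarity) to the query.
import numpy as np

def expand_query(query_tokens, embeddings, vocab, k=2):
    """embeddings: dict word -> vector; vocab: candidate expansion words."""
    expanded = list(query_tokens)
    for tok in query_tokens:
        if tok not in embeddings:
            continue
        q = embeddings[tok]
        sims = {w: float(q @ embeddings[w] /
                         (np.linalg.norm(q) * np.linalg.norm(embeddings[w]) + 1e-9))
                for w in vocab if w != tok and w in embeddings}
        expanded += [w for w, _ in sorted(sims.items(), key=lambda x: -x[1])[:k]]
    return expanded
```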

Posted Content
TL;DR: In this article, the problem of membership inference for sequence-to-sequence models is investigated in the context of machine translation and video captioning, and an open dataset based on state-of-the-art machine translation models is provided.
Abstract: Data privacy is an important issue for "machine learning as a service" providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model's API, determine whether the sample existed in the model's training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information against several kinds of membership inference attacks.

Proceedings ArticleDOI
22 Feb 2019
TL;DR: This work presents a hands-on activity in which students build and evaluate their own MT systems using curated parallel texts; students gain intuition about why early MT research took this approach, where it fails, and what features of language make MT a challenging problem even today.
Abstract: The first step in the research process is developing an understanding of the problem at hand. Novices may be interested in learning about machine translation (MT), but often lack experience and intuition about the task of translation (either by human or machine) and its challenges. The goal of this work is to allow students to interactively discover why MT is an open problem, and encourage them to ask questions, propose solutions, and test intuitions. We present a hands-on activity in which students build and evaluate their own MT systems using curated parallel texts. By having students hand-engineer MT system rules in a simple user interface, which they can then run on real data, they gain intuition about why early MT research took this approach, where it fails, and what features of language make MT a challenging problem even today. Developing translation rules typically strikes novices as an obvious approach that should succeed, but the idea quickly struggles in the face of natural language complexity. This interactive, intuition-building exercise can be augmented by a discussion of state-of-the-art MT techniques and challenges, focusing on areas or aspects of linguistic complexity that the students found difficult. We envision this lesson plan being used in the framework of a larger AI or natural language processing course (where only a small amount of time can be dedicated to MT) or as a standalone activity. We describe and release the tool that supports this lesson, as well as accompanying data.
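A toy version of the rules students write in this kind of activity is a word-for-word substitution table. The example below is invented for illustration: it works on a simple sentence and breaks down quickly on word order, agreement, and ambiguity.

```python
# Toy rule-based "MT system": word-for-word dictionary substitution.
RULES = {"the": "la", "cat": "gata", "sleeps": "duerme"}

def rule_based_translate(sentence):
    return " ".join(RULES.get(tok, tok) for tok in sentence.lower().split())

print(rule_based_translate("The cat sleeps"))             # "la gata duerme"
print(rule_based_translate("Time flies like an arrow"))   # unknown words pass through untranslated
```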

01 Aug 2019
TL;DR: This article proposes a character-aware decoder to capture lower-level patterns of morphology when translating into morphologically rich languages by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional neural networks that operate on the spelling of a word.
Abstract: Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology. We present a character-aware decoder designed to capture such patterns when translating into morphologically rich languages. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional neural networks that operate on the spelling of a word. To investigate performance on a wide variety of morphological phenomena, we translate English into 14 typologically diverse target languages using the TED multi-target dataset. In this low-resource setting, the character-aware decoder provides consistent improvements with BLEU score gains of up to +3.05. In addition, we analyze the relationship between the gains obtained and properties of the target language and find evidence that our model does indeed exploit morphological patterns.
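A minimal PyTorch sketch of a character-aware word representation of this flavour follows: a convolution over character embeddings, max-pooled into a word vector that could augment a decoder's embedding and softmax layers. The hyperparameters are illustrative, not the paper's.

```python
# Char-CNN word embedding sketch: embed characters, convolve, max-pool.
import torch
import torch.nn as nn

class CharCNNWordEmbedding(nn.Module):
    def __init__(self, num_chars=100, char_dim=32, word_dim=64, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):                           # (batch, max_word_len)
        x = self.char_emb(char_ids).transpose(1, 2)        # (batch, char_dim, len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, word_dim)

emb = CharCNNWordEmbedding()
print(emb(torch.randint(0, 100, (4, 12))).shape)           # torch.Size([4, 64])
```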