
Showing papers by "Kevin Duh published in 2018"


Posted Content
TL;DR: This work presents a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning, and demonstrates that the performance of state-of-the-art MRC systems falls far behind human performance.
Abstract: We present a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning. Experiments on this dataset demonstrate that the performance of state-of-the-art MRC systems falls far behind human performance. ReCoRD represents a challenge for future research to bridge the gap between human and machine commonsense reading comprehension. ReCoRD is available at this http URL.

252 citations


Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper proposes a stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension and achieves results competitive with the state of the art on several reading comprehension tasks.
Abstract: We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet which used reinforcement learning to determine the number of steps, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves results competitive to the state-of-the-art on the Stanford Question Answering Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading COmprehension Dataset (MS MARCO).

150 citations
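The "stochastic prediction dropout" trick described above can be illustrated with a short sketch: each reasoning step's answer module emits a prediction, entire steps are randomly dropped during training, and the surviving step predictions are averaged. The function name, shapes, and drop rate below are illustrative assumptions, not the authors' code.

```python
import torch

def average_step_predictions(step_logits, drop_prob=0.4, training=True):
    """Average per-step answer distributions, randomly dropping entire steps
    during training (stochastic prediction dropout); at inference all steps
    are averaged. step_logits: list of [batch, num_classes] tensors."""
    probs = [torch.softmax(logits, dim=-1) for logits in step_logits]
    if training:
        kept = [p for p in probs if torch.rand(1).item() > drop_prob]
        if not kept:                  # always keep at least one step
            kept = [probs[-1]]
        return torch.stack(kept).mean(dim=0)
    return torch.stack(probs).mean(dim=0)
```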


Posted Content
TL;DR: A probabilistic view of curriculum learning is adopted, which lets us flexibly evaluate the impact of curricula design, and an extensive exploration on a German-English translation task shows it is possible to improve convergence time at no loss in translation quality.
Abstract: Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We adopt a probabilistic view of curriculum learning, which lets us flexibly evaluate the impact of curricula design, and perform an extensive exploration on a German-English translation task. Results show that it is possible to improve convergence time at no loss in translation quality. However, results are highly sensitive to the choice of sample difficulty criteria, curriculum schedule and other hyperparameters.

87 citations
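As a rough illustration of a curriculum over sample difficulty, the sketch below draws minibatches from a pool of training examples that gradually grows from easiest to hardest. The phase schedule and the difficulty function are assumptions for illustration; the paper explores a range of difficulty criteria and schedules.

```python
import random

def curriculum_batches(samples, difficulty, num_phases=5,
                       batches_per_phase=1000, batch_size=64):
    """Yield minibatches under a simple curriculum: in phase p, sample
    uniformly from the easiest (p+1)/num_phases fraction of the data.
    `difficulty` maps an example to a scalar (e.g. sentence length or
    source-word rarity); higher means harder."""
    ordered = sorted(samples, key=difficulty)
    for phase in range(num_phases):
        cutoff = max(batch_size, len(ordered) * (phase + 1) // num_phases)
        pool = ordered[:cutoff]
        for _ in range(batches_per_phase):
            yield random.sample(pool, min(batch_size, len(pool)))
```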


Proceedings ArticleDOI
20 Jul 2018
TL;DR: This work adds an auxiliary term to the training objective during continued training that minimizes the cross entropy between the in-domain model's output word distribution and that of the out-of-domain model, to prevent the model's output from differing too much from the original out-of-domain model.
Abstract: Supervised domain adaptation—where a large generic corpus and a smaller in-domain corpus are both available for training—is a challenge for neural machine translation (NMT). Standard practice is to train a generic model and use it to initialize a second model, then continue training the second model on in-domain data to produce an in-domain model. We add an auxiliary term to the training objective during continued training that minimizes the cross entropy between the in-domain model’s output word distribution and that of the out-of-domain model to prevent the model’s output from differing too much from the original out-of-domain model. We perform experiments on EMEA (descriptions of medicines) and TED (rehearsed presentations), initialized from a general domain (WMT) model. Our method shows improvements over standard continued training by up to 1.5 BLEU.

66 citations
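The auxiliary objective described above can be sketched as the usual cross entropy against the gold words plus the cross entropy between the frozen out-of-domain model's output distribution and the in-domain model's distribution. Tensor shapes and the weight `lam` below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def regularized_loss(in_logits, out_logits, gold_ids, lam=0.1, pad_id=0):
    """Continued-training loss with an auxiliary term. Shapes:
    [batch, seq_len, vocab] for logits, [batch, seq_len] for gold_ids."""
    vocab = in_logits.size(-1)
    nll = F.cross_entropy(in_logits.view(-1, vocab), gold_ids.view(-1),
                          ignore_index=pad_id)
    # Cross entropy H(p_out, p_in) = -sum_w p_out(w) * log p_in(w)
    p_out = F.softmax(out_logits, dim=-1).detach()   # out-of-domain model is not updated
    log_p_in = F.log_softmax(in_logits, dim=-1)
    aux = -(p_out * log_p_in).sum(dim=-1).mean()
    return nll + lam * aux
```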


Posted Content
TL;DR: A stochastic answer network (SAN) is proposed to explore multi-step inference strategies in Natural Language Inference and achieves state-of-the-art results on three benchmarks.
Abstract: We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference. Rather than directly predicting the results given the inputs, the model maintains a state and iteratively refines its predictions. Our experiments show that SAN achieves state-of-the-art results on three benchmarks: the Stanford Natural Language Inference (SNLI) dataset, the Multi-Genre Natural Language Inference (MultiNLI) dataset, and the Quora Question Pairs dataset.

51 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: A large-scale dataset derived from Wikipedia is introduced to support CLIR research in 25 languages and a simple yet effective neural learning-to-rank model is presented that shares representations across languages and reduces the data requirement.
Abstract: Cross-lingual information retrieval (CLIR) is a document retrieval task where the documents are written in a language different from that of the user’s query. This is a challenging problem for data-driven approaches due to the general lack of labeled training data. We introduce a large-scale dataset derived from Wikipedia to support CLIR research in 25 languages. Further, we present a simple yet effective neural learning-to-rank model that shares representations across languages and reduces the data requirement. This model can exploit training data in, for example, Japanese-English CLIR to improve the results of Swahili-English CLIR.

49 citations
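A minimal sketch of a learning-to-rank model that shares representations across languages: one embedding table and one scoring function serve every query/document language pair, so relevance judgments from a high-resource pair (e.g. Japanese-English) update the same parameters used for a low-resource pair (e.g. Swahili-English). The specific architecture below (mean-pooled embeddings plus a bilinear scorer) is an assumption for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

class SharedCLIRRanker(nn.Module):
    """Shared embedding and scorer for all languages in a CLIR collection."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.score = nn.Bilinear(dim, dim, 1)

    def encode(self, token_ids):
        # Mean-pool word embeddings as a crude text representation.
        mask = (token_ids != 0).float().unsqueeze(-1)
        emb = self.embed(token_ids) * mask
        return emb.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

    def forward(self, query_ids, doc_ids):
        return self.score(self.encode(query_ids), self.encode(doc_ids)).squeeze(-1)

# Training would pair each query with a relevant and an irrelevant document, e.g.
# loss = torch.relu(1 - model(q, d_pos) + model(q, d_neg)).mean()
```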


Proceedings ArticleDOI
01 Oct 2018
TL;DR: In this paper, the authors analyzed the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and considered each component's contribution to, and capacity for, domain adaptation.
Abstract: To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component’s contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.

33 citations
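Freezing a single component during continued training amounts to excluding its parameters from optimization. A toy sketch follows; the module names stand in for a real NMT model and are not the authors' code.

```python
import torch
import torch.nn as nn

# Toy stand-in for an NMT model with separately named components.
model = nn.ModuleDict({
    "src_embed": nn.Embedding(1000, 64),
    "encoder":   nn.GRU(64, 64, batch_first=True),
    "decoder":   nn.GRU(64, 64, batch_first=True),
    "tgt_embed": nn.Embedding(1000, 64),
})

frozen = "encoder"  # try freezing each component in turn
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith(frozen + ".")

# Only the still-trainable parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```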


Proceedings ArticleDOI
16 Apr 2018
TL;DR: An expansion of video data from the IARPA Janus program, the Janus Multimedia dataset, is introduced, adding labels for voice to the already-existing face labels; experiments on it demonstrate the power of audiovisual fusion.
Abstract: Currently, datasets that support audio-visual recognition of people in videos are scarce and limited. In this paper, we introduce an expansion of video data from the IARPA Janus program to support this research area. We refer to the expanded set, which adds labels for voice to the already-existing face labels, as the Janus Multimedia dataset. We first describe the speaker labeling process, which involved a combination of automatic and manual criteria. We then discuss two evaluation settings for this data. In the core condition, the voice and face of the labeled individual are present in every video. In the full condition, no such guarantee is made. The power of audiovisual fusion is then shown using these publicly-available videos and labels, showing significant improvement over only recognizing voice or face alone. In addition to this work, several other possible paths for future research with this dataset are discussed.

27 citations


Proceedings ArticleDOI
01 Jan 2018
TL;DR: A form of decompositional semantic analysis designed to allow systems to target varying levels of structural complexity (shallow to deep analysis), an evaluation metric to measure the similarity between system output and reference semantic analysis, and an end-to-end model with a novel annotating mechanism that supports intra-sentential coreference are presented.
Abstract: We introduce the task of cross-lingual decompositional semantic parsing: mapping content provided in a source language into a decompositional semantic analysis based on a target language. We present: (1) a form of decompositional semantic analysis designed to allow systems to target varying levels of structural complexity (shallow to deep analysis), (2) an evaluation metric to measure the similarity between system output and reference semantic analysis, (3) an end-to-end model with a novel annotating mechanism that supports intra-sentential coreference, and (4) an evaluation dataset on which our model outperforms strong baselines by at least 1.75 F1 score.

24 citations


Posted Content
TL;DR: An extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not, is presented.
Abstract: This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that SAN achieves results competitive with the state of the art on the Stanford Question Answering Dataset (SQuAD) 2.0. To facilitate research in this field, we release our code: this https URL.

24 citations
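The two jointly optimized components (span detector and answerability classifier) suggest a combined objective along the lines sketched below; the shapes and weighting are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(start_logits, end_logits, answerable_logit,
               gold_start, gold_end, gold_answerable, lam=1.0):
    """Span loss plus binary answerability loss. Shapes: start/end logits
    [batch, passage_len], answerable_logit [batch], gold_start/gold_end
    [batch] token indices, gold_answerable [batch] in {0, 1}."""
    span = F.cross_entropy(start_logits, gold_start) + \
           F.cross_entropy(end_logits, gold_end)
    answerable = F.binary_cross_entropy_with_logits(
        answerable_logit, gold_answerable.float())
    return span + lam * answerable
```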


Proceedings ArticleDOI
01 Oct 2018
TL;DR: The efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018 are reported on.
Abstract: We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018. We developed systems for German–English, English–German, and Russian–English. Our novel contributions are iterative back-translation and fine-tuning on test sets from prior years.

Proceedings ArticleDOI
01 Jun 2018
TL;DR: It is found that word embeddings utilizing subword information consistently outperform standard word embeddings on a word similarity task and as initialization of the source word embeddings in a low-resource NMT system.
Abstract: Neural machine translation has achieved impressive results in the last few years, but its success has been limited to settings with large amounts of parallel data. One way to improve NMT for lower-resource settings is to initialize a word-based NMT model with pretrained word embeddings. However, rare words still suffer from lower quality word embeddings when trained with standard word-level objectives. We introduce word embeddings that utilize morphological resources, and compare to purely unsupervised alternatives. We work with Arabic, a morphologically rich language with available linguistic resources, and perform Ar-to-En MT experiments on a small corpus of TED subtitles. We find that word embeddings utilizing subword information consistently outperform standard word embeddings on a word similarity task and as initialization of the source word embeddings in a low-resource NMT system.
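Initializing a word-based NMT model with pretrained (subword-aware or otherwise) embeddings boils down to copying the available vectors into the source embedding matrix; a minimal sketch follows, where the function name and argument conventions are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def init_source_embeddings(vocab, pretrained, dim=300):
    """Build an NMT source embedding layer from pretrained word vectors.
    `vocab` is a list of words; `pretrained` maps word -> np.ndarray of size
    `dim`; words missing from `pretrained` keep a random initialization."""
    weight = np.random.normal(0, 0.01, (len(vocab), dim)).astype("float32")
    for i, word in enumerate(vocab):
        if word in pretrained:
            weight[i] = pretrained[word]
    # freeze=False lets the embeddings keep training with the rest of the model.
    return nn.Embedding.from_pretrained(torch.from_numpy(weight), freeze=False)
```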

Proceedings ArticleDOI
01 Apr 2018
TL;DR: This paper proposes a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context (both document- and sentence-level information) than prior work, with further improvements gained by utilizing adaptive classification thresholds.
Abstract: Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context – both document and sentence level information – than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets.
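One common way to realize "adaptive classification thresholds" in multi-label entity typing is to tune a separate decision threshold per type on development data; the grid-search sketch below illustrates that idea and is not necessarily the paper's exact procedure.

```python
import numpy as np

def tune_adaptive_thresholds(dev_scores, dev_labels,
                             grid=np.linspace(0.05, 0.95, 19)):
    """Pick a per-type decision threshold by maximizing F1 on dev data.
    dev_scores and dev_labels are [num_examples, num_types] arrays of
    predicted probabilities and 0/1 gold labels."""
    num_types = dev_scores.shape[1]
    thresholds = np.full(num_types, 0.5)
    for t in range(num_types):
        best_f1 = -1.0
        for thr in grid:
            pred = dev_scores[:, t] >= thr
            gold = dev_labels[:, t] == 1
            tp = np.sum(pred & gold)
            prec = tp / max(pred.sum(), 1)
            rec = tp / max(gold.sum(), 1)
            f1 = 2 * prec * rec / max(prec + rec, 1e-9)
            if f1 > best_f1:
                best_f1, thresholds[t] = f1, thr
    return thresholds
```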

Posted Content
TL;DR: This work argues for a reconsideration of the charCNN, based on cross-lingual improvements on low-resource data, and finds that in most cases, using both BPE and a charCNN performs best, while in Hebrew, using a charCNN over words is best.
Abstract: Neural Machine Translation (NMT) in low-resource settings and of morphologically rich languages is made difficult in part by data sparsity of vocabulary words. Several methods have been used to help reduce this sparsity, notably Byte-Pair Encoding (BPE) and a character-based CNN layer (charCNN). However, the charCNN has largely been neglected, possibly because it has only been compared to BPE rather than combined with it. We argue for a reconsideration of the charCNN, based on cross-lingual improvements on low-resource data. We translate from 8 languages into English, using a multi-way parallel collection of TED transcripts. We find that in most cases, using both BPE and a charCNN performs best, while in Hebrew, using a charCNN over words is best.
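A charCNN layer of the kind discussed above composes a token representation from its characters with convolutions over the spelling followed by max-pooling; the hyperparameters in this sketch are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class CharCNNEmbedder(nn.Module):
    """Compose a token embedding from its spelling with character-level
    convolutions and max-pooling over character positions."""
    def __init__(self, num_chars, char_dim=32, filters=64, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, filters, k, padding=k // 2) for k in kernel_sizes)
        self.out_dim = filters * len(kernel_sizes)

    def forward(self, char_ids):
        # char_ids: [num_tokens, max_word_len] -> [num_tokens, out_dim]
        x = self.char_embed(char_ids).transpose(1, 2)    # [tokens, char_dim, len]
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=-1)
```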

Proceedings ArticleDOI
TL;DR: It is found that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed.
Abstract: To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.

Posted Content
TL;DR: This work achieves character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder network with convolutional neural networks that operate on the spelling of a word (or subword).
Abstract: Standard neural machine translation (NMT) systems operate primarily on words, ignoring lower-level patterns of morphology. We present a character-aware decoder for NMT, designed to capture such patterns, that can simultaneously work with both word-level and subword-level sequences. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder network with convolutional neural networks that operate on the spelling of a word (or subword). While character-aware embeddings have been successfully used on the source side, we find that mixing character-aware embeddings with standard embeddings is crucial on the target side. Furthermore, we show that a simple approximate softmax layer can be used for large target-side vocabularies that would otherwise require prohibitively large memory. We experiment on the TED multi-target dataset, translating English into 14 typologically diverse languages. We find that in this low-resource setting, the character-aware decoder provides consistent improvements over word-level and subword-level counterparts, with BLEU score gains of up to +3.37.
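The abstract notes that mixing character-aware embeddings with standard embeddings is crucial on the target side. One plausible way to do such mixing is a learned gate, sketched below; the gating combination and the `char_encoder` interface are assumptions for illustration, since the abstract does not spell out the mixing function.

```python
import torch
import torch.nn as nn

class MixedTargetEmbedding(nn.Module):
    """Mix a standard lookup embedding with a character-composed embedding
    for each target (sub)word. `char_encoder` is any module mapping the
    character ids of each token to a vector of size `dim` (e.g. a small CNN
    over spellings)."""
    def __init__(self, vocab_size, dim, char_encoder):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, dim)
        self.char_encoder = char_encoder
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_ids, char_ids):
        w = self.word_embed(word_ids)              # [batch, len, dim]
        c = self.char_encoder(char_ids)            # [batch, len, dim] (assumed)
        g = torch.sigmoid(self.gate(torch.cat([w, c], dim=-1)))
        return g * w + (1 - g) * c                 # gated mixture
```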

Posted Content
TL;DR: This work achieves character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional neural networks that operate on the spelling of a word with evidence that the model does indeed exploit morphological patterns.
Abstract: Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology. We present a character-aware decoder designed to capture such patterns when translating into morphologically rich languages. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional neural networks that operate on the spelling of a word. To investigate performance on a wide variety of morphological phenomena, we translate English into 14 typologically diverse target languages using the TED multi-target dataset. In this low-resource setting, the character-aware decoder provides consistent improvements with BLEU score gains of up to $+3.05$. In addition, we analyze the relationship between the gains obtained and properties of the target language and find evidence that our model does indeed exploit morphological patterns.

Posted Content
TL;DR: It is suggested that pre-trained embeddings can be helpful if properly incorporated into NMT, especially when parallel data is limited or additional in-domain monolingual data is readily available.
Abstract: Using pre-trained word embeddings as input layer is a common practice in many natural language processing (NLP) tasks, but it is largely neglected for neural machine translation (NMT). In this paper, we conducted a systematic analysis on the effect of using pre-trained source-side monolingual word embedding in NMT. We compared several strategies, such as fixing or updating the embeddings during NMT training on varying amounts of data, and we also proposed a novel strategy called dual-embedding that blends the fixing and updating strategies. Our results suggest that pre-trained embeddings can be helpful if properly incorporated into NMT, especially when parallel data is limited or additional in-domain monolingual data is readily available.
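The "dual-embedding" strategy that blends fixing and updating can be sketched as two copies of the pretrained matrix, one frozen and one fine-tuned, whose outputs are combined; the summation below is an illustrative choice, and the paper's exact blending may differ.

```python
import torch
import torch.nn as nn

class DualEmbedding(nn.Module):
    """One fixed copy of the pretrained embeddings plus one trainable copy,
    combined by summation."""
    def __init__(self, pretrained_weight):
        super().__init__()
        self.fixed = nn.Embedding.from_pretrained(pretrained_weight, freeze=True)
        self.tuned = nn.Embedding.from_pretrained(pretrained_weight.clone(), freeze=False)

    def forward(self, token_ids):
        return self.fixed(token_ids) + self.tuned(token_ids)
```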

Posted Content
TL;DR: This work presents a meaning representation designed to allow systems to target varying levels of structural complexity, an evaluation metric to measure the similarity between system output and reference meaning representations, an end-to-end model with a novel copy mechanism that supports intra-sentential coreference, and an evaluation dataset.
Abstract: We introduce the task of cross-lingual semantic parsing: mapping content provided in a source language into a meaning representation based on a target language. We present: (1) a meaning representation designed to allow systems to target varying levels of structural complexity (shallow to deep analysis), (2) an evaluation metric to measure the similarity between system output and reference meaning representations, (3) an end-to-end model with a novel copy mechanism that supports intrasentential coreference, and (4) an evaluation dataset where experiments show our model outperforms strong baselines by at least 1.18 F1 score.

Posted Content
TL;DR: In this paper, a training method is proposed that enforces the local region of each hidden state of a neural model to generate only target tokens with the same semantic structure tag, yielding better generalization in both high- and low-resource settings.
Abstract: Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn semantics-aware representations that are robust to noise, without introducing any extra parameter, thus yielding better generalization in both high and low resource settings.
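One reading of Halo's constraint (that the local region around each hidden state should only generate tokens carrying the gold token's semantic structure tag) is a noise-based penalty like the sketch below; the perturbation scheme and the penalty form are assumptions for illustration, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def halo_style_penalty(hidden, output_layer, tag_of_token, gold_tokens,
                       sigma=0.1, num_samples=4):
    """Penalize probability mass that states sampled near each hidden state
    assign to tokens whose tag differs from the gold token's tag.
    hidden: [batch, len, dim]; output_layer: maps dim -> vocab logits;
    tag_of_token: [vocab] tag id per vocabulary token; gold_tokens: [batch, len]."""
    gold_tags = tag_of_token[gold_tokens]                          # [batch, len]
    same_tag = tag_of_token.view(1, 1, -1) == gold_tags.unsqueeze(-1)
    penalty = hidden.new_zeros(())
    for _ in range(num_samples):
        noisy = hidden + sigma * torch.randn_like(hidden)          # local perturbation
        probs = F.softmax(output_layer(noisy), dim=-1)             # [batch, len, vocab]
        wrong_tag_mass = (probs * (~same_tag).float()).sum(dim=-1)
        penalty = penalty + wrong_tag_mass.mean()
    return penalty / num_samples
```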

Proceedings ArticleDOI
01 May 2018
TL;DR: In this paper, a training method is proposed that enforces the local region of each hidden state of a neural model to generate only target tokens with the same semantic structure tag, yielding better generalization in both high- and low-resource settings.
Abstract: Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn semantics-aware representations that are robust to noise, without introducing any extra parameter, thus yielding better generalization in both high and low resource settings.

Posted Content
TL;DR: This article proposes a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context (both document- and sentence-level information) than prior work, and finds that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds.
Abstract: Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context -- both document and sentence level information -- than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets.