
Showing papers by "Kevin Duh published in 2017"


Posted Content
TL;DR: DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure; it has an optimized C++ backend and a lightweight graph representation, and it is designed to allow users to implement their models in a way that is idiomatic in their preferred programming language.
Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet's speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at this http URL.
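As a concrete illustration of the dynamic declaration style described above, here is a minimal training loop in which the computation graph is rebuilt per example by ordinary Python control flow. The network, dimensions, and data are toy choices made for this sketch; the API calls follow DyNet's Python bindings, whose details vary by version.

```python
# A minimal sketch of dynamic declaration: a fresh computation graph is built
# for every example by simply executing Python code. Dimensions, data, and the
# two-layer network are toy choices; dy.parameter() is only needed in older
# DyNet versions (newer ones accept Parameter objects directly).
import random
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((8, 4))             # hidden layer
V = pc.add_parameters((2, 8))             # output layer
trainer = dy.SimpleSGDTrainer(pc)

data = [([random.random() for _ in range(4)], random.randint(0, 1))
        for _ in range(20)]

for x, y in data:
    dy.renew_cg()                         # new graph per training example
    h = dy.tanh(dy.parameter(W) * dy.inputVector(x))
    scores = dy.parameter(V) * h
    loss = dy.pickneglogsoftmax(scores, y)
    loss.value()                          # run the forward pass
    loss.backward()                       # then backpropagate
    trainer.update()
```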

434 citations


Posted Content
TL;DR: This work proposes a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension and achieves results competitive with the state of the art on the Stanford Question Answering Dataset, the Adversarial SQuAD, and the Microsoft MAchine Reading COmprehension Dataset.
Abstract: We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet which used reinforcement learning to determine the number of steps, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves results competitive to the state-of-the-art on the Stanford Question Answering Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading COmprehension Dataset (MS MARCO).
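The core trick described above, stochastic prediction dropout over per-step answer predictions, can be sketched in a few lines. The NumPy sketch below, with its made-up step-prediction matrix and dropout rate, is only an illustration of the averaging-with-dropped-steps idea, not the paper's exact configuration.

```python
import numpy as np

def san_answer(step_probs, dropout=0.4, training=True, rng=None):
    """Average per-step answer distributions, randomly dropping entire
    steps during training (stochastic prediction dropout)."""
    rng = rng or np.random.default_rng()
    step_probs = np.asarray(step_probs)            # shape: (T steps, num classes)
    if training:
        keep = rng.random(len(step_probs)) > dropout
        if not keep.any():                         # always keep at least one step
            keep[rng.integers(len(step_probs))] = True
        step_probs = step_probs[keep]
    return step_probs.mean(axis=0)                 # averaged final prediction

# toy example: 5 reasoning steps over 3 candidate answers
probs = np.random.default_rng(0).dirichlet(np.ones(3), size=5)
print(san_answer(probs, rng=np.random.default_rng(1)))    # training-time average
print(san_answer(probs, training=False))                  # inference: use all steps
```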

159 citations


Journal ArticleDOI
TL;DR: The authors propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference holding in a given context.
Abstract: Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly. We propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference holding in a given context. We describe a framework for extracting common-sense knowledge from corpora, which is then used to construct a dataset for this ordinal entailment task. We train a neural sequence-to-sequence model on this dataset, which we use to score and generate possible inferences. Further, we annotate subsets of previously established datasets via our ordinal annotation protocol in order to then analyze the distinctions between these and what we have constructed.

127 citations


Proceedings Article
01 Nov 2017
TL;DR: A general strategy to automatically generate one or more sentential hypotheses based on an input sentence and pre-existing manual semantic annotations is presented, which enables us to probe a statistical RTE model’s performance on different aspects of semantics.
Abstract: We propose to unify a variety of existing semantic classification tasks, such as semantic role labeling, anaphora resolution, and paraphrase detection, under the heading of Recognizing Textual Entailment (RTE). We present a general strategy to automatically generate one or more sentential hypotheses based on an input sentence and pre-existing manual semantic annotations. The resulting suite of datasets enables us to probe a statistical RTE model’s performance on different aspects of semantics. We demonstrate the value of this approach by investigating the behavior of a popular neural network RTE model.

93 citations


Proceedings Article
01 Nov 2017
TL;DR: This paper presents a transfer learning scheme, whereby character-level neural CRFs are trained to predict named entities for both high-resource languages and low-resource languages jointly, improving F1 by up to 9.8 points.
Abstract: Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world’s languages it is unfeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low-resource languages jointly. Learning character representations for multiple related languages allows knowledge transfer from the high-resource languages to the low-resource ones, improving F1 by up to 9.8 points.
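The joint training scheme described above can be sketched as alternating updates from the high- and low-resource languages through a shared character-level encoder. Everything below is hypothetical scaffolding: a per-character softmax tagger stands in for the CRF layer, and the hand-tagged examples, dimensions, and tag set are toy choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
CHARS = list("abcdefghijklmnopqrstuvwxyz ")
CHAR_ID = {c: i for i, c in enumerate(CHARS)}
TAGS = ["O", "B-PER", "I-PER"]
D = 16

# shared character embeddings; per-language output layers stand in for the
# language-specific parts of the model
shared_emb = rng.normal(0, 0.1, (len(CHARS), D))
heads = {lang: rng.normal(0, 0.1, (D, len(TAGS)))
         for lang in ("high_resource", "low_resource")}

def sgd_step(lang, text, tags, lr=0.1):
    """One step on a (character sequence, per-character tag sequence) pair;
    a per-character softmax replaces the CRF layer for brevity."""
    ids = np.array([CHAR_ID[c] for c in text])
    x = shared_emb[ids]                              # (T, D)
    logits = x @ heads[lang]                         # (T, |TAGS|)
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    gold = np.array([TAGS.index(t) for t in tags])
    p[np.arange(len(ids)), gold] -= 1.0              # softmax cross-entropy gradient
    emb_grad = p @ heads[lang].T / len(ids)
    heads[lang] -= lr * x.T @ p / len(ids)           # language-specific update
    np.add.at(shared_emb, ids, -lr * emb_grad)       # update the *shared* encoder

# alternate examples from both languages so the shared character embeddings
# receive gradients from the high-resource data as well (toy examples)
hi_example = ("anna", ["B-PER", "I-PER", "I-PER", "I-PER"])
lo_example = ("ana",  ["B-PER", "I-PER", "I-PER"])
for _ in range(10):
    sgd_step("high_resource", *hi_example)
    sgd_step("low_resource", *lo_example)
```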

68 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: A joint solution with a neural sequence model is proposed, and it is shown that it outperforms the pipeline in a cross-lingual open information extraction setting by 1-4 BLEU and 0.5-0.8 F1.
Abstract: Cross-lingual information extraction is the task of distilling facts from foreign language (e.g. Chinese text) into representations in another language that is preferred by the user (e.g. English tuples). Conventional pipeline solutions decompose the task as machine translation followed by information extraction (or vice versa). We propose a joint solution with a neural sequence model, and show that it outperforms the pipeline in a cross-lingual open information extraction setting by 1-4 BLEU and 0.5-0.8 F1.

33 citations


Proceedings Article
Huda Khayrallah1, Gaurav Kumar1, Kevin Duh1, Matt Post1, Philipp Koehn1 
01 Nov 2017
TL;DR: A stack-based lattice search algorithm for NMT is presented and it is shown that constraining its search space with lattices generated by phrase-based machine translation (PBMT) improves robustness.
Abstract: Domain adaptation is a major challenge for neural machine translation (NMT). Given unknown words or new domains, NMT systems tend to generate fluent translations at the expense of adequacy. We present a stack-based lattice search algorithm for NMT and show that constraining its search space with lattices generated by phrase-based machine translation (PBMT) improves robustness. We report consistent BLEU score gains across four diverse domain adaptation tasks involving medical, IT, Koran, or subtitles texts.
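A rough sketch of the idea of constraining NMT search to a PBMT lattice: hypotheses live in stacks, may only be extended along lattice edges, and each extension is rescored by the NMT model. The lattice format, the `nmt_score` stub, and the beam size below are illustrative assumptions, not the paper's algorithm as implemented.

```python
import heapq

# Toy PBMT lattice: node -> list of (next_node, target_phrase). Node 0 is the
# start state and "END" marks a complete translation path.
LATTICE = {
    0: [(1, "the patient"), (2, "patients")],
    1: [(3, "received treatment")],
    2: [(3, "were treated")],
    3: [("END", ".")],
}

def nmt_score(prefix_tokens):
    """Stand-in for an NMT log-probability of a target prefix (hypothetical)."""
    return -0.1 * len(prefix_tokens)

def lattice_beam_search(lattice, beam=2):
    stacks = [[(0.0, 0, [])]]                 # each hypothesis: (score, node, tokens)
    finished = []
    while stacks[-1]:
        next_stack = []
        for score, node, toks in stacks[-1]:
            for nxt, phrase in lattice.get(node, []):
                new_toks = toks + phrase.split()
                hyp = (nmt_score(new_toks), nxt, new_toks)
                (finished if nxt == "END" else next_stack).append(hyp)
        # prune each stack to the beam size, keeping the best NMT scores
        stacks.append(heapq.nlargest(beam, next_stack, key=lambda h: h[0]))
    return max(finished, key=lambda h: h[0])[2] if finished else None

print(" ".join(lattice_beam_search(LATTICE)))
```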

30 citations


Proceedings Article
01 Nov 2017
TL;DR: This work shows how to adapt bilingual word embeddings (BWE’s) to bootstrap a cross-lingual named-entity recognition (NER) system for a target language with no labeled data, given a comparable corpus with NER labels for the source language only.
Abstract: We show how to adapt bilingual word embeddings (BWE’s) to bootstrap a cross-lingual named-entity recognition (NER) system in a language with no labeled data. We assume a setting where we are given a comparable corpus with NER labels for the source language only; our goal is to build a NER model for the target language. The proposed multi-task model jointly trains bilingual word embeddings while optimizing a NER objective. This creates word embeddings that are both shared between languages and fine-tuned for the NER task.
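One way to read the multi-task setup above is as a single objective combining a bilingual embedding alignment loss with a source-side NER loss over the same embeddings. The sketch below is a schematic of that combination with made-up vocabularies, losses, and weights, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
src_vocab = ["president", "visited", "paris"]
tgt_vocab = ["presidente", "visito", "paris_t"]
E_src = rng.normal(0, 0.1, (len(src_vocab), D))      # source embeddings
E_tgt = rng.normal(0, 0.1, (len(tgt_vocab), D))      # target embeddings
TAGS = ["O", "B-PER", "B-LOC"]
W_ner = rng.normal(0, 0.1, (D, len(TAGS)))

pairs = [(0, 0), (1, 1), (2, 2)]                     # aligned translation pairs
ner_data = [(0, 1), (2, 2)]                          # (source word id, gold tag id)

def joint_step(lr=0.1, lam=1.0):
    # (1) bilingual alignment: pull each translation pair together
    for s, t in pairs:
        diff = E_src[s] - E_tgt[t]
        E_src[s] -= lr * lam * diff
        E_tgt[t] += lr * lam * diff
    # (2) NER objective on the source side, over the *same* embeddings
    for s, gold in ner_data:
        logits = E_src[s] @ W_ner
        p = np.exp(logits - logits.max()); p /= p.sum()
        g = p.copy(); g[gold] -= 1.0                 # softmax cross-entropy gradient
        grad_emb = W_ner @ g
        W_ner[:] -= lr * np.outer(E_src[s], g)
        E_src[s] -= lr * grad_emb

for _ in range(50):
    joint_step()
print("predicted tag for source word 'paris':",
      TAGS[int(np.argmax(E_src[2] @ W_ner))])
```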

19 citations


Proceedings Article
01 Nov 2017
TL;DR: This paper empirically compares single-turn and multiple-turn reasoning for reading comprehension (RC), using a model that applies reinforcement learning to dynamically control the number of turns, evaluated on the SQuAD and MS MARCO datasets.
Abstract: Reading comprehension (RC) is a challenging task that requires synthesis of information across sentences and multiple turns of reasoning. Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets. The RC model is an end-to-end neural network with iterative attention, and uses reinforcement learning to dynamically control the number of turns. We find that multiple-turn reasoning outperforms single-turn reasoning for all question and answer types; further, we observe that enabling a flexible number of turns generally improves upon a fixed multiple-turn strategy. We achieve results competitive to the state-of-the-art on these two datasets.

17 citations


Journal ArticleDOI
TL;DR: This article covers the methods and algorithms needed to fluently read Bayesian learning papers in NLP and to do research in the area; these are partly borrowed from machine learning and statistics and partly developed "in-house" in NLP.
Abstract: Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since then, the use of statistical techniques in NLP has evolved in several ways. One such example of evolution took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to accommodate various shortcomings in the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples. We cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and are partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Ba...

16 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: The efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018 are reported on.
Abstract: We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018. We developed systems for German–English, English–German, and Russian–English. Our novel contributions are iterative back-translation and fine-tuning on test sets from prior years.

Journal ArticleDOI
31 Jan 2017
TL;DR: A psycholinguistic model is proposed that predicts whether or not speakers will produce an explicit marker given the discourse relation they wish to express and quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, cost of production, and availability of other signals in the rest of the utterance.
Abstract: Discourse relations can either be explicitly marked by discourse connectives (DCs), such as therefore and but, or implicitly conveyed in natural language utterances. How speakers choose between the two options is a question that is not well understood. In this study, we propose a psycholinguistic model that predicts whether or not speakers will produce an explicit marker given the discourse relation they wish to express. Our model is based on two information-theoretic frameworks: (1) the Rational Speech Acts model, which models the pragmatic interaction between language production and interpretation by Bayesian inference, and (2) the Uniform Information Density theory, which advocates that speakers adjust linguistic redundancy to maintain a uniform rate of information transmission. Specifically, our model quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, cost of production, and availability of other signals in the rest of the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms the state-of-the-art performance at predicting the presence of DCs (Patterson and Kehler, 2013), in addition to giving an explanatory account of the speaker’s choice.
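The utility computation described above can be illustrated with a toy calculation: the speaker weighs the comprehender's expected surprisal against the production cost of the connective, given how strongly the rest of the utterance already signals the relation. The numbers and the simple linear trade-off below are illustrative assumptions, not the fitted model from the paper.

```python
import math

def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)

def speaker_utility(p_relation_given_context, dc_cost, with_dc):
    """Toy utility: negative expected comprehension surprisal minus production
    cost. An explicit DC is assumed to make the relation (nearly) certain."""
    p = 0.99 if with_dc else p_relation_given_context
    return -surprisal(p) - (dc_cost if with_dc else 0.0)

def produce_dc(p_relation_given_context, dc_cost=0.35):
    """Choose the option (explicit marker vs. implicit) with higher utility."""
    return (speaker_utility(p_relation_given_context, dc_cost, True)
            > speaker_utility(p_relation_given_context, dc_cost, False))

# Strong contextual signal -> omit the connective; weak signal -> produce it.
for p in (0.9, 0.5, 0.2):
    print(f"P(relation | rest of utterance) = {p}: explicit DC? {produce_dc(p)}")
```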

Posted Content
TL;DR: A streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec is developed, and the results are discussed, concluding that they provide partial validation of the approach as a streaming replacement for word2vec.
Abstract: We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec. We compare our streaming algorithm to word2vec empirically by measuring the cosine similarity between word pairs under each algorithm and by applying each algorithm in the downstream task of hashtag prediction on a two-month interval of the Twitter sample stream. We then discuss the results of these experiments, concluding they provide partial validation of our approach as a streaming replacement for word2vec. Finally, we discuss potential failure modes and suggest directions for future work.
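A bounded-memory, one-pass SGNS variant can be sketched by hashing words into a fixed-size embedding table and applying the standard skip-gram negative-sampling update as tokens stream by. The hashing scheme, table size, window, and learning rate below are illustrative choices for this sketch, not necessarily those of the paper.

```python
import numpy as np

class StreamingSGNS:
    """One-pass skip-gram with negative sampling over a fixed-size hashed
    vocabulary, so memory use is bounded regardless of stream length."""
    def __init__(self, buckets=2**18, dim=50, window=2, neg=5, lr=0.025, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_in = self.rng.normal(0, 0.01, (buckets, dim))   # target vectors
        self.W_out = np.zeros((buckets, dim))                  # context vectors
        self.buckets, self.window, self.neg, self.lr = buckets, window, neg, lr

    def _bucket(self, word):
        return hash(word) % self.buckets                       # fixed memory footprint

    def _update(self, center, context, label):
        v = self.W_in[center].copy()
        u = self.W_out[context].copy()
        score = 1.0 / (1.0 + np.exp(-u @ v))                   # sigmoid
        g = self.lr * (label - score)                          # SGNS gradient step
        self.W_in[center] += g * u
        self.W_out[context] += g * v

    def train_sentence(self, tokens):
        ids = [self._bucket(t) for t in tokens]
        for i, center in enumerate(ids):
            lo, hi = max(0, i - self.window), min(len(ids), i + self.window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                self._update(center, ids[j], 1)                # positive pair
                for neg_id in self.rng.integers(self.buckets, size=self.neg):
                    self._update(center, int(neg_id), 0)       # negative samples

model = StreamingSGNS()
model.train_sentence("the cat sat on the mat".split())
```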

Proceedings Article
01 Nov 2017
TL;DR: A novel selective decoding mechanism is employed, which explicitly models the sequence labeling process as well as the sequence generation process on the decoder side, which significantly increases the performance on a Chinese-English cross-lingual open IE dataset.
Abstract: Cross-lingual open information extraction is the task of distilling facts from the source language into representations in the target language. We propose a novel encoder-decoder model for this problem. It employs a novel selective decoding mechanism, which explicitly models the sequence labeling process as well as the sequence generation process on the decoder side. Compared to a standard encoder-decoder model, selective decoding significantly increases the performance on a Chinese-English cross-lingual open IE dataset by 3.87-4.49 BLEU and 1.91-5.92 F1. We also extend our approach to low-resource scenarios, and gain promising improvement.

01 Jan 2017
TL;DR: The feasibility of training skip-prop vectors is demonstrated by introducing a method adapted from skip-thought vectors, and skip-prop is compared with “one vector per sentence” and “one vector per token” approaches.
Abstract: We introduce the notion of a multi-vector sentence representation based on a “one vector per proposition” philosophy, which we term skip-prop vectors. By representing each predicate-argument structure in a complex sentence as an individual vector, skip-prop is (1) a response to empirical evidence that single-vector sentence representations degrade with sentence length, and (2) a representation that maintains a semantically useful level of granularity. We demonstrate the feasibility of training skip-prop vectors, introducing a method adapted from skip-thought vectors, and compare skip-prop with “one vector per sentence” and “one vector per token” approaches.
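The "one vector per proposition" idea can be illustrated very simply: split a sentence into predicate-argument structures and encode each one independently, yielding a list of vectors rather than a single sentence vector. The hand-made split and the averaging encoder below are placeholder choices for illustration, not the skip-thought-style training described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = {}  # toy word embeddings, created on demand

def embed(word, dim=16):
    if word not in EMB:
        EMB[word] = rng.normal(0, 1, dim)
    return EMB[word]

def encode_proposition(tokens):
    """Placeholder encoder: average the word vectors of one
    predicate-argument structure (the paper trains an RNN instead)."""
    return np.mean([embed(t) for t in tokens], axis=0)

def skip_prop_vectors(propositions):
    """One vector per proposition, instead of one vector per sentence."""
    return [encode_proposition(p) for p in propositions]

# "The senators, who opposed the bill, left early." split (by hand here)
# into two predicate-argument structures:
props = [["senators", "opposed", "bill"], ["senators", "left", "early"]]
vectors = skip_prop_vectors(props)
print(len(vectors), vectors[0].shape)   # 2 propositions -> 2 vectors
```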

Posted Content
TL;DR: It is found that multiple-turn reasoning outperforms single-turn reasoning for all question and answer types; further, it is observed that enabling a flexible number of turns generally improves upon a fixed multiple-turn strategy.
Abstract: Reading comprehension (RC) is a challenging task that requires synthesis of information across sentences and multiple turns of reasoning. Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets. The RC model is an end-to-end neural network with iterative attention, and uses reinforcement learning to dynamically control the number of turns. We find that multiple-turn reasoning outperforms single-turn reasoning for all question and answer types; further, we observe that enabling a flexible number of turns generally improves upon a fixed multiple-turn strategy. We achieve results competitive to the state-of-the-art on these two datasets.

Journal ArticleDOI
01 Jan 2017
TL;DR: This paper proposes a simple strategy to transform two different Chinese word segmentation (CWS) corpora into a new consistent segmentation level, which enables easy extension of the training data size and achieves state-of-the-art segmentation recall and F-score on the PKU and MSR corpora.
Abstract: One of the crucial problems facing current Chinese natural language processing (NLP) is the ambiguity of word boundaries, which raises many further issues, such as different word segmentation standards and the prevalence of out-of-vocabulary (OOV) words. We assume that such issues can be better handled if a consistent segmentation level is created among multiple corpora. In this paper, we propose a simple strategy to transform two different Chinese word segmentation (CWS) corpora into a new consistent segmentation level, which enables easy extension of the training data size. The extended data is verified to be highly consistent by 10-fold cross-validation. In addition, we use a synthetic word parser to analyze the internal structure information of the words in the extended training data to convert the data into a more fine-grained standard. Then we use two-stage Conditional Random Fields (CRFs) to perform fine-grained segmentation and chunk the segments back to the original Peking University (PKU) or Microsoft Research (MSR) standard. Due to the extension of the training data and reduction of the OOV rate in the new fine-grained level, the proposed system achieves state-of-the-art segmentation recall and F-score on the PKU and MSR corpora.

Proceedings Article
27 Nov 2017
TL;DR: This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.
Abstract: Computer Assisted Discovery Extraction and Translation (CADET) is a workbench for helping knowledge workers find, label, and translate documents of interest. It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users. This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.

Journal ArticleDOI
TL;DR: A generalized hierarchical word sequence framework is proposed, in which different word association scores can be adopted to rearrange word sequences in a totally unsupervised fashion; the rearranged sequences are shown to be a better alternative to n-gram language models.
Abstract: Language modeling is a fundamental research problem that has wide application for many NLP tasks. For estimating probabilities of natural language sentences, most research on language modeling uses n-gram based approaches to factor sentence probabilities. However, the assumption underlying n-gram models is not robust enough to cope with the data sparseness problem, which affects the final performance of language models. In this paper, we propose a generalized hierarchical word sequence framework, where different word association scores can be adopted to rearrange word sequences in a totally unsupervised fashion. Unlike n-gram models, which factor sentence probability from left to right, our model factors it using a more flexible strategy. For evaluation, we compare our rearranged word sequences to normal n-gram word sequences. Both intrinsic and extrinsic experiments verify that our language model achieves better performance, showing that our method can be considered a better alternative to n-gram language models.
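To make the "rearranged word sequence" idea concrete, the sketch below recursively picks the word with the highest association score (here, plain unigram frequency from a toy corpus) as a pivot, splits the sentence around it, and reads off conditioning pairs along the resulting tree edges instead of left to right. The greedy pivot rule and the frequency score are simplifications for illustration, not the paper's full generalized framework.

```python
from collections import Counter

# toy corpus providing a word-association score (here: unigram frequency)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
freq = Counter(corpus)

def hws_tree(words):
    """Recursively pick the highest-scoring word as a pivot and split the
    sentence around it, yielding a hierarchical (tree-structured) order."""
    if not words:
        return None
    i = max(range(len(words)), key=lambda k: freq[words[k]])
    return (words[i], hws_tree(words[:i]), hws_tree(words[i + 1:]))

def hws_bigrams(tree, parent="<root>"):
    """Read off (context, word) pairs along tree edges rather than from
    left-to-right adjacency, as an alternative to ordinary bigrams."""
    if tree is None:
        return []
    word, left, right = tree
    return ([(parent, word)]
            + hws_bigrams(left, parent=word)
            + hws_bigrams(right, parent=word))

sentence = "the cat sat on the mat .".split()
print(hws_bigrams(hws_tree(sentence)))
```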