Showing papers by Kevin Duh published in 2015


Proceedings ArticleDOI
01 May 2015
TL;DR: This work develops a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations to help tasks in new domains.
Abstract: Methods of deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work, the models are learned based on either unsupervised objectives, which do not directly optimize the desired task, or single-task supervised objectives, which often suffer from insufficient training data. We develop a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations that help tasks in new domains. Our multi-task DNN approach combines tasks of multiple-domain classification (for query classification) and information retrieval (ranking for web search), and demonstrates significant gains over strong baselines in a comprehensive set of domain adaptation experiments.
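
The core idea is a set of shared lower layers that map text into one semantic representation, topped by task-specific heads. Below is a minimal sketch of that architecture, assuming PyTorch and illustrative layer sizes (the paper's exact dimensions and letter-trigram input are not reproduced here):

import torch
import torch.nn as nn

class MultiTaskDNN(nn.Module):
    def __init__(self, input_dim=50000, shared_dim=300, num_domains=4):
        super().__init__()
        # Shared layers: map sparse text features to a semantic vector.
        # These parameters are updated by both tasks, which produces the
        # regularization effect described in the abstract.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 1000), nn.Tanh(),
            nn.Linear(1000, shared_dim), nn.Tanh(),
        )
        # Task-specific head: query classification over domains.
        self.classify = nn.Linear(shared_dim, num_domains)

    def forward_classification(self, query_feats):
        return self.classify(self.shared(query_feats))

    def forward_ranking(self, query_feats, doc_feats):
        # Task-specific head: score documents by the similarity of the
        # shared representations of query and document.
        q = self.shared(query_feats)
        d = self.shared(doc_feats)
        return nn.functional.cosine_similarity(q, d, dim=-1)

Training would then alternate minibatches from the two tasks, so both objectives shape the shared parameters.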

436 citations


Posted Content
16 Aug 2015
TL;DR: This short note presents an extension of LSTM that uses a depth gate to connect memory cells of adjacent layers, introducing a gated linear dependence between lower and upper layer recurrent units.
Abstract: In this short note, we present an extension of long short-term memory (LSTM) neural networks that uses a depth gate to connect memory cells of adjacent layers. Doing so introduces a linear dependence between lower and upper layer recurrent units. Importantly, this linear dependence is gated through a gating function, which we call the depth gate. The gate is a function of the lower layer's memory cell, the input to this layer, and this layer's past memory cell. We conducted experiments and verified that this new LSTM architecture improves machine translation and language modeling performance.
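
Following that description, one cell update at layer L+1 can be sketched as below; the notation is assumed for illustration and this is not the paper's code:

import torch

def depth_gated_cell_update(x, c_prev, c_lower, i, f, g, W_xd, w_cd, w_ld, b_d):
    # i, f, g: the standard LSTM input gate, forget gate, and candidate
    # values for this layer at this time step.
    # Depth gate d: a function of the input x, this layer's past memory
    # cell c_prev, and the lower layer's current memory cell c_lower.
    d = torch.sigmoid(x @ W_xd + w_cd * c_prev + w_ld * c_lower + b_d)
    # The gated linear path d * c_lower connects adjacent layers' cells.
    c = f * c_prev + i * g + d * c_lower
    return c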

64 citations


Patent
Jianfeng Gao, Li Deng, Xiaodong He, Ye-Yi Wang, Kevin Duh, Xiaodong Liu
28 Jul 2015
TL;DR: This patent describes a system comprising one or more processors and memory storing instructions that configure the processors to receive a query or a document and map it into a lower-dimensional representation via at least one operational layer shared across at least two disparate tasks.
Abstract: A system may comprise one or more processors and memory storing instructions that, when executed by the one or more processors, configure the one or more processors to perform a number of operations or tasks, such as receiving a query or a document, and mapping the query or the document into a lower-dimensional representation by performing at least one operational layer that shares at least two disparate tasks.

44 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper proposes new methods for Japanese PAS analysis that jointly identify the case arguments of all predicates in a sentence by modeling multiple PAS interactions with a bipartite graph and approximately searching for optimal PAS combinations.
Abstract: Existing methods for Japanese predicate argument structure (PAS) analysis identify the case arguments of each predicate without considering interactions between the target PAS and the others in a sentence. However, the argument structures of the predicates in a sentence are semantically related to each other. This paper proposes new methods for Japanese PAS analysis that jointly identify the case arguments of all predicates in a sentence by (1) modeling multiple PAS interactions with a bipartite graph and (2) approximately searching for optimal PAS combinations. In experiments on the NAIST Text Corpus, we demonstrate that our joint analysis methods substantially outperform a strong baseline and are comparable to previous work.
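
To make the joint formulation concrete, here is a toy sketch under stated assumptions: predicates and argument candidates form the two sides of a bipartite graph, and a greedy hill-climbing loop (one possible approximate search, not necessarily the paper's) improves a joint assignment instead of decoding each predicate independently. The scorers are stand-ins, not the paper's model:

import itertools

def local_score(pred, arg, case):
    return 0.0  # stand-in for a trained per-predicate argument scorer

def pairwise_score(assignment):
    return 0.0  # stand-in for features over interacting PASs

def score(assignment):
    s = pairwise_score(assignment)
    for pred, cases in assignment.items():
        for case, arg in cases.items():
            if arg is not None:
                s += local_score(pred, arg, case)
    return s

def joint_decode(predicates, candidates, cases=("ga", "o", "ni")):
    # Start from an empty assignment, then greedily apply the single
    # change that improves the joint score until no change helps.
    best = {p: {c: None for c in cases} for p in predicates}
    improved = True
    while improved:
        improved = False
        for p in predicates:
            for c, a in itertools.product(cases, candidates + [None]):
                trial = {**best, p: {**best[p], c: a}}
                if score(trial) > score(best):
                    best, improved = trial, True
    return best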

21 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization, tuning systems to achieve both high accuracy and a compact model size.
Abstract: When building a state-of-the-art speech recognition system, the laborious effort required by human experts in tuning numerous parameters remains a prominent obstacle. The goal of this paper is to automate the process. We propose to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization, which optimizes systems for both high accuracy and a compact model size. An additional advantage of our approach is that it is efficiently parallelizable and easily adapted to cloud computing services. We performed experiments on the Corpus of Spontaneous Japanese (CSJ) using the TSUBAME 2.5 supercomputer. Compared with a strong manually tuned configuration borrowed from a similar system, our approach automatically discovered systems with a 0.48% lower WER, as well as systems with a 59% smaller model size at constant WER. The optimized training script is released in the Kaldi speech recognition toolkit as the first publicly available recipe for Japanese large vocabulary speech recognition.
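
A hedged sketch of such a tuning loop using the open-source cma package (pip install cma); the hyperparameter list and the weighted scalarization of the two objectives are illustrative assumptions, whereas the paper performs a proper multi-objective Pareto optimization:

import cma

def train_and_eval(params):
    # Stand-in: train a DNN-HMM system with these hyperparameters and
    # return (word error rate, model size in MB).
    learning_rate, hidden_units, num_layers = params
    return 20.0, 100.0

# In practice the parameters should be rescaled to comparable ranges.
x0 = [0.01, 1024, 4]  # initial learning rate, hidden units, depth
es = cma.CMAEvolutionStrategy(x0, 0.3)
while not es.stop():
    solutions = es.ask()                     # sample candidate configs
    fitnesses = []
    for s in solutions:                      # trivially parallelizable
        wer, size = train_and_eval(s)
        fitnesses.append(wer + 0.01 * size)  # illustrative weighted sum
    es.tell(solutions, fitnesses)            # update search distribution
print(es.result.xbest)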

20 citations


Journal ArticleDOI
12 Jun 2015
TL;DR: This article proposes a framework for extracting a bilingual dictionary from comparable corpora by exploiting a novel combination of topic modeling and word aligners such as the IBM models, and can extract higher-precision dictionaries than previous approaches.
Abstract: A machine-readable bilingual dictionary plays a crucial role in many natural language processing tasks, such as statistical machine translation and cross-language information retrieval. In this article, we propose a framework for extracting a bilingual dictionary from comparable corpora by exploiting a novel combination of topic modeling and word aligners such as the IBM models. Using a multilingual topic model, we first convert a comparable document-aligned corpus into a parallel topic-aligned corpus. This topic-aligned corpus is similar in structure to the sentence-aligned corpus frequently employed in statistical machine translation and allows us to extract a bilingual dictionary using a word alignment model. The main advantages of our framework are that (1) no seed dictionary is necessary for bootstrapping the process, and (2) multilingual comparable corpora in more than two languages can also be exploited. In experiments on a large-scale Wikipedia dataset, we demonstrate that our approach extracts higher-precision dictionaries than previous approaches and that our method improves further as we add more languages to the dataset.
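
A compact sketch of the pipeline under simplifying assumptions: suppose a multilingual topic model has already assigned each token a topic id. Grouping each comparable document pair's tokens by topic yields topic-aligned pseudo-sentence pairs, over which a few EM iterations of IBM Model 1 produce translation probabilities:

from collections import defaultdict

def topic_align(src_tokens, tgt_tokens, src_topics, tgt_topics):
    # Bucket both sides of a comparable document pair by topic id.
    buckets = defaultdict(lambda: ([], []))
    for w, z in zip(src_tokens, src_topics):
        buckets[z][0].append(w)
    for w, z in zip(tgt_tokens, tgt_topics):
        buckets[z][1].append(w)
    return [pair for pair in buckets.values() if pair[0] and pair[1]]

def ibm_model1(pairs, iterations=5):
    # Minimal IBM Model 1 EM over the topic-aligned pairs.
    t = defaultdict(lambda: 1e-3)  # t[(f, e)]: translation weight
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(f, e)] for f in src)
                for f in src:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[f] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[f]
    return t  # rank target words per source word to extract a dictionary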

19 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: As a specific framework for realizing multi-target translation, this work expands the formalism of synchronous context-free grammars to handle multiple targets and describes methods for rule extraction, scoring, pruning, and search.
Abstract: We propose a method for simultaneously translating from a single source language to multiple target languages T1, T2, etc. The motivation behind this method is that if we only have a weak language model for T1 and translations in T1 and T2 are associated, we can use the information from a strong language model over T2 to disambiguate the translations in T1, providing better translation results. As a specific framework to realize multi-target translation, we expand the formalism of synchronous context-free grammars to handle multiple targets, and describe methods for rule extraction, scoring, pruning, and search with these models. Experiments find that multi-target translation with a strong language model in a similar second target language can provide gains of 0.8-1.5 BLEU points.
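
An illustrative sketch (not the paper's implementation) of what a synchronous rule with several target sides might look like as a data structure; nonterminal links are shared across all targets, and the example rule and scores are invented for illustration:

from dataclasses import dataclass

@dataclass
class MultiTargetRule:
    source: list    # source right-hand side, e.g. ["X0", "の", "X1"]
    targets: list   # one right-hand side per target language
    scores: dict    # feature scores used in pruning and search

rule = MultiTargetRule(
    source=["X0", "の", "X1"],
    targets=[["X1", "of", "X0"],    # target T1 (e.g. English)
             ["X1", "de", "X0"]],   # target T2 (e.g. French)
    scores={"p_t1_given_src": -1.2, "p_t2_given_src": -0.9},
)

During search, a strong language model over T2 can then rescore hypotheses whose T1 sides would otherwise be hard to disambiguate.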

9 citations


Proceedings ArticleDOI
01 Sep 2015
TL;DR: Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores, and translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation.
Abstract: Usage of discourse connectives (DCs) differs across languages, so the addition and omission of connectives are common in translation. We investigate how implicit (omitted) DCs in the source text impact various machine translation (MT) systems, and whether a discourse parser is needed as a preprocessor to explicitate implicit DCs. Based on the manual annotation and alignment of 7266 pairs of discourse relations in a Chinese-English translation corpus, we evaluate whether a preprocessing step that inserts explicit DCs at the positions of implicit relations can improve MT. Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores. In addition, translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation. On the other hand, further analysis reveals that both the disambiguation and the explicitation of implicit relations are subject to a certain level of optionality, suggesting limits to learning and evaluating this linguistic phenomenon with standard parallel corpora.
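
The preprocessing step being evaluated can be pictured as follows; the sense-to-connective table and the clause representation are tiny illustrative assumptions, not the paper's annotation scheme:

# Insert an explicit connective before the second clause of each
# implicit discourse relation identified by a parser or annotator.
SENSE_TO_DC = {"Causal": "所以", "Contrast": "但是", "Temporal": "然后"}

def explicitate(clauses, implicit_relations):
    # implicit_relations: (index of second clause, relation sense) pairs.
    out = list(clauses)
    for idx, sense in implicit_relations:
        dc = SENSE_TO_DC.get(sense)
        if dc:
            out[idx] = dc + out[idx]
    return "".join(out)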

6 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: A pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard, achieving competitive results on the PKU and MSR datasets.
Abstract: We present a novel solution to improve the performance of Chinese word segmentation (CWS) using a synthetic word parser. The parser analyses the internal structure of words, and attempts to convert out-of-vocabulary words (OOVs) into in-vocabulary fine-grained sub-words. We propose a pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard. We achieve competitive results on the PKU and MSR datasets, with substantial improvements in OOV recall.
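
Schematically, the second stage of the pipeline can be seen as a B/I chunking pass over the fine-grained output; the sketch below assumes that view, with the segmenter and the chunk-label classifier treated as given:

def merge_by_labels(subwords, labels):
    # labels[i] == "B" starts a new word; "I" attaches the sub-word to
    # the previous one, reconstructing the original standard.
    words = []
    for sw, lab in zip(subwords, labels):
        if lab == "B" or not words:
            words.append(sw)
        else:
            words[-1] += sw
    return words

# An OOV compound split into known sub-words, then merged back:
print(merge_by_labels(["北京", "大学"], ["B", "I"]))  # ['北京大学']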

6 citations


Journal ArticleDOI
11 Nov 2015
TL;DR: A novel framework for Chinese Spelling Check (CSC), which is an automatic algorithm to detect and correct Chinese spelling errors, which uses both SMT and LM models as components of this framework for generating the correction candidates, in order to obtain maximum recall.
Abstract: We propose a novel framework for Chinese Spelling Check (CSC), an automatic algorithm to detect and correct Chinese spelling errors. Our framework contains two key components: candidate generation and candidate ranking. It differs from previous research, such as Statistical Machine Translation (SMT)-based or Language Model (LM)-based models, in that we use both SMT and LM models as components for generating correction candidates, in order to maximize recall; to improve precision, we further employ a Support Vector Machine (SVM) classifier to rank the candidates generated by the SMT and LM components. Experiments show that our framework outperforms systems that adopted the same or similar resources as ours in the SIGHAN 7 shared task; even compared with state-of-the-art systems that used more resources, such as a considerably larger dictionary, an idiom dictionary, and other semantic information, our framework still obtains competitive results. Furthermore, to address the resource scarcity problem in training the SMT model, we generate around 2 million artificial training sentences using the Chinese character confusion sets provided by the SIGHAN 7 shared task, which group Chinese characters with similar shapes or similar pronunciations.
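
A hedged outline of the two-component framework: pool correction candidates from SMT- and LM-based generators (stand-ins below), then rank the pool with an SVM. The feature set is an illustrative assumption, and sklearn's LinearSVC stands in for the paper's classifier:

from sklearn.svm import LinearSVC

def smt_candidates(sentence):
    return []  # stand-in: corrections proposed by the SMT component

def lm_candidates(sentence):
    return []  # stand-in: corrections proposed by the LM component

def features(sentence, candidate):
    return [0.0, 0.0]  # stand-in: e.g. LM score, edit distance

def correct(sentence, ranker: LinearSVC):
    # ranker: a LinearSVC fitted on (features, is-good-correction) pairs.
    pool = set(smt_candidates(sentence)) | set(lm_candidates(sentence))
    pool.add(sentence)  # keep the original in case nothing needs fixing
    scored = [(ranker.decision_function([features(sentence, c)])[0], c)
              for c in pool]
    return max(scored)[1]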

Book ChapterDOI
24 Nov 2015
TL;DR: The basic HWS approach is improved upon by generalizing it to exploit not only word frequencies but also word association, and both intrinsic and extrinsic experiments verify that word-association-based HWS models achieve better performance.
Abstract: Language modeling is a fundamental research problem with applications to many NLP tasks. For estimating probabilities, most research on language modeling uses the n-gram approach to factor sentence probabilities. However, the n-gram assumption is too simple to cope with the data sparseness problem, which limits the final performance of language models. Here, the Hierarchical Word Sequence (HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the normal n-gram method. In this paper, we improve upon the basic HWS approach by generalizing it to exploit not only word frequencies but also word association. For evaluation, we compare word-association-based HWS models to normal HWS models and normal n-gram models. Both intrinsic and extrinsic experiments verify that word-association-based HWS models achieve better performance.
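
The frequency-based HWS conversion can be sketched as follows, with tie-breaking and left/right direction markers simplified away as assumptions: recursively take the most frequent word in a span as the parent, split the span around it, and emit parent-child bigrams along the hierarchy instead of surface bigrams:

def hws_bigrams(words, freq, parent="<s>"):
    if not words:
        return []
    # The most frequent word in the span heads the sub-hierarchy.
    i = max(range(len(words)), key=lambda k: freq.get(words[k], 0))
    head = words[i]
    pairs = [(parent, head)]
    pairs += hws_bigrams(words[:i], freq, head)      # left sub-span
    pairs += hws_bigrams(words[i + 1:], freq, head)  # right sub-span
    return pairs

freq = {"the": 100, "on": 80, "sat": 7, "cat": 5, "mat": 4}
print(hws_bigrams(["the", "cat", "sat", "on", "the", "mat"], freq))

The word-association variant described above would replace the raw frequency criterion with an association score when choosing each head.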

Proceedings ArticleDOI
01 Jul 2015
TL;DR: An annotated resource consisting of 325 articles in the Chinese Treebank is presented, and a discourse chunker based on a cascade of classifiers is introduced, achieving 70% top-level discourse sense accuracy.
Abstract: We propose a linguistically driven approach to represent discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt markers of discourse structures, yet existing annotation proposals adapted from formalisms constructed for English do not fully incorporate these characteristics. We present an annotated resource consisting of 325 articles in the Chinese Treebank. In addition, using this annotation, we introduce a discourse chunker based on a cascade of classifiers and report 70% top-level discourse sense accuracy.
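
A minimal sketch of one way such a cascade over clause sequences could be wired up, with the two classifiers treated as given and the staging an assumption: the first stage decides chunk boundaries between adjacent clauses, the second assigns a top-level discourse sense to each chunk:

def cascade_chunk(clauses, boundary_clf, sense_clf):
    # Stage 1: group adjacent clauses into discourse units.
    # Assumes at least one clause.
    chunks, current = [], [clauses[0]]
    for prev, cur in zip(clauses, clauses[1:]):
        if boundary_clf(prev, cur):   # does a new unit start at cur?
            chunks.append(current)
            current = [cur]
        else:
            current.append(cur)
    chunks.append(current)
    # Stage 2: label each unit with a top-level discourse sense.
    return [(chunk, sense_clf(chunk)) for chunk in chunks]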