Showing papers by Kevin Duh published in 2015


Proceedings ArticleDOI
01 May 2015
TL;DR: This work develops a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations to help tasks in new domains.
Abstract: Methods of deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work, the models are learned based on either unsupervised objectives, which do not directly optimize the desired task, or single-task supervised objectives, which often suffer from insufficient training data. We develop a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations that help tasks in new domains. Our multi-task DNN approach combines tasks of multiple-domain classification (for query classification) and information retrieval (ranking for web search), and demonstrates significant gains over strong baselines in a comprehensive set of domain adaptation experiments.
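
The core idea is a set of shared lower layers that map text into one semantic representation, topped by task-specific heads. Below is a minimal sketch of that architecture, assuming PyTorch and illustrative layer sizes (the paper's exact dimensions and letter-trigram input are not reproduced here):

import torch
import torch.nn as nn

class MultiTaskDNN(nn.Module):
    def __init__(self, input_dim=50000, shared_dim=300, num_domains=4):
        super().__init__()
        # Shared layers: map sparse text features to a semantic vector.
        # These parameters are updated by both tasks, which produces the
        # regularization effect described in the abstract.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 1000), nn.Tanh(),
            nn.Linear(1000, shared_dim), nn.Tanh(),
        )
        # Task-specific head: query classification over domains.
        self.classify = nn.Linear(shared_dim, num_domains)

    def forward_classification(self, query_feats):
        return self.classify(self.shared(query_feats))

    def forward_ranking(self, query_feats, doc_feats):
        # Task-specific head: score documents by the similarity of the
        # shared representations of query and document.
        q = self.shared(query_feats)
        d = self.shared(doc_feats)
        return nn.functional.cosine_similarity(q, d, dim=-1)

Training would then alternate minibatches from the two tasks, so both objectives shape the shared parameters.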

436 citations


Posted Content
16 Aug 2015
TL;DR: This short note presents an extension of LSTM that uses a depth gate to connect memory cells of adjacent layers, introducing a gated linear dependence between lower and upper layer recurrent units.
Abstract: In this short note, we present an extension of long short-term memory (LSTM) neural networks that uses a depth gate to connect memory cells of adjacent layers. Doing so introduces a linear dependence between lower and upper layer recurrent units. Importantly, this linear dependence is gated through a gating function, which we call the depth gate. The gate is a function of the lower layer's memory cell, the input to this layer, and this layer's past memory cell. We conducted experiments and verified that this new LSTM architecture improves machine translation and language modeling performance.
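
Following that description, one cell update at layer L+1 can be sketched as below; the notation is assumed for illustration and this is not the paper's code:

import torch

def depth_gated_cell_update(x, c_prev, c_lower, i, f, g, W_xd, w_cd, w_ld, b_d):
    # i, f, g: the standard LSTM input gate, forget gate, and candidate
    # values for this layer at this time step.
    # Depth gate d: a function of the input x, this layer's past memory
    # cell c_prev, and the lower layer's current memory cell c_lower.
    d = torch.sigmoid(x @ W_xd + w_cd * c_prev + w_ld * c_lower + b_d)
    # The gated linear path d * c_lower connects adjacent layers' cells.
    c = f * c_prev + i * g + d * c_lower
    return c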

64 citations


Patent
Jianfeng Gao, Li Deng, Xiaodong He, Ye-Yi Wang, Kevin Duh, Xiaodong Liu
28 Jul 2015
TL;DR: This patent describes a system comprising one or more processors and memory storing instructions that configure the processors to receive a query or a document and map it into a lower-dimensional representation via at least one operational layer shared across at least two disparate tasks.
Abstract: A system may comprise one or more processors and memory storing instructions that, when executed by the one or more processors, configure the one or more processors to perform a number of operations or tasks, such as receiving a query or a document, and mapping the query or the document into a lower-dimensional representation by performing at least one operational layer that shares at least two disparate tasks.

44 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper proposes new methods for Japanese PAS analysis that jointly identify the case arguments of all predicates in a sentence by modeling multiple PAS interactions with a bipartite graph and approximately searching for optimal PAS combinations.
Abstract: Existing methods for Japanese predicate argument structure (PAS) analysis identify the case arguments of each predicate without considering interactions between the target PAS and the others in a sentence. However, the argument structures of the predicates in a sentence are semantically related to each other. This paper proposes new methods for Japanese PAS analysis that jointly identify the case arguments of all predicates in a sentence by (1) modeling multiple PAS interactions with a bipartite graph and (2) approximately searching for optimal PAS combinations. In experiments on the NAIST Text Corpus, we demonstrate that our joint analysis methods substantially outperform a strong baseline and are comparable to previous work.
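
To make the joint formulation concrete, here is a toy sketch under stated assumptions: predicates and argument candidates form the two sides of a bipartite graph, and a greedy hill-climbing loop (one possible approximate search, not necessarily the paper's) improves a joint assignment instead of decoding each predicate independently. The scorers are stand-ins, not the paper's model:

import itertools

def local_score(pred, arg, case):
    return 0.0  # stand-in for a trained per-predicate argument scorer

def pairwise_score(assignment):
    return 0.0  # stand-in for features over interacting PASs

def score(assignment):
    s = pairwise_score(assignment)
    for pred, cases in assignment.items():
        for case, arg in cases.items():
            if arg is not None:
                s += local_score(pred, arg, case)
    return s

def joint_decode(predicates, candidates, cases=("ga", "o", "ni")):
    # Start from an empty assignment, then greedily apply the single
    # change that improves the joint score until no change helps.
    best = {p: {c: None for c in cases} for p in predicates}
    improved = True
    while improved:
        improved = False
        for p in predicates:
            for c, a in itertools.product(cases, candidates + [None]):
                trial = {**best, p: {**best[p], c: a}}
                if score(trial) > score(best):
                    best, improved = trial, True
    return best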

21 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization, tuning systems to achieve both high accuracy and a compact model size.
Abstract: When building a state-of-the-art speech recognition system, the laborious effort required by human experts in tuning numerous parameters remains a prominent obstacle. The goal of this paper is to automate the process. We propose to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization, which optimizes systems for both high accuracy and a compact model size. An additional advantage of our approach is that it is efficiently parallelizable and easily adapted to cloud computing services. We performed experiments on the Corpus of Spontaneous Japanese (CSJ) using the TSUBAME 2.5 supercomputer. Compared with a strong manually tuned configuration borrowed from a similar system, our approach automatically discovered systems with a 0.48% lower WER, as well as systems with a 59% smaller model size at constant WER. The optimized training script is released in the Kaldi speech recognition toolkit as the first publicly available recipe for Japanese large vocabulary speech recognition.
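
A hedged sketch of such a tuning loop using the open-source cma package (pip install cma); the hyperparameter list and the weighted scalarization of the two objectives are illustrative assumptions, whereas the paper performs a proper multi-objective Pareto optimization:

import cma

def train_and_eval(params):
    # Stand-in: train a DNN-HMM system with these hyperparameters and
    # return (word error rate, model size in MB).
    learning_rate, hidden_units, num_layers = params
    return 20.0, 100.0

# In practice the parameters should be rescaled to comparable ranges.
x0 = [0.01, 1024, 4]  # initial learning rate, hidden units, depth
es = cma.CMAEvolutionStrategy(x0, 0.3)
while not es.stop():
    solutions = es.ask()                     # sample candidate configs
    fitnesses = []
    for s in solutions:                      # trivially parallelizable
        wer, size = train_and_eval(s)
        fitnesses.append(wer + 0.01 * size)  # illustrative weighted sum
    es.tell(solutions, fitnesses)            # update search distribution
print(es.result.xbest)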

20 citations


Journal ArticleDOI
12 Jun 2015
TL;DR: This article proposes a framework for extracting a bilingual dictionary from comparable corpora by exploiting a novel combination of topic modeling and word aligners such as the IBM models, and can extract higher-precision dictionaries than previous approaches.
Abstract: A machine-readable bilingual dictionary plays a crucial role in many natural language processing tasks, such as statistical machine translation and cross-language information retrieval. In this article, we propose a framework for extracting a bilingual dictionary from comparable corpora by exploiting a novel combination of topic modeling and word aligners such as the IBM models. Using a multilingual topic model, we first convert a comparable document-aligned corpus into a parallel topic-aligned corpus. This topic-aligned corpus is similar in structure to the sentence-aligned corpus frequently employed in statistical machine translation and allows us to extract a bilingual dictionary using a word alignment model. The main advantages of our framework are that (1) no seed dictionary is necessary for bootstrapping the process, and (2) multilingual comparable corpora in more than two languages can also be exploited. In experiments on a large-scale Wikipedia dataset, we demonstrate that our approach extracts higher-precision dictionaries than previous approaches and that our method improves further as we add more languages to the dataset.
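
A compact sketch of the pipeline under simplifying assumptions: suppose a multilingual topic model has already assigned each token a topic id. Grouping each comparable document pair's tokens by topic yields topic-aligned pseudo-sentence pairs, over which a few EM iterations of IBM Model 1 produce translation probabilities:

from collections import defaultdict

def topic_align(src_tokens, tgt_tokens, src_topics, tgt_topics):
    # Bucket both sides of a comparable document pair by topic id.
    buckets = defaultdict(lambda: ([], []))
    for w, z in zip(src_tokens, src_topics):
        buckets[z][0].append(w)
    for w, z in zip(tgt_tokens, tgt_topics):
        buckets[z][1].append(w)
    return [pair for pair in buckets.values() if pair[0] and pair[1]]

def ibm_model1(pairs, iterations=5):
    # Minimal IBM Model 1 EM over the topic-aligned pairs.
    t = defaultdict(lambda: 1e-3)  # t[(f, e)]: translation weight
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(f, e)] for f in src)
                for f in src:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[f] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[f]
    return t  # rank target words per source word to extract a dictionary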

19 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: As a specific framework for realizing multi-target translation, this work expands the formalism of synchronous context-free grammars to handle multiple targets and describes methods for rule extraction, scoring, pruning, and search.
Abstract: We propose a method for simultaneously translating from a single source language to multiple target languages T1, T2, etc. The motivation behind this method is that if we only have a weak language model for T1 and translations in T1 and T2 are associated, we can use the information from a strong language model over T2 to disambiguate the translations in T1, providing better translation results. As a specific framework to realize multi-target translation, we expand the formalism of synchronous context-free grammars to handle multiple targets, and describe methods for rule extraction, scoring, pruning, and search with these models. Experiments find that multi-target translation with a strong language model in a similar second target language can provide gains of 0.8-1.5 BLEU points.
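
An illustrative sketch (not the paper's implementation) of what a synchronous rule with several target sides might look like as a data structure; nonterminal links are shared across all targets, and the example rule and scores are invented for illustration:

from dataclasses import dataclass

@dataclass
class MultiTargetRule:
    source: list    # source right-hand side, e.g. ["X0", "の", "X1"]
    targets: list   # one right-hand side per target language
    scores: dict    # feature scores used in pruning and search

rule = MultiTargetRule(
    source=["X0", "の", "X1"],
    targets=[["X1", "of", "X0"],    # target T1 (e.g. English)
             ["X1", "de", "X0"]],   # target T2 (e.g. French)
    scores={"p_t1_given_src": -1.2, "p_t2_given_src": -0.9},
)

During search, a strong language model over T2 can then rescore hypotheses whose T1 sides would otherwise be hard to disambiguate.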

9 citations


Proceedings ArticleDOI
01 Sep 2015
TL;DR: Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores, and translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation.
Abstract: Usage of discourse connectives (DCs) differs across languages, so the addition and omission of connectives are common in translation. We investigate how implicit (omitted) DCs in the source text impact various machine translation (MT) systems, and whether a discourse parser is needed as a preprocessor to explicitate implicit DCs. Based on the manual annotation and alignment of 7266 pairs of discourse relations in a Chinese-English translation corpus, we evaluate whether a preprocessing step that inserts explicit DCs at the positions of implicit relations can improve MT. Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores. In addition, translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation. On the other hand, further analysis reveals that both the disambiguation and the explicitation of implicit relations are subject to a certain level of optionality, suggesting limits to learning and evaluating this linguistic phenomenon with standard parallel corpora.
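
The preprocessing step being evaluated can be pictured as follows; the sense-to-connective table and the clause representation are tiny illustrative assumptions, not the paper's annotation scheme:

# Insert an explicit connective before the second clause of each
# implicit discourse relation identified by a parser or annotator.
SENSE_TO_DC = {"Causal": "所以", "Contrast": "但是", "Temporal": "然后"}

def explicitate(clauses, implicit_relations):
    # implicit_relations: (index of second clause, relation sense) pairs.
    out = list(clauses)
    for idx, sense in implicit_relations:
        dc = SENSE_TO_DC.get(sense)
        if dc:
            out[idx] = dc + out[idx]
    return "".join(out)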

6 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: A pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard, achieving competitive results on the PKU and MSR datasets.
Abstract: We present a novel solution to improve the performance of Chinese word segmentation (CWS) using a synthetic word parser. The parser analyses the internal structure of words, and attempts to convert out-of-vocabulary words (OOVs) into in-vocabulary fine-grained sub-words. We propose a pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard. We achieve competitive results on the PKU and MSR datasets, with substantial improvements in OOV recall.
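
Schematically, the second stage of the pipeline can be seen as a B/I chunking pass over the fine-grained output; the sketch below assumes that view, with the segmenter and the chunk-label classifier treated as given:

def merge_by_labels(subwords, labels):
    # labels[i] == "B" starts a new word; "I" attaches the sub-word to
    # the previous one, reconstructing the original standard.
    words = []
    for sw, lab in zip(subwords, labels):
        if lab == "B" or not words:
            words.append(sw)
        else:
            words[-1] += sw
    return words

# An OOV compound split into known sub-words, then merged back:
print(merge_by_labels(["北京", "大学"], ["B", "I"]))  # ['北京大学']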

6 citations


Journal ArticleDOI
11 Nov 2015
TL;DR: A novel framework for Chinese Spelling Check (CSC), which is an automatic algorithm to detect and correct Chinese spelling errors, which uses both SMT and LM models as components of this framework for generating the correction candidates, in order to obtain maximum recall.
Abstract: We propose a novel framework for Chinese Spelling Check (CSC), an automatic algorithm to detect and correct Chinese spelling errors. Our framework contains two key components: candidate generation and candidate ranking. It differs from previous research, such as Statistical Machine Translation (SMT)-based or Language Model (LM)-based models, in that we use both SMT and LM models as components for generating correction candidates, in order to maximize recall; to improve precision, we further employ a Support Vector Machine (SVM) classifier to rank the candidates generated by the SMT and LM components. Experiments show that our framework outperforms systems that adopted the same or similar resources as ours in the SIGHAN 7 shared task; even compared with state-of-the-art systems that used more resources, such as a considerably larger dictionary, an idiom dictionary, and other semantic information, our framework still obtains competitive results. Furthermore, to address the resource scarcity problem in training the SMT model, we generate around 2 million artificial training sentences using the Chinese character confusion sets provided by the SIGHAN 7 shared task, which group Chinese characters with similar shapes or similar pronunciations.
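
A hedged outline of the two-component framework: pool correction candidates from SMT- and LM-based generators (stand-ins below), then rank the pool with an SVM. The feature set is an illustrative assumption, and sklearn's LinearSVC stands in for the paper's classifier:

from sklearn.svm import LinearSVC

def smt_candidates(sentence):
    return []  # stand-in: corrections proposed by the SMT component

def lm_candidates(sentence):
    return []  # stand-in: corrections proposed by the LM component

def features(sentence, candidate):
    return [0.0, 0.0]  # stand-in: e.g. LM score, edit distance

def correct(sentence, ranker: LinearSVC):
    # ranker: a LinearSVC fitted on (features, is-good-correction) pairs.
    pool = set(smt_candidates(sentence)) | set(lm_candidates(sentence))
    pool.add(sentence)  # keep the original in case nothing needs fixing
    scored = [(ranker.decision_function([features(sentence, c)])[0], c)
              for c in pool]
    return max(scored)[1]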

Book ChapterDOI
24 Nov 2015
TL;DR: The basic HWS approach is improved upon by generalizing it to exploit not only word frequencies but also word association, and both intrinsic and extrinsic experiments verify that word-association-based HWS models achieve better performance.
Abstract: Language modeling is a fundamental research problem with applications to many NLP tasks. For estimating probabilities, most research on language modeling uses the n-gram approach to factor sentence probabilities. However, the n-gram assumption is too simple to cope with the data sparseness problem, which limits the final performance of language models. Here, the Hierarchical Word Sequence (HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the normal n-gram method. In this paper, we improve upon the basic HWS approach by generalizing it to exploit not only word frequencies but also word association. For evaluation, we compare word-association-based HWS models to normal HWS models and normal n-gram models. Both intrinsic and extrinsic experiments verify that word-association-based HWS models achieve better performance.
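
The frequency-based HWS conversion can be sketched as follows, with tie-breaking and left/right direction markers simplified away as assumptions: recursively take the most frequent word in a span as the parent, split the span around it, and emit parent-child bigrams along the hierarchy instead of surface bigrams:

def hws_bigrams(words, freq, parent="<s>"):
    if not words:
        return []
    # The most frequent word in the span heads the sub-hierarchy.
    i = max(range(len(words)), key=lambda k: freq.get(words[k], 0))
    head = words[i]
    pairs = [(parent, head)]
    pairs += hws_bigrams(words[:i], freq, head)      # left sub-span
    pairs += hws_bigrams(words[i + 1:], freq, head)  # right sub-span
    return pairs

freq = {"the": 100, "on": 80, "sat": 7, "cat": 5, "mat": 4}
print(hws_bigrams(["the", "cat", "sat", "on", "the", "mat"], freq))

The word-association variant described above would replace the raw frequency criterion with an association score when choosing each head.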

Proceedings ArticleDOI
01 Jul 2015
TL;DR: An annotated resource consisting of 325 articles in the Chinese Treebank is presented, and a discourse chunker based on a cascade of classifiers is introduced, achieving 70% top-level discourse sense accuracy.
Abstract: We propose a linguistically driven approach to represent discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt markers of discourse structures, yet existing annotation proposals adapted from formalisms constructed for English do not fully incorporate these characteristics. We present an annotated resource consisting of 325 articles in the Chinese Treebank. In addition, using this annotation, we introduce a discourse chunker based on a cascade of classifiers and report 70% top-level discourse sense accuracy.
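
A minimal sketch of one way such a cascade over clause sequences could be wired up, with the two classifiers treated as given and the staging an assumption: the first stage decides chunk boundaries between adjacent clauses, the second assigns a top-level discourse sense to each chunk:

def cascade_chunk(clauses, boundary_clf, sense_clf):
    # Stage 1: group adjacent clauses into discourse units.
    # Assumes at least one clause.
    chunks, current = [], [clauses[0]]
    for prev, cur in zip(clauses, clauses[1:]):
        if boundary_clf(prev, cur):   # does a new unit start at cur?
            chunks.append(current)
            current = [cur]
        else:
            current.append(cur)
    chunks.append(current)
    # Stage 2: label each unit with a top-level discourse sense.
    return [(chunk, sense_clf(chunk)) for chunk in chunks]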