Top 13 papers published by Kevin Duh from Johns Hopkins University in 2014

Proceedings Article•DOI•

A framework for analyzing semantic change of words across time

Adam Jatowt¹, Kevin Duh²•Institutions (2)

Kyoto University¹, Nara Institute of Science and Technology²

08 Sep 2014

TL;DR: Presents an exploratory analysis of methods for studying and visualizing changes in word meaning over time, and proposes a framework for exploring semantic change at the lexical level, at the contrastive-pair level, and at the sentiment orientation level.

Abstract: Recently, large amounts of historical texts have been digitized and made accessible to the public. Thanks to this, for the first time, it became possible to analyze evolution of language through the use of automatic approaches. In this paper, we show the results of an exploratory analysis aiming to investigate methods for studying and visualizing changes in word meaning over time. In particular, we propose a framework for exploring semantic change at the lexical level, at the contrastive-pair level, and at the sentiment orientation level. We demonstrate several kinds of NLP approaches that altogether give users deeper understanding of word evolution. We use two diachronic corpora that are currently the largest available historical language corpora. Our results indicate that the task is feasible and satisfactory outcomes can be already achieved by using simple approaches.

118 citations

Proceedings Article•DOI•

On the Elements of an Accurate Tree-to-String Machine Translation System

Graham Neubig¹, Kevin Duh¹•Institutions (1)

Nara Institute of Science and Technology¹

01 Jun 2014

TL;DR: It is shown how a basic T2S system that performs on par with phrase-based systems can be improved by 2.6-4.6 BLEU, greatly exceeding existing state-of-the-art methods.

Abstract: While tree-to-string (T2S) translation theoretically holds promise for efficient, accurate translation, in previous reports T2S systems have often proven inferior to other machine translation (MT) methods such as phrase-based or hierarchical phrase-based MT. In this paper, we attempt to clarify the reason for this performance gap by investigating a number of peripheral elements that affect the accuracy of T2S systems, including parsing, alignment, and search. Based on detailed experiments on the English-Japanese and Japanese-English pairs, we show how a basic T2S system that performs on par with phrase-based systems can be improved by 2.6-4.6 BLEU, greatly exceeding existing state-of-the-art methods. These results indicate that T2S systems indeed hold much promise, but the above-mentioned elements must be taken seriously in construction of these systems.

37 citations

Posted Content•

Incorporating Both Distributional and Relational Semantics in Word Representations

Daniel Fried¹, Kevin Duh²•Institutions (2)

University of Arizona¹, Nara Institute of Science and Technology²

14 Dec 2014-arXiv: Computation and Language

TL;DR: The authors investigate the hypothesis that word representations should incorporate both distributional and relational semantics, and employ the Alternating Direction Method of Multipliers (ADMM) to flexibly optimise a distributional objective on raw text and a relational objective on WordNet.

Abstract: We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a distributional objective on raw text and a relational objective on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can give improvements in some cases.

34 citations

Proceedings Article•DOI•

Improving Dependency Parsers with Supertags

Hiroki Ouchi¹, Kevin Duh¹, Yuji Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

01 Apr 2014

TL;DR: This paper develops two types of supertags that encode information about head position and dependency relations in different levels of granularity and proposes a transition-based dependency parser that incorporates the predictions from a CRF-based supertagger as new features.

Abstract: Transition-based dependency parsing systems can utilize rich feature representations. However, in practice, features are generally limited to combinations of lexical tokens and part-of-speech tags. In this paper, we investigate richer features based on supertags, which represent lexical templates extracted from dependency structure annotated corpus. First, we develop two types of supertags that encode information about head position and dependency relations in different levels of granularity. Then, we propose a transition-based dependency parser that incorporates the predictions from a CRF-based supertagger as new features. On standard English Penn Treebank corpus, we show that our supertag features achieve parsing improvements of 1.3% in unlabeled attachment, 2.07% root attachment, and 3.94% in complete tree accuracy.

22 citations

Proceedings Article•

Parsing Chinese Synthetic Words with a Character-based Dependency Model

Fei Cheng¹, Kevin Duh¹, Yuji Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

01 May 2014

TL;DR: The usefulness of incorporating large unlabelled corpora and a dictionary for this task is demonstrated, and two synthetic word parsers significantly outperform the baseline (a pipeline method).

Abstract: Synthetic word analysis is a potentially important but relatively unexplored problem in Chinese natural language processing. Two issues with the conventional pipeline methods involving word segmentation are (1) the lack of a common segmentation standard and (2) the poor segmentation performance on OOV words. These issues may be circumvented if we adopt the view of character-based parsing, providing both internal structures to synthetic words and global structure to sentences in a seamless fashion. However, the accuracy of synthetic word parsing is not yet satisfactory, due to the lack of research. In view of this, we propose and present experiments on several synthetic word parsers. Additionally, we demonstrate the usefulness of incorporating large unlabelled corpora and a dictionary for this task. Our parsers significantly outperform the baseline (a pipeline method).

8 citations

Proceedings Article•

Incorporating Both Distributional and Relational Semantics in Word Representations

Daniel Fried¹, Kevin Duh²•Institutions (2)

University of Arizona¹, Nara Institute of Science and Technology²

14 Dec 2014

TL;DR: The authors investigate the hypothesis that word representations should incorporate both distributional and relational semantics, and employ the Alternating Direction Method of Multipliers (ADMM) to flexibly optimise a distributional objective on raw text and a relational objective on WordNet.

Abstract: We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a distributional objective on raw text and a relational objective on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can give improvements in some cases.

8 citations

The NAIST-NTT TED talk treebank.

Graham Neubig, Katsuhito Sudoh, Yusuke Oda, Kevin Duh, Hajime Tsukada, Masaaki Nagata

01 Jan 2014

6 citations

Journal Article•DOI•

Evaluating Translation Quality with Word Order Correlations

Tsutomu Hirao, Hideki Isozaki, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata

01 Jan 2014

4 citations

Proceedings Article•DOI•

Identifying collocations using cross-lingual association measures

Lis Pereira, Elga Strafella, Kevin Duh, Yuji Matsumoto

01 Apr 2014

TL;DR: Introduces a simple and effective cross-lingual approach to identifying collocations, based on the observation that true collocations, which cannot be translated word for word, will exhibit very different association scores before and after literal translation.

Abstract: We introduce a simple and effective cross-lingual approach to identifying collocations. This approach is based on the observation that true collocations, which cannot be translated word for word, will exhibit very different association scores before and after literal translation. Our experiments in Japanese demonstrate that our cross-lingual association measure can successfully exploit the combination of bilingual dictionary and large monolingual corpora, outperforming monolingual association measures.

4 citations

Journal Article•DOI•

Creating Stories from Socially Curated Microblog Messages

Akisato Kimura¹, Kevin Duh², Tsutomu Hirao¹, Katsuhiko Ishiguro¹, Tomoharu Iwata¹, Albert Au Yeung•Institutions (2)

Nippon Telegraph and Telephone¹, Nara Institute of Science and Technology²

01 Jun 2014-IEICE Transactions on Information and Systems

TL;DR: An in-depth analysis of a large corpus of curated microblog data is performed and a novel method based on a learning-to-rank framework is proposed that increases the curator’s productivity and breadth of perspective by suggesting which novel microblogs should be added to the curated content.

Abstract: Social media such as microblogs have become so pervasive that it is now possible to use them as sensors for real-world events and memes. While much recent research has focused on developing automatic methods for filtering and summarizing these data streams, we explore a different trend called social curation. In contrast to automatic methods, social curation is characterized as a human-in-the-loop and sometimes crowd-sourced mechanism for exploiting social media as sensors. Although social curation web services like Togetter, Naver Matome and Storify are gaining popularity, little academic research has studied the phenomenon. In this paper, our goal is to investigate the phenomenon and potential of this new field of social curation. First, we perform an in-depth analysis of a large corpus of curated microblog data. We seek to understand why and how people participate in this laborious curation process. We then explore new ways in which information retrieval and machine learning technologies can be used to assist curators. In particular, we propose a novel method based on a learning-to-rank framework that increases the curator’s productivity and breadth of perspective by suggesting which novel microblogs should be added to the curated content.

2 citations

NTT-NAIST Syntax-based SMT Systems for IWSLT 2014

Katsuhito Sudoh, Graham Neubig, Kevin Duh, Katsuhiko Hayashi

01 Jan 2014

TL;DR: This paper presents NTT-NAIST SMT systems for the English-German and German-English MT tasks of the IWSLT 2014 evaluation campaign, based on generalized minimum Bayes risk system combination of three SMT systems using the forest-to-string, syntactic preordering, and phrase-based translation formalisms.

Abstract: This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2014 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems using the forest-to-string, syntactic preordering, and phrase-based translation formalisms. Individual systems employ training data selection for domain adaptation, truecasing, compound word splitting (for German-English), interpolated n-gram language models, and hypotheses rescoring using recurrent neural network language models.

Tree-to-String Machine Translation System

Graham Neubig, Kevin Duh

01 Jan 2014

TL;DR: It is shown how a basic T2S system that performs on par with phrase-based systems can be improved by 2.6-4.6 BLEU, greatly exceeding existing state-of-the-art methods.

Abstract: While tree-to-string (T2S) translation theoretically holds promise for efficient, accurate translation, in previous reports T2S systems have often proven inferior to other machine translation (MT) methods such as phrase-based or hierarchical phrase-based MT. In this paper, we attempt to clarify the reason for this performance gap by investigating a number of peripheral elements that affect the accuracy of T2S systems, including parsing, alignment, and search. Based on detailed experiments on the English-Japanese and Japanese-English pairs, we show how a basic T2S system that performs on par with phrase-based systems can be improved by 2.6-4.6 BLEU, greatly exceeding existing state-of-the-art methods. These results indicate that T2S systems indeed hold much promise, but the above-mentioned elements must be taken seriously in construction of these systems.

Proceedings Article•DOI•

Analysis and Prediction of Unalignable Words in Parallel Text

Frances Yung¹, Kevin Duh¹, Yuji Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

01 Apr 2014

TL;DR: A simple and effective method to improve automatic word alignment by pre-removing unalignable words is proposed, and improvements on hierarchical MT systems in both translation directions are shown.

Abstract: Professional human translators usually do not employ the concept of word alignments, producing translations ‘sense-for-sense’ instead of ‘word-for-word’. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in-depth for Chinese-English translation. We further propose a simple and effective method to improve automatic word alignment by pre-removing unalignable words, and show improvements on hierarchical MT systems in both translation directions.

1 Motivation

It is generally acknowledged that absolute equivalence between two languages is impossible, since concept lexicalization varies across languages. Major translation theories thus argue that texts should be translated ‘sense-for-sense’ instead of ‘word-for-word’ (Nida, 1964). This suggests that unalignable words may be an issue for the parallel text used to train current statistical machine translation (SMT) systems. Although existing automatic word alignment methods have some mechanism to handle the lack of exact word-for-word alignment (e.g. null probabilities, fertility in the IBM models (Brown et al., 1993)), they may be too coarse-grained to model the ‘sense-for-sense’ translations created by professional human translators.
