Showing papers by "Kevin Duh" published in 2006


Journal Article
TL;DR: Four different approaches to morphology-based language modeling are presented, including a novel technique called factored language models, and results are presented for both rescoring and first-pass recognition experiments.

120 citations


Proceedings Article
04 Jun 2006
TL;DR: This work develops dependency parsers for Arabic, English, Chinese, and Czech using Bayes Point Machines, a training algorithm which is as easy to implement as the perceptron yet competitive with large margin methods.
Abstract: We develop dependency parsers for Arabic, English, Chinese, and Czech using Bayes Point Machines, a training algorithm which is as easy to implement as the perceptron yet competitive with large margin methods. We achieve results comparable to state-of-the-art in English and Czech, and report the first directed dependency parsing accuracies for Arabic and Chinese. Given the multilingual nature of our experiments, we discuss some issues regarding the comparison of dependency parsers for different languages.

26 citations


Proceedings Article
22 Jul 2006
TL;DR: It is demonstrated that lexicon learning is an important task in resource-poor domains and leads to significant improvements in tagging accuracy for dialectal Arabic.
Abstract: We investigate the problem of learning a part-of-speech (POS) lexicon for a resource-poor language, dialectal Arabic. Developing a high-quality lexicon is often the first step towards building a POS tagger, which is in turn the front-end to many NLP systems. We frame the lexicon acquisition problem as a transductive learning problem, and perform comparisons on three transductive algorithms: Transductive SVMs, Spectral Graph Transducers, and a novel Transductive Clustering method. We demonstrate that lexicon learning is an important task in resource-poor domains and leads to significant improvements in tagging accuracy for dialectal Arabic.

17 citations


01 Jan 2006
TL;DR: This paper presents a multi-pass statistical phrase-based machine translation system for the Italian-English open-data track, focusing on heterogeneous data sources for training translation and language models, several novel rescoring features in the second pass, and N-best information for translation in the ASR-output condition.
Abstract: This paper describes the University of Washington’s submission to the IWSLT 2006 evaluation campaign. We present a multi-pass statistical phrase-based machine translation system for the Italian-English open-data track. The focus of our work was on the use of heterogeneous data sources for training translation and language models, the use of several novel rescoring features in the second pass, and exploiting N-best information for translation in the ASR-output condition. Results show mixed benefits of adding out-of-domain data and using N-best information and demonstrate improvements for some of the novel rescoring features.

3 citations


01 Jan 2006
TL;DR: The problem of learning a part-of-speech (POS) lexicon for resource-poor languages is investigated, and it is demonstrated that lexicon learning is an important task and leads to significant improvements in tagging accuracy.
Abstract: We investigate the problem of learning a part-of-speech (POS) lexicon for resource-poor languages. Developing a high-quality lexicon is often the first step towards building a POS tagger, which is in turn the front-end to many NLP systems. We frame the lexicon acquisition problem as a transductive learning problem, and perform comparisons on three transductive algorithms: Transductive SVMs, Spectral Graph Transducers, and a novel Transductive Clustering method. We test on two datasets: dialectal Arabic (a resource-poor language) and Wall Street Journal with artificially limited training data. For dialectal Arabic, we demonstrate that lexicon learning is an important task and leads to significant improvements in tagging accuracy. For Wall Street Journal, we observe that transductive learning does not necessarily lead to improvements in lexicon accuracy and present some preliminary analyses of results.