Open Access Proceedings Article
An Unsupervised Model for Joint Phrase Alignment and Extraction
Graham Neubig, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, Tatsuya Kawahara
- pp 632-641
TLDR
An unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs) is presented, which matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
Abstract:
We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
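For context, the "traditional two-step" baseline named in the abstract first computes word alignments and then heuristically extracts every phrase pair consistent with them. The Python sketch below illustrates only that second step; it is not the joint model proposed in the paper. The function name `extract_phrases`, the `max_len` limit and the toy sentence pair are illustrative assumptions, and the sketch omits the usual extension of phrases over unaligned words.

```python
from typing import List, Set, Tuple


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Set[Tuple[int, int]],
                    max_len: int = 4) -> Set[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment (simplified).

    `alignment` holds (source_index, target_index) links. A pair of spans is
    consistent if no link connects a word inside one span to a word outside
    the other. Hypothetical helper written for illustration only.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject the span if a target word inside [j1, j2] is linked
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]),
                       " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example (assumed data): "red" links to "rouge", "house" to "maison".
example = extract_phrases(
    ["the", "red", "house"],
    ["la", "maison", "rouge"],
    {(0, 0), (1, 2), (2, 1)},
)
```

In this toy example the pair "red house" / "maison rouge" is extracted, while "the red" yields no pair because its target span would also cover "maison", which is linked to "house". Because every consistent pair at every length is kept, heuristic extraction tends to produce large phrase tables, which is the size the abstract reports reducing.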
Citations
Proceedings Article
Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T)
Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
TL;DR: SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort.
Proceedings Article
Grammatical error correction using neural machine translation
Zheng Yuan, Ted Briscoe
TL;DR: This paper presents the first study using neural machine translation (NMT) for grammatical error correction (GEC), with a two-step approach to handle the rare word problem in NMT that proves useful and effective for the GEC task.
Proceedings Article
An attentional model for speech translation without transcription
TL;DR: On the more challenging speech-to-word alignment task, the model nearly matches GIZA++’s performance on gold transcriptions, but without recourse to transcriptions or to a lexicon.
Proceedings Article
Grammatical error correction using hybrid systems and type filtering
TL;DR: This research highlights the need to understand the role of language education in the development of bilingualism and the role that language education can play in this process.
Proceedings Article
Artificial Error Generation with Machine Translation and Syntactic Patterns
TL;DR: Two alternative methods for artificially generating writing errors are investigated, treating error generation as a machine translation task, and a system for extracting textual patterns from an annotated corpus, which can be used to insert errors into grammatically correct sentences.
References
Proceedings Article
Moses: Open Source Toolkit for Statistical Machine Translation
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, C. Corbett Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Elena Constantin, Evan Herbst
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Journal Article
The mathematics of statistical machine translation: parameter estimation
TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.
Proceedings Article
Statistical phrase-based translation
TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.
Journal Article
The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
Jim Pitman, Marc Yor
TL;DR: The two-parameter Poisson-Dirichlet distribution generalizes the usual Poisson-Dirichlet distribution with a single parameter; its size-biased random permutation is a residual allocation model that was introduced by Engen in the context of species diversity and rediscovered by Perman and the authors in the study of excursions of Bessel processes.
Journal Article
Hierarchical Phrase-Based Translation
TL;DR: A statistical machine translation model is presented that uses hierarchical phrases, that is, phrases that contain subphrases; the model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations.