Open Access Proceedings Article

An Unsupervised Model for Joint Phrase Alignment and Extraction

Abstract
We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This yields a completely probabilistic model that can build a phrase table achieving competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
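
As an illustration of what "phrases of many granularities" can mean in an ITG setting, the sketch below builds a small derivation tree by hand and reads a phrase pair off every node, not only the leaves. It is a minimal sketch under stated assumptions, not the paper's model: the class names `Terminal` and `Node`, the helper functions, and the English-French example sentence are all hypothetical, and there is no probabilistic component here (no Bayesian prior, no sampling). It shows only the structural idea that a non-terminal node, combined in straight or inverted order, also spans a phrase pair.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Terminal:
    """Leaf of the derivation: a minimal phrase pair (hypothetical class)."""
    src: str
    tgt: str


@dataclass
class Node:
    """Binary ITG node; inverted=True swaps the target-side order."""
    left: "Tree"
    right: "Tree"
    inverted: bool = False


Tree = Union[Terminal, Node]


def yield_pair(tree: Tree) -> Tuple[str, str]:
    """Return the (source, target) strings spanned by this subtree."""
    if isinstance(tree, Terminal):
        return tree.src, tree.tgt
    left_src, left_tgt = yield_pair(tree.left)
    right_src, right_tgt = yield_pair(tree.right)
    src = f"{left_src} {right_src}"
    # Straight rule keeps target order; inverted rule reverses it.
    tgt = f"{right_tgt} {left_tgt}" if tree.inverted else f"{left_tgt} {right_tgt}"
    return src, tgt


def phrase_pairs(tree: Tree) -> List[Tuple[str, str]]:
    """Collect the phrase pair at every node, terminal or non-terminal."""
    pairs = [yield_pair(tree)]
    if isinstance(tree, Node):
        pairs += phrase_pairs(tree.left)
        pairs += phrase_pairs(tree.right)
    return pairs


# Hand-built derivation for "the red house" / "la maison rouge" (illustrative data).
example = Node(
    Terminal("the", "la"),
    Node(Terminal("red", "rouge"), Terminal("house", "maison"), inverted=True),
)

for src, tgt in phrase_pairs(example):
    print(f"{src} ||| {tgt}")
```

In this toy derivation the inverted node contributes the multi-word pair "red house ||| maison rouge" alongside the single-word pairs, which is the kind of larger unit a model that also memorizes non-terminal spans could place in a phrase table.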

