An Unsupervised Model for Joint Phrase Alignment and Extraction

Open Access Proceedings Article

Graham Neubig, +4 more

- pp 632-641

TLDR

An unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs) is presented. It matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.

Abstract:

We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This allows for a completely probabilistic model that can create a phrase table achieving competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
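To make the structural idea in the abstract concrete, the following is a minimal sketch of an ITG-style derivation tree in which every node, terminal or non-terminal, spans a phrase pair. It is not the paper's model: there are no probabilities, no Bayesian priors, and no sampling here. The names `Terminal`, `Node`, `yield_pair`, and `phrase_pairs`, as well as the toy English/French example, are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Terminal:
    """A leaf that directly emits a source/target phrase pair."""
    src: str
    tgt: str


@dataclass
class Node:
    """A binary ITG node: straight keeps target order, inverted swaps it."""
    left: "Tree"
    right: "Tree"
    inverted: bool


Tree = Union[Terminal, Node]


def yield_pair(tree: Tree) -> Tuple[List[str], List[str]]:
    """Return the source and target word sequences spanned by a (sub)tree."""
    if isinstance(tree, Terminal):
        return tree.src.split(), tree.tgt.split()
    left_src, left_tgt = yield_pair(tree.left)
    right_src, right_tgt = yield_pair(tree.right)
    src = left_src + right_src
    # An inverted node reverses the order of its children on the target side.
    tgt = right_tgt + left_tgt if tree.inverted else left_tgt + right_tgt
    return src, tgt


def phrase_pairs(tree: Tree) -> List[Tuple[str, str]]:
    """Collect the phrase pair spanned by every node, not only the leaves."""
    src, tgt = yield_pair(tree)
    pairs = [(" ".join(src), " ".join(tgt))]
    if isinstance(tree, Node):
        pairs += phrase_pairs(tree.left) + phrase_pairs(tree.right)
    return pairs


# Toy example (hypothetical data): "the red house" / "la maison rouge".
tree = Node(
    left=Terminal("the", "la"),
    right=Node(Terminal("red", "rouge"), Terminal("house", "maison"), inverted=True),
    inverted=False,
)

for pair in phrase_pairs(tree):
    print(pair)
```

In this sketch the pair ("red house", "maison rouge") comes from a non-terminal node, which illustrates how phrases of several granularities can be read off a single derivation rather than from the leaves alone.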

Citations

Proceedings Article

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T)

Yusuke Oda, +6 more

TL;DR: SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort.

Proceedings Article

Grammatical error correction using neural machine translation

Zheng Yuan, +1 more

TL;DR: This paper presents the first study using neural machine translation (NMT) for grammatical error correction (GEC), with a two-step approach to handle the rare word problem in NMT, which has proved useful and effective for the GEC task.

Proceedings Article

An attentional model for speech translation without transcription

Long Duong, +5 more

TL;DR: On the more challenging speech-to-word alignment task, the model nearly matches GIZA++’s performance on gold transcriptions, but without recourse to transcriptions or to a lexicon.

Proceedings Article

Grammatical error correction using hybrid systems and type filtering

Mariano Felice, +4 more

TL;DR: This research highlights the need to understand the role of language education in the development of bilingualism and the role that language education can play in this process.

Proceedings Article

Artificial Error Generation with Machine Translation and Syntactic Patterns

Marek Rei, +3 more

TL;DR: Two alternative methods for artificially generating writing errors are investigated, treating error generation as a machine translation task, and a system for extracting textual patterns from an annotated corpus, which can be used to insert errors into grammatically correct sentences.

References

Journal Article

The mathematics of statistical machine translation: parameter estimation

Peter Fitzhugh Brown, +3 more

- 01 Jun 1993 -

Computational Linguistics

TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.

Proceedings Article

Statistical phrase-based translation

Philipp Koehn, +2 more

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.

Journal Article

The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator

Jim Pitman, +1 more

- 01 Apr 1997 -

Annals of Probability

TL;DR: The size-biased random permutation of the two-parameter Poisson-Dirichlet distribution is a residual allocation model that was introduced by Engen in the context of species diversity and rediscovered by Perman and the authors in the study of excursions of Bessel processes.

Journal ArticleDOI

Hierarchical Phrase-Based Translation

David Chiang

- 01 Jun 2007 -

Computational Linguistics

TL;DR: A statistical machine translation model that uses hierarchical phrases (phrases that contain subphrases) is presented. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations.


Moses: Open Source Toolkit for Statistical Machine Translation
