Book Chapter

Robust Bilingual Word Alignment for Machine Aided Translation

Ido Dagan, +2 more
- pp 209-224
Abstract
We have developed a new program called word_align for aligning parallel text, text such as the Canadian Hansards that are available in two or more languages. The program takes the output of char_align (Church, 1993), a robust alternative to sentence-based alignment programs, and applies word-level constraints using a version of Brown et al.’s Model 2 (Brown et al., 1993), modified and extended to deal with robustness issues. Word_align was tested on a subset of Canadian Hansards supplied by Simard (Simard et al., 1992). The combination of word_align plus char_align reduces the variance (average square error) by a factor of 5 over char_align alone. More importantly, because word_align and char_align were designed to work robustly on texts that are smaller and more noisy than the Hansards, it has been possible to successfully deploy the programs at AT&T Language Line Services, a commercial translation service, to help them with difficult terminology.
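The abstract equates "variance" with average square error. As a minimal sketch of how that figure and the "factor of 5" comparison could be computed, the snippet below assumes alignment quality is measured as the squared difference between predicted and reference target positions. The function name `average_square_error` and all position values are hypothetical, invented for illustration, and are not taken from the paper.

```python
from typing import Sequence


def average_square_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean of squared differences between predicted and reference positions."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


# Toy positions (assumed, not data from the paper).
reference = [10, 20, 30, 40]   # hand-checked target positions
coarse = [15, 15, 35, 45]      # a rough alignment: every offset is 5 units off
refined = [11, 23, 29, 43]     # a tighter alignment: offsets of 1, 3, -1, 3

coarse_error = average_square_error(coarse, reference)    # 25.0
refined_error = average_square_error(refined, reference)  # 5.0
reduction_factor = coarse_error / refined_error           # 5.0

print(coarse_error, refined_error, reduction_factor)
```

With these toy numbers the refined alignment reduces the average square error by a factor of 5, which is the kind of comparison the abstract reports between char_align alone and word_align plus char_align.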


Citations
Journal Article

A systematic comparison of various statistical alignment models

TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Journal Article

Stochastic inversion transduction grammars and bilingual parsing of parallel corpora

TL;DR: A novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and the concept of bilingual parsing with a variety of parallel corpus analysis applications are introduced.
Proceedings Article

HMM-based word alignment in statistical translation

TL;DR: A new model for word alignment in statistical translation that applies a first-order hidden Markov model to the word alignment problem, in the same way such models are used successfully in speech recognition for the time alignment problem.
Proceedings Article

Automatic Identification of Word Translations from Unrelated English and German Corpora

TL;DR: The current study, based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, reports a significant improvement, with about 72% of word translations identified correctly.
Journal Article

Translating collocations for bilingual lexicons: a statistical approach

TL;DR: A program named Champollion is described which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations, to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains.
References
Journal Article

The mathematics of statistical machine translation: parameter estimation

TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.
Journal Article

A statistical approach to machine translation

TL;DR: The application of the statistical approach to translation from French to English is described, and preliminary results are given.