Author

Steve Finch

Bio: Steve Finch is an academic researcher from West. The author has an h-index of 2 and has co-authored 2 publications receiving 123 citations.

Papers

Report

A statistical word-level translation model for comparable corpora

Mona Diab¹, Steve Finch²

University of Maryland, College Park¹, West²

12 Apr 2000

TL;DR: The preliminary results are >92% accurate, suggesting the feasibility of the model, but the model needs improvement and should be tested cross-linguistically before its significance is assessed.

Abstract: In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed model is described. A preliminary investigation on intralanguage comparable corpora is laid out. The preliminary results are >92% accurate, suggesting the feasibility of the model. The model needs to undergo some improvements and should be tested cross linguistically before assessing its significance.

114 citations

A statistical translation model using comparable corpora.

Mona Diab, Steve Finch

01 Jan 2000

9 citations

Cited by

Book

Statistical Machine Translation

Philipp Koehn

18 Jan 2010

TL;DR: This introductory text to statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish, and the companion website provides open-source corpora and tool-kits.

Abstract: This introductory text to statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish. In general, statistical techniques allow automatic translation systems to be built quickly for any language-pair using only translated texts and generic software. With increasing globalization, statistical machine translation will be central to communication and commerce. Based on courses and tutorials, and classroom-tested globally, it is ideal for instruction or self-study, for advanced undergraduates and graduate students in computer science and/or computational linguistics, and researchers in natural language processing. The companion website provides open-source corpora and tool-kits.

1,538 citations

Proceedings Article

A Syntax-based Statistical Translation Model

Kenji Yamada¹, Kevin Knight¹

University of Southern California¹

06 Jul 2001

TL;DR: This model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node, and produces word alignments that are better than those produced by IBM Model 5.

Abstract: We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.

924 citations

Journal Article

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora

Dragos Stefan Munteanu¹, Daniel Marcu¹

University of Southern California¹

01 Dec 2005, Computational Linguistics

TL;DR: A maximum entropy classifier is trained that, given a pair of sentences, can reliably determine whether or not they are translations of each other and can be applied with great benefit to language pairs for which only scarce resources are available.

Abstract: We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.

471 citations

Proceedings Article

An Unsupervised Method for Word Sense Tagging using Parallel Corpora

Mona Diab¹, Philip Resnik¹

University of Maryland, College Park¹

06 Jul 2002

TL;DR: An unsupervised method for word sense disambiguation that exploits translation correspondences in parallel corpora is presented, using pseudo-translations, created by machine translation systems, in order to make possible the evaluation of the approach against a standard test set.

Abstract: We present an unsupervised method for word sense disambiguation that exploits translation correspondences in parallel corpora. The technique takes advantage of the fact that cross-language lexicalizations of the same concept tend to be consistent, preserving some core element of its semantics, and yet also variable, reflecting differing translator preferences and the influence of context. Working with parallel corpora introduces an extra complication for evaluation, since it is difficult to find a corpus that is both sense tagged and parallel with another language; therefore we use pseudo-translations, created by machine translation systems, in order to make possible the evaluation of the approach against a standard test set. The results demonstrate that word-level translation correspondences are a valuable source of information for sense disambiguation.

250 citations

Proceedings Article

Learning a Translation Lexicon from Monolingual Corpora

Philipp Koehn¹, Kevin Knight¹

University of Southern California¹

12 Jul 2002

TL;DR: This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora and combines various clues such as cognates, similar context, preservation of word similarity, and word frequency to create a German-English noun lexicon.

Abstract: This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora. We combine various clues such as cognates, similar context, preservation of word similarity, and word frequency. Experimental results for the construction of a German-English noun lexicon are reported. Noun translation accuracy of 39% scored against a parallel test corpus could be achieved.

245 citations
