A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora

Open Access, Posted Content

Pascale Fung

- 06 May 1995 -

arXiv: Computation and Language

TLDR

The paper presents a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs, and shows how the results can be used to compile domain-specific noun phrases.

Abstract:

We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.

Citations

Journal Article

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

Roberto Navigli, +1 more

- 01 Dec 2012 -

Artificial Intelligence

TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. Key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.

Proceedings Article

BabelNet: Building a Very Large Multilingual Semantic Network

Roberto Navigli, +1 more

TL;DR: A very large, wide-coverage multilingual semantic network that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. Machine Translation is also applied to enrich the resource with lexical information for all languages.

Journal Article

Translating collocations for bilingual lexicons: a statistical approach

Frank Smadja, +2 more

- 01 Mar 1996 -

Computational Linguistics

TL;DR: Describes Champollion, a program that, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. The goal is a tool for compiling bilingual lexical information above the word level in multiple languages and for different domains.

Proceedings Article

An IR Approach for Translating New Words from Nonparallel, Comparable Texts

Pascale Fung, +1 more

TL;DR: A new method that combines IR and NLP techniques to extract new word translations from automatically downloaded English-Chinese nonparallel newspaper texts is described.

Journal Article

Models of translational equivalence among words

I. Dan Melamed

- 01 Jun 2000 -

Computational Linguistics

TL;DR: This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.

References

Proceedings Article

A program for aligning sentences in bilingual corpora

William A. Gale, +1 more

TL;DR: This paper describes a method and a program for aligning sentences based on a simple statistical model of character lengths. The model uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences.

Proceedings Article

Aligning sentences in parallel corpora

Peter Fitzhugh Brown, +2 more

TL;DR: This paper describes a statistical technique for aligning sentences with their translations in two parallel corpora and shows that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that it should be expected to achieve an accuracy of between 96% and 97%.

Journal Article

Text-translation alignment

Martin Kay, +1 more

- 01 Mar 1993 -

Computational Linguistics

TL;DR: An algorithm for aligning texts with their translations that is based only on internal evidence and appears to converge to the correct sentence alignment in only a few iterations is presented.

Proceedings Article

An algorithm for finding noun phrase correspondences in bilingual corpora

Julian M. Kupiec

TL;DR: The paper describes an algorithm that employs English and French text taggers to associate noun phrases in an aligned bilingual corpus and provides an alternative to other approaches for finding word correspondences, with the advantage that linguistic structure is incorporated.

Proceedings Article

Aligning sentences in bilingual corpora using lexical information

Stanley F. Chen

TL;DR: A fast algorithm for aligning sentences with their translations in a bilingual corpus that constructs a simple statistical word-to-word translation model on the fly during alignment and finds the alignment that maximizes the probability of generating the corpus with this translation model.

Related Papers (5)

The mathematics of statistical machine translation: parameter estimation

Identifying word correspondence in parallel texts

Translating collocations for bilingual lexicons: a statistical approach

Aligning sentences in parallel corpora

An IR Approach for Translating New Words from Nonparallel, Comparable Texts