Open Access · Posted Content

A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora

TLDR
This paper presents a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs, and shows how the results can be used in the compilation of domain-specific noun phrases.
Abstract
We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.
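
The abstract does not spell out the two vector forms or the matching procedure, so the following is only an illustrative sketch of the general idea of matching words by where they occur in each text. It assumes, purely for illustration, that a word is represented by the gaps between its successive occurrence positions and that two such vectors are compared with a standard dynamic time warping cost. The function names `positions`, `gaps`, and `dtw_distance` are hypothetical and are not taken from the paper.

```python
from typing import List


def positions(tokens: List[str], word: str) -> List[int]:
    """Return the 0-based token offsets at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def gaps(pos: List[int]) -> List[int]:
    """Differences between consecutive occurrence positions (assumed vector form)."""
    return [b - a for a, b in zip(pos, pos[1:])]


def dtw_distance(x: List[int], y: List[int]) -> float:
    """Dynamic time warping cost, using absolute difference as the local cost.

    A lower cost means the two occurrence patterns are more alike. This is a
    generic DTW, swapped in here as a stand-in for the paper's pattern matching.
    """
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch y
                cost[i][j - 1],      # stretch x
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    # Toy token sequence, invented for this example.
    text = "the governor met the council and the governor spoke".split()
    pattern = gaps(positions(text, "governor"))
    print(pattern, dtw_distance(pattern, pattern))
```

In a setting like the one the abstract describes, candidate translation pairs would be ranked by such a cost, with the lowest-cost target word proposed as the translation. How noise elimination and anchor points enter that ranking is not covered by this sketch.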


Citations
Journal ArticleDOI

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. Key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.
Proceedings Article

BabelNet: Building a Very Large Multilingual Semantic Network

TL;DR: A very large, wide-coverage multilingual semantic network that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. Machine Translation is also applied to enrich the resource with lexical information for all languages.
Journal ArticleDOI

Translating collocations for bilingual lexicons: a statistical approach

TL;DR: A program named Champollion is described which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations, to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains.
Proceedings ArticleDOI

An IR Approach for Translating New Words from Nonparallel, Comparable Texts

Pascale Fung, +1 more
TL;DR: A new method is described that combines IR and NLP techniques to extract new word translations from automatically downloaded English-Chinese nonparallel newspaper texts.
Journal ArticleDOI

Models of translational equivalence among words

TL;DR: This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.
References
Proceedings ArticleDOI

A program for aligning sentences in bilingual corpora

TL;DR: This paper describes a method and a program for aligning sentences based on a simple statistical model of character lengths, which uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences.
Proceedings ArticleDOI

Aligning sentences in parallel corpora

TL;DR: This paper describes a statistical technique for aligning sentences with their translations in two parallel corpora and shows that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that it should be expected to achieve an accuracy of between 96% and 97%.
Journal Article

Text-translation alignment

TL;DR: An algorithm for aligning texts with their translations that is based only on internal evidence and appears to converge to the correct sentence alignment in only a few iterations is presented.
Proceedings ArticleDOI

An algorithm for finding noun phrase correspondences in bilingual corpora

TL;DR: The paper describes an algorithm that employs English and French text taggers to associate noun phrases in an aligned bilingual corpus and provides an alternative to other approaches for finding word correspondences, with the advantage that linguistic structure is incorporated.
Proceedings ArticleDOI

Aligning sentences in bilingual corpora using lexical information

TL;DR: A fast algorithm for aligning sentences with their translations in a bilingual corpus that constructs a simple statistical word-to-word translation model on the fly during alignment and finds the alignment that maximizes the probability of generating the corpus with this translation model.