
Showing papers on "Rule-based machine translation published in 2006"


Journal ArticleDOI
TL;DR: A new (proportional) 2-tuple fuzzy linguistic representation model for computing with words (CW), based on the concept of "symbolic proportion," which allows the initial linguistic information to be described by members of a "continuous" linguistic scale domain without requiring the ordered terms of a linguistic variable to be equidistant.
Abstract: In this paper, we provide a new (proportional) 2-tuple fuzzy linguistic representation model for computing with words (CW), which is based on the concept of "symbolic proportion." This concept motivates us to represent the linguistic information by means of 2-tuples, which are composed of two proportional linguistic terms. For clarity and generality, we first study proportional 2-tuples under ordinal contexts. Then, under linguistic contexts and based on canonical characteristic values (CCVs) of linguistic labels, we define many aggregation operators to handle proportional 2-tuple linguistic information in a computational stage for CW without any loss of information. Our approach for this proportional 2-tuple fuzzy linguistic representation model deals with linguistic labels which do not have to be symmetrically distributed around a medium label and which are free of the traditional requirement of "equal distance" between them. Moreover, this new model not only provides a space to allow a "continuous" interpolation of a sequence of ordered linguistic labels, but also provides an opportunity to describe the initial linguistic information by members of a "continuous" linguistic scale domain, which does not necessarily require the ordered linguistic terms of a linguistic variable to be equidistant. Meanwhile, under the assumption of equal informativeness (defined by a condition based on the concept of CCV), we show that our model reduces to Herrera and Martínez's (translational) 2-tuple fuzzy linguistic representation model.
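
A minimal sketch of the core idea, with notation assumed here rather than taken verbatim from the paper: a proportional 2-tuple pairs two consecutive linguistic labels with complementary proportions, and aggregation can then be carried out through the labels' canonical characteristic values (CCVs),

\[ (\alpha L_i,\ (1-\alpha) L_{i+1}), \qquad \alpha \in [0,1], \]
\[ \mathrm{CCV}\big(\alpha L_i,\ (1-\alpha) L_{i+1}\big) \;=\; \alpha\,\mathrm{CCV}(L_i) + (1-\alpha)\,\mathrm{CCV}(L_{i+1}), \]

so the labels need be neither symmetric around a middle term nor equidistant for the arithmetic to go through.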

467 citations


Proceedings ArticleDOI
08 Jun 2006
TL;DR: This work uses a target language parser to generate parse trees for each sentence on the target side of the bilingual training corpus, matching them with phrase table lattices built for the corresponding source sentence.
Abstract: We present translation results on the shared task "Exploiting Parallel Texts for Statistical Machine Translation" generated by a chart parsing decoder operating on phrase tables augmented and generalized with target language syntactic categories. We use a target language parser to generate parse trees for each sentence on the target side of the bilingual training corpus, matching them with phrase table lattices built for the corresponding source sentence. Considering phrases that correspond to syntactic categories in the parse trees, we develop techniques to augment (declare a syntactically motivated category for a phrase pair) and generalize (form mixed terminal and nonterminal phrases) the phrase table into a synchronous bilingual grammar. We present results on the French-to-English task for this workshop, representing significant improvements over the workshop's baseline system. Our translation system is available open-source under the GNU General Public License.
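
As a rough illustration of the "augment" step described above (the span representation, the fallback label, and the function name are assumptions for this sketch, not the authors' code), a phrase pair can be labelled with the constituent category that exactly covers its target-side span, falling back to a generic label otherwise:

    # Sketch: attach a target-side syntactic category to a phrase pair.
    # parse_spans maps (start, end) spans of the target sentence to
    # constituent labels, e.g. {(0, 2): "NP", (2, 5): "VP", (0, 5): "S"}.
    def augment_phrase_pair(src_phrase, tgt_span, parse_spans, default="X"):
        """Return (category, src_phrase), using the constituent label that
        exactly covers tgt_span, or a generic label if none does."""
        return parse_spans.get(tgt_span, default), src_phrase

    print(augment_phrase_pair("la maison", (0, 2), {(0, 2): "NP", (2, 5): "VP"}))
    # -> ('NP', 'la maison'); an unmatched span would come back as ('X', ...)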

347 citations


Proceedings Article
01 Apr 2006
TL;DR: An approach for extracting relations between entities from biomedical literature based solely on shallow linguistic information is proposed, which outperforms most of the previous methods based on syntactic and semantic information.
Abstract: We propose an approach for extracting relations between entities from biomedical literature based solely on shallow linguistic information. We use a combination of kernel functions to integrate two different information sources: (i) the whole sentence where the relation appears, and (ii) the local contexts around the interacting entities. We performed experiments on extracting gene and protein interactions from two different data sets. The results show that our approach outperforms most of the previous methods based on syntactic and semantic information.
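
A minimal sketch of this kind of kernel combination (the n-gram features, window size, and function names are assumptions, not the paper's exact kernels): one bag-of-n-gram kernel over the whole sentences and one over small windows around the interacting entities, summed into a single kernel.

    from collections import Counter

    def ngram_counts(tokens, n=2):
        """Bag of word n-grams (orders 1..n) for a token sequence."""
        feats = Counter()
        for k in range(1, n + 1):
            for i in range(len(tokens) - k + 1):
                feats[tuple(tokens[i:i + k])] += 1
        return feats

    def dot(c1, c2):
        return sum(v * c2.get(f, 0) for f, v in c1.items())

    def combined_kernel(sent1, ents1, sent2, ents2, window=3):
        """K = K_global (whole sentences) + K_local (windows around entities);
        ents1/ents2 are token positions of the candidate entity mentions."""
        k_global = dot(ngram_counts(sent1), ngram_counts(sent2))
        loc1 = [t for i in ents1 for t in sent1[max(0, i - window):i + window + 1]]
        loc2 = [t for i in ents2 for t in sent2[max(0, i - window):i + window + 1]]
        k_local = dot(ngram_counts(loc1), ngram_counts(loc2))
        return k_global + k_local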

328 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: It is shown that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and that it shows better robustness to variations in task complexity and word order.
Abstract: We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.

306 citations


Proceedings Article
04 Dec 2006
TL;DR: This paper presents a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrates how several existing nonparametric Bayesian models can be expressed within this framework.
Abstract: This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
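
The Pitman-Yor adaptor can be pictured as a cache over previously generated items; a minimal sketch of the reuse-versus-regenerate step (using the usual discount d and concentration a, with a base_sample callback standing in for the underlying PCFG; none of this is the paper's code):

    import random

    def pitman_yor_draw(cache, d, a, base_sample):
        """cache: list of (item, count) pairs for previously generated items.
        Reuses a cached item with probability (count - d)/(n + a), otherwise
        draws a fresh item from the base distribution."""
        n = sum(c for _, c in cache)
        r = random.uniform(0, n + a)
        for i, (item, c) in enumerate(cache):
            r -= c - d
            if r < 0:
                cache[i] = (item, c + 1)
                return item
        item = base_sample()          # prob. (a + d * len(cache)) / (n + a)
        cache.append((item, 1))
        return item

The rich-get-richer behaviour of this cache is what lets frequently reused subtrees behave like memorized units.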

292 citations


Journal ArticleDOI
TL;DR: This article describes in detail an n-gram approach to statistical machine translation that consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions.
Abstract: This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which is at the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS).
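
In schematic form (symbols chosen here for illustration, not copied from the article), the decoder selects the target sentence maximizing a log-linear combination of feature functions, where the translation model itself is an n-gram model over the sentence pair's sequence of bilingual tuples (s, t)_k:

\[ \hat{t} \;=\; \arg\max_{t}\; \sum_{m} \lambda_m\, h_m(s, t), \qquad h_{\mathrm{TM}}(s, t) \;=\; \log \prod_{k=1}^{K} p\big((s,t)_k \,\big|\, (s,t)_{k-n+1}, \ldots, (s,t)_{k-1}\big). \]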

285 citations


Proceedings ArticleDOI
22 Oct 2006
TL;DR: This work proposes a generative solution based on a DSL called TCS (Textual Concrete Syntax), which is used to automatically generate tools for model-to-text and text-to-model transformations.
Abstract: Domain modeling promotes the description of various facets of information systems by a coordinated set of domain-specific languages (DSL). Some of them have visual/graphical concrete syntaxes, while others have textual ones. Model Driven Engineering (MDE) helps define the concepts and relations of the domain by way of metamodel elements. For visual languages, it is necessary to establish links between these concepts and relations on one side and visual symbols on the other side. Similarly, with textual languages it is necessary to establish links between metamodel elements and syntactic structures of the textual DSL. To successfully apply MDE in a wide range of domains, we need tools for fast implementation of the expected growing number of DSLs. Regarding the textual syntax of DSLs, we believe that most current proposals for bridging the world of models (MDE) and the world of grammars (Grammarware) are not completely adapted to this need. We propose a generative solution based on a DSL called TCS (Textual Concrete Syntax). Specifications expressed in TCS are used to automatically generate tools for model-to-text and text-to-model transformations. The proposed approach is illustrated by a case study in the definition of a telephony language.

270 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: A novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora by analyzing potentially similar sentence pairs using a signal processing-inspired approach, which enables it to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs.
Abstract: We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
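
A toy sketch of the signal-processing flavour of the method (the lexicon format, scoring threshold, and filter width are assumptions): score each source word positively if a probabilistic lexicon links it to some word in the target sentence and negatively otherwise, smooth the resulting signal with a small averaging filter, and keep maximal runs where the smoothed signal stays positive.

    def extract_fragments(src_tokens, tgt_tokens, lexicon, width=2):
        """lexicon: dict mapping (src_word, tgt_word) -> translation probability."""
        # +1 where a source word has a plausible translation in the target
        # sentence, -1 where it does not.
        signal = [1.0 if any(lexicon.get((s, t), 0.0) > 0.1 for t in tgt_tokens)
                  else -1.0 for s in src_tokens]
        # Moving average, so isolated hits or misses do not start or break a run.
        smooth = [sum(signal[max(0, i - width):i + width + 1]) /
                  len(signal[max(0, i - width):i + width + 1])
                  for i in range(len(signal))]
        # Maximal runs of source words whose smoothed score stays positive.
        fragments, current = [], []
        for tok, score in zip(src_tokens, smooth):
            if score > 0:
                current.append(tok)
            elif current:
                fragments.append(current)
                current = []
        if current:
            fragments.append(current)
        return fragments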

207 citations


Proceedings Article
01 Apr 2006
TL;DR: A novel method for computing a consensus translation from the outputs of multiple machine translation (MT) systems by voting on a confusion network that produces pairwise word alignments of the original machine translation hypotheses with an enhanced statistical alignment algorithm that explicitly models word reordering.
Abstract: This paper describes a novel method for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The outputs are combined and a possibly new translation hypothesis can be generated. Similarly to the well-established ROVER approach of (Fiscus, 1997) for combining speech recognition hypotheses, the consensus translation is computed by voting on a confusion network. To create the confusion network, we produce pairwise word alignments of the original machine translation hypotheses with an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole document of translations rather than a single sentence is taken into account to produce the alignment. The proposed alignment and voting approach was evaluated on several machine translation tasks, including a large vocabulary task. The method was also tested in the framework of multi-source and speech translation. On all tasks and conditions, we achieved significant improvements in translation quality, increasing e.g. the BLEU score by as much as 15% relative.
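
Once the pairwise alignments have been computed (the hard part, not reproduced here), the voting step itself is simple. A sketch assuming the hypotheses have already been aligned into confusion-network columns, with None standing for an empty (epsilon) arc:

    from collections import Counter

    def vote(columns):
        """columns: one list per aligned position, each holding the word
        (or None) contributed by every system hypothesis at that position."""
        consensus = []
        for slot in columns:
            word, _ = Counter(slot).most_common(1)[0]
            if word is not None:          # an epsilon majority drops the slot
                consensus.append(word)
        return consensus

    # Three aligned hypotheses for a toy example:
    cols = [["the", "the", "a"], ["house", "home", "house"], [None, "is", None]]
    print(vote(cols))                     # ['the', 'house']

A weighted variant could scale each hypothesis's vote by a per-system weight rather than counting all systems equally.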

193 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: A linear-time algorithm for factoring syntactic re-orderings by binarizing synchronous rules when possible is devised and it is shown that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
Abstract: Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages, and rules extracted from parallel corpora can be quite large. We devise a linear-time algorithm for factoring syntactic re-orderings by binarizing synchronous rules when possible and show that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
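
The heart of such factoring can be illustrated with a shift-reduce-style check on the permutation of a rule's nonterminals (a simplified sketch, not the authors' full linear-time algorithm): push target-side positions onto a stack and merge the top two spans whenever they form a contiguous interval; the rule can be binarized exactly when everything collapses into a single span.

    def binarizable(perm):
        """perm[i] = target-side position of the i-th source-side nonterminal.
        Spans are kept as (lo, hi) intervals over target positions."""
        stack = []
        for p in perm:
            stack.append((p, p))
            # Merge while the two topmost spans form one contiguous interval.
            while len(stack) >= 2:
                (lo2, hi2), (lo1, hi1) = stack[-2], stack[-1]
                lo, hi = min(lo1, lo2), max(hi1, hi2)
                if hi - lo + 1 == (hi1 - lo1 + 1) + (hi2 - lo2 + 1):
                    stack[-2:] = [(lo, hi)]
                else:
                    break
        return len(stack) == 1

    print(binarizable([3, 1, 2]))     # True: can be factored into binary rules
    print(binarizable([2, 4, 1, 3]))  # False: inherently non-binarizable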

162 citations


Patent
12 Dec 2006
TL;DR: In this article, a Hybrid Distributed Network Language Translation (HDNLT) system is described, where a distributed network of human and machine translators communicate electronically and provide for the translation of material in source language.
Abstract: A Hybrid Distributed Network Language Translation (HDNLT) system is disclosed, having a distributed network of human and machine translators that communicate electronically and provide for the translation of material in a source language. Individual translators receive a reputation that reflects their translation competency, reliability and accuracy. An individual translator's reputation is adjusted dynamically with feedback from other translators and/or comparison of their translation results to translations from those with known high reputation and to the final translation results. Additionally, translations are produced statistically, first by breaking the input source text into fragments, then sending each fragment redundantly to a number of translators with varying levels of reputation. The results of these translations are assembled taking into account (giving weight to) the reputation of each translator, the statistical properties of the translation results, the statistical correlation of preferred results to target language fragments, the properties of the particular language and other relevant factors.
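
As a toy illustration of the reputation-weighted assembly described in the claims (the data structures and numbers are hypothetical, not taken from the patent), candidate translations of a fragment can be scored by summing the reputations of the translators who produced them:

    from collections import defaultdict

    def assemble_fragment(candidates):
        """candidates: list of (translation_text, translator_reputation) pairs
        returned for one source-text fragment."""
        scores = defaultdict(float)
        for text, reputation in candidates:
            scores[text] += reputation
        return max(scores, key=scores.get)

    print(assemble_fragment([("the red house", 0.9),
                             ("the house red", 0.3),
                             ("the red house", 0.6)]))   # 'the red house'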

Proceedings ArticleDOI
17 Jul 2006
TL;DR: This work proposes to use a new statistical language model that is based on a continuous representation of the words in the vocabulary, which achieves consistent improvements in the BLEU score on the development and test data.
Abstract: Statistical machine translation systems are based on one or more translation models and a language model of the target language. While many different translation models and phrase extraction algorithms have been proposed, a standard word n-gram back-off language model is used in most systems. In this work, we propose to use a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network is used to perform the projection and the probability estimation. We consider the translation of European Parliament Speeches. This task is part of an international evaluation organized by the TC-STAR project in 2006. The proposed method achieves consistent improvements in the BLEU score on the development and test data. We also present algorithms to improve the estimation of the language model probabilities when splitting long sentences into shorter chunks.
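
A minimal numpy sketch of a continuous-space language model in this general family (layer sizes, initialization, and the forward pass are assumptions; the paper's actual network and training procedure are not reproduced): the n-1 context words are mapped to continuous vectors, projected through a hidden layer, and a softmax yields the probability of the next word.

    import numpy as np

    rng = np.random.default_rng(0)
    V, dim, hidden, n = 1000, 50, 100, 4        # vocabulary, embedding, hidden, order

    C = rng.normal(0, 0.1, (V, dim))            # continuous word representations
    H = rng.normal(0, 0.1, ((n - 1) * dim, hidden))
    U = rng.normal(0, 0.1, (hidden, V))

    def next_word_probs(context_ids):
        """context_ids: the n-1 preceding word ids."""
        x = np.concatenate([C[i] for i in context_ids])   # projection layer
        h = np.tanh(x @ H)                                # hidden layer
        logits = h @ U
        e = np.exp(logits - logits.max())
        return e / e.sum()                                # softmax over the vocabulary

    p = next_word_probs([12, 7, 430])
    print(p.shape, float(p[42]))                          # P(word 42 | context)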

Proceedings ArticleDOI
Yaser Al-Onaizan1, Kishore Papineni1
17 Jul 2006
TL;DR: A new distortion model is proposed that can be used with existing phrase-based SMT decoders to address n-gram language model limitations and a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments is proposed.
Abstract: In this paper, we argue that n-gram language models are not sufficient to address word reordering required for Machine Translation. We propose a new distortion model that can be used with existing phrase-based SMT decoders to address those n-gram language model limitations. We present empirical results in Arabic to English Machine Translation that show statistically significant improvements when our proposed model is used. We also propose a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments.
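
The abstract does not spell the metric out; as one plausible, clearly hypothetical instantiation of an alignment-based word-order measure, a Kendall-tau-style score over aligned word positions can be computed as follows:

    from itertools import combinations

    def word_order_similarity(alignment):
        """alignment: list of (src_pos, tgt_pos) links, one-to-one for simplicity.
        Returns the fraction of concordant position pairs (1.0 = monotone order)."""
        pairs = list(combinations(alignment, 2))
        if not pairs:
            return 1.0
        concordant = sum(1 for (s1, t1), (s2, t2) in pairs
                         if (s1 - s2) * (t1 - t2) > 0)
        return concordant / len(pairs)

    print(word_order_similarity([(0, 0), (1, 1), (2, 2)]))   # 1.0, monotone
    print(word_order_similarity([(0, 2), (1, 1), (2, 0)]))   # 0.0, fully inverted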

Proceedings ArticleDOI
17 Jul 2006
TL;DR: A semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus is introduced.
Abstract: We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.

Proceedings ArticleDOI
04 Nov 2006
TL;DR: Quantitative results combined with interview data show that lexical entrainment was disrupted in machine translation-mediated communication because echoing breaks down under asymmetries in the machine translations, and that the process of shortening referring expressions is also disrupted.
Abstract: Even though multilingual communities that use machine translation to overcome language barriers are increasing, we still lack a complete understanding of how machine translation affects communication. In this study, eight pairs from three different language communities--China, Korea, and Japan--worked on referential tasks in their shared second language (English) and in their native languages using a machine translation embedded chat system. Drawing upon prior research, we predicted differences in conversational efficiency and content, and in the shortening of referring expressions over trials. Quantitative results combined with interview data show that lexical entrainment was disrupted in machine translation-mediated communication because echoing is disrupted by asymmetries in machine translations. In addition, the process of shortening referring expressions is also disrupted because the translations do not translate the same terms consistently throughout the conversation. To support natural referring behavior in machine translation-mediated communication, we need to resolve asymmetries and inconsistencies caused by machine translations.

Journal ArticleDOI
TL;DR: The ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages, uses a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations.
Abstract: In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules of our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them are multilingual and are designed using state-of-the-art technologies developed at ATR. A corpus-based statistical machine learning framework forms the basis of our system design. We use a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations. Recent evaluation of the overall system showed that speech-to-speech translation quality is high, being at the level of a person having a Test of English for International Communication (TOEIC) score of 750 out of the perfect score of 990.

Journal ArticleDOI
TL;DR: A new fusion approach for multi-granularity linguistic information is presented for managing information assessed in different linguistic term sets with different granularity and/or semantics.

Journal ArticleDOI
TL;DR: In four experiments, translators or bilinguals read sentences for repetition or for translation, and a pattern of results provides support for horizontal theories of translation.


Patent
04 Dec 2006
TL;DR: In this paper, the authors present modular speech-to-speech translation systems and methods that provide adaptable platforms to enable verbal communication between speakers of different languages within the context of specific domains.
Abstract: The present invention discloses modular speech-to-speech translation systems and methods that provide adaptable platforms to enable verbal communication between speakers of different languages within the context of specific domains. The components of the preferred embodiments of the present invention include: (1) speech recognition; (2) machine translation; (3) an N-best merging module; (4) verification; and (5) text-to-speech. The speech recognition module is structured to provide N-best selections and multi-stream processing, where multiple speech recognition engines may be active at any one time. The N-best lists from the one or more speech recognition engines may be handled either separately or collectively to improve both recognition and translation results. A merge module is responsible for integrating the N-best outputs of the translation engines along with confidence/translation scores to create a ranked list of recognition-translation pairs.

Proceedings ArticleDOI
22 Jul 2006
TL;DR: ParaEval is presented, an automatic evaluation framework that uses paraphrases to improve the quality of machine translation evaluations and correlates significantly better than BLEU with human assessment in measurements for both fluency and adequacy.
Abstract: In this paper, we present ParaEval, an automatic evaluation framework that uses paraphrases to improve the quality of machine translation evaluations. Previous work has focused on fixed n-gram evaluation metrics coupled with lexical identity matching. ParaEval addresses three important issues: support for paraphrase/synonym matching, recall measurement, and correlation with human judgments. We show that ParaEval correlates significantly better than BLEU with human assessment in measurements for both fluency and adequacy.
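
A toy sketch of the two-tier matching idea (the word-level paraphrase table and the greedy strategy are assumptions, not ParaEval's actual resources or algorithm): count reference words matched either directly or through a paraphrase table, giving a recall-oriented score.

    from collections import Counter

    def paraphrase_recall(candidate, reference, paraphrases):
        """paraphrases: dict mapping a word to a set of acceptable substitutes."""
        cand = Counter(candidate)
        matched = 0
        for word in reference:
            options = {word} | paraphrases.get(word, set())
            hit = next((w for w in options if cand[w] > 0), None)
            if hit is not None:
                cand[hit] -= 1          # consume the candidate word once
                matched += 1
        return matched / max(len(reference), 1)

    table = {"home": {"house"}, "large": {"big"}}
    print(paraphrase_recall("the big house".split(),
                            "the large home".split(), table))   # 1.0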

Proceedings Article
01 May 2006
TL;DR: This work investigates new possibilities for improving the quality of statistical machine translation (SMT) by applying word reorderings of the source language sentences based on Part-of-Speech tags, proposing two types of reordering depending on the language pair and the translation direction: local reorderings of nouns and adjectives for translation from and into Spanish, and long-range reorderings of verbs for translation into German.
Abstract: In this work, we investigate new possibilities for improving the quality of statistical machine translation (SMT) by applying word reorderings of the source language sentences based on Part-of-Speech tags. Results are presented on the European Parliament corpus containing about 700k sentences and 15M running words. In order to investigate sparse training data scenarios, we also report results obtained on about 1% of the original corpus. The source languages are Spanish and English and the target languages are Spanish, English and German. We propose two types of reorderings depending on the language pair and the translation direction: local reorderings of nouns and adjectives for translation from and into Spanish, and long-range reorderings of verbs for translation into German. For our best translation system, we achieve up to 2% relative reduction of WER and up to 7% relative increase of BLEU score. Improvements can be seen both on the reordered sentences and on the rest of the test corpus. Local reorderings are especially important for the translation systems trained on the small corpus, whereas long-range reorderings are more effective for the larger corpus.
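
As a concrete flavour of the local reorderings used for Spanish (the tag names and the single noun-adjective rule are simplifying assumptions; the paper's reorderings are defined over the POS sequences of its actual tagger), a sketch that swaps a noun followed by an adjective into English-like adjective-noun order:

    def reorder_noun_adjective(tokens, tags):
        """Swap NOUN ADJ -> ADJ NOUN so that Spanish source order better
        matches English target order, e.g. 'casa blanca' -> 'blanca casa'."""
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tags[i] == "NOUN" and tags[i + 1] == "ADJ":
                out += [tokens[i + 1], tokens[i]]
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out

    print(reorder_noun_adjective(["la", "casa", "blanca"],
                                 ["DET", "NOUN", "ADJ"]))   # ['la', 'blanca', 'casa']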

Proceedings ArticleDOI
17 Jul 2006
TL;DR: This paper studies the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation and presents and evaluates different methods for combining preprocessing schemes resulting in improved translation quality.
Abstract: Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.

Patent
08 May 2006
TL;DR: In this article, a system and method for translating data from a source language to a target language is provided wherein machine generated target translation of a source sentence is compared to a database of human generated target sentences.
Abstract: A system and method for translating data from a source language to a target language is provided wherein a machine-generated target translation of a source sentence is compared to a database of human-generated target sentences. If a matching human-generated target sentence is found, the human-generated target sentence may be used instead of the machine-generated sentence, since the human-generated target sentence is more likely to be a well-formed sentence than the machine-generated sentence. The system and method does not rely on a translation memory containing pairs of sentences in both source and target languages, and minimizes the reliance on a human translator to correct a translation generated by machine translation.

01 Jan 2006
TL;DR: With such an automatically-generated dictionary, the Example-Based Machine Translation system covers more of its input on unseen texts than the same system does when provided with a manually-created general-purpose dictionary and other knowledge sources.
Abstract: An Example-Based Machine Translation system is supplied with a sentence-aligned bilingual corpus, but no other knowledge sources. Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation. With such an automatically-generated dictionary, the system covers (with equivalent quality) more of its input on unseen texts than the same system does when provided with a manually-created general-purpose dictionary and other knowledge sources.
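
One simple way to picture the automatic dictionary construction (the co-occurrence counting and one-best choice are generic assumptions, not the system's actual method): count how often each source word co-occurs with each target word across aligned sentence pairs and keep the most frequent pairing.

    from collections import defaultdict, Counter

    def build_dictionary(sentence_pairs):
        """sentence_pairs: iterable of (src_tokens, tgt_tokens) pairs taken
        from a sentence-aligned bilingual corpus."""
        cooc = defaultdict(Counter)
        for src, tgt in sentence_pairs:
            for s in set(src):
                cooc[s].update(set(tgt))
        # Word-for-word dictionary: the most frequently co-occurring target word.
        return {s: counts.most_common(1)[0][0] for s, counts in cooc.items()}

    pairs = [("das haus".split(), "the house".split()),
             ("das auto".split(), "the car".split())]
    print(build_dictionary(pairs))

A real system would need to discount frequent function words (in this toy, "the" co-occurs with everything); the snippet only shows where the dictionary comes from.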

01 Jan 2006
TL;DR: A modest investment of time on the order of two person-weeks adding linguistic knowledge reduces the required example text by a factor of six or more, while retaining comparable translation quality, which makes EBMT more attractive for so-called "low-density" languages for which little data is available.
Abstract: Example-Based Machine Translation (EBMT) using partial exact matching against a database of translation examples has proven quite successful, but requires a large amount of pre-translated text in order to achieve broad coverage of unrestricted text. By adding linguistically tagged entries to the example base and permitting recursive matches that replace the matched text with the associated tag, substantial reductions in the required amount of pre-translated text can be achieved. A modest investment of time on the order of two person-weeks adding linguistic knowledge reduces the required example text by a factor of six or more, while retaining comparable translation quality. This reduction makes EBMT more attractive for so-called "low-density" languages for which little data is available.

Patent
10 Oct 2006
TL;DR: In this article, a method and computer system for analyzing sentences of various languages and constructing a language-independent semantic structure are provided, and exhaustive linguistic descriptions are created, and lexical, morphological, syntactic, and semantic analyses for one or more sentences of a natural or artificial language are performed.
Abstract: A method and computer system for analyzing sentences of various languages and constructing a language-independent semantic structure are provided. On the basis of comprehensive knowledge about languages and semantics, exhaustive linguistic descriptions are created, and lexical, morphological, syntactic, and semantic analyses for one or more sentences of a natural or artificial language are performed. A computer system is also provided to implement, analyze and store various linguistic structures and to perform lexical, morphological, syntactic, and semantic analyses. As a result, a generalized data structure, such as a semantic structure, is generated and used to describe the meaning of one or more sentences in language-independent form, applicable to automated abstracting, machine translation, control systems, Internet information retrieval, etc.

Proceedings ArticleDOI
17 Jun 2006
TL;DR: This work proposes to use attribute grammars for recognizing normal events and detecting abnormal events in a video using an extension of the Earley parser that handles attributes and concurrent event threads.
Abstract: We propose to use attribute grammars for recognizing normal events and detecting abnormal events in a video. Attribute grammars can describe constraints on features (attributes) in addition to the syntactic structure of the input. Events are recognized using an extension of the Earley parser that handles attributes and concurrent event threads. Abnormal events are detected when the input does not follow the syntax of the grammar or the attributes do not satisfy the constraints in the attribute grammar to some degree. We demonstrate the effectiveness of our method for the task of recognizing normal events and detecting anomalies in a parking lot.

Proceedings Article
01 Apr 2006
TL;DR: A backoff model for phrase-based machine translation that translates unseen word forms in foreign-language text by hierarchical morphological abstractions at the word and the phrase level is proposed.
Abstract: We propose a backoff model for phrase-based machine translation that translates unseen word forms in foreign-language text by hierarchical morphological abstractions at the word and the phrase level. The model is evaluated on the Europarl corpus for German-English and Finnish-English translation and shows improvements over state-of-the-art phrase-based models.
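
The backoff idea can be sketched as trying progressively more abstract forms of an unseen source word until one is found in the phrase table (the abstraction ladder and the toy stemmer are assumptions; the paper applies hierarchical morphological abstractions at both the word and the phrase level):

    def translate_with_backoff(word, phrase_table, stem, lemma):
        """Try the full form first, then a stemmed form, then a lemma;
        finally pass the word through untranslated."""
        for form in (word, stem(word), lemma(word)):
            if form in phrase_table:
                return phrase_table[form]
        return word

    # Toy German example with trivial 'morphology':
    table = {"haus": "house"}
    print(translate_with_backoff("hauses", table,
                                 stem=lambda w: w[:-2] if w.endswith("es") else w,
                                 lemma=lambda w: w))    # 'house'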

Proceedings ArticleDOI
22 Jul 2006
TL;DR: Statistical machine reordering (SMR) consists in using the powerful techniques developed for statistical machine translation to translate the source language into a reordered source language (S'), which allows for an improved translation into the target language (T).
Abstract: Reordering is currently one of the most important problems in statistical machine translation systems. This paper presents a novel strategy for dealing with it: statistical machine reordering (SMR). It consists in using the powerful techniques developed for statistical machine translation (SMT) to translate the source language (S) into a reordered source language (S'), which allows for an improved translation into the target language (T). The SMT task changes from S2T to S'2T which leads to a monotonized word alignment and shorter translation units. In addition, the use of classes in SMR helps to infer new word reorderings. Experiments are reported in the EsEn WMT06 tasks and the ZhEn IWSLT05 task and show significant improvement in translation quality.
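
The S-to-S' idea can be pictured through how training data for the reordering step could be derived (a sketch assuming word alignments are available; the paper's class-based modelling is not shown): rearrange each source sentence into the order of its aligned target words, then train an ordinary SMT system to map original source sentences to these reordered ones.

    def monotonize_source(src_tokens, alignment):
        """alignment: list of (src_pos, tgt_pos) links. Returns the source words
        rearranged into target order; unaligned words are appended at the end."""
        reordered = [src_tokens[s] for s, _ in sorted(alignment, key=lambda a: a[1])]
        aligned_src = {s for s, _ in alignment}
        unaligned = [w for i, w in enumerate(src_tokens) if i not in aligned_src]
        return reordered + unaligned

    # "casa blanca" (house white) aligned to English "white house":
    print(monotonize_source(["casa", "blanca"], [(0, 1), (1, 0)]))
    # -> ['blanca', 'casa'], i.e. the S' side of an S -> S' training pair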