Topic

Rule-based machine translation

About: Rule-based machine translation is a research topic. Over its lifetime, 8,804 publications have appeared within this topic, receiving 240,581 citations.


Papers
Journal Article
TL;DR: This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.

1,288 citations
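
The target-language token described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the `<2xx>` token spelling are assumptions made for illustration, and wordpiece segmentation and the NMT model itself are left out.

```python
from typing import Iterable, List, Tuple


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the desired target language.

    The "<2xx>" spelling is an assumed convention for this sketch; any
    reserved symbol that cannot occur in normal text would serve.
    """
    token = f"<2{target_lang}>"
    return f"{token} {sentence}"


def build_training_pairs(
    examples: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Turn (source, target, target_lang) triples into tagged pairs.

    Only the source side is modified; the target sentence is unchanged,
    so one model can be trained on a mix of language pairs.
    """
    return [(tag_source(src, lang), tgt) for src, tgt, lang in examples]


if __name__ == "__main__":
    mixed_corpus = [
        ("Hello, how are you?", "Hola, ¿cómo estás?", "es"),
        ("Hello, how are you?", "Bonjour, comment allez-vous ?", "fr"),
    ]
    for tagged_source, target in build_training_pairs(mixed_corpus):
        print(tagged_source, "->", target)
```

Because the token is ordinary input, requesting a language pair that never appeared together in training only requires changing the prefix, which is the mechanism behind the zero-shot setting the abstract mentions.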

Journal Article
TL;DR: A statistical machine translation model is presented that uses hierarchical phrases (phrases that contain subphrases); it is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations.
Abstract: We present a statistical machine translation model that uses hierarchical phrases, that is, phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system's training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.

1,265 citations

Proceedings Article
06 Jul 2002
TL;DR: A framework for statistical machine translation of natural languages based on direct maximum entropy models is presented; it contains the widely used source-channel approach as a special case, and a baseline statistical machine translation system is significantly improved using this approach.
Abstract: We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach.

1,216 citations
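
For readers unfamiliar with the formulation, the direct maximum entropy (log-linear) model named in the abstract above is commonly written as follows. The notation is chosen here and does not appear on this page: $f$ is the source sentence, $e$ a candidate target sentence, $h_m$ the feature functions, and $\lambda_m$ their weights; hidden variables are omitted for brevity.

```latex
% Posterior over target sentences as a normalized log-linear combination of M features
\Pr(e \mid f) =
  \frac{\exp\left( \sum_{m=1}^{M} \lambda_m \, h_m(e, f) \right)}
       {\sum_{e'} \exp\left( \sum_{m=1}^{M} \lambda_m \, h_m(e', f) \right)}

% Decision rule: the normalizer does not depend on e, so it drops out of the search
\hat{e} = \operatorname*{arg\,max}_{e} \; \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

% Source-channel approach as a special case: two features with unit weights
M = 2, \quad
h_1(e, f) = \log \Pr(e), \quad
h_2(e, f) = \log \Pr(f \mid e), \quad
\lambda_1 = \lambda_2 = 1
```

Under this view, adding a knowledge source amounts to adding one more $h_m$ and fitting its weight, which is why the abstract describes the baseline system as easy to extend.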

Book
15 Jul 1970
TL;DR: This paper used nonlinguistic information from situational and behavioral context to infer the semantic intent of utterances in order to analyze the development of linguistic expression, and demonstrated the extent (and limitations) of the child's knowledge of basic grammatical relations in the earliest two-word utterances.
Abstract: The research reported is an investigation into the early acquisition of grammar by three children from the age of approximately 19 months. Nonlinguistic information from situational and behavioral context was used to infer the semantic intent of utterances in order to analyze the development of linguistic expression. Previous psycholinguistic studies of child language had described utterances in terms of the orderly distribution with which words occurred in juxtaposition. In this study, by making judgments of semantic intent, it was possible to describe the inherent structure of utterances so that conclusions could be drawn about the child's knowledge of semantic-syntactic relationship in the derivation of sentences. For example, when the child said "Mommy sock" and Mommy was putting the child's sock on the child, it was clear that a different semantic interpretation was intended than when the child said "Mommy sock" and picked up Mommy's sock. The syntactic components of generative transformational grammars were proposed for those samples of the children's language in which mean length of utterance was less than 1.5 morphemes.
For the psychologist, the book provides added insight into the relative development of syntactic expression and underlying cognitive function. It was clear, for example, that the two did not develop hand in hand. For the linguist, the book provides additional evidence for the growing conclusion that child language is not incoherent. There is strong evidence presented to demonstrate the extent (and limitations) of the child's knowledge of basic grammatical relations in the earliest two-word utterances. For the speech pathologist concerned with language disorders in children, the evidence presented and the resulting conclusions should provide important hypotheses for application in treatment.
One of the major contributions that this book will make to the literature on child language is the presentation of a large body of data in support of the conclusions that have been drawn. There is an extensive catalog of the children's earliest two-word utterances, negative sentences, and syntactic and single-word lexicons. This evidence should prove invaluable to other researchers in the field.

1,149 citations

Journal Article
TL;DR: A phrase-based statistical machine translation approach, the alignment template approach, is described; it allows for general many-to-many relations between words and is easier to extend than classical statistical machine translation systems.
Abstract: A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source–channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German–English speech VERBMOBIL task, we analyze the effect of various system components. On the French–English Canadian HANSARDS task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese–English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.

1,031 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 94% related
Semantics: 24.9K papers, 653K citations, 91% related
Machine translation: 22.1K papers, 574.4K citations, 89% related
Grammar: 33.8K papers, 767.6K citations, 88% related
Vocabulary: 44.6K papers, 941.5K citations, 85% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  127
2022  282
2021  136
2020  183
2019  174
2018  174