Topic

Rule-based machine translation

About: Rule-based machine translation is a research topic. Over its lifetime, 8,804 publications have been published within this topic, receiving 240,581 citations.
Papers

Posted Content
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

14,077 citations
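
The "(soft-)search" described in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: it assumes a plain dot-product score between the decoder state and each source annotation, chosen only for brevity, and the function name `soft_attention` and the toy vectors are hypothetical. It shows how scores become soft alignment weights and how those weights produce a context vector in place of a single fixed-length summary.

```python
import math

def soft_attention(query, annotations):
    """Return (weights, context) for one decoding step.

    query: decoder state as a list of floats.
    annotations: one vector per source position (list of lists of floats).
    """
    # Assumed scoring function: a plain dot product (illustrative only).
    scores = [sum(q * a for q, a in zip(query, ann)) for ann in annotations]
    # Numerically stable softmax turns scores into soft alignment weights.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted average of the source annotations.
    dim = len(annotations[0])
    context = [
        sum(w * ann[d] for w, ann in zip(weights, annotations))
        for d in range(dim)
    ]
    return weights, context

# Toy example: three source positions, two-dimensional vectors.
weights, context = soft_attention(
    [1.0, 0.0],
    [[2.0, 0.0], [0.0, 2.0], [2.0, 1.0]],
)
```

Because the weights are a probability distribution over source positions, every position contributes to the context vector; no hard segment of the source sentence is ever selected.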


Proceedings ArticleDOI
25 Jun 2007-
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

5,646 citations


Posted Content
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.

4,719 citations
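
The abstract above mentions beam-search rescoring with length normalization and a coverage penalty but gives no formulas. The sketch below assumes one commonly cited form: the log-probability is divided by ((5 + |Y|) / 6)^alpha, and beta times the sum over source words of log(min(total attention, 1)) is added. The function names, the default values of `alpha` and `beta`, and the small clamp that avoids log(0) are all assumptions made for illustration.

```python
import math

def length_penalty(length, alpha=0.2):
    # Assumed form: equals 1.0 for a one-token output, grows with length.
    return ((5.0 + length) / 6.0) ** alpha

def coverage_penalty(attention, beta=0.2, floor=1e-9):
    """attention[j][i] is the weight on source word i at target step j."""
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(step[i] for step in attention)
        # Clamp below to avoid log(0); the clamp value is an assumption.
        total += math.log(max(min(covered, 1.0), floor))
    return beta * total

def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    # One attention row per generated target token.
    normalized = log_prob / length_penalty(len(attention), alpha)
    return normalized + coverage_penalty(attention, beta)

full = [[1.0, 0.0], [0.0, 1.0]]     # both source words attended
partial = [[0.5, 0.0], [0.5, 0.0]]  # second source word ignored
```

With these toy matrices, `full` incurs no coverage penalty while `partial` is penalized for leaving a source word uncovered, which is the behavior the abstract describes as encouraging output that covers all source words.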


Journal ArticleDOI
16 Jun 1995-Science
TL;DR: To test the effects of relevant visual context on the rapid mental processes that accompany spoken language comprehension, eye movements were recorded with a head-mounted eye-tracking system while subjects followed instructions to manipulate real objects.
Abstract: Psycholinguists have commonly assumed that as a spoken linguistic message unfolds over time, it is initially structured by a syntactic processing module that is encapsulated from information provided by other perceptual and cognitive systems. To test the effects of relevant visual context on the rapid mental processes that accompany spoken language comprehension, eye movements were recorded with a head-mounted eye-tracking system while subjects followed instructions to manipulate real objects. Visual context influenced spoken word recognition and mediated syntactic processing, even during the earliest moments of language processing.

2,382 citations


Journal ArticleDOI
TL;DR: This paper develops a computational technique for computing with words without any loss of information in the 2-tuple linguistic model and extends different classical aggregation operators to deal with this model.
Abstract: The fuzzy linguistic approach has been applied successfully to many problems. However, there is a limitation of this approach imposed by its information representation model and the computation methods used when fusion processes are performed on linguistic values. This limitation is the loss of information; this loss of information implies a lack of precision in the final results from the fusion of linguistic information. In this paper, we present tools for overcoming this limitation. The linguistic information is expressed by means of 2-tuples, which are composed of a linguistic term and a numeric value assessed in [-0.5, 0.5). This model allows a continuous representation of the linguistic information on its domain, therefore, it can represent any counting of information obtained in an aggregation process. We then develop a computational technique for computing with words without any loss of information. Finally, different classical aggregation operators are extended to deal with the 2-tuple linguistic model.

2,030 citations
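
A minimal sketch of the 2-tuple idea from the abstract above, under stated assumptions: the five-label term set `TERMS` is hypothetical, and the offset is taken in the half-open interval from -0.5 up to but not including 0.5, so that every value maps to exactly one term. It shows an arithmetic mean computed on the numeric scale and converted back to a (term, offset) pair without rounding away the remainder.

```python
import math

# Hypothetical ordered term set; indices 0..4 form the numeric scale.
TERMS = ["none", "low", "medium", "high", "perfect"]

def to_two_tuple(beta):
    """Map a value in [0, len(TERMS) - 1] to (term, offset)."""
    # floor(beta + 0.5) rounds half up, keeping the offset in [-0.5, 0.5).
    index = int(math.floor(beta + 0.5))
    return TERMS[index], beta - index

def from_two_tuple(term, alpha):
    """Inverse mapping: recover the numeric value of a 2-tuple."""
    return TERMS.index(term) + alpha

def two_tuple_mean(pairs):
    """Arithmetic mean of 2-tuples, returned as a 2-tuple."""
    beta = sum(from_two_tuple(t, a) for t, a in pairs) / len(pairs)
    return to_two_tuple(beta)

# Mean of low, high, high is 7/3: "medium" plus an offset of about 0.33.
result = two_tuple_mean([("low", 0.0), ("high", 0.0), ("high", 0.0)])
```

A term-only model would have to report "medium" and discard the remainder; keeping the offset is what lets the aggregation avoid the loss of information the abstract refers to.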


Network Information
Related Topics (5)
Natural language

31.1K papers, 806.8K citations

94% related
Parsing

21.5K papers, 545.4K citations

92% related
Syntax

16.7K papers, 518.6K citations

92% related
Semantics

24.9K papers, 653K citations

91% related
Text corpus

4K papers, 107.8K citations

90% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    4
2021    133
2020    181
2019    174
2018    174
2017    286

Top Attributes

Topic's top 5 most impactful authors

Hermann Ney

90 papers, 5.9K citations

Eiichiro Sumita

61 papers, 1.3K citations

Alex Waibel

58 papers, 2.2K citations

Francisco Casacuberta

39 papers, 950 citations

Andy Way

34 papers, 738 citations