Proceedings ArticleDOI

An improvement in BLEU metric for English-Hindi machine translation evaluation

29 Apr 2016 - Vol. 2016, pp. 331-336
TL;DR: The proposed work checks the applicability of the BLEU metric and its modified versions for English-to-Hindi machine translation, particularly in the agriculture domain, and incorporates a synonym replacement module into the algorithm.
Abstract: The evaluation of Machine Translation (MT) is a difficult and challenging task. The difficulty arises from the fact that translation is less a science than an art; most sentences can be translated in many acceptable ways. Consequently, there is no single fixed standard against which a particular translation can be evaluated. Indeed, if it were possible to build an independent algorithm capable of evaluating a given machine translation, that evaluation algorithm would arguably be stronger than the translation algorithm itself. Initially, MT evaluation was carried out by human beings, which was time-consuming and highly subjective; results could also vary from one human evaluator to another for the same sentence pair. We therefore need automatic evaluation systems, which are quick and objective. Several methods for automatic evaluation of machine translation have been proposed in recent years, many of which have been readily adopted by the MT community. In the proposed work, we check the applicability of the BLEU metric and of its modified versions for English-to-Hindi machine translation, particularly in the agriculture domain. Further, we incorporate additional features such as synonym replacement and shallow parsing modules, and then calculate the final score using the BLEU and M-BLEU metrics. The test sentences are taken from the agriculture domain. The BLEU metric does not handle synonymy: it treats synonyms as different words, thereby lowering the final score when machine translations are compared against human reference translations. To overcome this drawback of BLEU, we incorporate a synonym replacement module in our algorithm: a candidate word is first replaced by its synonym if that synonym appears in any of the reference human translations, and the adjusted candidate is then compared with the reference human translations.
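The paper does not include an implementation; the sketch below illustrates the synonym replacement step described in the abstract, using a toy English example, a hypothetical synonym dictionary (SYNONYMS), and NLTK's sentence_bleu for the final score. The dictionary, tokenization, and sentences are illustrative assumptions, not the authors' Hindi agriculture-domain resources.

```python
# Minimal sketch (assumed resources): swap a candidate word for one of its
# synonyms when that synonym appears in a reference translation, then score
# the adjusted candidate with BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical synonym dictionary; the paper would draw on a Hindi lexical resource.
SYNONYMS = {
    "farmer": {"cultivator", "agriculturist"},
    "crop": {"harvest", "produce"},
}

def replace_synonyms(candidate, references):
    """Replace each candidate token by a synonym that occurs in any
    reference translation, as described in the abstract."""
    ref_vocab = {tok for ref in references for tok in ref}
    adjusted = []
    for tok in candidate:
        if tok in ref_vocab:
            adjusted.append(tok)
            continue
        match = next((s for s in SYNONYMS.get(tok, ()) if s in ref_vocab), None)
        adjusted.append(match if match is not None else tok)
    return adjusted

# Toy English sentences for illustration; the paper evaluates Hindi output.
references = [["the", "cultivator", "waters", "the", "crop"]]
candidate = ["the", "farmer", "waters", "the", "crop"]

candidate = replace_synonyms(candidate, references)
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))  # the synonym match lifts a score plain BLEU would lower
```

With the replacement applied, "farmer" matches the reference's "cultivator", so the n-gram overlap (and hence the score) is no longer penalized for a legitimate synonym choice.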
Citations
Proceedings Article
01 Mar 2019
TL;DR: This survey discusses different metrics under the automatic evaluation techniques used to assess the output quality of a Machine Translation System (MTS).
Abstract: Machine translation is the process of translating one natural language into another with little human interaction. Evaluation of any Machine Translation System (MTS) is a key factor in a machine learning environment. Many techniques exist to determine and optimize the output quality of an MTS. Earlier methods are based on human judgments. Even though human evaluation methods are very reliable, they suffer from disadvantages such as high cost, long evaluation time, and poor re-usability. Hence, automatic methods have been proposed to reduce time and cost. In this survey, we discuss different metrics under the automatic evaluation techniques used to evaluate the output quality of an MTS. It is believed that machine learning system developers at large would benefit from this survey.

1 citation

Journal ArticleDOI
13 Apr 2021
TL;DR: This research article proposes a generative adversarial network as a solution to pixel-to-pixel rendering problems, minimizing the loss function as far as possible across all interactions.
Abstract: In many existing solutions to image-to-image rendering problems, the only focus is finding the closest output of the Generative Adversarial Network (GAN). In this research article, the authors propose a generative adversarial network as a solution to pixel-to-pixel rendering problems and minimize the loss function as far as possible across all interactions. To achieve the best result, we use a mean squared error loss for the generator and binary cross-entropy for the discriminator. The proposed model handles not only clean images but also sketches whose edges are not sharp. We test the proposed model on a facade dataset.
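As a rough illustration of the loss choices mentioned above (mean squared error for the generator and binary cross-entropy for the discriminator), here is a hedged PyTorch sketch; the tiny placeholder networks and tensor shapes are ours, not the architecture of the cited article.

```python
import torch
import torch.nn as nn

# Placeholder networks for illustration only; the cited article's models differ.
generator = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

mse = nn.MSELoss()  # generator reconstruction term (mean squared error)
bce = nn.BCELoss()  # discriminator term (binary cross-entropy)

sketch, target = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
fake = generator(sketch)

# Discriminator: label real images 1 and generated images 0.
d_real, d_fake = discriminator(target), discriminator(fake.detach())
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Generator: fool the discriminator while staying close to the target image.
g_adv = discriminator(fake)
g_loss = bce(g_adv, torch.ones_like(g_adv)) + mse(fake, target)
```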

1 citation


Cites background from "An improvement in BLEU metric for E..."

  • ...It is designed with the size of the receptive field, sometimes referred to as the effective receptive field [24-31]....


Proceedings Article
01 May 2020
TL;DR: This work integrates the vocabulary of the Awadhi dialect into the Hindi IndoWordnet and proposes a systematic method for generating exemplary sentences that illustrate the meaning and usage of each newly integrated dialect word.
Abstract: Due to rapid urbanization and a homogenized medium of instruction imposed in educational institutions, we have lost much of the golden literary offerings of the diverse languages and dialects that India once possessed. There is an urgent need to mitigate the paucity of online linguistic resources for several Hindi dialects. Given the corpus of a dialect, our system integrates the vocabulary of the dialect to the synsets of IndoWordnet along with their corresponding meta-data. Furthermore, we propose a systematic method for generating exemplary sentences for each newly integrated dialect word. The vocabulary thus integrated follows the schema of the wordnet and generates exemplary sentences to illustrate the meaning and usage of the word. We illustrate our methodology with the integration of words in the Awadhi dialect to the Hindi IndoWordnet to achieve an enrichment of 11.68 % to the existing Hindi synsets. The BLEU metric for evaluating the quality of sentences yielded a 75th percentile score of 0.6351.
References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

21,126 citations

Proceedings ArticleDOI
24 Mar 2002
TL;DR: Following IBM's automatic "evaluation understudy" technique, DARPA commissioned NIST to develop an MT evaluation facility based on the IBM work; the utility is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
Abstract: Evaluation is recognized as an extremely helpful forcing function in Human Language Technology R&D. Unfortunately, evaluation has not been a very powerful tool in machine translation (MT) research because it requires human judgments and is thus expensive and time-consuming and not easily factored into the MT research agenda. However, at the July 2001 TIDES PI meeting in Philadelphia, IBM described an automatic MT evaluation technique that can provide immediate feedback and guidance in MT research. Their idea, which they call an "evaluation understudy", compares MT output with expert reference translations in terms of the statistics of short sequences of words (word N-grams). The more of these N-grams that a translation shares with the reference translations, the better the translation is judged to be. The idea is elegant in its simplicity. But far more important, IBM showed a strong correlation between these automatically generated scores and human judgments of translation quality. As a result, DARPA commissioned NIST to develop an MT evaluation facility based on the IBM work. This utility is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
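The n-gram comparison described here is the core of BLEU: clipped (modified) n-gram precision combined with a brevity penalty. The sketch below is a simplified single-sentence version with our own function names; it is not NIST's mteval utility or the IBM reference implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in the most generous reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, cnt in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the closest reference.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat".split(),
                 ["the cat sat on a mat".split()]), 3))
```

The more n-grams the candidate shares with the references, the higher each clipped precision and hence the overall score, which is exactly the intuition described in the abstract above.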

1,734 citations

Proceedings Article
01 Apr 2006
TL;DR: It is shown that an improved Bleu score is neither necessary nor sufficient for achieving an actual improvement in translation quality, and two significant counterexamples to Bleu’s correlation with human judgments of quality are given.
Abstract: We argue that the machine translation community is overly reliant on the Bleu machine translation evaluation metric. We show that an improved Bleu score is neither necessary nor sufficient for achieving an actual improvement in translation quality, and give two significant counterexamples to Bleu's correlation with human judgments of quality. This offers new potential for research which was previously deemed unpromising by an inability to improve upon Bleu scores.

724 citations

Book
01 Jan 1994

322 citations

ReportDOI
01 Jan 2006
TL;DR: The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives and has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved.
Abstract: Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The relevant software is publicly available from http://nlp.cs.nyu.edu/GTM/.
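As a minimal illustration of the measures described here, the sketch below computes unigram precision, recall, and their balanced F-measure (harmonic mean) from clipped token overlap. The function name and example are ours, and the GTM software cited above additionally rewards longer contiguous matches, which this sketch omits.

```python
from collections import Counter

def unigram_prf(candidate, reference):
    """Unigram precision, recall, and F-measure via clipped token overlap."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())           # clipped unigram matches
    precision = overlap / max(len(candidate), 1)
    recall = overlap / max(len(reference), 1)
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f

p, r, f = unigram_prf("the farmer waters the crop".split(),
                      "the cultivator waters the crop".split())
print(round(p, 2), round(r, 2), round(f, 2))  # 0.8 0.8 0.8
```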

299 citations