Book Chapter
Error Classification Using Automatic Measures Based on n-grams and Edit Distance
TLDR
In this paper, the authors attempt to determine the degrees of association between automatic MT metrics and error classes in translation from English into inflectional Slovak, using a corpus of English journalistic texts taken from the British online newspaper The Guardian together with their human and machine translations.
Abstract
Machine translation (MT) evaluation plays an important role in the translation industry. The main issue in evaluating MT quality is the lack of a clear definition of translation quality. Several methods and techniques for measuring MT quality have been designed. Our study aims at interconnecting manual error classification with automatic metrics of MT evaluation. We attempt to determine the degrees of association between automatic MT metrics and error classes in translation from English into inflectional Slovak. We created a corpus consisting of English journalistic texts, taken from the British online newspaper The Guardian, and their human and machine translations. The MT outputs, produced by Google Translate, were manually annotated by three professionals using a categorical framework for error analysis and evaluated by reference proximity using automated MT evaluation metrics. The results showed that not all of the examined automatic metrics based on n-grams or edit distance should be implemented in a model for determining MT quality. When determining the quality of machine translation with respect to syntactic-semantic correlativeness, it is sufficient to consider only Recall, BLEU-4 or F-measure, ROUGE-L and NIST (based on n-grams), and the metric CharacTER, which is based on edit distance.
Keywords: Machine translation, Automatic metrics, Error classification
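The Precision, Recall, and F-measure scores named in the abstract can be illustrated with a minimal word-level sketch. This is not the authors' implementation: real metrics such as BLEU-4 additionally combine higher-order n-grams and apply a brevity penalty, but the clipped-match counting below is the common core.

```python
from collections import Counter

def unigram_prf(hypothesis, reference):
    """Clipped unigram precision, recall, and F-measure between a
    machine-translated hypothesis and a human reference sentence."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each word's matches to its reference count.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

p, r, f = unigram_prf("the cat sat on the mat", "the cat is on the mat")
```

Here 5 of the 6 hypothesis words match the reference, so precision, recall, and F-measure all come out to 5/6.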
References
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposes BLEU, a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
TL;DR: NIST was commissioned to develop an MT evaluation facility based on the IBM (BLEU) work; the resulting n-gram co-occurrence metric is available from NIST and serves as the primary evaluation measure for TIDES MT research.
Proceedings Article
chrF: character n-gram F-score for automatic MT evaluation
TL;DR: The proposed use of a character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the chrF3 score: for translation from English, this variant showed the highest segment-level correlations, outperforming even the best metrics on the WMT14 shared evaluation task.
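The character n-gram F-score idea can be sketched as follows. This is a simplification (real chrF handles whitespace and smoothing more carefully), but it follows the original formulation: average precision and recall over n-gram orders 1..6, then combine them into an F-beta score, where beta=3 gives chrF3.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string, with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Simplified chrF: character n-gram precision and recall averaged
    over n = 1..max_n, combined into an F_beta score (beta=3 -> chrF3)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if hyp:
            precisions.append(overlap / sum(hyp.values()))
        if ref:
            recalls.append(overlap / sum(ref.values()))
    p = sum(precisions) / len(precisions) if precisions else 0.0
    r = sum(recalls) / len(recalls) if recalls else 0.0
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and completely disjoint strings score 0.0; real-world MT output falls in between.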
Proceedings Article
CharacTer: Translation Edit Rate on Character Level
TL;DR: This work proposes translation edit rate on the character level (CharacTER), which calculates the character-level edit distance while performing the shift edit on the word level, and normalizes the edit distance by the hypothesis sentence length.
Proceedings Article
chrF deconstructed: beta parameters and n-gram weights
TL;DR: This work investigates chrF in more detail, namely β parameters in the range from 1/6 to 6, and finds chrF2 to be the most promising version; it also investigates different n-gram weights for chrF2 and finds that uniform weights are the best option.
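The role of the β parameter discussed above follows directly from the F_beta formula: β > 1 weights recall more heavily than precision, so raising β pulls the score toward the recall value. A minimal illustration:

```python
def f_beta(precision, recall, beta):
    """F_beta score: beta > 1 weights recall over precision
    (beta=2 gives chrF2's combination, beta=3 gives chrF3's)."""
    if precision + recall == 0:
        return 0.0
    return ((1 + beta ** 2) * precision * recall
            / (beta ** 2 * precision + recall))

# With precision 0.9 and recall 0.5, increasing beta pulls the score
# toward the (lower) recall: f_beta(0.9, 0.5, 1) > f_beta(0.9, 0.5, 2)
# > f_beta(0.9, 0.5, 3).
```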