Book Chapter
Error Classification Using Automatic Measures Based on n-grams and Edit Distance
TLDR
In this paper, the authors attempt to determine the degrees of association between automatic MT metrics and error classes in translation from English into inflectional Slovak, using a corpus of English journalistic texts taken from the British online newspaper The Guardian together with their human and machine translations.
Abstract
Machine translation (MT) evaluation plays an important role in the translation industry. The main issue in evaluating MT quality is the lack of a clear definition of translation quality. Several methods and techniques for measuring MT quality have been designed. Our study aims at interconnecting manual error classification with automatic metrics of MT evaluation. We attempt to determine the degrees of association between automatic MT metrics and error classes in translation from English into inflectional Slovak. We created a corpus consisting of English journalistic texts, taken from the British online newspaper The Guardian, and their human and machine translations. The MT outputs, produced by Google Translate, were manually annotated by three professionals using a categorical framework for error analysis and evaluated by reference proximity using automated MT evaluation metrics. The results showed that not all of the examined automatic metrics based on n-grams or edit distance should be implemented in a model for determining MT quality. When determining the quality of machine translation with respect to syntactic-semantic correlativeness, it is sufficient to consider only Recall, BLEU-4 or F-measure, ROUGE-L and NIST (based on n-grams), and the metric CharacTER, which is based on edit distance.
Keywords: Machine translation, Automatic metrics, Error classification
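The Precision, Recall, and F-measure scores named in the abstract can be illustrated with a minimal word-level sketch. This is not the authors' implementation: real metrics such as BLEU-4 additionally combine higher-order n-grams and apply a brevity penalty, but the clipped-match counting below is the common core.

```python
from collections import Counter

def unigram_prf(hypothesis, reference):
    """Clipped unigram precision, recall, and F-measure between a
    machine-translated hypothesis and a human reference sentence."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each word's matches to its reference count.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

p, r, f = unigram_prf("the cat sat on the mat", "the cat is on the mat")
```

Here 5 of the 6 hypothesis words match the reference, so precision, recall, and F-measure all come out to 5/6.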
References
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposes BLEU, a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
TL;DR: NIST was commissioned to develop an MT evaluation facility based on the IBM (BLEU) work; the resulting n-gram co-occurrence metric is available from NIST and serves as the primary evaluation measure for TIDES MT research.
Proceedings Article
chrF: character n-gram F-score for automatic MT evaluation
TL;DR: The proposed use of a character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the chrF3 score: for translation from English, this variant showed the highest segment-level correlations, outperforming even the best metrics on the WMT14 shared evaluation task.
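The character n-gram F-score idea can be sketched as follows. This is a simplification (real chrF handles whitespace and smoothing more carefully), but it follows the original formulation: average precision and recall over n-gram orders 1..6, then combine them into an F-beta score, where beta=3 gives chrF3.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string, with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Simplified chrF: character n-gram precision and recall averaged
    over n = 1..max_n, combined into an F_beta score (beta=3 -> chrF3)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if hyp:
            precisions.append(overlap / sum(hyp.values()))
        if ref:
            recalls.append(overlap / sum(ref.values()))
    p = sum(precisions) / len(precisions) if precisions else 0.0
    r = sum(recalls) / len(recalls) if recalls else 0.0
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and completely disjoint strings score 0.0; real-world MT output falls in between.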
Proceedings Article
CharacTer: Translation Edit Rate on Character Level
TL;DR: This work proposes translation edit rate on the character level (CharacTER), which calculates the character-level edit distance while performing the shift edit on the word level, and normalizes the edit distance by the hypothesis sentence length.
Proceedings Article
chrF deconstructed: beta parameters and n-gram weights
TL;DR: This work investigates chrF in more detail, namely β parameters in the range from 1/6 to 6, and finds chrF2 to be the most promising version; it also investigates different n-gram weights for chrF2 and finds that uniform weights are the best option.
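The role of the β parameter discussed above follows directly from the F_beta formula: β > 1 weights recall more heavily than precision, so raising β pulls the score toward the recall value. A minimal illustration:

```python
def f_beta(precision, recall, beta):
    """F_beta score: beta > 1 weights recall over precision
    (beta=2 gives chrF2's combination, beta=3 gives chrF3's)."""
    if precision + recall == 0:
        return 0.0
    return ((1 + beta ** 2) * precision * recall
            / (beta ** 2 * precision + recall))

# With precision 0.9 and recall 0.5, increasing beta pulls the score
# toward the (lower) recall: f_beta(0.9, 0.5, 1) > f_beta(0.9, 0.5, 2)
# > f_beta(0.9, 0.5, 3).
```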