Book Chapter

Error Classification Using Automatic Measures Based on n-grams and Edit Distance

TLDR
In this paper, the authors attempt to determine the degrees of association between automatic MT metrics and error classes for translation from English into inflectional Slovak, using a corpus of English journalistic texts taken from the British online newspaper The Guardian together with their human and machine translations.
Abstract
Machine translation (MT) evaluation plays an important role in the translation industry. The main issue in evaluating MT quality is the unclear definition of translation quality. Several methods and techniques for measuring MT quality have been designed. Our study aims at interconnecting manual error classification with automatic metrics of MT evaluation. We attempt to determine the degrees of association between automatic MT metrics and error classes for translation from English into inflectional Slovak. We created a corpus consisting of English journalistic texts, taken from the British online newspaper The Guardian, together with their human and machine translations. The MT outputs, produced by Google Translate, were manually annotated by three professionals using a categorical framework for error analysis and evaluated via reference proximity using the metrics of automated MT evaluation. The results showed that not all of the examined automatic metrics based on n-grams or edit distance should be implemented into a model for determining MT quality. When determining the quality of machine translation with respect to syntactic-semantic correlativeness, it is sufficient to consider only Recall, BLEU-4 or F-measure, ROUGE-L and NIST (based on n-grams), and the metric CharacTER, which is based on edit distance.

Keywords: Machine translation, Automatic metrics, Error classification
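The study relates two views of the same MT output: per-segment scores from automatic metrics and per-segment error counts from manual annotation. The abstract does not state which association statistic was used; the sketch below, assuming Spearman's rank correlation and hypothetical per-segment values, only illustrates how such a degree of association can be computed.

```python
# Illustrative sketch (not the authors' code): correlating per-segment
# automatic metric scores with manually annotated error counts.
# The metric scores and error counts below are hypothetical placeholders.
from scipy.stats import spearmanr

# One value per MT-output segment (hypothetical numbers).
bleu_scores  = [0.41, 0.27, 0.63, 0.55, 0.19]   # automatic metric, higher = better
error_counts = [3, 6, 1, 2, 7]                  # manual annotation, higher = worse

rho, p_value = spearmanr(bleu_scores, error_counts)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")  # expect a negative association
```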



References
Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
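BLEU scores a candidate translation by its modified n-gram precision against one or more references, with a brevity penalty for overly short output. As a hedged illustration only (the chapter itself scores Slovak translations of Guardian texts), the sketch below assumes the sacrebleu package and a toy hypothesis/reference pair.

```python
# A minimal sketch of scoring MT output against a reference with BLEU,
# assuming the third-party sacrebleu package is installed (pip install sacrebleu).
import sacrebleu

hypotheses = ["the cat sat on the mat"]            # MT output (hypothetical example)
references = [["the cat is sitting on the mat"]]   # one list of strings per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # document-level BLEU on a 0-100 scale
```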
Proceedings Article

Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

TL;DR: NIST was commissioned to develop an MT evaluation facility based on the IBM work; the resulting metric is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
Proceedings Article

chrF: character n-gram F-score for automatic MT evaluation

TL;DR: The proposed use of a character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the chrF3 score: for translation from English, this variant showed the highest segment-level correlations, outperforming even the best metrics of the WMT14 shared evaluation task.
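chrF computes precision and recall over character n-grams (typically up to n = 6) and combines them with an F-beta score in which β weights recall; chrF3 sets β = 3. The following is a minimal from-scratch sketch of that idea, not the reference implementation, and it simplifies whitespace handling and averaging details.

```python
# A simplified, from-scratch sketch of a character n-gram F-score in the spirit
# of chrF; official implementations (e.g. in sacrebleu) handle whitespace and
# averaging differently.
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string (spaces removed, as a simplification)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chr_f(hypothesis, reference, max_n=6, beta=3.0):
    """Average character n-gram precision/recall combined into an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())          # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    if p == 0.0 and r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(chr_f("the cat sat on the mat", "the cat is sitting on the mat"))
```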
Proceedings Article

CharacTer: Translation Edit Rate on Character Level

TL;DR: This work proposes translation edit rate on the character level (CharacTER), which calculates the character-level edit distance while performing the shift edit on the word level, and uses the hypothesis sentence length to normalize the edit distance.
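CharacTER adapts TER to the character level: edits are counted over characters, shifts are still performed on whole words, and the result is normalized by the length of the hypothesis rather than the reference. The sketch below is a rough approximation that skips the word-level shift edit and shows only the character edit distance and the hypothesis-length normalization.

```python
# A simplified sketch in the spirit of CharacTER: character-level edit distance
# normalized by the hypothesis length. The word-level shift edit of the actual
# metric is omitted here, so this is only an approximation.

def char_edit_distance(a, b):
    """Levenshtein distance between two strings, computed character by character."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def character_like_score(hypothesis, reference):
    """Edit distance normalized by hypothesis length (lower is better)."""
    return char_edit_distance(hypothesis, reference) / max(len(hypothesis), 1)

print(character_like_score("the cat sat on the mat", "the cat is sitting on the mat"))
```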
Proceedings Article

chrF deconstructed: beta parameters and n-gram weights

TL;DR: This work investigates chrF in more detail, varying the β parameter in the range from 1/6 to 6 and finding chrF2 to be the most promising version; it also examines different n-gram weights for chrF2 and finds that uniform weights are the best option.