Results of the WMT17 metrics shared task
Citations
1,456 citations
Cites background or methods from "Results of the WMT17 metrics shared..."
...Machine Translation We use the WMT17 metric evaluation dataset (Bojar et al., 2017), which contains translation systems outputs, gold reference translations, and two types of human judgment scores....
[...]
...In machine translation, BERTSCORE correlates better with segment-level human judgment than existing metrics on the common WMT17 benchmark (Bojar et al., 2017), including outperforming metrics learned specifically for this dataset....
[...]
465 citations
Cites background or methods from "Results of the WMT17 metrics shared..."
...Although this approach is quite straightforward, we will show in Section 5 that it gives state-of-the-art results on WMT Metrics Shared Task 17-19, which makes it a high-performing evaluation metric....
[...]
...First, we benchmark BLEURT against existing text generation metrics on the last 3 years of the WMT Metrics Shared Task (Bojar et al., 2017)....
[...]
...⁵ The organizers managed to collect 15 adequacy scores for each translation, and thus the ratings are almost perfectly repeatable (Bojar et al., 2017). Results: Figure 2 presents BLEURT’s performance as we vary the train and test skew independently....
[...]
...All the experiments that follow are based on the WMT Metrics Shared Task 2017, because the ratings for this edition are particularly reliable.⁵ Methodology: We create increasingly challenging datasets by sub-sampling the records from the WMT Metrics shared task, keeping low-rated translations for training and high-rated translations for test....
[...]
...To illustrate, consider the WMT Metrics Shared Task, an annual benchmark in which translation metrics are compared on their ability to imitate human assessments....
[...]
387 citations
Cites methods from "Results of the WMT17 metrics shared..."
...Data We obtain the source language sentences, their system and reference translations from the WMT 2017 news translation shared task (Bojar et al., 2017)....
[...]
...Other metrics include SentBLEU, NIST, chrF, TER, WER, PER, CDER, and METEOR (Lavie and Agarwal, 2007) that are used and described in the WMT metrics shared task (Bojar et al., 2017; Ma et al., 2018)....
[...]