Findings of the 2013 Workshop on Statistical Machine Translation
Citations
1,124 citations
Cites methods from "Findings of the 2013 Workshop on Statistical Machine Translation"
...We release one thousand new Spanish-English STS pairs sourced from the 2013 WMT translation task and produced by a phrase-based Moses SMT system (Bojar et al., 2013)....
[...]
743 citations
Cites methods from "Findings of the 2013 Workshop on Statistical Machine Translation"
...System-level correlations The evaluation metrics were compared with human rankings at the system level by means of Spearman’s correlation coefficients ρ for the WMT12 and WMT13 data and Pearson’s correlation coefficients r for the WMT14 data....
[...]
...The CHRF scores were calculated for all available translation outputs from the WMT12 (Callison-Burch et al., 2012), WMT13 (Bojar et al., 2013) and WMT14 (Bojar et al., 2014) shared tasks, and then compared with human rankings....
[...]
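The excerpts above compare metric scores against human rankings via Spearman's ρ and Pearson's r. A minimal pure-Python sketch of both statistics, run on hypothetical metric and human scores (illustrative values, not actual WMT data):

```python
def pearson_r(x, y):
    # Pearson's r: covariance normalised by the product of standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    # Spearman's rho: Pearson's r computed on ranks (no tie handling here).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    return pearson_r(ranks(x), ranks(y))

# Hypothetical metric scores vs. human scores for five MT systems:
metric = [0.31, 0.28, 0.35, 0.22, 0.30]
human  = [0.60, 0.55, 0.72, 0.40, 0.58]
print(round(spearman_rho(metric, human), 3))  # 1.0: identical system ranking
```

Spearman's ρ only rewards getting the ranking right, while Pearson's r also penalises non-linear score relationships, which is why WMT14 switched metrics.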
616 citations
Cites background or methods from "Findings of the 2013 Workshop on Statistical Machine Translation"
...This conference builds on nine previous WMT workshops (Koehn and Monz, 2006; Callison-Burch et al., 2007, 2008, 2009, 2010, 2011, 2012; Bojar et al., 2013, 2014, 2015)....
[...]
...a trivial “all-BAD” baseline outperforms many real systems in terms of F1-BAD score (Bojar et al., 2013)....
[...]
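The "all-BAD" baseline mentioned above is easy to reproduce: predicting BAD everywhere gives recall 1.0 on the BAD class, so F1-BAD is high whenever the test set is BAD-heavy. A sketch with hypothetical word-level QE labels:

```python
def f1_bad(gold, pred):
    # F1 for the BAD class: harmonic mean of precision and recall on "BAD".
    tp = sum(g == p == "BAD" for g, p in zip(gold, pred))
    fp = sum(g == "OK" and p == "BAD" for g, p in zip(gold, pred))
    fn = sum(g == "BAD" and p == "OK" for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical gold labels with a 70% BAD rate:
gold = ["BAD"] * 7 + ["OK"] * 3
all_bad = ["BAD"] * len(gold)
print(round(f1_bad(gold, all_bad), 3))  # 0.824
```

With a BAD rate p, the trivial baseline scores F1-BAD = 2p/(p+1), which any real system must beat to be informative.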
...…the WMT shared task on quality estimation (QE) of machine translation (MT) builds on the previous editions of the task (Callison-Burch et al., 2012; Bojar et al., 2013, 2014, 2015), with “traditional” tasks at sentence and word levels, a new task for entire documents quality prediction, and a…...
[...]
511 citations
Cites background or methods or result from "Findings of the 2013 Workshop on Statistical Machine Translation"
...Compared to the results regarding time prediction in the Quality Estimation shared task from 2013 (Bojar et al., 2013), we note that this time all submissions were able to beat the baseline system (compared to only 1/3 of the submissions in 2013)....
[...]
...• Ranking: DeltaAvg (primary metric) (Bojar et al., 2013) and Spearman’s rank correlation....
[...]
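DeltaAvg, the primary ranking metric cited above, rewards rankings whose top predicted quantiles have high true quality. A simplified sketch of the idea (equal-size quantiles, no tie handling; the official WMT scoring script differs in these details):

```python
def delta_avg(true_scores, predicted_scores):
    # Sort sentences by predicted score, best first.
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_scores[i] for i in order]
    n_total = len(ranked)
    overall = sum(ranked) / n_total
    deltas = []
    # For each quantile count n, compare the average true score of the
    # top 1..n-1 quantiles against the overall average.
    for n in range(2, n_total // 2 + 1):
        size = n_total // n  # simplified equal-size quantiles
        head_avgs = [sum(ranked[: k * size]) / (k * size) for k in range(1, n)]
        deltas.append(sum(head_avgs) / (n - 1) - overall)
    return sum(deltas) / len(deltas) if deltas else 0.0

print(delta_avg([5, 4, 3, 2, 1, 0], [5, 4, 3, 2, 1, 0]))  # 1.5
```

A perfect ranking gives a positive DeltaAvg, a random one hovers near zero, and an inverted one goes negative, so the metric is interpretable in the units of the quality score itself.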
...(Bojar et al., 2013) has focused on prediction of automatically derived labels, generally due to practical considerations, as manual annotation is labour-intensive....
[...]
...This workshop builds on eight previous WMT workshops (Koehn and Monz, 2006; Callison-Burch et al., 2007, 2008, 2009, 2010, 2011, 2012; Bojar et al., 2013)....
[...]
...It has proved robust across a range of language pairs, MT systems, and text domains for predicting various forms of post-editing effort (Callison-Burch et al., 2012; Bojar et al., 2013)....
[...]
References
64,109 citations
"Findings of the 2013 Workshop on St..." refers background in this paper
...The exact interpretation of the kappa coefficient is difficult, but according to Landis and Koch (1977), 0–0.2 is slight, 0.2–0.4 is fair, 0.4–0.6 is moderate, 0.6–0.8 is substantial, and 0.8–1.0 is almost perfect....
[...]
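The Landis and Koch (1977) bands quoted above can be expressed as a small lookup; the helper below is illustrative, not from the paper:

```python
def interpret_kappa(kappa):
    # Landis and Koch (1977) bands for the kappa coefficient.
    # A value at or below a band's upper bound takes that band's label.
    bands = [(0.2, "slight"), (0.4, "fair"), (0.6, "moderate"),
             (0.8, "substantial"), (1.0, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

print(interpret_kappa(0.45))  # moderate
```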
34,965 citations
"Findings of the 2013 Workshop on St..." refers methods in this paper
...We measured pairwise agreement among annotators using Cohen’s kappa coefficient (κ) (Cohen, 1960), which is defined as κ = (P(A) − P(E)) / (1 − P(E)), where P(A) is the proportion of times that the annotators agree, and P(E) is the proportion of time that they would agree by chance....
[...]
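The κ definition above can be computed directly from two annotators' label sequences; the ranking-judgement data below is hypothetical:

```python
def cohens_kappa(labels_a, labels_b):
    # kappa = (P(A) - P(E)) / (1 - P(E))
    n = len(labels_a)
    # P(A): observed agreement rate.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from each annotator's marginal label frequencies.
    labels = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(l) / n) * (labels_b.count(l) / n)
              for l in labels)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical annotations of ten pairwise ranking judgements:
a = ["<", "<", ">", "=", "<", ">", ">", "<", "=", "<"]
b = ["<", ">", ">", "=", "<", ">", "<", "<", "=", "<"]
print(round(cohens_kappa(a, b), 3))  # 0.677
```

Here P(A) = 0.8 and P(E) = 0.38, so κ ≈ 0.677, i.e. "substantial" agreement on the Landis and Koch scale.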
30,190 citations
"Findings of the 2013 Workshop on St..." refers methods in this paper
...For German-English, LogReg was trained with Stepwise Feature Selection (Hosmer, 1989) on two feature sets: Feature Set 24 includes basic counts augmented with PCFG parsing features (number of VPs, alternative parses, parse probability) on both source and target sentences (Avramidis et al., 2011), and pseudo-reference METEOR score; the most successful set, Feature Set 33, combines those 24 features with the 17 baseline features....
[...]
19,603 citations
"Findings of the 2013 Workshop on St..." refers methods in this paper
...The prediction models were trained using four classifiers in the Weka toolkit (Hall et al., 2009): linear regression, M5P trees, multi-layer perceptron and SVM regression....
[...]