Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
Citations
235 citations
Cites background or methods from "Automatic Annotation and Evaluation..."
...F0.5 scores in both the gold and auto reference settings, we note that MaxMatch exploits a dynamic alignment to artificially minimise the false positive rate and hence produces slightly inflated scores (Bryant et al., 2017)....
[...]
...Systems are evaluated on the W&I+LOCNESS test set using the ERRANT scorer (Bryant et al., 2017), an improved version of the MaxMatch scorer (Dahlmeier and Ng, 2012) that was previously used in the CoNLL shared tasks....
[...]
...Since FCE and NUCLE were annotated according to different error type frameworks and Lang8 and W&I+LOCNESS were not annotated with error types at all, we re-annotated all corpora automatically using ERRANT (Bryant et al., 2017)....
[...]
...Unlike CoNLL-2014 however, this is calculated using the ERRANT scorer (Bryant et al., 2017), rather than the M2 scorer (Dahlmeier and Ng, 2012), because the ERRANT scorer can provide much more detailed feedback, e.g. in terms of performance on specific error types....
[...]
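The excerpts above repeatedly contrast ERRANT with the M2 scorer on the grounds that ERRANT reports performance per error type. As a minimal sketch of where those types come from, the toolkit's documented Python API (chrisjbryant/errant) extracts and classifies edits between a source sentence and its correction; exact attribute names may vary across versions, and a spaCy English model must be installed:

```python
import errant  # pip install errant (also requires a spaCy English model)

# Load the English annotator, which wraps a spaCy pipeline internally.
annotator = errant.load("en")

# Parse an original (learner) sentence and its corrected counterpart.
orig = annotator.parse("This are gramamtical sentence .")
cor = annotator.parse("This is a grammatical sentence .")

# Align the two sentences, extract the edits, and classify each one.
for edit in annotator.annotate(orig, cor):
    # Each edit carries its token span, surface forms and error type,
    # e.g. R:VERB:SVA for a subject-verb agreement replacement.
    print(edit.o_start, edit.o_end, edit.o_str, "->", edit.c_str, edit.type)
```

Scoring a system against a reference then reduces to comparing these typed edits, which is what enables the per-error-type feedback that the M2 scorer does not provide.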
175 citations
Cites background or methods from "Automatic Annotation and Evaluation..."
...We report results on CoNLL2014 test set (Ng et al., 2014) evaluated by official M2 scorer (Dahlmeier and Ng, 2012), and on BEA-2019 dev and test sets evaluated by ERRANT (Bryant et al., 2017)....
[...]
...large amounts of training data and (iii) interpretability and explainability; they require additional functionality to explain corrections, e.g., grammatical error type classification (Bryant et al., 2017)....
[...]
155 citations
Cites methods from "Automatic Annotation and Evaluation..."
...The performance of participating systems was evaluated using the ERRANT scorer (Bryant et al., 2017), which reports an F0.5 score....
[...]
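The F0.5 measure mentioned in this excerpt is the weighted harmonic mean of precision and recall with β = 0.5, which weights precision twice as heavily as recall; the usual rationale in GEC is that a bad correction is worse than a missed one. A minimal sketch of the computation from raw edit counts (variable names are illustrative):

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Weighted F-measure from true positive, false positive and
    false negative edit counts; beta = 0.5 favours precision."""
    p = tp / (tp + fp) if tp + fp else 0.0  # precision
    r = tp / (tp + fn) if tp + fn else 0.0  # recall
    if p + r == 0.0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Example: 40 correct edits, 10 spurious edits, 50 missed edits.
print(round(f_beta(40, 10, 50), 4))  # 0.6897: high precision dominates
```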
131 citations
Cites methods from "Automatic Annotation and Evaluation..."
...The error categories were tagged using the approach in Bryant et al. (2017)....
[...]
113 citations
Additional excerpts
...…error detection (Leacock et al., 2010; Rei and Yannakoudakis, 2016; Kaneko et al., 2017) and GEC evaluation (Tetreault et al., 2010b; Madnani et al., 2011; Dahlmeier and Ng, 2012c; Napoles et al., 2015; Sakaguchi et al., 2016; Napoles et al., 2016; Bryant et al., 2017; Asano et al., 2017)....
[...]
References
521 citations
"Automatic Annotation and Evaluation..." refers methods in this paper
...For example, a classifier trained on the First Certificate in English (FCE) corpus (Yannakoudakis et al., 2011) is unlikely to perform as well on the National University of Singapore Corpus of Learner English (NUCLE) (Dahlmeier and Ng, 2012) or vice versa, because both corpora have been annotated according to different standards (cf....
[...]
322 citations
"Automatic Annotation and Evaluation..." refers background or methods in this paper
...To show that automatic references are feasible alternatives to gold references, we evaluated each team in the CoNLL-2014 shared task using both types of reference with the M2 scorer (Dahlmeier and Ng, 2012), the de facto standard of GEC evaluation, and our own scorer....
[...]
...Since no scorer is currently capable of calculating error type performance however (Dahlmeier and Ng, 2012; Felice and Briscoe, 2015; Napoles et al., 2015), we instead built our own....
[...]
...It is worth mentioning that despite an increased interest in GEC evaluation in recent years (Dahlmeier and Ng, 2012; Felice and Briscoe, 2015; Bryant and Ng, 2015; Napoles et al., 2015; Grundkiewicz et al., 2015; Sakaguchi et al., 2016), ERRANT is the only toolkit currently capable of producing error type scores....
[...]
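To make "error type scores" concrete: once every hypothesis and reference edit carries a category label, precision, recall and F0.5 can be computed per category instead of globally. The counting scheme below is my own minimal illustration under an assumed edit representation, not ERRANT's actual implementation:

```python
from collections import Counter

def per_type_scores(hyp_edits, ref_edits, beta=0.5):
    """hyp_edits / ref_edits: sets of (sent_id, start, end, correction, type)
    tuples (an assumed shape). Returns {error_type: (precision, recall, f)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for e in hyp_edits:  # a hypothesis edit is correct iff it matches a reference edit
        (tp if e in ref_edits else fp)[e[-1]] += 1
    for e in ref_edits - hyp_edits:  # unmatched reference edits are misses
        fn[e[-1]] += 1
    scores = {}
    for t in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        r = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f = (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0
        scores[t] = (p, r, f)
    return scores

# Toy run: one correct agreement edit, one spurious missing-determiner edit.
hyp = {(0, 1, 2, "is", "R:VERB:SVA"), (0, 3, 3, "a", "M:DET")}
ref = {(0, 1, 2, "is", "R:VERB:SVA")}
print(per_type_scores(hyp, ref))
# {'M:DET': (0.0, 0.0, 0.0), 'R:VERB:SVA': (1.0, 1.0, 1.0)}
```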
...For example, a classifier trained on the First Certificate in English (FCE) corpus (Yannakoudakis et al., 2011) is unlikely to perform as well on the National University of Singapore Corpus of Learner English (NUCLE) (Dahlmeier and Ng, 2012) or vice versa, because both corpora have been annotated according to different standards (cf. Xue and Hwa (2014))....
[...]