Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English

Open Access Proceedings Article

pp. 22-31

TL;DR: The annotation schema and the data collection and annotation process of NUCLE are described, and a previously unpublished study of annotator agreement for grammatical error correction is reported.

Abstract:

We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.
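As a minimal sketch of how a distribution of grammatical errors might be computed, the Python below assumes each annotation has already been extracted as a hypothetical `(start, end, error_type, correction)` tuple and reports each error type's share of all annotations. The category labels in the toy data (`ArtOrDet`, `Prep`, `Nn`) are illustrative assumptions, and the resulting numbers are not NUCLE statistics.

```python
from collections import Counter

# Hypothetical annotation record: (start, end, error_type, correction).
Annotation = tuple[int, int, str, str]


def error_distribution(annotations: list[Annotation]) -> dict[str, float]:
    """Return each error type's share of all annotations, in percent."""
    counts = Counter(error_type for _, _, error_type, _ in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() lists types from most to least frequent.
    return {error_type: 100.0 * n / total for error_type, n in counts.most_common()}


# Toy data for illustration only; these are not counts from NUCLE.
toy = [
    (3, 4, "ArtOrDet", "the"),
    (7, 8, "Prep", "on"),
    (10, 11, "ArtOrDet", ""),
    (12, 13, "Nn", "studies"),
]
print(error_distribution(toy))  # {'ArtOrDet': 50.0, 'Prep': 25.0, 'Nn': 25.0}
```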

Citations

Proceedings Article

The CoNLL-2014 Shared Task on Grammatical Error Correction

Hwee Tou Ng et al.

TL;DR: The CoNLL-2014 shared task was devoted to grammatical error correction, in which each participating system was expected to detect and correct grammatical errors of all types.

Proceedings Article

The BEA-2019 Shared Task on Grammatical Error Correction

Christopher Bryant et al.

TL;DR: This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC), which introduced a new dataset, the Write&Improve+LOCNESS corpus, representing a wider range of native and learner English levels and abilities.

Proceedings Article

Grammatical error correction using neural machine translation

Zheng Yuan et al.

TL;DR: This paper presents the first study using neural machine translation (NMT) for grammatical error correction (GEC), with a two-step approach to handling the rare-word problem in NMT that proves useful and effective for the GEC task.

Proceedings Article

GECToR -- Grammatical Error Correction: Tag, Not Rewrite

Kostiantyn Omelianchuk et al.

TL;DR: This paper presents a simple and efficient GEC sequence tagger using a Transformer encoder, pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora.
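To illustrate the "tag, not rewrite" idea, the sketch below applies one edit tag per source token. It assumes a simplified subset of GECToR-style basic transformations (`$KEEP`, `$DELETE`, `$APPEND_<token>`, `$REPLACE_<token>`); the full tag vocabulary, which also includes grammar-specific transformations, and the tagging model itself are not reproduced here, and `apply_tags` is a hypothetical helper.

```python
def apply_tags(tokens: list[str], tags: list[str]) -> list[str]:
    """Apply one GECToR-style edit tag to each token (simplified tag subset)."""
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per token")
    output: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            output.append(token)  # leave the token unchanged
        elif tag == "$DELETE":
            continue  # drop the token
        elif tag.startswith("$APPEND_"):
            # keep the token and insert a new token after it
            output.extend([token, tag[len("$APPEND_"):]])
        elif tag.startswith("$REPLACE_"):
            output.append(tag[len("$REPLACE_"):])  # substitute the token
        else:
            raise ValueError(f"unsupported tag: {tag}")
    return output


tokens = ["He", "go", "to", "school", "every", "days"]
tags = ["$KEEP", "$REPLACE_goes", "$KEEP", "$KEEP", "$KEEP", "$REPLACE_day"]
print(" ".join(apply_tags(tokens, tags)))  # He goes to school every day
```

GECToR repeats tagging and tag application for several iterations, since a single pass may not fix every error; this sketch shows only one pass.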

Proceedings Article

The CoNLL-2013 Shared Task on Grammatical Error Correction

Hwee Tou Ng et al.

TL;DR: The paper gives the task definition, presents the data sets, describes the evaluation metric and scorer used in the shared task, gives an overview of the various approaches adopted by the participating teams, and presents the evaluation results.
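The CoNLL shared tasks distributed their annotated data in the M2 format read by the MaxMatch scorer: an `S` line holds the tokenized sentence, and each following `A` line holds one edit as `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`. A minimal parser might look like the following, assuming well-formed input with sentences separated by blank lines; the `Edit` class and `parse_m2` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    start: int  # first token index of the edited span
    end: int  # token index just past the span (start == end for insertions)
    error_type: str
    correction: str
    annotator: int


def parse_m2(text: str) -> list[tuple[list[str], list[Edit]]]:
    """Parse M2 text into (tokens, edits) pairs, one pair per sentence."""
    sentences = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue
        tokens = lines[0][2:].split()  # strip the leading "S "
        edits = []
        for line in lines[1:]:
            fields = line[2:].split("|||")  # strip the leading "A "
            start, end = map(int, fields[0].split())
            edits.append(Edit(start, end, fields[1], fields[2], int(fields[5])))
        sentences.append((tokens, edits))
    return sentences


sample = """S This are a sentence .
A 1 2|||SVA|||is|||REQUIRED|||-NONE-|||0
"""
[(tokens, edits)] = parse_m2(sample)
print(edits[0])  # Edit(start=1, end=2, error_type='SVA', correction='is', annotator=0)
```

Applying an edit amounts to replacing `tokens[start:end]` with the tokens of its correction; when a sentence has several edits, applying them from right to left keeps the earlier offsets valid.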

References

Journal Article

The measurement of observer agreement for categorical data

J. R. Landis et al.

01 Mar 1977

Biometrics

TL;DR: A general statistical methodology is presented for the analysis of multivariate categorical data arising from observer reliability studies. Tests for interobserver bias are formulated in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
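Landis and Koch also proposed benchmark labels for interpreting kappa values, which are widely used when reporting annotator agreement: poor below 0, slight up to 0.20, fair up to 0.40, moderate up to 0.60, substantial up to 0.80, and almost perfect above that. The helper below maps a kappa value to those labels; its name is hypothetical, and the authors themselves described the cut-offs as arbitrary benchmarks rather than strict thresholds.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) benchmark label."""
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in increasing order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


print(landis_koch_label(0.39))  # fair
```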

Journal Article

A Coefficient of Agreement for Nominal Scales

Jacob Cohen

01 Apr 1960

Educational and Psychological Measuremen...

TL;DR: The article considers a procedure in which two or more judges independently categorize a sample of units, addresses the extent to which these judgments agree, i.e., are reproducible (reliable), and proposes a coefficient of agreement that corrects for the agreement expected by chance.
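Cohen's coefficient, kappa, compares the observed agreement p_o with the agreement p_e expected by chance from each judge's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes it for two annotators; the toy labels (whether each token contains an error) are hypothetical and are not data from the NUCLE agreement study.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: products of the annotators' label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


a = ["err", "ok", "ok", "err", "ok", "ok"]
b = ["err", "ok", "err", "err", "ok", "ok"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```

Here the annotators agree on 5 of 6 tokens (p_o = 0.833), but because both mostly answer "ok", half of that agreement is expected by chance (p_e = 0.5), so kappa is noticeably lower than raw agreement.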

Proceedings Article

A New Dataset and Method for Automatically Grading ESOL Texts

Helen Yannakoudakis et al.

TL;DR: It is demonstrated how supervised discriminative machine learning techniques can be used to automate the assessment of 'English as a Second or Other Language' (ESOL) examination scripts by using rank preference learning to explicitly model the grade relationships between scripts.

Proceedings Article

The CoNLL-2014 Shared Task on Grammatical Error Correction

Hwee Tou Ng et al.

TL;DR: The CoNLL-2014 shared task was devoted to grammatical error correction, in which each participating system was expected to detect and correct grammatical errors of all types.

Proceedings Article

Better Evaluation for Grammatical Error Correction

Daniel Dahlmeier et al.

TL;DR: This work presents a novel method for evaluating grammatical error correction: an algorithm that efficiently computes the sequence of phrase-level edits between a source sentence and a system hypothesis that achieves the highest overlap with the gold-standard annotation.
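Once system and gold edits are aligned, scoring reduces to precision, recall, and F-beta over sets of edits. The sketch below covers only that final step and is a simplification, not the MaxMatch algorithm: it assumes edits are already given as hypothetical `(start, end, correction)` tuples, whereas MaxMatch's contribution is finding the system edit sequence that maximally overlaps the gold annotation. The default beta of 0.5, which weights precision more heavily than recall, matches the F0.5 measure used in CoNLL-2014; treating an empty edit set as scoring 1.0 on its own side is an assumed convention.

```python
# Hypothetical edit representation: (start token, end token, correction).
EditKey = tuple[int, int, str]


def precision_recall_f(
    system: set[EditKey], gold: set[EditKey], beta: float = 0.5
) -> tuple[float, float, float]:
    """Precision, recall and F-beta of system edits against gold edits."""
    matched = len(system & gold)
    # Assumed convention: an empty edit set scores 1.0 on its own side.
    precision = matched / len(system) if system else 1.0
    recall = matched / len(gold) if gold else 1.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


gold = {(1, 2, "is"), (4, 5, "day")}
system = {(1, 2, "is"), (3, 4, "the")}
print(precision_recall_f(system, gold))  # (0.5, 0.5, 0.5)
```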
