Open Access Proceedings Article

Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English

TLDR
This paper describes the annotation schema and the data collection and annotation process of NUCLE, and reports on a previously unpublished study of annotator agreement for grammatical error correction.
Abstract
We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.
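Annotator agreement studies of the kind mentioned in the abstract are commonly summarized with a chance-corrected statistic such as Cohen's kappa, the subject of "A Coefficient of Agreement for Nominal Scales" in the reference list. The abstract does not say which statistic or unit of comparison the NUCLE study uses, so the following is only a minimal sketch under assumptions: the function name `cohens_kappa`, the token-level binary labels, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not from NUCLE):
# 1 = "token marked as erroneous", 0 = "token left unmarked".
annotator_1 = [1, 0, 0, 1, 0, 1, 0, 0]
annotator_2 = [1, 0, 1, 1, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)  # 7/15, about 0.467
```

Here the two annotators agree on 6 of 8 tokens (0.75), but about 0.53 of that agreement is expected by chance, which is why kappa is much lower than the raw agreement rate.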



Citations
Proceedings Article

The CoNLL-2014 Shared Task on Grammatical Error Correction

TL;DR: The CoNLL-2014 shared task addressed grammatical error correction: a participating system is expected to detect and correct grammatical errors of all types.
Proceedings Article

The BEA-2019 Shared Task on Grammatical Error Correction

TL;DR: This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC) and introduces a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities.
Proceedings Article

Grammatical error correction using neural machine translation

TL;DR: This paper presents the first study using neural machine translation (NMT) for grammatical error correction (GEC), with a two-step approach to handle the rare word problem in NMT that proves useful and effective for the GEC task.
Proceedings Article

GECToR -- Grammatical Error Correction: Tag, Not Rewrite

TL;DR: This paper presents a simple and efficient GEC sequence tagger using a Transformer encoder, pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora.
Proceedings Article

The CoNLL-2013 Shared Task on Grammatical Error Correction

TL;DR: This paper gives the task definition, presents the data sets, describes the evaluation metric and scorer used in the shared task, gives an overview of the approaches adopted by the participating teams, and presents the evaluation results.
References
Journal Article

The measurement of observer agreement for categorical data

TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented. Tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Journal Article

A Coefficient of Agreement for Nominal Scales

TL;DR: This article presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree, significance, and sampling stability of their agreement, that is, the extent to which these judgments are reproducible (reliable).
Proceedings Article

A New Dataset and Method for Automatically Grading ESOL Texts

TL;DR: It is demonstrated how supervised discriminative machine learning techniques can be used to automate the assessment of 'English as a Second or Other Language' (ESOL) examination scripts by using rank preference learning to explicitly model the grade relationships between scripts.
Proceedings Article

The CoNLL-2014 Shared Task on Grammatical Error Correction

TL;DR: The CoNLL-2014 shared task addressed grammatical error correction: a participating system is expected to detect and correct grammatical errors of all types.
Proceedings Article

Better Evaluation for Grammatical Error Correction

TL;DR: This work presents a novel method for evaluating grammatical error correction: an algorithm that efficiently computes the sequence of phrase-level edits between a source sentence and a system hypothesis that achieves the highest overlap with the gold-standard annotation.