Home
/
Authors
/
Christopher Bryant

Author

Christopher Bryant

Other affiliations: National University of Singapore

Bio: Christopher Bryant is an academic researcher from University of Cambridge. The author has contributed to research in topics: Computer science & Task (project management). The author has an hindex of 9, co-authored 18 publications receiving 1084 citations. Previous affiliations of Christopher Bryant include National University of Singapore.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

The CoNLL-2014 Shared Task on Grammatical Error Correction

[...]

Hwee Tou Ng¹, Siew Mei Wu¹, Ted Briscoe¹, Christian Hadiwinoto¹, Raymond Hendy Susanto¹, Christopher Bryant² - Show less +2 more•Institutions (2)

National University of Singapore¹, University of Cambridge²

01 Jun 2014

TL;DR: The CoNLL-2014 shared task was devoted to grammatical error correction of all error types as discussed by the authors, where a participating system is expected to detect and correct grammatical errors of all types.

...read moreread less

Abstract: The CoNLL-2014 shared task was devoted to grammatical error correction of all error types. In this paper, we give the task definition, present the data sets, and describe the evaluation metric and scorer used in the shared task. We also give an overview of the various approaches adopted by the participating teams, and present the evaluation results. Compared to the CoNLL2013 shared task, we have introduced the following changes in CoNLL-2014: (1) A participating system is expected to detect and correct grammatical errors of all types, instead of just the five error types in CoNLL-2013; (2) The evaluation metric was changed from F1 to F0.5, to emphasize precision over recall; and (3) We have two human annotators who independently annotated the test essays, compared to just one human annotator in CoNLL-2013.

...read moreread less

484 citations

Proceedings Article•DOI•

Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction

[...]

Christopher Bryant¹, Mariano Felice², Ted Briscoe²•Institutions (2)

National University of Singapore¹, University of Cambridge²

01 Jul 2017

TL;DR: ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework, which facilitates error type evaluation at different levels of granularity.

...read moreread less

Abstract: Until now, error type performance for Grammatical Error Correction (GEC) systems could only be measured in terms of recall because system output is not annotated. To overcome this problem, we introduce ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework. This not only facilitates error type evaluation at different levels of granularity, but can also be used to reduce annotator workload and standardise existing GEC datasets. Human experts rated the automatic edits as “Good” or “Acceptable” in at least 95% of cases, so we applied ERRANT to the system output of the CoNLL-2014 shared task to carry out a detailed error type analysis for the first time.

...read moreread less

241 citations

Proceedings Article•DOI•

The BEA-2019 Shared Task on Grammatical Error Correction.

[...]

Christopher Bryant¹, Mariano Felice¹, Øistein E. Andersen¹, Ted Briscoe¹•Institutions (1)

University of Cambridge¹

01 Aug 2019

TL;DR: This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC), which introduces a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities.

...read moreread less

Abstract: This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC). As with the CoNLL-2014 shared task, participants are required to correct all types of errors in test data. One of the main contributions of the BEA-2019 shared task is the introduction of a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities. Another contribution is the introduction of tracks, which control the amount of annotated data available to participants. Systems are evaluated in terms of ERRANT F_0.5, which allows us to report a much wider range of performance statistics. The competition was hosted on Codalab and remains open for further submissions on the blind test set.

...read moreread less

235 citations

Proceedings Article•DOI•

The CoNLL-2015 Shared Task on Shallow Discourse Parsing

[...]

Nianwen Xue¹, Hwee Tou Ng¹, Sameer Pradhan², Rashmi Prasad², Christopher Bryant², Attapol Rutherford³ - Show less +2 more•Institutions (3)

Brandeis University¹, National University of Singapore², University of Wisconsin–Milwaukee³

01 Jul 2015

TL;DR: The CoNLL-2015 Shared Task is on Shallow Discourse Parsing, a task focusing on identifying individual discourse relations that are present in a natural language text, and the evaluation protocol and metric used during this shared task is presented.

...read moreread less

Abstract: The CoNLL-2015 Shared Task is on Shallow Discourse Parsing, a task focusing on identifying individual discourse relations that are present in a natural language text. A discourse relation can be expressed explicitly or implicitly, and takes two arguments realized as sentences, clauses, or in some rare cases, phrases. Sixteen teams from three continents participated in this task. For the first time in the history of the CoNLL shared tasks, participating teams, instead of running their systems on the test set and submitting the output, were asked to deploy their systems on a remote virtual machine and use a web-based evaluation platform to run their systems on the test set. This meant they were unable to actually see the data set, thus preserving its integrity and ensuring its replicability. In this paper, we present the task definition, the training and test sets, and the evaluation protocol and metric used during this shared task. We also summarize the different approaches adopted by the participating teams, and present the evaluation results. The evaluation data sets and the scorer will serve as a benchmark for future research on shallow discourse parsing.

...read moreread less

152 citations

Proceedings Article•DOI•

How Far are We from Fully Automatic High Quality Grammatical Error Correction

[...]

Christopher Bryant¹, Hwee Tou Ng¹•Institutions (1)

National University of Singapore¹

01 Jul 2015

TL;DR: It is concluded that inter-annotator agreement statistics in grammatical error correction are less informative in fields where there may be more than one correct answer and a new metric is proposed based on the ratio between human and system performance.

...read moreread less

Abstract: In this paper, we first explore the role of inter-annotator agreement statistics in grammatical error correction and conclude that they are less informative in fields where there may be more than one correct answer. We next created a dataset of 50 student essays, each corrected by 10 different annotators for all error types, and investigated how both human and GEC system scores vary when different combinations of these annotations are used as the gold standard. Upon learning that even humans are unable to score higher than 75% F0.5, we propose a new metric based on the ratio between human and system performance. We also use this method to investigate the extent to which annotators agree on certain error categories, and find that similar results can be obtained from a smaller subset of just 10 essays.

...read moreread less

81 citations

1
2
3
4
…
5

Cited by

PDF

Open Access

More filters

Book•

Information retrieval

[...]

C. J. Van Rijsbergen

01 Jan 1975

TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.

...read moreread less

Abstract: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval. This chapter has been included because I think this is one of the most interesting and active areas of research in information retrieval. There are still many problems to be solved so I hope that this particular chapter will be of some help to those who want to advance the state of knowledge in this area. All the other chapters have been updated by including some of the more recent work on the topics covered. In preparing this new edition I have benefited from discussions with Bruce Croft, The material of this book is aimed at advanced undergraduate information (or computer) science students, postgraduate library science students, and research workers in the field of IR. Some of the chapters, particularly Chapter 6 * , make simple use of a little advanced mathematics. However, the necessary mathematical tools can be easily mastered from numerous mathematical texts that now exist and, in any case, references have been given where the mathematics occur. I had to face the problem of balancing clarity of exposition with density of references. I was tempted to give large numbers of references but was afraid they would have destroyed the continuity of the text. I have tried to steer a middle course and not compete with the Annual Review of Information Science and Technology. Normally one is encouraged to cite only works that have been published in some readily accessible form, such as a book or periodical. Unfortunately, much of the interesting work in IR is contained in technical reports and Ph.D. theses. For example, most the work done on the SMART system at Cornell is available only in reports. Luckily many of these are now available through the National Technical Information Service (U.S.) and University Microfilms (U.K.). I have not avoided using these sources although if the same material is accessible more readily in some other form I have given it preference. I should like to acknowledge my considerable debt to many people and institutions that have helped me. Let me say first that they are responsible for many of the ideas in this book but that only I wish to be held responsible. My greatest debt is to Karen Sparck Jones who taught me to research information retrieval as an experimental science. Nick Jardine and Robin …

...read moreread less

822 citations

Journal Article•DOI•

Cross-Sentence N-ary Relation Extraction with Graph LSTMs

[...]

Nanyun Peng¹, Hoifung Poon², Chris Quirk², Kristina Toutanova³, Wen-tau Yih² - Show less +1 more•Institutions (3)

Johns Hopkins University¹, Microsoft², Google³

05 Apr 2017-Transactions of the Association for Computational Linguistics

TL;DR: A general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction is explored, demonstrating its effectiveness with both conventional supervised learning and distant supervision.

...read moreread less

Abstract: Past work in relation extraction focuses on binary relations in single sentences. Recent NLP inroads in high-valued domains have kindled strong interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory (graph LSTM), which can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unifying way to explore different LSTM approaches and incorporate various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier, making it easy for scaling to arbitrary relation arity n, as well as for multi-task learning with related relations. We evaluated this framework in two important domains in precision medicine and demonstrated its effectiveness with both supervised learning and distant supervision. Cross-sentence extraction produced far more knowledge, and multi-task learning significantly improved extraction accuracy. A thorough analysis comparing various LSTM approaches yielded interesting insight on how linguistic analysis impacts the performance.

...read moreread less

400 citations

Proceedings Article•

Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English

[...]

Daniel Dahlmeier¹, Hwee Tou Ng¹, Siew Mei Wu¹•Institutions (1)

National University of Singapore¹

01 Jun 2013

TL;DR: The annotation schema and the data collection and annotation process of NUCLE are described and an unpublished study of annotator agreement for grammatical error correction is reported on.

...read moreread less

Abstract: We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.

...read moreread less

345 citations

Proceedings Article•DOI•

Semi-supervised Multitask Learning for Sequence Labeling

[...]

Marek Rei¹•Institutions (1)

University of Cambridge¹

01 Jan 2017

TL;DR: The authors proposed a language modeling objective to incentivize the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks.

...read moreread less

Abstract: We propose a sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset. This language modeling objective incentivises the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks. The architecture was evaluated on a range of datasets, covering the tasks of error detection in learner texts, named entity recognition, chunking and POS-tagging. The novel language modeling objective provided consistent performance improvements on every benchmark, without requiring any additional annotated or unannotated data.

...read moreread less

249 citations

Proceedings Article•DOI•

Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction

[...]

Christopher Bryant¹, Mariano Felice², Ted Briscoe²•Institutions (2)

National University of Singapore¹, University of Cambridge²

01 Jul 2017

...read moreread less

241 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184

Collapse