Author

Roman Grundkiewicz

Bio: Roman Grundkiewicz is an academic researcher at Microsoft whose work focuses on machine translation and grammatical error correction. He has an h-index of 21 and has co-authored 47 publications receiving 1,658 citations. His previous affiliations include Adam Mickiewicz University in Poznań.


Papers
Posted Content
TL;DR: Marian is an efficient and self-contained Neural Machine Translation framework, written entirely in C++, with an integrated automatic differentiation engine based on dynamic computation graphs; it achieves high training and translation speed.
Abstract: We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

220 citations
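The core idea named in the abstract, reverse-mode automatic differentiation over a dynamically built computation graph, can be illustrated with a toy sketch. This is purely illustrative Python (Marian itself is a far more sophisticated C++ engine); the `Node` class and its methods are my own invented stand-ins, not Marian's API.

```python
# Toy reverse-mode autodiff on a dynamic computation graph: the graph is
# built on the fly as Python expressions execute, then walked backwards.
class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value          # forward value
        self.parents = parents      # nodes this one depends on
        self.grad_fns = grad_fns    # local derivative w.r.t. each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other),
                    (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g: g * other.value, lambda g: g * self.value))

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then propagate to parents.
        self.grad += grad
        for parent, gf in zip(self.parents, self.grad_fns):
            parent.backward(gf(grad))

x = Node(3.0)
y = Node(4.0)
z = x * y + x          # the graph for z is recorded as this line runs
z.backward()
print(x.grad)          # dz/dx = y + 1 = 5.0
print(y.grad)          # dz/dy = x = 3.0
```

Because the graph is rebuilt each forward pass, control flow (loops, conditionals) in the model code "just works", which is what makes dynamic graphs attractive for research toolkits.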

Book Chapter
01 Sep 2013
TL;DR: By applying methods for automatically extracting errors of various kinds (spelling, typographical, grammatical, syntactic, semantic, and stylistic) from text edition histories to Wikipedia's article revision history, the authors created PlEWi, a large, publicly available corpus of naturally occurring language errors in Polish.
Abstract: There are no large error corpora for a number of languages, despite their multiple applications in natural language processing. The main reason for this is the high cost of manual corpus creation. In this paper we present methods for the automatic extraction of various kinds of errors, such as spelling, typographical, grammatical, syntactic, semantic, and stylistic ones, from text edition histories. By applying these methods to Wikipedia's article revision history, we created PlEWi, a large and publicly available corpus of naturally occurring language errors for Polish. Finally, we analyse and evaluate the detected error categories in our corpus.

178 citations

Proceedings Article
02 Aug 2019
TL;DR: This work proposes a simple and surprisingly effective unsupervised synthetic error generation method based on confusion sets extracted from a spellchecker to increase the amount of training data.
Abstract: Considerable effort has been made to address the data sparsity problem in neural grammatical error correction. In this work, we propose a simple and surprisingly effective unsupervised synthetic error generation method based on confusion sets extracted from a spellchecker to increase the amount of training data. Synthetic data is used to pre-train a Transformer sequence-to-sequence model, which not only improves over a strong baseline trained on authentic error-annotated data, but also enables the development of a practical GEC system in a scenario where little genuine error-annotated data is available. The developed systems placed first in the BEA19 shared task, achieving 69.47 and 64.24 F0.5 in the restricted and low-resource tracks respectively, both on the W&I+LOCNESS test set. On the popular CoNLL 2014 test set, we report state-of-the-art results of 64.16 M² for the submitted system, and 61.30 M² for the constrained system trained on the NUCLE and Lang-8 data.

155 citations
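The confusion-set idea above can be sketched in a few lines: replace words in clean text with plausible alternatives drawn from a confusion set, yielding (noisy, clean) training pairs. This is a toy illustration, not the paper's pipeline; the paper extracts its confusion sets from a spellchecker, whereas the `CONFUSION_SETS` dictionary and `corrupt` helper below are hand-made stand-ins.

```python
import random

# Hand-made stand-in for spellchecker-derived confusion sets.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "then": ["than"],
    "affect": ["effect"],
    "to": ["too", "two"],
}

def corrupt(sentence, error_rate=0.3, rng=random):
    """Return a noisy copy of `sentence`, to pair with the clean original."""
    noisy = []
    for word in sentence.split():
        alts = CONFUSION_SETS.get(word.lower())
        if alts and rng.random() < error_rate:
            noisy.append(rng.choice(alts))   # swap in a confusable word
        else:
            noisy.append(word)
    return " ".join(noisy)

clean = "then they went to their house"
print(corrupt(clean, error_rate=1.0, rng=random.Random(0)))
```

Running this over large amounts of clean monolingual text produces the synthetic parallel data used to pre-train the sequence-to-sequence model before fine-tuning on authentic error-annotated data.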

Proceedings Article
06 Jun 2018
TL;DR: This article proposed a set of model-independent methods for neural grammatical error correction (GEC) that can be easily applied in most GEC settings, including adding source-side noise, domain-adaptation techniques, a GEC-specific training-objective, transfer learning with monolingual data, and ensembling of independently trained GEC models and language models.
Abstract: Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines. We demonstrate parallels between neural GEC and low-resource neural MT and successfully adapt several methods from low-resource MT to neural GEC. We further establish guidelines for trustable results in neural GEC and propose a set of model-independent methods for neural GEC that can be easily applied in most GEC settings. Proposed methods include adding source-side noise, domain-adaptation techniques, a GEC-specific training-objective, transfer learning with monolingual data, and ensembling of independently trained GEC models and language models. The combined effects of these methods result in better than state-of-the-art neural GEC models that outperform previously best neural GEC systems by more than 10% M² on the CoNLL-2014 benchmark and 5.9% on the JFLEG test set. Non-neural state-of-the-art systems are outperformed by more than 2% on the CoNLL-2014 benchmark and by 4% on JFLEG.

137 citations
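Of the model-independent methods listed, "adding source-side noise" is easy to sketch: lightly perturb the erroneous source sentence while leaving the corrected target untouched, so the model cannot simply copy the source. The exact noise used in the paper is not specified here; the word-dropout and character-swap scheme and the `add_source_noise` helper below are one plausible, illustrative form.

```python
import random

def add_source_noise(source, word_drop=0.1, char_swap=0.1, rng=random):
    """Perturb the source side of a (source, target) GEC training pair."""
    words = []
    for word in source.split():
        if rng.random() < word_drop:
            continue                      # drop the word entirely
        if len(word) > 3 and rng.random() < char_swap:
            i = rng.randrange(len(word) - 1)
            # swap two adjacent characters
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)

pair = ("She go to school yesterday", "She went to school yesterday")
noisy_source = add_source_noise(pair[0], rng=random.Random(1))
training_example = (noisy_source, pair[1])  # target stays clean
```

The perturbation acts as a regularizer, analogous to noising schemes used in low-resource MT, which is the parallel the paper draws.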

Proceedings Article
01 Apr 2018
TL;DR: Marian is an efficient and self-contained Neural Machine Translation framework, written entirely in C++, with an integrated automatic differentiation engine based on dynamic computation graphs.
Abstract: We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

114 citations


Cited by
Posted Content
TL;DR: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks, with support for distributed training across multiple GPUs and machines.
Abstract: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at this https URL

1,650 citations

Proceedings Article
12 Aug 2016
TL;DR: The results of the WMT16 shared tasks are presented, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task.
Abstract: This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.

616 citations

Proceedings Article
01 Jun 2014
TL;DR: The CoNLL-2014 shared task was devoted to grammatical error correction: participating systems were expected to detect and correct grammatical errors of all types.
Abstract: The CoNLL-2014 shared task was devoted to grammatical error correction of all error types. In this paper, we give the task definition, present the data sets, and describe the evaluation metric and scorer used in the shared task. We also give an overview of the various approaches adopted by the participating teams, and present the evaluation results. Compared to the CoNLL-2013 shared task, we have introduced the following changes in CoNLL-2014: (1) A participating system is expected to detect and correct grammatical errors of all types, instead of just the five error types in CoNLL-2013; (2) The evaluation metric was changed from F1 to F0.5, to emphasize precision over recall; and (3) We have two human annotators who independently annotated the test essays, compared to just one human annotator in CoNLL-2013.

484 citations
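The switch from F1 to F0.5 mentioned above weights precision twice as heavily as recall, which matters for GEC because a wrong "correction" is worse than a missed one. A minimal sketch of the general F-beta formula (the `f_beta` helper is my own, not the task's M² scorer, which also handles edit matching):

```python
# F_beta = (1 + b^2) * P * R / (b^2 * P + R); beta < 1 favors precision.
def f_beta(precision, recall, beta):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A precise but low-recall system scores better under F0.5 than F1:
p, r = 0.8, 0.4
print(round(f_beta(p, r, 0.5), 4))   # 0.6667
print(round(f_beta(p, r, 1.0), 4))   # 0.5333
```

With P = 0.8 and R = 0.4, F0.5 ≈ 0.67 versus F1 ≈ 0.53, showing how the metric rewards conservative, high-precision systems.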

Proceedings Article
02 Aug 2019
TL;DR: This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019, in which participants were asked to build machine translation systems for any of 18 language pairs, evaluated on a test set of news stories.
Abstract: This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

433 citations

Proceedings Article
31 Oct 2018
TL;DR: This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018, in which participants were asked to build machine translation systems for any of 7 language pairs in both directions, evaluated on a test set of news stories.
Abstract: This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test suites to probe specific aspects of translation.

390 citations