CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis M. Tyers, Elena Badmaeva, Memduh Gökırmak, Anna Nedoluzhko, Silvie Cinková, Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li +60 more
Vol. 1, Iss. 1, pp. 1–19
TL;DR: The paper defines the task and evaluation methodology, describes how the data sets were prepared, reports and analyzes the main results, and provides a brief categorization of the participating systems' approaches.
Citations
Proceedings Article
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
TL;DR: This work introduces Stanza, an open-source Python natural language processing toolkit supporting 66 human languages that features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
Proceedings Article
How Multilingual is Multilingual BERT?
TL;DR: This article showed that M-BERT is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language.
Proceedings Article
Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe
Milan Straka, Jana Straková +1 more
TL;DR: An update to UDPipe 1.0 (Straka et al., 2016), a trainable pipeline which performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing, which provides models for all 50 languages of UD 2.0.
Posted Content
How Multilingual is Multilingual BERT?
TL;DR: It is concluded that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs, and that the model can find translation pairs.
Proceedings Article
Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task
TL;DR: This paper describes the neural dependency parser submitted by Stanford to the CoNLL 2017 Shared Task on parsing Universal Dependencies; the system was ranked first according to all five relevant metrics.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba +1 more
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
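The "adaptive estimates of lower-order moments" mentioned in the TL;DR can be written out; a sketch of the standard Adam update, with step size α, decay rates β₁ and β₂, gradient g_t, and a small constant ε for numerical stability:

```latex
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t        && \text{(first-moment estimate)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2      && \text{(second-moment estimate)} \\
\hat m_t &= m_t / (1 - \beta_1^t), \qquad \hat v_t = v_t / (1 - \beta_2^t) && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \hat m_t / \bigl(\sqrt{\hat v_t} + \epsilon\bigr)
\end{align*}
```

The bias-correction terms compensate for the moment estimates being initialized at zero, which matters most in the first few steps.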
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
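The "constant error carousel" of the TL;DR is the additive cell-state update: because c_t is updated by addition rather than repeated matrix multiplication, error can flow back through many time steps largely undiminished. A sketch of the commonly used gated formulation (note this includes the forget gate f_t, a later addition to the original 1997 architecture):

```latex
\begin{align*}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{align*}
```

When f_t ≈ 1 and i_t ≈ 0, the cell state is carried forward unchanged, which is what lets the network bridge long time lags.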
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
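The mechanism in the TL;DR is simple enough to sketch directly. Below is a minimal illustration of the "inverted dropout" variant (the function name and list-based interface are illustrative; frameworks apply the same idea element-wise to tensors): each unit is zeroed with probability p during training, and survivors are rescaled so the expected activation is unchanged, meaning no rescaling is needed at test time.

```python
import random

def dropout(values, p, training=True, seed=None):
    # Inverted dropout: zero each unit with probability p and scale
    # survivors by 1/(1-p) so the expected activation stays the same.
    if not training or p == 0.0:
        return list(values)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

At test time (`training=False`) the input passes through untouched, which is the practical payoff of folding the 1/(1-p) scaling into training.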
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
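The negative-sampling alternative mentioned in the TL;DR replaces the full softmax over the vocabulary: for an observed (input, output) word pair (w_I, w_O), the objective maximized is

```latex
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
      \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

where σ is the logistic function and the k noise words w_i are drawn from a noise distribution P_n(w). The model thus only needs to distinguish the true context word from k sampled negatives instead of normalizing over the whole vocabulary.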
Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose a model that learns to (soft-)search for the parts of a source sentence relevant to predicting a target word, without having to form these parts as explicit hard segments.
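The soft search described in the TL;DR can be sketched as an attention computation: an alignment score between decoder state s_{i-1} and each encoder annotation h_j is normalized into weights, which mix the annotations into a per-target-word context vector:

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
```

Because the weights α_{ij} are a softmax rather than a hard selection, every source position contributes a little, and the whole mechanism stays differentiable and trainable end to end.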