Universal Dependency Annotation for Multilingual Parsing

Open Access Proceedings Article

Vol. 2, pp. 92-97

TLDR

This paper presents a new collection of treebanks with homogeneous syntactic dependency annotation for six languages (German, English, Swedish, Spanish, French and Korean), made freely available to facilitate research on multilingual dependency parsing.

Abstract:

We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.

Citations

Proceedings Article

Universal Dependencies v1: A Multilingual Treebank Collection

Joakim Nivre, +11 more

TL;DR: This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, and highlights the need for sound comparative evaluation and cross-lingual learning experiments.

Book

Neural Network Methods in Natural Language Processing

Yoav Goldberg, +1 more

TL;DR: Neural networks are a family of powerful machine learning models that have been widely used in natural language processing applications such as machine translation, syntactic parsing, and multi-task learning.

Proceedings Article

CamemBERT: a Tasty French Language Model

Louis Martin, +7 more

10 Nov 2019

arXiv: Computation and Language

TL;DR: This paper investigates the feasibility of training monolingual Transformer-based language models for languages other than English, taking French as an example and evaluating the resulting models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks.

Proceedings Article

Universal Stanford dependencies: A cross-linguistic typology

Marie-Catherine de Marneffe, +6 more

TL;DR: This work proposes a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. It emphasizes the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology.

Proceedings Article

Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser

Long Duong, +3 more

TL;DR: This work proposes a learning method that needs less data, based on the observation that there are underlying shared structures across languages, and exploits cues from a different source language in order to guide the learning process.

References

Proceedings Article

OntoNotes: The 90% Solution

Eduard Hovy, +4 more

TL;DR: This paper describes the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement, which will be made available to the community during 2007.

Posted Content

A Universal Part-of-Speech Tagset

Slav Petrov, +2 more

11 Apr 2011

arXiv: Computation and Language

TL;DR: This paper proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set. When combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages.

Proceedings Article

A Universal Part-of-Speech Tagset

Slav Petrov, +2 more

TL;DR: This work proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set, which, when combined with the original treebank data, produces a dataset consisting of common parts of speech for 22 different languages.

Proceedings Article

The CoNLL 2007 Shared Task on Dependency Parsing

Joakim Nivre, +6 more

TL;DR: This paper defines the tasks of the different tracks, describes how the data sets were created from existing treebanks for ten languages, characterizes the different approaches of the participating systems, reports the test results, and provides a first analysis of these results.

Proceedings Article

Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency

Dan Klein, +1 more

TL;DR: This work presents a generative model for the unsupervised learning of dependency structures and describes the multiplicative combination of this dependency model with a model of linear constituency; the combined model works and is robust cross-linguistically.

Related Papers (5)

Building a large annotated corpus of English: the Penn Treebank

Mitchell Marcus, +2 more

01 Jun 1993

Computational Linguistics

CoNLL-X Shared Task on Multilingual Dependency Parsing

Universal Dependencies v1: A Multilingual Treebank Collection

The Stanford Typed Dependencies Representation

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding