Open Access Proceedings Article
Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, Jungmee Lee
Vol. 2, pp. 92-97
TL;DR: A new collection of treebanks with homogeneous syntactic dependency annotation for six languages (German, English, Swedish, Spanish, French, and Korean) is presented and made freely available to facilitate research on multilingual dependency parsing.
Abstract:
We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.
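One concrete reason homogeneous annotation makes evaluation more reliable: when treebanks use different label sets, only head attachment can be compared meaningfully across languages, whereas a shared label set also makes labeled scores comparable. The minimal Python sketch below computes unlabeled and labeled attachment score (UAS and LAS), assuming each token carries a head index and a relation label. The `Token` class, the `attachment_scores` function, and the example labels are hypothetical illustrations, not the paper's evaluation code, and conventions such as excluding punctuation are not handled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    head: int    # index of the head token (1-based); 0 marks the root
    deprel: str  # dependency relation label, e.g. "nsubj" (illustrative)


def attachment_scores(
    gold_sents: Sequence[List[Token]],
    pred_sents: Sequence[List[Token]],
) -> Tuple[float, float]:
    """Return corpus-level (UAS, LAS), micro-averaged over all tokens."""
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = head_ok = head_and_label_ok = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        for g, p in zip(gold, pred):
            total += 1
            if g.head == p.head:
                head_ok += 1                # correct head: counts toward UAS
                if g.deprel == p.deprel:
                    head_and_label_ok += 1  # correct head and label: LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return head_ok / total, head_and_label_ok / total


# "The dog barks": every head is correct, but "dog" is mislabeled
# (hypothetical labels chosen only for illustration).
gold = [[Token(2, "det"), Token(3, "nsubj"), Token(0, "root")]]
pred = [[Token(2, "det"), Token(3, "dobj"), Token(0, "root")]]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # → 1.0 0.6666666666666666
```

The example shows why LAS can only be compared across languages when the label inventories agree: the UAS here would be the same under any label set, while the LAS depends on it.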
Citations
Proceedings Article
Universal Dependencies v1: A Multilingual Treebank Collection
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman
TL;DR: This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, and highlights the need for sound comparative evaluation and cross-lingual learning experiments.
Book
Neural Network Methods in Natural Language Processing
Yoav Goldberg, Graeme Hirst
TL;DR: Neural networks are a family of powerful machine learning models that have been widely used in natural language processing for tasks such as machine translation and syntactic parsing, as well as in multi-task learning.
Proceedings Article
CamemBERT: a Tasty French Language Model
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
TL;DR: This paper investigates the feasibility of training monolingual Transformer-based language models for languages other than English, taking French as an example and evaluating the resulting models on part-of-speech tagging, dependency parsing, named entity recognition, and natural language inference.
Proceedings Article
Universal Stanford Dependencies: A cross-linguistic typology
Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, Christopher D. Manning
TL;DR: This work proposes a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. It also emphasizes the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology.
Proceedings Article
Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
TL;DR: This work proposes a learning method that needs less data, based on the observation that there are underlying shared structures across languages, and exploits cues from a different source language in order to guide the learning process.
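The two-layered taxonomy in the Universal Stanford Dependencies entry (universal relations, optionally refined by language-specific ones) can be illustrated with a small sketch. It assumes the `universal:subtype` colon notation that Universal Dependencies uses for language-specific extensions (for example `nmod:poss` or `acl:relcl`); the function names `split_relation`, `to_universal`, and `labels_match` are hypothetical.

```python
from typing import Optional, Tuple


def split_relation(label: str) -> Tuple[str, Optional[str]]:
    """Split a 'universal:subtype' label into its two layers.

    Returns (universal_relation, subtype); subtype is None when the
    label carries no language-specific extension.
    """
    universal, sep, subtype = label.partition(":")
    return universal, (subtype if sep else None)


def to_universal(label: str) -> str:
    """Collapse a label to its universal layer for cross-lingual comparison."""
    return split_relation(label)[0]


def labels_match(gold: str, pred: str, universal_only: bool = True) -> bool:
    """Compare labels on the universal layer only, or on the full label."""
    if universal_only:
        return to_universal(gold) == to_universal(pred)
    return gold == pred


print(split_relation("nmod:poss"))                            # → ('nmod', 'poss')
print(labels_match("nmod:poss", "nmod"))                      # → True
print(labels_match("nmod:poss", "nmod", universal_only=False))  # → False
```

Comparing only the universal layer lets a parser trained on one language be scored against another language's treebank even when the two treebanks refine relations differently; the full-label comparison remains available for monolingual evaluation.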