Findings of the VarDial Evaluation Campaign 2017
Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli, and 7 more
pp. 1-15
TLDR
The VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which was organized as part of the fourth edition of the VarDial workshop at EACL’2017, is presented.
Abstract:
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL’2017. This year, we included four shared tasks: Discriminating between Similar Languages (DSL), Arabic Dialect Identification (ADI), German Dialect Identification (GDI), and Cross-lingual Dependency Parsing (CLP). A total of 19 teams submitted runs across the four tasks, and 15 of them wrote system description papers.
Citations
Proceedings Article
Speech recognition challenge in the wild: Arabic MGB-3
TL;DR: The Arabic MGB-3 Challenge comprised two tasks: speech transcription and Arabic dialect identification. The dialect identification task, introduced this year, distinguishes between four major Arabic dialects (Egyptian, Levantine, North African, and Gulf) as well as Modern Standard Arabic.
Proceedings Article
A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages
TL;DR: This work systematically compares a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration.
Proceedings Article
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects
TL;DR: This work presents a large dataset of more than a quarter of a billion tweets representing a wide range of Arabic dialects. It is more fine-grained than previously reported work in that it is labeled at the city level.
Posted Content
Speech Recognition Challenge in the Wild: Arabic MGB-3
TL;DR: The Arabic MGB-3 Challenge focused on dialectal Arabic, using a multi-genre collection of Egyptian YouTube videos covering comedy, cooking, family/kids, fashion, drama, sports, and science.
Proceedings Article
Learning to Identify Arabic and German Dialects using Multiple Kernels
TL;DR: The proposed approach combines several kernels using multiple kernel learning. Most of the kernels are based on character p-grams extracted from speech transcripts; the approach also uses a kernel based on i-vectors, a low-dimensional representation of audio recordings, which were provided only for the Arabic data.
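The p-gram kernels mentioned in the last entry can be illustrated with a minimal sketch, assuming a plain p-spectrum kernel (the dot product of character p-gram count vectors). This is not the cited authors' implementation: the function names, the default p values (3, 4, 5), the cosine normalization, and the fixed uniform weights are all illustrative assumptions. Genuine multiple kernel learning would learn the weights from data, and the i-vector kernel used for the Arabic audio is not shown.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def pgram_counts(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int) -> int:
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    ca, cb = pgram_counts(a, p), pgram_counts(b, p)
    # A Counter returns 0 for missing keys, so only shared p-grams contribute.
    return sum(n * cb[g] for g, n in ca.items())


def normalized_spectrum_kernel(a: str, b: str, p: int) -> float:
    """Cosine-normalized spectrum kernel; equals 1.0 for a == b if a has p-grams."""
    denom = math.sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    # Strings shorter than p have no p-grams; their similarity is defined as 0.
    return spectrum_kernel(a, b, p) / denom if denom > 0 else 0.0


def combined_kernel(a: str, b: str, ps: Sequence[int] = (3, 4, 5),
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of normalized p-spectrum kernels over several p values.

    Assumption: the weights are fixed (uniform by default); multiple kernel
    learning would instead learn them from training data.
    """
    if weights is None:
        weights = [1.0 / len(ps)] * len(ps)
    return sum(w * normalized_spectrum_kernel(a, b, p) for w, p in zip(weights, ps))


def gram_matrix(texts: Sequence[str], **kwargs) -> List[List[float]]:
    """Pairwise kernel matrix, usable as a precomputed kernel for an SVM."""
    return [[combined_kernel(x, y, **kwargs) for y in texts] for x in texts]


if __name__ == "__main__":
    # Hypothetical toy transcripts, not data from the shared tasks.
    docs = ["grüezi mitenand", "grüessech mitenand", "guten tag zusammen"]
    for row in gram_matrix(docs):
        print(["%.3f" % v for v in row])
```

Because each normalized spectrum kernel is positive semidefinite, a nonnegatively weighted sum of them is as well, so the resulting matrix can be handed to a classifier that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`).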