Open Access Proceedings Article
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
Marcos Zampieri,Shervin Malmasi,Preslav Nakov,Ahmed Ali,Suwon Shon,James Glass,Yves Scherrer,Tanja Samardžić,Nikola Ljubešić,Jörg Tiedemann,Chris van der Lee,Stefan Grondelaers,Nelleke Oostdijk,Dirk Speelman,Antal van den Bosch,Ritesh Kumar,Bornini Lahiri,Mayank Jain +18 more
- pp 1-17
TLDR
The results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects and Indo-Aryan Language Identification are presented.
Abstract:
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, co-located with COLING’2018. This year, the campaign comprised five shared tasks: two task re-runs, namely Arabic Dialect Identification (ADI) and German Dialect Identification (GDI), and three new tasks, namely Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.
Citations
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task
TL;DR: The second Nuanced Arabic Dialect Identification Shared Task (NADI 2021) was the first shared task to include four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification, province-level MSA identification, and province-level sub-dialect identification (Subtask 2.2).
Posted Content
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
TL;DR: The results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI), the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level, are presented.
Proceedings Article
A Report on the VarDial Evaluation Campaign 2020
Mihaela Gaman,Dirk Hovy,Radu Tudor Ionescu,Heidi Jauhiainen,Tommi Jauhiainen,Krister Lindén,Nikola Ljubešić,Niko Partanen,Christoph Purschke,Yves Scherrer,Marcos Zampieri +10 more
TL;DR: The VarDial Evaluation Campaign 2020 included three shared tasks each focusing on a different challenge of language and dialect identification: Romanian Dialect Identification (RDI), Social Media Variety Geolocation (SMG), and Uralic Language Identification (ULI).
Proceedings Article
Character Level Convolutional Neural Network for Arabic Dialect Identification.
TL;DR: This system description paper for the ADI shared task describes the system’s architecture and user interfaces in detail.
Proceedings Article
Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media.
TL;DR: The evaluation showed that although capsule networks have not been commonly used in natural language processing tasks, they can outperform existing state-of-the-art solutions for offensive language detection in social media.
References
Proceedings Article
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
TL;DR: High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.
Proceedings Article
Findings of the VarDial Evaluation Campaign 2017
Marcos Zampieri,Shervin Malmasi,Nikola Ljubešić,Preslav Nakov,Ahmed Ali,Jörg Tiedemann,Yves Scherrer,Noëmi Aepli +7 more
TL;DR: The VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which was organized as part of the fourth edition of the VarDial workshop at EACL’2017, is presented.
Proceedings Article
Phonotactic language identification using high quality phoneme recognition.
TL;DR: Four PRLM systems have an Equal Error Rate (EER) of 2.4% on the 12-language task, which compares favorably to the best known result on this task.
Journal Article
Automatic Language Identification in Texts: A Survey
TL;DR: A unified notation is introduced for evaluation methods, applications, as well as off-the-shelf LI systems that do not require training by the end user, to propose future directions for research in LI.
Proceedings Article
A Report on the DSL Shared Task 2014
TL;DR: This paper summarizes the methods, results and findings of the Discriminating between Similar Languages (DSL) shared task 2014, where the best system obtained 95.7% average accuracy.