Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

Open Access Proceedings Article

- pp 1-17

TL;DR

The results and findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which included a new task on Indo-Aryan Language Identification, are presented.

Abstract:

We present the results and findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, co-located with COLING’2018. This year, the campaign included five shared tasks: two re-runs of earlier tasks, Arabic Dialect Identification (ADI) and German Dialect Identification (GDI), and three new tasks, namely Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.

Citations

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed, +3 more

TL;DR: The second Nuanced Arabic Dialect Identification Shared Task (NADI 2021) was the first shared task to include four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification, province-level MSA identification, and province-level sub-dialect identification (Subtask 2.2).

Posted Content

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed, +3 more

- 21 Oct 2020 -

arXiv: Computation and Language

TL;DR: The results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI), the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level, are presented.

Proceedings Article

A Report on the VarDial Evaluation Campaign 2020

Mihaela Gaman, +10 more

TL;DR: The VarDial Evaluation Campaign 2020 included three shared tasks each focusing on a different challenge of language and dialect identification: Romanian Dialect Identification (RDI), Social Media Variety Geolocation (SMG), and Uralic Language Identification (ULI).

Proceedings Article

Character Level Convolutional Neural Network for Arabic Dialect Identification.

Mohamed Ali

TL;DR: This system description paper for the ADI shared task describes the system’s architecture and user interfaces in detail.

Proceedings Article

Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media.

Hansi Hettiarachchi, +1 more

TL;DR: The evaluation showed that even though the capsule networks have not been used commonly in natural language processing tasks, they can outperform existing state of the art solutions for offensive language detection in social media.

References

Proceedings Article

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

Shervin Malmasi, +5 more

TL;DR: High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.

Proceedings Article

Findings of the VarDial Evaluation Campaign 2017

Marcos Zampieri, +7 more

TL;DR: The VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which was organized as part of the fourth edition of the VarDial workshop at EACL’2017, is presented.

Proceedings Article

Phonotactic language identification using high quality phoneme recognition.

Pavel Matejka, +3 more

TL;DR: Four PRLM systems achieve an Equal Error Rate (EER) of 2.4% on the 12-language task, which compares favorably to the best known result for this task.

Journal Article

Automatic Language Identification in Texts: A Survey

Tommi Jauhiainen, +4 more

- 25 Aug 2019 -

Journal of Artificial Intelligence Resea...

TL;DR: A unified notation is introduced for evaluation methods, applications, as well as off-the-shelf LI systems that do not require training by the end user, to propose future directions for research in LI.

Proceedings Article

A Report on the DSL Shared Task 2014

Marcos Zampieri, +3 more

TL;DR: This paper summarizes the methods, results and findings of the Discriminating between Similar Languages (DSL) shared task 2014, where the best system obtained 95.7% average accuracy.

Related Papers (5)

Findings of the VarDial Evaluation Campaign 2017

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

Automatic Language Identification in Texts: A Survey

A Report on the DSL Shared Task 2014

Overview of the DSL Shared Task 2015