Open Access Proceedings Article
Fine-Grained Arabic Dialect Identification
Mohammad Salameh,Houda Bouamor,Nizar Habash +2 more
- pp 1332-1344
TLDR
This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic, and builds several classification systems and explores a large space of features.
Abstract:
Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic, a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% for sentences with an average length of 7 words (a 9% relative error reduction over the state-of-the-art technique for Arabic dialect identification) and reach more than 90% when we consider 16 words. We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.
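The "9% relative error reduction" quoted in the abstract is a ratio of error rates, not a difference in accuracy points. The sketch below shows the arithmetic under that standard definition. The function names are illustrative and do not come from the paper, and the baseline accuracy it prints is only back-computed from the two rounded figures in the abstract, so it should be read as approximate rather than as a reported result.

```python
def relative_error_reduction(baseline_acc: float, system_acc: float) -> float:
    """Fraction of the baseline's errors that the new system removes."""
    baseline_err = 1.0 - baseline_acc
    system_err = 1.0 - system_acc
    return (baseline_err - system_err) / baseline_err


def implied_baseline_accuracy(system_acc: float, rer: float) -> float:
    """Invert the formula: baseline accuracy implied by a system accuracy and an RER."""
    system_err = 1.0 - system_acc
    return 1.0 - system_err / (1.0 - rer)


# Figures taken from the abstract: 67.9% accuracy, 9% relative error reduction.
# The resulting baseline is an inference from rounded numbers, not a reported value.
baseline = implied_baseline_accuracy(0.679, 0.09)
print(f"implied baseline accuracy: {baseline:.3f}")  # prints 0.647
```

Read this way, the improvement is roughly three accuracy points over the earlier technique, which removes about 9% of its errors.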
Citations
Proceedings Article
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid,Nasser Zalmout,Salam Khalifa,Dima Taji,Mai Oudah,Bashar Alhafni,Go Inoue,Fadhl Eryani,Alexander Erdmann,Nizar Habash +9 more
TL;DR: Describes the design of CAMeL Tools and the functionality it provides, including utilities for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.
Proceedings Article
The MADAR Shared Task on Arabic Fine-Grained Dialect Identification
TL;DR: This shared task is the first to target a large set of dialect labels at the city and country levels and was organized as part of The Fourth Arabic Natural Language Processing Workshop, collocated with ACL 2019.
Journal Article
Arabic natural language processing: An overview
TL;DR: A survey of 90 recent research papers (74% of which were published after 2015) that presents and classifies the work done on the three varieties of Arabic, concentrating on both Arabic and Arabizi, and associates each work with its publicly available resources whenever available.
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
TL;DR: The second Nuanced Arabic Dialect Identification Shared Task (NADI 2021) as discussed by the authors was the first shared task to include four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification, province-level MSA identification, and province-level sub-dialect identification (Subtask 2.2).