SciSpace
Author

El Moatez Billah Nagoudi

Bio: El Moatez Billah Nagoudi is an academic researcher at the University of British Columbia. He has contributed to research in topics including machine translation and computer science, has an h-index of 7, and has co-authored 33 publications receiving 224 citations.

Papers
Proceedings ArticleDOI
03 Apr 2017
TL;DR: An innovative word embedding-based system for calculating semantic similarity between Arabic sentences, exploiting vectors as word representations in a multidimensional space to capture the semantic and syntactic properties of words.
Abstract: Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas such as information retrieval, plagiarism detection, information extraction and machine translation. This article proposes an innovative word embedding-based system for calculating the semantic similarity of Arabic sentences. The main idea is to exploit vectors as word representations in a multidimensional space in order to capture the semantic and syntactic properties of words. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to support the identification of words that are highly descriptive in each sentence. The performance of our proposed system is confirmed through the Pearson correlation between our assigned semantic similarity scores and human judgments.

55 citations
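The core idea above — IDF-weighted word vectors compared in a shared multidimensional space — can be sketched as follows. The embeddings and IDF values below are toy stand-ins for the trained Arabic word vectors and corpus statistics the paper relies on; this is an illustrative sketch, not the authors' implementation.

```python
import math

# Toy 2-d word vectors and IDF weights; stand-ins for trained Arabic
# embeddings and corpus statistics (hypothetical values).
EMB = {
    "cat": [1.0, 0.0],
    "sat": [0.0, 1.0],
    "dog": [0.9, 0.1],
    "ran": [0.1, 0.9],
}
IDF = {"cat": 2.0, "sat": 1.0, "dog": 2.0, "ran": 1.0}

def sentence_vector(tokens, dim=2):
    """IDF-weighted average of the word vectors in a sentence."""
    vec, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in EMB:
            w = IDF.get(t, 1.0)
            vec = [v + w * x for v, x in zip(vec, EMB[t])]
            total += w
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Two paraphrase-like toy sentences score close to 1.
score = cosine(sentence_vector(["cat", "sat"]), sentence_vector(["dog", "ran"]))
```

Weighting by IDF makes rare, highly descriptive words dominate the sentence vector, which is the role the paper assigns to IDF and POS weighting.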

Proceedings ArticleDOI
01 Aug 2021
TL;DR: The authors introduced two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, for multi-dialectal Arabic language understanding evaluation, which achieved state-of-the-art results across the majority of tasks (37 out of 48 classification tasks, on the 42 datasets).
Abstract: Pre-trained language models (LMs) are currently integral to many natural language processing systems. Although multilingual LMs were also introduced to serve many languages, these have limitations such as being costly at inference time and the size and diversity of non-English data involved in their pre-training. We remedy these issues for a collection of diverse Arabic varieties by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT. To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation. ARLUE is built using 42 datasets targeting six different task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ARLUE, our models collectively achieve new state-of-the-art results across the majority of tasks (37 out of 48 classification tasks, on the 42 datasets). Our best model acquires the highest ARLUE score (77.40) across all six task clusters, outperforming all other models including XLM-R Large (~3.4x larger size). Our models are publicly available at https://github.com/UBC-NLP/marbert and ARLUE will be released through the same repository.

52 citations
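The single ARLUE score (77.40) reported above aggregates performance over the six task clusters. A plausible aggregation, in the style of GLUE-family benchmarks, is a simple macro-average over cluster scores; the sketch below assumes that aggregation, and the cluster names and numbers are made up for illustration.

```python
def arlue_score(cluster_scores):
    """Macro-average over task clusters (GLUE-style aggregation; the exact
    ARLUE formula is an assumption here -- see the paper)."""
    return sum(cluster_scores.values()) / len(cluster_scores)

# Hypothetical per-cluster scores for illustration only.
clusters = {
    "sentiment": 80.0,
    "social_meaning": 75.0,
    "topic": 90.0,
    "dialect_id": 70.0,
    "ner": 78.0,
    "qa": 73.0,
}
overall = arlue_score(clusters)
```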

Posted Content
TL;DR: The authors introduced two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, for multi-dialectal Arabic language understanding evaluation, which achieved state-of-the-art results across the majority of tasks (37 out of 48 classification tasks, on the 42 datasets).
Abstract: Pre-trained language models (LMs) are currently integral to many natural language processing systems. Although multilingual LMs were also introduced to serve many languages, these have limitations such as being costly at inference time and the size and diversity of non-English data involved in their pre-training. We remedy these issues for a collection of diverse Arabic varieties by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT. To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation. ARLUE is built using 42 datasets targeting six different task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ARLUE, our models collectively achieve new state-of-the-art results across the majority of tasks (37 out of 48 classification tasks, on the 42 datasets). Our best model acquires the highest ARLUE score (77.40) across all six task clusters, outperforming all other models including XLM-R Large (~ 3.4 x larger size). Our models are publicly available at this https URL and ARLUE will be released through the same repository.

35 citations

Journal ArticleDOI
06 Jun 2020
TL;DR: A new scalable encryption technique, called FlexenTech, to protect IoT data during storage and in transit, which offers a low encryption time, defends against common attacks such as replay attacks and defines a configurable mode, where any number of rounds or key sizes may be used.
Abstract: IoT promises a new era of connectivity that goes beyond laptops and smart connected devices to connected vehicles, smart homes, smart cities and connected healthcare. The huge volume of data collected from millions of IoT devices raises information security and privacy concerns for users. This paper presents a new scalable encryption technique, called Flexible Encryption Technique (FlexenTech), to protect IoT data during storage and in transit. FlexenTech is suitable for resource-constrained devices and networks. It offers a low encryption time, defends against common attacks such as replay attacks, and defines a configurable mode in which any number of rounds or key sizes may be used. Experimental analysis of FlexenTech shows its robustness in terms of its multiple configurable confidentiality levels by allowing various configurations. This configurability provides several advantages for resource-constrained devices, including reducing the encryption computation time by up to 9.7% when compared to its best rivals in the literature.

25 citations
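The paper's abstract does not give FlexenTech's round function, so the sketch below uses a generic Feistel network only to illustrate the configurable-rounds idea: the number of rounds is simply the length of the key schedule, and decryption replays the rounds in reverse. The round function here is hypothetical and deliberately toy-grade — it is not FlexenTech itself and is not secure.

```python
def _round_f(half, key):
    # Toy 16-bit round function (illustrative only, not cryptographically sound).
    return (half * 31 + key) & 0xFFFF

def encrypt(block, round_keys):
    """Feistel encryption of a 32-bit block; len(round_keys) sets the
    number of rounds, mirroring the configurable-rounds idea."""
    left, right = (block >> 16) & 0xFFFF, block & 0xFFFF
    for k in round_keys:
        left, right = right, left ^ _round_f(right, k)
    return (left << 16) | right

def decrypt(block, round_keys):
    """Invert encryption by replaying the key schedule in reverse."""
    left, right = (block >> 16) & 0xFFFF, block & 0xFFFF
    for k in reversed(round_keys):
        left, right = right ^ _round_f(left, k), left
    return (left << 16) | right
```

Because the round count is just the length of the key schedule, a constrained device can trade security margin for speed by shortening it — the kind of configurability the paper attributes to FlexenTech.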

Book ChapterDOI
11 Oct 2017
TL;DR: Two word embedding-based approaches for measuring the semantic similarity between Arabic-English cross-language sentences are proposed, and the methods are validated through the Pearson correlation between the similarity scores and human ratings.
Abstract: Semantic Textual Similarity (STS) is an important component in many Natural Language Processing (NLP) applications and plays an important role in diverse areas such as information retrieval, machine translation, information extraction and plagiarism detection. In this paper we propose two word embedding-based approaches for measuring the semantic similarity between Arabic-English cross-language sentences. The main idea is to exploit Machine Translation (MT) and improved word embedding representations in order to capture the syntactic and semantic properties of words. MT is used to translate English sentences into Arabic so that a classical monolingual comparison can be applied. Afterwards, two word embedding-based methods are developed to rate the semantic similarity. Additionally, Word Alignment (WA), Inverse Document Frequency (IDF) and Part-of-Speech (POS) weighting are applied to the examined sentences to support the identification of words that are most descriptive in each sentence. The performance of our approaches is evaluated on a cross-language dataset containing more than 2,400 Arabic-English sentence pairs. Moreover, the proposed methods are validated through the Pearson correlation between our similarity scores and human ratings.

19 citations
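Once MT has mapped both sentences into Arabic, the word-alignment (WA) idea can be sketched as: align each word to its most similar word in the other sentence, then average the best-match cosines under IDF weights. The embeddings, IDF values, and exact weighting below are hypothetical — a sketch of the idea, not the paper's formula.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def alignment_similarity(sent_a, sent_b, emb, idf):
    """Greedy word alignment: each word in sent_a is matched to its most
    similar word in sent_b; best-match cosines are IDF-weighted and averaged."""
    num = den = 0.0
    for w in sent_a:
        if w not in emb:
            continue
        best = max((cosine(emb[w], emb[v]) for v in sent_b if v in emb),
                   default=0.0)
        weight = idf.get(w, 1.0)
        num += weight * best
        den += weight
    return num / den if den else 0.0

# Toy post-translation vocabulary (hypothetical vectors and IDF weights).
emb = {"car": [1.0, 0.0], "auto": [1.0, 0.1], "red": [0.0, 1.0]}
idf = {"car": 2.0, "auto": 2.0, "red": 1.0}
score = alignment_similarity(["car", "red"], ["auto", "red"], emb, idf)
```

A symmetric variant would average the score over both alignment directions; the one-directional form is kept here for brevity.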


Cited by
01 Jan 1978
TL;DR: This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.), and is a "must-have" reference for every serious programmer's digital library.
Abstract: This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.). One of the best-selling programming books published in the last fifty years, "K&R" has been called everything from the "bible" to "a landmark in computer science", and it has influenced generations of programmers. Available now for all leading ebook platforms, this concise and beautifully written text is a "must-have" reference for every serious programmer's digital library. As modestly described by the authors in the Preface to the First Edition, this "is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable colleague will help."

2,120 citations

Proceedings ArticleDOI
TL;DR: The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.
Abstract: Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

1,124 citations
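STS systems like those above are scored by the Pearson correlation between system similarity scores and gold human ratings. A minimal scorer follows; the gold and system values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, the standard STS evaluation metric."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical gold ratings (0-5 STS scale) vs. system scores.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
system = [0.2, 0.9, 2.7, 3.8, 4.9]
r = pearson(gold, system)
```

A correlation near 1.0 means the system ranks and spaces sentence pairs much as human annotators do, which is exactly what the shared task leaderboards measure.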

Posted Content
TL;DR: This work presents the first lattice-based IBE scheme with practical parameters and obtains digital signature schemes which are shorter than the previously most-compact ones of Ducas, Durmus, Lepoint, and Lyubashevsky from Crypto 2013.
Abstract: Efficient implementations of lattice-based cryptographic schemes have been limited to only the most basic primitives like encryption and digital signatures. The main reason for this limitation is that at the core of many advanced lattice primitives is a trapdoor sampling algorithm (Gentry, Peikert, Vaikuntanathan, STOC 2008) that produced outputs that were too long for practical applications. In this work, we show that using a particular distribution over NTRU lattices can make GPV-based schemes suitable for practice. More concretely, we present the first lattice-based IBE scheme with practical parameters – key and ciphertext sizes are between two and four kilobytes, and all encryption and decryption operations take approximately one millisecond on a moderately-powered laptop. As a by-product, we also obtain digital signature schemes which are shorter than the previously most-compact ones of Ducas, Durmus, Lepoint, and Lyubashevsky from Crypto 2013.

153 citations