Author

Samhaa R. El-Beltagy

Bio: Samhaa R. El-Beltagy is an academic researcher from Nile University. The author has contributed to research in topics: Sentiment analysis & Modern Standard Arabic. The author has an h-index of 20 and has co-authored 80 publications receiving 1802 citations. Previous affiliations of Samhaa R. El-Beltagy include Cairo University & Charles University in Prague.


Papers
Journal ArticleDOI
TL;DR: The paper describes the resources used for building the models, the data cleaning techniques employed, the preprocessing step carried out, and the details of the word embedding creation techniques.

380 citations

Journal ArticleDOI
TL;DR: The KP-Miner system is presented, and it is demonstrated through experimentation and comparison with widely used systems that it is effective and efficient in extracting keyphrases from both English and Arabic documents of varied length.

210 citations

Book ChapterDOI
14 Apr 2015
TL;DR: Large multi-domain datasets for Sentiment Analysis in Arabic are generated; they were scraped from different reviewing websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products.
Abstract: While there has been recent progress in the area of Arabic Sentiment Analysis, most of the resources in this area are either of limited size, domain specific or not publicly available. In this paper, we address this problem by generating large multi-domain datasets for Sentiment Analysis in Arabic. The datasets were scraped from different reviewing websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products. Moreover, we build multi-domain lexicons from the generated datasets. Different experiments have been carried out to validate the usefulness of the datasets and the generated lexicons for the task of sentiment classification. From the experimental results, we highlight some useful insights addressing: the best performing classifiers and feature representation methods, the effect of introducing lexicon based features and factors affecting the accuracy of sentiment classification in general. All the datasets, experiment code and results have been made publicly available for scientific purposes.

164 citations

Proceedings ArticleDOI
17 Mar 2013
TL;DR: The paper presents a case study the goal of which is to investigate the possibility of determining the semantic orientation of Arabic Egyptian tweets and comments given limited Arabic resources; one outcome of the study is an Egyptian dialect sentiment lexicon.
Abstract: With the rapid increase in the volume of Arabic opinionated posts on different microblogging mediums comes an increasing demand for Arabic sentiment analysis tools. Yet, research in the area of Arabic sentiment analysis is progressing at a very slow pace compared to that being carried out in English and other languages. This paper highlights the major problems and open research issues that face sentiment analysis of Arabic social media. The paper also presents a case study the goal of which is to investigate the possibility of determining the semantic orientation of Arabic Egyptian tweets and comments given limited Arabic resources. One of the outcomes of the presented study is an Egyptian dialect sentiment lexicon.

134 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The results and conclusions of the participation in SemEval-2017 task 8: Determining rumour veracity and support for rumours, which involved participation in 2 subtasks, are presented.
Abstract: This paper presents the results and conclusions of our participation in SemEval-2017 task 8: Determining rumour veracity and support for rumours. We have participated in 2 subtasks: SDQC (Subtask A) which deals with tracking how tweets orient to the accuracy of a rumourous story, and Veracity Prediction (Subtask B) which deals with the goal of predicting the veracity of a given rumour. Our participation was in the closed task variant, in which the prediction is made solely from the tweet itself. For subtask A, linear support vector classification was applied to a model of bag of words, and the help of a naïve Bayes classifier was used for semantic feature extraction. For subtask B, a similar approach was used. Many features were used during the experimentation process but only a few proved to be useful with the data set provided. Our system achieved 71% accuracy and ranked 5th among 8 systems for subtask A and achieved 53% accuracy with the lowest RMSE value of 0.672 ranking at the first place among 5 systems for subtask B.

97 citations


Cited by
Patent
25 Jan 2001
TL;DR: A decoding process extracts the identifier from a media object, possibly with additional context information, and forwards it to a server, which in turn maps the identifier to an action, such as returning metadata, re-directing the request to one or more other servers, requesting information from another server to identify the media object, etc.
Abstract: Media objects are transformed into active, connected objects via identifiers embedded into them or their containers. In the context of a user's playback experience, a decoding process extracts the identifier from a media object and possibly additional context information and forwards it to a server. The server, in turn, maps the identifier to an action, such as returning metadata, re-directing the request to one or more other servers, requesting information from another server to identify the media object, etc. The linking process applies to broadcast objects as well as objects transmitted over networks in streaming and compressed file formats.

1,026 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: SemEval-2016 Task 4, the fourth year of the Sentiment Analysis in Twitter task, comprises five subtasks, three of which represent a significant departure from previous editions; the three new subtasks focus on two variants of the basic sentiment classification in Twitter task.
Abstract: This paper discusses the fourth year of the “Sentiment Analysis in Twitter Task”. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. The first two subtasks are reruns from prior years and ask to predict the overall sentiment, and the sentiment towards a topic in a tweet. The three new subtasks focus on two variants of the basic “sentiment classification in Twitter” task. The first variant adopts a five-point scale, which confers an ordinal character to the classification task. The second variant focuses on the correct estimation of the prevalence of each class of interest, a task which has been called quantification in the supervised learning literature. The task continues to be very popular, attracting a total of 43 teams.

702 citations

Proceedings ArticleDOI
12 Aug 2016
TL;DR: The results of the WMT16 shared tasks are presented, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task.
Abstract: This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.

616 citations