Author

Wael Hassan Gomaa

Other affiliations: Modern Academy In Maadi
Bio: Wael Hassan Gomaa is an academic researcher from Beni-Suef University. The author has contributed to research in topics: Semantic similarity & Document clustering. The author has an h-index of 7 and has co-authored 11 publications receiving 727 citations. Previous affiliations of Wael Hassan Gomaa include Modern Academy In Maadi.

Papers
Journal ArticleDOI
TL;DR: Overall, the obtained correlation and error rate results prove that the presented system performs well enough for deployment in a real scoring environment.
Abstract: In this paper, we explore text similarity techniques for the task of automatic short answer scoring in the Arabic language. We compare a number of string-based and corpus-based similarity measures, evaluate the effect of combining these measures, handle students' answers both holistically and partially, provide immediate useful feedback to students, and introduce a new benchmark Arabic data set that contains 50 questions and 600 student answers. Overall, the obtained correlation and error rate results prove that the presented system performs well enough for deployment in a real scoring environment. General Terms: Natural Language Processing, Text Mining

20 citations

Proceedings ArticleDOI
01 Dec 2018
TL;DR: A classification model based on supervised machine learning techniques is proposed to detect credibility on Twitter using both content-based and source-based features; it achieves a 22% improvement in F1-measure over CRF, which applies the same approach.
Abstract: Twitter is the most popular micro-blogging medium; it allows users to exchange short messages and provides a platform for public figures to share news. Nowadays, Twitter has an average of 328 million monthly active users and is growing rapidly. Detecting the credibility of information shared on Twitter has become a necessity, especially during high-impact events. In this paper, a classification model based on supervised machine learning techniques is proposed to detect credibility. The proposed model uses an extensive set of features, including both content-based and source-based features. The research compares the performance of five different machine learning classifiers using three feature sets: content-based, source-based, and a combination of both. The best performance is achieved when using the combined feature set and applying Random Forests as the classifier, with accuracy 78.4%, precision 79.6%, recall 91.6% and F1-measure 85.2%. Experiments also revealed that the proposed model achieves a 22% improvement in F1-measure over CRF, which applies the same approach. Feature analysis is presented to highlight the importance of the source-based features compared with the content-based features as deciders for credibility.
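As a quick consistency check on the figures quoted in the abstract above, the F1-measure is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision (79.6%) and recall (91.6%); the function name `f1_score` is an illustrative choice, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the Random Forests model.
reported_f1 = f1_score(0.796, 0.916)
print(f"F1 = {reported_f1:.3f}")
```

The recomputed value rounds to 0.852, which agrees with the 85.2% F1-measure stated in the abstract.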

16 citations

Journal ArticleDOI
TL;DR: This paper proposes a hybrid model that combines the text similarity approach with a deep learning approach in order to improve paraphrase detection, and evaluates it on the Microsoft Research Paraphrase Corpus dataset.
Abstract: Paraphrase detection (PD) is an essential task in natural language processing. The goal of paraphrase detection is to check whether two statements written in natural language have identical meaning. It is important in many fields, such as plagiarism detection, question answering, document clustering and information retrieval. This paper proposes a hybrid model that combines the text similarity approach with a deep learning approach in order to improve paraphrase detection. Evaluated on the Microsoft Research Paraphrase Corpus (MSRP) dataset, the model achieves an accuracy of about 76.6% and an F-measure of about 83.5%.

7 citations

Proceedings Article
01 Jan 2008
TL;DR: The experimental results proved that the efficiency of document clustering using WSD increases linearly with the size of the document dataset, and different part-of-speech taggers were tested to determine the best.
Abstract: In computational linguistics, word sense disambiguation (WSD) is the problem of determining in which sense a word having a number of distinct senses is used in a given sentence. This paper handles text document clustering as one of the major tasks of text processing. Document clustering is the process of finding groups of information in text documents and clustering these documents into the most relevant groups. Large document corpora suffer from ambiguity problems such as synonymy, polysemy and other semantic relations. For this reason, we perform the WSD task for all terms in all documents to get the best sense to be used as document features in the clustering process. Our experimental results proved that the efficiency of document clustering using WSD increases linearly with the size of the document dataset. Different part-of-speech (POS) taggers were tested to determine the best; the effect of different window sizes on the WSD task was also compared.

5 citations

Journal ArticleDOI
TL;DR: This paper explores the impact of applying various deep learning techniques to SMS spam filtering by comparing the results of seven different deep neural network architectures and six classical machine learning classifiers.
Abstract: Over the past decade, phone calls and bulk SMS have been fashionable. Although many advertisers assume that SMS has died, it is still alive. It is one of the simplest and most cost-effective marketing tools for companies to communicate on a personal level with their customers. The spread of SMS has led to the risk of spam. Most previous studies that attempted to detect spam were based on manually extracted features and classical machine learning classifiers. This paper explores the impact of applying various deep learning techniques to SMS spam filtering by comparing the results of seven different deep neural network architectures and six classical machine learning classifiers. The proposed methodologies are based on the automatic extraction of the required features. On a benchmark data set consisting of 5574 records, an accuracy of 99.26% was achieved using the Random Multimodel Deep Learning (RMDL) architecture.

3 citations


Cited by
Journal ArticleDOI

164 citations

Journal ArticleDOI
Justin Farrell
TL;DR: In this article, an application of network science reveals the institutional and corporate structure of the climate change counter-movement in the United States, while computational text analysis shows its influence in the news media and within political circles.
Abstract: An application of network science reveals the institutional and corporate structure of the climate change counter-movement in the United States, while computational text analysis shows its influence in the news media and within political circles.

144 citations

Proceedings ArticleDOI
26 Apr 2016
TL;DR: This research implemented Term Frequency-Inverse Document Frequency (TF-IDF) weighting and Cosine Similarity to measure the degree of term similarity between documents and to rank documents by how closely they match the expert's document.
Abstract: The development of technology in the educational field makes learning, file sharing, assignment and assessment easier through a variety of facilities. Automated Essay Scoring (AES) is one such system: it determines a score automatically from a text document, using applications that run on a computer to facilitate correction and scoring. AES helps lecturers score efficiently and effectively, and it can also reduce the problem of subjective scoring. However, the implementation of AES depends on many factors and cases, such as the language and the mechanism of the scoring process, especially for essay scoring. Several methods have been implemented for weighting the terms of a document, and solutions for comparing answer documents with the expert's document are still being defined. In this research, we implemented Term Frequency-Inverse Document Frequency (TF-IDF) weighting and Cosine Similarity to measure the degree of term similarity between documents. Tests were carried out on a number of Indonesian text documents that had gone through a pre-processing stage for data extraction purposes. The process results in a ranking of the documents by how closely they match the expert's document.
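The ranking idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula `log(N / df) + 1`, and the function names are all assumptions made for illustration, and the toy strings are hypothetical English examples rather than the Indonesian documents used in the paper.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} TF-IDF vector per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times smoothed IDF.
        vectors.append({t: (c / len(doc)) * (math.log(n / df[t]) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_against_reference(reference, answers):
    """Rank answers by TF-IDF cosine similarity to the expert's reference text."""
    docs = [reference.lower().split()] + [a.lower().split() for a in answers]
    vecs = tf_idf_vectors(docs)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return sorted(zip(answers, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data for illustration only.
reference = "the membrane controls what enters and leaves the cell"
answers = ["plants need sunlight",
           "the cell membrane controls what enters the cell"]
for answer, score in rank_against_reference(reference, answers):
    print(f"{score:.3f}  {answer}")
```

The answer that shares terms with the reference is ranked first, while the answer with no terms in common scores zero. A real system would add the pre-processing the abstract mentions (for example stemming and stop-word removal) before weighting.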

137 citations

Journal ArticleDOI
31 Oct 2013
TL;DR: This paper used a similarity metric between student responses, and then used this metric to group responses into clusters and subclusters, which allowed teachers to grade multiple responses with a single action, provide rich feedback to groups of similar answers, and discover modalities of misunderstanding among students.
Abstract: We introduce a new approach to the machine-assisted grading of short answer questions. We follow past work in automated grading by first training a similarity metric between student responses, but then go on to use this metric to group responses into clusters and subclusters. The resulting groupings allow teachers to grade multiple responses with a single action, provide rich feedback to groups of similar answers, and discover modalities of misunderstanding among students; we refer to this amplification of grader effort as “powergrading.” We develop the means to further reduce teacher effort by automatically performing actions when an answer key is available. We show results in terms of grading progress with a small “budget” of human actions, both from our method and an LDA-based approach, on a test corpus of 10 questions answered by 698 respondents.

134 citations