Author

Hawaf Abdalhakim

Bio: Hawaf Abdalhakim is an academic researcher. The author has contributed to research in topics: Paraphrase & Similarity (network science). The author has an h-index of 1 and has co-authored 1 publication receiving 5 citations.

Papers
Journal Article
TL;DR: This paper proposes a hybrid model that combines a text similarity approach with a deep learning approach to improve paraphrase detection, and verifies its results on the Microsoft Research Paraphrase Corpus dataset.
Abstract: Paraphrase detection (PD) is an essential task in natural language processing. Its goal is to check whether two statements written in natural language have the same meaning. It is important in many fields, such as plagiarism detection, question answering, document clustering, and information retrieval. This paper proposes a hybrid model that combines a text similarity approach with a deep learning approach in order to improve paraphrase detection. Evaluated on the Microsoft Research Paraphrase Corpus (MSRP) dataset, the model achieves an accuracy of about 76.6% and an F-measure of about 83.5%.
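For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and F-measure are commonly computed for a binary paraphrase classifier. The confusion counts are made up for illustration and are not the paper's results; the function names are hypothetical.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all sentence pairs that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the 'paraphrase' class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, not taken from the paper):
# 8 true positives, 2 false positives, 5 true negatives, 1 false negative.
acc = accuracy(tp=8, fp=2, tn=5, fn=1)
f1 = f_measure(tp=8, fp=2, fn=1)
print(f"accuracy={acc:.4f}, F-measure={f1:.4f}")
```

F-measure ignores true negatives, so it can exceed accuracy when paraphrase pairs are frequent and recall is high, as in the toy counts above.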

7 citations


Cited by
Journal Article
TL;DR: A performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection shows that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
Abstract: Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three different publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing: hyper-parameters, sub-model selection where applicable (e.g., Skipgram vs. CBOW), distance measures, and semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
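The combination of a distance measure and a detection threshold mentioned in this abstract can be illustrated with a small sketch. It uses plain term-count vectors and cosine similarity as a stand-in for the eight models evaluated in the paper; the whitespace tokenizer, the function names, and the 0.5 threshold are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between term-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the terms the two sentences share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_paraphrase(a: str, b: str, threshold: float = 0.5) -> bool:
    """Label a pair as a paraphrase when similarity reaches the threshold.

    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    return cosine_similarity(a, b) >= threshold


print(is_paraphrase("the cat sat on the mat", "the cat sat on a mat"))
print(is_paraphrase("stock prices fell sharply", "the cat sat on a mat"))
```

Swapping the count vectors for TF-IDF weights or embedding vectors changes only how `va` and `vb` are built; the thresholding step stays the same.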

14 citations

Journal Article
TL;DR: In this paper, the authors give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection, which is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc.
Abstract: Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three different publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing: hyper-parameters, sub-model selection where applicable (e.g., Skipgram vs. CBOW), distance measures, and semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.

8 citations

Journal Article
TL;DR: In this paper, a natural language processing model called CSE-PersistenceBERT is proposed for paraphrase detection, to recognize persistence as a social engineering attacker's behavior during a chat-based dialogue.
Abstract: Chat-based social engineering (CSE) attacks are attracting increasing attention in the Small-Medium Enterprise (SME) environment, given the ease and potential impact of such an attack. During a CSE attack, malicious users will repeatedly use linguistic tricks to eventually deceive their victims. Thus, to protect SME users, it would be beneficial to have a cyber-defense mechanism able to detect persistent interlocutors who repeatedly bring up critical topics that could lead to sensitive data exposure. We build a natural language processing model, called CSE-PersistenceBERT, for paraphrase detection to recognize persistency as a social engineering attacker’s behavior during a chat-based dialogue. The CSE-PersistenceBERT model consists of a pre-trained BERT model fine-tuned using our handcrafted CSE-Persistence corpus; a corpus appropriately annotated for the specific downstream task of paraphrase recognition. The model identifies the linguistic relationship between the sentences uttered during the dialogue and exposes the malicious intent of the attacker. The results are satisfactory and prove the efficiency of CSE-PersistenceBERT as a recognition mechanism of a social engineer’s persistent behavior during a CSE attack.

1 citation

Book Chapter
12 Oct 2020
TL;DR: A similarity-based approach towards paraphrase detection in Spanish is presented and a threshold is obtained for each of the similarity metrics with the aim of determining a classification boundary to decide if two sentences are paraphrased.
Abstract: In this paper, we present a similarity-based approach towards paraphrase detection in Spanish. We evaluate various models for semantic similarity computation using a gold-standard paraphrase corpus. The corpus contains one original document, paraphrased documents at different levels (low and high), and reference documents on the same topic or with the same vocabulary. This makes it possible to assess the similarity between a pair of texts or individual sentences. We found that some of the similarity metrics show a larger difference than others when comparing paraphrased sentences. Finally, we obtained a threshold for each of the similarity metrics with the aim of determining a classification boundary to decide if two sentences are paraphrased.
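The abstract does not say how the per-metric thresholds were obtained. One common way to derive a classification boundary from labeled similarity scores is to scan candidate thresholds and keep the one with the highest accuracy; the sketch below shows that idea under this assumption, with made-up scores and labels and a hypothetical function name.

```python
from typing import List, Tuple


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing accuracy on labeled pairs.

    A pair is predicted to be a paraphrase when its score >= threshold.
    Ties are resolved in favor of the lowest candidate threshold.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    best_t, best_acc = scores[0], -1.0
    for t in sorted(set(scores)):  # each observed score is a candidate boundary
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Illustrative data only: similarity scores for six sentence pairs from one
# hypothetical metric, with gold labels (True means "paraphrase").
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.2]
labels = [True, True, True, False, True, False]
threshold, acc = best_threshold(scores, labels)
print(f"threshold={threshold}, accuracy={acc:.3f}")
```

Running the same scan once per similarity metric would yield one threshold per metric, which matches the shape of the result described in the abstract, though not necessarily the procedure the authors used.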

1 citation