Open Access Journal Article

A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning

TLDR
This paper proposes a hybrid model that combines a text similarity approach with a deep learning approach to improve paraphrase detection, and evaluates it on the Microsoft Research Paraphrase Corpus dataset.
Abstract
Paraphrase detection (PD) is an essential task in natural language processing. Its goal is to determine whether two statements written in natural language have the same meaning. It is important in many fields, such as plagiarism detection, question answering, document clustering, and information retrieval. This paper proposes a hybrid model that combines the text similarity approach with the deep learning approach in order to improve paraphrase detection. Evaluated on the Microsoft Research Paraphrase Corpus (MSRP), the model achieves an accuracy of about 76.6% and an F-measure of about 83.5%.
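The abstract does not specify which similarity measures or which network the hybrid model uses. As a hedged illustration only, the sketch below computes a few common lexical similarity features for a sentence pair, concatenates them with a placeholder neural vector the way a hybrid model might, and computes the accuracy and F-measure used to report results. The feature choice and every function name (`similarity_features`, `hybrid_vector`, `accuracy_and_f1`, and so on) are assumptions, not the paper's actual model.

```python
# Hypothetical sketch: lexical similarity features for a sentence pair, a
# hybrid feature vector, and the accuracy / F-measure used for evaluation.
# Feature choice and function names are illustrative assumptions, not the
# paper's actual model.
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokenization (a real system would do more).
    return text.lower().split()


def jaccard(a: list[str], b: list[str]) -> float:
    # Size of the token-set intersection divided by the size of the union.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: list[str], b: list[str]) -> float:
    # Cosine similarity of bag-of-words term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def similarity_features(s1: str, s2: str) -> list[float]:
    t1, t2 = tokenize(s1), tokenize(s2)
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2)) if t1 and t2 else 0.0
    return [jaccard(t1, t2), cosine_bow(t1, t2), length_ratio]


def hybrid_vector(s1: str, s2: str, neural_vec: list[float]) -> list[float]:
    # Hypothetical hybrid input: similarity features followed by a sentence-pair
    # representation assumed to come from a separate neural encoder.
    return similarity_features(s1, s2) + list(neural_vec)


def accuracy_and_f1(gold: list[int], pred: list[int]) -> tuple[float, float]:
    # Label 1 = paraphrase (positive class), 0 = not a paraphrase.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

On MSRP, roughly two thirds of the pairs are labelled as paraphrases, so the F-measure on the positive class typically comes out higher than accuracy, which is consistent with the two figures reported above.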



Citations
Journal Article

Corpus-Based Paraphrase Detection Experiments and Review

TL;DR: A performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection shows that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be developed further.
Journal Article

Corpus-Based Paraphrase Detection Experiments and Review

TL;DR: In this paper, the authors give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection, which is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general.
Journal Article

Applying BERT for Early-Stage Recognition of Persistence in Chat-Based Social Engineering Attacks

TL;DR: In this paper, a natural language processing model called CSE-PersistenceBERT is proposed for paraphrase detection, in order to recognize persistence as a behavior of social engineering attackers during chat-based dialogue.
Book Chapter

Evaluation of Similarity Measures in a Benchmark for Spanish Paraphrasing Detection

TL;DR: A similarity-based approach to paraphrase detection in Spanish is presented, and a threshold is obtained for each similarity metric to define a classification boundary that decides whether two sentences are paraphrases.
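The per-metric threshold described in that chapter can be sketched as a one-dimensional search over labelled pairs. This is a minimal sketch under the assumption that the threshold is chosen to maximize accuracy among the observed scores; the chapter's exact selection procedure may differ, and `best_threshold` and `is_paraphrase` are hypothetical names.

```python
# Minimal sketch: learn a decision threshold for one similarity metric from
# labelled sentence pairs. Maximizing accuracy over the observed scores is an
# assumption; the chapter's exact selection procedure may differ.


def best_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, accuracy); a pair is predicted paraphrase if score >= threshold."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate boundary
        correct = sum((s >= t) == (y == 1) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # keep the first (lowest) threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


def is_paraphrase(score: float, threshold: float) -> bool:
    # Classification boundary: at or above the threshold means "paraphrase".
    return score >= threshold
```

For example, `best_threshold([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1])` returns `(0.6, 1.0)`. Running the search once per similarity metric yields one boundary per metric.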
References
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
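The constant error carousel can be illustrated with a single memory cell. The sketch below is a simplified scalar version of the original formulation (input and output gates, no forget gate). Its simplifications are assumptions: the gates see only the current input `x` rather than recurrent connections, weights are scalars, and `tanh` stands in for the paper's squashing functions.

```python
# Minimal scalar sketch of one LSTM memory cell in the original formulation
# (input and output gates, no forget gate). Simplifications (assumptions):
# gates see only the current input x (no recurrent connections), weights are
# scalars, and tanh stands in for the paper's squashing functions.
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def memory_cell_step(s_prev: float, x: float,
                     w_in: float, w_out: float, w_c: float) -> tuple[float, float]:
    y_in = sigmoid(w_in * x)    # input gate: how much new input enters the cell
    y_out = sigmoid(w_out * x)  # output gate: how much of the state is exposed
    # Constant error carousel: the self-connection has fixed weight 1.0, so
    # d s / d s_prev = 1 and error carried back along the state path is passed
    # through unchanged (constant error flow).
    s = 1.0 * s_prev + y_in * math.tanh(w_c * x)
    y = y_out * math.tanh(s)    # cell output
    return s, y
```

Because the state update is additive with unit self-weight, changing `s_prev` by some amount shifts the new state by exactly the same amount; the gates only control what is written to and read from the cell.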
Proceedings Article

Skip-thought vectors

TL;DR: This article used the continuity of text from books to train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage, which can produce highly generic sentence representations that are robust and perform well in practice.
Proceedings Article

Corpus-based and knowledge-based measures of text semantic similarity

TL;DR: This paper shows that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to a 13% error rate reduction with respect to the traditional vector-based similarity metric.
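As we understand that paper, word-level similarity is turned into a text-level score by matching each word with its most similar word in the other text, weighting by inverse document frequency (idf), and averaging the two directions. The sketch below follows that scheme with a pluggable word measure. The exact-match `word_sim` and the default idf of 1.0 for unseen words are placeholder assumptions; the paper uses corpus-based and knowledge-based word measures, and details such as part-of-speech handling are omitted here.

```python
# Sketch of an idf-weighted, bidirectional text similarity built from a
# word-to-word similarity function. The exact-match word_sim and the default
# idf of 1.0 for unseen words are placeholder assumptions; the paper plugs in
# corpus-based or knowledge-based word measures instead.
from typing import Callable, Optional


def exact_match(w1: str, w2: str) -> float:
    return 1.0 if w1 == w2 else 0.0


def text_similarity(t1: list[str], t2: list[str],
                    word_sim: Callable[[str, str], float] = exact_match,
                    idf: Optional[dict[str, float]] = None) -> float:
    idf = idf or {}

    def directional(src: list[str], tgt: list[str]) -> float:
        # For each word in src, take its best match in tgt, weighted by idf.
        if not src or not tgt:
            return 0.0
        num = sum(max(word_sim(w, v) for v in tgt) * idf.get(w, 1.0) for w in src)
        den = sum(idf.get(w, 1.0) for w in src)
        return num / den if den else 0.0

    # Average the two directions so the score is symmetric.
    return 0.5 * (directional(t1, t2) + directional(t2, t1))
```

With the placeholder measure, `text_similarity(["a", "b"], ["a", "c"])` is 0.5, and giving the shared word a higher idf (for example `{"a": 3.0, "b": 1.0, "c": 1.0}`) raises it to 0.75, since matches on rarer words count for more.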
Proceedings Article

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

TL;DR: This work introduces a paraphrase detection method based on recursive autoencoders (RAEs), including unsupervised RAEs trained with a novel unfolding objective; it learns feature vectors for phrases in syntactic trees and uses them to measure word- and phrase-wise similarity between two sentences.
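The "dynamic pooling" in that title addresses a practical problem: two sentences have different numbers of tree nodes, so their node-by-node distance matrix varies in size. A minimal sketch of dynamic min pooling into a fixed `n_p x n_p` grid follows. The Euclidean distance, the chunking rule, and the function names are assumptions for illustration, and further details from the paper, such as any normalization of the pooled matrix, are omitted.

```python
# Sketch of dynamic min pooling: a variable-size matrix of distances between
# the nodes of two sentence trees is pooled into a fixed n_p x n_p grid so a
# standard classifier can consume it. Distance choice and chunking rule are
# assumptions; other details from the paper are omitted.
import math


def distance_matrix(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    # Euclidean distance between every node vector of sentence A and of sentence B.
    return [[math.dist(u, v) for v in b] for u in a]


def dynamic_min_pool(sim: list[list[float]], n_p: int) -> list[list[float]]:
    rows, cols = len(sim), len(sim[0])

    def bounds(k: int, size: int) -> tuple[int, int]:
        # Split [0, size) into n_p nearly equal chunks; when size < n_p some
        # chunks reuse an index, which effectively duplicates rows/columns.
        start = k * size // n_p
        end = max((k + 1) * size // n_p, start + 1)
        return start, end

    pooled = []
    for i in range(n_p):
        r0, r1 = bounds(i, rows)
        row = []
        for j in range(n_p):
            c0, c1 = bounds(j, cols)
            # Min pooling keeps the closest (most similar) node pair in each region.
            row.append(min(sim[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Taking the minimum rather than the mean preserves the strongest local match in each region, which is the signal that matters for deciding whether two phrases align.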
Journal Article

A Survey of Text Similarity Approaches

TL;DR: This survey discusses existing work on text similarity by partitioning it into three approaches: string-based, corpus-based, and knowledge-based similarity. Examples of combinations of these similarity measures are also presented.
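As a concrete instance of the first family, string-based similarity, the sketch below computes the Levenshtein edit distance and normalizes it to a similarity in [0, 1]. Normalizing by the length of the longer string is one common choice and is an assumption here, as are the function names.

```python
# Sketch of a string-based measure: Levenshtein edit distance, normalized to a
# similarity in [0, 1]. Normalizing by the longer string's length is one
# common choice (assumption).


def levenshtein(a: str, b: str) -> int:
    # Dynamic programming over prefixes, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0
```

For example, `levenshtein("kitten", "sitting")` is 3, giving a normalized similarity of 1 - 3/7, about 0.571. Corpus-based and knowledge-based measures replace this surface comparison with statistics from large corpora or lexical resources such as WordNet.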