
A Comparison of Rabin Karp and Semantic-Based Plagiarism Detection

TLDR
The results showed that Rabin Karp performs better than LSA-based plagiarism detection, where Latent Semantic Analysis (LSA) via Singular Value Decomposition (SVD) serves as the semantic-based approach.
Abstract
Detecting document plagiarism is a challenging task for scholars. Computing the similarity of two documents is the main step of plagiarism detection. This paper measures and compares the accuracy of Rabin Karp and semantic-based plagiarism detection. It employs the Latent Semantic Analysis (LSA) approach via Singular Value Decomposition (SVD) as the semantic-based method. The results showed that Rabin Karp performs better than LSA-based plagiarism detection.

Proc. of 3rd International Conference on Soft Computing, Intelligent System and Information Technology (ICSIIT 2012), Bali, Indonesia, 2012. http://www.academia.edu/5388466/A_Comparison_of_Rabin_Karp_and_Semantic-Based_Plagiarism_Detection
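
To make the Rabin Karp side of the comparison concrete, here is a minimal sketch of how a rolling hash over character k-grams can yield a document similarity score. The abstract does not give the paper's preprocessing, k-gram length, hash parameters, or similarity formula, so everything here is an assumption for illustration: the function names `kgram_hashes` and `dice_similarity`, the default `k=3`, the `base` and `mod` values, the alphanumeric-only normalization, and the use of the Dice coefficient.

```python
def kgram_hashes(text, k=3, base=256, mod=1_000_003):
    """Return the set of Rabin-Karp rolling hashes of all character k-grams.

    Assumed preprocessing: lowercase the text and keep only alphanumerics.
    """
    s = "".join(ch.lower() for ch in text if ch.isalnum())
    if len(s) < k:
        return set()

    # Weight of the leading character in a k-gram window.
    high = pow(base, k - 1, mod)

    # Hash the first window directly.
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}

    # Slide the window one character at a time:
    # remove the outgoing character, shift, add the incoming one.
    for i in range(k, len(s)):
        h = ((h - ord(s[i - k]) * high) * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def dice_similarity(doc_a, doc_b, k=3):
    """Dice coefficient over the two documents' k-gram hash sets (0.0 to 1.0)."""
    ha = kgram_hashes(doc_a, k)
    hb = kgram_hashes(doc_b, k)
    if not ha and not hb:
        return 0.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    a = "Document plagiarism is a challenging task for scholars."
    b = "Plagiarism of documents is a challenging task for researchers."
    print(f"similarity: {dice_similarity(a, b):.3f}")
```

Because only hash values are compared, two different k-grams that collide under the modulus would be counted as a match; a stricter variant would verify the underlying substrings whenever hashes agree. This kind of surface matching rewards shared character sequences, whereas an LSA approach compares documents in a reduced term space obtained from the SVD.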


Citations
Proceedings Article

Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index

TL;DR: A method to reduce non-relevant documents (those sharing no topic with the query document) using asymmetric similarity, with an AVL tree index to speed up the document comparison process.
Proceedings Article

An Effective Compressive Sensing based N-gram Approach for plagiarism detection

TL;DR: A novel compressive-sensing-based N-gram approach (CS-RKP) is proposed. It uses a sampling module for data processing and a cost function for detecting document redundancy, minimizing iterations, and finding similarity across documents.
Journal Article

Aplikasi Pendeteksi Tingkat Kesamaan Dokumen Teks: Algoritma Rabin Karp Vs. Winnowing

TL;DR: An application that can detect the index of similarity of text documents is built by first comparing the level of reliability of the two text similarity algorithms, i.e., Rabin-Karp and Winnowing.

Indonesian Journal of Electrical Engineering and Computer Science

TL;DR: This research aims to determine the effect of the number of K-grams on the performance of Rabin Karp in text matching, and finds that K-gram 3 performs best among K-grams 0 to 8.
Journal Article

Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms

TL;DR: Word-level trigrams are proposed for identifying similarities using three exact string matching algorithms, and it is concluded that the Horspool Boyer-Moore algorithm performs better in terms of precision, recall, and running time.
References
Journal Article

A Survey of Text Summarization Extractive Techniques

TL;DR: A Survey of Text Summarization Extractive techniques has been presented and it is shown that extracting important sentences, paragraphs etc. from the source text and concatenating them into shorter form conveys the most important information from the original text document.

Intrinsic Plagiarism Detection Using Character n-gram Profiles

TL;DR: A new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity measure originally proposed for author identification.
Journal Article

Developing a corpus of plagiarised short answers

TL;DR: The initial experiences with constructing a corpus of answers to short questions in which plagiarism has been simulated are described. The corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.

External and Intrinsic Plagiarism Detection Using Vector Space Models

TL;DR: This work presents a conceptually simple space partitioning approach to achieve search times sublinear in the number of reference documents, trading precision for speed.
Proceedings Article

Singular Value Decomposition for dimensionality reduction in unsupervised text learning problems

TL;DR: The results show that the quality of the clusters is comparable to that obtained when the dimensions are not reduced, and that the computational cost of clustering documents can be reduced significantly when the clustering is done in a low-dimensional space.