Journal Article

SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

TLDR
SimPaD is a novel plagiarism-detection method that establishes the degree of resemblance between any two documents D1 and D2 from their sentence-to-sentence similarity, computed using pre-defined word-correlation factors, and generates a graphical view of the sentences that are similar (or the same) in D1 and D2.
Abstract
Plagiarism is a serious problem: it infringes copyrighted documents and materials, is unethical, and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse as the growing number of on-line publications and easy access on the Web make it easier to locate and paraphrase information. To address this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity, computed using pre-defined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
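The abstract does not give SimPaD's formulas, so the following is only a minimal sketch of the general idea of sentence-to-sentence similarity built on word-correlation factors. Everything in it is an assumption made for illustration: the toy `CORRELATION` table, the averaging rule in `sentence_similarity`, the `threshold` value, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical word-correlation factors in [0, 1]. The pre-defined factors
# used by SimPaD are not reproduced here; these values are made up.
CORRELATION = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
    frozenset(("drives", "moves")): 0.6,
}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_correlation(w1, w2):
    """Return 1.0 for identical words, else the table value (0.0 if absent)."""
    if w1 == w2:
        return 1.0
    return CORRELATION.get(frozenset((w1, w2)), 0.0)


def sentence_similarity(s1, s2):
    """Assumed rule: average, over the words of s1, of the best correlation
    each word has with any word of s2."""
    words1, words2 = tokenize(s1), tokenize(s2)
    if not words1 or not words2:
        return 0.0
    best = [max(word_correlation(a, b) for b in words2) for a in words1]
    return sum(best) / len(best)


def document_resemblance(doc1, doc2, threshold=0.7):
    """Assumed rule: fraction of sentences in doc1 whose best match in doc2
    reaches the (hypothetical) similarity threshold."""
    if not doc1 or not doc2:
        return 0.0
    matched = sum(
        1
        for s1 in doc1
        if max(sentence_similarity(s1, s2) for s2 in doc2) >= threshold
    )
    return matched / len(doc1)


d1 = ["The car drives fast.", "Plagiarism is a serious problem."]
d2 = ["The automobile moves quick.", "Weather was pleasant today."]
print(document_resemblance(d1, d2))  # 0.5 with the toy table above
```

With this toy table, the first sentence of `d1` matches the paraphrased first sentence of `d2` even though only one word is shared verbatim, which is the kind of case a purely character-based comparison would miss.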


Citations
Journal Article

State-of-the-art in detecting academic plagiarism

TL;DR: In the future, plagiarism detection systems may benefit from combining traditional character-based detection methods with emerging detection approaches, including intrinsic, cross-lingual and citation-based plagiarism detection.
Journal Article

Study on extrinsic text plagiarism detection techniques and tools

TL;DR: The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art, and an overview is given of some of the available detection software tools, their features and their detection efficiency.
Dissertation

A study on plagiarism detection and plagiarism direction identification using natural language processing techniques

TL;DR: A thesis by Man Yan Miranda Chong, submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy in 2013.
Proceedings Article

Analyzing Semantic Concept Patterns to Detect Academic Plagiarism

TL;DR: This work presents Semantic Concept Pattern Analysis - an approach that performs an integrated analysis of semantic text relatedness and structural text similarity and demonstrates that this approach can detect plagiarism that established text matching approaches would not identify.
Proceedings Article

Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space

TL;DR: It is shown that a hybrid approach, which integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods, achieves higher detection performance for disguised forms of plagiarism and, for the first time, makes semantic plagiarism detection feasible even on large collections.
References
Journal Article

Induction of Decision Trees

J. R. Quinlan, 25 Mar 1986
TL;DR: This paper describes an approach to synthesizing decision trees that has been used in a variety of systems, presents one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Journal Article

An algorithm for suffix stripping

TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.