Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over its lifetime, 1790 publications have been published on this topic, receiving 24740 citations.
Papers
01 May 2003
TL;DR: The tests made on chunking methods used for plagiarism detection make it possible to decide on the best-fitting chunking method for a given application.
Abstract: This paper describes the tests made on chunking methods used for plagiarism detection. The results of the tests make it possible to decide on the best-fitting chunking method for a given application. For example, overlapping word chunking is good for a grammar analyzer or for small databases; sentence chunking is best suited to finding quoted texts; hashed breakpoint chunking is the fastest method and is therefore advisable for searching large sets of documents; and if more reliability is needed, overlapping hashed breakpoint chunking can be used as well.
9 citations
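The abstract above compares chunking strategies without showing them. The Python sketch below illustrates three of them under stated assumptions: tokens are lowercased alphanumeric runs, overlapping word chunks are word 3-grams, and hashed breakpoint chunking follows the common formulation in which a chunk ends at any word whose hash is divisible by a parameter `p` (CRC-32 serves as a deterministic hash, and `p=4` is arbitrary). The function names and the Jaccard `resemblance` score are illustrative, not taken from the paper, and overlapping hashed breakpoint chunking is not shown.

```python
import re
import zlib


def tokenize(text):
    # Lowercased word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", text.lower())


def overlapping_word_chunks(text, n=3):
    # Every run of n consecutive words (word n-grams), sliding by one word.
    words = tokenize(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def sentence_chunks(text):
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def hashed_breakpoint_chunks(text, p=4):
    # A chunk ends at any word whose deterministic hash is divisible by p,
    # so boundaries depend only on the words themselves, not on their offsets.
    chunks, current = [], []
    for word in tokenize(text):
        current.append(word)
        if zlib.crc32(word.encode("utf-8")) % p == 0:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


def resemblance(chunks_a, chunks_b):
    # Jaccard overlap of the two chunk sets (illustrative scoring rule).
    a, b = set(chunks_a), set(chunks_b)
    return len(a & b) / len(a | b) if a | b else 0.0


doc = "The quick brown fox jumps over the lazy dog."
suspect = "Yesterday the quick brown fox jumps over the lazy dog."
print(resemblance(overlapping_word_chunks(doc), overlapping_word_chunks(suspect)))  # 0.875
```

Because breakpoints depend only on word content, an inserted or deleted word disturbs only the chunk it falls in, and each document yields far fewer chunks than overlapping n-grams, which is consistent with the abstract's advice to use hashed breakpoints for large collections.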
TL;DR: The author designed an effective and complete method to detect source code plagiarism, tailored to the ways students commonly plagiarize, along with an improved Longest Common Subsequence algorithm for text matching that uses the statement as the unit of matching.
Abstract: Computer programming is becoming increasingly important in college program design courses. However, some students plagiarize homework by copying code and making small modifications, and it is not easy for teachers to judge whether source code has been plagiarized. Traditional detection algorithms do not handle this situation well. The author designed an effective and complete method for detecting source code plagiarism, tailored to the ways students commonly plagiarize. The algorithm rests on two basic ideas. The first is to standardize the source code by filtering it to remove most of the noise that plagiarists intentionally insert. The second is an improved Longest Common Subsequence algorithm for text matching that uses the statement as the unit of matching. An appropriate hash function was also designed to make matching more efficient. A system built on the algorithm was shown to be practical and sufficient: it runs well and meets practical requirements in application.
9 citations
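A minimal Python sketch of the two ideas in this abstract, under assumptions: the "filtration" step is approximated by stripping C-style comments and normalizing whitespace, statements are split on `;`, `{` and `}`, each statement is hashed with CRC-32, and a textbook LCS (not the paper's improved variant) runs over the hash sequences. Similarity is reported as 2·LCS/(|A|+|B|). All function names are hypothetical.

```python
import re
import zlib


def normalize(source):
    # Hypothetical filtering step: drop /* */ and // comments, split into
    # statements, collapse whitespace, and remove spaces around operators.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    statements = []
    for stmt in re.split(r"[;{}]", source):
        stmt = re.sub(r"\s+", " ", stmt).strip()
        stmt = re.sub(r" ?([^\w ]) ?", r"\1", stmt)
        if stmt:
            statements.append(stmt)
    return statements


def statement_hashes(source):
    # One integer per statement, so LCS compares ints instead of strings.
    return [zlib.crc32(s.encode("utf-8")) for s in normalize(source)]


def lcs_length(a, b):
    # Classic dynamic-programming LCS, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(src_a, src_b):
    # 2 * LCS / (|A| + |B|) over statement hashes, in [0, 1].
    ha, hb = statement_hashes(src_a), statement_hashes(src_b)
    if not ha and not hb:
        return 0.0
    return 2 * lcs_length(ha, hb) / (len(ha) + len(hb))


original = "int x = 1; // init\nx++;"
suspect = "int x=1; /* c */ x ++ ;"
print(similarity(original, suspect))  # 1.0
```

Rare CRC-32 collisions could inflate the score slightly, and regex-based comment stripping does not account for string literals that contain `//` or `/*`; a real filter would work on tokens.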
01 Oct 2019
TL;DR: A tool, SPPlagiarise, is presented, which is designed to produce simulated plagiarism of Java source code, and an evaluation of a generated plagiarism data set is presented.
Abstract: Source code plagiarism is a common occurrence in undergraduate computer science education. Studies have indicated that at least 50% of students plagiarize during their undergraduate career. To identify cases of source code plagiarism, many source code plagiarism detection tools have been proposed. However, conclusively determining how effective these tools are at identifying source code plagiarism is difficult, because evaluations are typically performed on unreleased data sets. Without a comprehensive, publicly available data set for evaluating source code plagiarism detection, it is difficult to perform unbiased and reproducible evaluations of tools. To address this problem, this paper presents a tool, SPPlagiarise, which is designed to produce simulated plagiarism of Java source code. SPPlagiarise applies a random number of semantics-preserving source code obfuscations at random locations in a Java code base to simulate source code plagiarism. This paper presents the design of the tool and an evaluation of a generated plagiarism data set.
9 citations
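SPPlagiarise itself targets Java. As a stand-in, the Python sketch below applies the same idea, a random number of semantics-preserving obfuscations at random locations, to Python source using the standard `ast` module (`ast.unparse` requires Python 3.9+). The two obfuscations (consistent variable renaming and dead-code insertion) and all names are illustrative assumptions, not the tool's actual transformation set, and renaming assumes a flat script without functions, classes, or shadowed builtins.

```python
import ast
import random


def rename_variables(tree, rng):
    # Consistently rename every name that is assigned somewhere in the tree.
    # Assumes a flat script: no functions, classes, global/nonlocal
    # statements, and no reassigned builtins.
    assigned = sorted({node.id for node in ast.walk(tree)
                       if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)})
    mapping = {name: f"var_{i}_{rng.randrange(1000)}" for i, name in enumerate(assigned)}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]
    return tree


def insert_dead_code(tree, rng):
    # Insert a statement that never executes at a random top-level position.
    dead = ast.parse("if False:\n    pass").body[0]
    tree.body.insert(rng.randrange(len(tree.body) + 1), dead)
    return tree


OBFUSCATIONS = [rename_variables, insert_dead_code]


def simulate_plagiarism(source, seed=None):
    # Apply a random number (1 to 4) of randomly chosen obfuscations.
    rng = random.Random(seed)
    tree = ast.parse(source)
    for _ in range(rng.randint(1, 4)):
        tree = rng.choice(OBFUSCATIONS)(tree, rng)
    return ast.unparse(ast.fix_missing_locations(tree))


print(simulate_plagiarism("total = 0\nfor i in range(5):\n    total += i * i\n", seed=1))
```

Executing the original and the obfuscated program should produce the same values under different variable names, which is the property a simulated plagiarism data set relies on.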
19 Sep 2016
TL;DR: The paper investigates various combinations of feature point detectors and descriptors as potential tools for finding similar images in documents, and shows how algorithms that compute image similarity may extend the functionality of plagiarism detection systems.
Abstract: The paper presents the results of research into applying image processing methods to document comparison, with a view to their use in plagiarism detection systems. Among image processing methods, feature point methods are best suited to computing image similarity, thanks to their invariance to various image transforms. The paper investigates various combinations of feature point detectors and descriptors as potential tools for finding similar images in documents. The methods are tested on a database of scientific papers containing 5 well-known image processing test images. The paper also outlines how algorithms that compute image similarity may extend the functionality of plagiarism detection systems.
9 citations
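The abstract does not fix a particular detector or descriptor. As one common way to turn feature points into an image-similarity score, the sketch below assumes binary descriptors (such as ORB output) have already been extracted and stored as Python integers, matches them by brute-force Hamming distance, and keeps only matches that pass Lowe's ratio test. The ratio of 0.75, the function names, and the scoring rule (fraction of descriptors in the first image with a good match) are assumptions, not values from the paper.

```python
def hamming(a, b):
    # Number of differing bits between two binary descriptors stored as ints.
    return bin(a ^ b).count("1")


def match_score(desc_a, desc_b, ratio=0.75):
    # Fraction of descriptors in desc_a whose nearest neighbour in desc_b
    # is clearly closer than the second-nearest (Lowe's ratio test).
    if not desc_a or len(desc_b) < 2:
        return 0.0  # the ratio test needs at least two candidates
    good = 0
    for d in desc_a:
        best, second = sorted(hamming(d, e) for e in desc_b)[:2]
        if best < ratio * second:
            good += 1
    return good / len(desc_a)


image_a = [0b1111000011110000, 0b0000111100001111, 0b1010101010101010]
print(match_score(image_a, image_a))  # 1.0
```

In practice a library such as OpenCV would supply the detection and description step; a high score between an image in a submitted document and one in a source document could then be flagged for manual review.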
24 Jul 2016
TL;DR: The extent and practicality of plagiarism detection systems using multiple classifications of detection engines are studied using 8 individual articles from different fields of work to determine the effectiveness and extent of each detection engine.
Abstract: This article studies the extent and practicality of plagiarism detection systems using multiple classifications of detection engines, as further described within the article. An in-depth analysis of 8 individual articles from different fields of work was carried out, allowing comparisons both between detection systems and between different writing styles and formats. The first analysis used unmodified versions of the 8 selected papers as a control and a baseline for the performance of the detection engines. A second analysis then used modified versions of the selected papers, produced by reformatting the sentences flagged as plagiarized in the first test. This reformatting involved simple shuffling and manipulation of the text to determine the effectiveness and extent of each detection engine.
9 citations