Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.
Papers
TL;DR: The proposed approach has two main steps: the first finds candidate plagiarised fragments with a focus on high recall; the second applies a more precise similarity analysis based on dynamic text alignment, filtering the results by finding alignments between the identified fragments.
Abstract: Fast and easy access to a wide range of documents in various languages, in conjunction with the wide availability of translation and editing tools, has led to the need to develop effective tools fo...
9 citations
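The two-step idea in the summary above can be sketched as follows. This is a minimal illustration, not the authors' method: the word n-gram overlap used for candidate retrieval, the longest-common-subsequence alignment standing in for "dynamic text alignment", and all names and thresholds (`candidate_fragments`, `alignment_score`, `threshold=0.6`) are assumptions made for the example.

```python
def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def candidate_fragments(suspicious, sources, n=3, min_shared=1):
    """Step 1 (recall-oriented): keep every source fragment that shares
    at least min_shared word n-grams with the suspicious text."""
    grams = word_ngrams(suspicious, n)
    return [s for s in sources if len(grams & word_ngrams(s, n)) >= min_shared]


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def alignment_score(a, b):
    """Step 2 (precision-oriented): aligned words divided by the length
    of the shorter text, giving a value in [0, 1]."""
    wa, wb = a.lower().split(), b.lower().split()
    shortest = min(len(wa), len(wb))
    return lcs_length(wa, wb) / shortest if shortest else 0.0


def detect(suspicious, sources, threshold=0.6):
    """Retrieve candidates loosely, then keep only well-aligned ones."""
    return [s for s in candidate_fragments(suspicious, sources)
            if alignment_score(suspicious, s) >= threshold]
```

The first step deliberately admits weak matches (a single shared trigram is enough), and the alignment step is what discards them.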
29 Oct 2014
TL;DR: An ensemble method was developed based on the Bayes optimal classifier for authorship attribution of source code that successfully attributed 98.2% of all documents in the data set, compared to 88.9% by the Burrows baseline method and 91.0% by SCAP.
Abstract: Authorship attribution of source code is the task of deciding who wrote software, given its source code, when the author of the software is not explicitly known. There are numerous scenarios in which it is necessary to identify the author of a piece of software whose author is unknown, including software forensics investigations, plagiarism detection, and questions of software ownership. A number of methods for authorship attribution of source code have been presented in the past, including two state-of-the-art methods: SCAP and Burrows. Each of these two state-of-the-art methods was individually improved, and – as presented in this paper – an ensemble method was developed from them based on the Bayes optimal classifier. An empirical study was performed using a data set consisting of 7,231 open-source and textbook programs written in C++ and Java by thirty unique authors. The ensemble method successfully attributed 98.2% of all documents in the data set, compared to 88.9% by the Burrows baseline method and 91.0% by the SCAP baseline method.
9 citations
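As a rough illustration of how a Bayes optimal classifier combines base methods, the sketch below weights each method's per-author posterior by a weight for that method and picks the author with the highest combined probability. The function name, the way the weights are obtained, and every number in the usage note are hypothetical; the abstract does not say how the paper derives them.

```python
def bayes_optimal_vote(model_posteriors, model_weights):
    """Combine base classifiers in the style of a Bayes optimal classifier.

    model_posteriors: {model: {author: P(author | document, model)}}
    model_weights:    {model: P(model | training data)}, assumed to sum to 1
    Returns the winning author and the combined distribution.
    """
    combined = {}
    for model, posterior in model_posteriors.items():
        for author, p in posterior.items():
            combined[author] = combined.get(author, 0.0) + model_weights[model] * p
    best = max(combined, key=combined.get)
    return best, combined
```

For example, with made-up posteriors `{"scap": {"alice": 0.6, "bob": 0.4}, "burrows": {"alice": 0.2, "bob": 0.8}}` and made-up weights `{"scap": 0.55, "burrows": 0.45}`, the combined scores are 0.42 for alice and 0.58 for bob, so the ensemble attributes the document to bob even though one base method prefers alice.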
TL;DR: A mechanism is proposed that generates hints for investigating source code plagiarism and identifying the culprits in in-class individual programming assessments. Hints from the source code creation process can help indicate the culprits, because the culprits' code meets at least one of the authors' predefined conditions for copying behaviour.
Abstract: Most source code plagiarism detection tools rely only on source code similarity to indicate plagiarism. This can be an issue, since not all source code pairs with high similarity result from plagiarism. Moreover, the culprits (i.e., the ones who plagiarise) cannot be differentiated from the victims, even though the two groups need to be educated further in different ways. This paper proposes a mechanism to generate hints for investigating source code plagiarism and identifying the culprits in in-class individual programming assessments. The hints are collected from the culprits’ copying behaviour during the assessment. According to our evaluation, the hints from the source code creation process and from seating position are 76.88% and at least 80.87% accurate, respectively, for indicating plagiarism. Further, the hints from the source code creation process can help indicate the culprits, as the culprits’ code meets at least one of our predefined conditions for the copying behaviour.
9 citations
01 Nov 2015
TL;DR: A method of searching for a plagiarized image in a database, based on the technique of F-transform, particularly Fs-transform, which significantly reduces the domain dimension and therefore speeds up the whole process.
Abstract: The goal of this paper is to introduce the task of image plagiarism detection. More specifically, we propose a method of searching for a plagiarized image in a database. The main requirements for searching in the database are computational speed and success rate. The proposed method is based on the technique of F-transform, particularly Fs-transform, s ≥ 0. This technique significantly reduces the domain dimension and therefore speeds up the whole process. We present several experiments and measurements which prove the speed and accuracy of our method. We also give examples that demonstrate how this method can be used in many applications.
9 citations
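To give a feel for the dimension reduction mentioned above, here is a minimal sketch of the direct discrete F-transform in its simplest, degree-zero form, applied to a 1-D signal rather than an image. The uniform triangular fuzzy partition, the function name `f_transform`, and the 1-D setting are assumptions chosen for brevity; the paper's Fs-transform and its image pipeline are not reproduced here.

```python
def f_transform(signal, n_components):
    """Direct discrete F-transform (degree 0) of an equidistantly sampled
    1-D signal, using a uniform triangular fuzzy partition.

    Each component is the weighted mean of the samples under one
    triangular basis function, so len(signal) values shrink to
    n_components values.
    """
    m = len(signal)
    if n_components < 2 or m < n_components:
        raise ValueError("need 2 <= n_components <= len(signal)")
    h = (m - 1) / (n_components - 1)  # node spacing, in sample units
    components = []
    for k in range(n_components):
        node = k * h
        num = den = 0.0
        for j, value in enumerate(signal):
            # Triangular membership: 1 at the node, 0 at distance h or more.
            weight = max(0.0, 1.0 - abs(j - node) / h)
            num += weight * value
            den += weight
        components.append(num / den)
    return components
```

Comparing the short component vectors of two signals, instead of the signals themselves, is where a speed-up of the kind the abstract describes would come from in this simplified setting.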
TL;DR: It is found that a systematic combination of different heuristics greatly improves the performance of the document retrieval system.
Abstract: This article describes ongoing research that aims to develop a plagiarism detection system for Arabic documents. We developed different heuristics to generate effective queries for document retrieval from the Web. The performance of those heuristics was empirically evaluated against a sizeable corpus in terms of precision, recall and F-measure. We found that a systematic combination of different heuristics greatly improves the performance of the document retrieval system.
9 citations
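The three retrieval measures named in the abstract above can be computed as in the sketch below. It assumes the balanced F-measure (F1) and set-based evaluation over document identifiers; the function name and the identifiers in the usage note are illustrative only.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure (F1).

    retrieved: documents returned by the retrieval system
    relevant:  documents that are true sources
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

If a query returns `["d1", "d2", "d3", "d4"]` and the true sources are `["d1", "d2", "d5"]`, two of the four retrieved documents are relevant (precision 1/2), two of the three sources are found (recall 2/3), and the F-measure is 4/7.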