Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have appeared within this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: The proposed approach has two main steps: a first step finds candidate plagiarised fragments with a focus on high recall, and a second, more precise similarity analysis based on dynamic text alignment filters the results by finding alignments between the identified fragments.
Abstract: Fast and easy access to a wide range of documents in various languages, in conjunction with the wide availability of translation and editing tools, has led to the need to develop effective tools fo...

9 citations
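
The two-step idea in the TL;DR above (a recall-oriented candidate search followed by an alignment-based filter) can be illustrated with a minimal sketch. The excerpt does not say how the paper implements either step, so everything here is an assumption: shared word bigrams stand in for the candidate search, a Smith-Waterman-style local alignment over tokens stands in for the dynamic text alignment, and the function names and thresholds (`min_shared`, `min_score`) are hypothetical.

```python
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenisation (an assumption, not the paper's)."""
    return text.lower().split()


def shared_ngrams(a: List[str], b: List[str], n: int = 2) -> int:
    """Count distinct word n-grams that occur in both token lists."""
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    return len(grams_a & grams_b)


def local_alignment_score(a: List[str], b: List[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best local alignment score between two token lists (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def detect(suspicious: List[str], sources: List[str],
           min_shared: int = 1, min_score: int = 6) -> List[Tuple[int, int, int]]:
    """Return (suspicious index, source index, score) for surviving pairs."""
    results = []
    for si, s in enumerate(suspicious):
        s_tok = tokens(s)
        for ri, r in enumerate(sources):
            r_tok = tokens(r)
            # Step 1: cheap, permissive filter (favours recall).
            if shared_ngrams(s_tok, r_tok) < min_shared:
                continue
            # Step 2: stricter alignment-based check (favours precision).
            score = local_alignment_score(s_tok, r_tok)
            if score >= min_score:
                results.append((si, ri, score))
    return results


if __name__ == "__main__":
    suspicious = ["the quick brown fox jumps over the lazy dog"]
    sources = ["a quick brown fox jumps over a sleeping dog",
               "completely unrelated sentence about weather"]
    print(detect(suspicious, sources))
```

A low `min_shared` keeps the first step permissive, so the more expensive alignment only runs on pairs that share at least some surface text.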

Book Chapter
29 Oct 2014
TL;DR: An ensemble method was developed based on the Bayes optimal classifier for authorship attribution of source code that successfully attributed 98.2% of all documents in the data set, compared to 88.9% by the Burrows baseline method and 91.0% by SCAP.
Abstract: Authorship attribution of source code is the task of deciding who wrote software, given its source code, when the author of the software is not explicitly known. There are numerous scenarios in which it is necessary to identify the author of a piece of software whose author is unknown, including software forensics investigations, plagiarism detection, and questions of software ownership. A number of methods for authorship attribution of source code have been presented in the past, including two state-of-the-art methods: SCAP and Burrows. Each of these two state-of-the-art methods was individually improved, and – as presented in this paper – an ensemble method was developed from them based on the Bayes optimal classifier. An empirical study was performed using a data set consisting of 7,231 open-source and textbook programs written in C++ and Java by thirty unique authors. The ensemble method successfully attributed 98.2% of all documents in the data set, compared to 88.9% by the Burrows baseline method and 91.0% by the SCAP baseline method.

9 citations
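
As a rough illustration of how a Bayes optimal classifier can combine base methods, the sketch below picks the author that maximises the sum of each model's class probability weighted by that model's posterior weight. The abstract does not give the paper's probabilities or weights, so the function name, the input dictionaries, and all numbers are hypothetical.

```python
from typing import Dict


def bayes_optimal_vote(predictions: Dict[str, Dict[str, float]],
                       model_posterior: Dict[str, float]) -> str:
    """Return argmax over authors of sum_h P(author | h) * P(h | data).

    predictions maps a model name to its per-author probabilities;
    model_posterior maps a model name to its (assumed) posterior weight.
    """
    totals: Dict[str, float] = {}
    for model, class_probs in predictions.items():
        weight = model_posterior[model]
        for author, prob in class_probs.items():
            totals[author] = totals.get(author, 0.0) + weight * prob
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical outputs of two base methods for one source file.
    predictions = {
        "scap": {"alice": 0.6, "bob": 0.4},
        "burrows": {"alice": 0.2, "bob": 0.8},
    }
    print(bayes_optimal_vote(predictions, {"scap": 0.5, "burrows": 0.5}))
```

With equal weights the more confident model dominates; shifting the weights towards one model lets its prediction win even when it is less certain.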

Journal Article
TL;DR: A mechanism is proposed to generate hints for investigating source code plagiarism and identifying the culprits in in-class individual programming assessments; hints from the source code creation process can help indicate the culprits, since the culprits' code meets at least one of the authors' predefined conditions for copying behaviour.
Abstract: Most source code plagiarism detection tools rely only on source code similarity to indicate plagiarism. This can be an issue since not all source code pairs with high similarity are plagiarism. Moreover, the culprits (i.e., the ones who plagiarise) cannot be differentiated from the victims even though they need to be educated further in different ways. This paper proposes a mechanism to generate hints for investigating source code plagiarism and identifying the culprits in in-class individual programming assessments. The hints are collected from the culprits’ copying behaviour during the assessment. According to our evaluation, the hints from the source code creation process and seating position are 76.88% and at least 80.87% accurate, respectively, for indicating plagiarism. Further, the hints from the source code creation process can be helpful for indicating the culprits, as the culprits’ code meets at least one of our predefined conditions for the copying behaviour.

9 citations

Proceedings Article
01 Nov 2015
TL;DR: A method of searching for a plagiarized image in a database based on the technique of F-transform, particularly Fs-transform, which significantly reduces the domain dimension and therefore speeds up the whole process.
Abstract: The goal of this paper is to introduce the task of image plagiarism detection. More specifically, we propose a method of searching for a plagiarized image in a database. The main requirements for searching in the database are computational speed and success rate. The proposed method is based on the technique of F-transform, particularly Fs-transform, s ≥ 0. This technique significantly reduces the domain dimension and therefore speeds up the whole process. We present several experiments and measurements which demonstrate the speed and accuracy of our method. We also provide examples to demonstrate the applicability of this method in many applications.

9 citations
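
To make the dimension-reduction claim concrete, here is a minimal sketch of the ordinary discrete F-transform (the degree-zero case) on a one-dimensional signal, using a uniform partition with triangular basis functions. It is only an illustration: the paper works with images and the higher-degree Fs-transform, neither of which is reproduced here, and the function name and parameters are hypothetical.

```python
from typing import List


def f_transform(values: List[float], n_components: int) -> List[float]:
    """Direct F-transform of a 1-D signal with triangular basis functions.

    Each component is a weighted average of the samples around one node,
    so len(values) samples are reduced to n_components numbers.
    """
    n = len(values)
    if n_components < 2 or n < n_components:
        raise ValueError("need 2 <= n_components <= len(values)")
    h = (n - 1) / (n_components - 1)  # distance between neighbouring nodes
    components = []
    for k in range(n_components):
        node = k * h
        num = 0.0
        den = 0.0
        for i, v in enumerate(values):
            # Triangular membership: 1 at the node, 0 at distance h or more.
            w = max(0.0, 1.0 - abs(i - node) / h)
            num += w * v
            den += w
        components.append(num / den)
    return components


if __name__ == "__main__":
    signal = [float(i) for i in range(9)]
    print(f_transform(signal, 3))  # 9 samples reduced to 3 components
```

Comparing short component vectors instead of full signals is what makes a database search cheaper; how the paper compares components is not stated in the abstract.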

Journal Article
TL;DR: It is found that a systematic combination of different heuristics greatly improves the performance of the document retrieval system.
Abstract: This article describes ongoing research that aims to develop a plagiarism detection system for Arabic documents. We developed different heuristics to generate effective queries for document retrieval from the Web. The performance of those heuristics was empirically evaluated against a sizeable corpus in terms of precision, recall and F-measure. We found that a systematic combination of different heuristics greatly improves the performance of the document retrieval system.

9 citations
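
For readers unfamiliar with the evaluation measures named in the abstract above, the following sketch computes precision, recall and the balanced F-measure (F1) for one retrieval run. The document identifiers are made up, and the abstract does not say which F-measure variant the authors used, so F1 is an assumption.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a retrieved set against a relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical identifiers: documents returned by the generated queries
    # versus the true source documents.
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    print(precision_recall_f1(retrieved, relevant))
```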


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125