Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Book Chapter
30 Aug 2013
TL;DR: The proposed solution based on simhash document fingerprints essentially reduces the problem to a secure XOR computation between two bit vectors, which improves the computational and communication costs by at least one order of magnitude compared to the current state-of-the-art protocol.
Abstract: Similar document detection is a well-studied problem with important application domains, such as plagiarism detection, document archiving, and patent/copyright protection. Recently, the research focus has shifted towards the privacy-preserving version of the problem, in which two parties want to identify similar documents within their respective datasets. These methods apply to scenarios such as patent protection or intelligence collaboration, where the contents of the documents at both parties should be kept secret. Nevertheless, existing protocols on secure similar document detection suffer from high computational and/or communication costs, which renders them impractical for large datasets. In this work, we introduce a solution based on simhash document fingerprints, which essentially reduce the problem to a secure XOR computation between two bit vectors. Our experimental results demonstrate that the proposed method improves the computational and communication costs by at least one order of magnitude compared to the current state-of-the-art protocol. Moreover, it achieves a high level of precision and recall.

25 citations
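
The simhash idea in the abstract above can be illustrated with a minimal, non-secure sketch: each document is reduced to a 64-bit fingerprint, and the XOR of two fingerprints exposes the bits in which they differ. This is not the paper's protocol; the secure two-party XOR computation is omitted entirely, and the whitespace tokenisation, the use of MD5 as the token hash, and the names `simhash` and `hamming_distance` are assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; the abstract does not state one)


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of `text`."""
    weights = [0] * BITS
    for token in text.lower().split():
        # Hash each token to a stable 64-bit integer (MD5 is just a convenient choice).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ, via XOR."""
    return bin(a ^ b).count("1")
```

A small Hamming distance between two fingerprints suggests similar documents; the threshold to use is application-specific and is not given in the abstract.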

Proceedings Article
05 Mar 2014
TL;DR: This work presents an experiment in which teachers are asked to compare different code solutions to the same problem, and finds that comparison of students' code has significant potential to be automated to help teachers in their work.
Abstract: In introductory programming courses it is common to assign students exercises based on producing code. However, it is difficult for the teacher to give fast feedback to the students about the main solutions tried, the main errors, and the drawbacks and advantages of certain solutions. If we could use automatic code comparison algorithms to build visualisation tools that help the teacher analyse how each submitted solution is similar to or different from another, such information could be obtained rapidly. However, can computers compare students' code solutions as well as teachers? In this work we present an experiment in which we asked teachers to compare different code solutions to the same problem. We then evaluated the level of agreement between each teacher's comparison strategy and some algorithms generally used for plagiarism detection and automatic grading. We found a maximum agreement rate of 77% between one of the teachers and the algorithms, but a minimum agreement of 75%. However, for most of the teachers, the maximum agreement rate was over 90% for at least one of the automatic strategies to compare code. We also found that the level of agreement among teachers regarding their personal strategies to compare students' solutions was between 62% and 95%, which shows that there may be more agreement between a teacher and an algorithm than between a teacher and one of her colleagues regarding their strategies to compare students' solutions. The results also seem to support that comparison of students' code has significant potential to be automated to help teachers in their work.

25 citations
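
The agreement rates quoted in the abstract above can be read as simple percent agreement, sketched below. The abstract does not define its agreement measure, so this is an assumption: two raters (a teacher and an algorithm, or two teachers) each give one judgment per comparison task, and the rate is the fraction of tasks on which they coincide. The function name and the example labels are hypothetical.

```python
from typing import Hashable, Sequence


def agreement_rate(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Fraction of comparison tasks on which two raters gave the same judgment."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must judge the same set of tasks")
    if not rater_a:
        raise ValueError("at least one judgment is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical judgments on four comparison tasks.
teacher = ["similar", "different", "similar", "similar"]
algorithm = ["similar", "different", "different", "similar"]
rate = agreement_rate(teacher, algorithm)
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when that matters.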

Proceedings Article
11 Nov 2008
TL;DR: In this paper, a plagiarism detection technique for Java programs is proposed that uses bytecode without referring to the source code, and that can serve as a preliminary verification tool before detecting plagiarism by source code comparison.
Abstract: Most plagiarism detection systems evaluate the similarity of source code and detect plagiarized program pairs. If we use the source code in plagiarism detection, source code security can be a significant problem. Plagiarism detection based on target code can be used to protect the security of source code. In this paper, we propose a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate the token sequences from the Java class file by analyzing the code area of methods. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distribution of similarities for source code and that for bytecode are very similar. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.

25 citations
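
The second step described in the abstract above, scoring the similarity of two token sequences by local alignment, can be sketched with plain Smith-Waterman scoring. This is a stand-in, not the paper's adaptive local alignment: the fixed match, mismatch, and gap scores, the normalisation by the shorter sequence, and the function names are all assumptions, and the token lists are assumed to have been extracted from class files beforehand.

```python
from typing import Sequence


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score between two token sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: Sequence[str], b: Sequence[str], match: int = 2) -> float:
    """Score normalised by the best possible score of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shorter)


# Hypothetical token sequences using JVM opcode mnemonics.
original = ["aload_0", "iload_1", "iadd", "ireturn"]
suspect = ["nop", "iload_1", "iadd", "ireturn", "nop"]
score = similarity(original, suspect)
```

Here the three shared consecutive tokens align with score 6 out of a possible 8, so `score` is 0.75; padding instructions outside the shared region do not lower the result, which is the property that makes local alignment attractive for this task.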

Proceedings Article
25 Jun 2001
TL;DR: A Four-Stage Plagiarism Detection Process that attempts to ensure no suspicious similarity is missed and that no student is unfairly accused of plagiarism is described.
Abstract: For decades many computing departments have deployed systems for the detection of plagiarised student source code submissions. Automated systems to detect free-text student plagiarism are just becoming available, and the experience of computing educators is valuable for their successful deployment. This paper describes a Four-Stage Plagiarism Detection Process that attempts to ensure no suspicious similarity is missed and that no student is unfairly accused of plagiarism. Required characteristics of an effective similarity detection engine are proposed and an investigation of a simple engine is described. An innovative prototype tool designed to decrease the workload of tutors investigating undue similarity is also presented.

24 citations

Book
04 Sep 2014
TL;DR: This paper proposes eight ethical techniques to avoid unconscious and accidental plagiarism in manuscripts without using online systems such as Turnitin and/or iThenticate for cross checking and plagiarism detection.
Abstract: This paper discusses the origins of plagiarism and ethical solutions to prevent it. It also reviews some unethical approaches that may be used to decrease the plagiarism rate in academic writing. We propose eight ethical techniques to avoid unconscious and accidental plagiarism in manuscripts without using online systems such as Turnitin and/or iThenticate for cross checking and plagiarism detection. The efficiency of the proposed techniques is evaluated on five different texts, with students working individually. After the techniques were applied to the texts, the texts were checked by Turnitin to produce the plagiarism and similarity report. Finally, the “effective factor” of each method was compared with the others, and the best result went to a hybrid combination of all techniques. The hybrid of ethical methods decreased the plagiarism rate reported by Turnitin from nearly 100% to an average of 8.4% on 5 manuscripts.

24 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  59
2022  126
2021  83
2020  118
2019  130
2018  125