scispace - formally typeset
Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
01 May 2003
TL;DR: The tests made on chunking methods used for plagiarism detection make it possible to decide on the best-fitting chunking method for a given application.
Abstract: This paper describes the tests made on chunking methods used for plagiarism detection. The results of the tests make it possible to decide on the best-fitting chunking method for a given application. For example, overlapping word chunking is good for a grammar analyzer or for small databases; sentence chunking is best suited to finding quoted text; hashed breakpoint chunking is the fastest method and is therefore advisable for searching large document sets; and if more reliability is needed, overlapping hashed breakpoint chunking can be used as well.
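Two of the chunking methods named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's test setup: overlapping word chunks are taken to be runs of n consecutive words, and a hashed breakpoint chunk is taken to end after any word whose hash is divisible by a parameter p (the usual definition of hashed breakpoints). The tokenizer, the choice of zlib.crc32 as the hash, the default p = 4, and the Jaccard overlap score are all assumptions for illustration.

```python
import zlib


def words(text: str) -> list[str]:
    """Lower-cased, whitespace-separated words (a simplifying assumption)."""
    return text.lower().split()


def overlapping_word_chunks(text: str, n: int = 3) -> list[str]:
    """Every run of n consecutive words; adjacent chunks share n - 1 words."""
    w = words(text)
    return [" ".join(w[i:i + n]) for i in range(len(w) - n + 1)]


def hashed_breakpoint_chunks(text: str, p: int = 4) -> list[str]:
    """Close a chunk after every word whose hash is divisible by p.

    zlib.crc32 is used because Python's built-in hash() of str is
    randomized per process, which would make chunk boundaries unstable.
    """
    chunks, current = [], []
    for word in words(text):
        current.append(word)
        if zlib.crc32(word.encode("utf-8")) % p == 0:
            chunks.append(" ".join(current))
            current = []
    if current:  # trailing words after the last breakpoint
        chunks.append(" ".join(current))
    return chunks


def chunk_overlap(chunks_a: list[str], chunks_b: list[str]) -> float:
    """Jaccard overlap of two chunk sets: |A & B| / |A | B|."""
    a, b = set(chunks_a), set(chunks_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Because a breakpoint depends only on the word itself, a copied passage tends to produce the same chunks wherever it appears in another document, and each word belongs to exactly one chunk. That is why this scheme indexes far fewer chunks than overlapping word chunking, which stores roughly one chunk per word.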

9 citations

Journal Article
TL;DR: The author designed an effective and complete method for detecting source code plagiarism that targets the common ways students plagiarize, including an improved Longest Common Subsequence algorithm for text matching that uses the statement as the unit of matching.
Abstract: Computer programming is an increasingly necessary part of program design courses in college education. However, some students submit homework that has been plagiarized and then lightly modified. It is not easy for teachers to judge whether source code has been plagiarized, and traditional detection algorithms do not fit this situation. The author designed an effective and complete method to detect source code plagiarism that targets the common ways students plagiarize. The algorithm rests on two basic concepts. One is to normalize the source code through filtering, removing most of the noise that plagiarists intentionally blend in. The other is an improved Longest Common Subsequence algorithm for text matching that uses the statement as the unit of matching. The authors also designed an appropriate HASH function to increase the efficiency of matching. Based on the algorithm, a system was built that proved practical and sufficient: it runs well and meets practical requirements in application.
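The two concepts in this abstract, normalization followed by statement-level LCS matching over hashed statements, can be sketched as below. The paper's actual filter rules, its "improved" LCS, and its HASH function are not given in the abstract, so this uses assumed stand-ins: C-style comments and all whitespace are stripped, statements are split on ';', each statement is hashed with zlib.crc32, and a textbook dynamic-programming LCS is applied. The similarity formula 2 * LCS / (len(a) + len(b)) is also an assumption.

```python
import re
import zlib


def normalize(source: str) -> list[int]:
    """Filter out noise, split into statements, and hash each statement.

    Assumed rules for a C-like language (not the paper's exact filter):
    comments and all whitespace are removed, so reformatting and added
    comments do not change the result. Comment markers inside string
    literals are not handled.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    hashes = []
    for stmt in source.split(";"):
        stmt = re.sub(r"\s+", "", stmt)
        if stmt:
            hashes.append(zlib.crc32(stmt.encode("utf-8")))
    return hashes


def lcs_length(a: list[int], b: list[int]) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming LCS, two rows of memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(src_a: str, src_b: str) -> float:
    """Share of statements that line up in order between the two programs."""
    a, b = normalize(src_a), normalize(src_b)
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

Comparing integer hashes instead of statement strings keeps each LCS cell comparison cheap, which matches the abstract's stated purpose for the hash function.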

9 citations

Proceedings Article
01 Oct 2019
TL;DR: A tool, SPPlagiarise, is presented, which is designed to produce simulated source code plagiarism of Java source code, and an evaluation of a generated plagiarism data set is presented.
Abstract: Source code plagiarism is a common occurrence in undergraduate computer science education. Studies have indicated that at least 50% of students plagiarize during their undergraduate career. To identify cases of source code plagiarism, many source code plagiarism detection tools have been proposed. However, conclusively determining the effectiveness of these tools at identifying cases of source code plagiarism is difficult. Evaluations are typically performed using unreleased data sets. Without a comprehensive, publicly available data set for evaluating source code plagiarism detection, it is difficult to perform unbiased and reproducible evaluations of tools. To address this problem, this paper presents a tool, SPPlagiarise, which is designed to produce simulated source code plagiarism of Java source code. SPPlagiarise applies a random number of semantics-preserving source code obfuscations at random locations to a Java code base to simulate source code plagiarism. This paper presents the design of the tool and an evaluation of a generated plagiarism data set.
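The idea of applying a random number of semantics-preserving obfuscations at random locations can be illustrated with a toy analogue. SPPlagiarise itself targets Java and its set of obfuscations is not listed in the abstract, so this sketch works on Python source with the standard ast module (Python 3.9+ for ast.unparse) and uses two assumed obfuscations: consistently renaming a random subset of assigned variables, and inserting a random number of no-op pass statements into function bodies. It assumes a self-contained script; for example, it does not handle class attributes or names used from other modules. The function name obfuscate and the "_obf" naming scheme are hypothetical.

```python
import ast
import random


class _Renamer(ast.NodeTransformer):
    """Renames every occurrence of the chosen variable names."""

    def __init__(self, mapping: dict[str, str]):
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate(source: str, seed: int = 0) -> str:
    """Return a semantically equivalent, randomly obfuscated copy of source."""
    rng = random.Random(seed)  # seeded so a generated data set is reproducible
    tree = ast.parse(source)
    # Never rename parameters or names declared global/nonlocal.
    keep = {a.arg for a in ast.walk(tree) if isinstance(a, ast.arg)}
    keep |= {name for n in ast.walk(tree)
             if isinstance(n, (ast.Global, ast.Nonlocal)) for name in n.names}
    assigned = sorted({n.id for n in ast.walk(tree)
                       if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
                      - keep)
    # Obfuscation 1: consistently rename a random subset of assigned variables.
    chosen = rng.sample(assigned, rng.randint(0, len(assigned)))
    mapping = {name: f"_obf{i}" for i, name in enumerate(chosen)}
    tree = _Renamer(mapping).visit(tree)
    # Obfuscation 2: insert a random number of no-op `pass` statements
    # at random positions in each function body.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for _ in range(rng.randint(0, 3)):
                node.body.insert(rng.randint(0, len(node.body)), ast.Pass())
    return ast.unparse(ast.fix_missing_locations(tree))
```

Running the original and the obfuscated copy on the same inputs is a cheap way to check that an obfuscation really preserved semantics before adding the pair to a data set.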

9 citations

Book Chapter
19 Sep 2016
TL;DR: In the paper, various combinations of feature point detectors and descriptors are investigated as potential tools for finding similar images in documents, along with how algorithms computing image similarity may extend the functionality of plagiarism detection systems.
Abstract: The paper presents results of research into applying image processing methods to document comparison, with a view to their use in plagiarism-detection systems. Among all image processing methods, feature-point methods are best suited for computing image similarity, thanks to their invariance to various image transforms. In the paper, various combinations of feature point detectors and descriptors are investigated as potential tools for finding similar images in documents. The methods are tested on a database of scientific papers containing 5 well-known image processing test images. The paper also presents an idea of how algorithms computing image similarity may extend the functionality of plagiarism detection systems.
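Once a detector and descriptor have been run on two images, similarity is typically computed by matching descriptors. The sketch below shows that matching step only, assuming binary descriptors (as produced by methods such as ORB) that have already been extracted and are represented here as Python integers; it does not reproduce the specific detector/descriptor combinations evaluated in the paper. Nearest-neighbour matching with a ratio test is a common technique swapped in for illustration, and the 0.75 ratio and the similarity score are assumed values, not the paper's.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def good_matches(desc_a: list[int], desc_b: list[int],
                 ratio: float = 0.75) -> list[tuple[int, int]]:
    """Nearest-neighbour matching with a ratio test.

    A match (i, j) is kept only if its distance is clearly smaller than
    the distance to the second-best candidate, filtering ambiguous matches.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if hamming(d, desc_b[best]) < ratio * hamming(d, desc_b[second]):
            matches.append((i, best))
    return matches


def image_similarity(desc_a: list[int], desc_b: list[int],
                     ratio: float = 0.75) -> float:
    """Fraction of the smaller descriptor set that found a good match."""
    if not desc_a or not desc_b:
        return 0.0
    return len(good_matches(desc_a, desc_b, ratio)) / min(len(desc_a), len(desc_b))
```

A plagiarism-detection system could flag image pairs whose score exceeds some threshold for manual review. Because feature points are largely invariant to scaling and rotation, this can catch images that were resized or rotated when copied.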

9 citations

Proceedings Article
24 Jul 2016
TL;DR: The extent and practicality of plagiarism detection systems using multiple classifications of detection engines are studied using 8 individual articles from different fields of work to determine the effectiveness and extent of each detection engine.
Abstract: This article studies the extent and practicality of plagiarism detection systems that use multiple classifications of detection engines, as further described within the article. An in-depth analysis of 8 individual articles from different fields of work was carried out, allowing comparisons both between detection systems and between different writing styles/formats. The first analysis used unmodified versions of the 8 selected papers as a control and a baseline for the performance of the detection engines, before a second analysis was conducted. This second analysis used versions of the selected papers modified by formatting the plagiarized sentences detected in the first test. This formatting involved simple shuffling and manipulation of the text, to determine the effectiveness and extent of each detection engine.
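The two-stage design (score unmodified text, then score shuffled text) can be mimicked on a small scale. The abstract does not name the detection engines or the exact manipulations, so this sketch assumes a simple word-trigram containment score as a stand-in engine and in-sentence word shuffling as the manipulation; both, and the function names, are hypothetical.

```python
import random


def trigrams(text: str) -> set[tuple[str, ...]]:
    """Set of lower-cased word trigrams in the text."""
    w = text.lower().split()
    return {tuple(w[i:i + 3]) for i in range(len(w) - 2)}


def containment(suspect: str, source: str) -> float:
    """Share of the suspect's word trigrams that also occur in the source."""
    s = trigrams(suspect)
    return len(s & trigrams(source)) / len(s) if s else 0.0


def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Hypothetical 'simple shuffling' edit: permute the words of a sentence."""
    w = sentence.split()
    random.Random(seed).shuffle(w)
    return " ".join(w)
```

The control run scores a verbatim copy (containment 1.0); the second run scores the shuffled copy. Shuffling typically destroys most trigrams while keeping every word, which is why a purely n-gram-based engine can be much less effective on manipulated text.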

9 citations


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125