Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have appeared within this topic, receiving 24,740 citations.


Papers
01 Jan 2012
TL;DR: The main changes are to the heuristic that recognizes clusters of N-gram matches as matching passages in the pair of documents examined; it aims for high recall under difficult conditions (sparse matches), which are typical of real-life rephrasing by people.
Abstract: This article describes the latest changes to our plagiarism detection system Encoplot. We submitted the modified system to the PAN@CLEF 2012 automatic plagiarism detection challenge, where it ranked 2nd by F-measure and 3rd by the "plagdet" scoring method, which we had previously shown to be flawed to some extent. The main changes were made to the heuristic that recognizes clusters of N-gram matches as matching passages in the pair of documents examined. We aimed for high recall under difficult conditions (sparse matches), which are typical of real-life rephrasing by people. The evaluation on the PAN 2012 training and test corpora shows that we achieved our goal of improving the performance of this part of the Encoplot plagiarism detection system. In the final part of this article we analyze the anomalies of the plagdet scoring method, show that they are not negligible, and propose a modified plagdet version that reduces them.

8 citations
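
The idea of turning clusters of N-gram matches into matching passages can be illustrated with a minimal sketch. This is not Encoplot's actual heuristic, which the abstract does not specify; the function names, the character 5-gram size, and the `max_gap` and `min_matches` thresholds are all assumptions chosen for illustration.

```python
from collections import defaultdict

def ngram_matches(a, b, n=5):
    """Return (pos_a, pos_b) pairs where the same character n-gram occurs in both texts."""
    index = defaultdict(list)
    for j in range(len(b) - n + 1):
        index[b[j:j + n]].append(j)
    matches = []
    for i in range(len(a) - n + 1):
        for j in index.get(a[i:i + n], ()):
            matches.append((i, j))
    return matches

def find_passages(a, b, n=5, max_gap=20, min_matches=3):
    """Group nearby matches into clusters and report each cluster as a passage pair.

    A match joins the current cluster when it lies within max_gap characters
    of the previous match in both documents (an illustrative rule, not Encoplot's).
    """
    clusters = []
    for i, j in sorted(ngram_matches(a, b, n)):
        if clusters:
            last_i, last_j = clusters[-1][-1]
            if i - last_i <= max_gap and abs(j - last_j) <= max_gap:
                clusters[-1].append((i, j))
                continue
        clusters.append([(i, j)])
    passages = []
    for c in clusters:
        if len(c) >= min_matches:  # drop isolated, probably accidental matches
            a_span = (c[0][0], c[-1][0] + n)
            b_span = (min(j for _, j in c), max(j for _, j in c) + n)
            passages.append((a_span, b_span))
    return passages

a = "xxxx the quick brown fox jumps yyyy"
b = "zz the quick brown fox jumps ww"
for (a0, a1), (b0, b1) in find_passages(a, b):
    print(repr(a[a0:a1]), "<->", repr(b[b0:b1]))
```

Raising `max_gap` tolerates sparser matches, which favors recall at the expense of precision.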

Book ChapterDOI
08 Sep 2010
TL;DR: Social Network Analysis is applied to discover groups of people associated with each other by the similarity of their documents, in a plagiarism detection context, to tackle the plagiarism problem among students in Chile.
Abstract: Technology use is now widespread, and the internet and digital documents are considered powerful tools in both professional and personal domains. Useful as they are when used properly, they also make bad practices easy, and copy-and-paste plagiarism is one of them. Copying and pasting from documents is a growing practice worldwide, and Chile is no exception. All levels of education, from elementary school to graduate study, are directly affected. In response to this concern, a decision was made in Chile to tackle the plagiarism problem among students. To this end, we apply Social Network Analysis to discover groups of people associated with each other by the similarity of their documents in a plagiarism detection context. Experiments were successfully performed on real reports by graduate students at the University of Chile.

8 citations
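
As a rough illustration of grouping people by document similarity, the sketch below links two authors when their documents are similar enough and reports the connected groups. It is a simplified stand-in, not the Social Network Analysis method of the paper: the Jaccard word-set similarity, the 0.5 threshold, and the names `jaccard` and `similarity_groups` are assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def similarity_groups(docs, threshold=0.5):
    """Return sets of authors connected by document pairs at or above the threshold.

    docs maps an author name to the text of that author's document.
    Groups are the connected components of the similarity graph (union-find).
    """
    parent = {name: name for name in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(docs, 2):
        if jaccard(docs[u], docs[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]

docs = {
    "ana": "the cell membrane controls what enters the cell",
    "ben": "the cell membrane controls what leaves the cell",
    "eva": "plants convert sunlight into sugar",
}
print(similarity_groups(docs))
```

Authors whose documents resemble no one else's are left out of the result, so each returned group is a candidate set for manual review.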

Proceedings ArticleDOI
21 Mar 2013
TL;DR: The trial and evaluation of a plagiarism detection tool that is seamlessly integrated into the Moodle virtual learning environment indicates a considerable level of interest in using the tool, and supports the suitability of this tool for wider institutional adoption in the computing education community.
Abstract: Technology empowers students but can also entice them to plagiarise. To tackle this problem, plagiarism detection tools are especially useful, not only in popular thinking as a deterrent for students, but also as an educational tool to raise students' awareness of the offence and to improve their academic skills. Commercial text matching tools (e.g. Turnitin) are at a high level of maturity. These tools offer the ability to interact with students, making them suitable for an educational objective. Additionally, they can be readily integrated into learning environments enabling uniform application at an institutional level. On the other hand, computer source code matching tools, despite their successful detection performance, are mostly used as standalone tools that are difficult to adopt at an institutional level. The research presented in this paper describes the trial and evaluation of a tool that is seamlessly integrated into the Moodle virtual learning environment. The tool provides code similarity scanning capability within Moodle so that institutions using this learning environment could apply this tool easily at an enterprise level. Additionally, the educational aspects available in text matching tools have been added into the tool capability. The tool relies on two popular code matching services, MOSS and JPlag, as underlying engines to provide good code similarity scanning performance. The evaluation of the tool from both academics' and students' perspectives indicates a considerable level of interest in using the tool, and supports the suitability of this tool for wider institutional adoption in the computing education community.

8 citations

Proceedings ArticleDOI
14 Nov 2016
TL;DR: This paper explores how to automatically identify error classes by clustering a set of code submissions, using code plagiarism detection tools to measure the similarity between them.
Abstract: Online platforms for learning programming are very popular nowadays. These platforms must automatically assess the code submitted by learners and provide good-quality feedback to support their learning. Classical techniques for producing useful feedback include using unit testing frameworks to run systematic functional tests on the submitted code, or using code quality assessment tools. This paper explores how to automatically identify error classes by clustering a set of code submissions, using code plagiarism detection tools to measure the similarity between them. The proposed approach and analysis framework are presented, along with a first experiment using the Code Hunt dataset.

8 citations
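
Clustering submissions by a code-similarity score can be sketched as follows. The paper uses plagiarism detection tools for the similarity measure; here Python's standard `difflib.SequenceMatcher` over identifier-normalized tokens stands in for such a tool, and the greedy "leader" clustering, the 0.8 threshold, and all function names are assumptions for illustration only.

```python
import keyword
import re
from difflib import SequenceMatcher

def normalize(src):
    """Tokenize source text and replace every non-keyword identifier with 'ID'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
    return [
        "ID" if re.match(r"[A-Za-z_]", t) and not keyword.iskeyword(t) else t
        for t in tokens
    ]

def similarity(a, b):
    """Similarity in [0, 1] between two submissions after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()

def cluster(submissions, threshold=0.8):
    """Greedy clustering: each submission joins the first cluster whose
    leader (first member) it resembles, otherwise it starts a new cluster."""
    clusters = []
    for idx, src in enumerate(submissions):
        for members in clusters:
            if similarity(submissions[members[0]], src) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

submissions = [
    "def f(x):\n    return x + 1",
    "def g(y):\n    return y + 1",
    "def h(n):\n    for i in range(n):\n        print(i)",
]
print(cluster(submissions))
```

Normalizing identifiers makes renamed copies of the same solution compare as equal, which is the behavior one would want when a cluster is meant to represent one kind of solution or error.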

Proceedings ArticleDOI
Kensuke Baba
28 Jun 2017
TL;DR: A plagiarism detection algorithm based on approximate string matching, specialized for "copy and paste"-type plagiarism, and a speed improvement to an implementation of the algorithm are proposed.
Abstract: Plagiarism detection in a large number of documents requires efficient methods. This paper proposes a plagiarism detection algorithm based on approximate string matching, specialized for "copy and paste"-type plagiarism, and a speed improvement to an implementation of the algorithm. Two kinds of approximation of the output used for plagiarism detection make most of the algorithm's computations unnecessary, while the resulting decrease in accuracy is acceptable. The effect of the improvement on the processing time and accuracy of the algorithm is evaluated in experiments on a data set. The experimental results show that the improvement reduces the processing time to approximately one-twentieth of that of the normal implementation, at the cost of a 6.4% decrease in accuracy.

8 citations
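
Approximate string matching in general can be illustrated with the classic dynamic-programming formulation (often attributed to Sellers), which finds the substring of a text closest in edit distance to a pattern. This is a textbook technique shown for orientation only; it is not the algorithm or the speed-up proposed in the paper, and the function name is an assumption.

```python
def best_approximate_match(pattern, text):
    """Return (distance, end) for the substring of text with the smallest
    edit distance to pattern; end is the index just past that substring.
    Ties are resolved in favor of the earliest end position."""
    m = len(pattern)
    prev = list(range(m + 1))  # distances against the empty text prefix
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        cur = [0]  # row 0 is zero: a match may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # substitute or match
                           prev[i] + 1,         # skip a text character
                           cur[i - 1] + 1))     # skip a pattern character
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best

distance, end = best_approximate_match("plagiarism", "a case of plagarism found")
print(distance, end)
```

The running time is proportional to the product of the pattern and text lengths, which is why large document collections call for faster approximations of the kind the abstract describes.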


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125