Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings Article, DOI
01 Nov 2018
TL;DR: A new scalable approach to the detection of plagiarism in source code in the academic environment by using an incremental clustering approach to achieve modularity and scalability of the algorithm.
Abstract: Plagiarism is a growing problem because so many resources are easily accessible online. New algorithms are constantly being developed, but few existing systems can successfully detect plagiarism in large databases of source files. The aim of our work is to deal with plagiarism on a large scale. This paper describes our new scalable approach to the detection of plagiarism in source code in the academic environment. The algorithm is designed to search for plagiarism in a huge number of source code files. An incremental clustering approach is applied to achieve modularity and scalability of the algorithm. The paper also details the data persistence structures and the methods of searching for matching source code snippets. In addition, we present some results of this approach on real student submissions and compare them with those of other detection systems.

5 citations
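
The abstract above names incremental clustering without giving details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. The token-set features, the Jaccard measure, the `threshold` value, and the class name `IncrementalClusters` are all assumptions made for illustration. Each new submission is compared only against one representative per cluster, which is what keeps the cost of adding a file low as the collection grows.

```python
import re


def tokens(source: str) -> frozenset:
    """Crude lexical features (an assumption): the set of identifiers and symbols."""
    return frozenset(re.findall(r"[A-Za-z_]\w*|\S", source))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class IncrementalClusters:
    """Hypothetical incremental clustering: one pass, one file at a time."""

    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        # Each cluster is (representative token set, list of member names).
        self.clusters = []

    def add(self, name: str, source: str) -> int:
        """Assign a submission to the most similar cluster, or open a new one."""
        feats = tokens(source)
        best_index, best_sim = -1, 0.0
        for i, (representative, _members) in enumerate(self.clusters):
            sim = jaccard(feats, representative)
            if sim > best_sim:
                best_index, best_sim = i, sim
        if best_index >= 0 and best_sim >= self.threshold:
            self.clusters[best_index][1].append(name)
            return best_index
        self.clusters.append((feats, [name]))
        return len(self.clusters) - 1


if __name__ == "__main__":
    ic = IncrementalClusters()
    print(ic.add("s1", "def add(a, b):\n    return a + b\n"))
    print(ic.add("s2", "def add(a, b):\n    return b + a\n"))
    print(ic.add("s3", "class Stack:\n    items = []\n"))
```

Submissions that land in the same cluster become candidates for a more expensive pairwise comparison; files in different clusters never need to be compared directly.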

Proceedings Article
22 Jul 2008
TL;DR: The preliminary experiments, carried out on two specialised and literary corpora, show that perplexity of a text segment, given a Language Model calculated over an author text, could be a relevant feature in plagiarism detection.
Abstract: To plagiarise is to rob another person of credit for their work. In particular, plagiarism in text means including text fragments (or even an entire document) from an author without giving them the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text employing statistical Language Models (LMs) and perplexity. The preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that perplexity of a text segment, given a Language Model calculated over an author text, could be a relevant feature in plagiarism detection.

5 citations
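
To make the perplexity feature in the abstract above concrete, here is a minimal sketch under simplifying assumptions: a unigram model with add-one smoothing stands in for whatever Language Model the authors actually used, and the function names `train_unigram` and `perplexity` are hypothetical. A segment whose perplexity under the author's model is unusually high is a candidate for closer inspection, nothing more.

```python
import math
from collections import Counter


def train_unigram(author_text: str):
    """Count word frequencies in the author's text (whitespace tokenisation)."""
    words = author_text.lower().split()
    return Counter(words), len(words)


def perplexity(segment: str, counts: Counter, total: int) -> float:
    """Perplexity of a segment under an add-one smoothed unigram model."""
    words = segment.lower().split()
    if not words:
        raise ValueError("segment must contain at least one word")
    # Vocabulary = distinct author words plus one slot for unseen words.
    vocab_size = len(counts) + 1
    log_prob = 0.0
    for word in words:
        probability = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log(probability)
    return math.exp(-log_prob / len(words))


if __name__ == "__main__":
    counts, total = train_unigram("the cat sat on the mat the cat ran")
    print(perplexity("the cat sat", counts, total))
    print(perplexity("quantum flux capacitor", counts, total))
```

In this toy run the second segment scores higher than the first because none of its words occur in the author text; a realistic setup would use a higher-order model and far more training text.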

01 Jan 2016
TL;DR: Mahak Samim is introduced, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded and which can be used for evaluating plagiarism detection systems.
Abstract: In this paper we introduce Mahak Samim, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded. This corpus, which can be used for evaluating plagiarism detection systems, consists of more than five thousand artificial plagiarism cases with various lengths and diverse degrees of obfuscation. The development process and the features of the corpus are described here.
CCS Concepts: Information systems ➝ Information retrieval ➝ Retrieval tasks and goals ➝ Near-duplicate and plagiarism detection.

5 citations

Journal Article, DOI
TL;DR: All health sciences manuscripts should be checked by a plagiarism detection system before being accepted for publication.
Abstract: There are many available algorithms for plagiarism detection in natural languages. Generally, these algorithms fall into two main categories: fingerprint-based algorithms and content-comparison algorithms, the latter including string matching and tree matching. Available plagiarism detection systems usually use a specific type of detection algorithm, or a mixture of them, to achieve fast and accurate detection. A system based on rhetorical structure theory for plagiarism detection in Arabic and English health sciences publications has been developed using the Bing search engine. In conclusion, all health sciences manuscripts should be checked by a plagiarism detection system before being accepted for publication.

5 citations
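
The fingerprint category mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic word-shingle fingerprint, not the system described in the paper; the shingle length `k`, the use of CRC32 as the hash, and the function names are assumptions chosen for brevity. Because `\w` matches Unicode letters in Python 3, the same tokenisation works for both Arabic and English text.

```python
import re
import zlib


def fingerprint(text: str, k: int = 3) -> set:
    """Hash every run of k consecutive words (a 'shingle') into an integer."""
    words = re.findall(r"\w+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {zlib.crc32(shingle.encode("utf-8")) for shingle in shingles}


def resemblance(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of two fingerprints: 0.0 is disjoint, 1.0 is identical."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa or not fb:
        return 0.0  # Text shorter than k words has no fingerprint.
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "Plagiarism detection systems compare documents for overlapping text"
    copied = "plagiarism detection systems compare documents for overlapping text."
    unrelated = "The weather today is sunny with light winds"
    print(resemblance(original, copied))
    print(resemblance(original, unrelated))
```

Comparing sets of integers rather than raw strings is what makes fingerprinting cheap at scale; content-comparison methods such as string or tree matching are slower but tolerate more rewording.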

DOI
02 Dec 2019
TL;DR: The paper discusses how to optimize the implementation of clustering so that the whole system delivers results in a reasonable time: allocating the different parts of the source code to suitable clusters allows a faster and more memory-efficient search for similar parts of the code.
Abstract: The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in textual documents, but the situation is more complicated for source code, where the problem is equally serious. At present, there are no complex tools available to detect plagiarism in a large number of software projects, such as student projects, hundreds of which are created every year at each faculty of informatics. The aim of our project is to create such a system for finding plagiarism in a large dataset of source code. The whole system consists of several parts. Classification of source code is an essential part of the system because it makes manipulating source code much more efficient and divides the data into individual clusters, so that searching in large volumes of source code is as efficient as possible. The paper discusses how to optimize the implementation of clustering so that the whole system delivers results in a reasonable time: allocating the different parts of the source code to suitable clusters allows a faster and more memory-efficient search for similar parts of the code.

5 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  59
2022  126
2021  83
2020  118
2019  130
2018  125