Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.

Papers

Proceedings Article

Scalable Source Code Plagiarism Detection Using Source Code Vectors Clustering

Michal Duracik¹, Emil Krsak¹, Patrik Hrkut¹

University of Žilina¹

01 Nov 2018

TL;DR: A new scalable approach to the detection of plagiarism in source code in the academic environment by using an incremental clustering approach to achieve modularity and scalability of the algorithm.

Abstract: Plagiarism is a growing problem because so many resources are easily accessible online. New algorithms are constantly being developed, but few existing systems can detect plagiarism successfully in large databases of source files. The aim of our work is to deal with plagiarism on a large scale. This paper describes our new scalable approach to detecting plagiarism in source code in the academic environment. The algorithm is designed to search for plagiarism in a huge number of source code files. An incremental clustering approach is applied to achieve modularity and scalability of the algorithm. The paper also details the data persistence structures and the methods of searching for source code snippet matches. In addition, we present some results of this approach on real student submissions and compare them with the results of other detection systems.

5 citations

Proceedings Article

Towards the exploitation of statistical language models for plagiarism detection with reference

Alberto Barrón-Cedeño¹, Paolo Rosso¹

Polytechnic University of Valencia¹

22 Jul 2008

TL;DR: The preliminary experiments, carried out on two specialised and literary corpora, show that perplexity of a text segment, given a Language Model calculated over an author text, could be a relevant feature in plagiarism detection.

Abstract: To plagiarise is to rob another person of credit for their work. In particular, plagiarism in text means including text fragments (or even an entire document) from an author without giving that author the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text employing statistical Language Models (LMs) and perplexity. The preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a Language Model calculated over an author's text, could be a relevant feature in plagiarism detection.

5 citations
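
The perplexity feature described in the Barrón-Cedeño and Rosso abstract above can be illustrated with a minimal sketch. It assumes a unigram model with add-one smoothing and whitespace tokenisation, both chosen here for brevity and not taken from the paper; the class name UnigramLM and the sample strings are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokeniser)."""
    return text.lower().split()


class UnigramLM:
    """Add-one smoothed unigram model built from one author's text.

    Hypothetical illustration: the paper speaks of statistical language
    models in general; the unigram choice is an assumption made here.
    """

    def __init__(self, author_text):
        tokens = tokenize(author_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        # One extra slot reserves probability mass for all unseen words.
        self.vocab_size = len(self.counts) + 1

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def perplexity(self, segment):
        """exp of the negative mean log-probability of the segment's tokens."""
        tokens = tokenize(segment)
        if not tokens:
            raise ValueError("segment contains no tokens")
        log_sum = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_sum / len(tokens))


lm = UnigramLM("the cat sat on the mat the cat ate")
own = lm.perplexity("the cat sat")
foreign = lm.perplexity("quantum flux capacitors")
print(own, foreign)
```

A segment whose vocabulary differs from the author's text receives a higher perplexity than a segment in the author's own wording, which is what would make it a candidate feature for flagging suspicious passages.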

Mahak Samim: A Corpus of Persian Academic Texts for Evaluating Plagiarism Detection Systems.

Morteza Rezaei Sharifabadi, Seyed Ahmad Eftekhari

01 Jan 2016

TL;DR: Mahak Samim is introduced, a plagiarism detection corpus of Persian academic texts with embedded plagiarism cases that can be used for evaluating plagiarism detection systems.

Abstract: In this paper we introduce Mahak Samim, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded. This corpus, which can be used for evaluating plagiarism detection systems, consists of more than five thousand artificial plagiarism cases with various lengths and diverse degrees of obfuscation. The development process and the features of the corpus are described here. CCS Concepts • Information systems ➝ Information retrieval ➝ Retrieval tasks and goals ➝ Near-duplicate and plagiarism detection.

5 citations

Journal Article

The Implementation of Plagiarism Detection System in Health Sciences Publications in Arabic and English Languages

Khaled Omar¹, Bassel Alkhatib¹, Mayssoon Dashash¹

Damascus University¹

30 Apr 2013, International Review on Computers and Software

TL;DR: All health sciences manuscripts should be checked with a plagiarism detection system before they are accepted for publication.

Abstract: Many algorithms are available for plagiarism detection in natural languages. Generally, these algorithms fall into two main categories: fingerprint-based algorithms and content-comparison algorithms, the latter comprising string matching and tree matching algorithms. Available plagiarism detection systems usually use a specific type of detection algorithm or a mixture of detection algorithms to achieve effective (fast and accurate) detection. Based on rhetorical structure theory, a system for plagiarism detection in Arabic and English health sciences publications has been developed using the Bing search engine. In conclusion, all health sciences manuscripts should be checked with a plagiarism detection system before they are accepted for publication.

5 citations

Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection

Patrik Hrkut¹, Michal Ďuračík¹, Miroslava Mikusova, Mauro Callejas-Cuervo², Joanna Zukowska³

University of Žilina¹, Pedagogical and Technological University of Colombia², Gdańsk University of Technology³

02 Dec 2019

TL;DR: The paper discusses how to optimize the implementation of clustering so that the whole system delivers results in a reasonable time: allocating the different parts of the source code to suitable clusters allows a faster and more memory-efficient search for similar parts of the code.

Abstract: The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in textual documents, but the situation is more complicated for source code, where the problem is equally serious. At present, no comprehensive tools are available to detect plagiarism in a large number of software projects, such as student projects, hundreds of which are created each year at every faculty of informatics. The aim of our project is to create such a system for finding plagiarism in a large dataset of source code. The whole system consists of several parts. Classification of source code is an essential part of the system because it makes source code much more efficient to manipulate and divides the data into individual clusters, so that searching in large volumes of source code is as efficient as possible. The paper discusses how to optimize the implementation of clustering so that the whole system delivers results in a reasonable time: allocating the different parts of the source code to suitable clusters allows a faster and more memory-efficient search for similar parts of the code.

5 citations
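
The idea in the abstract above, that grouping source code vectors into clusters narrows the search for similar code, can be sketched as follows. This is only an illustration under stated assumptions: code fragments are assumed to be already encoded as numeric vectors, plain Lloyd-style k-means with the first k vectors as initial centroids stands in for whatever optimized variant the paper uses, and all function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(vectors, k, iters=20):
    """Plain k-means; initial centroids are the first k vectors (assumption)."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_index(vectors, k):
    """Cluster the vectors and record which vector indices fall in each cluster."""
    centroids = kmeans(vectors, k)
    clusters = {c: [] for c in range(k)}
    for i, v in enumerate(vectors):
        clusters[nearest(v, centroids)].append(i)
    return centroids, clusters


def search(query, vectors, centroids, clusters):
    """Compare the query only against members of its nearest cluster."""
    candidates = clusters[nearest(query, centroids)]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: dist(query, vectors[i]))
    return best, dist(query, vectors[best])


# Hypothetical 2-D "source code vectors" forming two well-separated groups.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, clusters = build_index(vectors, k=2)
print(search([10.2, 10.1], vectors, centroids, clusters))
```

The search touches only the vectors in one cluster instead of the whole collection, which is the source of the speed and memory savings the abstract refers to. The trade-off is that a true nearest neighbour lying just across a cluster boundary can be missed.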

Performance

Metrics

Papers: 1,976
Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125
