Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over its lifetime, 1790 publications have appeared within this topic, receiving 24740 citations.
Papers published on a yearly basis
Papers
TL;DR: The terminology on plagiarism is fluid, somewhat ambiguous, and still emerging, and it may take time for the terms to settle more clearly, concretely, and exhaustively.
Abstract: The terminology on plagiarism is not hard and fast. It is fluid, somewhat ambiguous, and still emerging. It may take some time for the terms to settle more clearly, concretely, and exhaustively. This paper aims to provide a terminological discussion of some important and current concepts related to plagiarism. It discusses key terms/concepts such as copyright, citation cartels, citing vs. quoting, compulsive thief, cryptomnesia, data fakery, ignorance of laws and codes of ethics, information literacy, lack of training, misattribution, fair use clause, paraphrasing, plagiarism, plagiarism detection software, publish or perish syndrome, PubPeer, retraction, retraction vs. correction, Retraction Watch, salami publication, similarity score, Society for Scientific Values, and source attribution. The explanation and definition of these terms/concepts can be useful for LIS scholars and professionals in their efforts to fight plagiarism. We expect this terminology to be referred to in future discussions of the topic and to help improve communication between the actors involved.
11 citations
TL;DR: Social and educational aspects of source code plagiarism in an academic environment are discussed, an overview of software tools for source code similarity detection is presented, and results show that 5–10% of students plagiarized their solutions.
Abstract: Computing education usually involves intensive practical training through laboratory exercises, programming projects, and homework assignments. Those assignments are frequent targets for plagiarism. In this paper, we discuss social and educational aspects of source code plagiarism in an academic environment, and present an overview of software tools for source code similarity detection. We present our experiences with the JPlag, Moss, and SPD tools, and compare them using simulated plagiarism based on programming assignment solutions produced after 1, 2, 4, and 8 hours of work on a baseline version, using more than 20 types of lexical and structural modifications that students use to hide plagiarism. We also compare results of the selected tools on real-life student programming solutions from three different courses. The courses were attended by 100 to 300 students, and the programming assignment solutions varied in size and complexity from 50 to 1000 lines of source code. The results show that 5–10% of students plagiarized their solutions. In our experience, JPlag and Moss proved to be effective tools for plagiarism detection, as they clearly indicated cases of similarity which were manually confirmed by human code inspection.
11 citations
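Tools like JPlag and Moss compare token streams rather than raw text, so that renaming variables or reformatting code does not hide copying. A minimal sketch of that idea, using identifier-normalized token k-grams and Jaccard similarity (the real tools use more robust matching such as greedy string tiling or winnowing fingerprints; the lexer and parameters here are illustrative assumptions):

```python
import re

def tokens(src):
    # Crude lexer: split into identifiers/keywords, numbers, and single
    # punctuation characters. Every identifier is normalized to the same
    # "ID" token so that renaming variables does not hide plagiarism.
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", src):
        out.append("ID" if re.match(r"[A-Za-z_]", tok) else tok)
    return out

def kgram_set(src, k=4):
    # All contiguous k-grams of the normalized token stream.
    ts = tokens(src)
    return {tuple(ts[i:i + k]) for i in range(len(ts) - k + 1)}

def similarity(a, b, k=4):
    # Jaccard index over the two k-gram sets, in [0, 1].
    sa, sb = kgram_set(a, k), kgram_set(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Because identifiers are normalized, a copy that only renames variables (e.g. `sum` to `total`, `a[i]` to `b[j]`) scores 1.0, while unrelated code scores near 0.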
TL;DR: All manuscripts submitted to Biochemia Medica are now first assigned to a Research Integrity Editor (RIE) before being sent for peer review, in order to implement the CrossCheck plagiarism detection service.
Abstract: In February 2013, Biochemia Medica joined CrossRef, which enabled us to implement the CrossCheck plagiarism detection service. Therefore, all manuscripts submitted to Biochemia Medica are now first assigned to a Research Integrity Editor (RIE) before the manuscript is sent for peer review. The RIE submits the text for CrossCheck analysis and is responsible for reviewing the results of the text similarity analysis. Based on the CrossCheck results, the RIE then provides a recommendation to the Editor-in-Chief (EIC) on whether the manuscript should be forwarded to peer review, corrected for suspected parts prior to peer review, or immediately rejected. The final decision on the manuscript, however, rests with the EIC. We hope that our new policy and manuscript processing algorithm will help us further increase the overall quality of our Journal.
11 citations
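The triage step described above amounts to a threshold rule on the similarity report. A sketch of such a rule, where the cutoff values and outcome labels are purely illustrative assumptions and not Biochemia Medica's actual editorial policy (in practice the RIE also inspects which passages match, not just the overall score):

```python
def triage(similarity_pct, reject_at=40, review_at=15):
    # Map an overall text-similarity percentage to one of the three
    # editorial recommendations described in the abstract.
    # NOTE: the thresholds 40 and 15 are hypothetical placeholders.
    if similarity_pct >= reject_at:
        return "reject"
    if similarity_pct >= review_at:
        return "correct suspected parts before peer review"
    return "forward to peer review"
```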
01 Dec 2011
TL;DR: This paper presents an index-based method for n-gram extraction from large collections, using common data structures such as a B+-tree and a hash table, and demonstrates the method's scalability in experiments on a gigabyte-scale collection.
Abstract: N-grams are applied in applications that search text documents, especially in cases when one must work with phrases, e.g. in plagiarism detection. An n-gram is a sequence of n terms (or, more generally, tokens) from a document. We obtain the set of n-grams by moving a floating window from the beginning to the end of the document. During extraction we must remove duplicate n-grams and store additional values for each n-gram type, e.g. the n-gram type frequency for each document; the details depend on the query model used. Previous works utilize a sorting algorithm to compute n-gram frequencies. These approaches must handle a large number of identical n-grams, resulting in high time and space overhead. Moreover, these techniques are often main-memory only, meaning they can be executed only on small or medium-sized collections. In this paper, we present an index-based method for n-gram extraction from large collections. This method utilizes common data structures such as a B+-tree and a hash table. We show the scalability of our method by presenting experiments on a gigabyte-scale collection.
11 citations
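The floating-window extraction with deduplication and per-type frequency counting described above can be sketched in a few lines. This is an in-memory sketch using a hash table; the paper's contribution is making the same idea work at scale, e.g. with a disk-based B+-tree, for collections that do not fit in main memory:

```python
from collections import defaultdict

def extract_ngrams(terms, n=3):
    # Slide a window of n terms over the document. The hash table both
    # deduplicates n-grams and accumulates each distinct n-gram type's
    # frequency in a single pass, avoiding the sort-based approach's
    # overhead of materializing every duplicate.
    freq = defaultdict(int)
    for i in range(len(terms) - n + 1):
        freq[tuple(terms[i:i + n])] += 1
    return dict(freq)
```

For example, `extract_ngrams("to be or not to be".split(), 2)` yields four distinct bigram types, with `("to", "be")` occurring twice.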
01 Nov 2017
TL;DR: This paper proposes a method for detecting plagiarism in source code using deep features, obtained with a character-level Recurrent Neural Network (char-rnn) pre-trained on Linux Kernel source code.
Abstract: This paper proposes a method for detecting plagiarism in source code using deep features. The embeddings for programs are obtained using a character-level Recurrent Neural Network (char-rnn) pre-trained on Linux Kernel source code. Many popular plagiarism detection tools are based on n-gram techniques at the syntactic level. However, these approaches fail to capture long-term dependencies (non-contiguous interactions) present in the source code. In contrast, the proposed deep features capture non-contiguous interactions within n-grams. The features are generic, and there is no need to fine-tune the char-rnn model for the program submissions of each individual problem set. Our experiments show the effectiveness of the deep features in the task of classifying assignment submissions as copy, partial copy, or non-copy. Comparing the proposed features with handcrafted features (source code metrics and textual features), we report f1-score improvements of 9.5% for binary classification and 5% for three-way classification.
11 citations
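Once each program is mapped to a fixed-length feature vector, plagiarism scoring reduces to comparing vectors. The sketch below is not the paper's method: it substitutes a hashed character n-gram vector for the char-rnn embedding (training a recurrent network is out of scope here), keeping only the character-level, comparison-by-vector structure of the approach. The vector dimension and n-gram length are arbitrary choices:

```python
import math

def char_vector(src, n=5, dim=256):
    # Hash each character n-gram into one of `dim` buckets and count.
    # A stand-in for a learned embedding: unlike a char-rnn, this
    # captures only local (contiguous) character context.
    vec = [0.0] * dim
    for i in range(len(src) - n + 1):
        vec[hash(src[i:i + n]) % dim] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two feature vectors, in [0, 1] for
    # non-negative count vectors; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

A pair of submissions would then be flagged when the cosine similarity of their vectors exceeds a tuned threshold; the paper instead feeds its embeddings to a classifier that distinguishes copy, partial copy, and non-copy.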