scispace - formally typeset
Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published on this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: ES‐Plag, a plagiarism detection tool featuring cosine‐based filtering and a penalty mechanism to address the time‐inefficiency and insensitivity of RKRGST, is proposed, and its features are found to be beneficial for examiners.
Abstract: Source code plagiarism detection using Running‐Karp‐Rabin Greedy‐String‐Tiling (RKRGST) is a common practice in academic environments. However, this approach is time‐inefficient (due to RKRGST's cubic time complexity) and insensitive (to token subsequence rearrangement). This paper proposes ES‐Plag, a plagiarism detection tool featuring cosine‐based filtering and a penalty mechanism to handle these issues. Cosine‐based filtering mitigates time‐inefficiency by excluding non‐potential pairs from RKRGST comparison, while the penalty mechanism mitigates insensitivity by reducing the number of matched tokens by the number of matched subsequences prior to similarity normalization. In addition to these issue‐solving features, ES‐Plag also offers project‐based input, a colorized adjacency similarity matrix, matched token highlighting, and various similarity algorithms (e.g., Cosine Similarity and Local Alignment). Three findings can be deduced from our evaluation. First, cosine‐based filtering improves time efficiency at some cost in effectiveness. Second, the penalty mechanism enhances sensitivity, even though its improvement in effectiveness is quite limited. Third, ES‐Plag's features are beneficial for examiners.
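The two ES‐Plag mechanisms can be illustrated with a minimal sketch. It assumes token sequences are already extracted, uses a naive Greedy String Tiling (without the Karp‐Rabin hashing that makes RKRGST faster), and assumes a hypothetical cosine threshold of 0.5 and a penalty of one token per matched subsequence (tile) before JPlag‐style normalization 2·matched/(|A|+|B|). The abstract does not give ES‐Plag's exact formulas, so the names and constants here are illustrative only.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity of the token-frequency vectors of two sequences."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def greedy_string_tiling(a, b, min_match=3):
    """Naive Greedy String Tiling (no Karp-Rabin hashing); returns (i, j, length) tiles."""
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    tiles = []
    while True:
        max_match, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the match over unmarked, equal tokens.
                while (i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    max_match, matches = k, [(i, j, k)]
                elif k == max_match:
                    matches.append((i, j, k))
        for i, j, k in matches:
            # Skip matches occluded by a tile marked earlier in this pass.
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                for x in range(k):
                    marked_a[i + x] = marked_b[j + x] = True
                tiles.append((i, j, k))
        if max_match == min_match:  # no match longer than the minimum remains
            return tiles

def penalized_similarity(tiles, len_a, len_b):
    """Assumed penalty: subtract one token per tile before normalizing."""
    if len_a + len_b == 0:
        return 0.0
    matched = sum(k for _, _, k in tiles)
    penalized = max(matched - len(tiles), 0)
    return 2 * penalized / (len_a + len_b)

def compare(a, b, threshold=0.5, min_match=3):
    """Return a similarity score, or None when the cosine filter skips the pair."""
    if cosine_similarity(a, b) < threshold:
        return None  # filtered out: tiling is never run for this pair
    tiles = greedy_string_tiling(a, b, min_match)
    return penalized_similarity(tiles, len(a), len(b))
```

With `a = "a b c d e f".split()`, comparing `a` with itself yields one tile and a score of 10/12, while comparing it with the rearranged `"d e f a b c".split()` yields two tiles and 8/12; without the penalty both pairs would score 1.0, which is the insensitivity the abstract describes. A pair with no shared tokens, such as `a` versus `"x y z".split()`, is rejected by the cosine filter before any tiling is done.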

27 citations

Journal Article
TL;DR: In programming courses there are various ways in which students attempt to cheat; the most commonly used method is copying source code from other students.
Abstract: In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes to it, such as renaming variables. Several tools, such as Sherlock, JPlag and Moss, have been devised to detect source code plagiarism. However, for larger student assignments and projects that involve many source code files, these tools are less effective. Problems also arise when source code is given to students in class so that they may copy it. In such cases these tools do not provide satisfactory results and reports. In this study, we present an improved process model for plagiarism detection when multiple student files exist and allowed source code is present. In this paper we use the Sherlock detection tool, although the presented process model can be combined with any plagiarism detection engine. The proposed model is tested on assignments in three courses over two consecutive academic years.
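The core idea of handling multiple files and allowed code can be sketched as a preprocessing step in front of any detector. This is a minimal illustration, not the authors' process model: it assumes each student's files are concatenated in filename order, that allowed code is removed line by line after whitespace normalization, and it uses Python's `difflib.SequenceMatcher` as a stand-in for Sherlock. The function names and the 0.6 threshold are hypothetical.

```python
import difflib
from itertools import combinations

def normalize(line):
    """Collapse whitespace so indentation changes do not hide a match."""
    return " ".join(line.split())

def strip_allowed(files, allowed_code):
    """Concatenate a student's files (sorted by name), dropping blank lines and
    lines that also occur in the instructor-provided (allowed) code."""
    allowed = {normalize(l) for l in allowed_code.splitlines()}
    kept = []
    for name in sorted(files):
        for line in files[name].splitlines():
            n = normalize(line)
            if n and n not in allowed:
                kept.append(n)
    return kept

def similarity(a_lines, b_lines):
    """Stand-in detection engine: line-level SequenceMatcher ratio."""
    if not a_lines or not b_lines:
        return 0.0  # nothing left after removing allowed code
    return difflib.SequenceMatcher(None, a_lines, b_lines).ratio()

def pairwise_report(submissions, allowed_code, threshold=0.6):
    """submissions: {student: {filename: source}}; returns flagged pairs, highest first."""
    cleaned = {s: strip_allowed(f, allowed_code) for s, f in submissions.items()}
    report = []
    for s1, s2 in combinations(sorted(cleaned), 2):
        score = similarity(cleaned[s1], cleaned[s2])
        if score >= threshold:
            report.append((s1, s2, round(score, 3)))
    return sorted(report, key=lambda r: -r[2])
```

Because the allowed skeleton is removed before comparison, two students who both kept the provided `main` wrapper are not flagged for it; only the lines they wrote themselves contribute to the score.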

26 citations

Journal Article
TL;DR: An architecture is proposed that uses a semantic similarity measure exploiting the similarity of words as mined from the data corpus itself, thereby using localized contextual information to detect plagiarism.
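One generic way to mine word similarity from the corpus itself is to compare words by the contexts they co-occur with. The sketch below is an assumption for illustration, not the paper's architecture: it builds co-occurrence vectors with a hypothetical window of 2 tokens, scores word pairs by cosine similarity, and scores two sentences by averaging each word's best match in the other sentence.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(corpus, window=2):
    """Map each word to counts of words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / n if n else 0.0

def word_sim(w1, w2, vectors):
    """Corpus-derived word similarity; identical words always score 1.0."""
    if w1 == w2:
        return 1.0
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))

def sentence_sim(s1, s2, vectors):
    """Symmetric average of each word's best-matching word in the other sentence."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0

    def directed(a, b):
        return sum(max(word_sim(w, v, vectors) for v in b) for w in a) / len(a)

    return (directed(t1, t2) + directed(t2, t1)) / 2
```

With the toy corpus `["the car drove fast", "the automobile drove fast", "the cat sat down"]`, "car" and "automobile" share identical contexts, so "the car drove fast" and "the automobile drove fast" score about 1.0 even though they differ lexically, while "the cat sat down" scores noticeably lower.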

26 citations

01 Jan 2011
TL;DR: This paper describes the performance of a plagiarism detection system that can detect both external and intrinsic plagiarism in text, and reports the results on the PAN-PC-2011 test corpus.
Abstract: This paper describes the performance of a plagiarism detection system that can detect both external and intrinsic plagiarism in text. It reports the results on the PAN-PC-2011 test corpus. We investigated Vector Space Model based techniques for detecting external plagiarism cases and discourse-marker-based features for detecting intrinsic plagiarism cases.
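The two strands of the abstract can be illustrated side by side. The sketch is hypothetical and not the authors' system: external detection ranks source documents by tf-idf cosine similarity to the suspicious document (smoothed idf, assumed threshold 0.3), and intrinsic detection flags segments whose relative frequencies of a small, assumed list of discourse markers deviate from the whole-document profile by more than an assumed L1 distance of 0.2.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document (smoothed idf)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def external_candidates(suspicious, sources, threshold=0.3):
    """Indices and scores of sources whose VSM similarity passes the threshold."""
    vecs = tfidf_vectors([suspicious] + sources)
    scored = [(i, cosine(vecs[0], vec)) for i, vec in enumerate(vecs[1:])]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])

# Hypothetical marker list; a real system would use a much larger inventory.
MARKERS = ("however", "therefore", "moreover", "furthermore", "thus", "hence")

def marker_profile(text):
    """Relative frequency of each discourse marker in the text."""
    tokens = [t.strip(".,;:") for t in text.lower().split()]
    total = len(tokens) or 1
    counts = Counter(t for t in tokens if t in MARKERS)
    return [counts[m] / total for m in MARKERS]

def flag_style_shifts(segments, threshold=0.2):
    """Indices of segments whose marker profile departs from the document's."""
    doc_profile = marker_profile(" ".join(segments))
    flagged = []
    for i, seg in enumerate(segments):
        dist = sum(abs(x - y) for x, y in zip(marker_profile(seg), doc_profile))
        if dist > threshold:
            flagged.append(i)
    return flagged
```

External detection needs a reference collection, while the intrinsic check uses only the suspicious document itself, which is why the two are typically combined.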

26 citations

Proceedings Article
Heinz Dreher
TL;DR: Maurer et al. (2006) provide a thorough analysis of the plagiarism problem and possible solutions as they pertain to academia, and divide the solution strategies into three main categories.
Abstract:
Introduction
Plagiarism is now acknowledged to pose a significant threat to academic integrity. There is a growing array of software packages to help address the problem. Most of these offer a string-of-text comparison. New to emerge are software packages and services that 'generate' assignments. Naturally there will be a cat-and-mouse game for a while, and in the meantime academics need to be alert to the possibilities of academic malpractice via plagiarism and adopt appropriate and promising counter-measures, including the newly emerging algorithms for fast conceptual analysis. One such emergent agent is the Normalised Word Vector (NWV) algorithm (Williams, 2006), which was originally developed for use in the Automated Essay Grading (AEG) domain. AEG is a relatively new technology which aims to score or grade essays at the level of expert humans. This is achieved by creating a mathematical representation of the semantic information, in addition to checking spelling, grammar, and other more usual parameters associated with essay assessment. The mathematical representation is computed for each student essay and compared with a mathematical representation computed for the model answer. If we can represent the semantic content of an essay, we can compare it to some standard model and hence determine a grade or assign an authenticity parameter relative to any given corpus; we can also create a persistent digital representation of the essay. AEG technology can be used for plagiarism detection because it processes the semantic information of student essays and creates a semantic footprint. Once a mathematical representation for all or parts of an essay is created, it can be efficiently compared to other similarly constructed representations, facilitating plagiarism checking through semantic footprint comparison.
The Plagiarism Problem
The extent of plagiarism is indeed significant. Maurer et al. (2006) provide a thorough analysis of the plagiarism problem and possible solutions as they pertain to academia. They divide the solution strategies into three main categories. The most common method is based on document comparison, in which a word-for-word check is made against each target document in a selected set of documents that could be the source of the copied material. Clearly this is language independent, as one is essentially comparing character strings; it will also match misspellings. The selected set of documents usually comprises all assignment or paper submissions for a specific purpose. A second category extends the document check: the set of target documents is 'everything' reachable on the internet, and the candidate to be checked is a characteristic paragraph or sentence rather than the entire document. The emergence of tools such as Google has made this type of check feasible. The third category mentioned by Maurer et al. is the use of stylometry, in which a language analysis algorithm compares the style of successive paragraphs and reports whether a style change has occurred. This can be extended to analyzing prior documents by the same author and comparing the stylistic parameters of a succession of documents.
However, the issue of plagiarism is not merely a matter for academics. Austrian journalist Josef Karner (2001) writes "Das Abschreiben ist der eigentliche Beruf des Dichters" ("Copying is the poet's true vocation"). Is the poet, then, essentially a professional plagiarist, taking others' ideas and presenting them in verse as his own without attribution? This may be a rather extreme position to hold, but considering it does point to interesting possibilities, which the etymology of plagiarism may illuminate. As yet there is a paucity of statistics to help us understand the extent of plagiarism. However, a recent Canadian study (Kloda & Nicholson, 2005) reported that one in three students admit to turning to plagiarism before graduation, which is serious enough, one may think. …
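The idea of a "semantic footprint" can be illustrated with a generic concept-vector comparison. This is explicitly not the NWV algorithm, whose details the abstract does not give: the sketch assumes a small hypothetical thesaurus (`CONCEPTS`) that maps synonyms to a shared concept label and a hypothetical stopword list, builds a unit-length concept-frequency vector per text, and compares two footprints by their dot product (which equals cosine similarity for unit vectors).

```python
import math
from collections import Counter

# Hypothetical thesaurus: maps surface words to a shared concept label.
CONCEPTS = {
    "copy": "copy", "duplicate": "copy", "reproduce": "copy",
    "essay": "text", "paper": "text", "document": "text",
    "student": "learner", "pupil": "learner", "learner": "learner",
}
# Hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are"}

def footprint(text):
    """Unit-length vector of concept frequencies for a text."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    concepts = Counter(CONCEPTS.get(w, w) for w in words if w and w not in STOPWORDS)
    norm = math.sqrt(sum(v * v for v in concepts.values()))
    return {c: v / norm for c, v in concepts.items()} if norm else {}

def footprint_similarity(f1, f2):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(v * f2.get(c, 0.0) for c, v in f1.items())
```

Because synonyms collapse to the same concept, "A pupil may copy an essay." and "A student may duplicate a document." receive identical footprints and a similarity of 1.0, something a plain string-of-text comparison would miss; "Compilers optimize loops." shares no concepts with either and scores 0.0.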

26 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations (78% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Software development: 73.8K papers, 1.4M citations (77% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (76% related)
Deep learning: 79.8K papers, 2.1M citations (76% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125