
Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
01 Jan 2009
TL;DR: Ferret was able to detect numerous files in the development corpus that contain substantial similarities not marked as plagiarism, but it also identified quite a lot of pairs where random similarities masked actual plagiarism.
Abstract: Ferret is a fast and effective tool for detecting similarities in a group of files. Applying it to the PAN'09 corpus required modifications to meet the requirements of the competition, mainly to deal with the very large number of files, the large size of some of them, and to automate some of the decisions that would normally be made by a human operator. Ferret was able to detect numerous files in the development corpus that contain substantial similarities not marked as plagiarism, but it also identified quite a lot of pairs where random similarities masked actual plagiarism. An improved metric is therefore indicated if the "plagiarised" or "not plagiarised" decision is to be automated.

17 citations
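
Ferret's similarity measure is not spelled out in the abstract; the tool is generally described as comparing sets of word trigrams with a Jaccard-style resemblance score. A minimal Python sketch of that idea (function names are illustrative, not Ferret's actual API):

```python
def trigrams(text):
    """Return the set of word trigrams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def resemblance(a, b):
    """Jaccard-style resemblance: shared trigrams over all trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Automating the "plagiarised / not plagiarised" decision then amounts to choosing a threshold on this score, which is exactly where the abstract notes an improved metric is needed.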

Proceedings ArticleDOI
28 May 2014
TL;DR: In this paper, the authors employed latent semantic analysis (LSA) as the term-document representation to handle intelligent plagiarism; the LSA representation was used in both the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component.
Abstract: Plagiarism detection is an important task, since the number of plagiarism cases is increasing and plagiarism techniques are becoming harder to detect: there is not only literal plagiarism but also intelligent plagiarism. To handle intelligent plagiarism, we employed latent semantic analysis (LSA) as the term-document representation. The LSA representation was used in the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component. We conducted several experiments to compare token types, text segmentation strategies, and threshold values. The test data were prepared manually from an available corpus of Indonesian papers. Experimental results showed that LSA outperformed the VSM (Vector Space Model), especially in test cases with intelligent plagiarism.

17 citations
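
The abstract describes using LSA as the term-document representation but gives no formulas. A minimal sketch of the general LSA technique, assuming a plain term-count matrix and a truncated SVD via NumPy (the paper's actual tokenization, segmentation, and thresholds are not reproduced here):

```python
import numpy as np

def lsa_doc_similarities(docs, query_idx=0, k=2):
    """Cosine similarity of every document to docs[query_idx] in a k-dim latent space."""
    # Term-document count matrix (rows = terms, columns = documents)
    vocab = sorted({w for d in docs for w in d.lower().split()})
    A = np.array([[d.lower().split().count(t) for d in docs] for t in vocab], float)
    # Truncated SVD projects each document into a k-dimensional latent space
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T        # one row per document
    q = doc_vecs[query_idx]
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    return doc_vecs @ q / np.maximum(norms, 1e-12)
```

Compared with a plain VSM, the SVD step lets documents score as similar even when they share few exact terms, which is why LSA helps with paraphrased ("intelligent") plagiarism.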

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper focuses on identifying whether a data set consisting of student programming assignments is rich enough to apply coding style metrics to detect similarities between code sequences, using the BlackBox dataset as a case study.

Abstract: Plagiarism has become an increasing problem in higher education in recent years. Coding style involves writing and structuring code in ways that do not affect the logic of a program, and can therefore be used to differentiate between code authors and to detect source code plagiarism. This paper focuses on identifying whether a data set consisting of student programming assignments is rich enough to apply coding style metrics to detect similarities between code sequences; we use the BlackBox dataset as a case study.

17 citations
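
The abstract does not list the coding-style metrics themselves. A sketch of the general idea, with a few illustrative, logic-independent features (these specific metrics are assumptions, not the paper's feature set):

```python
def style_metrics(source):
    """A few logic-independent style features of a source file (illustrative choices)."""
    lines = [l for l in source.splitlines() if l.strip()]
    indents = [len(l) - len(l.lstrip(" ")) for l in lines]
    return {
        "avg_line_len": sum(len(l) for l in lines) / len(lines),
        "avg_indent": sum(indents) / len(indents),
        "tab_ratio": sum(l.startswith("\t") for l in lines) / len(lines),
    }

def style_distance(a, b):
    """Euclidean distance between metric vectors; a small distance suggests similar style."""
    ma, mb = style_metrics(a), style_metrics(b)
    return sum((ma[k] - mb[k]) ** 2 for k in ma) ** 0.5
```

A dataset is "rich enough" for this approach only if such features actually vary between authors, which is the question the paper investigates on BlackBox.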

Proceedings ArticleDOI
13 May 2015
TL;DR: The proposed method is based on modeling the relation between documents and their n-gram phrases, and outperformed Plagiarism-Checker-X, especially in intelligent similarity cases with syntactic changes.

Abstract: The computerized methods for document similarity estimation (or plagiarism detection) in natural languages, developed over the last two decades, have focused on the English language in particular and some other languages such as German and Chinese. On the other hand, there are several language-independent methods, but their accuracy is not satisfactory, especially with morphologically complex languages such as Arabic. This paper proposes an innovative content-based method for document similarity analysis devoted to the Arabic language, in order to bridge the existing gap in such software solutions. The proposed method is based on modeling the relation between documents and their n-gram phrases. These phrases are generated from the normalized text, exploiting Arabic morphology analysis and lexical lookup. Possible morphological ambiguity is resolved by applying Part-of-Speech (PoS) tagging to the examined documents. Text indexing and stop-word removal are performed using a new method based on text morphological analysis. A TF-IDF model of the examined documents is constructed using a heuristic-based pairwise matching algorithm that accounts for lexical and syntactic changes. Then, the hidden associations between the unique n-gram phrases and their documents are investigated using Latent Semantic Analysis (LSA). Next, the pairwise document subset and similarity measures are derived from the Singular Value Decomposition (SVD) computations. The performance of the proposed method was confirmed through experiments with various data sets, exhibiting promising capabilities in estimating literal and some types of intelligent similarities. Finally, the results of the proposed method were compared to those of Plagiarism-Checker-X, and the proposed method outperformed Plagiarism-Checker-X, especially in intelligent similarity cases with syntactic changes.

17 citations
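
Setting aside the Arabic-specific morphology analysis, PoS tagging, and SVD steps, the core pipeline of n-gram phrase generation plus TF-IDF weighting and cosine comparison can be sketched as follows (a simplified, language-agnostic illustration, not the paper's implementation):

```python
import math
from collections import Counter

def ngram_phrases(text, n=2):
    """Word n-gram phrases from (already normalized) text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tfidf_vectors(docs, n=2):
    """Sparse TF-IDF vector (dict) per document over n-gram phrases."""
    grams = [Counter(ngram_phrases(d, n)) for d in docs]
    df = Counter(g for c in grams for g in set(c))
    N = len(docs)
    return [{g: tf * math.log((1 + N) / (1 + df[g])) for g, tf in c.items()}
            for c in grams]

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

In the paper, the phrases come from morphologically normalized Arabic text and the TF-IDF matrix is then decomposed with SVD; here the phrases are raw word bigrams and vectors are compared directly.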

Proceedings ArticleDOI
07 May 2016
TL;DR: A new approach to detect code re-use is proposed that increases prediction accuracy by dynamically removing parts that appear in almost every assignment, the so-called common ground.
Abstract: Plagiarism in online learning environments has a detrimental effect on the trust in online courses and their viability. Automatic plagiarism detection systems do exist, yet the specific situation in online courses restricts their use. To allow for easy automated grading, online assignments are usually less open and instead require students to fill in small gaps. Solutions therefore tend to be very similar, yet are not necessarily plagiarized. In this paper we propose a new approach to detect code re-use that increases prediction accuracy by dynamically removing parts that appear in almost every assignment, the so-called common ground. Our approach shows significantly better F-measure and Cohen's Kappa results than other state-of-the-art algorithms such as Moss or JPlag. The proposed method is also language agnostic, to the point that training and test data sets can be taken from different programming languages.

17 citations
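
The common-ground idea can be illustrated with a small sketch: drop lines that occur in nearly every submission, then compare what remains. The threshold value and line-level granularity are illustrative choices here, not the paper's exact method:

```python
from collections import Counter

def strip_common_ground(submissions, threshold=0.9):
    """Remove lines appearing in at least `threshold` of all submissions."""
    line_sets = [set(s.splitlines()) for s in submissions]
    counts = Counter(l for ls in line_sets for l in ls)
    common = {l for l, c in counts.items() if c / len(submissions) >= threshold}
    return ["\n".join(l for l in s.splitlines() if l not in common)
            for s in submissions]

def overlap(a, b):
    """Jaccard overlap of the remaining (non-common) lines."""
    sa, sb = set(a.splitlines()) - {""}, set(b.splitlines()) - {""}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Removing the common ground first means that boilerplate shared by everyone no longer inflates pairwise similarity, so a high overlap score points at genuinely re-used code.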


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125