Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have appeared within this topic, receiving 24,740 citations.


Papers
Proceedings Article
23 Aug 2010
TL;DR: This research developed software capable of simple plagiarism detection, built a corpus (C) of 10,100 English-language academic papers in computer science, and evaluated on two test sets that include papers randomly chosen from C.
Abstract: Plagiarism is the use of the language and thoughts of another work and the representation of them as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Therefore, diverse efforts are taken to automatically identify plagiarism. In this research, we developed software capable of simple plagiarism detection. We have built a corpus (C) containing 10,100 academic papers in computer science written in English and two test sets including papers that were randomly chosen from C. A wide variety of baseline methods has been developed to identify identical or similar papers. Several methods are novel. The experimental results and their analysis show interesting findings. Some of the novel methods are among the best predictive methods.
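The abstract does not spell out its baseline methods, so the following is only a minimal sketch of one common baseline for finding identical or similar papers: Jaccard overlap of word n-grams. The function names, whitespace tokenization, and the choice of n = 3 are assumptions made for illustration, not details taken from the paper.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace tokens).

    Tokenization and n are illustrative assumptions, not the paper's settings.
    """
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(doc_a, doc_b, n=3):
    """Jaccard overlap of the two documents' n-gram sets, in [0, 1]."""
    grams_a = word_ngrams(doc_a, n)
    grams_b = word_ngrams(doc_b, n)
    if not grams_a and not grams_b:
        return 0.0  # both documents are too short to yield any n-gram
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    original = "the cat sat on the mat"
    suspect = "the cat sat on a mat"
    print(jaccard_similarity(original, suspect))
```

A score near 1 suggests near-identical text, while a score near 0 suggests little shared phrasing; any decision threshold would have to be tuned on labelled data.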

24 citations

Journal Article
TL;DR: This paper investigates an unsupervised feature learning technique, the sparse auto-encoder, as a method of extracting features from source code files, and shows that its performance is very close to that of state-of-the-art techniques in the source code identification field.

24 citations

Dissertation
01 Jan 2013
TL;DR: A thesis by Man Yan Miranda Chong, submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy, 2013.

24 citations

Journal Article
TL;DR: In this article, the authors present the results of a 2-year trial of the JISC plagiarism detection service (PDS) involving hundreds of students and discuss the effectiveness of the service in detecting plagiarized material and in acting as a deterrent.
Abstract: In the UK, there is great concern about the perceived increase in plagiarized work being submitted by students in higher education. Although there is much debate, the reasons for the perceived change are not completely clear. Here we present the results of a 2‐year trial of the JISC Plagiarism Detection Service (PDS) involving hundreds of students. The effectiveness of the service in detecting plagiarized material and in acting as a deterrent is discussed. Although an increased number of cases of plagiarism were detected during the trial, the relative contributions of the electronic detection system and increased staff awareness remain unknown.

24 citations

Journal Article
TL;DR: A multi-register corpus gathered for this purpose is introduced, in which each text has been located in a similarity space based on ratings by human readers, which provides a resource for testing similarity measures derived from computational text-processing against reference levels derived from human judgement.
Abstract: Quantifying the similarity or dissimilarity between documents is an important task in authorship attribution, information retrieval, plagiarism detection, text mining, and many other areas of linguistic computing. Numerous similarity indices have been devised and used, but relatively little attention has been paid to calibrating such indices against externally imposed standards, mainly because of the difficulty of establishing agreed reference levels of inter-text similarity. The present article introduces a multi-register corpus gathered for this purpose, in which each text has been located in a similarity space based on ratings by human readers. This provides a resource for testing similarity measures derived from computational text-processing against reference levels derived from human judgement, i.e. external to the texts themselves. We describe the results of a benchmarking study in five different languages in which some widely used measures perform comparatively poorly. In particular, several alternative correlational measures (Pearson r, Spearman rho, tetrachoric correlation) consistently outperform cosine similarity on our data. A method of using what we call ‘anchor texts’ to extend this method from monolingual inter-text similarity-scoring to inter-text similarity-scoring across languages is also proposed and tested.
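To make the comparison between cosine similarity and a correlational measure concrete, here is a minimal sketch that computes both on term-frequency vectors. The abstract does not say which features the authors used, so the whitespace tokenization, raw term counts, and function names below are assumptions. Pearson r is computed as the cosine of the mean-centered vectors, which is why the two measures can disagree when texts share few terms.

```python
import math
from collections import Counter


def tf_vectors(text_a, text_b):
    """Build aligned term-frequency vectors over the joint vocabulary.

    Whitespace tokenization and raw counts are illustrative assumptions.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab]


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson r, computed as the cosine of the mean-centered vectors."""
    n = len(x)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])


if __name__ == "__main__":
    vec_a, vec_b = tf_vectors("a b", "b c")
    print(cosine(vec_a, vec_b), pearson(vec_a, vec_b))
```

On count vectors cosine similarity is never negative, whereas Pearson r can be, so the two scores rank text pairs differently; which one tracks human judgement better is the empirical question the benchmark addresses.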

23 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125