Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published on this topic, receiving 24,740 citations.


Papers
Journal Article
01 Mar 2011
TL;DR: Describes initial experiences with constructing a corpus of answers to short questions in which plagiarism has been simulated. The corpus is designed to represent types of plagiarism that are not included in existing corpora and to be a useful addition to the resources available for evaluating plagiarism detection systems.
Abstract: Plagiarism is widely acknowledged to be a significant and increasing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, have been proposed to assist the educator in the task of identifying plagiarised work, or even to detect them automatically. Direct comparison of these systems is made difficult by the problems in obtaining genuine examples of plagiarised student work. We describe our initial experiences with constructing a corpus consisting of answers to short questions in which plagiarism has been simulated. This corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.

133 citations

Book Chapter
18 Apr 2009
TL;DR: The authors' experiments with the METER corpus show that the best results are obtained with low-level word n-gram comparisons (n = {2,3}), and that defining proper text chunks as the comparison units of the suspicious and original texts is crucial.
Abstract: When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or deletion, for example) from the source text. The definition of proper text chunks as comparison units of the suspicious and original texts is crucial for the success of this kind of application. Our experiments with the METER corpus show that the best results are obtained when considering low-level word n-gram comparisons (n = {2,3}).

127 citations
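
The word n-gram comparison described in the entry above can be illustrated with a minimal sketch. The abstract does not spell out the scoring function, so the containment measure used here (the share of the suspicious chunk's n-grams that also occur in the source), the tokenization, and the function names are assumptions made for illustration only.

```python
import re

def word_ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    Assumed scoring function: the abstract only states that word n-grams
    with n in {2, 3} worked best, not how they were scored.
    """
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)

source = "the quick brown fox jumps over the lazy dog"
suspicious = "a quick brown fox jumps over a sleeping dog"
for n in (2, 3):
    print(n, round(containment(suspicious, source, n), 2))
```

Short n-grams tolerate rewording because a single substituted word only breaks the few n-grams that span it, which is consistent with the abstract's preference for n = 2 or 3.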

01 Jan 2009
TL;DR: A new general plagiarism detection method, used in the winning entry to the external plagiarism detection task of the 1st International Competition on Plagiarism Detection, which assumes the source documents are available.
Abstract: In this paper we describe a new general plagiarism detection method that we used in our winning entry to the 1st International Competition on Plagiarism Detection, in the external plagiarism detection task, which assumes the source documents are available. In the first phase of our method, a matrix of kernel values is computed, which gives a similarity value based on n-grams between each source and each suspicious document. In the second phase, each promising pair is further investigated in order to extract the precise positions and lengths of the subtexts that have been copied and possibly obfuscated, using encoplot, a novel linear-time pairwise sequence matching technique. We solved the significant computational challenges arising from having to compare millions of document pairs by using a library developed by our group mainly for use in network security tools. The method compared more than 49 million pairs of documents in 12 hours on a single computer. The results in the challenge were very good: we outperformed all other methods.

126 citations
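
The first phase described in the entry above, a matrix of n-gram-based kernel values between every source and every suspicious document, might look roughly like the sketch below. The abstract does not name the kernel, the n-gram type, or the value of n, so the character 4-grams, the normalized dot-product kernel, and all function names are assumptions. The second phase (encoplot) is not reproduced here.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=4):
    """Count the character n-grams of text (n = 4 is an assumed setting)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def normalized_kernel(a, b):
    """Dot product of two n-gram count vectors, scaled to the range [0, 1]."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def kernel_matrix(sources, suspicious, n=4):
    """Rows index source documents, columns index suspicious documents."""
    src = [char_ngrams(s, n) for s in sources]
    sus = [char_ngrams(s, n) for s in suspicious]
    return [[normalized_kernel(s, t) for t in sus] for s in src]

sources = ["plagiarism detection compares documents", "completely unrelated words"]
suspicious = ["plagiarism detection compares documents"]
K = kernel_matrix(sources, suspicious)
# Pairs with the highest kernel values would be passed on for detailed matching.
```

Computing a cheap similarity for all pairs first and reserving the expensive position-level matching for promising pairs is what keeps an all-pairs comparison tractable.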

Journal Issue
TL;DR: This paper proposes techniques for detecting plagiarism in program code using text similarity measures and local alignment and shows that their approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems.
Abstract: Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems. Copyright © 2006 John Wiley & Sons, Ltd.

118 citations
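
Local alignment, one of the two ingredients named in the entry above, can be sketched with the classic Smith-Waterman recurrence applied to code tokens. This is not the authors' system: the tokenizer, the scoring parameters (match, mismatch, gap), and the function names are all assumptions chosen for illustration.

```python
import re

def tokenize(code):
    """Split source code into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score of the best locally aligned region of a and b.

    Only two rows of the dynamic-programming table are kept, so memory
    use is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

original = tokenize("for (i = 0; i < n; i++) total += a[i];")
copied = tokenize("int s = 0; for (i = 0; i < n; i++) s += a[i];")
print(local_alignment_score(original, copied))
```

Because the score resets to zero instead of going negative, a copied region is found even when it is surrounded by unrelated code. Alignment is quadratic in the sequence lengths, which is why a scalable system would apply it only to candidate pairs preselected by a cheaper text similarity measure.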

Journal Article
TL;DR: Plagiarism in writing essays is common among medical students, and an explicit warning is not enough to deter students from plagiarism, according to a study of second-year medical students attending a Medical Informatics course.
Abstract: Aim. To determine the prevalence of plagiarism among medical students in writing essays. Methods. During two academic years, 198 second-year medical students attending a Medical Informatics course wrote an essay on one of four offered articles. Two of the source articles were available in electronic form and two in printed form. Two (one electronic and one paper article) were considered less complex and the other two more complex. The essays were examined using the plagiarism detection software "WCopyfind," which counted the number of words from matching phrases with six or more words. Plagiarism rate, expressed as the percentage of the plagiarized text, was calculated as the ratio of the absolute number of matching words to the total number of words in the essay. Results. Only 17 (9%) of students did not plagiarize at all and 68 (34%) plagiarized less than 10% of the text. The average plagiarism rate (% of plagiarized text) was 19% (5-95% percentile=0-88). Students who were strictly warned not to plagiarize had a higher total word count in their essays than students who were not warned (P=0.002), but there was no difference between them in the rate of plagiarism. Students with higher grades in the Medical Informatics exam plagiarized less than those with lower grades (P=0.015). Gender, subject source, and complexity had no influence on the plagiarism rate. Conclusions. Plagiarism in writing essays is common among medical students. An explicit warning is not enough to deter students from plagiarism. Detection software can be used to trace and evaluate the rate of plagiarism in written student essays.

118 citations
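
The plagiarism rate defined in the entry above (words in matching phrases of six or more words, divided by the total words in the essay) can be approximated with the sketch below. It is not a reimplementation of WCopyfind: the tokenization, the way matches are found (overlapping six-word windows looked up in the source), and the function names are assumptions.

```python
import re

def words(text):
    """Lowercase the text and split it into words (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def plagiarism_rate(essay, source, min_phrase=6):
    """Percentage of essay words lying inside a phrase of at least
    min_phrase consecutive words that also appears in the source."""
    essay_w, source_w = words(essay), words(source)
    source_grams = {tuple(source_w[i:i + min_phrase])
                    for i in range(len(source_w) - min_phrase + 1)}
    matched = [False] * len(essay_w)
    for i in range(len(essay_w) - min_phrase + 1):
        if tuple(essay_w[i:i + min_phrase]) in source_grams:
            # Mark every word of the matching window as copied.
            for k in range(i, i + min_phrase):
                matched[k] = True
    return 100.0 * sum(matched) / len(essay_w) if essay_w else 0.0

source = "one two three four five six seven eight"
essay = "zero one two three four five six seven nine ten"
print(plagiarism_rate(essay, source))
```

In this toy input, seven of the ten essay words ("one" through "seven") fall inside matching six-word windows, so the rate is 70%. Overlapping windows merge naturally, which lets a longer copied phrase count each of its words once.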


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125