Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.

Journal Article•DOI•

Developing a corpus of plagiarised short answers

Paul Clough¹, Mark Stevenson¹•Institutions (1)

University of Sheffield¹

01 Mar 2011

TL;DR: Describes initial experiences constructing a corpus of answers to short questions in which plagiarism has been simulated. The corpus is designed to represent types of plagiarism not included in existing corpora and should be a useful addition to the resources available for evaluating plagiarism detection systems.

Abstract: Plagiarism is widely acknowledged to be a significant and increasing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, have been proposed to assist the educator in the task of identifying plagiarised work, or even to detect it automatically. Direct comparison of these systems is made difficult by the problems in obtaining genuine examples of plagiarised student work. We describe our initial experiences with constructing a corpus consisting of answers to short questions in which plagiarism has been simulated. This corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.

133 citations

Book Chapter•DOI•

On Automatic Plagiarism Detection Based on n-Grams Comparison

Alberto Barrón-Cedeño¹, Paolo Rosso¹•Institutions (1)

Polytechnic University of Valencia¹

18 Apr 2009

TL;DR: Experiments with the METER corpus show that the best results are obtained with low-level word n-gram comparisons (n = {2,3}), and that defining proper text chunks as comparison units of the suspicious and original texts is crucial.

Abstract: When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or deletion, for example) from the source text. The definition of proper text chunks as comparison units of the suspicious and original texts is crucial for the success of this kind of application. Our experiments with the METER corpus show that the best results are obtained when considering low-level word n-gram comparisons (n = {2,3}).

127 citations
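
The word n-gram comparison described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the containment measure, the tokenisation rule, and the names `word_ngrams` and `containment` are assumptions chosen for illustration. It scores a suspicious chunk by the fraction of its word n-grams that also occur in a candidate source text.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, source, n=2):
    """Fraction of the suspicious chunk's n-grams that also occur in source.

    Assumption: containment is used here as a simple stand-in similarity;
    the abstract does not state which measure the paper uses.
    """
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)


source = "the quick brown fox jumps over the lazy dog"
reworded = "a quick brown fox leaps over the lazy dog"
unrelated = "plagiarism detection compares suspicious and original texts"

for n in (2, 3):
    print(n, containment(reworded, source, n), containment(unrelated, source, n))
```

Short n-grams tolerate local rewording: changing two words in the example leaves most bigrams intact, while longer n-grams break more easily under the same edits.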

ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection

Cristian Grozea, Fraunhofer First, Christian Gehl, Marius Popescu

01 Jan 2009

TL;DR: Describes a new general plagiarism detection method used in the winning entry to the external plagiarism detection task of the 1st International Competition on Plagiarism Detection, a task that assumes the source documents are available.

Abstract: In this paper we describe a new general plagiarism detection method that we used in our winning entry to the 1st International Competition on Plagiarism Detection, in the external plagiarism detection task, which assumes the source documents are available. In the first phase of our method, a matrix of kernel values is computed, which gives a similarity value based on n-grams between each source and each suspicious document. In the second phase, each promising pair is further investigated in order to extract the precise positions and lengths of the subtexts that have been copied and possibly obfuscated, using encoplot, a novel linear time pairwise sequence matching technique. We solved the significant computational challenges arising from having to compare millions of document pairs by using a library developed by our group mainly for use in network security tools. The method compared more than 49 million pairs of documents in 12 hours on a single computer. The results in the challenge were very good; we outperformed all other methods.

126 citations
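
The first phase described in the abstract above, a matrix of n-gram-based similarity values between every source and every suspicious document, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: character n-grams, the cosine-normalised kernel, the value `n=4`, and the function names are all hypothetical choices, and the second-phase encoplot matching is not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalized_kernel(a, b):
    """Cosine-normalised dot product of two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(
        sum(c * c for c in a.values()) * sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(sources, suspicious, n=4):
    """Rows index suspicious documents, columns index source documents."""
    src = [char_ngrams(s, n) for s in sources]
    sus = [char_ngrams(s, n) for s in suspicious]
    return [[normalized_kernel(q, s) for s in src] for q in sus]


sources = ["the cat sat on the mat", "completely different words here"]
suspicious = ["the cat sat on the mat today"]
matrix = similarity_matrix(sources, suspicious)

# For each suspicious document, pick the most similar source as the
# "promising pair" to pass on to a detailed second-phase comparison.
best = [max(range(len(row)), key=row.__getitem__) for row in matrix]
print(matrix, best)
```

Computing each document's n-gram profile once and reusing it across the matrix keeps the first phase cheap relative to a detailed pairwise comparison, which is then reserved for the highest-scoring pairs.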

Journal Issue•DOI•

Efficient plagiarism detection for large code repositories

Steven Burrows¹, S. M. M. Tahaghoghi¹, Justin Zobel¹•Institutions (1)

RMIT University¹

01 Feb 2007-Software - Practice and Experience

TL;DR: This paper proposes techniques for detecting plagiarism in program code using text similarity measures and local alignment and shows that their approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems.

Abstract: Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems. Copyright © 2006 John Wiley & Sons, Ltd.

118 citations
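
The abstract above names local alignment as one ingredient for comparing program code. The sketch below shows a generic Smith-Waterman local alignment score over normalised token sequences. It is not the paper's system: the tokeniser, the keyword list, the scoring parameters, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical, deliberately small keyword list for the example.
KEYWORDS = {"for", "in", "if", "else", "while", "return", "def"}


def tokenize(code):
    """Split code into tokens, replacing identifiers and numbers with
    placeholders so that renaming variables does not change the sequence."""
    tokens = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between sequences a and b,
    computed row by row so only two rows are held in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


original = "total = 0\nfor i in range(n): total += i"
renamed = "s = 0\nfor k in range(m): s += k"
other = "if x: return y"

print(local_alignment_score(tokenize(original), tokenize(renamed)))
print(local_alignment_score(tokenize(original), tokenize(other)))
```

Because identifiers are collapsed to a placeholder, the renamed copy aligns perfectly with the original, while the unrelated snippet shares only short token runs.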

Journal Article•

Prevalence of plagiarism among medical students.

Lidija Bilić-Zulle, Frković, Turk T, Azman J, Mladen Petrovečki

01 Jan 2005-Croatian Medical Journal

TL;DR: Plagiarism in writing essays is common among medical students, and an explicit warning is not enough to deter students from plagiarism, according to a study of second year medical students attending a Medical Informatics course.

Abstract: Aim. To determine the prevalence of plagiarism among medical students in writing essays. Methods. During two academic years, 198 second year medical students attending a Medical Informatics course wrote an essay on one of four offered articles. Two of the source articles were available in an electronic form and two in printed form. Two (one electronic and one paper article) were considered less complex and the other two more complex. The essays were examined using plagiarism detection software “WCopyfind,” which counted the number of words from matching phrases with six or more words. Plagiarism rate, expressed as the percentage of the plagiarized text, was calculated as a ratio of the absolute number of matching words and the total number of words in the essay. Results. Only 17 (9%) of students did not plagiarize at all and 68 (34%) plagiarized less than 10% of the text. The average plagiarism rate (% of plagiarized text) was 19% (5-95% percentile=0-88). Students who were strictly warned not to plagiarize had a higher total word count in their essays than students who were not warned (P=0.002) but there was no difference between them in the rate of plagiarism. Students with higher grades in Medical Informatics exam plagiarized less than those with lower grades (P=0.015). Gender, subject source, and complexity had no influence on the plagiarism rate. Conclusions. Plagiarism in writing essays is common among medical students. An explicit warning is not enough to deter students from plagiarism. Detection software can be used to trace and evaluate the rate of plagiarism in written student essays.

118 citations
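
The plagiarism rate defined in the abstract above (words from matching phrases of six or more words, divided by the total number of words in the essay) can be sketched as follows. This is a simplified illustration and not WCopyfind's actual algorithm: exact word matching after lowercasing, the tokenisation rule, and the function names are assumptions.

```python
import re


def words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[\w']+", text.lower())


def plagiarism_rate(essay, source, min_phrase=6):
    """Percentage of essay words that lie inside a run of at least
    min_phrase consecutive words also found verbatim in the source."""
    essay_words = words(essay)
    source_words = words(source)
    if not essay_words:
        return 0.0
    source_phrases = {
        tuple(source_words[i:i + min_phrase])
        for i in range(len(source_words) - min_phrase + 1)
    }
    matched = [False] * len(essay_words)
    for i in range(len(essay_words) - min_phrase + 1):
        if tuple(essay_words[i:i + min_phrase]) in source_phrases:
            # Mark every word of this matching window; overlapping
            # windows extend the run without double counting.
            for j in range(i, i + min_phrase):
                matched[j] = True
    return 100.0 * sum(matched) / len(essay_words)


source = ("Medical informatics is the study of information "
          "processing in health care.")
essay = ("In my view medical informatics is the study of information "
         "processing and more.")

print(round(plagiarism_rate(essay, source), 1))
```

In this example the essay has 13 words, 8 of which fall in a copied run, so the rate is 8/13 of the text; a copied run shorter than six words would contribute nothing.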

Performance metrics

Papers: 1,976
Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125