Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings Article
23 Aug 2010
TL;DR: Two recently proposed cross-language plagiarism detection methods are compared to a novel approach to this problem, based on machine translation and monolingual similarity analysis (T+MA), and the effectiveness of the three approaches for less related languages is explored.
Abstract: Plagiarism, the unacknowledged reuse of text, does not end at language boundaries. Cross-language plagiarism occurs if a text is translated from a fragment written in a different language and no proper citation is provided. Regardless of the change of language, the contents and, in particular, the ideas remain the same. Whereas different methods for the detection of monolingual plagiarism have been developed, less attention has been paid to the cross-language case. In this paper we compare two recently proposed cross-language plagiarism detection methods (CL-CNG, based on character n-grams, and CL-ASA, based on statistical translation) to a novel approach to this problem, based on machine translation and monolingual similarity analysis (T+MA). We explore the effectiveness of the three approaches for less related languages. CL-CNG proves not to be appropriate for this kind of language pair, whereas T+MA performs better than the previously proposed models.

86 citations
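
The character n-gram idea behind CL-CNG in the entry above can be illustrated with a minimal sketch. The abstract does not give the n-gram length, the weighting scheme, or the text normalisation, so the choices here (3-grams, raw counts, cosine similarity, lowercasing and punctuation stripping) are assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after simple normalisation."""
    # Assumed normalisation: lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def ngram_similarity(x: str, y: str, n: int = 3) -> float:
    """Hypothetical helper: similarity of two texts via character n-grams."""
    return cosine_similarity(char_ngrams(x, n), char_ngrams(y, n))
```

Because the comparison relies on shared character sequences, texts in languages with little lexical or orthographic overlap share few n-grams, which is consistent with the abstract's finding for less related language pairs.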

01 Jan 2006
TL;DR: This work proposes a novel approach, XPlag, to detect plagiarism involving multiple languages using intermediate program code produced by a compiler suite, and shows that it can detect inter-lingual plagiarism with reasonably good precision.
Abstract: Plagiarism is a widespread problem in assessment tasks; in computing courses, students often plagiarise source code. For all but the smallest classes, manual detection of such plagiarism is impractical, and, while automated tools are available, none has been applied to detect inter-lingual plagiarism, where source code is copied from one language to another. In this work, we propose a novel approach, XPlag, to detect plagiarism involving multiple languages using intermediate program code produced by a compiler suite. We describe experiments to evaluate XPlag, and show that we can detect inter-lingual plagiarism with reasonably good precision.

85 citations

Journal Article
TL;DR: The authors used electronic plagiarism detection tools to help students understand correct academic practice in using source material, and found that 41% of students had submitted work identified by Turnitin as possible plagiarism but this reduced to 26% on inspection by academics.
Abstract: Lessons on paraphrasing and citing sources can only be partially effective if they are not perceived as immediately relevant to the individual student. We used electronic plagiarism detection tools to help students understand correct academic practice in using source material. In order to produce an essay on a specified topic, students were required to summarise a number of research papers. The 182 students who took part in this exercise were studying one-year Masters programmes in Computer Science, Automotive Engineering, and Electronics, mainly from China, India and Pakistan and new to the University. These students should have been building on previous study both in subject matter and study skills, but before they tackled the assignment, a series of lectures gave guidance on finding and summarising sources, and reminded students about what constitutes plagiarism. The students' essays were submitted to Turnitin and Ferret -- a straightforward, but resource intensive process -- and the resulting reports used to give individual feedback to students on how original their words appeared to be. This was effective in helping the students to understand plagiarism, because the reports identified plagiarised passages in their own work. Using a threshold of 15% of matching text, we found 41% of students had submitted work identified by Turnitin as possible plagiarism but this reduced to 26% on inspection by academics. After a second submission, incidence of plagiarism dropped to 3% overall. We found that the degree of matching text found correlated with a student's programme of study, but not with nationality.

85 citations

Journal Article
TL;DR: Using Turnitin formatively was viewed positively by staff and students, and although the incidence of plagiarism did not reduce because of a worsening of referencing and citation skills, the approach encouraged students to develop their writing.
Abstract: New students face the challenge of making a smooth transition between school and university, and with regard to academic practice, there are often gaps between student expectations and university requirements. This study supports the use of the plagiarism detection service Turnitin to give students instant feedback on essays to help improve academic literacy. A student cohort (n = 76) submitted draft essays to Turnitin and received instruction on how to interpret the 'originality report' themselves for feedback. The impact of this self-service approach was analysed by comparing the writing quality and incidence of plagiarism in draft and final essays, and comparing the results to a previous cohort (n = 80) who had not used Turnitin formatively. Student and staff perceptions were explored by interview and questionnaire. Using Turnitin formatively was viewed positively by staff and students, and although the incidence of plagiarism did not reduce because of a worsening of referencing and citation skills, the approach encouraged students to develop their writing. To conclude, students were positive about their experience of using Turnitin. Further work is required to understand how to use the self-service approach more effectively to improve referencing and citation, and narrow the gap between student expectations and university standards.

85 citations

01 Jan 2010
TL;DR: Although the fuzzy semantic-based method can detect some means of obfuscation, it might not work at all levels and future work is to improve it for more detection efficiency and less time complexity, and to advance the post-processing stage to gain more ideal granularity.
Abstract: This report explains our plagiarism detection method using a fuzzy semantic-based string similarity approach. The algorithm was developed through four main stages. The first is pre-processing, which includes tokenisation, stemming and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using shingling and the Jaccard coefficient. Suspicious documents are then compared sentence-wise with the associated candidate documents. This stage entails the computation of a fuzzy degree of similarity that ranges between two extremes: 0 for completely different sentences and 1 for exactly identical sentences. Two sentences are marked as similar (i.e. plagiarised) if they gain a fuzzy similarity score above a certain threshold. The last step is post-processing, whereby consecutive sentences are joined to form single paragraphs/sections. Our performance measures on the PAN’09 training corpus for the external plagiarism detection task (recall=0.3097, precision=0.5424, granularity=7.8867) indicate that about 54% of our detections are correct while we detect only 30% of the plagiarism cases. The performance measures on the PAN’10 test collection are lower (recall=0.1259, precision=0.5761, granularity=3.5828), due to the fact that our algorithm handles external plagiarism detection but neither intrinsic nor cross-lingual detection. Although our fuzzy semantic-based method can detect some means of obfuscation, it might not work at all levels. Our future work is to improve it for more detection efficiency and less time complexity. In particular, we need to advance the post-processing stage to gain more ideal granularity.

84 citations
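
The candidate-retrieval stage described in the entry above (shingling plus the Jaccard coefficient) can be sketched roughly as follows. The abstract does not state the shingle size, the tokenisation, or any cut-off score, so the word 3-shingles, whitespace tokenisation, and the `min_score` parameter are assumptions, and all function names are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping word k-shingles of a text."""
    # Assumed tokenisation: lowercase and split on whitespace.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_documents(suspicious: str, sources: dict,
                        k: int = 3, min_score: float = 0.1) -> list:
    """Rank source documents by shingle overlap with a suspicious document.

    `sources` maps a document name to its text. Only documents scoring at
    least `min_score` (an assumed cut-off) are returned, best match first.
    """
    susp = shingles(suspicious, k)
    scored = [(name, jaccard(susp, shingles(text, k)))
              for name, text in sources.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

The sentence-wise fuzzy comparison and the post-processing merge would then run only against the documents this step returns, which keeps the more expensive stages off the full collection.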


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125