Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the topic's lifetime, 1,790 publications have been published within it, receiving 24,740 citations.


Papers
Proceedings ArticleDOI
12 Jun 2008
TL;DR: Some issues that might be raised in employing Turnitin are highlighted and some approaches that academics might utilise to allow efficient use of the system are suggested.
Abstract: The Turnitin plagiarism detection system allows individual student assignments to be uploaded and matched for similarity with content on the web, all other assignments uploaded by institutions using the system and certain journals. An online report is produced for each submission identifying the sources of those similarities and the percentage match. There is a significant benefit in using Turnitin to identify possible cases of plagiarism. This paper highlights some issues that might be raised in employing Turnitin and suggests some approaches that academics might utilise to allow efficient use of Turnitin.

29 citations

Posted Content
TL;DR: A taxonomy of plagiarism forms is presented, each form is discussed, and a list of issues and research challenges related to this evolving research problem is highlighted.
Abstract: To detect plagiarism of any form, it is essential to have broad knowledge of its possible forms and classes, and of the tools and systems that exist for its detection. Depending on its impact or the severity of the damage, plagiarism may occur in an article or in any other production in a number of ways. This survey presents a taxonomy of plagiarism forms and includes a discussion of each form. Over the years, a good number of tools and techniques have been introduced to detect plagiarism. This paper highlights a few promising methods for plagiarism detection based on machine learning techniques. We analyse the pros and cons of these methods, and finally we highlight a list of issues and research challenges related to this evolving research problem.

29 citations

Proceedings ArticleDOI
06 Nov 2017
TL;DR: The results show that mathematical expressions are promising text-independent features for identifying academic plagiarism in large collections. The detection approach is implemented and evaluated using an open source parallel data processing pipeline built with the Apache Flink framework.
Abstract: This paper presents, to our knowledge, the first study on analyzing mathematical expressions to detect academic plagiarism. We make the following contributions. First, we investigate confirmed cases of plagiarism to categorize the similarities of mathematical content commonly found in plagiarized publications. From this investigation, we derive possible feature selection and feature comparison strategies for developing math-based detection approaches and a ground truth for our experiments. Second, we create a test collection by embedding confirmed cases of plagiarism into the NTCIR-11 MathIR Task dataset, which contains approx. 60 million mathematical expressions in 105,120 documents from arXiv.org. Third, we develop a first math-based detection approach by implementing and evaluating different feature comparison approaches using an open source parallel data processing pipeline built using the Apache Flink framework. The best performing approach identifies all but two of our real-world test cases at the top rank and achieves a mean reciprocal rank of 0.86. The results show that mathematical expressions are promising text-independent features to identify academic plagiarism in large collections. To facilitate future research on math-based plagiarism detection, we make our source code and data available.

28 citations
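
The abstract above reports a mean reciprocal rank (MRR) of 0.86. As a reminder of how that metric is computed, here is a minimal sketch. The function name `mean_reciprocal_rank` and the example ranks are hypothetical and are not taken from the paper; the sketch assumes one rank per query, with `None` meaning the true source was not retrieved.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based rank at which the true source document was
    retrieved for query i, or None if it was not retrieved at all
    (such a query contributes 0 to the sum).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks for four test cases (illustrative only, not the paper's data).
example_ranks = [1, 1, 2, None]
example_mrr = mean_reciprocal_rank(example_ranks)
```

An MRR close to 1 means the true source is usually ranked first; each missed or lower-ranked case pulls the average down.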

Journal ArticleDOI
TL;DR: The purpose of this research is to uncover potential cases of source code reuse in large‐scale environments by using an automatic system that compares programs at character level.
Abstract: The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in near real time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes in identifier names, comments and indentation. © 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608

28 citations
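
The abstract above describes comparing programs at character level but does not give the algorithm. The following is only one possible illustration of the general idea, using Jaccard similarity over character n-grams. The function names, the whitespace normalisation, and the choice of n = 3 are all assumptions and are not taken from the paper.

```python
from typing import Set


def char_ngrams(source: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a program.

    All whitespace is removed and the text is lowercased first, so that
    changes in indentation or spacing alone do not affect the result.
    """
    normalized = "".join(source.split()).lower()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity (0.0 to 1.0) between the n-gram sets of two programs."""
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty programs are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical submissions that differ only in spacing.
original = "int total = a + b;"
reindented = "int   total=a+b ;"
score = jaccard_similarity(original, reindented)
```

A pair scoring above some threshold would be flagged for manual review. This simple normalisation only neutralises spacing changes; renamed identifiers and altered comments, which the abstract also lists as common obfuscations, would still lower the score.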

Journal ArticleDOI
TL;DR: In this article, a joint word-embedding model for long documents in the academic domain is proposed to improve the semantic representation quality of word vectors by incorporating a domain-specific semantic relation constraint into the traditional context constraint.

28 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125