Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.


Papers
Book Chapter
22 Sep 2019
TL;DR: This paper investigates the issue of idea and figure plagiarism and proposes a detection method which copes with text and structure change and depends on finding similar semantic meanings between figures by applying image processing and semantic mapping techniques.
Abstract: Plagiarism is stealing others’ work by using their words directly or indirectly without crediting them. Copying others’ ideas is another type of plagiarism that may occur in many areas, but the most serious is academic plagiarism. Academic misconduct gives rise to high-profile plagiarism cases at universities. Therefore, technical solutions for automatic idea plagiarism detection are in strong demand. Detection of figure plagiarism is a challenging field of research because graphic features must be analyzed in addition to the text. This paper investigates the issue of idea and figure plagiarism and proposes a detection method which copes with text and structure change. The procedure depends on finding similar semantic meanings between figures by applying image processing and semantic mapping techniques.

8 citations

Proceedings Article
01 Oct 2007
TL;DR: Wang et al. propose an algorithm to construct the evolution tree (phylogenetic tree) for a set of similar program clones, and report that authorship identification and plagiarism detection can be applied interchangeably in the student assignment program domain.
Abstract: This paper addresses the evolution process of program source code to establish a framework for software authorship identification. Since program code cheating is becoming a serious problem in academic institutions, the software authorship identification tool can also be applied as a detection tool for code plagiarism. The main contribution of our work is twofold. First, we have devised a new asymmetric distance measure to compute the distance of authorship between two different programs. Second, we have proposed an algorithm to construct the evolution tree (phylogenetic tree) for a set of similar program clones. For the experiment we gathered two sets of code: a set of assignment programs and another set of programs submitted to the ICPC, an international programming contest. Our experiment showed that our distance measure for program sources successfully identified code authorship and also reliably detected plagiarized programs. The experiment also showed a strong possibility that the proposed construction algorithm for the phylogenetic forest can be used to trace the evolution (improvement) process of software. This paper shows with confidence that authorship identification and plagiarism detection can be applied interchangeably in the student assignment program domain.

8 citations
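
The notion of an asymmetric distance between two programs, mentioned in the abstract above, can be illustrated with a small sketch. The abstract does not define the paper's measure, so the code below is not that measure; it is a containment-based stand-in that assumes programs are compared as sets of token k-grams. The names `token_kgrams` and `asymmetric_distance` and the choice `k=3` are hypothetical.

```python
import re


def token_kgrams(source: str, k: int = 3) -> set:
    """Split source into crude tokens and return the set of k-token windows."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def asymmetric_distance(a: str, b: str, k: int = 3) -> float:
    """Fraction of a's k-grams that do NOT appear in b (an assumed measure).

    d(a, b) generally differs from d(b, a): a short program fully contained
    in a longer one is at distance 0 from it, but not the other way round.
    """
    grams_a = token_kgrams(a, k)
    grams_b = token_kgrams(b, k)
    if not grams_a:
        return 0.0
    return 1.0 - len(grams_a & grams_b) / len(grams_a)


# Made-up example: `extended` is `original` plus extra statements.
original = "int s = 0; for (i = 0; i < n; i++) s += a[i];"
extended = original + ' printf("%d", s); return s;'

print(asymmetric_distance(original, extended))  # original -> extended
print(asymmetric_distance(extended, original))  # extended -> original
```

The direction of the distance hints at which program may have been derived from which, which is the kind of information an evolution tree over program clones needs.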

Proceedings Article
01 Jul 2017
TL;DR: A tool named Code ObfuscAtion Tool (COAT) that takes a program source code as input and produces another source code that is functionally equivalent to the input but has a different structure, used to measure how reliable source code plagiarism detection tools are.
Abstract: Many plagiarism detection tools exist to uncover plagiarized code by analyzing the similarity of source codes. To measure how reliable those plagiarism detection tools are, we developed a tool named Code ObfuscAtion Tool (COAT) that takes a program source code as input and produces another source code that is exactly equivalent to the input source code in its functional behavior but has a different structure. In COAT, we particularly considered eight representative obfuscation techniques (e.g., modifying control flow or inserting dummy code) to test the performance of source code plagiarism detection tools. To show the practicality of COAT, we gathered 69 source codes and then tested them with four popular source code plagiarism detection tools (Moss, JPlag, SIM and Sherlock). In these experiments, we found that the similarity scores between the original source codes and their obfuscated plagiarized versions are very low; the mean similarity scores ranged only from 4.00 to 16.20, where the maximum possible score is 100. These results demonstrate that all the tested tools have clear limitations in detecting plagiarized code generated with combined code obfuscation techniques.

8 citations
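
To make the 0 to 100 similarity scale in the abstract above concrete, here is a minimal sketch. It does not reproduce COAT or any of the four tools named there; it assumes a naive character-level similarity computed with Python's `difflib.SequenceMatcher`. The function name `similarity_score` and both code snippets are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> float:
    """Naive similarity on a 0-100 scale (an assumed stand-in, not a real tool's score)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


original = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Functionally equivalent variant: renamed identifiers, a while loop instead
# of a for loop, and a dummy variable that never affects the result.
obfuscated = """
def q9(zz):
    k, acc, unused = 0, 0, 42
    while k < len(zz):
        acc = acc + zz[k]
        k = k + 1
    return acc
"""

print(similarity_score(original, original))
print(similarity_score(original, obfuscated))
```

Both functions return the same sum for any list of numbers, yet the second comparison scores below the first, which is the effect that combined obfuscation techniques exploit.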

01 Jan 2014
TL;DR: The design, techniques and learning models adopted for the PAN-2014 Author Profiling challenge indicate that readability metrics, function words and structural features play a vital role in identifying the age and gender of an author.
Abstract: With the evolution of the internet, author profiling has become a topic of great interest in fields such as forensics, security, marketing and plagiarism detection. However, identifying the characteristics of an author based only on a text document has its own limitations and challenges. This paper reports on the design, techniques and learning models we adopted for the PAN-2014 Author Profiling challenge. To identify the age and gender of an author from a document, we employed an ensemble learning approach by training a Random Forest classifier on the training data provided by the PAN organizers, for the English language only. Our work indicates that readability metrics, function words and structural features play a vital role in identifying the age and gender of an author.

8 citations
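
The feature families named in the abstract above (readability metrics, function words, structural features) can be sketched as follows. This is not the authors' feature set, which the abstract does not list; the three features, the tiny `FUNCTION_WORDS` list and the name `stylometric_features` are assumptions chosen for illustration.

```python
import re

# Small illustrative list; a real function-word lexicon would be much larger.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def stylometric_features(text: str) -> dict:
    """Compute three simple features of the kinds named in the abstract."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0,
                "function_word_ratio": 0.0}
    return {
        # Words per sentence: a crude readability proxy.
        "avg_sentence_len": len(words) / len(sentences),
        # Characters per word: another crude readability proxy.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Share of words that are in the function-word list.
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in words) / len(words),
    }


print(stylometric_features("The cat sat. It is a cat."))
```

Feature vectors of this kind, one per document, could then be passed to a classifier such as a Random Forest, as the abstract describes.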

Book Chapter
08 Dec 2012
TL;DR: It has been concluded that not all arguments in the text affect the plagiarism detection process, and only the most important arguments were selected by the FIS, and the results have been used in the similarity calculation process.
Abstract: This paper introduces a plagiarism detection scheme based on a Fuzzy Inference System and Semantic Role Labeling (FIS-SRL). The proposed technique analyses and compares text based on a semantic allocation for each term inside the sentence. SRL offers significant advantages when generating arguments for each sentence semantically. Another feature of the proposed method is voting on each generated argument by the FIS in order to select the important arguments. It has been concluded that not all arguments in the text affect the plagiarism detection process. Therefore, only the most important arguments were selected by the FIS, and the results have been used in the similarity calculation process. Experimental tests have been applied on the PAN-PC-09 data set, and the results show that the proposed method performs better than recent available methods of plagiarism detection in terms of Recall, Precision and F-measure.

8 citations
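
The three evaluation measures named in the abstract above can be sketched as follows. This is a simplified set-level version that assumes each suspicious sentence is either flagged or not; evaluations on PAN corpora may define the measures at a finer granularity. The sentence identifiers and the function name `precision_recall_f1` are made up.

```python
def precision_recall_f1(detected: set, actual: set) -> tuple:
    """Return (precision, recall, F-measure) for flagged vs. truly plagiarized items."""
    true_pos = len(detected & actual)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure here is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


detected = {"s1", "s2", "s3", "s4"}  # sentences a detector flagged
actual = {"s2", "s3", "s5"}          # sentences that are truly plagiarized

p, r, f = precision_recall_f1(detected, actual)
print(p, r, f)
```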


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125