Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published on this topic, receiving 24,740 citations.
Papers
01 Jan 2010
TL;DR: This paper reports an approach to detecting external plagiarism that identifies non-English documents, translates them into English using an online translator tool, and retrieves the top documents that are similar to the suspicious documents.
Abstract: In this paper, we report our approach to detecting external plagiarism. For the pre-processing stage, we identify non-English documents and translate them into English using an online translator tool. Then we index and retrieve the top documents that are similar to the suspicious documents. We divide the retrieved documents into passages, where each passage contains twenty sentences. Plagiarism is detected by counting the number of overlapping words between suspicious and source passages.
8 citations
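The passage-overlap step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the naive sentence splitter, the function names, and the `min_overlap` threshold are all assumptions, and the translation and retrieval stages are left out.

```python
import re

def split_passages(text, sentences_per_passage=20):
    """Split text into passages of a fixed number of sentences.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_passage])
        for i in range(0, len(sentences), sentences_per_passage)
    ]

def words(passage):
    """Return the set of lowercase alphanumeric tokens in a passage."""
    return set(re.findall(r"[a-z0-9]+", passage.lower()))

def overlap_count(suspicious, source):
    """Count distinct words shared by a suspicious and a source passage."""
    return len(words(suspicious) & words(source))

def flag_passages(suspicious_doc, source_doc, min_overlap=5,
                  sentences_per_passage=20):
    """Return (suspicious_idx, source_idx, overlap) for passage pairs whose
    shared-word count reaches min_overlap (a hypothetical threshold)."""
    hits = []
    for i, sp in enumerate(split_passages(suspicious_doc, sentences_per_passage)):
        for j, op in enumerate(split_passages(source_doc, sentences_per_passage)):
            n = overlap_count(sp, op)
            if n >= min_overlap:
                hits.append((i, j, n))
    return hits
```

Counting distinct shared words keeps the sketch short; a real system would probably also remove stop words and normalize the count by passage length.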
01 Jan 2012
TL;DR: This work examines a simplified approach to unsupervised authorship and plagiarism detection based on a binary bag-of-words representation, which proved successful, achieving an overall average accuracy of 84% and a 2nd place rank in the competition.
Abstract: Identifying writing style shifts and variations is a fundamental capability when addressing authorship-related tasks. In this work we examine a simplified approach to unsupervised authorship and plagiarism detection which is based on a binary bag-of-words representation. We evaluate our approach using the PAN-2012 Authorship Attribution challenge data, which includes both open/closed class authorship identification and intrinsic plagiarism tasks. Our approach proved to be successful, achieving an overall average accuracy of 84% and a 2nd place rank in the competition.
8 citations
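A binary bag-of-words representation, as named in the abstract above, records only whether a term occurs in a text, not how often. The sketch below shows one way to build such vectors; the tokenizer and the Jaccard comparison are assumptions chosen for illustration, since the abstract does not say how the vectors are compared.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Sorted list of every distinct term seen across the texts."""
    vocab = set()
    for t in texts:
        vocab.update(tokenize(t))
    return sorted(vocab)

def binary_bow(text, vocabulary):
    """1 if the term occurs in the text at least once, else 0."""
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in vocabulary]

def jaccard(u, v):
    """Jaccard similarity of two binary vectors (an assumed measure)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0
```

For example, with the texts "the cat the cat" and "the dog", the vocabulary is `["cat", "dog", "the"]`, and the repeated words in the first text still map to a single 1 each.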
01 Jan 2017
TL;DR: This language-independent source code similarity detection method is based on the idea of maximum reusability of standard Unix filters and achieved significantly better results than competitors that are considered the gold standard in plagiarism detection.
Abstract: The paper describes a method for language-independent source code similarity detection. It is based on the idea of maximum reusability of standard Unix filters. The method was implemented and benchmarked on different datasets, both real-world (students' assignments) and synthetic (a perfect plagiarism experiment). Our method achieved significantly better results than competitors that are considered the gold standard in plagiarism detection.
8 citations
17 Jan 2013
TL;DR: This paper proposes a novel software similarity analysis tool based on runtime semantic information. It focuses on AMC (Abstract Memory Context), which can represent the intrinsic features of the analyzed code, and introduces two advanced techniques, namely region filtering and sequence alignment, for AMC comparison.
Abstract: Analyzing software similarity has emerged as a key ingredient for various applications such as software maintenance, bug finding, malware clustering and copyright protection. In this paper, we propose a novel software similarity analysis tool for plagiarism detection. The tool, which we refer to as RAMC (Runtime Abstract Memory Context based tool), has the following three characteristics. First, it is based on runtime semantic information, which makes it feasible to investigate similarity in binary code, without source code. Second, among runtime semantic information, it focuses on AMC (Abstract Memory Context), which can represent the intrinsic features of the analyzed code. Finally, it introduces two advanced techniques, namely region filtering and sequence alignment, for AMC comparison. Experiments based on a real implementation have shown that RAMC can appropriately identify similarity between the original and plagiarized binaries.
8 citations
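The abstract above mentions sequence alignment as one of the comparison techniques but gives no details. As a generic illustration only, the sketch below computes a Needleman-Wunsch style global alignment score between two sequences of labels. The scoring values and the idea of encoding each trace as a list of labels are assumptions, and this is not claimed to be RAMC's algorithm.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b.

    Uses the classic dynamic-programming recurrence, keeping only the
    previous row. Scoring parameters are illustrative assumptions.
    """
    # Row 0: aligning an empty prefix of a against b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against empty b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

# Hypothetical label sequences standing in for two execution traces.
original = ["load", "store", "load", "add"]
suspect = ["load", "store", "add"]
score = alignment_score(original, suspect)
```

A higher score means the two sequences line up with fewer gaps and mismatches; identical sequences score `match` times their length.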
27 May 2019
TL;DR: The experimental results show that MAF can effectively improve the performance of similarity measures for test code plagiarism detection, and it can also be extended for use in test recommendation, test reuse and other engineering applications.
Abstract: Software engineering education has become popular due to the rapid development of the software industry. In order to reduce learning costs and improve learning efficiency, some online practice platforms have emerged. This paper proposes a novel test code plagiarism detection technology, namely MAF, which introduces bidirectional static slicing to anchor methods under test and extract fragments of test code. Combined with similarity measures, MAF can achieve effective plagiarism detection by avoiding massive amounts of unrelated, noisy test code. The experiment is conducted on the dataset of Mooctest, which has supported hundreds of test activities around the world over the past 3 years. The experimental results show that MAF can effectively improve the performance (precision, recall and F1-measure) of similarity measures for test code plagiarism detection. We believe that MAF can further expand and promote software testing education, and it can also be extended for use in test recommendation, test reuse and other engineering applications.
8 citations
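The three metrics named in the abstract above (precision, recall and F1-measure) can be computed from a set of flagged items and a set of known plagiarism cases. The sketch below uses the standard definitions; the function name and the example pairs are hypothetical and do not come from the paper.

```python
def precision_recall_f1(predicted, actual):
    """Standard precision, recall and F1 over two collections of items.

    predicted: items a detector flagged as plagiarism
    actual:    items known to be plagiarism
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical submission pairs, for illustration only.
flagged = {("s1", "s2"), ("s1", "s3")}
known = {("s1", "s2"), ("s2", "s3")}
p, r, f1 = precision_recall_f1(flagged, known)
```

Here one of the two flagged pairs is correct and one of the two known pairs is found, so precision, recall and F1 are all 0.5.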