Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings Article
25 May 2015
TL;DR: An automated assessment system for programming assignments that includes dynamic testing of student programs, plagiarism detection, and a proper presentation of the results is introduced.
Abstract: Modern teaching paradigms promote active student participation, encouraging teachers to adapt the teaching process to involve more practical work. In the introductory programming course at the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, homework assignments contribute approximately one half to the total grade, requiring a significant investment of time and human resources in the assessment process. This problem was alleviated by the automated assessment of homework assignments. In this paper, we introduce an automated assessment system for programming assignments that includes dynamic testing of student programs, plagiarism detection, and a proper presentation of the results. We share our experience and compare the introduced system with the manual assessment approach used before.

26 citations

Journal Article
TL;DR: The Rabin-Karp algorithm is much more effective and faster at detecting plagiarism in documents larger than 1000 KB, while the Jaro-Winkler Distance algorithm has advantages in terms of time.
Abstract: Plagiarism is an act that universities regard as fraud: taking someone else's ideas or writing without citing the source and claiming it as one's own. Plagiarism detection systems generally implement a string matching algorithm to search text documents for the words they have in common. Several algorithms are used for string matching; two of them are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to matching multiple string patterns, while the Jaro-Winkler Distance algorithm has advantages in terms of time. A plagiarism detection application was developed and tested on different document types, i.e. doc, docx, pdf and txt. The experimental results show that both algorithms can be used to detect plagiarism in those documents, but in terms of effectiveness, the Rabin-Karp algorithm is much more effective and faster when processing documents larger than 1000 KB.

26 citations
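
The Rabin-Karp matching named in the abstract above can be illustrated with a short sketch. This is a generic single-pattern version using a rolling hash, not the application the authors built; the function name `rabin_karp` and the `base` and `mod` parameters are assumptions chosen for illustration.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character inside a window of length m.
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches
```

Because the window hash is updated in constant time per shift, the expected cost is linear in the text length, which is one plausible reason the approach scales to large documents.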

Book Chapter
12 Sep 2011
TL;DR: This work incorporates text outlier detection methodologies to enhance both intrinsic and external plagiarism detection, and shows that the approach is highly competitive with the leading research teams in plagiarism detection.
Abstract: Plagiarism detection, one of the main problems that educational institutions have been dealing with since the massification of the Internet, can be considered a classification problem that uses both self-based information and text processing algorithms, whose computational complexity is intractable without search space reduction algorithms. First, self-based information algorithms treat plagiarism detection as an outlier detection problem in which the classifier must decide whether plagiarism is present using only the text of a given document. Then, external plagiarism detection uses text matching algorithms, for which it is fundamental to reduce the matching space with search space reduction techniques; this reduction can itself be represented as another outlier detection problem. The main contribution of this work is the inclusion of text outlier detection methodologies to enhance both intrinsic and external plagiarism detection. Results show that our approach is highly competitive with the leading research teams in plagiarism detection.

26 citations
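
To make the idea of intrinsic detection as outlier detection more concrete, here is a minimal sketch. It is not the authors' method: the character-trigram profile, the cosine distance, the z-score rule, and the names `flag_outlier_segments` and `z_threshold` are all assumptions made for illustration. The only point it shares with the abstract above is that each segment is judged using nothing but the document's own text.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams, ignoring case."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two count vectors (1.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (sqrt(sum(c * c for c in a.values()))
            * sqrt(sum(c * c for c in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def flag_outlier_segments(segments: list[str],
                          z_threshold: float = 2.0) -> list[int]:
    """Return indices of segments whose style deviates from the whole document."""
    if not segments:
        return []
    profiles = [trigram_profile(s) for s in segments]
    document = sum(profiles, Counter())  # profile of the full document
    distances = [cosine_distance(p, document) for p in profiles]
    spread = pstdev(distances)
    if spread == 0:
        return []  # every segment looks the same; nothing stands out
    centre = mean(distances)
    return [i for i, d in enumerate(distances)
            if (d - centre) / spread > z_threshold]
```

A flagged segment is only a candidate: a change of style can have innocent causes such as quotations or tables, so a real system would pass these segments on to further checks.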

Proceedings Article
28 Sep 2015
TL;DR: This paper shows the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task, and analyzes the impact of using part-of-speech (POS) tagging in the plagiarism detection model.
Abstract: Plagiarism is an illicit act that has become a prime concern, mainly in the educational and research domains. This deceitful act, usually referred to as intellectual theft, has increased swiftly with rapid technological development and the accessibility of information. There is therefore an urgent need for a system or mechanism for efficient plagiarism detection. This paper investigates different combined similarity metrics for extrinsic plagiarism detection and focuses on the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task. It further analyzes the impact of using part-of-speech (POS) tagging in the plagiarism detection model. Different combinations of four single metrics (Cosine similarity, Dice coefficient, Match coefficient and Fuzzy-Semantic measure) are used with and without POS tag information. These systems are evaluated on the PAN-2014 training and test data sets, and the results are analyzed and compared using the standard PAN measures, viz. recall, precision, granularity and plagdet_score.

26 citations
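
Two of the single metrics listed in the abstract above, Cosine similarity and the Dice coefficient, are easy to sketch, along with one possible way of combining them. The abstract does not say how the metrics are combined, so the weighted mean in `combined_similarity` and its `weights` parameter are assumptions; whitespace tokenization is also a simplification, and the Match coefficient, Fuzzy-Semantic measure and POS tags are left out.

```python
from collections import Counter
from math import sqrt


def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[w] for w, n in ca.items())
    norm = (sqrt(sum(n * n for n in ca.values()))
            * sqrt(sum(n * n for n in cb.values())))
    return dot / norm if norm else 0.0


def dice_coefficient(a: list[str], b: list[str]) -> float:
    """2 * |A ∩ B| / (|A| + |B|) over the sets of distinct tokens."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def combined_similarity(x: str, y: str,
                        weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Weighted mean of the two metrics (hypothetical combination rule)."""
    a, b = tokens(x), tokens(y)
    return (weights[0] * cosine_similarity(a, b)
            + weights[1] * dice_coefficient(a, b))
```

Cosine similarity is sensitive to how often terms repeat, while Dice only looks at which terms are shared, so a combination can separate pairs that a single metric scores identically.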

Proceedings Article
Yikun Hu, Yuanyuan Zhang, Juanru Li, Hui Wang, Bodong Li, Dawu Gu
19 Aug 2018
TL;DR: In this article, a semantics-based hybrid approach is proposed to detect binary code clone functions, where the semantic signatures are extracted during the execution of the template function and emulation of the target function.
Abstract: Binary code clone analysis is an important technique with a wide range of applications in software engineering (e.g., plagiarism detection, bug detection). The main challenge lies in semantics-equivalent code transformations (e.g., optimization, obfuscation), which can alter the representation of binary code tremendously. Another challenge is the trade-off between detection accuracy and coverage. Unfortunately, existing techniques still rely on semantics-less code features, which are susceptible to code transformation. Moreover, they adopt either a purely static or a purely dynamic approach to detect binary code clones, which cannot achieve high accuracy and coverage simultaneously. In this paper, we propose a semantics-based hybrid approach to detect binary clone functions. We execute a template binary function with its test cases, and emulate the execution of every target function for clone comparison with the runtime information migrated from that template function. The semantic signatures are extracted during the execution of the template function and the emulation of the target function. Lastly, a similarity score is calculated from their signatures to measure their likeness. We implement the approach in a prototype system named BinMatch, which analyzes IA-32 binary code on the Linux platform. We evaluate BinMatch on eight real-world projects compiled with different compilation configurations and commonly used obfuscation methods, performing over 100 million pairwise function comparisons in total. The experimental results show that BinMatch is robust to semantics-equivalent code transformation. In addition, it not only covers all target functions for clone analysis, but also improves detection accuracy compared to state-of-the-art solutions.

26 citations
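
The abstract above says a similarity score is calculated from the two functions' signatures but does not give the formula. As a hedged illustration only, the sketch below treats each signature as a sequence of observed runtime values and scores a pair with a Jaccard-style ratio based on their longest common subsequence (LCS). The representation, the formula, and the names `lcs_length` and `signature_similarity` are assumptions, not a description of BinMatch.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def signature_similarity(a: Sequence[Hashable],
                         b: Sequence[Hashable]) -> float:
    """|LCS| / (|a| + |b| - |LCS|); 1.0 means identical sequences."""
    if not a and not b:
        return 1.0  # two empty signatures are trivially identical
    common = lcs_length(a, b)
    return common / (len(a) + len(b) - common)
```

An order-preserving measure such as LCS tolerates extra values inserted by a transformation while still rewarding values that appear in the same relative order.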


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125