Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Book
05 Mar 2012
TL;DR: Software similarity and classification is an emerging topic applicable to malware detection, software theft detection, plagiarism detection, and software clone detection. The book demonstrates that treating these applied problems as similarity and classification problems enables techniques to be shared between areas.
Abstract: Software similarity and classification is an emerging topic with wide applications. It is applicable to the areas of malware detection, software theft detection, plagiarism detection, and software clone detection. Extracting program features, processing those features into suitable representations, and constructing distance metrics to define similarity and dissimilarity are the key methods to identify software variants, clones, derivatives, and classes of software. Software Similarity and Classification reviews the literature of those core concepts, in addition to relevant literature in each application and demonstrates that considering these applied problems as a similarity and classification problem enables techniques to be shared between areas. Additionally, the authors present in-depth case studies using the software similarity and classification techniques developed throughout the book.

54 citations
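
The abstract of the entry above names three steps: extracting program features, turning them into a suitable representation, and defining a distance metric over that representation. A minimal sketch of those steps follows. It is not taken from the book; the token k-gram features, the Jaccard distance, and the names `token_kgrams` and `jaccard_distance` are all assumptions chosen for illustration.

```python
import re


def token_kgrams(source: str, k: int = 3) -> set:
    """Feature extraction and representation (assumed scheme):
    split the source into crude lexical tokens and return the set
    of overlapping k-token sequences."""
    # Identifiers/keywords, integer literals, or any single non-space symbol.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """Distance metric (assumed): 1 - |A ∩ B| / |A ∪ B|.
    0.0 means identical feature sets, 1.0 means no shared features."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical inputs: an original snippet, a copy with renamed
# identifiers, and an unrelated snippet.
original = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
renamed = "int sum = 0; for (int j = 0; j < n; j++) sum += a[j];"
unrelated = "print('hello world')"

d_same = jaccard_distance(token_kgrams(original), token_kgrams(original))
d_renamed = jaccard_distance(token_kgrams(original), token_kgrams(renamed))
d_unrelated = jaccard_distance(token_kgrams(original), token_kgrams(unrelated))
```

In this sketch the renamed copy lands strictly between the identical and unrelated cases. A practical system would likely normalize identifiers before building k-grams so that simple renaming does not raise the distance.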

Journal ArticleDOI
TL;DR: The sobering results show that although some web-based text-matching systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.
Abstract: There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a widespread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

53 citations

Journal ArticleDOI
TL;DR: The future belongs to algorithms that can handle large amounts of source code; such algorithms should use one of the model-based representations, which can serve as the basis for large-scale anti-plagiarism systems.

53 citations

Journal ArticleDOI
TL;DR: The authors argue that plagiarism detection systems are often implemented with inappropriate assumptions about plagiarism and the way in which new members of a community of practice develop the skills to become full members of that community.
Abstract: This paper argues that the inappropriate framing and implementation of plagiarism detection systems in UK universities can unwittingly construct international students as ‘plagiarists’. It argues that these systems are often implemented with inappropriate assumptions about plagiarism and the way in which new members of a community of practice develop the skills to become full members of that community. Drawing on the literature and some primary data it shows how expectations, norms and practices become translated and negotiated in such a way that legitimate attempts to conform with the expectations of the community of practice often become identified as plagiarism and illegitimate attempts at cheating often become obscured from view. It argues that this inappropriate framing and implementation of plagiarism detection systems may make academic integrity more illusive rather than less. It argues that in its current framing – as systems for ‘detection and discipline’ – plagiarism detection systems may become a new micro-politics of power with devastating consequences for those excluded.

52 citations

Journal ArticleDOI
TL;DR: The authors propose a new type of software birthmark called DYnamic Key Instruction Sequence (DYKIS) that can be extracted from an executable without the need for source code. Detection based on it is resilient to both weak obfuscation techniques such as compiler optimizations and strong obfuscation techniques implemented in tools such as SandMark, Allatori and Upx.
Abstract: A software birthmark is a unique characteristic of a program. Thus, comparing the birthmarks between the plaintiff and defendant programs provides an effective approach for software plagiarism detection. However, software birthmark generation faces two main challenges: the absence of source code and various code obfuscation techniques that attempt to hide the characteristics of a program. In this paper, we propose a new type of software birthmark called DYnamic Key Instruction Sequence (DYKIS) that can be extracted from an executable without the need for source code. The plagiarism detection algorithm based on our new birthmarks is resilient to both weak obfuscation techniques such as compiler optimizations and strong obfuscation techniques implemented in tools such as SandMark, Allatori and Upx. We have developed a tool called DYKIS-PD (DYKIS Plagiarism Detection tool) and conducted extensive experiments on a large number of binary programs. The tool, the benchmarks and the experimental results are all publicly available.

52 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125