scispace - formally typeset
Search or ask a question
Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.


Papers
More filters
Book ChapterDOI
01 Jan 2007
TL;DR: Current research in the field of automatic plagiarism detection for text documents focuses on the development of algorithms that compare suspicious documents against potential original documents.
Abstract: Current research in the field of automatic plagiarism detection for text documents focuses on the development of algorithms that compare suspicious documents against potential original documents. Although recent approaches perform well in identifying copied or even modified passages ([Brin et al. (1995), Stein (2005)]), they assume a closed world where a reference collection must be given (Finkel (2002)). Recall that a human reader can identify suspicious passages within a document without having a library of potential original documents in mind.

113 citations

Journal ArticleDOI
TL;DR: This paper reviews the literature on the major modern detection engines, providing a comparison of them based upon the metrics and techniques they deploy, and shows that whilst detection is well established there are still places where further research would be useful, particularly where visual support of the investigation process is possible.
Abstract: Automated techniques for finding plagiarism in student source code submissions have been in use for over 20 years and there are many available engines and services. This paper reviews the literature on the major modern detection engines, providing a comparison of them based upon the metrics and techniques they deploy. Generally the most common and effective techniques are seen to involve tokenising student submissions then searching pairs of submissions for long common substrings, an example of what is defined to be a paired structural metric. Computing academics are recommended to use one of the two Web-based detection engines, MOSS and JPlag. It is shown that whilst detection is well established there are still places where further research would be useful, particularly where visual support of the investigation process is possible.

113 citations

01 Jan 2003
TL;DR: The nature of the plagiarism problem is explored, and the approaches used so far for its detection are summarized, and a number of methods used to measure text reuse are discussed.
Abstract: Automatic methods of measuring similarity between program code and natural language text pairs have been used for many years to assist humans in detecting plagiarism. For example, over the past thirty years or so, a vast number of approaches have been proposed for detecting likely plagiarism between programs written by Computer Science students. However, more recently, approaches to identifying similarities between natural language texts have been addressed, but given the ambiguity and complexity of natural over program languages, this task is very difficult. Automatic detection is gaining further interest from both the academic and commercial worlds given the ease with which texts can now be found, copied and rewritten. Following the recent increase in the popularity of on-line services offering plagiarism detection services and the increased publicity surrounding cases of plagiarism in academia and industry, this paper explores the nature of the plagiarism problem, and in particular summarise the approaches used so far for its detection. I focus on plagiarism detection in natural language, and discuss a number of methods I have used to measure text reuse. I end by suggesting a number of recommendations for further work in the field of automatic plagiarism detection.

111 citations

Journal ArticleDOI
TL;DR: This paper focuses on how the delegation of plagiarism detection to a technical actor produces a particular set of agencies and intentionalities which unintentionally and unexpectedly conspires to constitute some students as plagiarists and others as not.

107 citations

Proceedings ArticleDOI
TL;DR: This research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis, which has many applications, such as cross-architecture vulnerability discovery and code plagiarism detection.
Abstract: Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.

106 citations


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202359
2022126
202183
2020118
2019130
2018125