Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: DOCODE 3.0 is presented, a Web system for educational institutions that performs automatic analysis of large quantities of digital documents in relation to their degree of originality, and produces a number of visualizations and reports to let teachers and professors gain insights on the originality of the documents they review.

31 citations

Journal Article
TL;DR: This article discusses a case of serial plagiarism in the work of a graduate student in an online distance education program, emphasizing the complexity of the student's thinking and the manner in which the case was handled by the teacher and the university.
Abstract: The ease with which material may be ‘copied and pasted’ from the Internet into written work is raising concern in educational institutions, particularly in those disciplines that use online sources and methods in their curriculum. A case of ‘serial plagiarism’ in the work of a graduate student in an online distance education program is discussed. The complexity of the student’s thinking is emphasized, as is the manner in which the case was handled by the teacher and the university. The use of an online plagiarism‐checking technology (Turnitin.com) and the value of such services are discussed. The case illustrates the importance of explaining the precise nature of plagiarism to students, of providing clear warnings about its consequences, and of developing a careful institutional approach to plagiarism detection and prevention.

31 citations

Proceedings Article
Jonathan Helfman
04 Oct 1994
TL;DR: Dotplot is a technique for visualizing patterns of string matches in millions of lines of text and code, with applications in text analysis, software engineering, and information retrieval.
Abstract: Dotplot is a technique for visualizing patterns of string matches in millions of lines of text and code. Patterns may be explored interactively or detected automatically. Applications include text analysis (author identification, plagiarism detection, translation alignment, etc.), software engineering (module and version identification, subroutine categorization, redundant code identification, etc.), and information retrieval (identification of similar records in results of queries). Patterns are interpreted through a visual language. Squares identify unordered matches (documents with lots of matching words or subroutines with lots of matching symbols), while diagonals identify ordered matches (copies, versions, and translations). Patterns of squares and diagonals have more complex interpretations that identify subtler relationships.

31 citations
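
The dotplot idea described in the entry above can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names (`dotplot`, `longest_diagonal`, `render`) are hypothetical, tokens are compared by exact equality, and the quadratic matrix is only practical for small inputs, whereas the original work targets millions of lines. A cell (i, j) is marked when token i of the first text equals token j of the second, so an ordered copy shows up as a diagonal run.

```python
def dotplot(tokens_a, tokens_b):
    """Return a boolean matrix: cell (i, j) is True when tokens_a[i] == tokens_b[j]."""
    return [[a == b for b in tokens_b] for a in tokens_a]


def longest_diagonal(matrix):
    """Length of the longest run of matches along any top-left to bottom-right diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    best = 0
    prev = [0] * (cols + 1)  # run lengths ending in the previous row
    for i in range(rows):
        cur = [0] * (cols + 1)
        for j in range(cols):
            if matrix[i][j]:
                # Extend the run that ended one step up and to the left.
                cur[j + 1] = prev[j] + 1
                best = max(best, cur[j + 1])
        prev = cur
    return best


def render(matrix):
    """Draw the matrix as text: '#' for a match, '.' otherwise."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    a = "the quick brown fox".split()
    b = "a quick brown fox jumps".split()
    m = dotplot(a, b)
    print(render(m))
    print("longest ordered match:", longest_diagonal(m))
```

In this toy input the shared phrase "quick brown fox" appears as a three-cell diagonal, which corresponds to the "ordered match" pattern the abstract associates with copies and versions.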

Proceedings Article
01 Dec 2012
TL;DR: This paper proposes a technique based on textual similarity for external plagiarism detection, using an approach based on the traditional Vector Space Model (VSM) to select candidate source documents.
Abstract: Plagiarism denotes the act of copying someone else's ideas (or works) and claiming them as one's own. Plagiarism detection is the procedure of detecting the texts of a given document that are plagiarized, i.e. copied from some other documents. Potential challenges arise because plagiarists often obfuscate the copied texts: they might shuffle, remove, insert, or replace words or short phrases, and might also restructure sentences by replacing words with synonyms and changing the order in which words appear. In this paper we propose a technique based on textual similarity for external plagiarism detection. For a given suspicious document we have to identify the set of source documents from which the suspicious document is copied. The method we propose comprises four phases. In the first phase, we process all the documents to generate tokens, lemmas, Part-of-Speech (PoS) classes, character offsets, sentence numbers, and named-entity (NE) classes. In the second phase, we select a subset of documents that may possibly be the sources of plagiarism. We use an approach based on the traditional Vector Space Model (VSM) for this candidate selection. In the third phase, we use a graph-based approach to find the similar passages in the suspicious document and the selected source documents. Finally, we filter out the false detections.

31 citations
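
The candidate-selection phase mentioned in the entry above (the second of its four phases) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the abstract says only that a traditional Vector Space Model is used, so the TF-IDF weighting, the smoothed IDF formula, whitespace tokenization, the cosine ranking, and the names `tfidf_vectors`, `cosine`, and `select_candidates` are all assumptions made here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; the paper uses lemmas)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so terms that occur in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_candidates(suspicious, sources, top_k=2):
    """Return indices of the top_k sources most similar to the suspicious text."""
    docs = [tokenize(suspicious)] + [tokenize(s) for s in sources]
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[0], v), i) for i, v in enumerate(vecs[1:])]
    # Highest similarity first; ties broken by the lower source index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:top_k]]


if __name__ == "__main__":
    sources = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "a cat sat quietly on a mat",
    ]
    print(select_candidates("the cat sat on the mat yesterday", sources))
```

Only the documents returned by such a ranking would be passed on to the more expensive passage-level comparison, which is the point of having a candidate-selection phase at all.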

Journal Article
TL;DR: This work presents an approach called program it yourself (PIY) which is empirically shown to outperform MOSS in detection accuracy, and is also capable of maintaining detection accuracy and reasonable runtimes even when using extremely large data repositories.
Abstract: Vast amounts of information available online make plagiarism increasingly easy to commit, and this is particularly true of source code. The traditional approach of detecting copied work in a course setting is manual inspection. This is not only tedious but also typically misses code plagiarized from outside sources or even from an earlier offering of the course. Systems to automatically detect source code plagiarism exist but tend to focus on small submission sets. One such system that has become the standard in automated source code plagiarism detection is measure of software similarity (MOSS) (Schleimer et al., in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, San Diego, 2003). In this work, we present an approach called program it yourself (PIY) which is empirically shown to outperform MOSS in detection accuracy. By utilizing parallel processing and data clustering, PIY is also capable of maintaining detection accuracy and reasonable runtimes even when using extremely large data repositories.

31 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  59
2022  126
2021  83
2020  118
2019  130
2018  125