Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have appeared on this topic, receiving 24,740 citations.


Papers
Proceedings ArticleDOI
01 Nov 2008
TL;DR: A numerical-based comparison algorithm is proposed for full-text document plagiarism detection that is comparable in computation time without losing the word order of common parts.
Abstract: Plagiarism is a form of academic misconduct that has increased with easy access to information through electronic documents and the Internet. The problem of finding plagiarism in full-text documents can be viewed as a problem of finding the longest common parts of strings. Moreover, the detection system has to be capable of determining and visualizing not only the common parts but also their location in both the source and the observed document. Unlike previous research, this paper proposes a numerical-based comparison algorithm that is comparable in computation time without losing the word order of common parts. In the experiment, the proposed algorithm outperforms the suffix tree when the observed paragraph is shorter than one hundred words.

10 citations
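
The abstract above frames full-text plagiarism detection as finding the longest common parts of two strings and reporting where they occur in both documents. The following is a minimal sketch of that framing only. It uses a textbook dynamic-programming longest-common-run search over words, not the paper's numerical-based algorithm (which the abstract does not describe) and not a suffix tree. The function name `longest_common_word_run` and the whitespace tokenization are assumptions made for illustration.

```python
def longest_common_word_run(source, observed):
    """Return (length, start_in_source, start_in_observed) for the longest
    run of consecutive words shared by both texts (word order preserved).

    Illustrative only: a standard O(n*m) dynamic program, not the
    numerical-based algorithm proposed in the paper.
    """
    a = source.lower().split()    # assumption: whitespace tokenization
    b = observed.lower().split()
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)     # run lengths ending at the previous source word
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous word pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - curr[j]   # 0-based start in source
                    best_b = j - curr[j]   # 0-based start in observed
        prev = curr
    return best_len, best_a, best_b


length, src_pos, obs_pos = longest_common_word_run(
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox sat on the lazy cat",
)
```

The returned word offsets are what a detection system would need in order to highlight the shared passage in both the source and the observed document.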

Journal ArticleDOI
TL;DR: An empirical comparison against multiple state-of-the-art plagiarism detection techniques using several sets of real students’ programs collected in early programming courses demonstrated that AuDeNTES identifies more plagiarism cases than the other techniques at the cost of a small additional inspection effort.
Abstract: In academic courses, students frequently take advantage of someone else’s work to improve their own evaluations or grades. This unethical behavior seriously threatens the integrity of the academic system, and teachers invest substantial effort in preventing and recognizing plagiarism. When students take examinations requiring the production of computer programs, plagiarism detection can be semiautomated using analysis techniques such as JPlag and Moss. These techniques are useful but lose effectiveness when the text of the exam suggests some of the elements that should be structurally part of the solution. A loss of effectiveness is caused by the many common parts that are shared between programs due to the suggestions in the text of the exam rather than plagiarism. In this article, we present the AuDeNTES anti-plagiarism technique. AuDeNTES detects plagiarism via the code fragments that better represent the individual students’ contributions by filtering from students’ submissions the parts that might be common to many students due to the suggestions in the text of the exam. The filtered parts are identified by comparing students’ submissions against a reference solution, which is a solution of the exam developed by the teachers. Specifically, AuDeNTES first produces tokenized versions of both the reference solution and the programs that must be analyzed. Then, AuDeNTES removes from the tokenized programs the tokens that are included in the tokenized reference solution. Finally, AuDeNTES computes the similarity among the filtered tokenized programs and produces a ranked list of program pairs suspected of plagiarism. An empirical comparison against multiple state-of-the-art plagiarism detection techniques using several sets of real students’ programs collected in early programming courses demonstrated that AuDeNTES identifies more plagiarism cases than the other techniques at the cost of a small additional inspection effort.

10 citations
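
The three steps named in the abstract above (tokenize, remove tokens that also occur in the reference solution, then rank program pairs by similarity) can be sketched roughly as follows. This is not the AuDeNTES implementation: the regex tokenizer, the set-based filtering, and the Jaccard similarity are all assumptions chosen for brevity, since the abstract does not specify the tokenization or the similarity measure. All function names are hypothetical.

```python
import re
from itertools import combinations


def tokenize(code):
    # Assumption: identifiers, integer literals, or single symbols as tokens.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def filter_tokens(tokens, reference_tokens):
    # Drop every token that also appears in the reference solution.
    reference = set(reference_tokens)
    return [t for t in tokens if t not in reference]


def jaccard(a, b):
    # Assumption: Jaccard overlap of token sets stands in for the
    # similarity measure, which the abstract does not name.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_pairs(submissions, reference):
    """Return (similarity, name_a, name_b) tuples, most similar first."""
    ref_tokens = tokenize(reference)
    filtered = {
        name: filter_tokens(tokenize(code), ref_tokens)
        for name, code in submissions.items()
    }
    pairs = [
        (jaccard(filtered[x], filtered[y]), x, y)
        for x, y in combinations(sorted(filtered), 2)
    ]
    return sorted(pairs, key=lambda p: p[0], reverse=True)


reference = "def total(xs):\n    return sum(xs)"
loop_solution = (
    "def total(xs):\n    acc = 0\n    for v in xs:\n        acc += v\n    return acc"
)
submissions = {
    "alice": loop_solution,
    "bob": loop_solution,
    "carol": "def total(xs):\n    return reduce(add, xs)",
}
ranked = rank_pairs(submissions, reference)
```

In this toy data the function signature suggested by the exam text is filtered out of every submission, so only the students' own loop or `reduce` code contributes to the ranking.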

01 Jan 2010
TL;DR: The authors' algorithm for detecting external plagiarism in the PAN-10 competition has two steps, the first of which identifies similar documents and the plagiarized section for a suspicious document against the source documents using the Vector Space Model and the cosine similarity measure.
Abstract: Here we describe our algorithm for detecting external plagiarism in the PAN-10 competition. The algorithm has two steps: (1) identifying similar documents and the plagiarized section for a suspicious document against the source documents using the Vector Space Model (VSM) and the cosine similarity measure, and (2) identifying the plagiarized area in the suspicious document using chunk ratio.

10 citations

01 Jan 2016
TL;DR: A review of the state-of-the-art software plagiarism detection techniques according to the scenarios they are designed for and applicable to as well as different principles adopted.
Abstract: With the burst of free and open source software projects, software plagiarism has become a serious threat to the healthy development of the software ecosystem. Researchers, educators, open source developers, and software company managers are paying more and more attention to the problem. Software plagiarism detection is critical to the protection of software intellectual property. This paper provides a review of the state-of-the-art software plagiarism detection techniques. First, the significance and threat models of plagiarism detection are presented, followed by the description and comparison of existing techniques on plagiarism detection. We classify the existing methods into three major categories, including source-code plagiarism detection, software watermark based plagiarism detection and software birthmark based plagiarism detection, according to the scenarios they are designed for and applicable to as well as different principles adopted. Finally, through analyzing the limitations of the existing plagiarism detection techniques, the emerging challenges and practical requirements, we discuss several possible future research directions.

10 citations

Book ChapterDOI
13 Jun 2018
TL;DR: A Hybrid Arabic Plagiarism Detection System (HYPLAG), which combines corpus-based and knowledge-based approaches by utilizing an Arabic semantic resource (Arabic WordNet) and obtaining a higher performance with less computational time.
Abstract: Plagiarism is specifically defined as literary theft of paragraphs or sentences from an unreferenced source. This unauthorized behavior is a real problem in scientific research. This paper proposes a Hybrid Arabic Plagiarism Detection System (HYPLAG). The HYPLAG approach combines corpus-based and knowledge-based approaches by utilizing an Arabic semantic resource (Arabic WordNet). A preliminary study on texts from undergraduate students was conducted to understand their behavior and the patterns used in plagiarism. The results of the study show that students apply different techniques to plagiarized sentences and that they change sentence components (verbs, nouns, and adjectives). HYPLAG was evaluated on the ExAraPlagDet-2015 dataset against several other approaches that participated in the AraPlagDet PAN@FIRE shared task on extrinsic Arabic plagiarism detection, obtaining higher performance (F-score 89% vs. 84% obtained by the best-performing system at AraPlagDet) with less computational time.

10 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125