Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: This work proposes a deep learning-based approach for indicating whether original and suspect Arabic documents express the same meaning, and reports improved detection of contextual relationships between Arabic documents in terms of precision and recall.
Abstract: The continuous growth of textual sources on the web has made paraphrasing easier. Detecting it has become a challenge in various natural language processing applications (e.g., plagiarism detection, information retrieval and extraction, and question answering). Unlike for Western languages such as English, few works have addressed the problem of extrinsic paraphrase detection in Arabic. In this context, we propose a deep learning-based approach to indicate whether original and suspect documents express the same meaning. The word2vec algorithm extracts relevant features by modeling each word in relation to its neighbors. The resulting word vectors are then averaged to generate sentence vector representations. A convolutional neural network then captures additional contextual information and computes the degree of semantic relatedness. Given the lack of publicly available resources, a paraphrased corpus was developed using the skip-gram model, which performed better at replacing an original word with its most similar word of the same grammatical class from a vocabulary. Finally, the proposed system detects contextual relationships between Arabic documents with higher precision (85%) and recall (86.8%) than previous studies.

25 citations
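
The averaging step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the tiny embedding table holds made-up values standing in for trained word2vec vectors, the function names are hypothetical, and plain cosine similarity is swapped in for the convolutional network that the paper uses to score relatedness.

```python
import math

# Made-up 3-dimensional vectors standing in for trained word2vec embeddings (assumption).
TOY_EMBEDDINGS = {
    "student": [0.9, 0.1, 0.0],
    "pupil": [0.8, 0.2, 0.1],
    "copied": [0.1, 0.9, 0.2],
    "duplicated": [0.2, 0.8, 0.3],
    "essay": [0.0, 0.2, 0.9],
    "report": [0.1, 0.3, 0.8],
    "rain": [-0.8, 0.1, 0.3],
    "fell": [-0.6, -0.4, 0.2],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = sentence_vector(["student", "copied", "essay"], TOY_EMBEDDINGS)
paraphrase = sentence_vector(["pupil", "duplicated", "report"], TOY_EMBEDDINGS)
unrelated = sentence_vector(["rain", "fell"], TOY_EMBEDDINGS)

print("original vs paraphrase:", cosine_similarity(original, paraphrase))
print("original vs unrelated:", cosine_similarity(original, unrelated))
```

With these toy values the paraphrased sentence scores much closer to the original than the unrelated one does. Averaging discards word order, which is one reason a further model such as the CNN mentioned in the abstract may be layered on top.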

Proceedings Article
02 Apr 2004
TL;DR: A detection framework with the following salient features is devised: designs, instead of code, are compared; multi-level abstractions of the design are generated; and comparison follows a stepwise process according to the abstraction levels.
Abstract: Detecting plagiarism in software is a computationally complex process. At the same time it is critical, because the lack of a deterrent through detection may result in various losses. Several systems to detect plagiarism have been proposed. However, their lexically based analysis is not powerful enough and can be foiled with minimal effort. To address their shortcomings, we have devised a detection framework with the following salient features: (1) designs, instead of code, are compared; (2) multi-level abstractions of the design are generated; and (3) comparison follows a stepwise process according to the abstraction levels. A comparison with existing systems shows that this strategy results in simpler algorithms and more accurate analyses.

25 citations

Posted Content
TL;DR: A new approach to plagiarism detection is proposed that combines the fingerprint matching technique with four key features to assist the detection process, reducing the time and space used for comparison without affecting the effectiveness of the plagiarism detection.
Abstract: As the Internet helps us cross cultural borders by providing access to diverse information, plagiarism is bound to arise. As a result, plagiarism detection is increasingly in demand. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those tools. However, when handling large articles, fingerprint matching has weaknesses, especially in space and time consumption. In this paper, we propose a new approach to plagiarism detection that combines the fingerprint matching technique with four key features to assist the detection process. These proposed features can select the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is reduced without affecting the effectiveness of the plagiarism detection.

25 citations
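
The idea in the abstract above, fingerprinting only a few selected sentences rather than whole articles, can be sketched roughly as follows. The abstract does not name the four key features, so `select_key_sentences` is a hypothetical stand-in that simply keeps the longest sentences; the word 3-gram fingerprints, the CRC32 hash, and the Jaccard score are likewise assumptions chosen for illustration, not the authors' method.

```python
import re
import zlib


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_key_sentences(sentences, limit=2):
    """Hypothetical stand-in for the paper's key features: keep the longest sentences."""
    return sorted(sentences, key=lambda s: len(s.split()), reverse=True)[:limit]


def fingerprint(sentence, k=3):
    """Set of CRC32 hashes over the sentence's word k-grams."""
    words = re.findall(r"\w+", sentence.lower())
    if len(words) < k:
        grams = [" ".join(words)] if words else []
    else:
        grams = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {zlib.crc32(g.encode("utf-8")) for g in grams}


def jaccard(a, b):
    """Shared fingerprints divided by all fingerprints; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_score(source_text, suspect_text, limit=2, k=3):
    """Highest fingerprint overlap between any pair of selected key sentences."""
    src = [fingerprint(s, k) for s in select_key_sentences(split_sentences(source_text), limit)]
    sus = [fingerprint(s, k) for s in select_key_sentences(split_sentences(suspect_text), limit)]
    return max((jaccard(a, b) for a in src for b in sus), default=0.0)


source = "Plagiarism detection tools compare documents for overlapping passages. It rained."
suspect = "Yesterday was dull. Plagiarism detection tools compare documents for overlapping text."
print(best_match_score(source, suspect))
```

Only `limit` sentences per document are hashed and compared, which is where the time and space saving in this sketch comes from.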

Proceedings Article
01 Oct 2007
TL;DR: This work introduces a new two-step approach to plagiarism detection that combines high algorithmic performance with the quality of pairwise file comparison, and shows that the proposed method does not noticeably reduce the quality of the pairwise comparison mechanism while providing better speed characteristics.
Abstract: Plagiarism and similarity detection software has been well known in universities for years. Despite the variety of methods and approaches used in plagiarism detection, the typical trade-off between the speed and the reliability of the algorithm remains. We introduce a new two-step approach to plagiarism detection that combines high algorithmic performance with the quality of pairwise file comparison. Our system uses a fast detection method to select only the suspicious files, and then invokes precise (and slower) algorithms to get reliable results. We show that the proposed method does not noticeably reduce the quality of the pairwise comparison mechanism while providing better speed characteristics.

25 citations
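
A two-step pipeline of the kind described in the abstract above might look like the sketch below. The paper's actual fast and precise algorithms are not given here, so both are stand-ins: token-set Jaccard overlap as the cheap filter and `difflib.SequenceMatcher` as the slower pairwise comparison. The function names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fast_score(a, b):
    """Cheap filter (stand-in): Jaccard overlap of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def precise_score(a, b):
    """Slower comparison (stand-in): character-level matching ratio."""
    return SequenceMatcher(None, a, b).ratio()


def two_step_detect(files, fast_threshold=0.5, precise_threshold=0.8):
    """Run the precise comparison only on pairs that pass the fast filter."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(files.items()), 2):
        if fast_score(text_a, text_b) < fast_threshold:
            continue  # unlikely to be similar; skip the expensive step
        score = precise_score(text_a, text_b)
        if score >= precise_threshold:
            flagged.append((name_a, name_b, score))
    return flagged


submissions = {
    "a.py": "x = 1\ny = x + 2\nprint(y)",
    "b.py": "x = 1\ny = x + 2\nprint(y)\n",
    "c.py": "import os\nfor name in os.listdir('.'):\n    pass",
}
for name_a, name_b, score in two_step_detect(submissions):
    print(name_a, name_b, round(score, 3))
```

Raising `fast_threshold` sends fewer pairs to the slow step at the risk of missing heavily disguised copies, which mirrors the speed versus reliability trade-off the abstract mentions.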

Journal Article
TL;DR: The findings revealed that existing plagiarism detection techniques require further enhancement, as they cannot efficiently detect plagiarised ideas, figures, tables, formulas and scanned documents.
Abstract: Purpose: The purpose of this paper is to analyse the state-of-the-art techniques used to detect plagiarism in terms of their limitations, features, taxonomies and processes. Design/methodology/approach: The study consisted of a comprehensive search for relevant literature in six online database repositories, namely IEEE Xplore, ACM Digital Library, ScienceDirect, EI Compendex, Web of Science and Springer, using search strings derived from the subject of discussion. Findings: The findings revealed that existing plagiarism detection techniques require further enhancement, as they cannot efficiently detect plagiarised ideas, figures, tables, formulas and scanned documents. Originality/value: The contribution of this study lies in exposing the current trends in plagiarism detection research and identifying areas where further improvements are required to complement the performance of existing techniques.

25 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125