Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: This work focuses on the similarity of simple equations present in a document. It extracts those equations from the documents, compares them with the originals even if the variables have been changed in the plagiarized document, and detects whether the document is plagiarized.
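
One way to compare equations whose variables have been renamed is to rewrite every identifier to a canonical placeholder before comparing. The following is a minimal sketch of that idea only, not the paper's method: the function names `normalize_equation` and `same_structure` are hypothetical, and the sketch assumes equations are given as plain strings with ordinary identifiers.

```python
import re


def normalize_equation(equation: str) -> str:
    """Rename identifiers to v0, v1, ... in order of first appearance.

    Assumption: whitespace carries no meaning, so it is removed first.
    Function names such as sin are renamed too, which is a known
    limitation of this sketch.
    """
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    compact = re.sub(r"\s+", "", equation)
    return re.sub(r"[A-Za-z_]\w*", rename, compact)


def same_structure(eq_a: str, eq_b: str) -> bool:
    """True when two equations are identical up to variable renaming."""
    return normalize_equation(eq_a) == normalize_equation(eq_b)
```

Under these assumptions, `same_structure("y = m*x + c", "b = a*t + k")` is true, while changing an operator makes the comparison fail. Reordered terms (for example `c + m*x`) are not recognized by this sketch.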

7 citations

Posted Content
TL;DR: A new approach in detecting cross language plagiarism is proposed, which considers Bahasa Melayu as an input language of the submitted query document and English as a target language of similar, possibly plagiarised documents.
Abstract: As the Internet helps us cross language and cultural borders by providing different types of translation tools, cross language plagiarism, also known as translation plagiarism, is bound to arise. Especially in academic work, this issue will definitely affect students' work, including the quality of their assignments and papers. In this paper, we propose a new approach in detecting cross language plagiarism. Our web based cross language plagiarism detection system is specially tuned to detect translation plagiarism by implementing different techniques and tools to assist the detection process. The Google Translate API is used as our translation tool, and the Google Search API is used in our information retrieval process. Our system is also integrated with the fingerprint matching technique, which is a widely used plagiarism detection technique. In general, our proposed system starts by translating the input documents from Malay to English, followed by removal of stop words and stemming of words, identification of similar documents in the corpus, comparison of similar patterns, and finally a summary of the result. Three least-frequent 4-grams fingerprint matching is used to implement the core comparison phase during the plagiarism detection process. In the K-gram fingerprint matching technique, although any value of K can be considered, K = 4 was stated as an ideal choice. This is because smaller values of K (e.g., K = 1, 2, or 3) do not provide good discrimination between sentences. On the other hand, the larger the value of K (e.g., K = 5, 6, 7, etc.), the better the discrimination of words in one sentence from words in another.
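
The comparison phase named in the abstract, three least-frequent 4-grams fingerprint matching, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes character 4-grams, assumes frequencies are counted over the two documents being compared, and skips the translation, stop-word removal and stemming steps. The function names are hypothetical.

```python
from collections import Counter


def char_kgrams(sentence, k=4):
    """Character k-grams of a sentence, ignoring case and whitespace."""
    compact = "".join(sentence.lower().split())
    return [compact[i:i + k] for i in range(len(compact) - k + 1)]


def fingerprint(sentence, corpus_counts, k=4, n=3):
    """The n least-frequent k-grams of a sentence, as a frozenset.

    Ties are broken alphabetically so the result is deterministic.
    """
    grams = set(char_kgrams(sentence, k))
    ranked = sorted(grams, key=lambda g: (corpus_counts[g], g))
    return frozenset(ranked[:n])


def matching_sentences(suspect, source, k=4, n=3):
    """Pairs (suspect sentence, source sentence) with equal fingerprints."""
    corpus_counts = Counter(
        g for s in suspect + source for g in char_kgrams(s, k)
    )
    source_index = {}
    for s in source:
        fp = fingerprint(s, corpus_counts, k, n)
        if fp:  # skip sentences too short to produce any k-gram
            source_index.setdefault(fp, s)
    matches = []
    for s in suspect:
        fp = fingerprint(s, corpus_counts, k, n)
        if fp in source_index:
            matches.append((s, source_index[fp]))
    return matches
```

Both arguments are lists of sentences. Requiring the whole fingerprint to be equal is a strict choice made for this sketch; a real system might accept partial overlap instead.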

7 citations

Journal Article
21 Jul 2020
TL;DR: A machine learning approach for plagiarism detection of programming assignments based on similarity score of n-grams, code style similarity and dead codes is proposed and xgboost model is used for training and predicting whether a pair of source code are plagiarised or not.
Abstract: Plagiarism in programming assignments has been increasing these days, which affects the evaluation of students. This paper proposes a machine learning approach for plagiarism detection of programming assignments. Different features related to source code are computed based on similarity score of n-grams, code style similarity and dead codes. Then, an xgboost model is used for training and predicting whether a pair of source codes is plagiarised or not. Many plagiarism techniques ignore dead codes such as unused variables and functions in their prediction tasks. But the number of unused variables and functions in the source code is considered in this paper. Using our features, the model achieved an accuracy score of 94% and an average f1-score of 0.905 on the test set. We also compared the result of the xgboost model with support vector machines (SVM) and report that the xgboost model performed better on our dataset.
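
Two of the feature families mentioned in the abstract, n-gram similarity and dead code, could be computed along the following lines. This is a hedged sketch, not the paper's feature set: it assumes the submissions are Python source, uses Jaccard similarity over token 3-grams as the similarity score, and approximates dead code by counting names that are assigned but never read. The classifier training step is omitted, and all function names are hypothetical.

```python
import ast
import re


def token_ngrams(code, n=3):
    """Set of n-grams over a crude token stream (identifiers, numbers, symbols)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def unused_variable_count(code):
    """Number of names that are assigned somewhere but never read."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return len(stored - loaded)


def pair_features(code_a, code_b, n=3):
    """Feature dictionary for one pair of submissions."""
    return {
        "ngram_jaccard": jaccard(token_ngrams(code_a, n), token_ngrams(code_b, n)),
        "unused_diff": abs(
            unused_variable_count(code_a) - unused_variable_count(code_b)
        ),
    }
```

A feature dictionary like this one would then be turned into a row of a training matrix for whichever classifier is chosen. The name-based dead code check ignores scopes, so it is only a rough approximation.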

7 citations

01 Jan 2013
TL;DR: In this paper, method-based source code detection is described, which detects simple plagiarized code such as exact match, near exact match and longest common sequence. It also proposes agent-based detection, which performs the detection automatically.
Abstract: Plagiarism detection plays an important role in software security protection and license issues. Source-code plagiarism detection methods can be classified as string-based, token-based, parse-tree-based and program-dependency-based. All of these approaches have certain limitations, cannot meet the requirements when the source code is large, and may produce false positives. But parse-tree-based detection improves the detection ability and efficiency. This paper describes method-based source code detection, which detects simple plagiarized code such as exact match, near exact match and longest common sequence. It also proposes agent-based detection, which performs the detection automatically. Automatic plagiarism detection will be helpful for code clone detection in the software industry. Keywords: abstract syntax tree, plagiarism detection, source code plagiarism detection, parse tree, code clone.
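
The three simple cases named in the abstract (exact match, near exact match, longest common sequence) can be illustrated with a small line-based comparison. This sketch is an assumption-laden simplification: it compares normalized lines instead of parse trees, reads "longest common sequence" as the longest common subsequence, and uses an arbitrary threshold of 0.75 for "near exact". The function names are hypothetical.

```python
def normalize(code):
    """Non-empty lines with surrounding whitespace removed."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def classify(code_a, code_b, near_threshold=0.75):
    """Return (label, ratio), where ratio is LCS length over the longer input."""
    a, b = normalize(code_a), normalize(code_b)
    longest = max(len(a), len(b))
    ratio = lcs_length(a, b) / longest if longest else 1.0
    if a == b:
        return "exact match", ratio
    if ratio >= near_threshold:
        return "near exact match", ratio
    return "partial overlap", ratio
```

Replacing the line lists with token streams or serialized parse-tree nodes would move this sketch closer to the token-based and parse-tree-based categories the abstract lists.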

6 citations

Proceedings Article
01 Oct 2019
TL;DR: This research investigates the parallelization of a cross-language plagiarism detection system implemented on a lab-scale multicore based private cloud platform using OpenStack. The overall parallelized computation reached a speedup of 1.07 to 3.52 times compared to the execution time of the original serial computation.
Abstract: The computational performance of the cross-language plagiarism detection system using the winnowing algorithm developed at the Electrical Engineering Department, Universitas Indonesia became an issue for real world application. This research investigates the parallelization of that system, implemented on a lab-scale multicore based private cloud platform using OpenStack. Parallelization was done on the portion of the program where the paragraphs of reference documents are processed. The overall parallelized computation reached a speedup of 1.07 to 3.52 times compared to the execution time of the original serial computation.
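
The winnowing algorithm mentioned in the abstract selects a subset of k-gram hashes as a document fingerprint by keeping the minimum hash in each sliding window. The following is a minimal sketch of that general technique, not the system described in the paper: the parameters `k=5` and `window=4`, the use of CRC32 as the hash, and the Jaccard-based `similarity` score are all assumptions made for illustration, and the cross-language translation step is left out.

```python
import zlib


def winnow(text, k=5, window=4):
    """Winnowing fingerprint: a set of (hash, position) pairs.

    Each window of consecutive k-gram hashes contributes its minimum,
    taking the rightmost occurrence when the minimum is repeated.
    """
    compact = "".join(ch.lower() for ch in text if ch.isalnum())
    hashes = [
        zlib.crc32(compact[i:i + k].encode("utf-8"))
        for i in range(len(compact) - k + 1)
    ]
    if not hashes:
        return set()
    selected = set()
    # Texts with fewer hashes than the window size form a single short window.
    for start in range(max(len(hashes) - window + 1, 1)):
        win = hashes[start:start + window]
        smallest = min(win)
        offset = len(win) - 1 - win[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def similarity(text_a, text_b, k=5, window=4):
    """Jaccard similarity of the selected hash values, ignoring positions."""
    a = {h for h, _ in winnow(text_a, k, window)}
    b = {h for h, _ in winnow(text_b, k, window)}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

In this sketch each call to `winnow` depends only on its own input text, so fingerprinting the paragraphs of reference documents is the kind of work that could be distributed across worker processes.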

6 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125