Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published on this topic, receiving 24,740 citations.


Papers
Journal ArticleDOI
Mamdouh Farouk1
TL;DR: Word-to-word based, structure-based, and vector-based approaches are the most widely used for measuring sentence similarity, but structure-based similarity, which compares the structures of sentences, needs more investigation.
Abstract: This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity into three categories based on the adopted methodology. Word-to-word based, structure-based, and vector-based approaches are the most widely used. Each approach measures relatedness between short texts from a specific perspective. In addition, the datasets most often used as benchmarks for evaluating techniques in this field are introduced to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which compares the structures of sentences, needs more investigation.

26 citations
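
As a rough illustration of two of the categories named in the survey above, the following Python sketch scores a sentence pair in a word-based way and in a vector-based way. It is not taken from the survey: exact word matching (Jaccard overlap) stands in here for the word-to-word similarity measures the survey covers, and plain term-frequency vectors stand in for the vector-based ones. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard_similarity(a, b):
    """Word-based score (assumption: exact matches only).

    Shared distinct words divided by all distinct words.
    """
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Vector-based score: cosine of the two term-frequency vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    s1, s2 = "the cat sat", "the cat ran"
    print(jaccard_similarity(s1, s2))
    print(cosine_similarity(s1, s2))
```

Neither score looks at word order or syntax, which is the gap the structure-based category is meant to address.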

Proceedings ArticleDOI
07 Apr 2011
TL;DR: Five tools for detecting plagiarism in Java source code texts: JPlag, Marble, moss, Plaggie, and sim are compared with respect to their features and performance.
Abstract: In this paper we compare five tools for detecting plagiarism in Java source code texts: JPlag, Marble, moss, Plaggie, and sim. The tools are compared with respect to their features and performance. For the performance comparison we carried out two experiments: to compare the sensitivity of the tools for different plagiarism techniques we have applied the tools to a set of intentionally plagiarised programs. To get a picture of the precision of the tools, we have run the tools on several incarnations of a student assignment and compared the top 10's of the results.

26 citations

Journal ArticleDOI
TL;DR: The research in this context involves at first examining various metrics used in plagiarism detection in program codes and secondly selecting an appropriate statistical measure using attribute counting metrics (ATMs) for detecting plagiarism in Java programming assignments.
Abstract: Practical computing courses that involve significant amount of programming assessment tasks suffer from e-Plagiarism. A pragmatic solution for this problem could be by discouraging plagiarism particularly among the beginners in programming. One way to address this is to automate the detection of plagiarized work during the marking phase. Our research in this context involves at first examining various metrics used in plagiarism detection in program codes and secondly selecting an appropriate statistical measure using attribute counting metrics (ATMs) for detecting plagiarism in Java programming assignments. The goal of this investigation is to study the effectiveness of ATMs for detecting plagiarism among assignment submissions of introductory programming courses.

26 citations
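
The general idea of attribute counting, as described in the abstract above, can be sketched in Python as follows. The abstract does not say which attributes or which statistical measure the authors use, so everything here is an assumption made for illustration: the four attributes, the keyword list, the regex tokenizer, and the sum-of-absolute-differences distance are all hypothetical choices.

```python
import re

# Assumption: a small, illustrative subset of Java control-flow keywords.
CONTROL_KEYWORDS = {"if", "else", "for", "while", "switch", "case", "return"}


def attribute_counts(source):
    """Reduce a program to a few simple counts (its attribute profile)."""
    # Words (identifiers and keywords), single punctuation marks, digit runs.
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]|\d+", source)
    words = [t for t in tokens if re.match(r"[A-Za-z_]", t)]
    return {
        "lines": sum(1 for line in source.splitlines() if line.strip()),
        "tokens": len(tokens),
        "distinct_words": len(set(words)),
        "control_keywords": sum(1 for w in words if w in CONTROL_KEYWORDS),
    }


def profile_distance(a, b):
    """Sum of absolute count differences; 0 means identical profiles."""
    counts_a, counts_b = attribute_counts(a), attribute_counts(b)
    return sum(abs(counts_a[k] - counts_b[k]) for k in counts_a)


if __name__ == "__main__":
    original = "int sum(int a, int b) {\n    return a + b;\n}\n"
    renamed = "int add(int x, int y) {\n    return x + y;\n}\n"
    print(attribute_counts(original))
    print(profile_distance(original, renamed))
```

Because the profile ignores the actual names, a copy that only renames identifiers keeps the same counts. A small distance is only a signal for manual review, since unrelated short programs can also produce similar counts.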

Proceedings Article
01 Jan 2015
TL;DR: An overview paper describes these evaluation corpora of plagiarism detection methods for Arabic texts, discusses the participants' methods, and highlights their building blocks that could be language dependent.
Abstract: is the first shared task that addresses the evaluation of plagiarism detection methods for Arabic texts. It has two sub-tasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs have been submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants' methods, and highlights their building blocks that could be language dependent.

26 citations

Proceedings ArticleDOI
07 Aug 2017
TL;DR: This paper attempts to replicate and reproduce the results of Severyn and Moschitti using their open-source code as well as to reproduce their results via a de novo implementation using a completely different deep learning toolkit.
Abstract: In recent years, neural networks have been applied to many text processing problems. One example is learning a similarity function between pairs of text, which has applications to paraphrase extraction, plagiarism detection, question answering, and ad hoc retrieval. Within the information retrieval community, the convolutional neural network model proposed by Severyn and Moschitti in a SIGIR 2015 paper has gained prominence. This paper focuses on the problem of answer selection for question answering: we attempt to replicate the results of Severyn and Moschitti using their open-source code as well as to reproduce their results via a de novo (i.e., from scratch) implementation using a completely different deep learning toolkit. Our de novo implementation is instructive in ascertaining whether reported results generalize across toolkits, each of which have their idiosyncrasies. We were able to successfully replicate and reproduce the reported results of Severyn and Moschitti, albeit with minor differences in effectiveness, but affirming the overall design of their model. Additional ablation experiments break down the components of the model to show their contributions to overall effectiveness. Interestingly, we find that removing one component actually increases effectiveness and that a simplified model with only four word overlap features performs surprisingly well, even better than convolution feature maps alone.

26 citations


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125