Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Book Chapter
07 Dec 2016
TL;DR: This paper proposes a novel approach to identify whether two statements are paraphrases, using machine learning algorithms such as Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes on the given training data sets of the two subtasks.
Abstract: Every language admits several plausible interpretations. With the evolution of the web, smart devices and social media, identifying these syntactic or semantic ambiguities has become a challenging task. In Natural Language Processing, two statements that use different words but have the same meaning are termed paraphrases. At FIRE 2016, we worked on the problem of detecting paraphrases for the Shared Task DPIL (Detecting Paraphrases in Indian Languages), specifically in Hindi. This paper proposes a novel approach to identify whether two statements are paraphrases, using machine learning algorithms such as Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes on the given training data sets of the two subtasks. In cross-validation experiments, Random Forest outperforms the other methods with an F1-score of 0.94. We extended our work by adding a few more features and using the former best classifier, which improved the F1-score by 1%. The experimental results show that our algorithm achieved the highest F1-score and accuracy and hence secured the first rank in Hindi among all participants in this shared task. Our approach can be used in applications such as question answering, document clustering, machine translation, text summarization, plagiarism detection and many more.

8 citations
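
The abstract above ranks classifiers by F1-score. As a minimal sketch of how that metric is computed for a binary paraphrase task, the following uses made-up labels (1 = paraphrase, 0 = not a paraphrase). The function name and the data are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Return (precision, recall, F1) for the chosen positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier predictions for eight sentence pairs.
gold = [1, 1, 1, 0, 0, 1, 0, 1]
pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so a classifier only scores well when it finds most true paraphrases and raises few false alarms.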

Proceedings Article
01 Feb 2018
TL;DR: Five software tools for detecting plagiarism are compared, and it is envisaged that a comparative analysis of these tools will help institutions better understand their features.
Abstract: Plagiarism is a widespread and growing problem in academic institutions. Various steps are being taken to curb the menace of plagiarism. One such step is to employ plagiarism detection software. Such software comprises various approaches that aim to create a fair environment for academic publications. In this paper, five such tools are compared for detecting plagiarism: JPlag, MOSS, Plaggie, SIM and Turnitin. These tools are widely used in a number of academic institutions, and it is envisaged that a comparative analysis of them will help institutions better understand their features.

8 citations

Journal Article
TL;DR: In this paper, the authors give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection, which is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general.
Abstract: Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing: hyper-parameters, sub-model selection where sub-models exist (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.

8 citations
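
The abstract above mentions TF-IDF, distance measures, and a paraphrase detection threshold. The following is a rough sketch of how those pieces fit together, using only the Python standard library. The whitespace tokenizer, the smoothed IDF formula, the example sentences, and the 0.5 threshold are all assumptions made for illustration. They are not the settings chosen in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a, b, background=()):
    """Cosine similarity of a and b; IDF is estimated from a, b and background."""
    vectors = tfidf_vectors([a, b, *background])
    return cosine(vectors[0], vectors[1])


THRESHOLD = 0.5  # hypothetical cut-off; in practice tuned on labeled pairs

a = "the student copied the essay from a website"
b = "the essay was copied from a website by the student"
c = "fuzzy clustering groups similar documents"

score_close = similarity(a, b, background=[c])
score_far = similarity(a, c, background=[b])
print(score_close >= THRESHOLD, score_far >= THRESHOLD)
```

A pair is flagged as a paraphrase when its similarity reaches the threshold, so the threshold trades precision against recall.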

10 Jul 2012
TL;DR: The Spot the Difference! project, as discussed by the authors, investigated the nature, scope and extent of visual plagiarism in the arts education sector. Because most commonly used search technologies rely on text, identifying visual plagiarism is often left to academic staff, which can potentially result in inconsistency of detection, approach, policies and practices.
Abstract: Over recent years there has been considerable investment in the use of technology to identify sources of text-based plagiarism in universities. However, students of the visual arts are also required to complete numerous pieces of visual submissions for assessment, and yet very little similar work has been undertaken in the area of non-text based plagiarism detection. The Spot the Difference! project (2011-2012), funded by JISC and led by the University for the Creative Arts, seeks to address this gap by piloting the use of visual search tools developed by the University of Surrey and testing their application to support learning and teaching in the arts and specifically to the identification of visual plagiarism. Given that most commonly used search technologies rely on text, the identification and evidencing of visual plagiarism is often left to the knowledge and experience of academic staff, which can potentially result in inconsistency of detection, approach, policies and practices. This paper outlines the work of the project team, who sought to investigate the nature, scope and extent of visual plagiarism in the arts education sector.

8 citations

Book Chapter
01 Jan 2016
TL;DR: This paper explores soft clustering, via the Fuzzy C-Means algorithm, in the candidate retrieval stage of the external plagiarism detection task, and compares the results with existing approaches, namely N-gram and K-Means clustering.
Abstract: With the advent of the World Wide Web, plagiarism has become a prime issue in academia. A plagiarized document may contain content from a number of sources available on the web, and detecting such plagiarism manually is beyond any individual. This paper explores soft clustering, via the Fuzzy C-Means algorithm, in the candidate retrieval stage of the external plagiarism detection task. Partial data sets from the PAN 2013 corpus are used to evaluate the system, and the results are compared with existing approaches, namely N-gram and K-Means clustering. System performance is measured and compared using the standard measures of precision and recall.

8 citations
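
The abstract above uses Fuzzy C-Means for soft clustering. As a minimal sketch of the standard algorithm, the following lets each point belong to every cluster with a membership degree between 0 and 1. The toy 2-D points stand in for document vectors, and the function name, the fuzzifier m = 2, the seed, and the stopping tolerance are illustrative assumptions. This is not the paper's implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for standard Fuzzy C-Means with fuzzifier m > 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership is proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if any(d == 0.0 for d in dists):
                # The point coincides with a center: give that center full membership.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            new_u.append([x / total for x in row])

        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy stand-ins for document vectors: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(points, c=2)
for point, row in zip(points, memberships):
    print(point, [round(x, 3) for x in row])
```

In a candidate retrieval setting, soft memberships allow a suspicious document to be compared against every cluster it partly belongs to, whereas K-Means assigns it to exactly one.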


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  59
2022  126
2021  83
2020  118
2019  130
2018  125