Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.

Papers

Book Chapter•DOI•

Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language Using Machine Learning

Anuj Saini, Aayushi Verma

07 Dec 2016

TL;DR: This paper proposes a novel approach to identify if two statements are paraphrased or not using various machine learning algorithms like Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes on the given training data set of two subtasks.

Abstract: Every language admits several plausible interpretations. With the evolution of the web, smart devices and social media, it has become a challenging task to identify these syntactic or semantic ambiguities. In Natural Language Processing, two statements that use different words but have the same meaning are termed paraphrases. At FIRE 2016, we worked on the problem of detecting paraphrases for the Shared Task DPIL (Detecting Paraphrases in Indian Languages), specifically in Hindi. This paper proposes a novel approach to identify whether two statements are paraphrases, using various machine learning algorithms such as Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes on the given training data set of two subtasks. In cross-validation experiments, Random Forest outperforms the other methods with an F1-score of 0.94. We extended our work by adding a few more features and using the former best classifier, which improved the F1-score by 1%. The experimental results show that our algorithm achieved the highest F1-score and accuracy and hence secured the first rank in Hindi in this shared task among all participants. Our novel approach can be used in various applications such as question-answering systems, document clustering, machine translation, text summarization, plagiarism detection and many more.

8 citations
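
For readers unfamiliar with the F1-score reported in the abstract above, the following is a minimal sketch of how precision, recall and F1 are computed for a binary paraphrase / non-paraphrase decision. The function name and the label encoding (1 = paraphrase) are illustrative assumptions, not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class.

    y_true and y_pred are equal-length sequences of labels;
    `positive` is the label treated as "paraphrase" (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

F1 is high only when both precision and recall are high, which is why it is commonly preferred over plain accuracy when the two classes are imbalanced.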

Proceedings Article•DOI•

A Comparative Study of Plagiarism Detection Software

Vandana¹•Institutions (1)

Jaypee Institute of Information Technology¹

01 Feb 2018

TL;DR: Five software tools for detecting plagiarism are compared, and it is envisaged that a comparative analysis of these tools will help institutions better understand their features.

Abstract: Plagiarism is a widespread and growing problem in academic institutions. Various steps are being taken to curb the menace of plagiarism. One such step is to employ plagiarism detection software. Such software comprises various approaches that aim to create a fair environment for academic publications. In this paper, five such tools are compared for detecting plagiarism: JPlag, MOSS, Plaggie, SIM and Turnitin. These tools are widely used in a number of academic institutions, and it is envisaged that a comparative analysis of them will help institutions better understand their features.

8 citations

Journal Article•DOI•

Corpus-Based Paraphrase Detection Experiments and Review

Tedo Vrbanec, Ana Meštrović¹•Institutions (1)

University of Rijeka¹

31 May 2021 - arXiv: Computation and Language

TL;DR: In this paper, the authors give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection, which is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc.

Abstract: Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, text mining in general, etc. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMo, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.

8 citations
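
The abstract above combines a text representation, a distance measure and a detection threshold. As a rough illustration of that pipeline for the simplest model it lists (TF-IDF), here is a minimal sketch using only the Python standard library. The whitespace tokenizer, the smoothed IDF formula and the default threshold of 0.5 are assumptions made for this example; they are not the settings selected in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that terms present in every document keep a weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(a, b, background=(), threshold=0.5):
    """Flag (a, b) as a paraphrase pair when cosine similarity >= threshold.

    `background` is an optional collection of extra documents used only
    to make the IDF statistics less degenerate.
    """
    vectors = tfidf_vectors([a, b, *background])
    return cosine(vectors[0], vectors[1]) >= threshold
```

In practice the threshold would be tuned on a labelled corpus, which is the kind of selection the abstract describes.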

Spot the Difference! Visual plagiarism in the visual arts.

Leigh Garrett, Amy Robinson

10 Jul 2012

TL;DR: The Spot the Difference! project as discussed by the authors investigated the nature, scope and extent of visual plagiarism in the arts education sector and found that most commonly used search technologies rely on text, which can potentially result in inconsistency of detection, approach, policies and practices.

Abstract: Over recent years there has been considerable investment in the use of technology to identify sources of text-based plagiarism in universities. However, students of the visual arts are also required to complete numerous pieces of visual submissions for assessment, and yet very little similar work has been undertaken in the area of non-text based plagiarism detection. The Spot the Difference! project (2011-2012), funded by JISC and led by the University for the Creative Arts, seeks to address this gap by piloting the use of visual search tools developed by the University of Surrey and testing their application to support learning and teaching in the arts and specifically to the identification of visual plagiarism. Given that most commonly used search technologies rely on text, the identification and evidencing of visual plagiarism is often left to the knowledge and experience of academic staff, which can potentially result in inconsistency of detection, approach, policies and practices. This paper outlines the work of the project team, who sought to investigate the nature, scope and extent of visual plagiarism in the arts education sector.

8 citations

Book Chapter•DOI•

Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System

N. Riya Ravi¹, K Vani¹, Deepa Gupta¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Jan 2016

TL;DR: This paper explores soft clustering, via the Fuzzy C Means algorithm, in the candidate retrieval stage of the external plagiarism detection task, and compares the results with existing approaches, namely N-gram and K Means clustering.

Abstract: With the advent of the World Wide Web, plagiarism has become a prime issue in academia. A plagiarized document may contain content from a number of sources available on the web, and it is beyond any individual to detect such plagiarism manually. This paper explores soft clustering, via the Fuzzy C Means algorithm, in the candidate retrieval stage of the external plagiarism detection task. Partial data sets from the PAN 2013 corpus are used to evaluate the system, and the results are compared with existing approaches, namely N-gram and K Means clustering. The performance of the systems is measured using the standard measures of precision and recall, and a comparison is made.

8 citations
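
To make the notion of soft clustering in the abstract above more concrete, the following is a minimal sketch of the textbook Fuzzy C Means update rules, in which every point receives a degree of membership in every cluster instead of a single hard assignment. The fuzzifier m = 2, the fixed iteration count and the explicit initial centres are assumptions chosen for illustration; the paper's document features and parameters are not reproduced here.

```python
import math


def memberships(points, centres, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centres]
        if any(d == 0.0 for d in dists):
            # The point coincides with a centre: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([
                1.0 / sum((dj / dk) ** exponent for dk in dists)
                for dj in dists
            ])
    return u


def update_centres(points, u, m=2.0):
    """Recompute each centre as the mean of all points weighted by u[i][j]**m."""
    dim = len(points[0])
    centres = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centres.append(tuple(
            sum(w * p[axis] for w, p in zip(weights, points)) / total
            for axis in range(dim)
        ))
    return centres


def fuzzy_c_means(points, init_centres, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centres = list(init_centres)
    for _ in range(iterations):
        u = memberships(points, centres, m)
        centres = update_centres(points, u, m)
    return centres, memberships(points, centres, m)
```

In a candidate retrieval setting, soft memberships would let a suspicious document be associated with more than one cluster of source documents, which hard K Means assignment cannot express.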

Performance

Metrics

Papers: 1,976

Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125
