
Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
01 Jan 2015
TL;DR: A bilingual Persian-English sentence-aligned parallel corpus is combined with Wikipedia articles to build a plagiarism detection corpus based on parallel-corpus sentences.
Abstract: Plagiarism detection is the process of locating text reuse within a suspicious document. Plagiarism detection corpora are used for evaluating plagiarism detection systems. In this paper, we present a bilingual Persian-English plagiarism detection corpus. We provide our corpus for the text alignment corpus construction task of the PAN 2015 competition. Our approach is based on parallel corpus sentences. We have used a Persian-English sentence-aligned parallel corpus in combination with Wikipedia articles to create our corpus. Paired sentences in the parallel corpus have a similarity score between 0 and 1. We have used these similarity scores to establish the degree of obfuscation when constructing the plagiarism cases.
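
As a rough illustration of how sentence-pair similarity scores from a parallel corpus might be mapped to obfuscation levels when assembling plagiarism cases, here is a minimal Python sketch; the thresholds, field names, and example data are assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed thresholds and field names, not the authors' code):
# map a parallel-corpus sentence-pair similarity in [0, 1] to an obfuscation
# label and assemble cross-lingual plagiarism cases from aligned sentence pairs.

def obfuscation_level(similarity: float) -> str:
    """Higher similarity means a more literal translation, hence less obfuscation."""
    if similarity >= 0.8:
        return "none"
    if similarity >= 0.5:
        return "low"
    if similarity >= 0.3:
        return "medium"
    return "high"

# Aligned (Persian, English, similarity) triples; placeholder content.
aligned_pairs = [
    ("جمله‌ی نمونه‌ی اول", "A first sample sentence.", 0.92),
    ("جمله‌ی نمونه‌ی دوم", "A second sentence, loosely rendered.", 0.41),
]

plagiarism_cases = [
    {"source": fa, "suspicious": en, "obfuscation": obfuscation_level(score)}
    for fa, en, score in aligned_pairs
]
print(plagiarism_cases)
```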

12 citations

01 Jan 2014
TL;DR: This paper proposes two different solutions for relaxing the comparison of two documents so as to consider the semantic relations between them, and prepares a framework that allows different feature types and different strategies for merging the features to be combined.
Abstract: Text alignment is a sub-task of the plagiarism detection process. In this paper we discuss our approach to this problem. As in previous work, our approach maps text alignment to the problem of subsequence matching. We have prepared a framework which lets us combine different feature types and different strategies for merging the features. We have proposed two different solutions to relax the comparison of two documents so as to consider the semantic relations between them. Our first approach is based on defining a new feature type that contains semantic information about its corresponding document. In our second approach we propose a new method for comparing the features that takes their semantic relations into account. Finally, we apply the DBSCAN clustering algorithm to merge features lying in a neighborhood in both the source and suspicious documents. Our experiments indicate that different feature sets are suitable for detecting different types of plagiarism.
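
As an illustration of the final merging step described above, the sketch below clusters matched feature positions with DBSCAN so that nearby matches merge into candidate passages; the offset representation, eps, and min_samples are assumed values, not the paper's settings.

```python
# Illustrative sketch (not the authors' code): cluster matching feature positions
# with DBSCAN so that matches lying close together in the source/suspicious offset
# plane merge into one candidate aligned passage.
import numpy as np
from sklearn.cluster import DBSCAN

# Each row is (offset_in_source, offset_in_suspicious) for one matched feature,
# e.g. a shared word n-gram or a semantically related term pair.
matches = np.array([
    [100, 410], [112, 425], [130, 440],   # one dense region of matches
    [900, 1500], [915, 1510],             # another region
    [5000, 200],                          # isolated match -> treated as noise
])

labels = DBSCAN(eps=50, min_samples=2).fit_predict(matches)

# Matches sharing a cluster label form one aligned passage; label -1 marks noise.
for label in set(labels) - {-1}:
    cluster = matches[labels == label]
    src_span = (cluster[:, 0].min(), cluster[:, 0].max())
    susp_span = (cluster[:, 1].min(), cluster[:, 1].max())
    print(f"candidate passage: source {src_span}, suspicious {susp_span}")
```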

12 citations

Journal ArticleDOI
TL;DR: A TDW-matrix-based algorithm with three phases, rendering, filtering and verification, which receives an input web page and a threshold in its first phase and returns an optimal set of near-duplicate web pages in the verification phase after calculating similarity.
Abstract: The voluminous amount of web documents has weakened the performance and reliability of web search engines. The subsistence of near-duplicate data is an issue that accompanies the growing need to incorporate heterogeneous data. Web content mining faces huge problems due to the existence of duplicate and near-duplicate web pages. These pages either increase the index storage space or increase the serving costs, thereby irritating the users. Near-duplicate detection has been recognized as important in plagiarism detection, spam detection and focused web crawling scenarios. Here we propose a novel idea for finding near-duplicates of an input web page within a huge repository. We propose a TDW-matrix-based algorithm with three phases, rendering, filtering and verification: it receives an input web page and a threshold in its first phase, applies prefix filtering and positional filtering to reduce the size of the record set in the second phase, and returns an optimal set of near-duplicate web pages in the verification phase after calculating their similarity. The experimental results show that our algorithm performs well on two benchmark measures, precision and recall, while reducing the size of the competing record set.
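
The filter-then-verify pattern behind the second and third phases can be sketched as follows; this is a simplified illustration of prefix filtering plus Jaccard verification, not the TDW matrix algorithm itself, and the token ordering, threshold, and helper names are assumptions.

```python
# Simplified sketch of filter-then-verify near-duplicate search: prune candidates
# with prefix filtering, then verify the survivors with an exact Jaccard similarity.
import math

def prefix(tokens: list[str], threshold: float) -> set[str]:
    """Tokens at least one of which any record must share to reach the threshold."""
    ordered = sorted(tokens)  # stand-in for a global rarity ordering of tokens
    keep = len(ordered) - math.ceil(threshold * len(ordered)) + 1
    return set(ordered[:keep])

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)

def near_duplicates(query: list[str], repository: dict[str, list[str]], threshold: float):
    query_prefix = prefix(query, threshold)
    # Filtering phase: keep only candidates whose prefix overlaps the query prefix.
    candidates = {
        url: tokens for url, tokens in repository.items()
        if query_prefix & prefix(tokens, threshold)
    }
    # Verification phase: compute the real similarity for the surviving candidates.
    return [
        (url, sim) for url, tokens in candidates.items()
        if (sim := jaccard(set(query), set(tokens))) >= threshold
    ]

repo = {"page-a": "the quick brown fox".split(),
        "page-b": "an unrelated document entirely".split()}
print(near_duplicates("the quick brown fox jumps".split(), repo, threshold=0.6))
```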

12 citations

01 Jan 2003
TL;DR: Describes investigations into two plagiarism detection tools, the widely used commercial service Turnitin and an in-house tool, Ferret; Turnitin is more useful for detecting plagiarism from web sources, while Ferret is better at detecting collusion within a group of students.
Abstract: One strategy in the prevention and detection of plagiarism and collusion is to use an automated detection tool. We argue that, for consistent treatment of students, we should be applying these tools to ALL written submissions in a given assignment rather than merely using a detection tool to confirm suspicions that a single text has been plagiarised. In this paper we describe our investigations into two plagiarism detection tools: the widely used commercial service Turnitin, and an in-house tool, Ferret. We conclude that there are technical and practical problems, first in the large-scale use of electronic submission of assignments and then in the further submission of these assignments to a plagiarism detector. Nevertheless, the reporting mechanisms of both tools are fast and easy to use. Turnitin is more useful for detecting plagiarism from web sources, while Ferret is better for detecting collusion within a group of students.

12 citations

Book ChapterDOI
15 Sep 2014
TL;DR: Proposes a probabilistic distribution model that represents each document as a feature set to increase the interpretability of the results and features, and introduces a distance measure to compute the distance between two feature sets.
Abstract: Authorship identification was introduced as one of the important problems in the law and journalism fields, and it is one of the major techniques in plagiarism detection. In this paper, to tackle the authorship verification problem, we propose a probabilistic distribution model that represents each document as a feature set, increasing the interpretability of the results and features. We also introduce a distance measure to compute the distance between two feature sets. Finally, we exploit a KNN-based approach and a dynamic feature selection method to detect the features that discriminate the author's writing style.
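
A toy sketch of such a verification pipeline is given below, assuming character-trigram probability distributions as the document representation, an L1 distance between distributions, and a nearest-neighbour decision rule; the paper's actual model, distance measure, and dynamic feature selection are not reproduced here.

```python
# Toy authorship-verification sketch (assumed representation and distance, not the
# paper's model): documents become probability distributions over character trigrams,
# and a nearest-neighbour vote decides whether the questioned document fits the
# known author's style.
from collections import Counter

def trigram_distribution(text: str) -> dict[str, float]:
    """Relative frequencies of character trigrams in the text."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance between two sparse probability distributions."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def same_author(known_docs: list[str], questioned: str, k: int = 3, cutoff: float = 1.2) -> bool:
    """Accept if the average distance to the k closest known documents is small enough.

    Assumes known_docs is non-empty; k and cutoff are illustrative values.
    """
    q_dist = trigram_distribution(questioned)
    distances = sorted(l1_distance(trigram_distribution(d), q_dist) for d in known_docs)
    nearest = distances[:k]
    return sum(nearest) / len(nearest) <= cutoff
```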

12 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125