Open Access, Proceedings Article

An Evaluation Framework for Plagiarism Detection

TLDR
Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Abstract
We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.


Citations
Proceedings Article

Overview of the 2nd International Competition on Plagiarism Detection

TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Proceedings Article

Fine-Grained Analysis of Propaganda in News Article

TL;DR: This paper proposes a fine-grained analysis of texts that detects all fragments containing propaganda techniques, together with their type. The work is limited to news articles manually annotated at the fragment level with propaganda techniques.
Proceedings Article

Re-examining Machine Translation Metrics for Paraphrase Identification

TL;DR: It is shown that a meta-classifier trained using nothing but recent MT metrics outperforms all previous paraphrase identification approaches on the Microsoft Research Paraphrase corpus and is released for use by the community.
Book Chapter

Improving the Reproducibility of PAN’s Shared Tasks:

TL;DR: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling, and which forms the largest collection of software for these tasks to date.
Journal Article

Plagiarism detection using stopword n-grams

TL;DR: It is shown that stopword n-grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries.
References
Journal Article

Comparison and evaluation of code clone detection techniques and tools: A qualitative approach

TL;DR: A qualitative comparison and evaluation of the current state of the art in clone detection techniques and tools is provided, together with a taxonomy of editing scenarios that produce different clone types and a qualitative evaluation of current clone detectors.
Proceedings Article

Financial incentives and the "performance of crowds"

TL;DR: It is found that increased financial incentives increase the quantity, but not the quality, of work performed by participants, where the difference appears to be due to an "anchoring" effect.

A Survey on Software Clone Detection Research

TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their corresponding mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.
Journal Article

Hierarchical Clustering Algorithms for Document Datasets

TL;DR: The experimental evaluation shows that, contrary to the common belief, partitional algorithms always lead to better solutions than agglomerative algorithms, making them ideal for clustering large document collections due to not only their relatively low computational requirements but also their higher clustering quality.
Proceedings Article

Learning to paraphrase: an unsupervised approach using multiple-sequence alignment

TL;DR: This work applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences.