An Evaluation Framework for Plagiarism Detection

Open Access Proceedings Article

Martin Potthast, +3 more

- pp 997-1005

TLDR

Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

Abstract:

We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

Citations

Proceedings Article

Overview of the 2nd International Competition on Plagiarism Detection

Martin Potthast, +5 more

TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.

Proceedings Article

Fine-Grained Analysis of Propaganda in News Article

Giovanni Da San Martino, +4 more

TL;DR: This paper proposes a fine-grained analysis of texts that detects all fragments containing propaganda techniques, as well as their type. However, the work is limited to news articles manually annotated at the fragment level with propaganda techniques.

Proceedings Article

Re-examining Machine Translation Metrics for Paraphrase Identification

Nitin Madnani, +2 more

TL;DR: It is shown that a meta-classifier trained using nothing but recent MT metrics outperforms all previous paraphrase identification approaches on the Microsoft Research Paraphrase corpus and is released for use by the community.

Book Chapter

Improving the Reproducibility of PAN’s Shared Tasks:

Martin Potthast, +5 more

TL;DR: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling, and which forms the largest collection of software for these tasks to date.

Journal Article

Plagiarism detection using stopword n-grams

Efstathios Stamatatos

- 01 Dec 2011 -

Journal of the Association for Informati...

TL;DR: It is shown that stopword n-grams reveal important information for plagiarism detection: they capture syntactic similarities between suspicious and original documents, and they can be used to detect the exact boundaries of plagiarized passages.

References

Journal Article

Comparison and evaluation of code clone detection techniques and tools: A qualitative approach

Chanchal K. Roy, +2 more

- 01 May 2009 -

Science of Computer Programming

TL;DR: A qualitative comparison and evaluation of the current state of the art in clone detection techniques and tools is provided, together with a taxonomy of editing scenarios that produce different clone types and a qualitative evaluation of current clone detectors.

Proceedings Article

Financial incentives and the "performance of crowds"

Winter Mason, +1 more

TL;DR: It is found that increased financial incentives increase the quantity, but not the quality, of work performed by participants, where the difference appears to be due to an "anchoring" effect.

A Survey on Software Clone Detection Research

Chanchal K. Roy, +1 more

TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.

Journal Article

Hierarchical Clustering Algorithms for Document Datasets

Ying Zhao, +2 more

- 01 Mar 2005 -

Data Mining and Knowledge Discovery

TL;DR: The experimental evaluation shows that, contrary to the common belief, partitional algorithms always lead to better solutions than agglomerative algorithms, making them ideal for clustering large document collections because of both their relatively low computational requirements and their higher clustering quality.

Proceedings Article

Learning to paraphrase: an unsupervised approach using multiple-sequence alignment

Regina Barzilay, +1 more

TL;DR: This work applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences.

Related Papers (5)

Overview of the 2nd International Competition on Plagiarism Detection

Martin Potthast, +5 more

Plagiarism - A Survey

Hermann A. Maurer, +2 more

- 01 Jan 2006 -

Journal of Universal Computer Science

Cross-language plagiarism detection

Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

Intrinsic plagiarism detection