Open Access, Book Chapter

A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection

Abstract
The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.
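With 2833 cases spread over 1024 documents, the corpus averages roughly 2.8 inserted cases per suspicious document. The abstract does not say how the insertion was performed, so the following is only a minimal sketch of the general idea, assuming passages are cut from source documents, spliced into a host text at random positions, and annotated by character offset and length. The names `PlagiarismCase` and `insert_cases` and all parameters are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class PlagiarismCase:
    offset: int     # character offset of the inserted passage in the suspicious text
    length: int     # length of the inserted passage in characters
    source_id: str  # identifier of the source document (hypothetical field)


def insert_cases(host_text, sources, n_cases, passage_len, rng):
    """Splice n_cases passages from `sources` into `host_text` (illustrative only).

    `sources` maps a source id to its text. Returns the resulting suspicious
    text and the list of annotated cases.
    """
    # Choose insertion points in the host text and process them left to right.
    cuts = sorted(rng.randrange(len(host_text) + 1) for _ in range(n_cases))
    pieces, cases = [], []
    prev = 0      # position in the host text already copied
    out_len = 0   # length of the suspicious text built so far
    for cut in cuts:
        segment = host_text[prev:cut]
        pieces.append(segment)
        out_len += len(segment)
        # Pick a source document and a random passage from it.
        source_id, source_text = rng.choice(sorted(sources.items()))
        start = rng.randrange(max(1, len(source_text) - passage_len + 1))
        passage = source_text[start:start + passage_len]
        # Record where the passage lands in the output text.
        cases.append(PlagiarismCase(out_len, len(passage), source_id))
        pieces.append(passage)
        out_len += len(passage)
        prev = cut
    pieces.append(host_text[prev:])
    return "".join(pieces), cases
```

Recording each case as an offset and a length lets an evaluation script compare detected spans against the inserted ones without needing the source documents, which matches the intrinsic setting where no reference collection is available to the detector.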


Citations
Journal Article

Academic Plagiarism Detection: A Systematic Literature Review

TL;DR: The integration of heterogeneous analysis methods for textual and non-textual content features using machine learning is seen as the most promising area for future research contributions to improve the detection of academic plagiarism further.
Posted Content

Critical Survey of the Freely Available Arabic Corpora.

TL;DR: The results of a recent survey conducted to identify the freely available Arabic corpora and language resources are presented, organized by the categories studied.
Journal Article

Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

TL;DR: Three different paradigms for creating language resources are illustrated, namely: using crowdsourcing to produce a small resource rapidly and relatively cheaply; translating an existing gold-standard dataset; and using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality.
Proceedings Article

Overview of the AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection

TL;DR: This overview paper describes the evaluation corpora used for plagiarism detection methods for Arabic texts, discusses the participants' methods, and highlights the building blocks that could be language dependent.
Proceedings Article

Intrinsic Plagiarism Detection using N-gram Classes

TL;DR: A novel language-independent intrinsic plagiarism detection method is introduced, based on a new text representation called n-gram classes; its performance is comparable to the best state-of-the-art methods.
References
Proceedings Article

Overview of the 2nd International Competition on Plagiarism Detection

TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Proceedings Article

An Evaluation Framework for Plagiarism Detection

TL;DR: Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Journal Article

Intrinsic plagiarism analysis

TL;DR: This article investigates whether plagiarism can be detected by a computer program when no reference can be provided, e.g., when the foreign sections stem from a book that is not available in digital form.

Overview of the 1st international competition on plagiarism detection

TL;DR: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.

An Overview of the Traditional Authorship Attribution Subtask.

Patrick Juola
TL;DR: This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop (Rome), for which a new corpus was established, consisting of eight problems: three closed-class authorship attribution problems, three open-class problems (the set of correct answers included “none of the above”), and two clustering problems.