A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection
Imene Bensalem, Paolo Rosso, Salim Chikhi, and 2 more
pp. 53-58
TL;DR: The first corpus for the evaluation of Arabic intrinsic plagiarism detection is introduced, consisting of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.
Abstract:
The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.
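The abstract does not describe the insertion procedure itself. Purely as an illustration, the Python sketch below shows one way passages taken from source documents could be inserted automatically into a host document while recording where each plagiarism case lands, which is the ground truth an intrinsic detector would be evaluated against. The function name `insert_cases`, the paragraph-level insertion points, and the annotation fields (`this_offset`, `this_length`, `source_id`) are hypothetical assumptions, not the authors' method or the corpus's actual annotation format.

```python
import random
from dataclasses import dataclass


@dataclass
class PlagiarismCase:
    """Hypothetical annotation for one inserted passage."""
    this_offset: int  # start of the passage in the suspicious document (characters)
    this_length: int  # length of the passage in characters
    source_id: str    # identifier of the source document it was taken from


def insert_cases(host_paragraphs, source_passages, seed=0):
    """Insert each (source_id, passage) pair at a random paragraph boundary.

    Assumption: insertion happens only between whole paragraphs, and
    paragraphs are joined with a blank line. Returns the suspicious text
    and the list of PlagiarismCase annotations in document order.
    """
    rng = random.Random(seed)
    # None marks an original (non-plagiarized) host paragraph.
    paragraphs = [(p, None) for p in host_paragraphs]
    for source_id, passage in source_passages:
        # randint is inclusive, so the passage may also land at the very end.
        pos = rng.randint(0, len(paragraphs))
        paragraphs.insert(pos, (passage, source_id))

    sep = "\n\n"
    parts, cases, offset = [], [], 0
    for i, (para, source_id) in enumerate(paragraphs):
        if i > 0:
            parts.append(sep)
            offset += len(sep)
        if source_id is not None:
            # Record the character span of the inserted passage.
            cases.append(PlagiarismCase(offset, len(para), source_id))
        parts.append(para)
        offset += len(para)
    return "".join(parts), cases
```

For example, `insert_cases(["p1", "p2", "p3"], [("src-1", "XYZ")])` returns a four-paragraph text and one annotation whose span covers `"XYZ"`. Because Python strings are sequences of Unicode code points, the recorded offsets apply unchanged to Arabic text.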
Citations
Journal Article
Academic Plagiarism Detection: A Systematic Literature Review
TL;DR: The integration of heterogeneous analysis methods for textual and non-textual content features using machine learning is seen as the most promising area for future research contributions to improve the detection of academic plagiarism further.
Posted Content
Critical Survey of the Freely Available Arabic Corpora
TL;DR: The results of a recent survey conducted to identify freely available Arabic corpora and language resources are presented, organized into the various categories studied.
Journal Article
Creating language resources for under-resourced languages: methodologies, and experiments with Arabic
TL;DR: Three different paradigms for creating language resources are illustrated, namely: using crowdsourcing to produce a small resource rapidly and relatively cheaply; translating an existing gold-standard dataset; and using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality.
Proceedings Article
Overview of the AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection
TL;DR: This overview paper describes the corpora built for evaluating plagiarism detection methods on Arabic texts, discusses the participants' methods, and highlights those of their building blocks that could be language dependent.
Proceedings Article
Intrinsic Plagiarism Detection using N-gram Classes
TL;DR: A novel language-independent intrinsic plagiarism detection method, based on a new text representation called n-gram classes, is introduced; its performance is comparable to that of the best state-of-the-art methods.
References
Proceedings Article
Overview of the 2nd International Competition on Plagiarism Detection
Martin Potthast, Alberto Barrón-Cedeño, Andreas Eiselt, Benno Stein, Paolo Rosso, and 5 more
TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Proceedings Article
An Evaluation Framework for Plagiarism Detection
TL;DR: Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Journal Article
Intrinsic plagiarism analysis
TL;DR: The question of whether plagiarism can be detected by a computer program when no reference can be provided, e.g., when the foreign sections stem from a book that is not available in digital form, is investigated.
Overview of the 1st international competition on plagiarism detection
TL;DR: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
An Overview of the Traditional Authorship Attribution Subtask
TL;DR: This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop (Rome), for which a new corpus was established consisting of eight problems: three closed-class authorship attribution problems, three open-class problems (in which the set of correct answers included "none of the above"), and two clustering problems.