Journal Article
Paraphrase plagiarism identification with character-level features
Fernando Sánchez-Vega, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Efstathios Stamatatos, Luis Villaseñor-Pineda
TLDR
It is established that the original author’s writing style fingerprint prevails in the plagiarized text even when paraphrases occur, and a novel text representation scheme is proposed that gathers both content and style characteristics of texts, represented by means of character-level features.
Abstract:
Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, the majority of these methods fail to reliably detect paraphrase plagiarism because the task is highly complex, even for human beings. Paraphrase plagiarism identification consists of automatically recognizing document fragments that contain reused text which has been intentionally hidden by rewording practices such as semantic equivalences, discursive changes and morphological or lexical substitutions. Our main hypothesis is that the original author’s writing style fingerprint prevails in the plagiarized text even when paraphrases occur. Thus, in this paper we propose a novel text representation scheme that gathers both content and style characteristics of texts, represented by means of character-level features. As an additional contribution, we describe the methodology followed to construct an appropriate corpus for the task of paraphrase plagiarism identification, which is a valuable new resource for the NLP community and for future research in this field.
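The abstract does not specify which character-level features the authors use or how they are compared. As a rough illustration only, one common character-level representation is a profile of overlapping character n-grams, compared with cosine similarity. The function names, the choice of n = 3 and the normalization step below are assumptions made for this sketch, not the paper's method.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumed feature set, not the paper's)."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with any text
    return dot / (norm_a * norm_b)


def fragment_similarity(suspicious: str, source: str, n: int = 3) -> float:
    """Score a pair of text fragments by their character n-gram overlap."""
    return cosine_similarity(char_ngrams(suspicious, n), char_ngrams(source, n))


if __name__ == "__main__":
    source = "The cat sat on the mat"
    paraphrase = "On the mat the cat was sitting"
    unrelated = "Quantum chromodynamics governs gluons"
    print(fragment_similarity(paraphrase, source))
    print(fragment_similarity(unrelated, source))
```

Because character n-grams capture sub-word patterns such as affixes, function words and punctuation habits, a profile like this mixes content and stylistic signals, which is the intuition the abstract describes. A real system would add thresholding or a trained classifier on top of such scores.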
Citations
Plagiarism detection using Rouge and WordNet
TL;DR: The authors propose adopting ROUGE and WordNet for plagiarism detection, using n-gram co-occurrence statistics, skip-bigrams, and longest common subsequence (LCS).
Journal Article
Cross-language text alignment: A proposed two-level matching scheme for plagiarism detection
TL;DR: The experimental results show that the proposed cross-language text alignment approach significantly outperforms the state-of-the-art models and can be fed into an expert system for further improvement of cross-language plagiarism detection.
Journal Article
An effective approach to candidate retrieval for cross-language plagiarism detection: A fusion of conceptual and keyword-based schemes
TL;DR: The results show that the proposed candidate retrieval model outperforms the state-of-the-art models and can be considered as a proper choice to be embedded in cross-language plagiarism detection systems.
Journal Article
Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase
TL;DR: This paper uses text embedding vectors to compare document similarity for plagiarism detection, and applies the proposed method to available English, Persian and Arabic datasets on the text alignment task to evaluate its robustness across languages.
Journal Article
Using word semantic concepts for plagiarism detection in text documents
TL;DR: This paper uses Word2vec to transform words into word vectors that reveal the semantic relationships among different words, making plagiarism detection more effective.