Paraphrase plagiarism identification with character-level features

doi:10.1007/S10044-017-0674-Z

Journal Article

Fernando Sánchez-Vega, +5 more

01 May 2019

Pattern Analysis and Applications

Vol. 22, Iss. 2, pp. 669-681

TLDR

The paper hypothesizes that the original author’s writing style fingerprint persists in plagiarized text even after paraphrasing, and it proposes a novel text representation scheme that captures both content and style characteristics of texts by means of character-level features.

Abstract:

Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, most of these methods fail to reliably detect paraphrase plagiarism because the task is highly complex, even for human beings. Paraphrase plagiarism identification consists of automatically recognizing document fragments that contain reused text that has been intentionally hidden by rewording practices such as semantic equivalences, discursive changes and morphological or lexical substitutions. Our main hypothesis is that the original author’s writing style fingerprint prevails in the plagiarized text even when paraphrases occur. Thus, in this paper we propose a novel text representation scheme that gathers both content and style characteristics of texts, represented by means of character-level features. As an additional contribution, we describe the methodology followed to construct an appropriate corpus for the task of paraphrase plagiarism identification, which represents a valuable new resource for the NLP community for future research in this field.
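
To make the idea of character-level features concrete, the following is a minimal sketch, not the representation scheme proposed in the paper. It assumes plain character trigram counts compared with cosine similarity; the function names `char_ngrams` and `cosine_similarity` and the example sentences are illustrative assumptions, not taken from the article or its corpus.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile has no defined direction
    return dot / (norm_a * norm_b)


# Hypothetical sentences chosen only for illustration.
source = "The committee rejected the proposal because it lacked sufficient funding."
paraphrase = "Because the proposal lacked enough funding, the committee turned it down."
unrelated = "Heavy rain is expected across the northern coast tomorrow evening."

sim_paraphrase = cosine_similarity(char_ngrams(source), char_ngrams(paraphrase))
sim_unrelated = cosine_similarity(char_ngrams(source), char_ngrams(unrelated))
print(f"source vs paraphrase: {sim_paraphrase:.3f}")
print(f"source vs unrelated:  {sim_unrelated:.3f}")
```

Character n-grams capture word fragments, punctuation and function-word patterns together, which is one reason they are often used when both content and style matter. A real system would likely add weighting, feature selection and a trained classifier on top of such profiles.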
