Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1790 publications have appeared within this topic, receiving 24740 citations.


Papers


Intrinsic Plagiarism Detection Using Character Trigram Distance Scores: Notebook for PAN at CLEF 2011


Mike Kestemont, Kim Luyckx, Walter Daelemans¹

University of Antwerp¹

01 Jan 2011

TL;DR: In this article, each suspicious document is divided into a series of consecutive, potentially overlapping "windows" of equal size, represented by vectors containing the relative frequencies of a predetermined set of high-frequency character trigrams.


Abstract: In this paper, we describe a novel approach to intrinsic plagiarism detection. Each suspicious document is divided into a series of consecutive, potentially overlapping 'windows' of equal size. These are represented by vectors containing the relative frequencies of a predetermined set of high-frequency character trigrams. Subsequently, a distance matrix is set up in which each of the document's windows is compared to each other window. The distance measure used is a symmetric adaptation of the normalized distance (nd1) proposed by Stamatatos (17). Finally, an algorithm for outlier detection in multivariate data (based on Principal Components Analysis) is applied to the distance matrix in order to detect plagiarized sections. In the PAN-PC-2011 competition, this system (second place) achieved a competitive recall (.4279) but only reached a plagdet of .1679 due to a disappointing precision (.1075).
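The windowing and distance-matrix steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size and step, and the way the trigram set is chosen are all assumptions. The distance is a simple symmetric, normalized stand-in in the spirit of nd1; the exact formula used in the paper is not given in the abstract. The PCA-based outlier step is left out.

```python
from collections import Counter


def char_trigrams(text):
    """Return every overlapping 3-character substring of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def top_trigrams(text, k):
    """Pick the k most frequent trigrams of the document (assumed vocabulary)."""
    return [g for g, _ in Counter(char_trigrams(text)).most_common(k)]


def windows(text, size, step):
    """Split text into equal-size windows; step < size makes them overlap."""
    last_start = max(len(text) - size, 0)
    return [text[s:s + size] for s in range(0, last_start + 1, step)]


def profile(window, vocab):
    """Relative frequency of each vocabulary trigram inside one window."""
    grams = char_trigrams(window)
    counts = Counter(grams)
    total = len(grams) or 1
    return [counts[g] / total for g in vocab]


def symmetric_distance(p, q):
    """Assumed stand-in for the symmetric nd1 variant; result lies in [0, 1]."""
    total = 0.0
    for a, b in zip(p, q):
        if a + b > 0:
            total += (2 * (a - b) / (a + b)) ** 2
    return total / (4 * len(p)) if p else 0.0


def distance_matrix(text, size, step, k):
    """Compare every window with every other window."""
    vocab = top_trigrams(text, k)
    profs = [profile(w, vocab) for w in windows(text, size, step)]
    return [[symmetric_distance(p, q) for q in profs] for p in profs]
```

Windows whose rows in the matrix differ strongly from the rest would be the candidates handed to an outlier detector.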


34 citations

Proceedings Article

Corpus and Evaluation Measures for Automatic Plagiarism Detection


Alberto Barrón-Cedeño¹, Martin Potthast², Paolo Rosso¹, Benno Stein²

Polytechnic University of Valencia¹, Bauhaus University, Weimar²

01 May 2010

TL;DR: A newly developed large-scale corpus of artificial plagiarism is presented, useful for the evaluation of intrinsic as well as external plagiarism detection.


Abstract: The simple access to texts on digital libraries and the World Wide Web has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts in the analysis of documents for plagiarism. The methods can be divided into two main approaches: intrinsic and external. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes since they cannot be properly anonymized. Therefore, current evaluations found in the literature are incomparable and, very often, not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism useful for the evaluation of intrinsic as well as external plagiarism detection. Additionally, new detection performance measures tailored to the evaluation of plagiarism detection algorithms are proposed.
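The abstract mentions new performance measures without defining them. As a rough illustration only, the sketch below assumes the plagdet form commonly reported in PAN evaluations: the F1 score of precision and recall, divided by log2(1 + granularity), where a granularity of 1 means each plagiarized passage is reported exactly once. The function name and arguments are illustrative and not taken from the paper.

```python
import math


def plagdet(precision, recall, granularity):
    """Assumed plagdet form: F1 discounted by how fragmented detections are."""
    if granularity < 1:
        raise ValueError("granularity is assumed to be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)
```

Under this assumed form, a detector that splits each true case into several fragments is penalized even when its precision and recall stay the same.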


34 citations

Journal Article

A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts


Muhammad Haroon Shakeel¹, Asim Karim¹, Imdadullah Khan¹

Lahore University of Management Sciences¹

01 May 2020 - Information Processing and Management

TL;DR: This work presents a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts and shows that it produces a comparable or state-of-the-art performance on all three benchmark datasets.


Abstract: Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy considers the notions of paraphrases and non-paraphrases as binary relations over the set of texts. Subsequently, it uses graph theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without soft-attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it produces a comparable or state-of-the-art performance on all three.
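The graph-based augmentation idea can be sketched as follows. The abstract does not spell out the exact rules, so this sketch assumes that "is a paraphrase of" is treated as an equivalence relation: texts linked by paraphrase labels form groups, every pair inside a group becomes a positive example, and a single non-paraphrase label between two groups is extended to every cross-group pair. The function name and data layout are hypothetical.

```python
from itertools import combinations


def augment(paraphrase_pairs, non_paraphrase_pairs):
    """Return (positives, negatives) implied by the labeled pairs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge texts connected by paraphrase labels into one group.
    for a, b in paraphrase_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Make sure texts that only occur in negative pairs are registered.
    for a, b in non_paraphrase_pairs:
        find(a)
        find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)

    # Every pair inside a group is an implied paraphrase.
    positives = set()
    for members in groups.values():
        positives.update(combinations(sorted(members), 2))

    # One negative label between two groups implies all cross-group pairs.
    negatives = set()
    for a, b in non_paraphrase_pairs:
        ga, gb = groups[find(a)], groups[find(b)]
        if ga is gb:
            continue  # conflicting labels; skipped in this sketch
        for x in ga:
            for y in gb:
                negatives.add(tuple(sorted((x, y))))
    return positives, negatives
```

Pairs are stored in sorted order so that (a, b) and (b, a) count as the same example.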


34 citations

Journal Article

A Cautionary Note on Checking Software Engineering Papers for Plagiarism


C. Kaner¹, R.L. Fiedler²

Florida Institute of Technology¹, Saint Mary-of-the-Woods College²

01 May 2008 - IEEE Transactions on Education

TL;DR: Two leading plagiarism detection tools, TurnItIn and MyDropBox, are contrasted in detecting submissions that were obviously plagiarized from articles published in IEEE journals.


Abstract: Several tools are marketed to the educational community for plagiarism detection and prevention. This article briefly contrasts the performance of two leading tools, TurnItIn and MyDropBox, in detecting submissions that were obviously plagiarized from articles published in IEEE journals. Both tools performed poorly because they do not compare submitted writings to publications in the IEEE database. Moreover, these tools do not cover the Association for Computing Machinery (ACM) database or several others important for scholarly work in software engineering. Reports from these tools suggesting that a submission has "passed" can encourage false confidence in the integrity of a submitted writing. Additionally, students can submit drafts to determine the extent to which these tools detect plagiarism in their work. Because the tool samples the engineering professional literature narrowly, the student who chooses to plagiarize can use this tool to determine what plagiarism will be invisible to the faculty member. An appearance of successful plagiarism prevention may in fact reflect better training of students to avoid plagiarism detection.


34 citations

Proceedings Article

An Adaptive Image-based Plagiarism Detection Approach


Norman Meuschke¹, Christopher Gondek¹, Daniel Seebacher¹, Corinna Breitinger¹, Daniel A. Keim¹, Bela Gipp¹

University of Konstanz¹

23 May 2018

TL;DR: This work proposes an adaptive, scalable, and extensible image-based plagiarism detection approach suitable for analyzing the wide range of image similarities observed in academic documents; it can complement other content-based feature analysis approaches to retrieve potential source documents for suspiciously similar content from large collections.


Abstract: Identifying plagiarized content is a crucial task for educational and research institutions, funding agencies, and academic publishers. Plagiarism detection systems available for productive use reliably identify copied text, or near-copies of text, but often fail to detect disguised forms of academic plagiarism, such as paraphrases, translations, and idea plagiarism. To improve the detection capabilities for disguised forms of academic plagiarism, we analyze the images in academic documents as text-independent features. We propose an adaptive, scalable, and extensible image-based plagiarism detection approach suitable for analyzing a wide range of image similarities that we observed in academic documents. The proposed detection approach integrates established image analysis methods, such as perceptual hashing, with newly developed similarity assessments for images, such as ratio hashing and position-aware OCR text matching. We evaluate our approach using 15 image pairs that are representative of the spectrum of image similarity we observed in alleged and confirmed cases of academic plagiarism. We embed the test cases in a collection of 4,500 related images from academic texts. Our detection approach achieved a recall of 0.73 and a precision of 1. These results indicate that our image-based approach can complement other content-based feature analysis approaches to retrieve potential source documents for suspiciously similar content from large collections. We provide our code as open source to facilitate future research on image-based plagiarism detection.
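To give a feel for the perceptual hashing mentioned in this abstract, here is a minimal sketch of one simple variant, an average hash compared by Hamming distance. It is not the authors' pipeline: the abstract does not say which perceptual hash is used, and the input here is assumed to be a small grayscale image already reduced to a grid of brightness values. Both function names are illustrative.

```python
def average_hash(pixels):
    """Hash a grayscale grid: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(hash_a, hash_b):
    """Count differing bits; small counts suggest visually similar images."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

Because the hash depends only on which pixels lie above the mean, a uniform brightness shift leaves it unchanged, while a structurally different image flips many bits.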


34 citations

Metrics

Papers: 1,976
Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125
