Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings ArticleDOI
12 Dec 2011
TL;DR: This research proposes a new approach to detecting both cross-language and semantic plagiarism and shows that it achieves higher precision, recall, and F-measure than the conventional Longest Common Subsequence (LCS) approach.
Abstract: As the Internet helps us cross language and cultural borders, and with many translation tools available, cross-language plagiarism is bound to rise. Semantic plagiarism, where a student reconstructs a sentence or replaces some terms with their synonyms, also raises concerns in the academic field. Both types of plagiarism are hard to detect because the texts differ in their fingerprints, and available plagiarism detection tools are not capable of detecting such cases. In this research, we propose a new approach to detecting both cross-language and semantic plagiarism. We consider Bahasa Melayu as the input language of the submitted document and English as the target language of similar, possibly plagiarised documents. The system shortens the query document using a fuzzy swarm-based summarisation approach; our view is that the summary yields the most important keywords in the document. Input summary documents are translated into English using the Google Translate Application Programming Interface (API) before the words are stemmed and stop words are removed. Tokenised documents are sent to the Google AJAX Search API to find similar documents throughout the World Wide Web. We integrate the Stanford Parser and WordNet to determine the semantic similarity between the suspected documents and candidate source documents. The Stanford Parser assigns each term in a sentence to its corresponding role, such as noun, verb, or adjective. Based on these roles, we represent each sentence in predicate form, and similarity is measured over those predicates using information content values from the WordNet taxonomy. Our test dataset is built from two sets of Malay documents produced with different plagiarism techniques. The results show that our proposed semantic similarity measurement achieves higher precision, recall, and F-measure than the conventional Longest Common Subsequence (LCS) approach.

6 citations
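
As an illustration of the WordNet similarity step described in the abstract above, the sketch below scores two token lists with an information-content measure. It is a minimal approximation under stated assumptions: the summarisation, translation, and Stanford Parser predicate-extraction stages are omitted, and the direct noun-to-noun matching, the Lin measure, and the Brown-corpus IC file via NLTK are illustrative choices rather than the authors' exact setup.

```python
# Illustrative sketch: WordNet information-content similarity between two
# token lists, loosely following the paper's idea of scoring term pairs
# with an IC-based measure. Predicate extraction is not reproduced here;
# nouns are matched directly instead (an assumption).
import itertools

import nltk
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import WordNetError

nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

BROWN_IC = wordnet_ic.ic("ic-brown.dat")  # information-content statistics


def best_ic_similarity(word_a, word_b):
    """Highest Lin similarity over all noun-sense pairs of two words."""
    scores = []
    for s1, s2 in itertools.product(wn.synsets(word_a, pos=wn.NOUN),
                                    wn.synsets(word_b, pos=wn.NOUN)):
        try:
            scores.append(s1.lin_similarity(s2, BROWN_IC))
        except WordNetError:
            continue
    return max(scores, default=0.0)


def sentence_similarity(tokens_a, tokens_b):
    """Average best-match similarity of tokens_a against tokens_b."""
    if not tokens_a:
        return 0.0
    return sum(max((best_ic_similarity(a, b) for b in tokens_b), default=0.0)
               for a in tokens_a) / len(tokens_a)


print(sentence_similarity(["student", "copy", "essay"],
                          ["pupil", "duplicate", "article"]))
```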

Proceedings Article
01 Dec 2012
TL;DR: This paper investigates the problem of distinguishing between original and rewritten text materials, with a focus on plagiarism detection, and proposes and analyses a number of statistical and linguistic indicators that capture the differences between them.
Abstract: This paper investigates the problem of distinguishing between original and rewritten text materials, with a focus on the application of plagiarism detection. The hypothesis is that original texts and rewritten texts exhibit significant and measurable differences, and that these can be captured through statistical and linguistic indicators. We propose and analyse a number of these indicators (including language models, syntactic trees, etc.) using machine learning algorithms in two main settings: (i) the classification of individual text segments as original or rewritten, and (ii) the ranking of two or more versions of a text segment according to their “originality”, thus revealing the rewriting direction. Unlike standard plagiarism detection approaches, our settings do not involve comparisons between supposedly rewritten text and (a large number of) original texts. Instead, our work focuses on the sub-problem of finding segments that exhibit rewriting traits. Identifying such segments has a number of potential applications, from first-stage filtering for standard plagiarism detection approaches to intrinsic plagiarism detection and authorship identification. The corpus used in the experiments was extracted from the PAN-PC-10 plagiarism detection task, with two subsets containing manually and artificially generated plagiarism cases. The accuracies achieved are well above a chance baseline across datasets and settings, with the statistical indicators being particularly effective.

6 citations
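
The classification setting described above lends itself to a compact sketch: compute a few shallow statistical indicators per segment and train a classifier on them. The features, the toy training pairs, and the use of scikit-learn's LogisticRegression are illustrative assumptions; the paper's actual indicators include language-model and syntactic-tree features that are not reproduced here.

```python
# A minimal sketch of the first setting: classify text segments as
# original (0) vs. rewritten (1) from shallow statistical indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression


def indicators(segment: str) -> list:
    """Segment length, type-token ratio, and average word length."""
    tokens = segment.lower().split()
    n = len(tokens)
    type_token_ratio = len(set(tokens)) / n if n else 0.0
    avg_word_len = sum(len(t) for t in tokens) / n if n else 0.0
    return [n, type_token_ratio, avg_word_len]


# Hypothetical training data: (segment, label) with 1 = rewritten.
train = [
    ("The results confirm the initial hypothesis in every tested case.", 0),
    ("The outcomes confirm the first guess in all of the cases tested.", 1),
    ("Plagiarism detection compares a suspicious text with source texts.", 0),
    ("Detecting plagiarism means comparing a suspect text to its sources.", 1),
]

X = np.array([indicators(s) for s, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression().fit(X, y)
print(clf.predict(np.array([indicators("The tested cases confirm it.")])))
```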

Journal ArticleDOI
TL;DR: Different cross-language plagiarism detection approaches are presented in the context of Indian language pairs such as Hindi-English and English-Hindi.
Abstract: Plagiarism is defined as presenting another author’s language, thoughts, or ideas as one’s own original work. Most of the work in the literature has emphasised monolingual plagiarism detection...

6 citations

Journal ArticleDOI
TL;DR: A plagiarism detection framework based on three deep learning models, Doc2vec, Siamese Long Short-Term Memory (SLSTM), and a Convolutional Neural Network (CNN), that can detect different types of plagiarism, allows another dataset to be specified, and supports comparing a document against internet search results.
Abstract: Plagiarism is an increasingly widespread and growing problem in the academic field. Fraudsters use several plagiarism techniques, ranging from simple synonym replacement and sentence-structure modification to more complex methods involving several types of transformation. Human-based plagiarism detection is a difficult, inaccurate, and time-consuming process. In this paper we propose a plagiarism detection framework based on three deep learning models: Doc2vec, Siamese Long Short-Term Memory (SLSTM), and a Convolutional Neural Network (CNN). Our system uses three layers: a Preprocessing Layer including word embedding, Learning Layers, and a Detection Layer. To evaluate our system, we carried out a study of plagiarism detection tools from the academic field and compared them on a set of features. Compared to other works, our approach achieves a good accuracy of 98.33%, can detect different types of plagiarism, allows another dataset to be specified, and supports comparing a document against internet search results.

6 citations
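
A rough sketch of the Doc2vec component of the framework described above is given below: embed documents with gensim's Doc2Vec and score a pair with cosine similarity. The Siamese LSTM and CNN layers are not reproduced, and the tiny corpus and hyper-parameters are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: document embeddings with gensim Doc2Vec and a cosine-similarity
# score between two texts (one building block of the described framework).
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Illustrative mini-corpus; a real system would train on a large collection.
corpus = [
    "plagiarism is the reuse of text without attribution",
    "deep learning models can compare pairs of documents",
    "synonym replacement is a common obfuscation technique",
]
tagged = [TaggedDocument(words=doc.split(), tags=[i])
          for i, doc in enumerate(corpus)]

model = Doc2Vec(tagged, vector_size=32, min_count=1, epochs=60)


def doc_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of inferred Doc2vec vectors."""
    va = model.infer_vector(text_a.split())
    vb = model.infer_vector(text_b.split())
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))


print(doc_similarity("reuse of text without attribution",
                     "copying text without citing the source"))
```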

Journal ArticleDOI
TL;DR: This research produces a plagiarism detector that uses the winnowing algorithm together with an English-Indonesian dictionary technique, with the aim of preventing plagiarism in the academic community.
Abstract: The ease of obtaining information quickly and cheaply from all over the world through the internet can encourage someone to commit plagiarism. Plagiarism is an intellectual crime that often occurs in writing, where the perpetrators take the work of others without declaring the original source; if left unchecked, it will have a negative impact on the academic community and can become a chronic disease in the progress of a nation. At present, plagiarism detection is done either manually or automatically with the help of technology, but most automatic checkers only compare the character sequences contained in the documents and cannot detect cases where the plagiarist takes a quotation from a foreign language and rewrites it in their own language. The plagiarism detection in this study uses a winnowing algorithm, which checks the characters of two samples by hashing to generate a fingerprint for each document, while the English-Indonesian dictionary method converts the writing from English into Indonesian. This research produces a plagiarism detector that uses the winnowing algorithm with the English-Indonesian dictionary technique.

6 citations
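
The winnowing fingerprinting step described above can be sketched compactly: hash every k-gram of a normalised document and keep the minimum hash in each sliding window as the fingerprint. The values of k and w, the use of Python's built-in hash, and the leftmost-minimum tie-breaking are assumptions; the English-Indonesian dictionary translation step is not reproduced.

```python
# Sketch of winnowing: k-gram hashes, then one minimum hash per window.
import re


def winnow(text: str, k: int = 5, w: int = 4) -> set:
    """Return a set of (position, hash) fingerprints for `text`."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())       # normalise
    hashes = [hash(cleaned[i:i + k]) for i in range(len(cleaned) - k + 1)]
    fingerprints = set()
    for i in range(len(hashes) - w + 1):                    # slide the window
        window = hashes[i:i + w]
        j = min(range(w), key=lambda x: window[x])          # leftmost minimum
        fingerprints.add((i + j, window[j]))
    return fingerprints


def overlap(doc_a: str, doc_b: str) -> float:
    """Share of doc_a's fingerprint hashes that also appear in doc_b."""
    fa = {h for _, h in winnow(doc_a)}
    fb = {h for _, h in winnow(doc_b)}
    return len(fa & fb) / len(fa) if fa else 0.0


print(overlap("Plagiarism is an intellectual crime.",
              "Plagiarism is an intellectual crime in academia."))
```

A translated-then-fingerprinted pipeline in the spirit of the paper would run the dictionary translation first and then compare the resulting fingerprints with a function like overlap above.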


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations (78% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Software development: 73.8K papers, 1.4M citations (77% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (76% related)
Deep learning: 79.8K papers, 2.1M citations (76% related)
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125