Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings ArticleDOI
12 Dec 2011
TL;DR: This research proposes a new approach to detecting both cross-language and semantic plagiarism and shows that it achieves higher precision, recall, and F-measure than the conventional Longest Common Subsequence (LCS) approach.
Abstract: As the Internet helps us cross language and cultural borders, and with many translation tools available, cross-language plagiarism is bound to rise. Semantic plagiarism, where a student reconstructs a sentence or replaces some terms with their synonyms, also raises concerns in the academic field. Both types of plagiarism are hard to detect because the texts differ in their fingerprints, and available plagiarism detection tools are not capable of detecting such cases. In this research, we propose a new approach to detecting both cross-language and semantic plagiarism. We consider Bahasa Melayu as the input language of the submitted document and English as the target language of similar, possibly plagiarised documents. The system shortens the query document using a fuzzy swarm-based summarisation approach; our view is that the summary yields the most important keywords in the document. Input summary documents are translated into English using the Google Translate Application Programming Interface (API) before the words are stemmed and stop words are removed. Tokenised documents are sent to the Google AJAX Search API to find similar documents throughout the World Wide Web. We integrate the Stanford Parser and WordNet to determine the semantic similarity between the suspected documents and candidate source documents. The Stanford Parser assigns each term in a sentence to its corresponding role, such as noun, verb, or adjective. Based on these roles, we represent each sentence in predicate form, and similarity is measured over those predicates using information content values from the WordNet taxonomy. Our test dataset is built from two sets of Malay documents produced with different plagiarism techniques. The results show that our proposed semantic similarity measurement achieves higher precision, recall, and F-measure than the conventional Longest Common Subsequence (LCS) approach.

6 citations
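
As an illustration of the WordNet similarity step described in the abstract above, the sketch below scores two token lists with an information-content measure. It is a minimal approximation under stated assumptions: the summarisation, translation, and Stanford Parser predicate-extraction stages are omitted, and the direct noun-to-noun matching, the Lin measure, and the Brown-corpus IC file via NLTK are illustrative choices rather than the authors' exact setup.

```python
# Illustrative sketch: WordNet information-content similarity between two
# token lists, loosely following the paper's idea of scoring term pairs
# with an IC-based measure. Predicate extraction is not reproduced here;
# nouns are matched directly instead (an assumption).
import itertools

import nltk
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import WordNetError

nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

BROWN_IC = wordnet_ic.ic("ic-brown.dat")  # information-content statistics


def best_ic_similarity(word_a, word_b):
    """Highest Lin similarity over all noun-sense pairs of two words."""
    scores = []
    for s1, s2 in itertools.product(wn.synsets(word_a, pos=wn.NOUN),
                                    wn.synsets(word_b, pos=wn.NOUN)):
        try:
            scores.append(s1.lin_similarity(s2, BROWN_IC))
        except WordNetError:
            continue
    return max(scores, default=0.0)


def sentence_similarity(tokens_a, tokens_b):
    """Average best-match similarity of tokens_a against tokens_b."""
    if not tokens_a:
        return 0.0
    return sum(max((best_ic_similarity(a, b) for b in tokens_b), default=0.0)
               for a in tokens_a) / len(tokens_a)


print(sentence_similarity(["student", "copy", "essay"],
                          ["pupil", "duplicate", "article"]))
```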

Proceedings Article
01 Dec 2012
TL;DR: This paper investigates the problem of distinguishing between original and rewritten text materials, with a focus on plagiarism detection, and proposes and analyses a number of statistical and linguistic indicators that capture the differences between them.
Abstract: This paper investigates the problem of distinguishing between original and rewritten text materials, with a focus on the application of plagiarism detection. The hypothesis is that original texts and rewritten texts exhibit significant and measurable differences, and that these can be captured through statistical and linguistic indicators. We propose and analyse a number of these indicators (including language models, syntactic trees, etc.) using machine learning algorithms in two main settings: (i) the classification of individual text segments as original or rewritten, and (ii) the ranking of two or more versions of a text segment according to their “originality”, thus revealing the rewriting direction. Unlike standard plagiarism detection approaches, our settings do not involve comparisons between supposedly rewritten text and (a large number of) original texts. Instead, our work focuses on the sub-problem of finding segments that exhibit rewriting traits. Identifying such segments has a number of potential applications, from first-stage filtering for standard plagiarism detection approaches to intrinsic plagiarism detection and authorship identification. The corpus used in the experiments was extracted from the PAN-PC-10 plagiarism detection task, with two subsets containing manually and artificially generated plagiarism cases. The accuracies achieved are well above a chance baseline across datasets and settings, with the statistical indicators being particularly effective.

6 citations
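
The classification setting described above lends itself to a compact sketch: compute a few shallow statistical indicators per segment and train a classifier on them. The features, the toy training pairs, and the use of scikit-learn's LogisticRegression are illustrative assumptions; the paper's actual indicators include language-model and syntactic-tree features that are not reproduced here.

```python
# A minimal sketch of the first setting: classify text segments as
# original (0) vs. rewritten (1) from shallow statistical indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression


def indicators(segment: str) -> list:
    """Segment length, type-token ratio, and average word length."""
    tokens = segment.lower().split()
    n = len(tokens)
    type_token_ratio = len(set(tokens)) / n if n else 0.0
    avg_word_len = sum(len(t) for t in tokens) / n if n else 0.0
    return [n, type_token_ratio, avg_word_len]


# Hypothetical training data: (segment, label) with 1 = rewritten.
train = [
    ("The results confirm the initial hypothesis in every tested case.", 0),
    ("The outcomes confirm the first guess in all of the cases tested.", 1),
    ("Plagiarism detection compares a suspicious text with source texts.", 0),
    ("Detecting plagiarism means comparing a suspect text to its sources.", 1),
]

X = np.array([indicators(s) for s, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression().fit(X, y)
print(clf.predict(np.array([indicators("The tested cases confirm it.")])))
```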

Journal ArticleDOI
TL;DR: Different cross-language plagiarism detection approaches are presented in the context of Indian language pairs such as Hindi-English and English-Hindi.
Abstract: Plagiarism is defined as presenting another author’s language, thoughts, or ideas as one’s own original work. Most of the work in the literature has emphasised monolingual plagiarism detection...

6 citations

Journal ArticleDOI
TL;DR: A plagiarism detection framework based on three deep learning models, Doc2vec, Siamese Long Short-Term Memory (SLSTM), and a Convolutional Neural Network (CNN), that can detect different types of plagiarism, allows another dataset to be specified, and supports comparing a document against internet search results.
Abstract: Plagiarism is an increasingly widespread and growing problem in the academic field. Fraudsters use several plagiarism techniques, ranging from simple synonym replacement and sentence-structure modification to more complex methods involving several types of transformation. Human-based plagiarism detection is a difficult, inaccurate, and time-consuming process. In this paper we propose a plagiarism detection framework based on three deep learning models: Doc2vec, Siamese Long Short-Term Memory (SLSTM), and a Convolutional Neural Network (CNN). Our system uses three layers: a Preprocessing Layer including word embedding, Learning Layers, and a Detection Layer. To evaluate our system, we carried out a study of plagiarism detection tools from the academic field and compared them on a set of features. Compared to other works, our approach achieves a good accuracy of 98.33%, can detect different types of plagiarism, allows another dataset to be specified, and supports comparing a document against internet search results.

6 citations
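
A rough sketch of the Doc2vec component of the framework described above is given below: embed documents with gensim's Doc2Vec and score a pair with cosine similarity. The Siamese LSTM and CNN layers are not reproduced, and the tiny corpus and hyper-parameters are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: document embeddings with gensim Doc2Vec and a cosine-similarity
# score between two texts (one building block of the described framework).
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Illustrative mini-corpus; a real system would train on a large collection.
corpus = [
    "plagiarism is the reuse of text without attribution",
    "deep learning models can compare pairs of documents",
    "synonym replacement is a common obfuscation technique",
]
tagged = [TaggedDocument(words=doc.split(), tags=[i])
          for i, doc in enumerate(corpus)]

model = Doc2Vec(tagged, vector_size=32, min_count=1, epochs=60)


def doc_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of inferred Doc2vec vectors."""
    va = model.infer_vector(text_a.split())
    vb = model.infer_vector(text_b.split())
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))


print(doc_similarity("reuse of text without attribution",
                     "copying text without citing the source"))
```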

Journal ArticleDOI
TL;DR: This research produces a plagiarism detector that uses the winnowing algorithm together with an English-Indonesian dictionary technique, with the aim of preventing plagiarism in the academic community.
Abstract: The ease of obtaining information quickly and cheaply from all over the world through the internet can encourage someone to commit plagiarism. Plagiarism is an intellectual crime that often occurs in writing, where the perpetrators take the work of others without declaring the original source; if left unchecked, it will have a negative impact on the academic community and can become a chronic disease in the progress of a nation. At present, plagiarism detection is done either manually or automatically with the help of technology, but most automatic checkers only compare the character sequences contained in the documents and cannot detect cases where the plagiarist takes a quotation from a foreign language and rewrites it in their own language. The plagiarism detection in this study uses a winnowing algorithm, which checks the characters of two samples by hashing to generate a fingerprint for each document, while the English-Indonesian dictionary method converts the writing from English into Indonesian. This research produces a plagiarism detector that uses the winnowing algorithm with the English-Indonesian dictionary technique.

6 citations
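
The winnowing fingerprinting step described above can be sketched compactly: hash every k-gram of a normalised document and keep the minimum hash in each sliding window as the fingerprint. The values of k and w, the use of Python's built-in hash, and the leftmost-minimum tie-breaking are assumptions; the English-Indonesian dictionary translation step is not reproduced.

```python
# Sketch of winnowing: k-gram hashes, then one minimum hash per window.
import re


def winnow(text: str, k: int = 5, w: int = 4) -> set:
    """Return a set of (position, hash) fingerprints for `text`."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())       # normalise
    hashes = [hash(cleaned[i:i + k]) for i in range(len(cleaned) - k + 1)]
    fingerprints = set()
    for i in range(len(hashes) - w + 1):                    # slide the window
        window = hashes[i:i + w]
        j = min(range(w), key=lambda x: window[x])          # leftmost minimum
        fingerprints.add((i + j, window[j]))
    return fingerprints


def overlap(doc_a: str, doc_b: str) -> float:
    """Share of doc_a's fingerprint hashes that also appear in doc_b."""
    fa = {h for _, h in winnow(doc_a)}
    fb = {h for _, h in winnow(doc_b)}
    return len(fa & fb) / len(fa) if fa else 0.0


print(overlap("Plagiarism is an intellectual crime.",
              "Plagiarism is an intellectual crime in academia."))
```

A translated-then-fingerprinted pipeline in the spirit of the paper would run the dictionary translation first and then compare the resulting fingerprints with a function like overlap above.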


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations (78% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Software development: 73.8K papers, 1.4M citations (77% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (76% related)
Deep learning: 79.8K papers, 2.1M citations (76% related)
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125