scispace - formally typeset
Search or ask a question
Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The research shows that most of the anti‐plagiarism services can be cracked through different methods and artificial intelligence techniques can help to improve the performance of the detection procedure.
Abstract: Purpose – This paper aims to focus on plagiarism and the consequences of anti‐plagiarism services such as Turnitin.com, iThenticate, and PlagiarismDetect.com in detecting the most recent cheatings in academic and other writings.Design/methodology/approach – The most important approach is plagiarism prevention and finding proper solutions for detecting more complex kinds of plagiarism through natural language processing and artificial intelligence self‐learning techniques.Findings – The research shows that most of the anti‐plagiarism services can be cracked through different methods and artificial intelligence techniques can help to improve the performance of the detection procedure.Research limitations/implications – Accessing entire data and plagiarism algorithms is not possible completely, so comparing is just based on the outputs from detection services. They may produce different results on the same inputs.Practical implications – Academic papers and web pages are increasing over time, and it is very ...

26 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: A novel languageindependent intrinsic plagiarism detection method which is based on a new text representation that is called n-gram classes is introduced which is comparable to the best state-of-the-art methods.
Abstract: When it is not possible to compare the suspicious document to the source document(s) plagiarism has been committed from, the evidence of plagiarism has to be looked for intrinsically in the document itself. In this paper, we introduce a novel languageindependent intrinsic plagiarism detection method which is based on a new text representation that we called n-gram classes. The proposed method was evaluated on three publicly available standard corpora. The obtained results are comparable to the ones obtained by the best state-of-the-art methods.

25 citations

Journal ArticleDOI
TL;DR: In this paper, a survey classifies approaches of calculating sentences similarity based on the adopted methodology into three categories: word-to-word based, structure-based, and vector-based.
Abstract: Objective/Methods: This study is to review the approaches used for measuring sentences similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches of calculating sentences similarity based on the adopted methodology into three categories. Word-to-word based, structurebased, and vector-based are the most widely used approaches to find sentences similarity. Findings/Application: Each approach measures relatedness between short texts based on a specific perspective. In addition, datasets that are mostly used as benchmarks for evaluating techniques in this field are introduced to provide a complete view on this issue. The approaches that combine more than one perspective give better results. Moreover, structure based similarity that measures similarity between sentences’ structures needs more investigation. Keywords: Sentence Representation, Sentences Similarity, Structural Similarity, Word Embedding, Words Similarity

25 citations

Journal ArticleDOI
TL;DR: The prevalence of matches with one's own publications calls for more explicit operational standards among disciplines in this regard and points toward factors that may contribute to unintentional self-plagiarism, such as lexical bundles or authors' stylistic habits in writing.

25 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: By introducing dynamic data flow analysis into birthmark generation, DKISB is able to produce a high quality birthmark that is closely correlated to program semantics, making it resilient to various kinds of semantic-preserving code obfuscation techniques.
Abstract: With the burst of open source software, software plagiarism has been a serious threat to the healthy development of software industry. Software birthmark reflecting intrinsic properties of software, is an effective way for the detection of software theft. However, most of the existing software birthmarks face a series of challenges: (1) the absence of source code, (2) diversity of operating systems and programing languages, (3) various automated code obfuscation techniques. In this paper, a dynamic key instruction sequence based software birthmark (DKISB) is proposed. By introducing dynamic data flow analysis into birthmark generation, we are able to produce a high quality birthmark that is closely correlated to program semantics, making it resilient to various kinds of semantic-preserving code obfuscation techniques. Based on the Pin instrumentation framework, a DKISB based software plagiarism detection system is implemented, which generates birthmarks for both the plaintiff and defendant program, and then make the plagiarism decision according to the similarity of their birthmarks. The experimental results show that DKISB is effective to either weak obfuscation techniques like compiler optimization or strong obfuscation techniques provided by tools such as Sand Mark.

25 citations


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202359
2022126
202183
2020118
2019130
2018125