Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.
Papers
••
TL;DR: The research shows that most anti-plagiarism services can be cracked through different methods, and that artificial intelligence techniques can help improve the performance of the detection procedure.
Abstract: Purpose – This paper focuses on plagiarism and the consequences of anti-plagiarism services such as Turnitin.com, iThenticate, and PlagiarismDetect.com in detecting the most recent forms of cheating in academic and other writing. Design/methodology/approach – The most important approach is plagiarism prevention and finding proper solutions for detecting more complex kinds of plagiarism through natural language processing and artificial intelligence self-learning techniques. Findings – The research shows that most anti-plagiarism services can be cracked through different methods, and that artificial intelligence techniques can help improve the performance of the detection procedure. Research limitations/implications – Full access to the services' data and plagiarism algorithms is not possible, so the comparison is based only on the outputs of the detection services, which may produce different results on the same inputs. Practical implications – Academic papers and web pages are increasing over time, and it is very ...
26 citations
••
01 Oct 2014
TL;DR: A novel language-independent intrinsic plagiarism detection method is introduced, based on a new text representation called n-gram classes; its results are comparable to those of the best state-of-the-art methods.
Abstract: When it is not possible to compare the suspicious document to the source document(s) from which it was plagiarized, the evidence of plagiarism has to be looked for intrinsically, in the document itself. In this paper, we introduce a novel language-independent intrinsic plagiarism detection method based on a new text representation that we call n-gram classes. The proposed method was evaluated on three publicly available standard corpora. The results obtained are comparable to those of the best state-of-the-art methods.
25 citations
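To make the idea of intrinsic detection concrete, here is a minimal Python sketch. It does not implement the n-gram classes representation from the paper above, whose details the abstract does not give. It swaps in a simpler, well-known style-change heuristic: each sliding window's character n-gram profile is compared with the profile of the whole document, and windows that deviate strongly are candidates for closer inspection. The function names, window size, step, and n-gram length are illustrative assumptions.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dissimilarity(window_profile, doc_profile):
    """Normalised profile distance in [0, 1]; 0 means identical style."""
    w_total = sum(window_profile.values())
    d_total = sum(doc_profile.values())
    if w_total == 0 or d_total == 0:
        return 0.0
    score = 0.0
    for gram, count in window_profile.items():
        fw = count / w_total              # relative frequency in the window
        fd = doc_profile[gram] / d_total  # relative frequency in the document
        score += (2 * (fw - fd) / (fw + fd)) ** 2
    return score / (4 * len(window_profile))


def window_scores(text, window=200, step=100, n=3):
    """Return (start_offset, dissimilarity) for each sliding window."""
    doc_profile = char_ngrams(text, n)
    scores = []
    for start in range(0, max(len(text) - window, 0) + 1, step):
        chunk = text[start:start + window]
        scores.append((start, dissimilarity(char_ngrams(chunk, n), doc_profile)))
    return scores
```

A window whose score is well above the document's typical score was written in a noticeably different style. Any threshold for "well above" would have to be tuned on a labelled corpus.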
••
TL;DR: This survey classifies approaches to calculating sentence similarity, according to the methodology adopted, into three categories: word-to-word based, structure-based, and vector-based.
Abstract: Objective/Methods: This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity, according to the methodology adopted, into three categories. Word-to-word based, structure-based, and vector-based approaches are the most widely used for finding sentence similarity. Findings/Application: Each approach measures relatedness between short texts from a specific perspective. In addition, datasets that are commonly used as benchmarks for evaluating techniques in this field are introduced to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which measures similarity between sentences' structures, needs more investigation.
Keywords: Sentence Representation, Sentences Similarity, Structural Similarity, Word Embedding, Words Similarity
25 citations
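Two of the three categories named in the survey above can be shown with a short Python sketch. It is a rough approximation, not a method taken from the survey: exact-match token overlap (Jaccard) stands in for a word-to-word measure, and cosine similarity over bag-of-words count vectors stands in for a vector-based measure. The whitespace tokenizer and function names are assumptions; a structure-based measure would need a parser and is omitted.

```python
import math
from collections import Counter


def tokens(sentence):
    """Naive tokenizer: lowercase and split on whitespace (an assumption)."""
    return sentence.lower().split()


def jaccard_similarity(a, b):
    """Word-level overlap: shared distinct tokens over all distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_similarity(a, b):
    """Vector-based: cosine of the angle between token-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Both functions return values between 0 and 1. Swapping the count vectors for word embeddings would bring the second function closer to the vector-based approaches the survey describes.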
••
TL;DR: The prevalence of matches with one's own publications calls for more explicit operational standards among disciplines in this regard and points toward factors that may contribute to unintentional self-plagiarism, such as lexical bundles or authors' stylistic habits in writing.
25 citations
••
01 Nov 2013
TL;DR: By introducing dynamic data flow analysis into birthmark generation, DKISB is able to produce a high quality birthmark that is closely correlated to program semantics, making it resilient to various kinds of semantic-preserving code obfuscation techniques.
Abstract: With the rapid growth of open source software, software plagiarism has become a serious threat to the healthy development of the software industry. A software birthmark, which reflects intrinsic properties of software, is an effective way to detect software theft. However, most existing software birthmarks face a series of challenges: (1) the absence of source code, (2) the diversity of operating systems and programming languages, and (3) various automated code obfuscation techniques. In this paper, a dynamic key instruction sequence based software birthmark (DKISB) is proposed. By introducing dynamic data flow analysis into birthmark generation, we are able to produce a high quality birthmark that is closely correlated to program semantics, making it resilient to various kinds of semantic-preserving code obfuscation techniques. Based on the Pin instrumentation framework, a DKISB based software plagiarism detection system is implemented, which generates birthmarks for both the plaintiff and defendant programs and then makes the plagiarism decision according to the similarity of their birthmarks. The experimental results show that DKISB is effective against both weak obfuscation techniques such as compiler optimization and strong obfuscation techniques provided by tools such as SandMark.
25 citations
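The final step described above, deciding on plagiarism from the similarity of two birthmarks, can be sketched in Python. The abstract does not state which similarity measure DKISB uses, so this sketch assumes one for illustration: each birthmark is treated as a sequence of instruction mnemonics, and the score is the Jaccard similarity of their k-gram sets. The function names, the value of k, and the example sequences are hypothetical.

```python
def kgrams(sequence, k=3):
    """Set of all contiguous length-k subsequences of an instruction sequence."""
    return {tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)}


def birthmark_similarity(birthmark_a, birthmark_b, k=3):
    """Jaccard similarity of the two birthmarks' k-gram sets, in [0, 1]."""
    grams_a, grams_b = kgrams(birthmark_a, k), kgrams(birthmark_b, k)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical key instruction sequences for two programs.
plaintiff = ["mov", "add", "cmp", "jne", "mov", "ret"]
defendant = ["mov", "add", "cmp", "jne", "mov", "ret"]
unrelated = ["push", "call", "pop", "ret", "nop", "nop"]
```

A score near 1 suggests that the two programs share most of their key instruction patterns. The threshold at which to declare plagiarism is a policy choice that this sketch leaves open.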