Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published on this topic, receiving 24,740 citations.

Papers

Proceedings Article

A Code Comparison Algorithm Based on AST for Plagiarism Detection

Jianglang Feng¹, Baojiang Cui², Kunfeng Xia²•Institutions (2)

Chengdu University of Technology¹, Beijing University of Posts and Telecommunications²

09 Sep 2013

TL;DR: This paper introduces a technique based on the Abstract Syntax Tree (AST) that can effectively detect plagiarism that renames methods and variables, reorders code sequences, and so on.

Abstract: Plagiarism detection technology plays a very important role in the copyright protection of computer software. It mainly comprises text-based, token-based and syntax-based techniques. This paper introduces a technique based on the Abstract Syntax Tree (AST). The AST-based algorithm can effectively detect plagiarism that renames methods and variables, reorders code sequences, and so on. The algorithm calculates the hash value of every node in the AST, stores the AST, and then compares the hash values node by node. Finally, experiments are used to illustrate the superiority of the AST algorithm.
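
The node-hashing idea in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: it assumes Python source parsed with the standard `ast` module, hashes each node from its type name and the sorted hashes of its children (so identifier names and sibling order are ignored), and scores two programs by the overlap of their hash multisets. The function names `subtree_hashes` and `ast_similarity` are hypothetical.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> Counter:
    """Return a multiset of structural hashes, one per AST node."""
    hashes = Counter()

    def visit(node: ast.AST) -> str:
        # Only the node type and the child hashes are used, so identifier
        # names and literal values do not influence the result. Sorting the
        # child hashes makes the hash insensitive to sibling order.
        child_hashes = sorted(visit(c) for c in ast.iter_child_nodes(node))
        label = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(label.encode()).hexdigest()
        hashes[digest] += 1
        return digest

    visit(ast.parse(source))
    return hashes


def ast_similarity(a: str, b: str) -> float:
    """Share of node hashes common to both programs, between 0 and 1."""
    ha, hb = subtree_hashes(a), subtree_hashes(b)
    shared = sum((ha & hb).values())  # multiset intersection
    total = max(sum(ha.values()), sum(hb.values()))
    return shared / total if total else 1.0
```

Sorting child hashes is a deliberate simplification: it tolerates reordered statements, but it also treats `a - b` and `b - a` as identical, which a production tool would likely want to distinguish.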

23 citations

Proceedings Article

Plagiarism detection based on structural information

Efstathios Stamatatos¹•Institutions (1)

University of the Aegean¹

24 Oct 2011

TL;DR: It is shown that stopword n-grams are able to capture local syntactic similarities between suspicious and original documents and an algorithm for detecting the exact boundaries of plagiarized and source passages is proposed.

Abstract: In this paper a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses mainly content terms to represent documents, the proposed method is based on structural information provided by occurrences of a small list of stopwords (i.e., very frequent words). We show that stopword n-grams are able to capture local syntactic similarities between suspicious and original documents. Moreover, an algorithm for detecting the exact boundaries of plagiarized and source passages is proposed. Experimental results on a publicly-available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified by replacing most of the words or phrases with synonyms to hide the similarity with the source documents.
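
To make the stopword n-gram idea concrete, here is a minimal sketch under stated assumptions. The `STOPWORDS` set is a short hypothetical list chosen for illustration, not the list used in the paper, and the helper names are invented. The sketch discards every content word, keeps only the stopword sequence, and compares documents by the n-grams of that sequence; the boundary-detection algorithm mentioned in the abstract is not reproduced.

```python
import re

# Hypothetical short stopword list for illustration only.
STOPWORDS = {
    "the", "of", "and", "a", "in", "to", "is", "was",
    "it", "for", "with", "that", "on", "as", "by",
}


def stopword_ngrams(text: str, n: int = 3) -> set:
    """Return the set of n-grams over the stopword-only token sequence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    stops = [t for t in tokens if t in STOPWORDS]
    return {tuple(stops[i:i + n]) for i in range(len(stops) - n + 1)}


def shared_stopword_ngrams(suspicious: str, source: str, n: int = 3) -> set:
    """Stopword n-grams that occur in both documents."""
    return stopword_ngrams(suspicious, n) & stopword_ngrams(source, n)
```

Because content words are ignored, a passage whose nouns and verbs have all been replaced by synonyms still yields the same stopword n-grams as its source, which is the property the abstract highlights.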

23 citations

Journal Article

A set theory based similarity measure for text clustering and classification

Ali A. Amer¹, Hassan Ismail Abdalla²•Institutions (2)

Taiz University¹, Zayed University²

01 Dec 2020-Journal of Big Data

TL;DR: The proposed set theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.

Abstract: Similarity measures have long been used in information retrieval and machine learning for multiple purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, the problem with these measures is that, until recently, no single measure had been shown to be highly effective and efficient at the same time. Thus, the quest for an efficient and effective similarity measure is still an open challenge. This study therefore introduces a new, highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study aims to provide a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are examined in detail. The experimental evaluation was carried out on two of the most popular datasets, namely Reuters-21 and Web-KB. The results confirm that the proposed set theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.
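
The abstract does not give the STB-SM formula, so the sketch below substitutes the classic Jaccard coefficient as a stand-in set-based measure and plugs it into a small KNN classifier over bag-of-words sets. It only illustrates the evaluation setup described above (a similarity measure driving KNN); the names `jaccard` and `knn_label` and the toy data are assumptions.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over the word sets of two texts (not STB-SM)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def knn_label(query: str, labelled: list, k: int = 3) -> str:
    """Majority label among the k training texts most similar to the query."""
    ranked = sorted(labelled, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented for illustration.
training = [
    ("oil prices rise sharply", "energy"),
    ("crude oil exports fall", "energy"),
    ("wheat harvest grows", "grain"),
    ("corn and wheat prices", "grain"),
]
print(knn_label("oil exports rise", training))
```

Swapping in a different measure only requires replacing `jaccard`, which is how several measures could be compared under the same classifier.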

23 citations

Proceedings Article

Detection of near-duplicate user generated contents: the SMS spam collection

Enrique Vallés¹, Paolo Rosso¹•Institutions (1)

Polytechnic University of Valencia¹

28 Oct 2011

TL;DR: This work investigated whether plagiarism detection tools could be used as filters for spam text messages and solved the near-duplicate detection problem with a clustering approach using the CLUTO framework.

Abstract: The number of spam text messages has grown, mainly because companies are looking for free advertising. It is very important for users to filter these spam messages, which can be viewed as near-duplicate texts because they are mostly created from templates. Identifying spam text messages is a very hard and time-consuming task that involves carefully scanning hundreds of text messages. Therefore, since near-duplicate detection can be seen as a specific case of plagiarism detection, we investigated whether plagiarism detection tools could be used as filters for spam text messages. Moreover, we solve the near-duplicate detection problem with a clustering approach using the CLUTO framework. We carried out some preliminary experiments on the SMS Spam Collection, which was recently made available for research purposes. The results were compared with the ones obtained with CLUTO. Although plagiarism detection tools detect a good number of near-duplicate SMS spam messages, even better results are obtained with the CLUTO clustering tool.
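
As a rough illustration of grouping template-based messages, the sketch below uses Python's standard `difflib.SequenceMatcher` with a greedy grouping pass. This is a swapped-in technique, not CLUTO and not the plagiarism detection tools evaluated in the paper; the 0.8 threshold and the sample messages are assumptions made for the example.

```python
from difflib import SequenceMatcher


def near_duplicate_groups(messages: list, threshold: float = 0.8) -> list:
    """Greedily group messages whose similarity to a group's first
    member is at least `threshold` (an assumed, untuned value)."""
    groups = []
    for msg in messages:
        for group in groups:
            ratio = SequenceMatcher(None, msg.lower(), group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(msg)
                break
        else:
            # No existing group is similar enough; start a new one.
            groups.append([msg])
    return groups


# Invented sample messages: two share a template, one is unrelated.
sample = [
    "WIN a free prize! Call 0800 111 222 now",
    "WIN a free prize! Call 0800 333 444 now",
    "Are we still meeting for lunch today?",
]
for group in near_duplicate_groups(sample):
    print(len(group), group[0])
```

Comparing each message only with the first member of each group keeps the pass simple, at the cost of results that depend on message order; a real clustering tool would not have that limitation.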

23 citations

Journal Article

Reviving Sequential Program Birthmarking for Multithreaded Software Plagiarism Detection

Zhenzhou Tian¹, Ting Liu¹, Qinghua Zheng¹, Eryue Zhuang¹, Ming Fan¹, Zijiang Yang²•Institutions (2)

Xi'an Jiaotong University¹, Western Michigan University²

01 May 2018-IEEE Transactions on Software Engineering

TL;DR: A framework called TOB (Thread-oblivious dynamic Birthmark) is proposed that revives existing techniques so they can be applied to detect plagiarism of multithreaded programs, using thread-oblivious algorithms that shield executions from the influence of thread schedules.

Abstract: As multithreaded programs become increasingly popular, plagiarism of multithreaded programs starts to plague the software industry. Although there has been tremendous progress in software plagiarism detection technology, existing dynamic birthmark approaches are applicable only to sequential programs, because thread scheduling nondeterminism severely perturbs birthmark generation and comparison. We propose a framework called TOB (Thread-oblivious dynamic Birthmark) that revives existing techniques so they can be applied to detect plagiarism of multithreaded programs. This is achieved by thread-oblivious algorithms that shield executions from the influence of thread schedules. We have implemented a set of tools collectively called TOB-PD (TOB based Plagiarism Detection tool) by applying TOB to three existing representative dynamic birthmarks: SCSSB (System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark) and JB (an API based birthmark for Java). Our experiments, conducted on a large number of binary programs, show that our approach exhibits strong resilience against state-of-the-art semantics-preserving code obfuscation techniques. Comparisons against the three existing tools SCSSB, DYKIS and JB show that the new framework is effective for plagiarism detection of multithreaded programs. The tools, the benchmarks and the experimental results are all publicly available.
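
The effect of thread schedules on a sequence-based birthmark can be illustrated with a small sketch. It is not the TOB algorithm, whose details the abstract does not give. It assumes a trace recorded as `(thread_id, call_name)` pairs and shows one simple way to reduce schedule sensitivity: split the trace per thread before extracting short call sequences (k-grams), so that interleaving does not change the resulting set. All names and traces are hypothetical.

```python
from collections import defaultdict


def kgram_birthmark(trace: list, k: int = 3) -> set:
    """Set of k-length call sequences, extracted separately per thread.

    `trace` is a list of (thread_id, call_name) pairs in observed order.
    Thread ids are used only for slicing and are not part of the result.
    """
    per_thread = defaultdict(list)
    for thread_id, call in trace:
        per_thread[thread_id].append(call)
    grams = set()
    for calls in per_thread.values():
        grams.update(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))
    return grams


def birthmark_similarity(a: set, b: set) -> float:
    """Overlap of two birthmarks: shared k-grams over all k-grams."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two invented runs of the same program under different schedules.
run1 = [(1, "open"), (2, "socket"), (1, "read"),
        (2, "connect"), (1, "close"), (2, "send")]
run2 = [(2, "socket"), (2, "connect"), (1, "open"),
        (1, "read"), (2, "send"), (1, "close")]
print(birthmark_similarity(kgram_birthmark(run1), kgram_birthmark(run2)))
```

Extracting k-grams from the raw interleaved trace instead would give different sets for `run1` and `run2`, which is the perturbation the abstract attributes to scheduling nondeterminism.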

23 citations

Performance

Metrics

Papers: 1,976
Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125