Proceedings Article

GPLAG: detection of software plagiarism by program dependence graph analysis

TL;DR
GPLAG, a new plagiarism detection tool, detects plagiarism by mining program dependence graphs (PDGs) and is more effective than state-of-the-art plagiarism detection tools.
Abstract
Along with the blossoming of open source projects comes greater convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize open source projects for its own products. Although current plagiarism detection tools appear sufficient for academic use, they fall short against serious plagiarists. For example, disguises such as statement reordering and code insertion can effectively confuse these tools. In this paper, we develop a new plagiarism detection tool, called GPLAG, which detects plagiarism by mining program dependence graphs (PDGs). A PDG is a graph representation of the data and control dependencies within a procedure. Because PDGs remain nearly invariant under such plagiarism disguises, GPLAG is more effective than state-of-the-art plagiarism detection tools. To make GPLAG scalable to large programs, we propose a statistical lossy filter that prunes the plagiarism search space. An experimental study shows that GPLAG is both effective and efficient: it detects plagiarism that easily slips past existing tools, and it usually takes a few seconds to find (simulated) plagiarism in programs with thousands of lines of code.
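
To make the invariance claim concrete, here is a toy sketch (not GPLAG's implementation) under strong simplifying assumptions: only data dependences in straight-line code are modeled (no control dependences), each vertex is labeled by its statement text, and plain set inclusion stands in for the subgraph isomorphism test a real detector needs. The helpers `stmt`, `data_dependence_graph`, and `contains` are hypothetical names introduced for illustration. It shows that reordering independent statements leaves the dependence graph unchanged even though the statement sequence differs, and that inserting dead code only adds an isolated vertex, so the original graph survives as a subgraph.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple

# Toy model (assumption): straight-line code only, so no control dependences.
class Stmt(NamedTuple):
    text: str             # statement text, also used as the vertex label
    defs: FrozenSet[str]  # variables the statement writes
    uses: FrozenSet[str]  # variables the statement reads

def stmt(text: str, defs: str = "", uses: str = "") -> Stmt:
    """Hypothetical helper: space-separated variable names -> Stmt."""
    return Stmt(text, frozenset(defs.split()), frozenset(uses.split()))

Graph = Tuple[Set[str], Set[Tuple[str, str]]]  # (vertices, edges)

def data_dependence_graph(body: List[Stmt]) -> Graph:
    """Add edge i -> j when statement j reads a variable whose most
    recent definition before j is statement i."""
    vertices = {s.text for s in body}
    edges: Set[Tuple[str, str]] = set()
    last_def: Dict[str, str] = {}  # variable -> statement that last wrote it
    for s in body:
        for var in s.uses:  # reads happen before this statement's own writes
            if var in last_def:
                edges.add((last_def[var], s.text))
        for var in s.defs:
            last_def[var] = s.text
    return vertices, edges

def contains(big: Graph, small: Graph) -> bool:
    """Labeled-subgraph check by set inclusion (a simplified stand-in for
    the subgraph isomorphism a real detector would need)."""
    return small[0] <= big[0] and small[1] <= big[1]

original = [
    stmt("a = read()", defs="a"),
    stmt("b = read()", defs="b"),
    stmt("c = a + 1", defs="c", uses="a"),
    stmt("d = b * 2", defs="d", uses="b"),
    stmt("print(c, d)", uses="c d"),
]
# Disguise 1: reorder statements without violating any dependence.
reordered = [original[i] for i in (1, 0, 3, 2, 4)]
# Disguise 2: insert a dead statement that nothing depends on.
padded = original[:3] + [stmt("junk = 0", defs="junk")] + original[3:]

g_orig = data_dependence_graph(original)
g_reord = data_dependence_graph(reordered)
g_pad = data_dependence_graph(padded)

print([s.text for s in original] == [s.text for s in reordered])  # False
print(g_orig == g_reord)        # True
print(g_pad == g_orig)          # False
print(contains(g_pad, g_orig))  # True
```

Real programs add control dependences, aliasing, and renamed identifiers, so a practical detector must match vertices by type rather than by text and search for approximately isomorphic subgraphs; that search is costly, which is why a pruning step such as the paper's statistical lossy filter matters at scale.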

Citations
Journal Article

Comparison and evaluation of code clone detection techniques and tools: A qualitative approach

TL;DR: A qualitative comparison and evaluation of the current state of the art in clone detection techniques and tools is provided, including a taxonomy of editing scenarios that produce different clone types, against which current clone detectors are qualitatively evaluated.

A Survey on Software Clone Detection Research

TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their mappings to the commonly used clone types, and several open problems in clone detection research are pointed out.
Proceedings Article

Deep learning code fragments for code clone detection

TL;DR: This work introduces learning-based detection techniques in which everything used to represent terms and fragments in source code is mined from the repository. Compared with a traditional structure-oriented technique, the approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard.
Proceedings Article

Scalable detection of semantic clones

TL;DR: This paper efficiently solves the tree similarity problem to create a scalable analysis that locates significantly more clones, which are often more semantically interesting than simple copied-and-pasted code fragments.
Proceedings Article

Structural detection of android malware using embedded call graphs

TL;DR: This paper proposes a malware detection method based on efficient embeddings of function call graphs, using an explicit feature map inspired by a linear-time graph kernel. The method outperforms several related approaches and detects 89% of the malware with few false alarms, while also making it possible to pinpoint malicious code structures within Android applications.