Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have appeared within this topic, receiving 24,740 citations.

Papers

Proceedings Article•DOI•

Winnowing: local algorithms for document fingerprinting

Saul Schleimer¹, Daniel Shawcross Wilkerson², Alex Aiken²•Institutions (2)

University of Illinois at Chicago¹, University of California, Berkeley²

09 Jun 2003

TL;DR: The class of local document fingerprinting algorithms is introduced, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies, and a novel lower bound on the performance of any local algorithm is proved.

Abstract: Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a widely-used plagiarism detection service.

1,220 citations
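
The abstract above names winnowing without describing its steps. As a rough sketch, assuming the commonly described form of the algorithm (hash every k-gram, slide a window of w consecutive hashes, keep the minimum hash of each window, taking the rightmost on ties), it could look like the following. The function names, the MD5-based hash, the text normalization, and the default values of k and w are illustrative assumptions, not details taken from this listing or from MOSS.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-gram of the text after lowercasing and removing whitespace.

    The normalization and the MD5-derived 32-bit hash are assumptions made
    for this sketch; any reasonably uniform hash would do.
    """
    s = "".join(ch.lower() for ch in text if not ch.isspace())
    hashes = []
    for i in range(len(s) - k + 1):
        digest = hashlib.md5(s[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(hashes, w):
    """Return (hash, position) fingerprints selected by winnowing.

    For each window of w consecutive hashes, select the minimum hash,
    breaking ties by taking the rightmost occurrence. A position is
    recorded only once even if several windows select it.
    """
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum in this window.
        pos = start + (w - 1 - window[::-1].index(smallest))
        if pos != last_pos:
            fingerprints.append((smallest, pos))
            last_pos = pos
    return fingerprints


def fingerprint(text, k=5, w=4):
    """Set of selected hash values for a text (positions dropped)."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}


if __name__ == "__main__":
    # A small hand-checkable hash sequence with window size 4.
    sample = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
    print(winnow(sample, 4))
    # [(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]
```

With this selection rule, any two texts that share a normalized run of at least w + k - 1 characters share at least one selected hash, because that run contains a full window of k-gram hashes. Texts with fewer than w k-grams yield no fingerprints in this sketch.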

Proceedings Article•DOI•

GPLAG: detection of software plagiarism by program dependence graph analysis

Chao Liu¹, Chen Chen¹, Jiawei Han¹, Philip S. Yu²•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

20 Aug 2006

TL;DR: A new plagiarism detection tool, called GPLAG, is developed, which detects plagiarism by mining program dependence graphs (PDGs) and is more effective than state-of-the-art tools for plagiarism detection.

Abstract: Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source projects for its own products. Although current plagiarism detection tools appear sufficient for academic use, they are nevertheless short for fighting against serious plagiarists. For example, disguises like statement reordering and code insertion can effectively confuse these tools. In this paper, we develop a new plagiarism detection tool, called GPLAG, which detects plagiarism by mining program dependence graphs (PDGs). A PDG is a graphic representation of the data and control dependencies within a procedure. Because PDGs are nearly invariant during plagiarism, GPLAG is more effective than state-of-the-art tools for plagiarism detection. In order to make GPLAG scalable to large programs, a statistical lossy filter is proposed to prune the plagiarism search space. Experiment study shows that GPLAG is both effective and efficient: It detects plagiarism that easily slips over existing tools, and it usually takes a few seconds to find (simulated) plagiarism in programs having thousands of lines of code.

467 citations
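
The claim that dependence graphs are nearly invariant under statement reordering and code insertion can be illustrated with a toy example. The sketch below is not GPLAG's matching procedure: it handles only straight-line assignments, tracks only data (flow) dependencies, ignores control dependencies, and compares edge sets directly. The statement encoding and the function name are hypothetical, chosen only for this illustration.

```python
def flow_dependences(statements):
    """Return the data (flow) dependence edges of straight-line code.

    Each statement is a pair (target, sources): the variable it assigns
    and the variables it reads. An edge (a, b) means statement b reads a
    value most recently defined by statement a. Edges are keyed by
    statement content rather than position, so this toy assumes that all
    statements are distinct.
    """
    last_def = {}  # variable name -> statement that last assigned it
    edges = set()
    for target, sources in statements:
        stmt = (target, tuple(sources))
        for var in sources:
            if var in last_def:
                edges.add((last_def[var], stmt))
        last_def[target] = stmt
    return edges


original = [
    ("a", ["x"]),       # a = f(x)
    ("b", ["y"]),       # b = g(y)
    ("c", ["a", "b"]),  # c = a + b
]

# Disguise 1: swap the two independent statements.
reordered = [original[1], original[0], original[2]]

# Disguise 2: insert a statement whose result is never used.
with_insertion = [original[0], ("junk", ["a"]), original[1], original[2]]

if __name__ == "__main__":
    # Reordering independent statements leaves the edge set unchanged.
    print(flow_dependences(original) == flow_dependences(reordered))
    # Insertion only adds edges; the original graph survives as a subgraph.
    print(flow_dependences(original) <= flow_dependences(with_insertion))
```

A line-by-line or token-sequence comparison would see three different programs here, while the dependence edges of the original are preserved in both disguised versions.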

Proceedings Article•

Overview of the 2nd International Competition on Plagiarism Detection

Martin Potthast, Alberto Barrón-Cedeño, Andreas Eiselt, Benno Stein, Paolo Rosso¹, Bauhaus-Universität Weimar•Institutions (1)

Polytechnic University of Valencia¹

01 Jan 2011

TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.

Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

419 citations

Journal Article•

Plagiarism - A Survey

Hermann A. Maurer, Frank Kappe, Bilal Zaka

01 Jan 2006-Journal of Universal Computer Science

TL;DR: This paper discusses the complex general setting, then reports on some results of plagiarism detection software, and draws attention to the fact that any serious investigation in plagiarism turns up rather unexpected side-effects.

Abstract: Plagiarism in the sense of "theft of intellectual property" has been around for as long as humans have produced work of art and research. However, easy access to the Web, large databases, and telecommunication in general, has turned plagiarism into a serious problem for publishers, researchers and educational institutions. In this paper, we concentrate on textual plagiarism (as opposed to plagiarism in music, paintings, pictures, maps, technical drawings, etc.). We first discuss the complex general setting, then report on some results of plagiarism detection software and finally draw attention to the fact that any serious investigation in plagiarism turns up rather unexpected side-effects. We believe that this paper is of value to all researchers, educators and students and should be considered as seminal work that hopefully will encourage many still deeper investigations.

339 citations

Proceedings Article•DOI•

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

Xiaojun Xu¹, Chang Liu², Qian Feng³, Heng Yin⁴, Le Song⁵, Dawn Song²•Institutions (5)

Shanghai Jiao Tong University¹, University of California, Berkeley², Samsung³, University of California, Riverside⁴, Georgia Institute of Technology⁵

30 Oct 2017

TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy.

Abstract: The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.

339 citations
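
The abstract above describes comparing two binary functions by measuring the distance between their embeddings. A minimal sketch of that comparison step follows, assuming cosine similarity as the measure and plain Python lists as vectors. The embedding values and function names in the usage note are made up for illustration; producing real embeddings from control flow graphs is the trained network's job and is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, corpus):
    """Rank corpus entries by similarity to the query embedding.

    corpus maps a (hypothetical) function name to its embedding.
    Returns a list of (similarity, name) pairs, most similar first.
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in corpus.items()]
    return sorted(scored, reverse=True)
```

For example, with the invented vectors `{"memcpy_arm": [0.9, 0.1, 0.0], "parse_header_x86": [0.0, 0.2, 0.9]}` and the query `[1.0, 0.0, 0.0]`, `rank_by_similarity` places `memcpy_arm` first. Because each function is reduced to a fixed-length vector once, each pairwise comparison is a cheap vector operation rather than a graph-matching problem.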

Performance

Metrics

Papers: 1,976

Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125