Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Book Chapter DOI
11 Sep 2017
TL;DR: This work states that the introduction of features derived from the Abstract Syntax Tree of source code has recently set new benchmarks in this area, significantly improving over previous work that relied on easily obfuscatable lexical and format features of program source code.
Abstract: Machine learning approaches to source code authorship attribution attempt to find statistical regularities in human-generated source code that can identify the author or authors of that code. This has applications in plagiarism detection, intellectual property infringement, and post-incident forensics in computer security. The introduction of features derived from the Abstract Syntax Tree (AST) of source code has recently set new benchmarks in this area, significantly improving over previous work that relied on easily obfuscatable lexical and format features of program source code. However, these AST-based approaches rely on hand-constructed features derived from such trees, and often include ancillary information such as function and variable names that may be obfuscated or manipulated.
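The AST-based features this abstract refers to are, at their simplest, statistics computed over the parse tree rather than over identifiers or formatting. As a minimal illustration (not the authors' actual feature set), node-type frequency counts can be extracted with Python's standard ast module; unlike variable names, these counts survive renaming-based obfuscation:

```python
import ast
from collections import Counter

def ast_node_type_counts(source: str) -> Counter:
    """Count occurrences of each AST node type in a piece of Python source.

    Node-type frequencies are a simple, name-independent feature family:
    they are unaffected by renaming identifiers or reformatting whitespace.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

# Hypothetical snippet standing in for one author's code sample
sample = "def add(a, b):\n    return a + b\n"
counts = ast_node_type_counts(sample)
```

A classifier would then be trained on such count vectors (typically normalized), one per code sample per candidate author.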

75 citations

Proceedings Article DOI
07 Jul 2007
TL;DR: This paper documents the results of the experiments in author identification for software forensics and outlines future directions of research to improve the utility of the method.
Abstract: We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics. Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time involved in exhaustive searching of the problem space prevented us from adding additional metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.
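The core comparison the abstract describes — differences between histogram distributions of code metrics — can be sketched in a few lines. The metric (line length), bin layout, and L1 distance below are illustrative assumptions, not the paper's 18-metric setup:

```python
import numpy as np

def metric_histogram(values, bins=10, value_range=(0, 100)):
    # Normalized histogram of one code metric (e.g. line lengths in
    # characters) across all lines of one author's code samples.
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def histogram_distance(h1, h2):
    # L1 distance between two metric distributions; smaller means
    # the two styles are more alike under this metric.
    return float(np.abs(h1 - h2).sum())

# Hypothetical line-length samples from two known authors and one
# unattributed code base
author_a = metric_histogram([12, 15, 18, 40, 42, 45])
author_b = metric_histogram([12, 14, 17, 41, 43, 44])
unknown  = metric_histogram([13, 16, 19, 39, 44, 46])
```

Attribution then amounts to picking the candidate with the smallest summed distance across the chosen metric combination — the combination the paper's genetic algorithm searches for.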

75 citations

Journal Article DOI
TL;DR: This review gives an overview of definitions of plagiarism, plagiarism detection tools, comparison metrics, obfuscation methods, datasets used for comparison, and algorithm types and identifies interesting insights about metrics and datasets for quantitative tool comparison and categorisation of detection algorithms.
Abstract: Teachers deal with plagiarism on a regular basis, so they try to prevent and detect plagiarism, a task that is complicated by the large size of some classes. Students who cheat often try to hide their plagiarism (obfuscate), and many different similarity detection engines (often called plagiarism detection tools) have been built to help teachers. This article focuses only on plagiarism detection and presents a detailed systematic review of the field of source-code plagiarism detection in academia. This review gives an overview of definitions of plagiarism, plagiarism detection tools, comparison metrics, obfuscation methods, datasets used for comparison, and algorithm types. Perspectives on the meaning of source-code plagiarism detection in academia are presented, together with categorisations of the available detection tools and analyses of their effectiveness. While writing the review, some interesting insights have been found about metrics and datasets for quantitative tool comparison and categorisation of detection algorithms. Also, existing obfuscation methods classifications have been expanded together with a new definition of “source-code plagiarism detection in academia.”

75 citations

01 Jan 2009
TL;DR: This work presents a conceptually simple space partitioning approach to achieve search times sublinear in the number of reference documents, trading precision for speed.
Abstract: Plagiarism detection can be divided in external and intrinsic methods. Naive external plagiarism analysis suffers from computationally demanding full nearest neighbor searches within a reference corpus. We present a conceptually simple space partitioning approach to achieve search times sublinear in the number of reference documents, trading precision for speed. We focus on full duplicate searches while achieving acceptable results in the near duplicate case. Intrinsic plagiarism analysis tries to find plagiarized passages within a document without any external knowledge. We use several topic independent stylometric features from which a vector space model for each sentence of a suspicious document is constructed. Plagiarized passages are detected by an outlier analysis relative to the document mean vector. Our system was created for the first PAN competition on plagiarism detection in 2009. The evaluation was performed on the challenge's development
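The intrinsic part of this approach — per-sentence stylometric vectors flagged as outliers relative to the document mean vector — can be sketched as follows. The three toy features and the Euclidean distance are assumptions for illustration, not the paper's actual feature set:

```python
import numpy as np

def style_vector(sentence: str) -> np.ndarray:
    # Toy topic-independent stylometric features for one sentence:
    # average word length, sentence length in words, punctuation ratio.
    words = sentence.split()
    n = max(len(words), 1)
    avg_word_len = sum(len(w.strip(".,;:!?")) for w in words) / n
    punct_ratio = sum(c in ".,;:!?" for c in sentence) / max(len(sentence), 1)
    return np.array([avg_word_len, float(n), punct_ratio])

def outlier_scores(sentences):
    # Distance of each sentence's style vector from the document mean
    # vector; unusually distant sentences are candidate plagiarized
    # passages (a different author's style intruding on the document).
    vecs = np.array([style_vector(s) for s in sentences])
    mean = vecs.mean(axis=0)
    return np.linalg.norm(vecs - mean, axis=1)

# Hypothetical document: the last sentence has a markedly different style
doc = [
    "The cat sat.",
    "The dog ran.",
    "It was fun.",
    "Notwithstanding the aforementioned considerations, the multifarious "
    "ramifications remained incontrovertibly ambiguous.",
]
scores = outlier_scores(doc)
```

In practice a threshold (e.g. a multiple of the score standard deviation) would separate outliers from ordinary stylistic variation; the simple argmax here just identifies the single most atypical sentence.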

74 citations

Proceedings Article
01 Jan 2014
TL;DR: The PAN 2014 evaluation lab as mentioned in this paper introduced a new web service called TIRA, which facilitates software submissions and allows participants to submit running software instead of their run output, significantly reducing the workload for both participants and organizers.
Abstract: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN's tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of their run output. To deal with the organizational overhead involved in handling software submissions, the TIRA experimentation platform significantly reduces the workload for both participants and organizers while keeping the submitted software in a running state. This year, we addressed the matter of responsibility for successful execution of submitted software, in order to put participants back in charge of executing their software at our site. In sum, 57 pieces of software have been submitted to our lab; together with the 58 software submissions of last year, this forms the largest collection of software for our three tasks to date, all of which is readily available for further analysis. The report concludes with a brief summary of each task.

74 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 59
2022: 126
2021: 83
2020: 118
2019: 130
2018: 125