Showing papers on "Plagiarism detection published in 2007"

PDF

Open Access

Proceedings Article•DOI•

Computer-based plagiarism detection methods and tools: an overview

[...]

Romans Lukashenko¹, Vita Graudina¹, Janis Grundspenkis¹•Institutions (1)

14 Jun 2007

TL;DR: The paper is dedicated to plagiarism problem and the ways how to reduce plagiarism: both: plagiarism prevention and plagiarism detection are discussed.

...read moreread less

Abstract: The paper is dedicated to plagiarism problem. The ways how to reduce plagiarism: both: plagiarism prevention and plagiarism detection are discussed. Widely used plagiarism detection methods are described. The most known plagiarism detection tools are analysed.

...read moreread less

163 citations

Journal Issue•DOI•

Efficient plagiarism detection for large code repositories

[...]

Steven Burrows¹, S. M. M. Tahaghoghi¹, Justin Zobel¹•Institutions (1)

RMIT University¹

01 Feb 2007-Software - Practice and Experience

TL;DR: This paper proposes techniques for detecting plagiarism in program code using text similarity measures and local alignment and shows that their approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems.

...read moreread less

Abstract: Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems. Copyright © 2006 John Wiley & Sons, Ltd.

...read moreread less

118 citations

Book Chapter•DOI•

Plagiarism Detection Without Reference Collections

[...]

Sven Meyer zu Eissen¹, Benno Stein¹, Marion Kulig¹•Institutions (1)

Bauhaus University, Weimar¹

01 Jan 2007

TL;DR: Current research in the field of automatic plagiarism detection for text documents focuses on the development of algorithms that compare suspicious documents against potential original documents.

...read moreread less

Abstract: Current research in the field of automatic plagiarism detection for text documents focuses on the development of algorithms that compare suspicious documents against potential original documents. Although recent approaches perform well in identifying copied or even modified passages ([Brin et al. (1995), Stein (2005)]), they assume a closed world where a reference collection must be given (Finkel (2002)). Recall that a human reader can identify suspicious passages within a document without having a library of potential original documents in mind.

...read moreread less

113 citations

Journal Article•DOI•

Turnitin[R]: The Student Perspective on Using Plagiarism Detection Software.

[...]

Stephan Dahl¹•Institutions (1)

Middlesex University¹

01 Jul 2007-Active Learning in Higher Education

TL;DR: A study has so far tried to look at how students react towards plagiarism detection systems, such as the web-based Turnitin system, and found that students are more likely to cheat than not.

...read moreread less

Abstract: Recently there has been an increasing interest in plagiarism detection systems, such as the web-based Turnitin system. However, no study has so far tried to look at how students react towards those...

...read moreread less

96 citations

Proceedings Article•DOI•

Using code metric histograms and genetic algorithms to perform author identification for software forensics

[...]

Robert Charles Lange¹, Spiros Mancoridis¹•Institutions (1)

Drexel University¹

07 Jul 2007

TL;DR: This paper documents the results of the experiments in author identification for software forensics and outlines future directions of research to improve the utility of the method.

...read moreread less

Abstract: We have developed a technique to characterize software developers- styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics.Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time involved in exhaustive searching of the problem space prevented us from adding additional metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.

...read moreread less

75 citations

Proceedings Article•DOI•

Plagiarism detection using feature-based neural networks

[...]

Steve Engels¹, Vivek Lakshmanan¹, Michelle Craig¹•Institutions (1)

University of Toronto¹

07 Mar 2007

TL;DR: This system uses neural network techniques to create a feature-based plagiarism detector and to measure the relevance of each feature in the assessment to produce results that are comparable to the most popular plagiarism detectors.

...read moreread less

Abstract: This paper focuses on the use of code features for automatic plagiarism detection. Instead of the text-based analyses employed by current plagiarism detectors, we propose a system that is based on properties of assignments that course instructors use to judge the similarity of two submissions. This system uses neural network techniques to create a feature-based plagiarism detector and to measure the relevance of each feature in the assessment. The system was trained and tested on assignments from an introductory computer science course, and produced results that are comparable to the most popular plagiarism detectors.

...read moreread less

69 citations

Proceedings Article•DOI•

A source code linearization technique for detecting plagiarized programs

[...]

Jeong-Hoon Ji¹, Gyun Woo¹, Hwan-Gue Cho¹•Institutions (1)

Pusan National University¹

25 Jun 2007

TL;DR: The static tracing method statically executes a program at the syntax-level and then extracts predefined keywords according to the order of the executed functions proves this method can detect plagiarism more effectively than the previously released plagiarism detecting method.

...read moreread less

Abstract: It is very important to detect plagiarized programs in the field of computer science education. Therefore, many tools and algorithms have been developed for this purpose. Generally, these tools are operated in two phases. In phase 1, a program plagiarism detecting tool generates an intermediate representation from a given program set. The intermediate representation should reflect the structural characterization of the program. Most tools use the parse tree or token sequence by intermediate representation. In phase 2, the program looks for plagiarized material and evaluates the similarity of two programs. It is helpful to announce the plagiarized metarials between two programs to the instructor. In this paper, we present the static tracing method in order to improve program plagiarism detection accuracy. The static tracing method statically executes a program at the syntax-level and then extracts predefined keywords according to the order of the executed functions. The result of experiment proves this method can detect plagiarism more effectively than the previously released plagiarism detecting method.

...read moreread less

57 citations

Journal Article•DOI•

Plagiarism detection using feature-based neural networks

[...]

EngelsSteve, LakshmananVivek, CraigMichelle

07 Mar 2007-ACM Sigcse Bulletin

TL;DR: This paper focuses on the use of code features for automatic plagiarism detection, and proposes a system that is based on text-based analyses employed by current plagiarism detectors.

...read moreread less

40 citations

Proceedings Article•DOI•

Automatic conceptual analysis for plagiarism detection

[...]

Heinz Dreher¹•Institutions (1)

Curtin University¹

01 Jan 2007-Issues in Informing Science and Information Technology

TL;DR: The Maurer et al. (2006) provide a thorough analysis of the plagiarism problem and possible solutions as they pertain to academia, and divide the solution strategies into three main categories.

...read moreread less

Abstract: Introduction Plagiarism is now acknowledged to pose a significant threat to academic integrity. There is a growing array of software packages to help address the problem. Most of these offer a string-oftext comparison. New to emerge are software packages and services to 'generate' assignments. Naturally there will be a cat and mouse game for a while and in the meantime academics need to be alert to the possibilities of academic malpractice via plagiarism and adopt appropriate and promising counter-measures, including the newly emerging algorithms to do fast conceptual analysis. One such emergent agent is the Normalised Word Vector (NWV) algorithm (Williams, 2006), which was originally developed for use in the Automated Essay Grading (AEG) domain. AEG is a relatively new technology which aims to score or grade essays at the level of expert humans. This is achieved by creating a mathematical representation of the semantic information in addition to checking spelling, grammar, and other more usual parameters associated with essay assessment. The mathematical representation is computed for each student essay and compared with a mathematical representation computed for the model answer. If we can represent the semantic content of an essay we are able to compare it to some standard model--hence determine a grade or assign an authenticity parameter relative to any given corpus; and create a persistent digital representation of the essay. AEG technology can be used for plagiarism detection because it processes the semantic information of student essays and creates a semantic footprint. Once a mathematical representation for all or parts of an essay is created it can be efficiently compared to other similarly constructed representations and facilitate plagiarism checking through semantic footprint comparison. The Plagiarism Problem The extent of plagiarism is indeed significant. Maurer et al. (2006) provide a thorough analysis of the plagiarism problem and possible solutions as they pertain to academia. They divide the solution strategies into three main categories. The most common method is based on document comparison in which a word for word check is made with each target document in a selected which could be the source of the copied material. Clearly this is language independent as one is essentially comparing character strings; it will also match misspellings. The selected set of document is usually all documents comprising assignment or paper submissions for a specific purpose. A second category is an expansion of the document check but where the set of target documents is 'everything' that is reachable on the internet and the candidate to be checked for is a characteristic paragraph or sentence rather than the entire document. The emergence of tools such as Google has made this type of check feasible. The third category mentioned by Maurer et al. is the use of stylometry, in which a language analysis algorithm compares the style of successive paragraphs and reports if a style change has occurred. This can be extended to analyzing prior documents by the same author and comparing the stylistic parameters of a succession of documents. However, the issue of plagiarism is not merely a matter for academics. Austrian journalist Josef Karner (2001) writes "Das Abschreiben ist der eigentliche Beruf des Dichters" ("Transcription is the virtual vocation of the poet"). Is then the poet essentially a professional plagiarist, taking others' ideas and presenting them in verse as his own and without attribution? This may be a rather extreme position to hold, but its consideration does point up interesting possibilities which the etymology of plagiarism may illuminate. As yet there is a paucity of statistics available to help us understand the extent of plagiarism. However a recent Canadian study (Kloda & Nicholson, 2005) has reported that one in three students admit to turning to plagiarism prior to graduation - serious enough one may think. …

...read moreread less

26 citations

Proceedings Article•DOI•

Fast and reliable plagiarism detection system

[...]

Maxim Mozgovoy¹, Sergey Karakovskiy², Vitaly Klyuev³•Institutions (3)

University of Eastern Finland¹, Saint Petersburg State University², University of Aizu³

01 Oct 2007

TL;DR: This work introduces a new two-step approach to plagiarism detection that combines high algorithmic performance and the quality of pairwise file comparison, and shows that the proposed method does not noticeably reduce thequality of the pairwise comparison mechanism while providing better speed characteristics.

...read moreread less

Abstract: Plagiarism and similarity detection software is well-known in universities for years. Despite the variety of methods and approaches used in plagiarism detection, the typical trade-off between the speed and the reliability of the algorithm still remains. We introduce a new two-step approach to plagiarism detection that combines high algorithmic performance and the quality of pairwise file comparison. Our system uses fast detection method to select suspicious files only, and then invokes precise (and slower) algorithms to get reliable results. We show that the proposed method does not noticeably reduce the quality of the pairwise comparison mechanism while providing better speed characteristics.

...read moreread less

25 citations

Posted Content•

AC: An Integrated Source Code Plagiarism Detection Environment

[...]

Manuel Freire¹, Manuel Cebrian, Emilio del Rosal•Institutions (1)

Autonomous University of Madrid¹

28 Mar 2007-arXiv: Information Theory

TL;DR: AC, a modular plagiarism detection system, is presented, which is portable across platforms and assignment formats and provides easy extraction into the internal assignment representation.

...read moreread less

Abstract: Plagiarism detection in educational programming assignments is still a problematic issue in terms of resource waste, ethical controversy, legal risks, and technical complexity. This paper presents AC, a modular plagiarism detection system. The design is portable across platforms and assignment formats and provides easy extraction into the internal assignment representation. Multiple similarity measures have been incorporated, both existing and newly-developed. Statistical analysis and several graphical visualizations aid in the interpretation of analysis results. The system has been evaluated with a survey that encompasses several academic semesters of use at the authors' institution.

...read moreread less

Journal Article•DOI•

To cheat or not to cheat? A trial of the JISC Plagiarism Detection Service with biological sciences students

[...]

Joanne Louise Badge¹, Alan J. Cann¹, Jon Scott¹•Institutions (1)

University of Leicester¹

19 Jun 2007-Assessment & Evaluation in Higher Education

TL;DR: In this article, the authors present the results of a 2-year trial of the JISC plagiarism detection service (PDS) involving hundreds of students and discuss the effectiveness of the service in detecting plagiarized material and in acting as a deterrent.

...read moreread less

Abstract: In the UK, there is great concern about the perceived increase in plagiarized work being submitted by students in higher educations. Although there is much debate, the reasons for the perceived change are not completely clear. Here we present the results of a 2‐year trial of the JISC Plagiarism Detection Service (PDS) involving hundreds of students. The effectiveness of the service in detecting plagiarized material and in acting as a deterrent are discussed. Although an increased number of cases of plagiarism were detected during the trial, the relative contributions of the electronic detection system and increased staff awareness remain unknown.

...read moreread less

Proceedings Article•DOI•

A natural language processing approach to automatic plagiarism detection

[...]

Chi-Hong Leung¹, Yuen-Yan Chan¹•Institutions (1)

The Chinese University of Hong Kong¹

18 Oct 2007

TL;DR: It is found that plagiarism that cannot be detected by the traditional methods can be identified by this new approach, and application of natural language processing can help to resolve this kind of problem.

...read moreread less

Abstract: The problem of plagiarism has existed for a long time but with the advance of information technology the problem becomes worse. It is because there are many electronic versions of published materials available to everyone. The Web is an important and common source for plagiarism. Some plagiarism detection programs (such as Turnitin) were developed to attempt to deal with this problem. To determine whether an article is copied from the Web or other electronic sources, the plagiarism detection program should calculate the similarity between two articles. However, it is often difficult to detect plagiarism accurately after modification of the copied contents. For example, it is possible to simply replace a word with its synonym (e.g. "program" -- "software ") and change the entire sentence structure. Most plagiarism detection programs can only compare whether two words are the same lexically and count how many matched words are there in a paper. Thus, if the copied materials are modified deliberately, it becomes difficult to detect plagiarism.Application of natural language processing can help to resolve this kind of problem. The underlying syntactic structure and semantic meaning of two sentences can be compared to reveal their similarity. There are several steps in the matching procedure. First, the thesaurus (or the lexical hierarchical structure) is referenced to find out the synonyms, broader terms and narrower terms used in the paper being checked. Then, the paper will be compared with the documents in the database. Wordnet is a typical example of the thesaurus that can be used for this purpose. If it is suspected that the paper contains some contents from the database, the sentences of the paper may be parsed to construct their parsing trees and semantic representations for further detailed comparison. The context free grammar and the case grammar are used to represent the syntactic structure and semantic meaning of sentences in the system. It is found that plagiarism that cannot be detected by the traditional methods can be identified by this new approach.

...read moreread less

Proceedings Article•

Using natural language parsers in plagiarism detection.

[...]

Maxim Mozgovoy, Tuomo Kakkonen, Erkki Sutinen

01 Jan 2007

TL;DR: In this work it is shown how a natural language parser can be used to fight against basic plagiarism hiding methods.

...read moreread less

Abstract: The problem of plagiarism detection system design is a subject of numerous works of the last decades. Various advanced file-file comparison techniques were developed. However, most existing systems, aimed at natural language texts, do not perform any significant preprocessing of the input documents. So in many cases it is possible to hide the presence of plagiarism by utilizing some simple techniques. In this work we show how a natural language parser can be used to fight against basic plagiarism hiding methods.

...read moreread less

Proceedings Article•

Detection of Plagiarism in University Projects Using Metrics-based Spectral Similarity

[...]

Ettore Merlo¹•Institutions (1)

École Polytechnique de Montréal¹

01 Jan 2007

TL;DR: An original method of spectral similarity analysis for plagiarism detection in university project is presented in this paper, which is based on a clone detection tool called CLAN that performs metrics based on similarity analysis of source code fragments.

...read moreread less

Abstract: An original method of spectral similarity analysis for plagiarism detection in university project is presented. The approach is based on a clone detection tool called CLAN that performs metrics based similarity analysis of source code fragments. Definitions and algorithms for spectral similarity analysis are presented and discussed. Experiments performed on university projects are presented. Experimental results include the distribution of similarity in C and C++ projects. Analysis of spectral similarity distribution identifies the most similar pairs of projects that can be considered as candidates for plagiarism.

...read moreread less

Journal Article•

Plagiarism prevention or detection? The contribution of text-matching software to education about academic integrity

[...]

Dominic Keuskamp¹, Regina Sliuzas¹•Institutions (1)

Flinders University¹

12 Jan 2007-Journal of Academic Language and Learning

TL;DR: In this article, text-matching reports generated from 21 students were analysed to identify the extent and nature of identifiable plagiarism, and how the software communicated this to students, concluding that the quantitative information reported to the students by the software offered less assistance in determining if plagiarism had occurred than the more detailed information to be found in careful interpretation of the textmatching report.

...read moreread less

Abstract: Developing an understanding of academic integrity within students is one of the core objectives of many Academic Language and Learning (ALL) advisers, and the perceived rise of plagiarism suggests that this will continue to demand our attention. A recently available tool to assist advisers in this role is text-matching software (TMS). Routinely promoted on the basis of its capabilities for “plagiarism detection”, TMS also offers students educative opportunities which appropriately are web-based, given the increasing “web-dependency” of students. This paper examines how TMS can contribute to the role ALL advisers play in developing students’ understanding of academic integrity. Students from across one university were invited to submit their assignments to a TMS program called SafeAssignment™, offered as part of the university’s academic integrity policy. Text-matching reports generated from 21 students were analysed to identify the extent and nature of identifiable plagiarism, and how the software communicated this to students. Overall percentages of text-matching were low, with many students’ texts matching purely on information that was bibliographical, appropriately quoted, generic or technical. However, the quantitative information reported to the students by the software offered less assistance in determining if plagiarism had occurred than the more detailed information to be found in careful interpretation of the text-matching reports. A guide is presented for ALL advisers involved with interpreting reports with students.

...read moreread less

Journal Article•DOI•

Staff attitudes to dealing with plagiarism issues: Perspectives from one Australian university.

[...]

Geoffrey T Crisp

18 Sep 2007-The International Journal for Educational Integrity

TL;DR: In this paper, the authors report on the results of an online staff survey at one Australian university on attitudes to plagiarism issues, the use and efficacy of the institutional plagiarism policy and Turnitin system and staff perceptions of institutional resources that were available to assist both staff and students reduce the incidence of plagiarism.

...read moreread less

Abstract: This paper reports on the results of an online staff survey at one Australian university on attitudes to plagiarism issues, the use and efficacy of the institutional plagiarism policy and Turnitin system and staff perceptions of institutional resources that were available to assist both staff and students reduce the incidence of plagiarism. The survey was designed to capture staff perceptions, rather than verifiable activity or plagiarism detection outcomes. The survey responses highlighted the need for a common understanding of plagiarism and approaches to the detection and dealing with suspected plagiarism incidents. The responses also signalled a requirement for improved assessment practices that reduce the opportunity for plagiarism. Staff responses indicated that there was a need to publicise more effectively existing University resources for avoiding plagiarism; only a minority of survey respondents were aware of these resources. The majority of staff perceived that the institutional policies and practices were adequate for dealing with suspected plagiarism incidents.

...read moreread less

Journal Article•DOI•

Information distance and its applications

[...]

Ming Li¹•Institutions (1)

University of Waterloo¹

01 Aug 2007-International Journal of Foundations of Computer Science

TL;DR: This work summarizes the recent developments of a general theory of information distance and its applications in whole genome phylogeny, document comparison, internet query-answer systems, and many other data mining tasks and solves an open problem regarding the universality of the normalized information distance.

...read moreread less

Abstract: We have been developing a general theory of information distance and a paradigm of applying this theory to practical problems.[3, 19, 20] There are several problems associated with this theory. On the practical side, among other problems, the strict requirement of triangle inequality is unrealistic in some applications; on the theoretical side, the universality theorems for normalized information distances were only proved in a weak form. In this paper, we will introduce a complementary theory that resolves or avoids these problems. This article also serves as a brief expository summary for this area. We will tell the stories about how and why some of the concepts were introduced, recent theoretical developments and interesting applications. These applications include whole genome phylogeny, plagiarism detection, document comparison, music classification, language classification, fetal heart rate tracing, question answering, and a wide range of other data mining tasks.

...read moreread less

Proceedings Article•DOI•

A Program Plagiarism Detection Model Based on Information Distance and Clustering

[...]

Liang Zhang, Yue-ting Zhuang, Zhen-ming Yuan

11 Oct 2007

TL;DR: A metric, based on information distance, is proposed, to measure similarity between two programs and clustering analysis,based on shared near neighbors, is applied in order to provide more beneficial and detailed information about the program plagiarism.

...read moreread less

Abstract: Plagiarism in students programming assignment submissions causes considerable difficulties for course designers. Efficient detection of plagiarism in programming assignments of students is important to the educational procedure. This paper proposes a metric, based on information distance, to measure similarity between two programs. Furthermore, clustering analysis, based on shared near neighbors, is applied in order to provide more beneficial and detailed information about the program plagiarism. Experimental results demonstrate that our software has clear advantages over other plagiarism detection systems and it is quite beneficial to teachers to get rid of time-consuming and toilsome tasks. Key words: Program plagiarism, Detection, Information distance, Clustering

...read moreread less

Proceedings Article•DOI•

Extending Web Search for Online Plagiarism Detection

[...]

Yi-Ting Liu¹, Heng-Rui Zhang¹, Tai-Wei Chen¹, Wei-Guang Teng¹•Institutions (1)

National Cheng Kung University¹

04 Sep 2007

TL;DR: An online detection system to reduce such misapplication of search engines and suspicious documents are extracted and verified through the collaboration of this plagiarism detection system and search engines.

...read moreread less

Abstract: As information technologies advance, the data amount gathered on the Internet increases at an incredible rapid speed. To solve the data overloading problem, people commonly use Web search engines to find what they need. However, as search engines become an efficient and effective tool, plagiarists can grab, reassemble and redistribute text contents without much difficulty. In this paper, we develop an online detection system to reduce such misapplication of search engines. Specifically, suspicious documents are extracted and verified through the collaboration of our plagiarism detection system and search engines. With a proper design, extracted text segments are given different priorities when sending them to search engines as the ascertainment of plagiarism. This greatly reduces unnecessary and repetitive works when performing plagiarism detection.

...read moreread less

Journal Article•DOI•

A source code linearization technique for detecting plagiarized programs

[...]

JiJeong-Hoon, WooGyun, ChoHwan-Gue

25 Jun 2007-ACM Sigcse Bulletin

TL;DR: It is very important to detect plagiarized programs in the field of computer science education, and many tools and algorithms have been developed for this purpose.

...read moreread less

Proceedings Article•

Software development marketplaces: implications for plagiarism

[...]

Daryl D'Souza¹, Margaret Hamilton¹, Michael C. Harris¹•Institutions (1)

RMIT University¹

30 Jan 2007

TL;DR: This paper addresses the cheating problem of students purchasing solutions via websites that host software development marketplaces, and addresses the use of copy detection software for catching such cheating by copying.

...read moreread less

Abstract: Plagiarism of programming assignment solutions can be detected via a range of plagiarism detection tools, as long as the originally authored work is accessible electronically. For some time now, our school has used plagiarism detection software over all submissions of programming assignments in selected courses. The use of such software has provided a viable mechanism for catching such cheating by copying. However, the emergence of online software development websites now enables students to purchase solutions from software contractors. Students submit their assignment specifications to such websites and receive bids from potential contractors to develop coded solutions. In these contexts the use of copy detection software is rendered useless, as the solutions are custom written and there is usually no electronic source available against which similarities may be detected. This paper addresses the cheating problem of students purchasing solutions via websites that host software development marketplaces.

...read moreread less

Proceedings Article•DOI•

Automatic generation of benchmarks for plagiarism detection tools using grammatical evolution

[...]

Manuel Cebrian¹, Manuel Alfonseca¹, Alfonso Ortega¹•Institutions (1)

Autonomous University of Madrid¹

07 Jul 2007

TL;DR: In this paper, a Grammatical Evolution technique is used to generate benchmarks for plagiarism detection in computer programming assignments, where student mix and/or modify one or more original solutions to obtain counterfeits.

...read moreread less

Abstract: Student plagiarism is a mayor problem in universities worldwide. In this paper,we focus on plagiarism in answers to computer programming assignments,where student mix and/or modify one or more original solutions to obtain counterfeits. Although several software tools have been implemented to help the tedious and time consuming task of detecting plagiarism, little has been done to assess their quality, because, in fact, determining the original subset of the whole solutionset is practically impossible for graders. In this article we present a Grammatical Evolution technique which generates benchmarks. Given a programming language, our technique generates a set of original solutions to an assignment, together with a set of plagiarisms of the former set which mimic the way in which students act. The phylogeny of the coded solutions is predefined, providing a base for the evaluationof the performance of copy-catching tools. We give empirical evidence of the suitability of our approach by studying the behavior of one state-of-the-art detection tool (AC) on four benchmarks coded in APL2, generated with this technique.

...read moreread less

Proceedings Article•

PDE4Java: Plagiarism Detection Engine For Java, Source Code: A Clustering Approach.

[...]

Ameera Jadalla¹, Ashraf Elnagar¹•Institutions (1)

University of Sharjah¹

01 Jan 2007

TL;DR: The simulation results of the proposed Plagiarism Detection Engine for Java showed a comparable performance to that of JPlag and it outperformed the expectations when compared to the domain experts' findings.

...read moreread less

Abstract: The educational community across the world is facing the increasing problem of plagiarism. The proposed Plagiarism Detection Engine for Java (PDE4Java) detects code-plagiarism by applying data mining techniques. The engine consists of three main phases; Java tokenisation, similarity measurement and clustering. It has an optional default tokeniser that makes it flexible to be used with almost any programming language. The system provides a visualising representation for each cluster besides the textual representation. The simulation results of PDE4Java showed a comparable performance to that of JPlag and it outperformed the expectations when compared to the domain experts' findings.

...read moreread less

Proceedings Article•DOI•

XML version detection

[...]

Deise de Brum Saccol¹, Nina Edelweiss¹, Renata Galante¹, Carlo Zaniolo²•Institutions (2)

Universidade Federal do Rio Grande do Sul¹, University of California²

28 Aug 2007

TL;DR: This work proposes a version detection mechanism for XML documents based on Naïve Bayesian classifiers, and presents the results of various experiments on synthetic data that show that the approach produces very good results, both in terms of recall and precision measures.

...read moreread less

Abstract: The problem of version detection is critical in many important application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-peer searching. A natural and commonly used approach to version detection relies on analyzing the similarity between files. Most of the techniques proposed so far rely on the use of hard thresholds for similarity measures. However, defining a threshold value is problematic for several reasons: in particular (i) the threshold value is not the same when considering different similarity functions, and (ii) it is not semantically meaningful for the user. To overcome this problem, our work proposes a version detection mechanism for XML documents based on Naive Bayesian classifiers. Thus, our approach turns the detection problem into a classification problem. In this paper, we present the results of various experiments on synthetic data that show that our approach produces very good results, both in terms of recall and precision measures.

...read moreread less

Proceedings Article•DOI•

Understanding the evolution process of program source for investigating software authorship and plagiarism

[...]

Jeong-Hoon Ji¹, Su-Hyun Park¹, Gyun Woo¹, Hwan-Gue Cho¹•Institutions (1)

Pusan National University¹

01 Oct 2007

TL;DR: Wang et al. as mentioned in this paper proposed an algorithm to construct the evolution tree (hylogenetic tree) for a set of similar program clones, which can be interchangeably applied for both these purposes in student assignment program domain.

...read moreread less

Abstract: This paper addresses the evolution process of program source codes to establish the framework for software authorship identification. Since program code cheating is getting serious in academic institutions, we will be able to apply the software authorship identification tool as a detection tool for code plagiarism. The main contribution of our work is twofold. First, we have devised new asymmetric distance measure to compute the distance of authorship between two different programs. Second, we have proposed an algorithm to construct the evolution tree(hylogenetic tree) for a set of similar program clones. For the experiment we have gathered two set of codes: a set of assignment programs and another program set which have been submitted to the ICPC, an international programming contests. Our experiment showed that our distance measure for program sources has successfully identified the code authorship and has also reliably detected plagiarized programs. This experiment has showed a strong possibility that the proposed construction algorithm for phylogenetic forest can be used to trace the evolution(improving) process of software. This paper shows the confidence of the authorship identification and plagiarism detection can be interchangeably applied for both these purposes in student assignment program domain.

...read moreread less

Posted Content•

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

[...]

Manuel Cebrian¹, Manuel Alfonseca¹, Alfonso Ortega¹•Institutions (1)

Autonomous University of Madrid¹

27 Mar 2007-arXiv: Neural and Evolutionary Computing

TL;DR: A Grammatical Evolution technique which generates benchmarks for plagiarism in answers to computer programming assignments, where student mix and/or modify one or more original solutions to obtain counterfeits.

...read moreread less

Plagiarism Detection Systems and International Students: Detecting plagiarism, copying or learning?

[...]

Lucas D. Introna, Niall Hayes

01 Jan 2007

TL;DR: It is argued that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists.

...read moreread less

Abstract: This paper explores the question of plagiarism by international students (non-native speakers). It argues that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists. We argue that the use of detection systems should take into account the writing practices used by those who write as novices in a non-native language as well as the way ‘plagiarism’ or plagiaristic forms of writing are valued in other cultures. It calls for a move away from a punitive legalistic approach to plagiarism that equates copying to plagiarism and move to a progressive and formative approach. If taken up such an approach will have very important implications for the way universities in the west deal with plagiarism in their learning and teaching practice as well as their disciplinary procedures.

...read moreread less

Posted Content•

Uncovering Plagiarism Networks

[...]

Manuel Freire¹, Manuel Cebrian, Emilio del Rosal•Institutions (1)

Autonomous University of Madrid¹

28 Mar 2007-arXiv: Information Theory

TL;DR: AC, a modular plagiarism detection system, is presented, which is portable across platforms and assignment formats and provides easy extraction into the internal assignment representation.

...read moreread less

Journal Article•DOI•

Computer-Assisted Assembly Language Programming Laboratory

[...]

Santiago Rodríguez¹, José Luis Pedraza¹, Antonio Prestes García¹, Francisco Rosales¹, Rafael Méndez¹ - Show less +1 more•Institutions (1)

Technical University of Madrid¹

01 Jul 2007-International Journal of Electrical Engineering Education

TL;DR: A new approach for managing laboratory work mini-projects that is used in the Computer Architecture Department of the Technical University of Madrid is presented, based on a chain of tools that a small number of teachers can use to efficiently manage a course with a large number of students.

...read moreread less

Abstract: This paper presents a new approach for managing laboratory work mini-projects that is used in the Computer Architecture Department of the Technical University of Madrid (UPM). The approach is based on a chain of tools (a Delivery Agent; an Automatic Project Evaluator; and a Plagiarism Detection Assistant) that a small number of teachers can use to efficiently manage a course with a large number of students (400 each year). Students use this tool chain to complete the Assembly Language Programming laboratory assignments using an MC88110 simulator built in our department. Jointly, these tools have demonstrated over the last decade what important benefits can be gained from the exploitation of a global laboratory work management system. Some of the benefits may have a continuation in an area of growing importance that we have not yet explored, such as distance learning environments for technical subjects.

...read moreread less