Showing papers on "Plagiarism detection" published in 2008

Proceedings Article

Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

Zhan Su¹, Byung-Ryul Ahn¹, Ki-Yol Eom¹, Min-Koo Kang¹, Jin-Pyung Kim¹, Moon-Kyun Kim¹

Sungkyunkwan University¹

18 Jun 2008

TL;DR: This paper investigates the use of a diagonal line, derived from the Levenshtein distance, together with a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to applying them to plagiarism detection.

Abstract: Plagiarism in texts is an issue of increasing concern to the academic community. Today, the most common form of text plagiarism involves making a variety of minor alterations, such as inserting, deleting, or substituting words. Detecting such simple changes, however, requires an excessive number of string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, derived from the Levenshtein distance, together with a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to applying them to plagiarism detection. Our approach avoids global string comparisons and takes psychological factors into account, which, according to the experimental results, can yield a significant speed-up. Based on the results, we indicate the practicality of this improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison.
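As a rough illustration of the two building blocks named in this abstract, the sketch below computes a word-level Levenshtein distance and a simplified Smith-Waterman local-alignment score. It does not reproduce the authors' diagonal-line heuristic. The function names, the sample sentences, and the scoring constants (match +2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between two token sequences (scores are illustrative)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical example: a lightly edited copy of a sentence.
source = "the quick brown fox jumps over the lazy dog".split()
suspect = "a quick brown cat jumps over the lazy dog today".split()
print(levenshtein(source, suspect), smith_waterman(source, suspect))
```

A low edit distance relative to sentence length, or a high local-alignment score, would flag the pair for closer inspection; any threshold would have to be tuned on real data.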

93 citations

Journal Article

Detection of Plagiarism in Programming Assignments

F. Rosales¹, A. Garcia¹, Santiago Rodríguez¹, José Luis Pedraza¹, R. Mendez¹, M.M. Nieto¹

Complutense University of Madrid¹

01 May 2008-IEEE Transactions on Education

TL;DR: The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.

Abstract: Laboratory work assignments are very important for computer science learning. Over the last 12 years many students have been involved in solving such assignments in the authors' department, having reached a figure of more than 400 students doing the same assignment in the same year. This number of students has required teachers to pay special attention to conceivable plagiarism cases. A plagiarism detection tool has been developed as part of a full toolset for helping in the management of the laboratory work assignments. This tool defines and uses four similarity criteria to measure how similar two assignment implementations are. The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.

87 citations

Book Chapter

Multilingual Plagiarism Detection

Zdenek Ceska¹, Michal Toman¹, Karel Jezek¹

University of West Bohemia¹

04 Sep 2008

TL;DR: A new method called MLPlag is proposed for plagiarism detection in a multilingual environment, based on analysis of word positions, which identifies the replacement of synonyms used by plagiarists to hide the document match.

Abstract: Multilingual text processing has been gaining more and more attention in recent years. This trend has been accentuated by the global integration of European states and the vanishing cultural and social boundaries. Multilingual text processing has become an important field bringing many new and interesting problems. This paper describes a novel approach to multilingual plagiarism detection. We propose a new method called MLPlag for plagiarism detection in a multilingual environment. This method is based on analysis of word positions. It utilizes the EuroWordNet thesaurus, which transforms words into a language-independent form. This makes it possible to identify documents plagiarized from sources written in other languages. Special techniques, such as semantic-based word normalization, were incorporated to refine our method. It identifies the replacement of synonyms used by plagiarists to hide the document match. We performed and evaluated our experiments on monolingual and multilingual corpora, and the results are presented in this paper.

60 citations

Journal Article

Is There an Effective Approach to Deterring Students from Plagiarizing?

Lidija Bilić-Zulle, Azman J, Vedran Frkovic, Mladen Petrovečki

01 Jan 2008-Science and Engineering Ethics

TL;DR: Use of plagiarism detection software in evaluation of essays and consequent penalties had effectively deterred students from plagiarizing.

Abstract: The purpose of this study was to evaluate the effectiveness of plagiarism detection software and penalty for plagiarizing in detecting and deterring plagiarism among medical students. The study was a continuation of previously published research in which second-year medical students from the 2001/2002 and 2002/2003 school years were required to write an essay based on one of four scientific articles offered by the instructor. Students from 2004/2005 (N = 92) included in the present study were given the same task. The topics of two of the four articles were considered less complex, and two more complex. One less complex and one more complex article were available only as hard copies, whereas the other two were available in electronic format. The students from 2001/2002 (N = 111) were only told to write an original essay, whereas the students from 2002/2003 (N = 87) were additionally warned against plagiarism and told what plagiarism was and how to avoid it. The students from 2004/2005 were warned that their essays would be examined by plagiarism detection software and that those who had plagiarized would be penalized. Students from 2004/2005 plagiarized significantly less of their essays than students from the previous two groups (2% vs. 17% vs. 21%, respectively, P < 0.001). Over time, students more frequently chose articles with more complex subjects (P < 0.001) and articles in electronic format (P < 0.001) as a source for their essays, but this did not influence the rate of plagiarism. Use of plagiarism detection software in evaluation of essays and consequent penalties had effectively deterred students from plagiarizing.

51 citations

Book Chapter

Plagiarism Detection Based on Singular Value Decomposition

Zdenek Ceska¹

University of West Bohemia¹

25 Aug 2008

TL;DR: A new method, called SVDPlag, employs Singular Value Decomposition (SVD) to solve associations of phrases contained in text documents; it significantly improves the accuracy of plagiarism detection and outperforms other methods.

Abstract: Plagiarism is a widespread problem that is a major focus of interest these days. In this paper, we propose a new method for solving associations of phrases contained in text documents. This method, called SVDPlag, employs Singular Value Decomposition (SVD) for this purpose. Further, we discuss other approaches to plagiarism detection and compare them with our method. To examine the efficiency of plagiarism detection methods, we used an experimental corpus of 950 text documents about politics, which were created from the standard CTK corpus. The experiments indicate that our approach significantly improves the accuracy of plagiarism detection and outperforms other methods.

41 citations

Journal Article

PDE4Java: Plagiarism Detection Engine for Java source code: a clustering approach

Ameera Jadalla¹, Ashraf Elnagar¹

University of Sharjah¹

01 Sep 2008-International Journal of Business Intelligence and Data Mining

TL;DR: PDE4Java is a plagiarism detection engine for Java that consists of three main phases: Java tokenisation, similarity measurement and clustering. It provides a visual representation of each cluster in addition to the textual representation.

Abstract: The educational community across the world is facing the increasing problem of plagiarism. The proposed Plagiarism Detection Engine for Java (PDE4Java) detects code plagiarism by applying data mining techniques. The engine consists of three main phases: Java tokenisation, similarity measurement and clustering. It has an optional default tokeniser that makes it flexible enough to be used with almost any programming language. The system provides a visual representation of each cluster in addition to the textual representation. The simulation results of PDE4Java showed performance comparable to that of JPlag, and it exceeded expectations when compared to the domain experts' findings.
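The three phases named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PDE4Java implementation: the regex tokeniser, the Jaccard similarity over token trigrams, the single-link clustering, the 0.6 threshold, and the sample submissions are all hypothetical choices.

```python
import re
from itertools import combinations

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"int", "for", "while", "if", "else", "return", "class", "public", "static", "void"}


def tokenise(code):
    """Phase 1: normalise source text; identifiers become ID and numbers become NUM."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def ngrams(tokens, n=3):
    """Set of all contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b, n=3):
    """Phase 2: Jaccard similarity over token n-grams (an assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def cluster(names, sims, threshold=0.6):
    """Phase 3: single-link clustering; pairs at or above the threshold share a cluster."""
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for (a, b), s in sims.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical submissions: "bob" is "alice" with renamed variables.
submissions = {
    "alice": "int total = 0; for (int i = 0; i < n; i++) { total += i; }",
    "bob": "int sum = 0; for (int k = 0; k < m; k++) { sum += k; }",
    "carol": "while (x > 1) { if (x % 2 == 0) x = x / 2; else x = 3 * x + 1; }",
}
tokens = {name: tokenise(src) for name, src in submissions.items()}
sims = {(a, b): similarity(tokens[a], tokens[b])
        for a, b in combinations(sorted(tokens), 2)}
clusters = cluster(sorted(tokens), sims)
print(clusters)
```

Normalising identifiers before comparison is what lets the renamed copy land in the same cluster as its source, while the structurally different submission stays on its own.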

41 citations

Journal Article

A Cautionary Note on Checking Software Engineering Papers for Plagiarism

C. Kaner¹, R.L. Fiedler²

Florida Institute of Technology¹, Saint Mary-of-the-Woods College²

01 May 2008-IEEE Transactions on Education

TL;DR: Two leading plagiarism detection tools are contrasted, TurnItIn and MyDropBox, in detecting submissions that were obviously plagiarized from articles published in IEEE journals.

Abstract: Several tools are marketed to the educational community for plagiarism detection and prevention. This article briefly contrasts the performance of two leading tools, TurnItIn and MyDropBox, in detecting submissions that were obviously plagiarized from articles published in IEEE journals. Both tools performed poorly because they do not compare submitted writings to publications in the IEEE database. Moreover, these tools do not cover the Association for Computing Machinery (ACM) database or several others important for scholarly work in software engineering. Reports from these tools suggesting that a submission has “passed” can encourage false confidence in the integrity of a submitted writing. Additionally, students can submit drafts to determine the extent to which these tools detect plagiarism in their work. Because the tool samples the engineering professional literature narrowly, the student who chooses to plagiarize can use this tool to determine what plagiarism will be invisible to the faculty member. An appearance of successful plagiarism prevention may in fact reflect better training of students to avoid plagiarism detection.

34 citations

Proceedings Article

Practical issues for academics using the Turnitin plagiarism detection software

Karl O. Jones¹

Liverpool John Moores University¹

12 Jun 2008

TL;DR: Some issues that might be raised in employing Turnitin are highlighted and some approaches that academics might utilise to allow efficient use of the system are suggested.

Abstract: The Turnitin plagiarism detection system allows individual student assignments to be uploaded and matched for similarity with content on the web, all other assignments uploaded by institutions using the system and certain journals. An online report is produced for each submission identifying the sources of those similarities and the percentage match. There is a significant benefit in using Turnitin to identify possible cases of plagiarism. This paper highlights some issues that might be raised in employing Turnitin and suggests some approaches that academics might utilise to allow efficient use of Turnitin.

29 citations

Journal Article

On Students' Strategy-Preferences for Managing Difficult Course Work

Hua-Li Jian¹, Frode Eika Sandnes², Yo-Ping Huang³, Li Cai, Kris M. Y. Law⁴

National Cheng Kung University¹, Oslo and Akershus University College of Applied Sciences², Tatung University³, City University of Hong Kong⁴

01 May 2008-IEEE Transactions on Education

TL;DR: This study investigates three aspects of academic dishonesty, identifies students' preferred strategies for managing perceptually too difficult course work, and measures students' preferences for choosing sides in ethical conflicts.

Abstract: Course work plagiarism among university students is often attributed to ignorance about plagiarism or an assignment's level of difficulty. Students submit other people's work when they are unable to solve an assignment themselves. This study, based on 233 student responses from four cultural regions, investigates three aspects of academic dishonesty. First, the study identifies students' preferred strategies for managing perceptually too difficult course work. Second, students' preferences for responding to help from fellow students are investigated. Finally, the study measures students' preferences for choosing sides in ethical conflicts. Seven strategies for managing difficult course work, six strategies for responding to requests for help, and five key parties in ethical conflicts are studied using a pair-wise comparison method. The results show that students prefer to collaborate and use the Internet. The impact of the teacher is smaller than expected. Factors including cultural origin, gender, level of study, and field of study have limited impact.

27 citations

Book

Enhancing Computer-Aided Plagiarism Detection

Maxim Mozgovoy

05 Nov 2008

TL;DR: This work is dedicated to the development and the use of software instruments that help to reveal plagiarism, and building the taxonomy of existing plagiarism detection methods according to their speed and reliability characteristics.

Abstract: This work is dedicated to the problem of computer-aided plagiarism detection, i.e. to the development and use of software instruments that help to reveal plagiarism. The creation of such tools raises specific algorithmic problems that deserve attention. The results covered in this work include: (a) Building a taxonomy of existing plagiarism detection methods according to their speed and reliability characteristics. (b) Studying and improving string matching algorithms used in plagiarism detection; introducing "tokenizers" for natural language texts and applying natural language parsers to plagiarism detection in order to enhance the quality of the detectors. (c) Optimizing the speed of string matching based plagiarism detection algorithms by applying a combined fast and reliable scoring scheme; developing an efficient parameterized matching procedure. (d) Developing a fast string matching based plagiarism detection algorithm.

26 citations

Proceedings Article

A Plagiarism Detection Technique for Java Program Using Bytecode Analysis

Jeong-Hoon Ji¹, Gyun Woo¹, Hwan-Gue Cho¹

Pusan National University¹

11 Nov 2008

TL;DR: This paper proposes a plagiarism detection technique for Java programs that uses bytecode without referring to the source code, and which can be used as a preliminary verification tool before detecting plagiarism by source code comparison.

Abstract: Most plagiarism detection systems evaluate the similarity of source code and detect plagiarized program pairs. If source code is used in plagiarism detection, source code security can become a significant problem. Plagiarism detection based on target code can be used to protect the security of source code. In this paper, we propose a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate token sequences from the Java class file by analyzing the code area of methods. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distribution of similarities for source code and that for bytecode are very similar. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.

Journal Article

Student and staff perceptions of the effectiveness of plagiarism detection software

Doug Atkinson¹, Sue Yeoh¹

Curtin University¹

22 Feb 2008-Australasian Journal of Educational Technology

TL;DR: Students perceived that plagiarism is an important issue; detection software makes it easier for lecturers; it is fair to use detection software; students support its use; and it will have some effect in preventing plagiarism, but students' concerns included being caught for unintentional plagiarism.

Abstract: The aim of this research was to determine student and staff perceptions of the effectiveness of plagiarism detection software. A mixed methods approach was undertaken, using a research model adapted from the literature. Eight hours of interviews were conducted with six students and six teaching staff from Curtin Business School at Curtin University of Technology, which had trialled the plagiarism detection software, EVE2. A survey questionnaire was completed by 171 students involved in the trial. The summary indication was that students perceived that plagiarism is an important issue; detection software makes it easier for lecturers; it is fair to use detection software; students support its use; and it will have some effect in preventing plagiarism. However, students' concerns included being caught for unintentional plagiarism, teaching staff placing too much emphasis on detection results above student ability, and the accuracy of the software at detecting plagiarism. Concerns for teaching staff included the time taken for the detection process, limitation of the software to publicly based Internet sources and direct copying, and the extra workload involved with pursuing academic misconduct.

Proceedings Article

Visualizing program similarity in the Ac plagiarism detection system

Manuel Freire¹

Autonomous University of Madrid¹

28 May 2008

TL;DR: Although Ac's visualizations were developed with plagiarism detection in mind, they should also prove effective to visualize distance matrices from other domains, as demonstrated by preliminary experiments.

Abstract: Programming assignments are easy to plagiarize in such a way as to foil casual reading by graders. Graders can resort to automatic plagiarism detection systems, which can generate a "distance" matrix that covers all possible pairings. Most plagiarism detection programs then present this information as a simple ranked list, losing valuable information in the process. The Ac system uses the whole distance matrix to provide graders with multiple linked visualizations. The graph representation can be used to explore clusters of highly related submissions at different filtering levels. The histogram representation presents compact "individual" histograms for each submission, complementing the graph representation in aiding graders during analysis. Although Ac's visualizations were developed with plagiarism detection in mind, they should also prove effective to visualize distance matrices from other domains, as demonstrated by preliminary experiments.
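To make the two views in this abstract concrete, the sketch below derives both from one small distance matrix: the edges that survive at a given filtering level, and a per-submission histogram of distances. It is not Ac's code. The function names, the assumption that distances are normalised to [0, 1], the bin count, and the sample matrix are all hypothetical.

```python
def edges_below(dist, names, threshold):
    """Graph view at one filtering level: pairs whose distance is at or below the threshold."""
    return [(names[i], names[j], dist[i][j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if dist[i][j] <= threshold]


def individual_histogram(dist, i, bins=5):
    """Counts of distances from submission i to all others, in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for j, d in enumerate(dist[i]):
        if j != i:
            counts[min(int(d * bins), bins - 1)] += 1
    return counts


# Hypothetical symmetric distance matrix for four submissions.
names = ["s1", "s2", "s3", "s4"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.85],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.85, 0.3, 0.0],
]

for level in (0.2, 0.5):
    print(level, edges_below(dist, names, level))
print(individual_histogram(dist, 0))
```

Raising the filtering level adds edges to the graph, while a histogram with one isolated count in the lowest bin points at a submission that is unusually close to exactly one other.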

Proceedings Article

Detecting and tracing plagiarized documents by reconstruction plagiarism-evolution tree

Chang-Keon Ryu¹, Hyong-Jun Kim¹, Seung-Hyun Ji¹, Gyun Woo¹, Hwan-Gue Cho¹

Pusan National University¹

08 Jul 2008

TL;DR: A new approach that reconstructs the evolution process of suspected texts in order to detect plagiarized documents is proposed; it adopts the Weibull distribution, an extreme-value distribution used to compute the statistical significance of genomic sequence matching.

Abstract: Thanks to smart word processors and powerful Web search engines, plagiarism is widespread, especially in digital texts. It is therefore crucial to develop efficient and effective anti-plagiarism tools to prevent or identify document plagiarism. So far, a few plagiarism detection systems have been announced. All previous plagiarism detection studies focus on how to measure the similarity of documents. In this paper, we propose a new approach that reconstructs the evolution process of suspected texts in order to detect plagiarized documents. For this, we propose two major metrics: spatial plagiarism similarity and temporal plagiarism similarity. By combining these two similarity measures, we derive an evolutionary plagiarism probability model by adopting the Weibull distribution, an extreme-value distribution used to compute the statistical significance of genomic sequence matching. The main difference between our model and previous studies is that our model can estimate the plagiarism and its direction as a temporal event. An experiment on a group of Internet-posted news articles clearly agreed with the real plagiarism among those articles.

Plagiarism Detection Software

Marlin Thomas¹

Iona College¹

17 Nov 2008

TL;DR: Faculty use of plagiarism detection software should be reevaluated because of issues related to its efficacy and because of ethical and legal concerns.

Abstract: Plagiarism is a pervasive form of academic dishonesty in collegiate settings. Since it distorts learning and assessment, deterring and detecting it are crucial to maintaining academic integrity. Large class sizes and an increase in writing assignments that result from writing across the curriculum combine to make detection of plagiarism burdensome. The concomitant rapid increase of written material on the Internet and its ease of appropriation contribute to the problem. Plagiarism detection software has emerged in response. The most prominent implementation compares submissions against items in a database and then adds them to that database. It outputs measures of possible plagiarism. Faculty use of the detection software should be reevaluated because of issues related to its efficacy and because of ethical and legal concerns.

Plagiarism detection in Arabic scripts using fuzzy information retrieval

Salha Alzahrani, Naomie Salim

01 Jan 2008

TL;DR: This paper presents a statement-based plagiarism detection approach for Arabic scripts using a fuzzy-set IR model, and shows that fuzzy-set IR successfully detected not only exact but also similar statements that have different structure.

Abstract: The structure of the Arabic language calls for fuzzy or vague concepts to reveal dishonest practices in Arabic documents. In this paper, we present a statement-based plagiarism detection approach for Arabic scripts using a fuzzy-set IR model. The degree of similarity is calculated and compared to a threshold value to judge whether two statements are the same or different. We built a corpus in which all stopwords were removed and the remaining words were stemmed, as is typical for Arabic IR. The corpus has 100 documents with 4367 statements in total. Five query documents with about 250 plagiarized statements were constructed and tested. Experimental results show that fuzzy-set IR successfully detected not only exact but also similar statements that have different structure. However, our Arabic fuzzy-set model does not handle the case of rewording with different synonyms/antonyms, a deficiency that will lead to future work on modeling the system using an Arabic thesaurus. Keywords: fuzzy-set information retrieval; Arabic; plagiarism detection
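The idea of scoring two statements with a fuzzy-set IR model and comparing the score to a threshold can be sketched as follows. This follows a generic textbook formulation (term correlation from co-occurrence, fuzzy membership of a word in a statement) and is not necessarily the authors' exact model. The function names, the 0.6 threshold, and the tiny English corpus standing in for stemmed Arabic terms are all assumptions.

```python
from math import prod


def correlation(a, b, corpus):
    """Term correlation: statements containing both terms over statements containing either."""
    if a == b:
        return 1.0
    n_a = sum(a in s for s in corpus)
    n_b = sum(b in s for s in corpus)
    n_ab = sum(a in s and b in s for s in corpus)
    denom = n_a + n_b - n_ab
    return n_ab / denom if denom else 0.0


def membership(word, statement, corpus):
    """Fuzzy degree to which `word` belongs to `statement`."""
    return 1.0 - prod(1.0 - correlation(word, k, corpus) for k in statement)


def similarity(query, statement, corpus):
    """Average membership of the query's words in the candidate statement."""
    if not query:
        return 0.0
    return sum(membership(w, statement, corpus) for w in query) / len(query)


# Hypothetical corpus: each statement is a set of stemmed, stopword-free terms.
corpus = [
    {"student", "copy", "essay"},
    {"student", "write", "essay"},
    {"software", "detect", "copy"},
    {"weather", "rain"},
]
query = {"student", "copy", "report"}
THRESHOLD = 0.6  # assumed value; a real system would tune this

score_same = similarity(query, corpus[0], corpus)
score_other = similarity(query, corpus[3], corpus)
print(score_same >= THRESHOLD, score_other >= THRESHOLD)
```

Because membership is driven by co-occurrence rather than exact equality, a query word absent from a statement can still contribute when it is correlated with the statement's words, which is how similar but restructured statements can pass the threshold.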

Proceedings Article

Vic Ciesielski¹, Nelson Wu¹, S. M. M. Tahaghoghi¹

RMIT University¹

12 Jul 2008

TL;DR: It is found that a detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today, and scales much better to large collections of files.

Abstract: Detecting whether computer program code is a student's original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but require appropriate similarity functions for good performance. We have used particle swarm optimization and genetic programming to evolve similarity functions that are suited to computer program code. Using a training set of plagiarised and non-plagiarised programs we have evolved better parameter values for the previously published Okapi BM25 similarity function. We have then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure. We found that the evolved similarity functions outperformed the human-developed Okapi BM25 function. We also found that a detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today, and scales much better to large collections of files. The evolutionary computing techniques have been extremely useful in finding similarity functions that advance the state of the art in code plagiarism detection.
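For readers unfamiliar with the baseline mentioned in this abstract, the sketch below scores tokenised programs with a common simplified form of Okapi BM25. The parameters k1 and b are the kind of values the authors report tuning; the defaults used here (1.2 and 0.75) are conventional choices, not the evolved ones. The non-negative IDF variant, the omission of the query-term-frequency component, the function name, and the sample token streams are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document (a token list) against the query tokens with simplified BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each token occurs.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in dict.fromkeys(query):  # unique query tokens, in stable order
            if t not in tf:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(score)
    return scores


# Hypothetical normalised token streams for three programs.
docs = [
    "for ID = NUM ; ID < ID ; ID ++ { ID += ID }".split(),
    "for ID = NUM ; ID < ID ; ID ++ { ID += ID * NUM }".split(),
    "while ID > NUM { ID = ID / NUM }".split(),
]
scores = bm25_scores(docs[0], docs)
print(scores)
```

Using one program as the query and ranking the rest by score gives the indexing-and-matching style of detection the abstract describes; tuning k1 and b changes how strongly repeated tokens and file length affect the ranking.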

Book Chapter

International Students and Plagiarism Detection Systems: Detecting Plagiarism, Copying or Learning?

Lucas D. Introna¹, Niall Hayes¹

Lancaster University¹

01 Jan 2008

TL;DR: It is argued that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists.

Abstract: This chapter explores the question of plagiarism by international students (non-native speakers). It argues that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists. We argue that the use of detection systems should take into account the writing practices used by those who write as novices in a non-native language as well as the way “plagiarism” or plagiaristic forms of writing are valued in other cultures. It calls for a move away from a punitive legalistic approach to plagiarism that equates copying to plagiarism and move to a progressive and formative approach. If taken up, such an approach will have very important implications for the way universities in the west deal with plagiarism in their learning and teaching practice as well as their disciplinary procedures.

Rethinking Plagiarism for Technical Communication

Jessica Reyman

01 Jan 2008

TL;DR: The issue of plagiarism is particularly contentious for technical and professional writers, as opposed to academic writers, because of the types of writing activities they regularly engage in; many students of professional writing fear that they may be "stealing" or committing intellectual "theft" whenever they make use of any existing material in their writing.

Abstract: Introduction: Cases of plagiarism among professional writers have gained increasing media attention in recent years. As a result, many students of professional writing fear that they may be “stealing,” or committing intellectual “theft,” whenever they make use of any existing material in their writing. They have been warned against such uses by several sources. Instructors and university administrators tell them that they must follow plagiarism policies or risk earning failing grades or being expelled from the university. In the news they see their peers venture into the professional world and face public criticism and termination of contracts for acts of plagiarism. In addition, attention given to Turnitin.com and other “plagiarism detection technologies” has created a culture of fear among student writers who understand that such technologies may be used for policing their writing practices. (For more on Turnitin.com, see http://www.plagiarism.org/.) These stories and others have infiltrated conversations on many college campuses, warning student writers against copying with a seemingly simple message: “don’t steal.” However, as industry professionals in technical communication are well aware, the message is not that simple in our field. The issue of plagiarism is particularly contentious for technical and professional writers, as opposed to academic writers, because of the types of writing activities we regularly engage in. Technical communicators commonly perform a variety of types of composing activities that could be considered plagiarism in the context of the classroom. Such activities include:

Deterring digital plagiarism, how effective is the digital detection process?

Jayati Chaudhuri

25 Mar 2008

TL;DR: This article gives special emphasis to text-matching software called SafeAssignment™ and discusses and analyzes the advantages and disadvantages of using automated text-matching software.

Abstract: Academic dishonesty or plagiarism is a growing problem in today's digital world. Use of plagiarism detection tools can assist faculty in combating this form of academic dishonesty. In this article, special emphasis is given to text-matching software called SafeAssignment™. The advantages and disadvantages of using automated text-matching software are discussed and analyzed in detail.

Journal Article

A Consideration of the Use of Plagiarism Tools for Automated Student Assessment

Olaf Hallan Graven¹, Lachlan MacKinnon¹

Buskerud University College¹

01 May 2008-IEEE Transactions on Education

TL;DR: The authors evaluate the flexibility and richness of two well-established text analysis plagiarism tools, through a consideration of the use of plagiarism detection software as a mechanism for the automated assessment of student-created narrative in a virtual learning environment (VLE).

Abstract: In this paper, the authors evaluate the flexibility and richness of two well-established text analysis plagiarism tools, through a consideration of the use of plagiarism detection software as a mechanism for the automated assessment of student-created narrative in a virtual learning environment (VLE). The authors are currently engaged in a project creating a prototype VLE, using technologies for multilevel and multiplayer games, based on the inherent support such an environment would provide for constructivist learning, engagement, and contextual socialization. Progress between levels in the VLE will be based on the creation, by the student, of a narrative linking together a number of conceptual elements obtained through game-play at that level. Support for the narrative creation process will help the student to contextualize the conceptual elements, providing the necessary linking elements or themes to enable the student to produce a coherent description of their understanding of the concepts. A particular challenge in such environments is the need for fast, real-time feedback to students to maintain the level of engagement and to support the game-play metaphor. Additionally, the student must be able to make as many attempts to progress as they need and it will be their decision when and how often to submit for assessment. Since the student narrative will be in a textual form and can therefore be related to a sample solution narrative, generated by the author of the level within the learning environment, the idea of using plagiarism detection software as the mechanism for automated comparison and assessment was considered appropriate for investigation. While the limitation of such tools would appear to be that they are seeking direct copies of text elements, the authors wanted to investigate whether they offered sufficient richness and fuzziness to detect common conceptually-linked texts. 
The initial decision was to experiment with text-analytic tools, since they are both widely used and readily available. The tools chosen were TurnItIn, a commercial tool provided to the U.K. higher education community by the U.K. Joint Information Systems Committee (JISC), and VALT/VAST, a set of tools created at the Centre for Interactive Systems Engineering at London South Bank University, London, U.K., the workings of which are based on recognized and well-published research. An experiment using a small group of students in a traditional assessment situation was carried out, and is described in detail. The rationale for this approach was that there is not yet a fully working prototype of the VLE in which to carry out such an experiment, but that the conditions necessary to test the hypothesis that plagiarism tools could be utilized for such a purpose could be replicated sufficiently to make such an experiment viable. The results of the experiment demonstrated neither a correlation between the sample solution and student solutions, nor any correlation between the individual student solutions, proving the null hypothesis. This result demonstrates that these tools are not useful for the development of automated assessment within the VLE, and the authors are now giving consideration to the use of lexical analysis/tokenizer and other tools. However, it also suggests that these text-analytic plagiarism tools are too firmly focused on direct copy, which does raise the question of whether or not they offer enough richness and fuzziness to detect a sophisticated plagiarism attempt using, for example, text replacement tools. An ongoing close relationship between research in automated assessment and plagiarism detection is also proposed, to achieve mutual benefit.

Proceedings Article•DOI•

A Platform Framework for Cross-Lingual Text Relatedness Evaluation and Plagiarism Detection

Chung-Hong Lee, Chih-Hong Wu, Hsin-Chang Yang

18 Jun 2008

TL;DR: A system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection; preliminary results show that the platform framework has potential for cross-lingual text relatedness evaluation and plagiarism detection.

Abstract: Research on plagiarism detection methods for monolingual texts (e.g. English texts) has become well established in recent years. However, little attention has been paid to plagiarism detection in cross-lingual text collections (e.g. English and Chinese texts). In this paper we present a system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection. First, we used a number of selected texts from Chinese-English parallel corpora collected from the Internet to train text classifiers based on the Support Vector Machine (SVM) model. Multilingual texts of unknown category can then be classified by the trained classifiers. Subsequently, the resulting categorized texts were measured by means of a language-neutral clustering technique based on the Self-Organizing Map (SOM) method to evaluate semantic similarity among texts. The preliminary results show that our platform framework has potential for cross-lingual text relatedness evaluation and plagiarism detection.

Proceedings Article•DOI•

Plagiarism detection in Chinese based on chunk and paragraph weight

Tao Wang¹, Xiao-Zhong Fan¹, Jie Liu¹•Institutions (1)

Beijing Institute of Technology¹

12 Jul 2008

TL;DR: Targeting plagiarism detection in Chinese academic papers, the authors propose a chunk-based plagiarism detection algorithm with chunk extraction based on characters or words, together with two paragraph weight algorithms and three paragraph weight functions.

Abstract: Targeting plagiarism detection in Chinese academic papers, the authors propose a chunk-based plagiarism detection algorithm with a chunk extraction method based on characters or words. Taking into account that different parts of a document differ in importance, they propose two paragraph weight algorithms and define three paragraph weight functions. The best chunk lengths are determined by experiment. Experiments show that using paragraph weights improves detection.
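
The abstract does not give the chunk extraction rule or the weight functions, so the following is only a minimal sketch of the general idea, under stated assumptions: chunks are taken to be overlapping character n-grams, and each paragraph carries a caller-supplied weight. The names `char_chunks` and `weighted_overlap`, the default chunk length of 3, and the scoring formula are all hypothetical and are not the paper's method.

```python
def char_chunks(text, n=3):
    """Return the set of overlapping character n-grams in text (whitespace ignored)."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def weighted_overlap(suspect_paragraphs, source_text, weights, n=3):
    """Weighted share of the suspect document's chunks that also occur in the source.

    Each paragraph contributes the fraction of its chunks found in the source,
    scaled by that paragraph's weight (assumed scheme, for illustration only).
    """
    source_chunks = char_chunks(source_text, n)
    total = 0.0
    matched = 0.0
    for paragraph, weight in zip(suspect_paragraphs, weights):
        chunks = char_chunks(paragraph, n)
        if not chunks:
            continue  # paragraph shorter than one chunk contributes nothing
        total += weight
        matched += weight * len(chunks & source_chunks) / len(chunks)
    return matched / total if total else 0.0


paragraphs = ["abcdef", "uvwxyz"]
print(weighted_overlap(paragraphs, "abcdefgh", [2.0, 1.0]))  # copied paragraph weighted higher
print(weighted_overlap(paragraphs, "abcdefgh", [1.0, 1.0]))  # equal weights
```

In this toy input the first paragraph is fully copied and the second is not, so giving the first paragraph a larger weight raises the overall score, which is the effect a paragraph weight is meant to capture.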

Journal Article•DOI•

A Technological Tool to Detect Plagiarized Projects in Microsoft Access

J.A. McCart¹, J. Jarman¹•Institutions (1)

University of South Florida¹

01 May 2008-IEEE Transactions on Education

TL;DR: Combining technology and policy can be effective in curtailing blatant plagiarism within large technology courses, and a significant decrease in the number of projects being duplicated is demonstrated.

Abstract: Over one in ten students surveyed have admitted to copying programs in courses with computer assignments. The ease with which digital coursework can be copied and the impracticality of manually checking for plagiarized projects in large courses have only compounded the problem. As current research has focused predominantly on detecting plagiarism for textual applications such as source code and documents, there exists a gap in detecting plagiarism in graphically-driven applications. This paper focuses on the effectiveness of a technological tool in detecting plagiarized projects in a course using Microsoft Access. Seven semesters of data were collected from a large technology-oriented course in which the tool had been in use. Comparing semesters before and after the technological tool was introduced demonstrates a significant decrease in the number of projects being duplicated. The results indicate that combining technology and policy can be effective in curtailing blatant plagiarism within large technology courses.

Proceedings Article•DOI•

Algorithm of the longest commonly consecutive word for Plagiarism detection in text based document

A. Sediyono¹, Ku Ruhana Ku-Mahamud²•Institutions (2)

Trisakti University¹, Applied Science Private University²

01 Nov 2008

TL;DR: A numerical based comparison algorithm is proposed that is comparable in computation time without losing the word order of common parts in full-text document plagiarism.

Abstract: Plagiarism is a form of academic misconduct which has increased with easy access to information through electronic documents and the Internet. The problem of finding plagiarism in a full-text document can be viewed as a problem of finding the longest common parts of strings. Moreover, the detection system has to be capable of determining and visualizing not only the common parts but also the location of the common parts in both the source and the observed document. Unlike previous research, this paper proposes a numerical based comparison algorithm that is comparable in computation time without losing the word order of common parts. Based on the experiment, the proposed algorithm outperforms the suffix tree when the observed paragraph is shorter than one hundred words.
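
To make the "longest common parts" formulation concrete, here is a minimal sketch that finds the longest run of consecutive words shared by two texts and reports where it starts in each. It uses the textbook dynamic-programming approach for the longest common substring, applied to words. It is not the numerical algorithm the paper proposes, and the function name `longest_common_word_run` is hypothetical.

```python
def longest_common_word_run(source, observed):
    """Return (words, source_start, observed_start) for the longest shared word run.

    Positions are 0-based word indices. Comparison is case-insensitive and
    splits on whitespace only (a simplifying assumption).
    """
    a = source.lower().split()
    b = observed.lower().split()
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous source word
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending one word earlier
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return a[best_a:best_a + best_len], best_a, best_b


run, src_pos, obs_pos = longest_common_word_run(
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox sat on the lazy cat",
)
print(run, src_pos, obs_pos)  # ['quick', 'brown', 'fox'] 1 1
```

Because the function returns start positions in both texts, the shared passage can be highlighted in the source and in the observed document, which is the localization requirement the abstract describes.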

Journal Article•

Generating Pylogenetic Tree of Homogeneous Source Code in a Plagiarism Detection System

Jeong-Hoon Ji, Su-Hyun Park, Gyun Woo, Hwan-Gue Cho

01 Dec 2008-International Journal of Control Automation and Systems

TL;DR: The proposed phylogeny construction algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized codes more accurately and reliably and is successfully implemented on top of the plagiarism detection system of an automatic program evaluation system.

Abstract: Program plagiarism is widespread due to intelligent software and the global Internet environment. Consequently, the detection of plagiarized source code and software is becoming important, especially in the academic field. Though numerous studies have been reported on detecting plagiarized pairs of codes, we cannot find any profound work on understanding the underlying mechanisms of plagiarism. In this paper, we study the evolutionary process of source codes, on the view that the plagiarism procedure can be considered as evolutionary steps of source codes. The final goal of our paper is to reconstruct a tree depicting the evolution process in the source code. To this end, we extend a well-known bioinformatics approach, local alignment, to detect a region of similar code with an adaptive scoring matrix. The asymmetric code similarity based on the local alignment can be considered one of the main contributions of this paper. The phylogenetic tree or evolution tree of source codes can be reconstructed using this asymmetric measure. To show the effectiveness and efficiency of the phylogeny construction algorithm, we conducted experiments with more than 100 real source codes which were obtained from East-Asia ICPC (International Collegiate Programming Contest). Our experiments showed that the proposed algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized codes more accurately and reliably. Also, the phylogeny construction algorithm is successfully implemented on top of the plagiarism detection system of an automatic program evaluation system.

Journal Article•

Plagiarism Detection: The Tool And The Case Study.

Vladislav Scherbinin, Sergey Butakov

01 Jan 2008-E-learning

TL;DR: This paper presents preliminary results on an algorithm implementation for processing an archive of 1000+ submissions, discusses problems in the implementation of existing anti-plagiarism systems, and describes an open architecture that could be used for plagiarism detection in different kinds of assignments.

Abstract: Plagiarism is a problem in many educational institutions around the world. Preventing digital plagiarism requires an enormous amount of work from educators. In this paper we concentrate on the implementation of a well-known anti-plagiarism algorithm for local and global search for the original source of a plagiarized assignment. We first discuss problems in the implementation of existing anti-plagiarism systems, and then describe an open architecture that could be used for plagiarism detection in different kinds of assignments, from plain text to audio submissions. Finally, we present preliminary results on the algorithm implementation for processing an archive of 1000+ submissions. We hope this paper will add a new direction to the discussion of anti-plagiarism systems, especially for new types of assignments.

Plagiarism Detection through Vector Space Models Applied to a Digital Library.

Radim Rehurek

01 Jan 2008

TL;DR: This paper evaluates an Information Retrieval approach to dealing with plagiarism through Vector Spaces, which allows the detection of similarities that are not the result of naive copy&paste.
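
The summary does not say which vector space model the paper evaluates, so the following is only a generic illustration of the idea: documents become TF-IDF weighted term vectors and are compared by cosine similarity. The smoothed IDF formula, the whitespace tokenizer, and the names `tfidf_vectors` and `cosine` are assumptions made for this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "students copy source code",
    "students copy the source code often",
    "weather is sunny today",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # high: most terms are shared
print(cosine(vecs[0], vecs[2]))  # 0.0: no terms are shared
```

Because the comparison works on weighted term counts rather than exact strings, two documents can score as similar even when words have been reordered or extra words inserted.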

Journal Article•

A Plagiarism Detection Technique for Java Program Using Bytecode Analysis

Jeong-Hoon Ji, Gyun Woo, Hwan-Gue Cho

01 Jan 2008-Journal of KIISE:Software and Applications

TL;DR: This paper proposes a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code; experiments show that the distributions of similarities for source codes and for bytecodes are very similar.

Abstract: Most plagiarism detection systems evaluate the similarity of source codes and detect plagiarized program pairs. If we use the source codes in plagiarism detection, source code security can be a significant problem. Plagiarism detection based on target code can be used to protect the security of source codes. In this paper, we propose a new plagiarism detection technique for Java programs using bytecodes without referring to their source codes. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate the token sequences from the Java class file by analyzing the code area of methods. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distributions of similarities for the source codes and for the bytecodes are very similar. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.
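
The second step, scoring two token sequences by local alignment, can be sketched with the standard Smith-Waterman recurrence. This is an illustration only: the paper uses an adaptive scoring scheme that the abstract does not specify, so fixed match, mismatch, and gap scores are assumed here, and the tokens are typed in by hand rather than extracted from a class file. The names `local_alignment_score` and `similarity` are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between token lists a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b, match=2, mismatch=-1, gap=-1):
    """Score normalized by the best score the shorter sequence could reach."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return local_alignment_score(a, b, match, mismatch, gap) / (match * shorter)


# Hand-written opcode sequences; the second has two extra instructions inserted.
original = ["iload_1", "iload_2", "iadd", "ireturn"]
modified = ["iload_1", "iload_2", "iadd", "istore_3", "iload_3", "ireturn"]
print(local_alignment_score(original, modified))  # 6
print(similarity(original, modified))             # 0.75
```

The inserted instructions cost two gap penalties but do not break the alignment, which is why local alignment tolerates the small edits that typically disguise copied code.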

Journal Article•DOI•

Plagiarism detection service to be launched in June.

Caroline White

10 Apr 2008-BMJ

TL;DR: CrossCheck, a plagiarism detection service, is to be offered by the independent publishers’ membership association CrossRef, which functions as a sort of digital switchboard for articles from several hundred scholarly and professional publishers.

Abstract: Editors of scientific journals will soon have a new weapon at their disposal in the fight against research misconduct, delegates at the annual meeting of the Committee on Publication Ethics, held in London last week, were told. Scheduled for launch in June, researchers and editorial staff will be able to access CrossCheck, a plagiarism detection service, offered by the independent publishers’ membership association CrossRef. CrossRef is a collaborative reference linking service that functions as a sort of digital switchboard for articles from several hundred scholarly and professional publishers. Each item has …
