
Showing papers on "Plagiarism detection published in 2008"


Proceedings Article
18 Jun 2008
TL;DR: This paper investigates the use of a diagonal line, which is derived from Levenshtein distance, and a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to their application in plagiarism detection.
Abstract: Plagiarism in texts is an issue of increasing concern to the academic community. The most common form of text plagiarism now involves a variety of minor alterations, including the insertion, deletion, or substitution of words. Detecting such simple changes, however, requires excessive string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, which is derived from Levenshtein distance, and a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to their application in plagiarism detection. Our approach avoids global string comparisons and considers psychological factors; experimental results show that it can yield a significant speed-up. Based on these results, we show the practicality of such improvements using Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison.

93 citations
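For orientation only, here is the textbook Smith-Waterman local alignment applied to word tokens instead of biological sequence symbols. The scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration; this is not the authors' simplified variant, and it omits their Levenshtein-derived diagonal pruning.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between token lists a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # skip a word of a
            left = h[i][j - 1] + gap    # skip a word of b
            # Flooring at 0 lets an alignment restart anywhere: local, not global.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


source = "the quick brown fox jumps over the lazy dog".split()
suspect = "a quick brown fox leaps over the lazy dog today".split()
print(smith_waterman(source, suspect))  # 13: seven matched words, one substitution
```

The zero floor is what makes the alignment local: unrelated words before or after a copied passage reset the score instead of pulling it negative, so a passage with one substituted word still scores highly.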


Journal Article
TL;DR: The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.
Abstract: Laboratory work assignments are very important for computer science learning. Over the last 12 years many students have been involved in solving such assignments in the authors' department, having reached a figure of more than 400 students doing the same assignment in the same year. This number of students has required teachers to pay special attention to conceivable plagiarism cases. A plagiarism detection tool has been developed as part of a full toolset for helping in the management of the laboratory work assignments. This tool defines and uses four similarity criteria to measure how similar two assignment implementations are. The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.

87 citations


Book Chapter
04 Sep 2008
TL;DR: A new method called MLPlag is proposed for plagiarism detection in multilingual environments; based on an analysis of word positions, it identifies the replacement of synonyms used by plagiarists to hide the document match.
Abstract: Multilingual text processing has been gaining more and more attention in recent years. This trend has been accentuated by the global integration of European states and the vanishing cultural and social boundaries. Multilingual text processing has become an important field bringing a lot of new and interesting problems. This paper describes a novel approach to multilingual plagiarism detection. We propose a new method called MLPlag for plagiarism detection in multilingual environments. This method is based on an analysis of word positions. It uses the EuroWordNet thesaurus, which transforms words into a language-independent form. This makes it possible to identify documents plagiarized from sources written in other languages. Special techniques, such as semantic-based word normalization, were incorporated to refine our method. It identifies the replacement of synonyms used by plagiarists to hide the document match. We performed and evaluated experiments on monolingual and multilingual corpora; the results are presented in this paper.

60 citations
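A minimal sketch of the underlying idea: map words to language-independent concepts before comparing documents. The CONCEPTS table is a hypothetical stand-in for EuroWordNet (no real EuroWordNet interface is used), and concept bigrams are an assumed, order-sensitive proxy for MLPlag's analysis of word positions, which the abstract does not specify.

```python
# Hypothetical toy thesaurus: (language, word) -> language-independent concept ID.
# It only stands in for EuroWordNet; the IDs and entries are invented for illustration.
CONCEPTS = {
    ("en", "car"): "C1", ("en", "automobile"): "C1", ("cs", "auto"): "C1",
    ("en", "fast"): "C2", ("en", "quick"): "C2", ("cs", "rychlé"): "C2",
    ("en", "is"): "C3", ("cs", "je"): "C3",
}


def to_concepts(text, lang):
    """Replace each word by its concept ID; unknown words keep a language-tagged form."""
    return [CONCEPTS.get((lang, w), f"{lang}:{w}") for w in text.lower().split()]


def bigrams(seq):
    """Adjacent pairs, so the comparison is sensitive to word order."""
    return {(seq[i], seq[i + 1]) for i in range(len(seq) - 1)}


def concept_overlap(text_a, lang_a, text_b, lang_b):
    """Share of text_a's concept bigrams that also occur in text_b."""
    a = bigrams(to_concepts(text_a, lang_a))
    b = bigrams(to_concepts(text_b, lang_b))
    return len(a & b) / len(a) if a else 0.0


print(concept_overlap("the car is fast", "en", "the automobile is quick", "en"))  # 1.0
print(concept_overlap("the car is fast", "en", "auto je rychlé", "cs"))
```

With this toy table, synonym substitution ("car" to "automobile", "fast" to "quick") leaves the concept sequence unchanged, and the Czech sentence shares two of the three English concept bigrams.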


Journal Article
TL;DR: Use of plagiarism detection software in evaluation of essays and consequent penalties had effectively deterred students from plagiarizing.
Abstract: The purpose of this study was to evaluate the effectiveness of plagiarism detection software, and of a penalty for plagiarizing, in detecting and deterring plagiarism among medical students. The study was a continuation of previously published research in which second-year medical students from the 2001/2002 and 2002/2003 school years were required to write an essay based on one of four scientific articles offered by the instructor. Students from 2004/2005 (N = 92) included in the present study were given the same task. The topics of two of the four articles were considered less complex, and two were more complex. One less complex and one more complex article were available only as hard copies, whereas the other two were available in electronic format. The students from 2001/2002 (N = 111) were only told to write an original essay, whereas the students from 2002/2003 (N = 87) were additionally warned against plagiarism and told what plagiarism was and how to avoid it. The students from 2004/2005 were warned that their essays would be examined by plagiarism detection software and that those who had plagiarized would be penalized. Students from 2004/2005 plagiarized significantly less of their essays than students from the previous two groups (2% vs. 17% vs. 21%, respectively, P < 0.001). Over time, students more frequently chose articles with more complex subjects (P < 0.001) and articles in electronic format (P < 0.001) as sources for their essays, but this did not influence the rate of plagiarism. The use of plagiarism detection software in the evaluation of essays, together with the consequent penalties, effectively deterred students from plagiarizing.

51 citations


Book Chapter
25 Aug 2008
TL;DR: A new method for solving associations of phrases contained in text documents, called SVDPlag, employs Singular Value Decomposition (SVD) for this purpose; it significantly improves the accuracy of plagiarism detection and outperforms other methods.
Abstract: Plagiarism is a widespread problem that is a major focus of interest these days. In this paper, we propose a new method for solving associations of phrases contained in text documents. This method, called SVDPlag, employs Singular Value Decomposition (SVD) for this purpose. Further, we discuss other approaches to plagiarism detection and compare them with our method. To examine the efficiency of plagiarism detection methods, we used an experimental corpus of 950 text documents about politics, which were created from the standard CTK corpus. The experiments indicate that our approach significantly improves the accuracy of plagiarism detection and outperforms other methods.

41 citations
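The abstract does not detail how SVDPlag applies SVD. As an illustration under stated assumptions, the sketch below uses SVD in the familiar latent semantic analysis style: it builds a term-document count matrix over single words (not phrases), keeps the top k singular values, and compares documents by cosine similarity in the reduced space. The rank k=2 and the toy documents are arbitrary choices.

```python
import numpy as np


def lsa_similarity(docs, k=2):
    """Pairwise cosine similarity of documents after a rank-k SVD of the term-document matrix."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    a = np.zeros((len(vocab), len(docs)))          # rows: terms, columns: documents
    for j, d in enumerate(docs):
        for w in d.split():
            a[index[w], j] += 1
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ vt[:k]).T          # one k-dimensional vector per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.where(norms == 0, 1, norms)
    return unit @ unit.T


docs = [
    "election vote parliament minister",
    "parliament vote election coalition",
    "football match goal striker",
]
sim = lsa_similarity(docs)
print(np.round(sim, 3))
```

The two politics documents, which share three of four words, collapse onto the same direction in the rank-2 space, while the sports document stays orthogonal to both.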


Journal Article
TL;DR: PDE4Java is a plagiarism detection engine for Java consisting of three main phases: Java tokenisation, similarity measurement, and clustering; it provides a visual representation of each cluster in addition to the textual representation.
Abstract: The educational community across the world is facing the increasing problem of plagiarism. The proposed Plagiarism Detection Engine for Java (PDE4Java) detects code plagiarism by applying data mining techniques. The engine consists of three main phases: Java tokenisation, similarity measurement, and clustering. It has an optional default tokeniser that makes it flexible enough to be used with almost any programming language. The system provides a visual representation of each cluster in addition to the textual representation. The simulation results of PDE4Java showed performance comparable to that of JPlag, and it exceeded expectations when compared with the domain experts' findings.

41 citations
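To illustrate why a tokenisation phase helps, here is a hypothetical sketch, not PDE4Java's implementation: identifiers and numeric literals are collapsed to placeholder tokens so that renaming variables does not hide a copy, and similarity is measured as Jaccard overlap of token trigrams. The keyword subset, the trigram size, and the Jaccard measure are all assumptions; the clustering phase is not shown.

```python
import re

# Illustrative subset of Java keywords that are kept verbatim (not a complete list).
KEYWORDS = {"int", "double", "boolean", "for", "while", "if", "else", "return",
            "new", "public", "static", "void", "class"}

# Identifiers, numbers, a few two-character operators, then any other single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|\+\+|--|&&|\|\||\S")


def tokenise(code):
    """Normalise Java-like source so that renamed identifiers and changed literals still match."""
    tokens = []
    for t in TOKEN_RE.findall(code):
        if t in KEYWORDS:
            tokens.append(t)
        elif t[0].isalpha() or t[0] == "_":
            tokens.append("ID")
        elif t[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(t)  # operators and punctuation are kept as they are
    return tokens


def similarity(code_a, code_b, n=3):
    """Jaccard similarity of token n-grams (an assumed measure, not PDE4Java's)."""
    def grams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    a, b = grams(tokenise(code_a)), grams(tokenise(code_b))
    return len(a & b) / len(a | b) if a | b else 1.0


original = "int total = 0; for (int i = 0; i < n; i++) { total += a[i]; }"
renamed = "int sum = 0; for (int k = 0; k < len; k++) { sum += arr[k]; }"
print(similarity(original, renamed))  # 1.0: only names differ
```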


Journal Article
TL;DR: Two leading plagiarism detection tools, TurnItIn and MyDropBox, are contrasted in detecting submissions that were obviously plagiarized from articles published in IEEE journals.
Abstract: Several tools are marketed to the educational community for plagiarism detection and prevention. This article briefly contrasts the performance of two leading tools, TurnItIn and MyDropBox, in detecting submissions that were obviously plagiarized from articles published in IEEE journals. Both tools performed poorly because they do not compare submitted writings to publications in the IEEE database. Moreover, these tools do not cover the Association for Computing Machinery (ACM) database or several others that are important for scholarly work in software engineering. Reports from these tools suggesting that a submission has “passed” can encourage false confidence in the integrity of a submitted writing. Additionally, students can submit drafts to determine the extent to which these tools detect plagiarism in their work. Because these tools sample the engineering professional literature narrowly, a student who chooses to plagiarize can use them to determine what plagiarism will be invisible to the faculty member. An appearance of successful plagiarism prevention may in fact reflect better training of students to avoid plagiarism detection.

34 citations


Proceedings Article
12 Jun 2008
TL;DR: Some issues that might be raised in employing Turnitin are highlighted and some approaches that academics might utilise to allow efficient use of the system are suggested.
Abstract: The Turnitin plagiarism detection system allows individual student assignments to be uploaded and matched for similarity with content on the web, all other assignments uploaded by institutions using the system and certain journals. An online report is produced for each submission identifying the sources of those similarities and the percentage match. There is a significant benefit in using Turnitin to identify possible cases of plagiarism. This paper highlights some issues that might be raised in employing Turnitin and suggests some approaches that academics might utilise to allow efficient use of Turnitin.

29 citations


Journal Article
TL;DR: This study investigates three aspects of academic dishonesty, identifies students' preferred strategies for managing course work perceived as too difficult, and measures students' preferences for choosing sides in ethical conflicts.
Abstract: Course work plagiarism among university students is often attributed to ignorance about plagiarism or to an assignment's level of difficulty. Students submit other people's work when they are unable to solve an assignment themselves. This study, based on 233 student responses from four cultural regions, investigates three aspects of academic dishonesty. First, the study identifies students' preferred strategies for managing course work perceived as too difficult. Second, students' preferences for responding to requests for help from fellow students are investigated. Finally, the study measures students' preferences for choosing sides in ethical conflicts. Seven strategies for managing difficult course work, six strategies for responding to requests for help, and five key parties in ethical conflicts are studied using a pair-wise comparison method. The results show that students prefer to collaborate and use the Internet. The impact of the teacher is smaller than expected. Factors including cultural origin, gender, level of study, and field of study have limited impact.

27 citations


Book
05 Nov 2008
TL;DR: This work is dedicated to the development and use of software instruments that help to reveal plagiarism, and to building a taxonomy of existing plagiarism detection methods according to their speed and reliability characteristics.
Abstract: This work is dedicated to the problem of computer-aided plagiarism detection, i.e., to the development and use of software instruments that help to reveal plagiarism. The creation of such tools raises specific algorithmic problems that deserve attention. The results covered in this work include: (a) building a taxonomy of existing plagiarism detection methods according to their speed and reliability characteristics; (b) studying and improving string matching algorithms used in plagiarism detection, introducing "tokenizers" for natural language texts, and applying natural language parsers to plagiarism detection in order to enhance the quality of the detectors; (c) optimizing the speed of string-matching-based plagiarism detection algorithms by applying a combined fast and reliable scoring scheme, and developing an efficient parameterized matching procedure; (d) developing a fast string-matching-based plagiarism detection algorithm.

26 citations


Proceedings Article
11 Nov 2008
TL;DR: In this paper, a plagiarism detection technique for Java programs that uses bytecode without referring to the source code is proposed; it can be used as a preliminary verification tool before detecting plagiarism by source code comparison.
Abstract: Most plagiarism detection systems evaluate the similarity of source code and detect plagiarized program pairs. If source code is used in plagiarism detection, source code security can become a significant problem. Plagiarism detection based on target code can be used to protect the security of source code. In this paper, we propose a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate token sequences from the Java class file by analyzing the code area of methods. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distribution of similarities for source code is very similar to that for bytecode. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.

Journal Article
TL;DR: Students perceived that plagiarism is an important issue; detection software makes it easier for lecturers; it is fair to use detection software; students support its use; and it will have some effect in preventing plagiarism, but students' concerns included being caught for unintentional plagiarism.
Abstract: The aim of this research was to determine student and staff perceptions of the effectiveness of plagiarism detection software. A mixed methods approach was undertaken, using a research model adapted from the literature. Eight hours of interviews were conducted with six students and six teaching staff from Curtin Business School at Curtin University of Technology, which had trialled the plagiarism detection software EVE2. A survey questionnaire was completed by 171 students involved in the trial. The summary indication was that students perceived that plagiarism is an important issue; detection software makes it easier for lecturers; it is fair to use detection software; students support its use; and it will have some effect in preventing plagiarism. However, students' concerns included being caught for unintentional plagiarism, teaching staff placing too much emphasis on detection results above student ability, and the accuracy of the software at detecting plagiarism. Concerns for teaching staff included the time taken for the detection process, limitation of the software to publicly based Internet sources and direct copying, and the extra workload involved with pursuing academic misconduct.

Proceedings Article
28 May 2008
TL;DR: Although Ac's visualizations were developed with plagiarism detection in mind, they should also prove effective to visualize distance matrices from other domains, as demonstrated by preliminary experiments.
Abstract: Programming assignments are easy to plagiarize in such a way as to foil casual reading by graders. Graders can resort to automatic plagiarism detection systems, which can generate a "distance" matrix that covers all possible pairings. Most plagiarism detection programs then present this information as a simple ranked list, losing valuable information in the process. The Ac system uses the whole distance matrix to provide graders with multiple linked visualizations. The graph representation can be used to explore clusters of highly related submissions at different filtering levels. The histogram representation presents compact "individual" histograms for each submission, complementing the graph representation in aiding graders during analysis. Although Ac's visualizations were developed with plagiarism detection in mind, they should also prove effective for visualizing distance matrices from other domains, as demonstrated by preliminary experiments.
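The graph view described in this abstract can be approximated as follows: treat each submission as a node, connect pairs whose distance is at or below a filtering threshold, and report the connected components. This sketch assumes that smaller distances mean more similar submissions and uses a made-up distance matrix; it is not Ac's code, and it does not reproduce the per-submission histograms.

```python
def clusters(dist, threshold):
    """Groups of submissions connected by distances <= threshold (connected components)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Singletons look unsuspicious at this level, so only groups of two or more are reported.
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical symmetric distance matrix for four submissions (0 = identical).
D = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.30, 0.90],
    [0.80, 0.30, 0.00, 0.95],
    [0.90, 0.90, 0.95, 0.00],
]
print(clusters(D, 0.2))   # [[0, 1]]
print(clusters(D, 0.35))  # [[0, 1, 2]]
```

Raising the threshold from 0.2 to 0.35 in this toy matrix pulls submission 2 into the cluster, which is the kind of change a grader would inspect when moving between filtering levels.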

Proceedings Article
08 Jul 2008
TL;DR: A new approach is proposed that reconstructs the evolution process of suspected texts in order to detect plagiarized documents, adopting the Weibull distribution, one of the extreme value distributions used to compute the statistical significance of genomic sequence matching.
Abstract: With smart word processors and powerful Web search engines, plagiarism is prevalent, especially in digital texts. It is therefore crucial to develop efficient and effective anti-plagiarism tools to prevent or identify document plagiarism. To date, a few plagiarism detection systems have been announced. All previous plagiarism detection studies focus on how to measure the similarity of documents. In this paper, we propose a new approach that reconstructs the evolution process of suspected texts in order to detect plagiarized documents. For this, we propose two major metrics: spatial plagiarism similarity and temporal plagiarism similarity. By combining these two similarity measures, we derive an evolutionary plagiarism probability model that adopts the Weibull distribution, one of the extreme value distributions used to compute the statistical significance of genomic sequence matching. The main difference between our model and previous studies is that our model can estimate plagiarism and its direction as a temporal event. An experiment on a group of Internet-posted news articles produced results that clearly coincided with the actual plagiarism among those articles.
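The abstract does not give the fitted model, so the sketch below only shows the generic step of turning a raw similarity score into a tail probability with a two-parameter Weibull distribution, P(X > x) = exp(-(x / scale)^shape). The shape and scale values are hypothetical, as if fitted to scores of unrelated document pairs; how the paper combines spatial and temporal similarity is not reproduced.

```python
import math


def weibull_tail(x, shape, scale):
    """P(X > x) for a two-parameter Weibull distribution: exp(-(x / scale) ** shape)."""
    if x <= 0:
        return 1.0
    return math.exp(-((x / scale) ** shape))


# Hypothetical parameters, as if fitted to similarity scores of unrelated document pairs.
SHAPE, SCALE = 2.0, 10.0
for score in (5.0, 10.0, 30.0):
    print(score, weibull_tail(score, SHAPE, SCALE))
```

A small tail probability means an unrelated pair would rarely reach that score, so the pair is more likely to reflect genuine copying.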

Marlin Thomas
17 Nov 2008
TL;DR: Faculty use of plagiarism detection software should be reevaluated because of issues related to its efficacy and because of ethical and legal concerns.
Abstract: Plagiarism is a pervasive form of academic dishonesty in collegiate settings. Since it distorts learning and assessment, deterring and detecting it are crucial to maintaining academic integrity. Large class sizes and an increase in writing assignments that result from writing across the curriculum combine to make detection of plagiarism burdensome. The concomitant rapid increase of written material on the Internet and its ease of appropriation contribute to the problem. Plagiarism detection software has emerged in response. The most prominent implementation compares submissions against items in a database and then adds them to that database. It outputs measures of possible plagiarism. Faculty use of the detection software should be reevaluated because of issues related to its efficacy and because of ethical and legal concerns.

01 Jan 2008
TL;DR: This paper presents a statement-based plagiarism detection approach for Arabic scripts using a fuzzy-set IR model, and shows that fuzzy-set IR successfully detected not only exact but also similar statements that have a different structure.
Abstract: The nature of Arabic language structure exposes the need for a fuzzy or vague concept to reveal dishonest practices in Arabic documents. In this paper, we present a statement-based plagiarism detection approach for Arabic scripts using a fuzzy-set IR model. The degree of similarity is calculated and compared to a threshold value to judge whether two statements are the same or different. We built a corpus in which all stopwords were removed and non-stop words were stemmed, as is typical for Arabic IR. The corpus has 100 documents with 4367 statements in total. Five query documents with about 250 plagiarized statements were constructed and tested. Experimental results show that fuzzy-set IR successfully detected not only exact but also similar statements that have a different structure. However, our Arabic fuzzy-set model does not handle rewording with different synonyms or antonyms, a deficiency that future work will address by modeling the system using an Arabic thesaurus. Keywords: fuzzy-set information retrieval; Arabic; plagiarism detection
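A sketch of one common fuzzy-set IR formulation, the keyword connection (term correlation) model, applied to statements: correlations c(i, l) = n_il / (n_i + n_l - n_il) are estimated from co-occurrence, and a statement's membership in the fuzzy set of term i is 1 - Π(1 - c(i, l)) over its terms. Whether the paper uses exactly these formulas is an assumption, the statements are English placeholder stems rather than stemmed Arabic, and the threshold is hypothetical.

```python
from itertools import combinations


def correlations(statements):
    """Build c(i, l) = n_il / (n_i + n_l - n_il) from term co-occurrence across statements."""
    n, n_pair = {}, {}
    for s in statements:
        terms = sorted(set(s))
        for t in terms:
            n[t] = n.get(t, 0) + 1
        for pair in combinations(terms, 2):
            n_pair[pair] = n_pair.get(pair, 0) + 1

    def c(i, l):
        if i == l:
            return 1.0
        n_il = n_pair.get(tuple(sorted((i, l))), 0)
        denom = n.get(i, 0) + n.get(l, 0) - n_il
        return n_il / denom if denom else 0.0

    return c


def membership(term, statement, c):
    """Degree to which a statement belongs to the fuzzy set of `term`."""
    prod = 1.0
    for l in set(statement):
        prod *= 1.0 - c(term, l)
    return 1.0 - prod


def fuzzy_similarity(query, candidate, c):
    """Mean membership of the query statement's terms in the candidate statement."""
    terms = set(query)
    return sum(membership(t, candidate, c) for t in terms) / len(terms) if terms else 0.0


# Placeholder stems (stopwords already removed); a real system would use stemmed Arabic.
corpus = [["student", "copy", "essay"], ["student", "write", "essay"], ["river", "flow", "sea"]]
c = correlations(corpus)
THRESHOLD = 0.8  # hypothetical cut-off for "same statement"
score = fuzzy_similarity(["essay", "student", "write"], ["student", "copy", "essay"], c)
print(round(score, 3), score >= THRESHOLD)
```

Here "write" does not occur in the candidate but co-occurs with "student" and "essay" elsewhere in the corpus, so it still contributes partial membership; this is how a fuzzy model can match statements that are similar without being identical.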

Proceedings Article
12 Jul 2008
TL;DR: A detection system using the evolved functions was found to be more accurate than the best code plagiarism detection system in use today, and it scales much better to large collections of files.
Abstract: Detecting whether computer program code is a student's original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but require appropriate similarity functions for good performance. We have used particle swarm optimization and genetic programming to evolve similarity functions that are suited to computer program code. Using a training set of plagiarised and non-plagiarised programs, we have evolved better parameter values for the previously published Okapi BM25 similarity function. We have then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure. We found that the evolved similarity functions outperformed the human-developed Okapi BM25 function. We also found that a detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today, and scales much better to large collections of files. The evolutionary computing techniques have been extremely useful in finding similarity functions that advance the state of the art in code plagiarism detection.
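For reference, the baseline that the authors tuned is the standard Okapi BM25 function. The sketch below uses the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant log((N - n + 0.5) / (n + 0.5) + 1); these are assumptions, not the evolved parameter values from the paper, and the token lists stand in for pre-tokenised program code.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenised document for a tokenised query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequency of each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            # The "+ 1" inside the log keeps IDF non-negative for very common terms.
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalisation (b).
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores


docs = [
    ["for", "ID", "=", "NUM", ";", "ID", "<", "ID"],
    ["while", "(", "ID", ">", "NUM", ")"],
    ["return", "ID", ";"],
]
query = ["for", "ID", "<", "ID"]
print(bm25_scores(query, docs))
```

The parameters k1 and b are exactly the kind of knobs that a search such as particle swarm optimization could tune; genetic programming goes further by replacing the formula itself.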

Book Chapter
01 Jan 2008
TL;DR: It is argued that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists.
Abstract: This chapter explores the question of plagiarism by international students (non-native speakers). It argues that the inappropriate use of electronic plagiarism detection systems (such as Turnitin) could lead to the unfair and unjust construction of international students as plagiarists. We argue that the use of detection systems should take into account the writing practices used by those who write as novices in a non-native language as well as the way “plagiarism” or plagiaristic forms of writing are valued in other cultures. It calls for a move away from a punitive, legalistic approach to plagiarism that equates copying with plagiarism, toward a progressive and formative approach. If taken up, such an approach will have very important implications for the way universities in the West deal with plagiarism in their learning and teaching practice as well as their disciplinary procedures.

01 Jan 2008
TL;DR: The issue of plagiarism is particularly contentious for technical and professional writers, as opposed to academic writers, because of the types of writing activities they regularly engage in; many students of professional writing fear that they may be “stealing,” or committing intellectual “theft,” whenever they make use of any existing material in their writing.
Abstract: Cases of plagiarism among professional writers have gained increasing media attention in recent years. As a result, many students of professional writing fear that they may be “stealing,” or committing intellectual “theft,” whenever they make use of any existing material in their writing. They have been warned against such uses by several sources. Instructors and university administrators tell them that they must follow plagiarism policies or risk earning failing grades or being expelled from the university. In the news they see their peers venture into the professional world and face public criticism and termination of contracts for acts of plagiarism. In addition, attention given to Turnitin.com and other “plagiarism detection technologies” has created a culture of fear among student writers who understand that such technologies may be used for policing their writing practices. (For more on Turnitin.com, see http://www.plagiarism.org/.) These stories and others have infiltrated conversations on many college campuses, warning student writers against copying with a seemingly simple message: “don’t steal.” However, as industry professionals in technical communication are well aware, the message is not that simple in our field. The issue of plagiarism is particularly contentious for technical and professional writers, as opposed to academic writers, because of the types of writing activities we regularly engage in. Technical communicators commonly perform a variety of types of composing activities that could be considered plagiarism in the context of the classroom. Such activities include:

25 Mar 2008
TL;DR: This article gives special emphasis to the text-matching software SafeAssignment™ and discusses and analyzes the advantages and disadvantages of using automated text-matching software.
Abstract: Academic dishonesty or plagiarism is a growing problem in today's digital world. The use of plagiarism detection tools can help faculty combat this form of academic dishonesty. In this article, special emphasis is given to the text-matching software SafeAssignment™. The advantages and disadvantages of using automated text-matching software are discussed and analyzed in detail.

Journal Article
TL;DR: The authors evaluate the flexibility and richness of two well-established text analysis plagiarism tools, through a consideration of the use of plagiarism detection software as a mechanism for the automated assessment of student-created narrative in a virtual learning environment (VLE).
Abstract: In this paper, the authors evaluate the flexibility and richness of two well-established text analysis plagiarism tools, through a consideration of the use of plagiarism detection software as a mechanism for the automated assessment of student-created narrative in a virtual learning environment (VLE). The authors are currently engaged in a project creating a prototype VLE, using technologies for multilevel and multiplayer games, based on the inherent support such an environment would provide for constructivist learning, engagement, and contextual socialization. Progress between levels in the VLE will be based on the creation, by the student, of a narrative linking together a number of conceptual elements obtained through game-play at that level. Support for the narrative creation process will help the student to contextualize the conceptual elements, providing the necessary linking elements or themes to enable the student to produce a coherent description of their understanding of the concepts. A particular challenge in such environments is the need for fast, real-time feedback to students to maintain the level of engagement and to support the game-play metaphor. Additionally, the student must be able to make as many attempts to progress as they need and it will be their decision when and how often to submit for assessment. Since the student narrative will be in a textual form and can therefore be related to a sample solution narrative, generated by the author of the level within the learning environment, the idea of using plagiarism detection software as the mechanism for automated comparison and assessment was considered appropriate for investigation. While the limitation of such tools would appear to be that they are seeking direct copies of text elements, the authors wanted to investigate whether they offered sufficient richness and fuzziness to detect common conceptually-linked texts. 
The initial decision was to experiment with text-analytic tools, since they are both widely used and readily available. The tools chosen were TurnItIn, a commercial tool provided to the U.K. higher education community by the U.K. Joint Information Systems Committee (JISC), and VALT/VAST, a set of tools created at the Centre for Interactive Systems Engineering at London South Bank University, London, U.K., the workings of which are based on recognized and well-published research. An experiment using a small group of students in a traditional assessment situation was carried out, and is described in detail. The rationale for this approach was that there is not yet a fully working prototype of the VLE in which to carry out such an experiment, but that the conditions necessary to test the hypothesis that plagiarism tools could be utilized for such a purpose could be replicated sufficiently to make such an experiment viable. The results of the experiment showed neither a correlation between the sample solution and student solutions, nor any correlation between the individual student solutions, a result consistent with the null hypothesis. This result indicates that these tools are not useful for the development of automated assessment within the VLE, and the authors are now giving consideration to the use of lexical analysis/tokenizer and other tools. However, it also suggests that these text-analytic plagiarism tools are too firmly focused on direct copying, which raises the question of whether they offer enough richness and fuzziness to detect a sophisticated plagiarism attempt using, for example, text replacement tools. An ongoing close relationship between research in automated assessment and plagiarism detection is also proposed, to achieve mutual benefit.

Proceedings Article
18 Jun 2008
TL;DR: A system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection is presented; preliminary results show that the platform framework has the potential for cross-lingual text relatedness evaluation and plagiarism detection.
Abstract: Research on plagiarism detection methods for monolingual texts (e.g., English texts) has been well established in recent years. However, little attention has been paid to facilitating plagiarism detection in cross-lingual text collections (e.g., English and Chinese texts). In this paper, we present a system platform for evaluating text similarity and relatedness in multilingual text collections for plagiarism detection. First, we used a number of selected texts from Chinese-English parallel corpora collected from the Internet to train text classifiers based on the Support Vector Machines (SVM) model, so that multilingual texts of unknown category can be classified by the trained classifiers. Subsequently, the resulting categorized texts were measured by means of a language-neutral clustering technique based on the Self-Organizing Maps (SOM) method to evaluate semantic similarity among texts. The preliminary results show that our platform framework has the potential for cross-lingual text relatedness evaluation and plagiarism detection.

Proceedings Article
12 Jul 2008
TL;DR: Aiming at plagiarism detection in Chinese academic papers, a chunk-based plagiarism detection algorithm with a chunk extraction method based on characters or words is proposed, together with two paragraph weight algorithms and three paragraph weight functions.
Abstract: Aiming at plagiarism detection in Chinese academic papers, we propose a chunk-based plagiarism detection algorithm with a chunk extraction method based on characters or words. Taking into account that different parts of a document have different importance, we propose two paragraph weight algorithms and define three paragraph weight functions. The best chunk lengths are determined by experiments. Experiments show that using paragraph weights can improve detection effectiveness.
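A hypothetical sketch of chunk-based detection with paragraph weights: each suspect paragraph is split into overlapping character chunks (suited to Chinese, which has no spaces between words), and the share of its chunks found in the source is averaged using per-paragraph weights. The chunk length n=4 and the weights are placeholders; the paper's two weighting algorithms and three weight functions are not given in the abstract.

```python
def chunks(text, n):
    """Set of overlapping character chunks of length n (character-based extraction)."""
    text = "".join(text.split())  # drop whitespace; Chinese has no spaces between words anyway
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def weighted_overlap(suspect_paragraphs, source_text, weights, n=4):
    """Weighted mean, over paragraphs, of the share of each paragraph's chunks found in the source."""
    source = chunks(source_text, n)
    score = total = 0.0
    for para, w in zip(suspect_paragraphs, weights):
        own = chunks(para, n)
        if not own:  # paragraphs shorter than n characters carry no evidence
            continue
        score += w * len(own & source) / len(own)
        total += w
    return score / total if total else 0.0


source_text = "抄袭检测是学术诚信的重要保障"
suspect = ["抄袭检测是学术诚信的重要保障", "今天天气很好我们去公园"]
weights = [2.0, 1.0]  # hypothetical: the first paragraph counts twice as much
print(weighted_overlap(suspect, source_text, weights))
```

Because the fully copied paragraph carries twice the weight of the original one, the document-level score is 2/3 rather than the unweighted 1/2.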

Journal ArticleDOI
TL;DR: Combining technology and policy can be effective in curtailing blatant plagiarism within large technology courses, and a significant decrease in the number of projects being duplicated is demonstrated.
Abstract: Over one in ten students surveyed have admitted to copying programs in courses with computer assignments. The ease with which digital coursework can be copied and the impracticality of manually checking for plagiarized projects in large courses have only compounded the problem. As current research has focused predominantly on detecting plagiarism in textual applications such as source code and documents, there is a gap in detecting plagiarism in graphically driven applications. This paper focuses on the effectiveness of a technological tool in detecting plagiarized projects in a course using Microsoft Access. Seven semesters of data were collected from a large technology-oriented course in which the tool had been in use. Comparing semesters before and after the technological tool was introduced demonstrates a significant decrease in the number of projects being duplicated. The results indicate that combining technology and policy can be effective in curtailing blatant plagiarism within large technology courses.

Proceedings ArticleDOI
01 Nov 2008
TL;DR: A numerical comparison algorithm is proposed that is competitive in computation time without losing the word order of common parts in full-text document plagiarism.
Abstract: Plagiarism is a form of academic misconduct that has increased with easy access to information through electronic documents and the Internet. Finding plagiarism in a full-text document can be viewed as the problem of finding the longest common parts of strings. Moreover, the detection system must be able to determine and visualize not only the common parts but also their locations in both the source and the observed document. Unlike previous research, this paper proposes a numerical comparison algorithm that is competitive in computation time without losing the word order of the common parts. In the experiments, the proposed algorithm outperforms the suffix tree when the observed paragraph is shorter than one hundred words.
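The paper's numerical algorithm is not spelled out here, so the following is only a hedged illustration of the general idea: words are encoded as integers through a shared vocabulary, and a standard dynamic-programming longest-common-substring search (a swapped-in technique, not the authors' method) reports maximal common word runs with their positions in both documents, preserving word order. It runs in O(n*m) time, so it shows the kind of output required rather than the claimed speed. `encode` and `common_runs` are hypothetical names.

```python
def encode(words, vocab):
    """Map each word to an integer id; one shared vocabulary keeps ids consistent."""
    return [vocab.setdefault(w.lower(), len(vocab)) for w in words]

def common_runs(source, observed, min_len=3):
    """Return (source_start, observed_start, length) for maximal shared word runs."""
    vocab = {}
    a = encode(source.split(), vocab)
    b = encode(observed.split(), vocab)
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous source word
    runs = []
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                length = cur[j]
                # A run is maximal when the next words differ or either text ends.
                extends = i < len(a) and j < len(b) and a[i] == b[j]
                if length >= min_len and not extends:
                    runs.append((i - length, j - length, length))
        prev = cur
    return runs
```

Because each run carries a start index in both texts, the same output can drive highlighting of the shared passage in the source and in the observed document.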

Journal Article
TL;DR: The proposed phylogeny construction algorithm is quite successful in reconstructing the evolutionary direction, which enables plagiarized code to be identified more accurately and reliably, and it has been implemented on top of the plagiarism detection system of an automatic program evaluation system.
Abstract: Program plagiarism is widespread owing to intelligent software and the global Internet environment. Consequently, detecting plagiarized source code and software is becoming important, especially in the academic field. Although numerous studies have been reported on detecting plagiarized pairs of programs, we could not find any in-depth work on understanding the underlying mechanisms of plagiarism. In this paper, we study the evolutionary process of source code, regarding the plagiarism procedure as a series of evolutionary steps. The final goal of our paper is to reconstruct a tree depicting the evolution process of the source code. To this end, we extend a well-known bioinformatics approach, local alignment, to detect regions of similar code with an adaptive scoring matrix. The asymmetric code similarity based on local alignment is one of the main contributions of this paper. The phylogenetic tree, or evolution tree, of the source code can be reconstructed using this asymmetric measure. To show the effectiveness and efficiency of the phylogeny construction algorithm, we conducted experiments with more than 100 real programs obtained from the East-Asia ICPC (International Collegiate Programming Contest). Our experiments showed that the proposed algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized code more accurately and reliably. The phylogeny construction algorithm has also been implemented on top of the plagiarism detection system of an automatic program evaluation system.
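A minimal sketch of local alignment over code token sequences, using a fixed scoring scheme (match +2, mismatch -1, linear gap -1) in place of the paper's adaptive scoring matrix. The normalization that makes the score asymmetric (dividing by the first sequence's self-alignment score) is a hypothetical choice for illustration; `smith_waterman` and `asymmetric_similarity` are illustrative names.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local-alignment score between token sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamp at 0 so an alignment can start fresh anywhere (the "local" part).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def asymmetric_similarity(a, b):
    """Share of a's self-alignment score recovered when a is aligned against b."""
    self_score = smith_waterman(a, a)
    return smith_waterman(a, b) / self_score if self_score else 0.0
```

With this normalization, a short program fully contained in a longer one scores 1.0 in one direction and lower in the other; a direction-sensitive signal of this kind is what a phylogeny of plagiarized code needs, though the paper's own measure may differ.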

Journal Article
TL;DR: This paper presents preliminary results of applying an anti-plagiarism algorithm to an archive of 1000+ submissions, discusses problems in the implementation of existing anti-plagiarism systems, and describes an open architecture that could be used for plagiarism detection in different kinds of assignments.
Abstract: Plagiarism is a problem in many educational institutions around the world. Preventing digital plagiarism requires an enormous amount of work from educators. In this paper we concentrate on implementing a well-known anti-plagiarism algorithm for local and global search for the original source of a plagiarized assignment. We first discuss problems in the implementation of existing anti-plagiarism systems, and then describe an open architecture that could be used for plagiarism detection in different kinds of assignments, from plain text to audio submissions. Finally, we present preliminary results of applying the algorithm to an archive of 1000+ submissions. We hope this paper will contribute to the discussion of anti-plagiarism systems, especially for new types of assignments.

01 Jan 2008
TL;DR: This paper evaluates an Information Retrieval approach to dealing with plagiarism through Vector Spaces, which allows the detection of similarities that are not the result of naive copy-and-paste.
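A minimal sketch of a vector-space comparison, assuming TF-IDF term weighting (raw term frequency times log(N/df)) and cosine similarity; the paper's exact weighting and preprocessing are not stated here, and `tfidf_vectors` and `cosine` are hypothetical names. Because a bag-of-words vector ignores word order, a reworded or reordered sentence can still score high, which is one way such an approach goes beyond spotting verbatim copies.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weighted term vectors (term -> weight dicts) for tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms present in every document get weight log(1) = 0.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```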

Journal Article
TL;DR: This paper proposes a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code; experiments show that the distributions of source-code similarities and bytecode similarities are very similar.
Abstract: Most plagiarism detection systems evaluate the similarity of source code and detect plagiarized program pairs. If source code is used in plagiarism detection, source code security can be a significant problem. Plagiarism detection based on target code can be used to protect the security of source code. In this paper, we propose a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate token sequences from the Java class file by analyzing the code area of each method. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distribution of similarities for source code is very similar to that for bytecode. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.
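A minimal sketch of the tokenization step, assuming the bytecode is available as `javap -c` disassembly text rather than parsed directly from the class file as in the paper. Offsets and operands are dropped and slot suffixes such as `_1` are stripped, so renaming or renumbering local variables does not change the token sequence. For brevity, `difflib.SequenceMatcher` stands in for the paper's adaptive local alignment; `bytecode_tokens` and `token_similarity` are hypothetical names.

```python
import re
from difflib import SequenceMatcher

# Matches an instruction line of `javap -c` output, e.g. "       4: iload_1".
INSN = re.compile(r"^\s*\d+:\s+([a-z_0-9]+)")

def bytecode_tokens(javap_text):
    """Opcode tokens from javap -c text, without offsets, operands, or slot suffixes."""
    tokens = []
    for line in javap_text.splitlines():
        m = INSN.match(line)
        if m:
            # iload_1 and iload_2 both become "iload".
            tokens.append(re.sub(r"_\d+$", "", m.group(1)))
    return tokens

def token_similarity(a, b):
    """Similarity in [0, 1]; difflib is a stand-in, not the paper's alignment method."""
    return SequenceMatcher(None, a, b).ratio()
```

In this sketch, a static method and an instance method with the same body differ only in local-variable slots, so their token sequences and similarity are identical.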

Journal ArticleDOI
10 Apr 2008-BMJ
TL;DR: CrossCheck, a plagiarism detection service, is to be offered by the independent publishers’ membership association CrossRef, which functions as a sort of digital switchboard for articles from several hundred scholarly and professional publishers.
Abstract: Editors of scientific journals will soon have a new weapon at their disposal in the fight against research misconduct, delegates at the annual meeting of the Committee on Publication Ethics, held in London last week, were told. From June, when the service is scheduled to launch, researchers and editorial staff will be able to access CrossCheck, a plagiarism detection service offered by the independent publishers’ membership association CrossRef. CrossRef is a collaborative reference linking service that functions as a sort of digital switchboard for articles from several hundred scholarly and professional publishers. Each item has …