Showing papers on "Plagiarism detection" published in 2006


Proceedings ArticleDOI
20 Aug 2006
TL;DR: A new plagiarism detection tool, called GPLAG, is developed, which detects plagiarism by mining program dependence graphs (PDGs) and is more effective than state-of-the-art tools for plagiarism detection.
Abstract: Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source projects for its own products. Although current plagiarism detection tools appear sufficient for academic use, they are nevertheless short for fighting against serious plagiarists. For example, disguises like statement reordering and code insertion can effectively confuse these tools. In this paper, we develop a new plagiarism detection tool, called GPLAG, which detects plagiarism by mining program dependence graphs (PDGs). A PDG is a graphic representation of the data and control dependencies within a procedure. Because PDGs are nearly invariant during plagiarism, GPLAG is more effective than state-of-the-art tools for plagiarism detection. In order to make GPLAG scalable to large programs, a statistical lossy filter is proposed to prune the plagiarism search space. Experiment study shows that GPLAG is both effective and efficient: It detects plagiarism that easily slips over existing tools, and it usually takes a few seconds to find (simulated) plagiarism in programs having thousands of lines of code.

467 citations


Journal Article
TL;DR: This paper discusses the complex general setting, then reports on some results of plagiarism detection software, and draws attention to the fact that any serious investigation in plagiarism turns up rather unexpected side-effects.
Abstract: Plagiarism in the sense of "theft of intellectual property" has been around for as long as humans have produced work of art and research. However, easy access to the Web, large databases, and telecommunication in general, has turned plagiarism into a serious problem for publishers, researchers and educational institutions. In this paper, we concentrate on textual plagiarism (as opposed to plagiarism in music, paintings, pictures, maps, technical drawings, etc.). We first discuss the complex general setting, then report on some results of plagiarism detection software and finally draw attention to the fact that any serious investigation in plagiarism turns up rather unexpected side-effects. We believe that this paper is of value to all researchers, educators and students and should be considered as seminal work that hopefully will encourage many still deeper investigations.

339 citations


Book ChapterDOI
10 Apr 2006
TL;DR: It is shown that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style, and new features for the quantification of style aspects are added.
Abstract: Current research in the field of automatic plagiarism detection for text documents focuses on algorithms that compare plagiarized documents against potential original documents. Though these approaches perform well in identifying copied or even modified passages, they assume a closed world: a reference collection must be given against which a plagiarized document can be compared. This raises the question whether plagiarized passages within a document can be detected automatically if no reference is given, e. g. if the plagiarized passages stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism detection. The paper is devoted to this problem class; it shows that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style. Our contributions are fourfold: (i) a taxonomy of plagiarism delicts along with detection methods, (ii) new features for the quantification of style aspects, (iii) a publicly available plagiarism corpus for benchmark comparisons, and (iv) promising results in non-trivial plagiarism detection settings: in our experiments we achieved recall values of 85% with a precision of 75% and better.

189 citations
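
The intrinsic detection idea in the entry above (flagging passages whose writing style deviates from the rest of the same document, with no reference collection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the single style feature (average word length), the z-score rule, and the names `avg_word_length` and `style_outliers` are hypothetical and are not the features or method of the paper.

```python
import statistics


def avg_word_length(text):
    """One deliberately simple style feature: mean word length in characters."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def style_outliers(passages, z_threshold=1.5):
    """Return indices of passages whose feature value deviates strongly
    from the document-wide mean (assumed rule: |z-score| > z_threshold)."""
    feats = [avg_word_length(p) for p in passages]
    mean = statistics.mean(feats)
    spread = statistics.pstdev(feats)
    if spread == 0:
        return []  # perfectly uniform style: nothing to flag
    return [i for i, f in enumerate(feats) if abs(f - mean) / spread > z_threshold]


passages = [
    "the cat sat on the mat",
    "a dog ran to the park",
    "we ate fish and rice",
    "he had a red hat",
    "epistemological considerations notwithstanding heterogeneous methodologies",
]
print(style_outliers(passages))
```

A realistic detector would combine many style features and use longer passages; a single feature on short passages is far too noisy to support an accusation.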


Proceedings ArticleDOI
01 Feb 2006
TL;DR: Plaggie is a stand-alone, open-source plagiarism detection engine for Java exercises; unlike the JPlag web service, it must be installed locally, and it is apparently the only open-source engine of its kind.
Abstract: A source code plagiarism detection engine, Plaggie, is presented. It is a stand-alone Java application that can be used to check Java programming exercises. Plaggie's functionality is similar to that of the previously published JPlag web service, but unlike JPlag, Plaggie must be installed locally and its source code is open. Apparently, Plaggie is the only open-source plagiarism detection engine for Java exercises.

92 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: In this paper, the authors describe a large-scale application of methods for finding plagiarism in research document collections, applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines.
Abstract: We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.

90 citations


01 Jan 2006
TL;DR: This work proposes a novel approach, XPlag, to detect plagiarism involving multiple languages using intermediate program code produced by a compiler suite, and shows that it can detect inter-lingual plagiarism with reasonably good precision.
Abstract: Plagiarism is a widespread problem in assessment tasks; in computing courses, students often plagiarise source code. For all but the smallest classes, manual detection of such plagiarism is impractical, and, while automated tools are available, none has been applied to detect inter-lingual plagiarism, where source code is copied from one language to another. In this work, we propose a novel approach, XPlag, to detect plagiarism involving multiple languages using intermediate program code produced by a compiler suite. We describe experiments to evaluate XPlag, and show that we can detect inter-lingual plagiarism with reasonably good precision.

85 citations


Journal ArticleDOI
TL;DR: The authors used electronic plagiarism detection tools to help students understand correct academic practice in using source material, and found that 41% of students had submitted work identified by Turnitin as possible plagiarism but this reduced to 26% on inspection by academics.
Abstract: Lessons on paraphrasing and citing sources can only be partially effective if they are not perceived as immediately relevant to the individual student. We used electronic plagiarism detection tools to help students understand correct academic practice in using source material. In order to produce an essay on a specified topic, students were required to summarise a number of research papers. The 182 students who took part in this exercise were studying one-year Masters programmes in Computer Science, Automotive Engineering, and Electronics, mainly from China, India and Pakistan and new to the University. These students should have been building on previous study both in subject matter and study skills, but before they tackled the assignment, a series of lectures gave guidance on finding and summarising sources, and reminded students about what constitutes plagiarism. The students' essays were submitted to Turnitin and Ferret -- a straightforward, but resource intensive process -- and the resulting reports used to give individual feedback to students on how original their words appeared to be. This was effective in helping the students to understand plagiarism, because the reports identified plagiarised passages in their own work. Using a threshold of 15% of matching text, we found 41% of students had submitted work identified by Turnitin as possible plagiarism but this reduced to 26% on inspection by academics. After a second submission, incidence of plagiarism dropped to 3% overall. We found that the degree of matching text found correlated with a student's programme of study, but not with nationality.

85 citations


Journal ArticleDOI
TL;DR: It is argued that if online detection is used in conjunction with the many valuable ‘anti‐plagiarism’ resources and tutorials available on the web, it really can become a positive teaching aid for staff and students alike, rather than a threatening online policing system.
Abstract: Although the exponential growth of the Internet has made it easier than ever to carry out plagiarism, it has also made it much easier to detect. This paper gives an overview of the many different methods of detecting web‐based plagiarism which are currently available, assessing practical matters such as cost, functionality and performance. Different types of plagiarism detection services are briefly outlined by broad category. The paper then considers the relative advantages and disadvantages of the different methods, referring to comparative studies where possible. It also draws out some of the more general drawbacks of electronic detection, ranging from practical matters such as technical restrictions, data protection issues, and cost, to the human impact on staff and students alike. It seeks to counterbalance these drawbacks by outlining the many possible benefits of implementing online detection in academic institutions, aside from the obvious saving of time when dealing with large cohorts of students...

72 citations


Journal ArticleDOI
TL;DR: Suggestions for developing a coordinated institutional policy on plagiarism are offered, and it is predicted that students will resort to increased use of paraphrase in order to drop below the radar of the detection software.
Abstract: The ready availability of Internet resources has made it easier than ever for students to plagiarize and many higher education institutions have resorted to checking essays with plagiarism detection software. Student behaviour is likely to change in response to this increased scrutiny but not necessarily in the desired direction. Internet technology facilitates a ‘cut and paste’ assembly‐line approach to essay writing that will persist despite the use of plagiarism software. It is predicted that students will resort to increased use of paraphrase in order to drop below the radar of the detection software. To illustrate this trend, samples of student essays are analysed and limitations of plagiarism software discussed. The paper concludes with suggestions for developing a coordinated institutional policy on plagiarism, and recommends that policy encompass training and educational initiatives to complement any enforcement strategy using plagiarism software.

65 citations


Proceedings ArticleDOI
03 Mar 2006
TL;DR: The design of a software tool is reported on that implements a fast and accurate plagiarism detection algorithm using the Google Web API and empirical results of a performance and accuracy study are presented.
Abstract: Plagiarism of material from the Internet is a widespread and growing problem. Computer science students, and those in other science and engineering courses, can sometimes get away with a "cut and paste" approach to assembling a paper in part because the expected style of technical writing is less expositional than in liberal arts courses. Detection of cut and paste plagiarism is time-consuming when done by hand, and can be greatly aided by automated software tools. This paper reports on the design of a software tool called SNITCH that implements a fast and accurate plagiarism detection algorithm using the Google Web API. Issues related to plagiarism detection software are discussed and empirical results of a performance and accuracy study are presented.

58 citations


Proceedings ArticleDOI
13 Dec 2006
TL;DR: A recursive plagiarism evaluation function based on the Levenshtein edit distance is proposed, to be evaluated at each level of the document structure, together with a method that avoids unnecessary comparisons by skipping similarity calculation for chunks that do not share enough 4-grams.
Abstract: The paper presents the implementation of a tool for plagiarism detection developed within the AXMEDIS project. The algorithm leverages the plagiarist's behaviour, which is modeled as a combination of three basic actions: insertion, deletion, and substitution. We recognize that this behaviour may occur at various levels of the document structure: the plagiarist may insert, delete or substitute a word, period or a paragraph. The procedure consists of two main steps: document structure extraction and plagiarism function calculation. We propose a recursive plagiarism evaluation function to be evaluated at each level of the document structure which is based on the Levenshtein edit distance. We also propose a method that eliminates unnecessary chunk comparisons, avoiding similarity calculation for chunks which do not share enough 4-grams. We describe the similarity algorithm and discuss some implementation issues and future work.
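
The two ingredients named in this abstract, an edit distance built from insertion, deletion and substitution, and a 4-gram prefilter that skips hopeless comparisons, can be sketched at the level of a single pair of chunks. This is a minimal sketch under stated assumptions, not the AXMEDIS implementation: it does not recurse over the document structure, and the character-level 4-grams, the `min_shared` cutoff, the normalisation by the longer chunk, and all function names are hypothetical choices made for illustration.

```python
def char_ngrams(text, n=4):
    """Return the set of character n-grams occurring in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def chunk_similarity(a, b, min_shared=3):
    """Similarity in [0, 1]. Chunks sharing fewer than min_shared 4-grams
    are scored 0.0 without running the quadratic edit-distance step."""
    if len(char_ngrams(a) & char_ngrams(b)) < min_shared:
        return 0.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(chunk_similarity("the quick brown fox jumps", "the quick brown cat jumps"))
```

The prefilter is cheap (set intersection) compared with the edit distance (time proportional to the product of the chunk lengths), which is why filtering first pays off when most chunk pairs are unrelated.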

Journal ArticleDOI
Robert Evans
TL;DR: The key findings are that the service did identify examples of poor scholarship and unfair practice that had been missed under the usual marking system but that rigorously checking every script for plagiarism was impractical and trust and student honesty remain central to a successful academic system.
Abstract: Plagiarism by students is seen as an increasing problem. The fear is that students will use the internet to obtain analysis, interpretation or even complete assignments and then submit these as their own work. Electronic plagiarism detection services may help to prevent such unfair practice but, in doing so, they create a new problem: certifying the absence of plagiarism. This article reports the results of an evaluation of one such service within an interdisciplinary school of social sciences. The article describes how the system works and the experiences of staff and students in using the service, together with an evaluation of the data generated. The key findings are that the service did identify examples of poor scholarship and unfair practice that had been missed under the usual marking system but that rigorously checking every script for plagiarism was impractical. Trust and student honesty thus remain central to a successful academic system.

01 Jan 2006
TL;DR: This paper reports an experiment on automatically detecting whether students, especially weaker ones, have used free online MT to do their translation homework, using methods from computational stylometry, plagiarism detection, text reuse, and MT evaluation.
Abstract: The ready availability of free online machine translation (MT) systems has given rise to a problem in the world of language teaching in that students – especially weaker ones – use free online MT to do their translation homework. Apart from the pedagogic implications, one question of interest is whether we can devise any techniques for automatically detecting such use. This paper reports an experiment which aims to address this particular problem, using methods from the broader world of computational stylometry, plagiarism detection, text reuse, and MT evaluation. A pilot experiment comparing ‘honest’ and ‘derived’ translations produced by 25 intermediate learners of Spanish, Italian and

Journal Article
TL;DR: Modern plagiarism software solutions are considered, paying attention mostly to desktop systems intended for plagiarism detection in program code, and their capabilities, advantages and disadvantages are estimated.
Abstract: Plagiarism in universities has always been a difficult problem to overcome. Various tools have been developed over the past few years to help teachers detect plagiarism in students' work. By being able to categorize the multitude of plagiarism detection tools, it is possible to estimate their capabilities, advantages and disadvantages. In this article I consider modern plagiarism software solutions, paying attention mostly to desktop systems intended for plagiarism detection in program code. I also estimate the speed and reliability of different plagiarism detection systems that are currently available.

01 Jan 2006
TL;DR: An original method of spectral similarity analysis for plagiarism detection in university projects is presented, based on a clone detection tool called CLAN that performs metrics-based similarity analysis of source code fragments.
Abstract: An original method of spectral similarity analysis for plagiarism detection in university projects is presented. The approach is based on a clone detection tool called CLAN that performs metrics-based similarity analysis of source code fragments. Definitions and algorithms for spectral similarity analysis are presented and discussed. Experiments performed on university projects are presented. Experimental results include the distribution of similarity in C and C++ projects. Analysis of spectral similarity distribution identifies the most similar pairs of projects that can be considered as candidates for plagiarism.

Book ChapterDOI
07 Aug 2006
TL;DR: A comparison with existing systems such as SID and JPlag shows that the proposed system can detect plagiarism more accurately due to its ability to handle structural information.
Abstract: Many existing plagiarism detection systems fail to detect plagiarism when there is abundant garbage in the copied programs. This is because they do not use the structural information efficiently. In this paper, we propose a novel plagiarism detection system which uses parse tree kernels. By incorporating parse tree kernels into the system, it efficiently handles the structural information within source programs. A comparison with existing systems such as SID and JPlag shows that the proposed system can detect plagiarism more accurately due to its ability to handle structural information.

Proceedings ArticleDOI
10 Jul 2006
TL;DR: In this article, the authors identified some impacts of plagiarism on learning, teaching and research in higher education in Bangladesh and suggested strategies for combating plagiarism, which are a) inclusion of some articles on plagiarism in the existing copyright act and cyber law, or creating a separate plagiarism act, b) practice of code of ethics against plagiarism at HEIs through active work-groups to monitor and combat plagiarism.
Abstract: In rendering education in general, particularly in higher education institution (HEI), a fundamental question arises as to how to judge the authenticity of an intellectual work against plagiarism (i.e. submitted assignments, research manuscripts, or teaching materials where sources are not mentioned or claimed by someone of his own). In the current era of information technology, the Internet and other electronic media are extensively used for its potential to improve educational experiences for all the stakeholders: students, researchers and teachers. As people are free to find and use wealth of information, in many of the cases, they are tempted to adopt plagiarism through copy-and-paste instead of assistance. Over decades, a variety of methods of plagiarism detection and prevention have been proposed such as a) plagiarism detection using electronic tools [1], [2], [3], b) improving learning and teaching ethics by conducting specialized courses, c) penalizing persons responsible for their guilt, d) creating awareness about copyright act and cyber law, etc. None of them alone could eliminate the temptation of practicing plagiarism by the students, researchers and surprisingly even by the teachers. Thus comprehensive plans and strategies are necessary to combat plagiarism for effective rendering of education. In this paper, we identified some impacts of plagiarism on learning, teaching and research. In doing so, we interviewed (both open and close ended) students, teachers and researchers from six universities in Bangladesh [4]. The sampled data is analyzed to identify most widely practiced means [5], [6] of plagiarism by three target groups (i.e. students, researchers and teachers) and how they are being currently dealt with by the authority. 
Finally, strategies are suggested for combating plagiarism, which are a) inclusion of some articles on plagiarism in the existing copyright act and cyber law, or creating a separate plagiarism act, b) practice of code of ethics against plagiarism at HEIs through active work-groups to monitor and combat plagiarism, c) including plagiarism in the secondary and undergraduate syllabuses, research and development guidelines for conducting thesis and research papers and arranging seminars and lectures to stop plagiarism, etc.

Proceedings ArticleDOI
10 Jul 2006
TL;DR: A novel plagiarism detection system and its integration with an e-portfolio used in first year engineering teaching is described and a performance evaluation in terms of accuracy and execution time is presented.
Abstract: We describe a novel plagiarism detection system and its integration with an e-portfolio used in first year engineering teaching. The tool addresses an important issue arising from the decreasing barriers to information access. Academics know that information can support valuable learning experiences, but these experiences are diminished when students plagiarise by copying assignments and getting credit for work they have not done. While it is possible for academics to develop project-based activities to make it harder for students to plagiarise work from outside sources, some students will still copy work done by others within the same class, which can be especially difficult to detect within large cohorts. According to student feedback received while assessing an e-portfolio activity, we found that students were also concerned about plagiarism, and that they modify their approaches to learning based on this concern. We developed a plagiarism detection tool called Beagle, which uses an internal method (also known as collusion): whenever a student submits an assignment to the e-portfolio system, it is compared to those previously submitted by other students. Beagle measures the statistical similarity between students' work using text mining methods. When a specific similarity threshold is reached, the work can be flagged as possible plagiarism or the system can automatically warn the student and request that they resubmit their work. In this paper we present the design of the system, a performance evaluation in terms of accuracy and execution time, and a description of its application integration capabilities through web services.
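
The submit-and-compare workflow described for Beagle (score each new submission against those already submitted, then flag it once a similarity threshold is reached) can be sketched as follows. The abstract does not say which statistical similarity measure Beagle uses, so cosine similarity over word counts is an assumption, as are the `0.8` threshold and the names `term_counts`, `cosine` and `flag_submission`.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words vector: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def flag_submission(new_text, earlier, threshold=0.8):
    """Compare a new submission with earlier ones (dict: student id -> text) and
    return (student_id, score) pairs at or above threshold, most similar first."""
    new_vec = term_counts(new_text)
    hits = []
    for student_id, text in earlier.items():
        score = cosine(new_vec, term_counts(text))
        if score >= threshold:
            hits.append((student_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


earlier = {
    "s1": "the bridge carries load through tension cables",
    "s2": "photosynthesis converts light into chemical energy",
}
print(flag_submission("The bridge carries load through tension cables.", earlier))
```

A non-empty result corresponds to the point where, per the abstract, the work can be flagged as possible plagiarism or the student can be warned and asked to resubmit.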

Proceedings ArticleDOI
10 Jul 2006
TL;DR: Assessment in flexible delivery and how plagiarism can be detected is discussed and a method for testing the identity of a student (or more generally, author) online, without any interference with the examination process is presented.
Abstract: While many institutions of higher education offer courses via distance education, there is one aspect which is difficult to realise by use of the Internet only: assessment. If exams are performed online, how can the course provider guarantee that the student participating in the exam is the person enrolled? Without any Internet-based form of authenticating the student's identity, flexible delivery can break down at this point. As a consequence, traditional identity checks are introduced such as requiring the student to be physically present and to take the exam at a local institution, or requiring the student to sign documents that certify his/her identity. This paper discusses assessment in flexible delivery and how plagiarism can be detected. It presents a method for testing the identity of a student (or more generally, author) online, without any interference with the examination process. Recent advances in computational text analysis allow authorship identification with high reliability. That is, the original author of a document submitted for assessment can be determined successfully with an accuracy and precision of well above 90 percent. The computational methods include machine learning techniques such as "support vector machines", which are highly successful in text classification and a range of other practical applications.

01 Jan 2006
TL;DR: In this paper, the authors discuss different forms of plagiarism from identical copy to structural plagiarism, and discuss ways and means for discovering plagiarism and call for action against plagiarism.
Abstract: The topics of plagiarism and appropriate reactions to its discovery have been recently discussed. This paper discusses strategies for handling plagiarism. After an attempt to define the term and a discussion of the problems involved in such a definition, the paper lists the different forms of plagiarism, from identical copy to structural plagiarism. A discussion of the types of plagiarism situations is followed by a thorough discussion of ways and means for discovering plagiarism. A test of plagiarism detection software shows that they are only moderately successful. The paper closes with a call for action against plagiarism.

Journal ArticleDOI
TL;DR: Turnitin'UK as discussed by the authors is a plagiarism detection software service used at the University of Sheffield to detect plagiarism and to improve student support, staff awareness and more consistent practice overall.
Abstract: Research and consultations in session 2003/2004 by a University's Plagiarism Working Group uncovered a poor understanding of plagiarism and inconsistent handling procedures throughout its schools. In an effort to address both these issues, a strategic 2-year Action Plan was developed and rolled out beginning the following academic year in order to improve student support, staff awareness and more consistent practice overall. The plan included a pilot using the detection software service, Turnitin'UK, with five of the University's 14 schools. The pilot was only one of a series of university-wide deliberations, others included the revision and piloting of a University Plagiarism Code of Practice, implementation of school-based academic conduct officers, improved staff development opportunities and student support materials and events. One school in the University has served as a role model of good practice throughout. Noteworthy is the school's record keeping practice since session 2001/02 of incidences of plagiarism and other academic misconduct. In the paper we present the factors such as gender, nationality and level of study that have been found linked to the incidences of plagiarism in the school. Additionally, the role plagiarism detection software plays in addressing plagiarism is explored within the collaborative and holistic approach of the Action Plan. Finally, the challenges and resistance faced by key players throughout the implementation of the first phase of the Action Plan at the University are considered and the commitment to continuous enhancement recognised.

Book ChapterDOI
19 Feb 2006
TL;DR: In this paper, a document copy detection system that calculates the similarity between documents based on plagiarism patterns is presented, and experiments were performed using CISI document collection and show that the proposed system produces more precise results than existing systems.
Abstract: Document copy detection is a very important tool for protecting author’s copyright. We present a document copy detection system that calculates the similarity between documents based on plagiarism patterns. Experiments were performed using CISI document collection and show that the proposed system produces more precise results than existing systems.

Journal ArticleDOI
TL;DR: Instead of focusing on detecting and punishing plagiarism, this teaching innovation uses the Internet-based plagiarism detection service turnitin.com to teach better techniques of conducting research and source documentation.
Abstract: Instead of focusing on detecting and punishing plagiarism, this teaching innovation uses the Internet-based plagiarism detection service, turnitin.com, to teach better techniques of conducting research and source documentation. Syllabus content, referrals to the University Writing Center, peer review, lecture, and examples of good and bad acknowledgement practice, as well as the professor's own use of the service, are techniques employed to turn the submission of papers to turnitin.com into a learning event, rather than into a presumption of guilt and possible punishment. Data show that most students seem to appreciate the approach taken.

Proceedings ArticleDOI
10 Jul 2006
TL;DR: Initial qualitative and quantitative evaluations illustrate a flexible, convenient and cost-effective tool for building plagiarism detectors for effective detection of programs in various imperative and procedural programming languages.
Abstract: A system for the automatic generation of plagiarism detectors that find similar programs in a set of student programs is presented. Existing plagiarism detectors are either applied to a programming language or a pre-defined set of programming languages. The general purpose one usually employs string matching to perform similarity measures that are based on plagiarism detection among documents in general, and not in programs in particular, thus, losing much of the structure and logic of programs in the process. On the other hand, plagiarism detectors for specific languages only cater to that particular set of languages. This study provides a means for the user to specify the programming language of the student programs to be analyzed. Moreover, an automatic plagiarism detector system must be immune to the transformations that students perform on copied programs. These transformations are usually dependent on several factors namely: the type of programming problems and correspondingly, the complexity of the project to be implemented by the students, and also the programming language paradigm of the programs. Thus, the similarity measures employed by the system should be determined by these factors and can be specified by the professor. He/she has the option to specify how the similarities among the student programs will be captured. The system provides an interface for the specification of the particular programming language in which the student programs are implemented, and a knowledgebase of similarity measures that the user would like to include in the analysis of the student programs. Hence, the system provides flexibility in the programming language of the student programs to be analyzed and the similarity measures that the professor wishes to employ. 
Initial qualitative and quantitative evaluations illustrate a flexible, convenient and cost-effective tool for building plagiarism detectors for effective detection of programs in various imperative and procedural programming languages. The approach also addresses some of the changes that students perform on copied programs which JPlag fails to handle, thus allowing for improved accuracy in terms of the reduction of false positives and increasing the chance of catching plagiarized programs. These changes include modification of control structures, use of temporary variables and subexpressions, in-lining and re-factoring of methods, and redundancy (variables or methods that were not used). Comprehensive tests on other programming languages under various programming language paradigms, such as object-oriented, logic and functional languages, considering the different changes that students make to copied programs (such as the tests done in JPlag), are also recommended for empirical evaluation.

Proceedings ArticleDOI
21 Nov 2006
TL;DR: An unsupervised method for automatic music motive extraction from symbolic sources, using an intervallic analysis is presented, and the results are evaluated quantitatively using a melodic similarity technique.
Abstract: Music motive extraction is an important concept to consider in music information retrieval. Among the possible applications are the creation of music databases that need indexing tools and dynamic access, copyright management and plagiarism detection, computer-aided composition, etc. This paper presents an unsupervised method for automatic music motive extraction from symbolic sources, using an intervallic analysis. The results are evaluated quantitatively using a melodic similarity technique.

Journal ArticleDOI
TL;DR: This research presents a novel and scalable approach, developed over the past few years, that can be used by teachers to detect plagiarism in students' work.
Abstract: Plagiarism in universities has always been a difficult problem to overcome. Various tools have been developed over the past few years to help teachers detect plagiarism in students' work. By being ...

Proceedings ArticleDOI
06 Dec 2006
TL;DR: A framework for applying an empirical approach to requirements is proposed so that the approach can be adapted to various developments; the accuracy of the plagiarism detection tool's output reached 71%.
Abstract: Extracting requirements for large-scale software development has become increasingly complicated because many users from different organizations should collaborate with each other. Although use cases and UML diagrams for analyzing requirements are powerful tools, users often have difficulty understanding them. Therefore, we tried to extract detailed requirements via an empirical approach when we developed a plagiarism detection tool for students' reports. Key points of this approach are recording empirical data in an experiment and using an incomplete prototype based on component requirements. By comparing empirical data with output of the incomplete prototype, the detailed requirements are incrementally determined without additional efforts by the users. As a result of this approach, accuracy of the detection tool's output has reached 71%. In addition, we propose a framework for applying an empirical approach to requirements to adapt this approach to various developments.

Journal Article
TL;DR: Automatically measuring program code similarity can not only detect program plagiarism, but also help automate the checking of assignments or the correcting of papers.
Abstract: Automatically measuring program code similarity can not only detect program plagiarism, but also help automate the checking of assignments or the correcting of papers. This paper introduces automatic measurement techniques and the design and implementation methods of several program plagiarism detection systems developed abroad.

05 Sep 2006
TL;DR: The results show that while the automatic plagiarism detection system is very useful, the simple and free system can be adequate for most purposes, and recommendations on a suitable system for the OUM context are made.
Abstract: In the Open University environment where students are not centrally located and are not under any direct supervision the potential for plagiarism definitely exists. The very technology that facilitates open learning also allows easy exchange of papers among peers. Students can also easily access “paper-mills” where essays can be quickly customized to suit requirements, for a fee. There is also the vast information residing on the Internet ready for creative reuse. In the light of all these temptations the Open University of Malaysia (OUM) is exploring the use of technology to educate students and deter plagiarism. One approach that appears promising is to use a good commercial plagiarism detection system. The system, by being able to detect cases of plagiarism would serve as a deterrent and hopefully contribute towards inculcating the culture of honesty. This paper presents the findings from a small study using two detection systems, a commercial Plagiarism Detection System, MyDropBox and a simple and free automatic file comparison system, Pl@giarism. The results show that while the automatic system is very useful, the simple and free system can be adequate for most purposes. Recommendations on a suitable system for the OUM context are then made based on these findings. (Author's abstract)