
Showing papers on "Plagiarism detection published in 2015"


Journal ArticleDOI
TL;DR: A new, less-detectable method of cyber-facilitated plagiarism known as ‘back translation’ is presented, in which students run text through language translation software to disguise the original source.
Abstract: Advances have been made in detecting and deterring the student plagiarism that has accompanied the uptake and development of the internet. Many authors from the late 1990s onwards grappled with plagiarism in the digital age, presenting thought-provoking articles that established the foundation for strategies to address cyber plagiarism, including software such as TurnitinTM. In the spirit of its predecessors, this article presents a new, less-detectable method of cyber-facilitated plagiarism known as ‘back translation’, in which students run text through language translation software to disguise the original source. This paper discusses how this plagiarism strategy attempts to subvert academic efforts to detect plagiarism and maintain academic integrity in the digital age, before presenting useful detection tools and then critiquing three classroom plagiarism management approaches for their usefulness in the current digital and educational context.

65 citations


Journal ArticleDOI
TL;DR: This paper presents a method to detect external plagiarism by integrating semantic relations between words with their syntactic composition, and shows that the proposed method improves performance compared with the systems that participated in PAN-PC-11.
Abstract: Plagiarism is the reuse of someone else's ideas, work or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism by integrating semantic relations between words with their syntactic composition. The problem with available methods is that they fail to capture meaning when comparing a source-document sentence with a suspicious-document sentence that shares the same surface text (the words are the same) or paraphrases it, which leads to inaccurate or unnecessary matches. The proposed method improves detection performance because it avoids selecting a source sentence whose surface similarity to the suspicious sentence is high but whose meaning is different. It does so by computing sentence-to-sentence semantic and syntactic similarity. In addition, the method expands the words in sentences to tackle the problem of limited information, bridging the lexical gap between semantically similar contexts expressed in different wording. The method can also identify various kinds of plagiarism, such as exact copying, paraphrasing, sentence transformation and changes of word structure within sentences. The experimental results show that the proposed method improves performance compared with the systems that participated in PAN-PC-11, and that it outperforms other existing techniques on the PAN-PC-10 and PAN-PC-11 datasets.
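To make the blend of semantic and word-order similarity concrete, here is a minimal Python sketch in the spirit of the sentence measures this line of work builds on; the Jaccard stand-in for the semantic component, the Li et al.-style order vector, and the weight delta are illustrative assumptions, not the paper's model.

```python
import math

def word_order_sim(s1, s2):
    """Syntactic component: 1 - |r1 - r2| / |r1 + r2| over the joint word set,
    where each r holds a word's 1-based position in the sentence (0 if absent)."""
    words = list(dict.fromkeys(s1 + s2))
    r1 = [s1.index(w) + 1 if w in s1 else 0 for w in words]
    r2 = [s2.index(w) + 1 if w in s2 else 0 for w in words]
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    summ = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1 - diff / summ if summ else 1.0

def combined_sim(s1, s2, delta=0.8):
    """Blend a lexical-overlap stand-in for the semantic part with word order."""
    semantic = len(set(s1) & set(s2)) / len(set(s1) | set(s2))
    return delta * semantic + (1 - delta) * word_order_sim(s1, s2)

s = "the boy kicked the ball".split()
t = "the ball kicked the boy".split()
print(round(combined_sim(s, t), 3))  # same words, different structure: below 1.0
```

The example shows the point made above: two sentences with identical surface words but different structure no longer score as a perfect match.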

62 citations


Journal ArticleDOI
TL;DR: A new type of software birthmark called DYnamic Key Instruction Sequence (DYKIS) is proposed that can be extracted from an executable without the need for source code, and that is resilient both to weak obfuscation techniques such as compiler optimizations and to strong obfuscation techniques implemented in tools such as SandMark, Allatori and Upx.
Abstract: A software birthmark is a unique characteristic of a program. Comparing the birthmarks of plaintiff and defendant programs therefore provides an effective approach to software plagiarism detection. However, software birthmark generation faces two main challenges: the absence of source code, and the various code obfuscation techniques that attempt to hide the characteristics of a program. In this paper, we propose a new type of software birthmark called DYnamic Key Instruction Sequence (DYKIS) that can be extracted from an executable without the need for source code. The plagiarism detection algorithm based on our new birthmarks is resilient both to weak obfuscation techniques such as compiler optimizations and to strong obfuscation techniques implemented in tools such as SandMark, Allatori and Upx. We have developed a tool called DYKIS-PD (DYKIS Plagiarism Detection tool) and conducted extensive experiments on a large number of binary programs. The tool, the benchmarks and the experimental results are all publicly available.

52 citations


Book ChapterDOI
08 Sep 2015
TL;DR: An overview of the PAN/CLEF evaluation lab is presented; in addition to the usual author demographics, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism), and a new corpus of Twitter messages covering four languages was developed.
Abstract: This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum for text mining research focusing on the identification of personal traits that authors leave behind in texts unintentionally. PAN 2015 comprises three tasks studying important variations of these problems: plagiarism detection, author identification and author profiling. In plagiarism detection, community-driven corpus construction is introduced as a new way of developing evaluation resources with diversity. In author identification, cross-topic and cross-genre author verification is introduced, where the texts of known and unknown authorship do not match in topic and/or genre; a new corpus covering four languages was built for this challenging yet realistic task. In author profiling, in addition to the usual author demographics such as gender and age, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism), and a new corpus of Twitter messages covering four languages was developed. In total, 53 teams participated across the three tasks of PAN 2015 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.

52 citations


Journal ArticleDOI
TL;DR: Using Turnitin, the authors analyzed 384 doctoral dissertations written in English and published by accredited universities in the U.S. and Canada to investigate the potential influence of the Internet's prevalence on significant higher education artifacts.
Abstract: Plagiarism has been a long-standing concern within higher education. With the rapid rise in the use and availability of the Internet, both the research literature and the media have raised the notion that the online environment is accelerating a decline in academic ethics. The majority of research conducted to investigate such claims has involved self-report data from students. This study collected empirical data to investigate the potential influence of the Internet's prevalence on significant higher education artifacts by comparing dissertations written prior to widespread use of the Internet with those written in a period of ubiquitous Internet use. Given the prestige associated with the doctoral degree and the fact that the majority of the effort necessary to achieve such a degree resides in the dissertation, this study utilized Doctor of Philosophy (PhD) dissertations written in English and published by accredited universities in the U.S. and Canada. A sample of 384 dissertations was analyzed with Turnitin plagiarism detection software. The mean similarity indices for the pre-Internet and post-Internet eras were 14.5 and 12.3 %, respectively. A Mann-Whitney U test (Mdn = 13, U = 30,098.5, p < 0.001) indicated that the difference between groups was significant, but in the opposite direction to what has been purported in the extant literature. When comparing the counts of dissertations in each era that showed plagiarism against those with little or no evidence thereof, there was no statistically significant difference (χ2 [1, N = 368] = 2.61, p = 0.11). The findings of this study suggest that the Internet may not be significantly impacting the prevalence of plagiarism at advanced levels of higher education.
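For readers unfamiliar with the test reported above, a hedged sketch of the same kind of comparison on invented numbers (not the study's data) looks like this:

```python
# Synthetic illustration of the study's main test: a Mann-Whitney U comparison
# of two independent samples of similarity indices (percent). The numbers are
# invented for demonstration; they are not the study's data.
from scipy.stats import mannwhitneyu

pre_internet = [14, 16, 13, 18, 12, 15, 17]    # hypothetical pre-Internet era
post_internet = [11, 13, 10, 14, 12, 9, 13]    # hypothetical post-Internet era

stat, p = mannwhitneyu(pre_internet, post_internet, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # a small p would indicate the eras differ
```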

40 citations


Journal ArticleDOI
TL;DR: The value-based plagiarism detection method (VaPD) uses longest-common-subsequence-based similarity algorithms to check whether two code fragments belong to the same lineage, and is resilient to various control and data obfuscation techniques.
Abstract: Illegal code reuse has become a serious threat to the software community. Identifying similar or identical code fragments becomes much more challenging in code theft cases, where plagiarizers can use various automated code transformation or obfuscation techniques to hide stolen code from detection. Previous works in this field are largely limited in that (i) most of them cannot handle advanced obfuscation techniques, and (ii) methods based on source code analysis are impractical, since the source code of suspicious programs typically cannot be obtained until strong evidence has been collected. Based on the observation that some critical runtime values of a program are hard to replace or eliminate through semantics-preserving transformations, we introduce a novel approach to the dynamic characterization of executable programs. By leveraging such invariant values, our technique is resilient to various control and data obfuscation techniques. We show how these values can be extracted and refined to expose the critical ones, and how this runtime property can help solve problems in software plagiarism detection. We have implemented a prototype with a dynamic taint analyzer atop a generic processor emulator. Our value-based plagiarism detection method (VaPD) uses longest-common-subsequence-based similarity algorithms to check whether two code fragments belong to the same lineage. We evaluate the proposed method on a set of real-world automated obfuscators. Our experimental results show that the value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, plagiarisms heavily obfuscated by KlassMaster, programs obfuscated by Thicket, and executables obfuscated by Loco/Diablo.
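As a toy illustration of the final similarity step only (VaPD itself compares refined runtime value sequences extracted by dynamic taint analysis), a longest-common-subsequence score over two value sequences could be computed like this; the min-length normalization is an assumption:

```python
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def value_similarity(seq_a, seq_b):
    """Normalize the LCS length by the shorter sequence (one plausible choice)."""
    if not seq_a or not seq_b:
        return 0.0
    return lcs_length(seq_a, seq_b) / min(len(seq_a), len(seq_b))

# Hypothetical runtime value sequences logged from two binaries.
print(value_similarity([3, 7, 42, 9, 11], [3, 42, 9, 5, 11]))  # 0.8
```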

40 citations


Proceedings ArticleDOI
19 Nov 2015
TL;DR: The results indicate that there is potential in identifying individuals from typing data captured by a programming environment as those individuals learn to program, and that such data raises privacy concerns that should be addressed.
Abstract: Being able to identify the user of a computer solely from their typing patterns can lead to improvements in plagiarism detection, provide new opportunities for authentication, and enable novel guidance methods in tutoring systems. At the same time, if such identification is possible, new privacy and ethical concerns arise. In our work, we explore methods for identifying individuals from typing data captured by a programming environment as these individuals are learning to program. We compare the identification accuracy of automatically generated user profiles, ranging from the average amount of time a user needs between keystrokes to the amount of time it takes the user to press specific pairs of keys (digraphs). We also explore the effect of data quantity and different acceptance thresholds on identification accuracy, and analyze how the accuracy changes when identifying individuals across courses. Our results show that, while identification accuracy varies with data quantity and method, identifying users from their programming data is possible. These results indicate that there is potential for using this method, for example, to identify students taking exams, and that such data has privacy concerns that should be addressed.
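A minimal sketch of the digraph-latency profiling described above, assuming keystroke events arrive as (key, timestamp in milliseconds) pairs; the paper's actual feature set and capture environment differ:

```python
from collections import defaultdict

def digraph_profile(events):
    """Average latency (ms) between each ordered pair of consecutive keys.
    `events` is a list of (key, timestamp_ms) tuples in typing order."""
    latencies = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        latencies[(k1, k2)].append(t2 - t1)
    return {pair: sum(v) / len(v) for pair, v in latencies.items()}

events = [("t", 0), ("h", 95), ("e", 180), ("t", 400), ("h", 490)]
print(digraph_profile(events))
# {('t', 'h'): 92.5, ('h', 'e'): 85.0, ('e', 't'): 220.0}
```

Profiles like this can then be compared between a known user and an unknown typing sample, for instance by distance between shared digraph averages.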

38 citations


Journal ArticleDOI
TL;DR: It is observed that using the online compiler and the plagiarism detection tool reduces the time and effort needed to assess programming assignments, deters the authors' students from plagiarism, and increases their success in their programming-based Data Structures course.
Abstract: In this study, an online compiler and a source code plagiarism detection tool have been integrated into the Moodle-based distance education system of our Computer Engineering department. For this purpose, the Moodle system has been extended with the GCC compiler and the Moss source code plagiarism detection tool. We observed that using the online compiler and the plagiarism detection tool reduces the time and effort needed to assess programming assignments, deters our students from plagiarism, and increases their success in their programming-based Data Structures course. © 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:363–373, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21606

37 citations


Journal ArticleDOI
TL;DR: This work presents an approach called program it yourself (PIY), which is empirically shown to outperform MOSS in detection accuracy and is capable of maintaining detection accuracy and reasonable runtimes even on extremely large data repositories.
Abstract: Vast amounts of information available online make plagiarism increasingly easy to commit, and this is particularly true of source code. The traditional approach to detecting copied work in a course setting is manual inspection. This is not only tedious but also typically misses code plagiarized from outside sources or even from an earlier offering of the course. Systems to automatically detect source code plagiarism exist but tend to focus on small submission sets. One such system that has become the standard in automated source code plagiarism detection is Measure of Software Similarity (MOSS; Schleimer et al., Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, San Diego, 2003). In this work, we present an approach called program it yourself (PIY), which is empirically shown to outperform MOSS in detection accuracy. By utilizing parallel processing and data clustering, PIY is also capable of maintaining detection accuracy and reasonable runtimes even when using extremely large data repositories.

31 citations


Proceedings ArticleDOI
01 Aug 2015
TL;DR: The results demonstrate that the proposed fuzzy-based approach outperforms all other approaches on well-known source code datasets and shows promise as an efficient and reliable approach to source-code plagiarism detection.
Abstract: Source-code plagiarism detection concerns the identification of source-code files that contain similar and/or identical source-code fragments. Fuzzy clustering approaches are a suitable solution for detecting source-code plagiarism due to their capability to capture the qualitative and semantic elements of similarity. This paper proposes a novel fuzzy-based approach to source-code plagiarism detection, based on Fuzzy C-Means and the Adaptive Neuro-Fuzzy Inference System (ANFIS). In addition, the performance of the proposed approach is compared to the Self-Organising Map (SOM) and the state-of-the-art Running Karp-Rabin Greedy-String-Tiling (RKR-GST) plagiarism detection algorithms. An advantage of the proposed approach is that it is programming-language independent: there is no need to develop parsers or compilers for the fuzzy-based predictor to provide detection in different programming languages. The results demonstrate that the proposed fuzzy-based approach outperforms all other approaches on well-known source code datasets and shows promise as an efficient and reliable approach to source-code plagiarism detection.
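For context on the RKR-GST baseline named above, the following is a deliberately simplified greedy string tiling sketch over token streams; real RKR-GST speeds up the search with Karp-Rabin rolling hashes, so treat this as an illustration, not the algorithm as benchmarked in the paper:

```python
def greedy_string_tiling(a, b, min_match=3):
    """Greedily collect maximal non-overlapping common token runs as tiles
    (i, j, length); a simplified, hash-free cousin of RKR-GST."""
    marked_a, marked_b, tiles = set(), set(), []
    while True:
        best = (0, 0, 0)  # (start in a, start in b, length)
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and (i + k) not in marked_a and (j + k) not in marked_b):
                    k += 1
                if k > best[2]:
                    best = (i, j, k)
        if best[2] < min_match:
            return tiles
        i, j, k = best
        tiles.append(best)
        marked_a.update(range(i, i + k))
        marked_b.update(range(j, j + k))

tokens_a = "int i = 0 ; while ( i < n ) i ++ ;".split()
tokens_b = "int j = 0 ; while ( j < m ) j ++ ;".split()
print(greedy_string_tiling(tokens_a, tokens_b))  # one tile: '= 0 ; while ('
```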

30 citations


01 Jan 2015
TL;DR: This paper overviews the five source retrieval approaches submitted to the seventh international competition on plagiarism detection at PAN 2015 and compares their performance to the 14 methods submitted in the two previous years.
Abstract: This paper overviews the five source retrieval approaches that were submitted to the seventh international competition on plagiarism detection at PAN 2015. We compare the performance of these five approaches to the 14 methods submitted in the two previous years (eight from PAN 2013 and six from PAN 2014). For the third year in a row, we invited software submissions instead of run submissions, so that cross-year evaluations are possible. This year's stand-alone source retrieval overview can thus, to some extent, also serve as a reference to the different ideas presented over the last three years; the text alignment subtask is covered in a separate overview.

Proceedings ArticleDOI
04 Nov 2015
TL;DR: A more effective plagiarism detection algorithm based on the abstract syntax tree (AST) is proposed, which computes and compares the hash values of syntax tree nodes; it performs well in code comparison and is helpful for protecting the copyright of source code.
Abstract: In modern software engineering, software plagiarism is widespread and largely uncurbed, so developing plagiarism detection methods is imperative. Popular software plagiarism detection technologies are mostly based on text, tokens or syntax trees. Among these, tree-based detection can effectively catch code that the other two kinds of technologies cannot. In this paper, we propose a more effective plagiarism detection algorithm based on the abstract syntax tree (AST): it computes the hash values of the syntax tree nodes and compares them. To implement the algorithm more effectively, special measures are taken to reduce the error rate when calculating the hash values of operations, especially arithmetic operations such as subtraction and division (which, unlike addition, are not commutative). Test results show that these measures are reliable and necessary. The algorithm performs well in the code comparison field and is helpful for protecting the copyright of source code.
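A rough Python analogue of the node-hashing idea (the paper targets its own toolchain and hashing scheme; the structure-only hash below is an illustrative assumption):

```python
import ast
from collections import Counter

def subtree_hashes(tree):
    """Collect a structural hash for every node: node type plus child hashes.
    Identifier names and literal values are not AST child nodes here, so simple
    renaming does not change the hashes. Python's hash() is salted per process,
    so only compare hashes computed within the same run."""
    hashes = []
    def visit(node):
        child_hashes = tuple(visit(c) for c in ast.iter_child_nodes(node))
        h = hash((type(node).__name__, child_hashes))
        hashes.append(h)
        return h
    visit(tree)
    return Counter(hashes)

def ast_similarity(src_a, src_b):
    a, b = subtree_hashes(ast.parse(src_a)), subtree_hashes(ast.parse(src_b))
    shared = sum((a & b).values())  # multiset intersection of node hashes
    return 2 * shared / (sum(a.values()) + sum(b.values()))

print(ast_similarity("x = a + b", "y = c + d"))  # 1.0: same shape, renamed ids
```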

Journal ArticleDOI
TL;DR: Some of the plagiarism detection tools available for plagiarism checking, and the types of plagiarism, are described; such tools help the academic community detect plagiarism and discourage this unlawful activity.
Abstract: Plagiarism has become an increasingly serious problem in the academic world, aggravated by easy access to, and the ease of cutting and pasting from, a wide range of materials available on the internet. It constitutes academic theft: the offender has 'stolen' the work of others and presented the stolen work as if it were his or her own. It goes to the integrity and honesty of a person, stifles creativity and originality, and defeats the purpose of education. Plagiarism is a widespread and growing problem in the academic process. Traditional manual detection of plagiarism is a difficult, inaccurate and time-consuming process, as it is impractical for any person to check a text against all existing material. The main purpose of this paper is to survey existing plagiarism detection tools. Such tools are useful to the academic community for detecting plagiarism and discouraging such unlawful activity. This paper describes some of the plagiarism detection tools available for plagiarism checking, along with the types of plagiarism.

Journal ArticleDOI
Merin Paul, Sangeetha Jamal
TL;DR: A new technique uses Semantic Role Labelling and sentence ranking for plagiarism detection; applying sentence ranking in the plagiarism detection method was found to decrease checking time.

Journal ArticleDOI
TL;DR: The purpose of this research is to uncover potential cases of source code reuse in large-scale environments by using an automatic system based on comparing programs at the character level to find similarities among multiple sets of source code.
Abstract: The advent of the Internet has caused an increase in content reuse, including reuse of source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on comparing programs at the character level and is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes to identifier names, comments and indentation. © 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608
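The paper's exact character-level measure is not reproduced here; one common realization, given as a hedged sketch, is character n-grams compared with the Dice coefficient:

```python
def char_ngrams(text, n=3):
    """Set of character n-grams after collapsing whitespace and case."""
    text = "".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

# Two hypothetical student submissions with renamed identifiers.
sub1 = "int total = 0; for (i = 0; i < n; i++) total += v[i];"
sub2 = "int sum = 0;   for (j = 0; j < n; j++) sum += v[j];"
print(round(dice(char_ngrams(sub1), char_ngrams(sub2)), 2))
```

Because comparison happens below the token level, this style of measure survives the identifier, comment and indentation changes the study reports as the most common obfuscations.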

Book ChapterDOI
08 Sep 2015
TL;DR: This work describes an approach to the text alignment subtask of the plagiarism detection competition at PAN 2014 that resulted in the best-performing system at PAN 2014 and outperforms the best-performing system of PAN 2013 under the cumulative evaluation measure Plagdet.
Abstract: The task of monolingual text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014, which resulted in the best-performing system at the PAN 2014 competition and outperforms the best-performing system of the PAN 2013 competition by the cumulative evaluation measure Plagdet. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that permits us to consider stopwords without increasing the rate of false positives. We introduce a recursive algorithm to extend the ranges of matching sentences to maximal length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. Our system is available as open source.
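As a minimal sketch of a tf-idf-weighted sentence similarity in which stopwords are kept but naturally down-weighted by low idf (the paper's actual weighting scheme differs; the smoothing below is an assumption):

```python
import math
from collections import Counter

def tfidf_cosine(sent_a, sent_b, corpus):
    """Cosine of tf-idf vectors; idf is computed over `corpus` (tokenized
    sentences), so frequent stopwords get low weight instead of being removed."""
    n = len(corpus)
    def idf(term):
        df = sum(term in s for s in corpus)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed idf (an assumption)
    def vec(tokens):
        return {t: f * idf(t) for t, f in Counter(tokens).items()}
    va, vb = vec(sent_a), vec(sent_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = math.hypot(*va.values()) * math.hypot(*vb.values())
    return dot / norm if norm else 0.0

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
print(tfidf_cosine(["the", "cat", "sat"], ["a", "cat", "ran"], corpus))
```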

Journal ArticleDOI
TL;DR: An architecture is proposed that uses a semantic similarity measure exploiting word-level semantic similarities mined from within the data corpus, thereby using localized contextual information to detect plagiarism.

Proceedings Article
01 Jan 2015
TL;DR: This overview paper describes the evaluation corpora built for the first shared task on plagiarism detection for Arabic texts, discusses the participants' methods, and highlights the building blocks of those methods that could be language dependent.
Abstract: This is the first shared task to address the evaluation of plagiarism detection methods for Arabic texts. It has two subtasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs were submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants' methods, and highlights the building blocks of those methods that could be language dependent.

Proceedings ArticleDOI
25 May 2015
TL;DR: An automated assessment system for programming assignments is introduced that includes dynamic testing of student programs, plagiarism detection, and a proper presentation of the results.
Abstract: Modern teaching paradigms promote active student participation, encouraging teachers to adapt the teaching process to involve more practical work. In the introductory programming course at the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, homework assignments contribute approximately one half of the total grade, requiring a significant investment of time and human resources in the assessment process. This burden was alleviated by automated assessment of the homework assignments. In this paper, we introduce an automated assessment system for programming assignments that includes dynamic testing of student programs, plagiarism detection, and a proper presentation of the results. We share our experience and compare the introduced system with the manual assessment approach used previously.

Proceedings ArticleDOI
28 Sep 2015
TL;DR: This paper focuses on unfolding the importance of combined similarity metrics over the commonly used single-metric approach in the plagiarism detection task, and analyzes the impact of utilizing part-of-speech (POS) tagging in the plagiarism detection model.
Abstract: Plagiarism is an illicit act that has become a prime concern, mainly in the educational and research domains. This deceitful act, usually referred to as intellectual theft, has increased swiftly with rapid technological development and greater information accessibility, so the need for an efficient plagiarism detection mechanism is urgent. In this paper, we investigate different combined similarity metrics for extrinsic plagiarism detection, focusing on the importance of combined metrics over the commonly used single-metric approach. We further analyze the impact of utilizing part-of-speech (POS) tagging in the plagiarism detection model. Different combinations of four single metrics (cosine similarity, Dice coefficient, match coefficient and a fuzzy-semantic measure) are used with and without POS tag information. These systems are evaluated on the PAN-2014 training and test datasets, and the results are analyzed and compared using the standard PAN measures: recall, precision, granularity and plagdet_score.
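A hedged sketch of combining three of the single metrics named above over word sets (the weights are illustrative, not the paper's tuned values, and the fuzzy-semantic measure is omitted):

```python
import math

def cosine(a, b):   # set form of cosine similarity
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def dice(a, b):     # Dice coefficient
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

def match(a, b):    # match (overlap) coefficient
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def combined(a, b, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three single metrics (illustrative weights)."""
    return sum(w * m(a, b) for w, m in zip(weights, (cosine, dice, match)))

src = set("the quick brown fox jumps".split())
susp = set("a quick brown fox leaps".split())
print(round(combined(src, susp), 3))  # 0.6: all three metrics agree here
```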

Journal ArticleDOI
TL;DR: The prevalence of matches with one's own publications calls for more explicit operational standards among disciplines in this regard and points toward factors that may contribute to unintentional self-plagiarism, such as lexical bundles or authors' stylistic habits in writing.

Journal ArticleDOI
TL;DR: The findings revealed that existing plagiarism detection techniques require further enhancement, as they are incapable of efficiently detecting plagiarised ideas, figures, tables, formulas and scanned documents.
Abstract: Purpose – The purpose of this paper is to analyse the state-of-the-art techniques used to detect plagiarism in terms of their limitations, features, taxonomies and processes. Design/methodology/approach – The study was executed through a comprehensive search for relevant literature in six online database repositories, namely IEEE Xplore, ACM Digital Library, ScienceDirect, EI Compendex, Web of Science and Springer, using search strings derived from the subject of discussion. Findings – The findings revealed that existing plagiarism detection techniques require further enhancement, as they are incapable of efficiently detecting plagiarised ideas, figures, tables, formulas and scanned documents. Originality/value – The contribution of this study lies in exposing current trends in plagiarism detection research and identifying areas where further improvement is required to complement the performance of existing techniques.

Journal ArticleDOI
TL;DR: Using Turnitin, this article detected high and excessive levels of plagiarism in academic articles appearing in 19 South African management journals in 2011; the cost to government of subsidising unoriginal work in these journals was calculated at approximately ZAR7 million for the period under review.
Abstract: Plagiarism by academics has been relatively unexplored thus far, but awareness of the problem has grown in recent years. We submitted 371 published academic articles appearing in 19 South African management journals in 2011 to the plagiarism detection software Turnitin™. High and excessive levels of plagiarism were detected. The cost to government of subsidising unoriginal work in these journals was calculated at approximately ZAR7 million for the period under review. As academics are expected to role-model ethical behaviour to students, such a finding is disturbing, and it has implications for the reputations of the institutions with which the authors are affiliated as well as those of the journals that publish articles containing plagiarised material.

01 Jan 2015
TL;DR: The effectiveness of algorithms used to measure the similarity between two documents is compared, and it is found that fingerprinting and winnowing perform better than cosine similarity.
Abstract: Measuring the similarity of documents plays an important role in text-related research and applications such as document clustering, plagiarism detection, information retrieval, machine translation and automatic essay scoring. Many approaches have been proposed to solve this problem; they can be grouped into three main categories: string-based, corpus-based and knowledge-based similarity. In this paper, the similarity of two documents is gauged using two string-based measures: a character-based and a term-based algorithm. In the character-based method, n-grams are used to compute fingerprints for the fingerprinting and winnowing algorithms, and the Dice coefficient is then used to match the two fingerprints. In the term-based measurement, the cosine similarity algorithm is used. We compare the effectiveness of these algorithms for measuring the similarity between two documents. The results show that fingerprinting and winnowing perform better than cosine similarity, and that the winnowing algorithm is more stable than the others.
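A minimal winnowing sketch after Schleimer et al.: hash every character k-gram, keep the minimum hash of each sliding window as the fingerprint, and compare fingerprints with the Dice coefficient; the k and window sizes here are arbitrary assumptions:

```python
def winnow(text, k=5, window=4):
    """Winnowing fingerprint: the minimum k-gram hash from each sliding window."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    hashes = [hash(g) for g in grams]
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0

doc1 = "measuring the similarity of documents"
doc2 = "gauge the similarity of two documents"
print(round(dice(winnow(doc1), winnow(doc2)), 2))
```

Winnowing guarantees that any sufficiently long match between documents contributes at least one shared fingerprint, while storing far fewer hashes than all k-grams.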

Journal ArticleDOI
TL;DR: Statistical analysis using paired t-tests shows that the proposed approach significantly outperforms the baselines, demonstrating the competence of the fuzzy semantic-based model for detecting plagiarism cases beyond literal plagiarism.
Abstract: Highly obfuscated plagiarism cases contain unseen and obfuscated texts, which pose difficulties for existing plagiarism detection methods. A fuzzy semantic-based similarity model for uncovering obfuscated plagiarism is presented and compared with five state-of-the-art baselines. Semantic relatedness between words is studied based on part-of-speech (POS) tags and WordNet-based similarity measures. Fuzzy rules are introduced to assess the semantic distance between short source and suspicious texts, implementing the semantic relatedness between words as a membership function of a fuzzy set. To minimize the number of false positives and false negatives, a learning method that combines a permission threshold and a variation threshold is used to decide true plagiarism cases. The proposed model and the baselines are evaluated on 99,033 ground-truth annotated cases extracted from different datasets, including 11,621 (11.7%) handmade paraphrases, 54,815 (55.4%) artificial plagiarism cases, and 32,578 (32.9%) plagiarism-free cases. We conduct extensive experimental verification, including a study of the effects of different segmentation schemes and parameter settings. Results are assessed using precision, recall, F-measure and granularity on stratified 10-fold cross-validation data. Statistical analysis using paired t-tests shows that the proposed approach significantly outperforms the baselines, demonstrating the competence of the fuzzy semantic-based model for detecting plagiarism cases beyond literal plagiarism. Additionally, analysis of variance (ANOVA) shows the effectiveness of the different segmentation schemes used with the proposed approach.
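Loosely echoing the WordNet-based relatedness and two-threshold decision described above (an illustrative sketch, not the paper's model; the threshold values are invented, and NLTK with the WordNet corpus must be installed):

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def word_relatedness(w1, w2):
    """Max WordNet path similarity over all synset pairs; 0.0 if no path."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def decide(score, permission=0.8, variation=0.3):
    """Two-threshold decision loosely echoing the paper's permission and
    variation thresholds (the cutoff values here are invented)."""
    if score >= permission:
        return "plagiarised"
    return "suspicious" if score >= variation else "clean"

print(decide(word_relatedness("car", "automobile")))  # plagiarised (score 1.0)
```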

Journal ArticleDOI
TL;DR: In this article, the authors examined 48 licenciatura theses and 102 masters theses from five of Mozambique's largest universities and found that 75% contained significant plagiarism and 39% contained a great deal of plagiarism.
Abstract: Hugely facilitated by the Internet, plagiarism by students threatens educational quality and professional ethics worldwide. Plagiarism reduces learning and is correlated with increased fraud and inefficiency on the job, thus lessening competitiveness and hampering development. In this context, the present research examines 48 licenciatura theses and 102 masters theses from five of Mozambique's largest universities. Of the 150 theses, 75% contained significant plagiarism (>100 word equivalents) and 39%, very much (>500 word equivalents). Significant plagiarism was detected in both licenciatura and masters theses. By using both Turnitin and Urkund to identify potentially plagiarized passages, professionally verifying whether those passages contain plagiarism, and, if confirmed, counting the words involved, the study presents a new method for classifying the quantity and significance of plagiarism. The use of two text-similarity-recognition programs also improved the rate of detection and, in some theses, significantly increased the classification of the gravity of the plagiarism encountered. Based on a broad review of the literature, the article argues that, to combat wide-scale plagiarism, academic institutions need to cultivate a consensus among faculty and students about the definition and types of plagiarism, the appropriate penalties, and the paramount professional and economic need to nurture professional ethics. However, achieving even partial success requires significant involvement by administrators, faculty, students and student leaders, guided by a holistic strategy using technological, pedagogical, administrative and legal components to prevent and detect plagiarism and then reeducate or discipline students caught plagiarizing.

Journal ArticleDOI
TL;DR: Turnitin should not be used as a 'plagiarism detection' tool; instead, it can act as a self-assessment and self-learning aid to inform writing enhancement, as discussed by the authors.
Abstract: Considering the shift in attitudes from plagiarism detection to assessment for learning, it is necessary to explore the effect of this paradigm shift for Turnitin, from 'plagiarism detection' to a self-service learning aid. Two research questions are explored in the present study: (1) How does Turnitin augment the self-service skills of students and lecturers to inform learning enhancement? and (2) What is the polarity of positive and constructive experience with the use of Turnitin to narrow the gap between students' expectations and university standards? Taking cross-disciplinary groups of academics and students, the study identifies their experiences. The findings suggest that Turnitin enables students to conduct self-service and independent learning through the pedagogical use of the originality report. Turnitin should not be used as a 'plagiarism detection' tool; instead, it can act as a self-assessment and self-learning aid to inform writing enhancement. Recommendations and insights are discussed for such pedagogical use.

Journal ArticleDOI
TL;DR: Using TurnItIn plagiarism detection software, the authors found that students had a greater understanding of plagiarism, increased efficacy, and fewer instances of plagiarism after exposure to an instructional activity on plagiarism.
Abstract: Plagiarism is a prevalent form of academic dishonesty in the undergraduate instructional context. Although students engage in plagiarism with some frequency, instructors often do little to help students understand the significance of plagiarism or to create assignments that reduce its likelihood. This study reports survey, coding, and TurnItIn software results from an evaluation of an instructional activity designed to help students improve their understanding of plagiarism, the consequences of plagiarizing, strategies to help them engage in ethical writing, and key citation elements. Results indicate students had a greater understanding of plagiarism, increased efficacy, and fewer instances of plagiarism as determined by TurnItIn plagiarism software after exposure to an instructional activity on plagiarism. Not surprisingly, when instructors prioritize academic honesty in their classrooms, train students on how to integrate others’ works, cite sources appropriately, and use plagiarism detection software, students are less likely to plagiarize. The discussion includes suggestions for instructors to help them create a plagiarism-free environment.

01 Jan 2015
TL;DR: The approach for constructing a monolingual Persian plagiarism corpus that can be used to evaluate the performance of Persian plagiarism detection systems is described.

Abstract: The text alignment corpus construction task at the PAN 2015 competition consists of preparing a plagiarism corpus that provides various obfuscation types and versatile obfuscation degrees, while its format and metadata structure follow previous PAN plagiarism corpora. In this paper, we describe our approach to constructing a monolingual Persian plagiarism corpus that can be used to evaluate the performance of Persian plagiarism detection systems.

Proceedings ArticleDOI
13 May 2015
TL;DR: The proposed method is based on modeling the relation between documents and their n-gram phrases, and it outperformed Plagiarism-Checker-X, especially for intelligent similarity cases with syntactic changes.

Abstract: The computerized methods for document similarity estimation (or plagiarism detection) in natural languages developed over the last two decades have focused on English in particular, along with some other languages such as German and Chinese. There are also several language-independent methods, but their accuracy is not satisfactory, especially for morphologically rich and complicated languages such as Arabic. This paper proposes an innovative content-based method for document similarity analysis devoted to the Arabic language, in order to bridge the existing gap in such software solutions. The proposed method is based on modeling the relation between documents and their n-gram phrases. These phrases are generated from the normalized text, exploiting Arabic morphology analysis and lexical lookup. Possible morphological ambiguity is resolved by applying part-of-speech (PoS) tagging to the examined documents. Text indexing and stop-word removal are performed using a new method based on morphological analysis of the text. The examined documents' TF-IDF model is constructed using a heuristic-based pairwise matching algorithm that accounts for lexical and syntactic changes. The hidden associations between the unique n-gram phrases and their documents are then investigated using Latent Semantic Analysis (LSA), and the pairwise document subsets and similarity measures are derived from Singular Value Decomposition (SVD) computations. The performance of the proposed method was confirmed through experiments with various datasets, exhibiting promising capabilities in estimating literal and some types of intelligent similarity. Finally, the results of the proposed method were compared to those of Plagiarism-Checker-X, and the proposed method outperformed it, especially for intelligent similarity cases with syntactic changes.
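A hedged sketch of the n-gram TF-IDF plus LSA pipeline using scikit-learn stand-ins; the paper's Arabic morphological normalization, PoS tagging and pairwise matching are omitted, and the parameters below are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the committee approved the new policy on water resources",
    "the new water resources policy was approved by the committee",
    "stock markets fell sharply after the announcement",
]

tfidf = TfidfVectorizer(ngram_range=(1, 3))          # word n-gram "phrases"
X = tfidf.fit_transform(docs)                        # documents x phrases TF-IDF
lsa = TruncatedSVD(n_components=2).fit_transform(X)  # SVD into a latent space
print(cosine_similarity(lsa))  # docs 0 and 1 should score far above doc 2
```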
Abstract: The computerized methods for document similarity estimation (or plagiarism detection) in natural languages, evolved during the last two decades, have focused on English language in particular and some other languages such as German and Chinese. On the other hand, there are several language-independent methods, but the accuracy of these methods is not satisfactory, especially with morphological and complicated languages such as Arabic. This paper proposes an innovative content-based method for document similarity analysis devoted to Arabic language in order to bridge the existing gap in such software solutions. The proposed method is based on modeling the relation between documents and their n-gram phrases. These phrases are generated from the normalized text, exploiting Arabic morphology analysis and lexical lookup. Resolving possible morphological ambiguity is carried out through applying Part-of-Speech (PoS) tagging on the examined documents. Text indexing and stop-words removal are performed, employing a new method based on text morphological analysis. The examined documents' TF-IDF model is constructed using Heuristic based pair-wise matching algorithm, considering lexical and syntactic changes. Then, the hidden associations between the unique n-gram phrases and their documents are investigated using Latent Semantic Analysis (LSA). Next, the pairwise document subset and similarity measures are derived from the Singular Value Decomposition (SVD) computations. The performance of the proposed method was confirmed through experiments with various data sets, exhibiting promising capabilities in estimating literal and some types of intelligent similarities. Finally, the results of the proposed method was compared to that of Plagiarism-Checker-X, and the proposed method outperformed Plagiarism-Checker-X, especially for the intelligent similarity cases with syntactic changes.