scispace - formally typeset

Showing papers on "Plagiarism detection published in 2016"


Journal Article
TL;DR: A systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model; the paper also presents a new weighting scheme for relations between concepts based on distributed representations of concepts.
Abstract: Study of the impact of the implicit aspects of knowledge graphs for cross-language plagiarism detection. We present a new weighting scheme for relations between concepts based on distributed representations of concepts. We obtain state-of-the-art performance compared with several state-of-the-art models. Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model. We analyse the contributions to cross-language plagiarism detection of the different aspects covered by knowledge graphs: word sense disambiguation, vocabulary expansion, and representation by similarities with a collection of concepts. In addition, we study the relevance of both concepts and their relations when detecting plagiarism. Finally, as a key component of the knowledge graph construction, we present a new weighting scheme for relations between concepts based on distributed representations of concepts. Experimental results in Spanish-English and German-English plagiarism detection show state-of-the-art performance and provide interesting insights into the use of knowledge graphs.

95 citations
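
The relation-weighting idea in the entry above can be illustrated with a minimal sketch. Assumption: each concept already has a dense vector (the toy vectors and concept names below are hypothetical, not taken from the paper), and the weight of a relation between two concepts is taken to be the cosine similarity of their vectors, clipped at zero. The paper's actual weighting scheme may differ.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weight_relations(embeddings: dict[str, list[float]],
                     relations: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Weight each concept-concept relation by embedding similarity; negatives are clipped to 0."""
    return {(a, b): max(0.0, cosine(embeddings[a], embeddings[b])) for a, b in relations}

# Hypothetical toy concept vectors; a real system would use learned concept embeddings.
concepts = {
    "plagiarism": [0.9, 0.1, 0.0],
    "copying": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
weights = weight_relations(concepts, [("plagiarism", "copying"), ("plagiarism", "banana")])
print(weights)
```

Related concepts receive a weight close to 1, while an unrelated pair receives a weight close to 0, so strongly related edges dominate when the graph is used to compare text fragments.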


Journal Article
TL;DR: The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art, and an overview of some of the available detection software tools, their features and their detection efficiency is given.
Abstract: The swift evolution of technology has facilitated access to information through different means, which has opened the doors to plagiarism. In today’s world of rapid technological growth, plagiarism is worsening and has become a serious concern in academia, research and many other fields. To curb this intellectual theft and to ensure academic integrity, efficient software systems to detect it are urgently needed. In this paper, a study on plagiarism is done with a focus on extrinsic text plagiarism detection, which is a fast-emerging research area in this domain. The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art. Further, an overview of some of the available detection software tools, their features and their detection efficiency is given, together with some output demos. The paper also sheds light on the popular PAN competition, which has been held yearly in the plagiarism domain since 2009, and the major tasks involved in it. Finally, it attempts to identify the problems in available tools and the research gaps where further exploration can be done.

62 citations
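
Many extrinsic techniques of the kind reviewed above compare a suspicious document against candidate sources by counting shared word n-grams. A minimal sketch of that idea follows, assuming simple whitespace tokenization, word trigrams and Jaccard similarity; real systems add normalization, stemming and an index over the source collection, and the document names here are hypothetical.

```python
def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_sources(suspicious: str, sources: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank candidate source documents by n-gram overlap with the suspicious text."""
    grams = word_ngrams(suspicious, n)
    scores = [(name, jaccard(grams, word_ngrams(text, n))) for name, text in sources.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

sources = {
    "doc_a": "the quick brown fox jumps over the lazy dog",
    "doc_b": "plagiarism detection finds reused text in documents",
}
ranking = rank_sources("a quick brown fox jumps over a sleeping dog", sources)
print(ranking)
```

Here `doc_a` ranks first because three of its trigrams ("quick brown fox", "brown fox jumps", "fox jumps over") reappear in the suspicious text.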


Journal Article
TL;DR: Experimental results show that continuous representations allow the continuous word alignment-based similarity analysis model to obtain competitive results and the knowledge-based document similarity model to outperform the state-of-the-art in CL plagiarism detection.
Abstract: We study the combination of knowledge graph and continuous space representations for cross-language plagiarism detection. We also compare methods that only make use of continuous-space representations of text. We present the continuous word alignment-based similarity analysis, a model to estimate similarity between text fragments. We obtain state-of-the-art performance compared with several state-of-the-art models. Cross-language (CL) plagiarism detection aims at detecting plagiarised fragments of text among documents in different languages. The main research question of this work is whether knowledge graph representations and continuous space representations can complement each other and improve the state-of-the-art performance of CL plagiarism detection methods. To this end, we propose and evaluate hybrid models to assess the semantic similarity of two segments of text in different languages. The proposed hybrid models combine knowledge graph representations with continuous space representations, aiming to exploit their complementarity in capturing different aspects of cross-lingual similarity. We also present the continuous word alignment-based similarity analysis, a new model to estimate similarity between text fragments. We compare the aforementioned approaches with several state-of-the-art models in the task of CL plagiarism detection and study their performance in detecting plagiarism cases of different lengths and obfuscation types. We conduct experiments over Spanish-English and German-English datasets. Experimental results show that continuous representations allow the continuous word alignment-based similarity analysis model to obtain competitive results and the knowledge-based document similarity model to outperform the state-of-the-art in CL plagiarism detection.

60 citations
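
A simplified sketch of alignment-based similarity over continuous word representations: each word is aligned to its most similar word in the other fragment, and the alignment scores are averaged in both directions. Assumptions: the tiny shared-space vectors below are hypothetical (a real system would use bilingual embeddings), and this symmetric best-match alignment is a generic stand-in, not necessarily the exact formulation of the continuous word alignment-based similarity analysis in the paper.

```python
import math

Vector = list[float]

def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def directed_alignment(src: list[Vector], tgt: list[Vector]) -> float:
    """Average, over source words, of the best cosine match among target words."""
    return sum(max(cosine(s, t) for t in tgt) for s in src) / len(src)

def alignment_similarity(src: list[Vector], tgt: list[Vector]) -> float:
    """Symmetric score: the mean of both alignment directions."""
    if not src or not tgt:
        return 0.0
    return 0.5 * (directed_alignment(src, tgt) + directed_alignment(tgt, src))

# Hypothetical shared-space embeddings for Spanish and English words.
emb = {
    "perro": [1.0, 0.0], "dog": [0.95, 0.05],
    "come": [0.0, 1.0], "eats": [0.05, 0.95],
    "coche": [0.7, -0.7],
}
es = [emb["perro"], emb["come"]]  # "perro come"
en = [emb["dog"], emb["eats"]]    # "dog eats"
print(round(alignment_similarity(es, en), 3))
print(round(alignment_similarity([emb["coche"]], en), 3))
```

Averaging both directions keeps a short fragment from scoring highly just because all of its few words happen to match something in a much longer one.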


Journal Article
TL;DR: Turnitin® has been widely used for plagiarism detection in the academic domain; its usage has increased dramatically among university instructors, while academic criticism of the software has also increased.
Abstract: Recently, the usage of plagiarism detection software such as Turnitin® has increased dramatically among university instructors. At the same time, academic criticism of this software’s employment has also increased. We interviewed 23 faculty members from various departments at a medium-sized, public university in the southeastern US to determine their perspectives on Turnitin® and student plagiarism. We wanted to discern if there are important disciplinary differences in how instructors define and handle plagiarism; how instructors use Turnitin®; and if instructors’ thinking aligns with ethical and political concerns commonly expressed in the academic literature. Despite varying attitudes towards Turnitin®, those interviewed did not differ significantly in their views as to what student plagiarism is or its seriousness, and typical objections to ‘policing’ plagiarism and Turnitin® had little resonance with interviewees. The majority viewed a substantial amount of plagiarism they encountered as unintentiona...

49 citations


Proceedings Article
12 Oct 2016
TL;DR: Based on the evaluation, the proposed approach, which compares low-level instructions, is more effective at detecting most plagiarism attack types than a raw source code approach in an introductory programming course.
Abstract: Even though there are various source code plagiarism detection approaches, most of them are only concerned with low-level plagiarism attacks, on the assumption that plagiarism is only committed by students who are not proficient in programming. However, plagiarism often occurs not only because of student incapability but also because of poor time management. Thus, high-level plagiarism attacks should be detected and evaluated. This paper proposes a source code plagiarism detection approach that can detect most introductory-programming-course plagiarism attacks at any level by utilizing low-level instructions instead of source code tokens. Several mechanisms are also introduced to improve its effectiveness, such as instruction generalization, instruction reinterpretation, method-based comparison, and method linearization. Since low-level instructions are a language-dependent feature, Java is selected as the target programming language, with bytecode as its low-level instruction set. Based on the evaluation, our approach is more effective at detecting most plagiarism attack types than a raw source code approach in an introductory programming course. This evaluation is based on plagiarism attack types collected through a controlled experiment.

42 citations
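
Instruction generalization, one of the mechanisms named above, can be sketched as mapping concrete JVM opcodes to coarser classes before comparing instruction sequences. The opcode-to-class table below is a small hypothetical subset, and the longest-common-subsequence score is a generic similarity measure chosen for illustration; the paper's actual generalization rules and comparison method may differ.

```python
# Hypothetical generalization table: concrete JVM opcodes -> coarse instruction classes.
GENERALIZE = {
    "iload": "LOAD", "iload_1": "LOAD", "aload_0": "LOAD",
    "istore": "STORE", "istore_2": "STORE",
    "iadd": "ARITH", "isub": "ARITH", "imul": "ARITH",
    "if_icmpge": "BRANCH", "goto": "BRANCH",
    "invokevirtual": "CALL", "invokestatic": "CALL",
}

def generalize(opcodes: list[str]) -> list[str]:
    """Replace each opcode with its class; unknown opcodes are kept unchanged."""
    return [GENERALIZE.get(op, op) for op in opcodes]

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a: list[str], b: list[str]) -> float:
    """LCS length of the generalized sequences, normalized by the longer one."""
    ga, gb = generalize(a), generalize(b)
    longest = max(len(ga), len(gb))
    return lcs_length(ga, gb) / longest if longest else 1.0

# A copy that only changes local-variable slots still matches fully after generalization.
original = ["iload_1", "iload", "iadd", "istore_2", "invokestatic"]
disguised = ["iload", "iload_1", "iadd", "istore", "invokestatic"]
print(lcs_length(original, disguised), similarity(original, disguised))
```

On the raw opcodes only three instructions line up, whereas after generalization the two sequences are identical.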


Journal Article
10 Oct 2016
TL;DR: Plagiarism was a common occurrence among manuscripts submitted for publication to a major American specialty medical journal and most manuscripts with plagiarized material were submitted from countries in which English was not an official language.
Abstract: Plagiarism is common and threatens the integrity of the scientific literature. However, its detection is time-consuming and difficult, presenting challenges to editors and publishers who are entrusted with ensuring the integrity of published literature. In this study, the extent of plagiarism in manuscripts submitted to a major specialty medical journal was documented. We manually curated submitted manuscripts and deemed that an article contained plagiarism if one sentence had 80% of its words copied from another published paper. Commercial plagiarism detection software was used and its use was optimized. Of 400 consecutively submitted manuscripts, 17% contained unacceptable levels of plagiarized material, and 82% of the plagiarized manuscripts were submitted from countries where English was not an official language. Using the most commonly employed commercial plagiarism detection software, sensitivity and specificity were studied with regard to the generated plagiarism score. The cutoff score maximizing both sensitivity and specificity was 15% (sensitivity 84.8% and specificity 80.5%). Plagiarism was a common occurrence among manuscripts submitted for publication to a major American specialty medical journal, and most manuscripts with plagiarized material were submitted from countries in which English was not an official language. The use of commercial plagiarism detection software can be optimized by selecting a cutoff score that reflects the desired sensitivity and specificity.

38 citations
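
The cutoff-selection step described above can be sketched as follows: for each candidate threshold on the software's similarity score, compute sensitivity and specificity against the manual labels and keep the threshold that maximizes their sum (Youden's J statistic). Assumptions: the scores and labels below are made up for illustration, and Youden's J is one common way to balance the two measures; the study may have used a different criterion.

```python
def sens_spec(scores: list[float], labels: list[bool], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is flagged (assumes both classes present)."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores: list[float], labels: list[bool]) -> float:
    """Observed score maximizing Youden's J = sensitivity + specificity - 1."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(scores, labels, c)) - 1)

# Hypothetical similarity scores (%) and manual plagiarism labels.
scores = [3, 8, 12, 14, 16, 18, 22, 30, 9, 25]
labels = [False, False, False, True, True, True, True, True, False, False]
cutoff = best_cutoff(scores, labels)
print(cutoff, sens_spec(scores, labels, cutoff))
```

Raising the cutoff trades sensitivity for specificity; an editor who prefers to miss fewer cases could instead fix a minimum sensitivity and choose the highest cutoff that still meets it.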


Journal Article
TL;DR: Regular usage of professional plagiarism detection tools for similarity checks with critical interpretation by the editorial team at the pre-review stage will certainly help in reducing the menace of plagiarism in submitted manuscripts.
Abstract: Plagiarism is one of the most serious forms of scientific misconduct prevalent today and is an important reason for a significant proportion of manuscript rejections and retractions of published articles. It is time for the medical fraternity to unanimously adopt a 'zero tolerance' policy towards this menace. While responsibility for ensuring a plagiarism-free manuscript primarily lies with the authors, editors cannot absolve themselves of their accountability. The only way for an author to write a plagiarism-free manuscript is to write the article in his/her own words, literally and figuratively. This article discusses various types of plagiarism, reasons for the increasingly reported instances of plagiarism, the pros and cons of using plagiarism detection tools, and the role of authors and editors in preventing/avoiding plagiarism in a submitted manuscript. Regular usage of professional plagiarism detection tools for similarity checks, with critical interpretation by the editorial team at the pre-review stage, will certainly help in reducing the menace of plagiarism in submitted manuscripts.

37 citations


Journal Article
TL;DR: This paper found that students strongly agreed that near-verbatim copy and paste and patchwriting should be considered plagiarism, but that they were much more conflicted regarding the reuse of ideas.
Abstract: Most research on student plagiarism defines the concept very narrowly or with much ambiguity. Many studies focus on plagiarism involving large swaths of text copied and pasted from unattributed sources, a type of plagiarism that the overwhelming majority of students seem to have little trouble identifying. Other studies rely on ambiguous definitions, assuming students understand what the term means and asking them to self-report how well they understand the concept. This study attempts to avoid these problems by examining student perceptions of more complex citation issues. We presented 240 students with a series of examples, asked them to indicate whether or not each should be considered plagiarism, and followed up with a series of demographic and attitudinal questions. The examples fell within the spectrum of inadequate citation, patchwriting, and the reuse of other people’s ideas. Half were excerpted from publicized cases of academic plagiarism, and half were modified from other sources. Our findings indicated that students very strongly agreed that near-verbatim copy and paste and patchwriting should be considered plagiarism, but that they were much more conflicted regarding the reuse of ideas. Additionally, this study found a significant correlation between students' self-reported confidence in their understanding and their identification of more complex cases as plagiarism, but little correlation between academic class status or exposure to plagiarism detection software and perceptions of plagiarism. The latter finding goes against a prevailing sentiment in the academic literature that the ability to recognize plagiarism is inherently linked to academic literacy. Overall, our findings indicate that more pedagogical emphasis may need to be placed on complex forms of plagiarism.

36 citations


Journal Article
01 Oct 2016
TL;DR: This work compares content- and citation-based approaches for plagiarism detection with the goal of evaluating whether they are complementary and whether their combination can improve the quality of the detection, and concludes that a combination of the methods can be beneficial.
Abstract: The vast amount of scientific publications available online makes it easier for students and researchers to reuse text from other authors and harder to check the originality of a given text. Reusing text without crediting the original authors is considered plagiarism. A number of studies have reported the prevalence of plagiarism in academia. As a consequence, numerous institutions and researchers are dedicated to devising systems to automate the process of checking for plagiarism. This work focuses on the problem of detecting text reuse in scientific papers. The contributions of this paper are twofold: (a) we survey the existing approaches for plagiarism detection based on content, on content and structure, and on citations and references; and (b) we compare content- and citation-based approaches with the goal of evaluating whether they are complementary and whether their combination can improve the quality of the detection. We carry out experiments with real data sets of scientific papers and conclude that a combination of the methods can be beneficial.

35 citations
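
One simple way to combine content-based and citation-based evidence, sketched under stated assumptions: content similarity is the Jaccard overlap of word sets, citation similarity is the Jaccard overlap of reference lists (a bibliographic-coupling-style measure), and the two are mixed with a hypothetical weight `alpha`. The paper titles and reference keys are invented. This only illustrates the idea of combining the two signals; it is not the specific combination evaluated in the paper.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_score(text_a: str, text_b: str, refs_a: set[str], refs_b: set[str],
                   alpha: float = 0.5) -> float:
    """Weighted mix of content (word-set) similarity and citation (reference-set) similarity."""
    content = jaccard(set(text_a.lower().split()), set(text_b.lower().split()))
    citation = jaccard(refs_a, refs_b)
    return alpha * content + (1 - alpha) * citation

# Hypothetical papers: heavily paraphrased text but an almost identical reference list.
score = combined_score(
    "we propose a graph method for ranking",
    "a novel approach to order items using networks",
    {"Smith2010", "Lee2012", "Kumar2014"},
    {"Smith2010", "Lee2012", "Kumar2014", "Chen2015"},
)
print(round(score, 3))
```

Content similarity alone is about 0.07 for this pair, but the shared references lift the combined score. This illustrates why citation evidence can complement text matching when the wording has been heavily paraphrased.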


Journal Article
TL;DR: LoPD, a deviation-based program equivalence checking approach that is an ideal fit for whole-program plagiarism detection, is proposed, and evaluation results indicate that LoPD is effective in detecting whole-program plagiarism.
Abstract: Software plagiarism, the act of illegally copying others’ code, has become a serious concern for honest software companies and the open source community. Considerable research effort has been dedicated to searching for evidence of software plagiarism. In this paper, we continue this line of research and propose LoPD, a deviation-based program equivalence checking approach, which is an ideal fit for whole-program plagiarism detection. Instead of directly comparing the similarity between two programs, LoPD searches for any dissimilarity between them by finding an input that causes the two programs to behave differently, either with different output states or with semantically different execution paths. As soon as we find one dissimilarity, the programs are semantically different; if we cannot find any dissimilarity, it is more likely a plagiarism case. We leverage dynamic symbolic execution to capture the semantics of execution paths and to find path deviations. Compared with existing detection approaches, LoPD's formal program semantics-based method is more resilient to automatic obfuscation schemes. Our evaluation results indicate that LoPD is effective in detecting whole-program plagiarism. Furthermore, we demonstrate that LoPD can be applied to partial software plagiarism detection as well. The encouraging experimental results show that LoPD is an appealing complement to existing software plagiarism detection approaches.

34 citations
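
The core logic of LoPD (a single input that makes two programs behave differently proves they are not semantically equivalent) can be illustrated with a much simpler stand-in. Assumption: instead of dynamic symbolic execution, which is what LoPD actually uses, this sketch exhaustively tries a bounded range of integer inputs. It therefore compares outputs only, not execution paths, and failing to find a deviation is weak evidence rather than proof of equivalence. The three example functions are hypothetical.

```python
from typing import Callable, Iterable, Optional

def find_deviation(p: Callable[[int], int], q: Callable[[int], int],
                   inputs: Iterable[int]) -> Optional[int]:
    """Return the first input on which p and q produce different outputs, or None."""
    for x in inputs:
        if p(x) != q(x):
            return x
    return None

def original(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

def obfuscated_copy(n: int) -> int:
    # Restructured but behaviorally identical to original().
    return n * (n - 1) // 2 if n > 0 else 0

def different_program(n: int) -> int:
    return n * n

for candidate in (obfuscated_copy, different_program):
    deviation = find_deviation(original, candidate, range(-5, 100))
    verdict = "semantically different" if deviation is not None else "possible plagiarism"
    print(candidate.__name__, deviation, verdict)
```

The obfuscated copy survives every tested input, while the unrelated program is ruled out by the very first one. A symbolic-execution engine replaces this blind enumeration with inputs constructed to drive each program down different paths.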


01 Jan 2016
TL;DR: This paper proposes a deep learning based method to detect plagiarism, in which words are represented as multi-dimensional vectors and simple aggregation methods combine the word vectors into sentence representations.
Abstract: Plagiarism detection is defined as the automatic identification of reused text material. The general availability of the internet and easy access to textual information increase the need for automated plagiarism detection. In this regard, different algorithms have been proposed to perform the task of plagiarism detection in text documents. Due to the drawbacks and inefficiency of traditional methods and the lack of suitable algorithms for Persian plagiarism detection, in this paper we propose a deep learning based method to detect plagiarism. In the proposed method, words are represented as multi-dimensional vectors, and simple aggregation methods are used to combine the word vectors into sentence representations. By comparing the representations of source and suspicious sentences, the sentence pairs with the highest similarity are considered plagiarism candidates. Whether a candidate pair is plagiarism is decided using a two-level evaluation method. Our method was used in the PAN2016 Persian plagiarism detection contest and achieved 90.6% plagdet, 85.8% recall, and 95.9% precision on the provided data sets.
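
The sentence-level comparison described above can be sketched by averaging word vectors into a sentence vector and pairing each suspicious sentence with its most similar source sentence. Assumptions: the tiny word vectors are hypothetical (the entry does not specify its embedding model), averaging is taken as one of the "simple aggregation methods" it mentions, and the fixed 0.8 threshold is an illustrative stand-in for the paper's two-level decision step.

```python
import math

Vector = list[float]

def sentence_vector(sentence: str, vectors: dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words; unknown words are skipped."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_pairs(suspicious: list[str], sources: list[str], vectors: dict[str, Vector],
                    dim: int, threshold: float = 0.8) -> list[tuple[int, int, float]]:
    """For each suspicious sentence, keep its best source sentence if it clears the threshold."""
    src_vecs = [sentence_vector(s, vectors, dim) for s in sources]
    pairs = []
    for i, sent in enumerate(suspicious):
        sv = sentence_vector(sent, vectors, dim)
        best_j, best_sim = max(enumerate(cosine(sv, t) for t in src_vecs), key=lambda p: p[1])
        if best_sim >= threshold:
            pairs.append((i, best_j, round(best_sim, 3)))
    return pairs

# Hypothetical 2-dimensional word vectors.
word_vectors = {
    "cat": [1.0, 0.0], "feline": [0.9, 0.1], "sleeps": [0.0, 1.0], "naps": [0.1, 0.9],
    "stocks": [-1.0, 0.2], "fell": [-0.5, -0.8],
}
sources = ["the cat sleeps", "stocks fell today"]
suspicious = ["a feline naps", "birds sing"]
print(candidate_pairs(suspicious, sources, word_vectors, dim=2))
```

The paraphrase "a feline naps" is paired with "the cat sleeps" even though the two sentences share no words, which is the advantage of comparing vectors rather than surface tokens.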

Journal Article
TL;DR: This paper examines candidate retrieval, where the goal is to find potential source documents of a suspicious text and proposes a topic-based text segmentation method to convert the suspicious document to a set of related passages.
Abstract: Proposing a candidate retrieval model for cross-lingual plagiarism detection. The method relies on two levels of proximity information. Proposing a topic-based text segmentation method. Comparing the method with other cross-lingual plagiarism detection approaches. Showing improvements using text segmentation and positional language models. The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have caused cross-lingual plagiarism detection to receive increasing attention as a research area in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and assessment of pairwise document similarity. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually affects only parts of a text, texts must be segmented into fragments to detect local similarity. Therefore, we propose a topic-based segmentation algorithm to convert the suspicious document into a set of related passages. After that, we use a proximity-based model to retrieve the documents with the best matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection.
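
The segmentation step can be illustrated with a simplified lexical-cohesion segmenter in the spirit of TextTiling. This is a swapped-in, well-known technique, not the paper's topic-based algorithm: a boundary is placed between adjacent sentences whose bag-of-words cosine similarity falls below a threshold. The 0.1 threshold and the example sentences are assumptions for illustration.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words counts of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def segment(sentences: list[str], threshold: float = 0.1) -> list[list[str]]:
    """Group consecutive sentences into passages, splitting where lexical cohesion drops."""
    if not sentences:
        return []
    passages = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if bow_cosine(prev, cur) < threshold:
            passages.append([cur])  # low similarity: start a new passage
        else:
            passages[-1].append(cur)
    return passages

doc = [
    "neural networks learn word embeddings",
    "word embeddings capture meaning",
    "the river flooded the village",
    "the village was evacuated",
]
for passage in segment(doc):
    print(passage)
```

Each resulting passage can then be used as a separate query during candidate retrieval, so that a short plagiarised section is not diluted by the rest of the document.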

Journal Article
TL;DR: DOCODE 3.0 is presented, a Web system for educational institutions that performs automatic analysis of large quantities of digital documents with respect to their degree of originality and produces a number of visualizations and reports that let teachers and professors gain insight into the originality of the documents they review.

Proceedings Article
01 Oct 2016
TL;DR: Various techniques and algorithms for discovering plagiarism in source code are explained and compared to show how the techniques differ from and conflict with one another.
Abstract: Plagiarism is becoming a serious problem for the intellectual community, and detecting it at various levels is a major issue. The problem becomes more complex when plagiarism must be found in source code that is either in the same language or has been translated into another language. This type of plagiarism is found not only in academic work but also in industries dealing with software design. The major issue with source code plagiarism is that different programming languages may have different syntax. In this paper the authors explain various techniques and algorithms for discovering plagiarism in source code, so that organizations and academic institutions can discover such plagiarism using these techniques. The authors also compare these techniques to show how one technique conflicts with another.
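
Many token-based techniques of the kind surveyed above neutralize superficial edits, such as renamed identifiers, before comparison. A minimal sketch for Python source using the standard `tokenize` module: identifiers, numbers and strings are replaced by placeholder classes, comments are dropped, and programs are compared by the Jaccard overlap of normalized token trigrams. The placeholder scheme and trigram size are illustrative choices, not taken from the paper. Cross-language cases would need a language-neutral representation, which this sketch does not attempt.

```python
import io
import keyword
import tokenize

SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}

def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, abstracting away identifier names and literal values."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type in LAYOUT:
            out.append(tokenize.tok_name[tok.type])
        else:
            out.append(tok.string)  # operators and punctuation
    return out

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of normalized token trigrams."""
    def grams(src: str) -> set[tuple[str, ...]]:
        toks = normalized_tokens(src)
        return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}
    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v  # sum\n    return acc\n"
print(trigram_similarity(original, renamed))
```

Renaming every variable and adding a comment leaves the normalized token stream unchanged, so this pair scores a full match.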

Proceedings Article
01 May 2016
TL;DR: This work reviews important research papers in the field of source-code plagiarism detection in academia, tries to answer some of the research questions raised in the field, and gives directions for future work.
Abstract: Plagiarism is a big concern in academia, and it can be a problem in every course. Plagiarism occurs when someone presents others' work as their own. Students plagiarize in different areas: homework assignments, essays, projects, etc. This work focuses on programming courses and plagiarism in programming assignments. While source-code plagiarism detection is in some ways very similar to text plagiarism detection, it is very different in others. Consequently, much research has focused specifically on source-code plagiarism. Questions researched in this field include: what is considered plagiarism in programming assignments, how to perform plagiarism detection in programming assignments, how to do it automatically, which tool(s) to use, how students cheat in programming courses, how they try to obfuscate cheating, and many others. This work is a review of important research papers in the field of source-code plagiarism detection in academia. It tries to answer some of the research questions mentioned above and to give directions for future work.

Journal Article
TL;DR: In programming courses there are various ways in which students attempt to cheat; the most commonly used method is copying source code from other students.
Abstract: In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes to it, such as renaming variables. Several tools, such as Sherlock, JPlag and Moss, have been devised to detect source code plagiarism. However, these tools are not very effective for larger student assignments and projects that involve many source code files. Problems may also occur when source code is given to students in class so that they can copy it. In such cases these tools do not provide satisfactory results and reports. In this study, we present an improved process model for plagiarism detection when multiple student files exist and allowed source code is present. In this paper we use the Sherlock detection tool, although the presented process model can be combined with any plagiarism detection engine. The proposed model is tested on assignments in three courses over two consecutive academic years.
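
Discounting code that students were allowed to copy can be sketched as a preprocessing step that drops every line also present in the instructor-provided code before any similarity engine runs. Assumptions: matching uses whitespace-normalized lines and the comparison is a simple line-set overlap. The Java-like snippets are hypothetical, and the paper's engine-agnostic process model may filter differently.

```python
def normalize(line: str) -> str:
    """Collapse whitespace so that formatting changes do not matter."""
    return " ".join(line.split())

def strip_allowed(submission: str, allowed: str) -> list[str]:
    """Keep only non-empty lines that do not appear in the instructor-provided code."""
    allowed_lines = {normalize(line) for line in allowed.splitlines()}
    kept = (normalize(line) for line in submission.splitlines())
    return [line for line in kept if line and line not in allowed_lines]

def overlap(a: list[str], b: list[str]) -> float:
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

template = "import java.util.Scanner;\npublic class Main {\n}"
student_1 = "import java.util.Scanner;\npublic class Main {\n  int x = read();\n  print(x * 2);\n}"
student_2 = "import java.util.Scanner;\npublic class Main {\n  int y = read();\n  print(y + 1);\n}"

raw = overlap(student_1.splitlines(), student_2.splitlines())
filtered = overlap(strip_allowed(student_1, template), strip_allowed(student_2, template))
print(raw, filtered)
```

Without filtering, the shared template alone makes two independent solutions look about 43% similar; after filtering, their overlap drops to zero.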


Book Chapter
07 Dec 2016
TL;DR: The Persian PlagDet shared task at PAN 2016 was an effort to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism that appears in a Persian text corpus.
Abstract: The task of plagiarism detection is to find passages of text reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism that appears in a Persian text corpus. The goal of this shared task is to bring together researchers and practitioners around the exciting topic of plagiarism detection and text-reuse detection. We report on the outcome of the shared task, which is divided into two subtasks: text alignment and corpus construction. Nine teams participated in the first subtask, and the best result achieved was a PlagDet score of 0.92. In the second subtask, corpus construction, five teams submitted corpora, which were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora.
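
The PlagDet score used in PAN evaluations combines precision, recall and granularity. As commonly defined in the PAN evaluation framework, it is the F1 score divided by log2(1 + granularity), so reporting one plagiarism case as several fragments is penalized. The minimal sketch below takes precision, recall and granularity as already-computed inputs; deriving them from character-level detections is out of scope here.

```python
import math

def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet = F1 / log2(1 + granularity); granularity >= 1, where 1 means no fragmentation."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)

print(round(plagdet(0.9, 0.9, 1.0), 3))  # granularity 1: PlagDet equals F1
print(round(plagdet(0.9, 0.9, 2.0), 3))  # each case split in two on average: the score drops
```

Because log2(2) = 1, a detector that reports every case as a single contiguous detection is scored by plain F1; fragmentation is the only factor that pushes PlagDet below F1.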

01 Jan 2016
TL;DR: A plagiarism detection method is developed that constructs an author style function from features of text sentences and detects outliers; the method is then adapted to the diarization problem by segmenting author style statistics into text parts that correspond to different authors.
Abstract: The paper investigates methods for intrinsic plagiarism detection and author diarization. We developed a plagiarism detection method based on constructing an author style function from features of text sentences and detecting outliers. We adapted the method to the diarization problem by segmenting author style statistics into text parts that correspond to different authors. Both methods were tested on the PAN-2011 collection for intrinsic plagiarism detection and implemented for the PAN-2016 competition on author diarization.
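
The intrinsic approach (build a style function over sentences, then flag outliers) can be sketched with two deliberately simple style features, mean word length and sentence length in words, combined with a z-score outlier test. Assumptions: these features and the 2.0 cut-off are illustrative only, sentences are assumed non-empty, and the paper's actual feature set and style function are not reproduced here.

```python
import statistics

def style_features(sentence: str) -> tuple[float, float]:
    """Two toy style features: mean word length and sentence length in words."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words), float(len(words))

def style_outliers(sentences: list[str], z_cut: float = 2.0) -> list[int]:
    """Indices of sentences whose style deviates strongly from the document average."""
    feats = [style_features(s) for s in sentences]
    flagged: set[int] = set()
    for k in range(2):  # examine each feature separately
        values = [f[k] for f in feats]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        if sd == 0:
            continue
        flagged |= {i for i, v in enumerate(values) if abs(v - mean) / sd > z_cut}
    return sorted(flagged)

doc = [
    "the cat sat on the mat",
    "we ran to the park",
    "it was a warm day",
    "the dog ate his food",
    "we went home at six",
    "she read a good book",
    "notwithstanding considerable methodological heterogeneity, comprehensive longitudinal "
    "investigations consistently demonstrate substantial intergenerational transmission",
]
print(style_outliers(doc))
```

The inserted academic-sounding sentence stands out on both features, which is the signal an intrinsic detector uses when no external source collection is available.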

Proceedings Article
11 May 2016
TL;DR: Two techniques for plagiarism detection and prevention are presented, one based on allocating a unique assignment to each student and the other on individual presentations of coursework findings; both are shown to be effective at reducing plagiarism and improving students' understanding.
Abstract: Plagiarism seriously damages the education process in a number of ways: it prevents students from developing the skills of creative thinking and critical analysis, and it undermines the trust between lecturers and students. Furthermore, if plagiarism goes undetected, it can damage the reputation of the academic institution and devalue its degrees. In this paper, we present two techniques for plagiarism detection and prevention. The first is based on allocating a unique assignment to each student, while the second is based on individual presentations of coursework findings. These techniques are applied to three Master's-level courses at the University of Southampton, where we show that they are effective at reducing plagiarism and improving students' understanding.

Proceedings Article
07 May 2016
TL;DR: A new approach to detecting code reuse is proposed that increases prediction accuracy by dynamically removing the parts of assignments that appear in almost every assignment, the so-called common ground.
Abstract: Plagiarism in online learning environments undermines trust in online courses and threatens their viability. Automatic plagiarism detection systems do exist, yet the specific situation in online courses restricts their use. To allow easy automated grading, online assignments are usually less open and instead require students to fill in small gaps. As a result, solutions tend to be very similar without necessarily being plagiarized. In this paper we propose a new approach to detecting code reuse that increases prediction accuracy by dynamically removing the parts of assignments that appear in almost every assignment, the so-called common ground. Our approach shows significantly better F-measure and Cohen's Kappa results than other state-of-the-art algorithms such as Moss or JPlag. The proposed method is also language agnostic, to the point that training and test data sets can be taken from different programming languages.
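
Removing the common ground before comparison can be sketched as follows: token n-grams that occur in at least a given fraction of all submissions are treated as shared scaffolding and discarded, and pairwise similarity is computed on what remains. Assumptions: whitespace tokenization, 3-grams, and the 0.8 document-frequency cut-off are illustrative choices, not the paper's parameters, and the submissions are invented.

```python
from collections import Counter
from itertools import combinations

def ngrams(code: str, n: int = 3) -> set[tuple[str, ...]]:
    toks = code.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def remove_common_ground(subs: dict[str, str], max_df: float = 0.8,
                         n: int = 3) -> dict[str, set[tuple[str, ...]]]:
    """Drop n-grams that appear in at least max_df of all submissions."""
    grams = {name: ngrams(code, n) for name, code in subs.items()}
    df = Counter(g for gs in grams.values() for g in gs)
    limit = max_df * len(subs)
    return {name: {g for g in gs if df[g] < limit} for name, gs in grams.items()}

def pairwise_similarity(grams: dict[str, set]) -> dict[tuple[str, str], float]:
    result = {}
    for a, b in combinations(sorted(grams), 2):
        union = grams[a] | grams[b]
        result[(a, b)] = len(grams[a] & grams[b]) / len(union) if union else 0.0
    return result

# Hypothetical fill-in-the-gap submissions sharing the scaffold "def solve ( xs ) : return".
subs = {
    "alice": "def solve ( xs ) : return sorted ( xs ) [ 0 ]",
    "bob": "def solve ( xs ) : return min ( xs )",
    "carol": "def solve ( xs ) : return sorted ( xs ) [ 0 ]",
    "dave": "def solve ( xs ) : return max ( xs )",
    "erin": "def solve ( xs ) : return xs [ 0 ]",
}
sims = pairwise_similarity(remove_common_ground(subs))
print({pair: round(s, 2) for pair, s in sims.items() if s > 0.5})
```

Without the filter, `bob` and `dave` share about 45% of their trigrams purely through the scaffold; after removal, only the genuinely identical pair `alice`/`carol` remains suspicious.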

Proceedings Article
13 Sep 2016
TL;DR: A novel approach for assessing cross-language similarity between texts for detecting plagiarized cases that has two main steps: a vector-based retrieval framework that focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment.
Abstract: The Web offers fast and easy access to a wide range of documents in various languages, and translation and editing tools provide the means to create derivative documents fairly easily. This creates a need for effective tools for detecting cross-language plagiarism. Given a suspicious document, cross-language plagiarism detection comprises two main subtasks: retrieving documents that are candidate sources for that document, and analyzing those candidates one by one to determine their similarity to the suspicious document. In this paper we focus on the second subtask and introduce a novel approach for assessing cross-language similarity between texts for detecting plagiarized cases. Our proposed approach has two main steps: a vector-based retrieval framework that focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment. Experiments show that our method outperforms the best-performing methods of PAN-2012 and PAN-2014 in terms of plagdet score. We also show that aligning n-gram units, instead of aligning complete sentences, improves the accuracy of detecting plagiarism.
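
Aligning n-gram units can be illustrated with a common seed-and-extend pattern from PAN-style text alignment: matching word n-grams act as seeds, and seeds that lie close together in both documents are merged into aligned passages. Assumptions: both texts are already in the same language (a cross-language system would first map them into a shared representation, for example via translation), and n = 3 and a maximum gap of 4 words are illustrative parameters, not the paper's.

```python
def seeds(src: list[str], susp: list[str], n: int = 3) -> list[tuple[int, int]]:
    """(source position, suspicious position) for every word n-gram the texts share."""
    index: dict[tuple[str, ...], list[int]] = {}
    for i in range(len(src) - n + 1):
        index.setdefault(tuple(src[i:i + n]), []).append(i)
    return sorted((i, j) for j in range(len(susp) - n + 1)
                  for i in index.get(tuple(susp[j:j + n]), []))

def merge(seed_list: list[tuple[int, int]], n: int = 3,
          max_gap: int = 4) -> list[tuple[int, int, int, int]]:
    """Merge nearby seeds into (src_start, src_end, susp_start, susp_end) spans (ends exclusive)."""
    passages: list[tuple[int, int, int, int]] = []
    for i, j in seed_list:
        if passages:
            s0, s1, t0, t1 = passages[-1]
            # Extend the current passage if the seed starts within max_gap words of its end in both texts.
            if i <= s1 + max_gap and t0 <= j <= t1 + max_gap:
                passages[-1] = (s0, max(s1, i + n), t0, max(t1, j + n))
                continue
        passages.append((i, i + n, j, j + n))
    return passages

src = "the results show that the proposed method clearly outperforms all baseline systems on the test data".split()
susp = "our experiments show that the proposed method outperforms all baseline systems on the benchmark".split()
for s0, s1, t0, t1 in merge(seeds(src, susp)):
    print(" ".join(src[s0:s1]), "<->", " ".join(susp[t0:t1]))
```

The inserted word "clearly" breaks the exact trigram matches, but the gap tolerance bridges it, so the whole reused span is reported as one passage rather than two fragments (which would also lower the granularity component of plagdet).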

Journal Article
01 Dec 2016
TL;DR: A dynamic technique to detect plagiarized apps that works by observing the interaction of an app with the underlying mobile platform via its API invocations is proposed, and a robust plagiarism detection tool using API birthmarks is developed.
Abstract: This paper addresses the problem of detecting plagiarized mobile apps. Plagiarism is the practice of building mobile apps by reusing code from other apps without the consent of the corresponding app developers. Recent studies on third-party app markets have suggested that plagiarized apps are an important vehicle for malware delivery on mobile phones. Malware authors repackage official versions of apps with malicious functionality, and distribute them for free via these third-party app markets. An effective technique to detect app plagiarism can therefore help identify malicious apps. Code plagiarism has long been a problem and a number of code similarity detectors have been developed over the years to detect plagiarism. In this paper we show that obfuscation techniques can be used to easily defeat similarity detectors that rely solely on statically scanning the code of an app. We propose a dynamic technique to detect plagiarized apps that works by observing the interaction of an app with the underlying mobile platform via its API invocations. We propose API birthmarks to characterize unique app behaviors, and develop a robust plagiarism detection tool using API birthmarks.
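
An API-call birthmark can be sketched as the set of k-grams over an app's observed API invocation trace, compared by containment, that is, how much of the original app's birthmark reappears in the suspect app. Assumptions: the traces and API names below are hypothetical stand-ins for what dynamic instrumentation would record, and k = 2 and containment are illustrative choices; the paper's birthmark definition may differ.

```python
def birthmark(trace: list[str], k: int = 2) -> set[tuple[str, ...]]:
    """Set of consecutive k-grams of API calls observed at runtime."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

def containment(original: set, suspect: set) -> float:
    """Fraction of the original birthmark that also appears in the suspect."""
    return len(original & suspect) / len(original) if original else 0.0

# Hypothetical runtime traces; the repackaged app injects extra (e.g. malicious) calls.
original_app = ["openDb", "query", "render", "playSound", "saveScore"]
repackaged_app = ["openDb", "query", "render", "sendSms", "playSound", "saveScore", "httpPost"]
unrelated_app = ["getLocation", "httpPost", "render"]

bm = birthmark(original_app)
print(round(containment(bm, birthmark(repackaged_app)), 2))
print(round(containment(bm, birthmark(unrelated_app)), 2))
```

Containment is used rather than a symmetric measure because a repackaged app typically adds behavior; the added calls should not hide the fact that most of the original behavior is still present.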

Proceedings Article
25 Apr 2016
TL;DR: The MOSS Tool for Addressing Plagiarism at Scale (MOSS-TAPS) organizes the MOSS submission task in courses that repeat coding assignments; in one use case, the managed submission tool design presented here reduced the instructor time spent from 50 hours to only 10 minutes.
Abstract: Cheating in computer science classes can damage the reputation of institutions and their students. It is therefore essential to routinely authenticate student submissions with available software plagiarism detection algorithms such as Measure of Software Similarity (MOSS). Scaling this task for large classes where assignments are repeated each semester adds complexity and increases the instructor workload. The MOSS Tool for Addressing Plagiarism at Scale (MOSS-TAPS) organizes the MOSS submission task in courses that repeat coding assignments. In a recent use case in the Online Master of Science in Computer Science (OMSCS) program at the Georgia Institute of Technology, the instructor time spent was reduced from 50 hours to only 10 minutes using the managed submission tool design presented here. MOSS-TAPS provides persistent configuration, supports a mixture of programming languages and file organizations, and is implemented in pure Java for cross-platform compatibility.
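
MOSS itself is documented as being built on winnowing (Schleimer, Wilkerson and Aiken, 2003), which keeps a small, position-aware set of k-gram hash fingerprints per document. The minimal sketch below assumes whitespace-stripped, lower-cased text, k = 5, a window size of 4, and `zlib.crc32` as the hash. Real MOSS also normalizes language tokens, and these parameters are illustrative only.

```python
import zlib

def kgram_hashes(text: str, k: int = 5) -> list[int]:
    """Hash every k-character substring after removing whitespace and lowercasing."""
    s = "".join(text.lower().split())
    return [zlib.crc32(s[i:i + k].encode()) for i in range(len(s) - k + 1)]

def winnow(hashes: list[int], w: int = 4) -> set[tuple[int, int]]:
    """Select (hash, position) fingerprints: the rightmost minimum of each window of w hashes."""
    if not hashes:
        return set()
    w = min(w, len(hashes))
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        low = min(window)
        offset = max(i for i, h in enumerate(window) if h == low)
        fingerprints.add((low, start + offset))
    return fingerprints

def fingerprint_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the selected fingerprint hashes (positions ignored)."""
    fa = {h for h, _ in winnow(kgram_hashes(a))}
    fb = {h for h, _ in winnow(kgram_hashes(b))}
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

a = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
a_reformatted = "int total=0;\nfor (int i=0; i<n; i++)\n    total += a[i];"
c = "while (x > 0) { x = x / 2; steps++; }"
print(fingerprint_overlap(a, a_reformatted), fingerprint_overlap(a, c))
```

Winnowing guarantees that any shared substring of at least w + k - 1 characters yields at least one common fingerprint, while storing far fewer hashes than all k-grams. The recorded positions are what let a tool report the matching regions to an instructor.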

Journal ArticleDOI
TL;DR: This research provides an effective way to detect semantic plagiarism in written research, especially research by students whose work contains a large amount of plagiarism.
Abstract: The simplest description of plagiarism is either copying and pasting text, even when the source is cited, or changing some words while keeping the meaning without citing the source; detecting such preserved meaning is the hardest and most complex task. Plagiarism can be seen as a form of cybercrime, similar to computer viruses, computer hacking, spamming and copyright violation; therefore, the subject has attracted interest, as it has become an important part of the ethics of scientific research. The increasing incidence of plagiarism in the higher education sector, which some consider acceptable behavior because it saves time and effort and gives better results, has become a major problem for educational institutions. The main objective of this research is to find a suitable way to detect semantic plagiarism, which preserves the meaning of the original while replacing its words with synonyms. This research also applies pre-processing to the words of each research paper using tokenization and stop-word removal, then tests whether the paper falls under the computer science specialization; only such papers are subjected to semantic plagiarism detection using WordNet. This research provides an effective way to detect semantic plagiarism in written research, especially research by students whose work contains a large amount of plagiarism.
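A minimal sketch of the pre-processing and synonym-matching steps, assuming a tiny hand-written synonym table in place of WordNet and omitting the computer-science domain filter. The stop-word list, the `SYNONYMS` table, and the `overlap` scoring rule are hypothetical. The example shows why synonym substitution defeats plain word matching but not synonym-aware matching.

```python
import re

# Small illustrative stop-word list; a real system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "on",
              "for", "by", "this", "that"}

# Hypothetical stand-in for WordNet synsets: each word maps to one canonical synonym.
SYNONYMS = {
    "big": "large", "huge": "large",
    "method": "approach", "technique": "approach",
    "find": "detect", "identify": "detect",
}


def preprocess(text):
    """Tokenize into lowercase words and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def overlap(suspicious, source, use_synonyms=True):
    """Fraction of the suspicious text's distinct content words that also occur in the source."""
    def normalize(tokens):
        # Optionally map each word to its canonical synonym before comparing.
        return {SYNONYMS.get(t, t) if use_synonyms else t for t in tokens}

    s = normalize(preprocess(suspicious))
    o = normalize(preprocess(source))
    return len(s & o) / len(s) if s else 0.0


source = "This technique can identify large plagiarism cases."
suspicious = "This method can find big plagiarism cases."
print(overlap(suspicious, source, use_synonyms=False))  # 0.5
print(overlap(suspicious, source))                       # 1.0
```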

Journal ArticleDOI
TL;DR: This paper redesigns birthmark-based software plagiarism detection algorithms to make such an approach effective for multithreaded programs and shows that the new birthmarks are superior to existing birthmarks and are resilient against most state-of-the-art obfuscation techniques.

Proceedings ArticleDOI
04 Jul 2016
TL;DR: The differences between various systems that detect code similarities in order to identify cases that may have been plagiarised are explored, as well as how their performance compares with manual checking.
Abstract: Plagiarism is an issue that all educators have had to deal with. Large numbers of students and assignments have resulted in the development of automated systems to detect code similarities with the aim of identifying cases that may have been plagiarised. These systems are of great value to assessors, allowing them to process submissions automatically. However, these automated systems do present possible disadvantages and drawbacks. In this study we explore and analyse the differences between various systems as well as how their performance compares with manual checking. We consider the different methods students use when committing plagiarism. Then we examine more closely the systems that can aid plagiarism detection, ranging from their characteristics to how they work. In the process, we determine how these systems compare with our own system and their suitability for aiding the identification of submissions which may have been plagiarised in our introductory C++ course.
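The core idea shared by many token-based code similarity systems can be sketched as identifier normalization followed by comparison of token k-grams. This is a generic illustration, not any of the systems the study compares (nor the authors' own); the keyword subset, the regex tokenizer, k = 4, and Jaccard scoring are assumptions.

```python
import re

# Illustrative subset of C++ keywords kept verbatim; every other identifier becomes "ID".
CPP_KEYWORDS = {"int", "double", "char", "bool", "void", "const", "for", "while",
                "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|==|!=|<=|>=|[^\s\w]")


def normalize(code):
    """Tokenize C-like code, replacing identifiers with ID and numeric literals with NUM."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok[0].isalpha() or tok[0] == "_":
            out.append(tok if tok in CPP_KEYWORDS else "ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def similarity(code_a, code_b, k=4, normalized=True):
    """Jaccard similarity of the two token streams' k-gram sets."""
    tokenize = normalize if normalized else TOKEN_RE.findall
    ta, tb = tokenize(code_a), tokenize(code_b)
    ga = {tuple(ta[i:i + k]) for i in range(len(ta) - k + 1)}
    gb = {tuple(tb[i:i + k]) for i in range(len(tb) - k + 1)}
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


original = "int total = 0; for (int i = 0; i < n; i++) total += v[i];"
renamed = "int sum = 0; for (int k = 0; k < len; k++) sum += data[k];"
print(similarity(original, renamed, normalized=False))  # below 1.0: identifiers differ
print(similarity(original, renamed))                     # 1.0
```

Renaming variables, one of the disguises students commonly use, leaves the normalized token stream unchanged. Production tools such as MOSS keep only a selected subset of k-gram hashes (winnowing) rather than comparing every k-gram.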

Proceedings ArticleDOI
13 Jul 2016
TL;DR: The proposed content-based method is based on modeling the relationship between documents and their n-gram phrases, which are generated from the normalized text by exploiting morphological analysis and lexical lookup, and emphasizes Arabic document similarity analysis and visualization.
Abstract: As the number of information resources and the quantity of documents explode, efficient tools with intuitive visualization capabilities are urgently needed to assist users in conducting document similarity analysis and/or plagiarism detection tasks by discovering hidden relations among documents. This paper proposes a content-based method for document similarity analysis and visualization. The proposed method is based on modeling the relationship between documents and their n-gram phrases, which are generated from the normalized text by exploiting morphological analysis and lexical lookup. Possible morphological ambiguities are resolved by tagging the words within the examined documents. Text indexing and stop-word removal are performed using a new technique that is efficient in dealing with multiple long documents. The TF-IDF model of the examined documents is constructed using a heuristic-based pairwise matching algorithm that considers lexical and syntactic changes. Then, the hidden associations between the documents and their unique n-gram phrases are investigated using Latent Semantic Analysis (LSA). Next, the pairwise document subset and similarity measures are derived from the Singular Value Decomposition (SVD) computations. Different visualization techniques are then applied to the SVD results to expose the hidden relations among the documents under consideration. As Arabic is one of the most morphologically complex languages, this paper emphasizes Arabic document similarity analysis and visualization. Various experiments were carried out, revealing the strong capabilities of the proposed method in analyzing and visualizing literal and some types of intelligent similarities.
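A heavily simplified sketch of the n-gram, TF-IDF, LSA, and SVD pipeline, assuming whitespace tokenization, word bigrams, and a plain smoothed TF-IDF weighting. The paper's Arabic morphological analysis, tagging, and heuristic pairwise matching are omitted, the toy documents are English, and the function name `lsa_similarity` and its `rank` parameter are hypothetical. Documents are projected onto the top singular vectors and compared by cosine similarity in that latent space.

```python
import numpy as np


def ngrams(tokens, n=2):
    """Word n-grams joined into single strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lsa_similarity(docs, n=2, rank=2):
    """Pairwise cosine similarity of documents in a rank-limited LSA space."""
    grams = [ngrams(d.split(), n) for d in docs]
    vocab = sorted({g for gs in grams for g in gs})
    index = {g: j for j, g in enumerate(vocab)}

    # Term-document count matrix: one row per n-gram, one column per document.
    tf = np.zeros((len(vocab), len(docs)))
    for d, gs in enumerate(grams):
        for g in gs:
            tf[index[g], d] += 1

    df = np.count_nonzero(tf, axis=1)
    idf = np.log(len(docs) / df) + 1.0  # +1 so n-grams shared by every document keep some weight
    X = tf * idf[:, None]

    # Truncated SVD: keep the top `rank` singular values and right singular vectors.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    k = min(rank, len(s))
    latent = (np.diag(s[:k]) @ vt[:k]).T  # one row per document

    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    unit = latent / np.where(norms == 0, 1, norms)
    return unit @ unit.T


docs = [
    "the student copied the answer from the web page",
    "the student copied the answer from a web page",
    "morphology analysis of arabic text requires careful tagging",
]
print(np.round(lsa_similarity(docs), 3))
```

The first two documents share most of their bigrams and end up close together in the latent space, while the third, which shares none, stays near zero similarity to both.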

Journal ArticleDOI
TL;DR: It is suggested that students be educated on plagiarism to enhance their awareness of what it is and how to avoid it, and that all students be monitored across PBL groups to detect plagiarism.
Abstract: Students’ academic misconduct has been an issue in medical education and has become more likely with the development of technology (1, 2). We investigated the occurrence of plagiarism by medical students in a problem-based learning (PBL) course. The participants were a cohort of Year 1 students in the 4-year medical program (n=53) at Dongguk University Medical School in South Korea. Of these students, 38.5% were female and 61.5% were male. Of the cohort, 60% were graduate-entry students and 40% were undergraduate-entry students. Student ages ranged from 19 to 33 years (M=24.13, SD=3.19). The students turned in papers after self-study of topics on the PBL module. The plagiarism detection program offered by the university was used for the investigation. Thirty-three students (62%) plagiarized, mainly by copying and pasting content from websites found through Google, through a Korean search engine, or through the search engine offered by a Korean medical center. Because the students relied heavily on limited resources and searched the Internet using similar keywords, the contents of their papers were very similar. In another assignment, students wrote their reflections on ethical issues raised in the module. Seventeen students (32%) plagiarized papers written by their peers; some copied and pasted others’ work, and some used ideas from what their peers had written. In addition, we conducted one-on-one interviews with all of the students who were found to have plagiarized to investigate their patterns and perceptions of plagiarism. We found that most of the students were not aware that copying information from websites without properly citing the sources was considered plagiarism, nor that copying their peers’ reports was a serious problem. In addition, most of the students copied papers written by peers who were neither in their social network nor in the same PBL group.
Our study indicates that plagiarism in PBL is as prevalent as in other conventional courses and that its occurrence differs according to the type of assignment. In addition, all students should be monitored across PBL groups for detection of plagiarism because they are likely to copy papers written by peers in other PBL groups or outside their social networks. In conclusion, it is suggested that students should be educated on plagiarism to enhance their awareness of what it is and how to avoid it. A variety of educational interventions may be available to teach medical students about plagiarism, from conventional lectures to online tutorials. In addition, students need to be offered various learning resources for their self-study in order to prevent plagiarism. Offering diverse, high-quality learning resources is fundamental to fostering an effective learning environment for PBL (3), and it can also encourage students to use diverse resources instead of merely copying and pasting content from simple Internet searches when writing up papers.

Journal ArticleDOI
TL;DR: A monolingual plagiarism detection technique has been developed to tackle cases of paraphrased plagiarism; the performance of the system has been evaluated on various corpora, and the passage-level approach has registered promising results.
Abstract: Plagiarism in free text has become a common occurrence due to the wide availability of voluminous information resources. Automatic plagiarism detection systems aim to identify plagiarized content present in large repositories. This task is rendered difficult by the use of sophisticated plagiarism techniques such as paraphrasing and summarization, which mask the occurrence of plagiarism. In this work, a monolingual plagiarism detection technique has been developed to tackle cases of paraphrased plagiarism. A support vector machine (SVM) based paraphrase recognition system, which works by extracting lexical, syntactic, and semantic features from input text, has been used. Both sentence-level and passage-level approaches have been investigated. The performance of the system has been evaluated on various corpora, and the passage-level approach has registered promising results.