Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal Article
01 Jun 2020
TL;DR: While more research is necessary to further investigate the reliability of the best performing software packages, stylometry software appears to show significant promise for the potential detection of contract cheating.
Abstract: Contract cheating, instances in which a student enlists someone other than themselves to produce coursework, has been identified as a growing problem within academic integrity literature and in news headlines. The percentage of students who have utilized this type of cheating has been reported to range between 6% and 15.7%. Generational sentiments about cheating and the ready accessibility of contract cheating providers online seem only to have exacerbated the issue. The problem is that there is currently no simple, verified means of detecting contract cheating, as available plagiarism detection software has been shown to be ineffective in these cases. One method that is commonly used for authorship authentication in nonacademic settings, stylometry, has been suggested as a potential means for detection. Stylometry uses various attributes of documents to determine if they were written by the same individual. This pilot study sought to assess the utility of three easy-to-use and readily available stylometry software systems to detect simulated cases of contract cheating on academic documents. Average accuracy ranged from 33% to 88.9%. While more research is necessary to further investigate the reliability of the best performing software packages, stylometry software appears to show significant promise for the potential detection of contract cheating.

17 citations
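
The idea of comparing documents by stylistic attributes can be illustrated with a minimal sketch. It is not the method used by any of the three software systems evaluated in the study above, which the abstract does not describe. The function-word list, the relative-frequency features, and the cosine-similarity comparison are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a small list of English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]


def style_vector(text):
    """Return the relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_style_score(text_a, text_b):
    """Score in [0, 1]; higher means more similar function-word usage."""
    return cosine_similarity(style_vector(text_a), style_vector(text_b))
```

A score close to 1 only indicates similar function-word usage. Real stylometry tools typically combine many more attributes, and any decision threshold would have to be calibrated on known writing samples.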

Proceedings Article
10 Jul 2006
TL;DR: Assessment in flexible delivery and how plagiarism can be detected is discussed and a method for testing the identity of a student (or more generally, author) online, without any interference with the examination process is presented.
Abstract: While many institutions of higher education offer courses via distance education, there is one aspect which is difficult to realise by use of the Internet only: assessment. If exams are performed online, how can the course provider guarantee that the student participating in the exam is the person enrolled? Without any Internet-based form of authenticating the student's identity, flexible delivery can break down at this point. As a consequence, traditional identity checks are introduced such as requiring the student to be physically present and to take the exam at a local institution, or requiring the student to sign documents that certify his/her identity. This paper discusses assessment in flexible delivery and how plagiarism can be detected. It presents a method for testing the identity of a student (or more generally, author) online, without any interference with the examination process. Recent advances in computational text analysis allow authorship identification with high reliability. That is, the original author of a document submitted for assessment can be determined successfully with an accuracy and precision of well above 90 percent. The computational methods include machine learning techniques such as "support vector machines", which are highly successful in text classification and a range of other practical applications.

17 citations

Journal Article
TL;DR: The results show that the proposed candidate retrieval model outperforms the state-of-the-art models and can be considered as a proper choice to be embedded in cross-language plagiarism detection systems.
Abstract: Due to the rapid growth of documents and manuscripts in various languages all over the world, plagiarism detection has become a challenging task, especially for cross-lingual cases. For this reason, today's plagiarism detection systems use a candidate retrieval process as the first step, in order to reduce the set of documents for comparison to a reasonable number. The performance of the second step of plagiarism detection, which is devoted to a detailed analysis of the candidates, depends heavily on the candidate retrieval phase. Given its importance, the present study focuses on the candidate retrieval task and aims to accurately extract a minimal set of likely source documents. The paper proposes a fusion of concept-based and keyword-based retrieval models for this purpose. A dynamic interpolation factor is used in the proposed scheme in order to combine the results of conceptual and bag-of-words models. The effectiveness of the proposed model for cross-language candidate retrieval is also compared with state-of-the-art models over German-English and Spanish-English language partitions. The results show that the proposed candidate retrieval model outperforms the state-of-the-art models and can be considered as a proper choice to be embedded in cross-language plagiarism detection systems.

17 citations
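
Combining two retrieval models through an interpolation factor can be sketched as a weighted sum of their per-document scores. The abstract above does not say how the dynamic factor is computed, so the rule in `interpolation_factor` is a purely hypothetical stand-in, as are the function names and the assumption that both models emit scores normalized to [0, 1].

```python
def interpolation_factor(concept_scores, keyword_scores):
    """Hypothetical rule: weight each model in proportion to its best score."""
    top_c = max(concept_scores.values(), default=0.0)
    top_k = max(keyword_scores.values(), default=0.0)
    if top_c + top_k == 0:
        return 0.5  # no evidence either way: weight both models equally
    return top_c / (top_c + top_k)


def fuse(concept_scores, keyword_scores, top_n=10):
    """Rank candidates by lam * concept score + (1 - lam) * keyword score."""
    lam = interpolation_factor(concept_scores, keyword_scores)
    docs = set(concept_scores) | set(keyword_scores)
    fused = {
        d: lam * concept_scores.get(d, 0.0)
        + (1 - lam) * keyword_scores.get(d, 0.0)
        for d in docs
    }
    # Sort by descending score, breaking ties by document id for determinism.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

The factor is recomputed for every query, which is the sense in which this sketch is "dynamic". A document missing from one model's result list is treated as scoring 0.0 in that model.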

Journal Article
TL;DR: Turnitin is software that identifies matched material by checking electronically submitted documents against its database of academic publications, internet content, and previously submitted documents; the resulting "similarity index" does not in itself mean plagiarism.
Abstract: Institutional integrity constitutes the basis of scientific activity. Frequent incidents of similarity, plagiarism, and retraction have led to frequent use of similarity and plagiarism detection tools. Turnitin is software that identifies matched material by checking electronically submitted documents against its database of academic publications, internet content, and previously submitted documents. Turnitin provides a "similarity index," which does not mean plagiarism. The prevalence of plagiarism has not fallen substantially despite the availability of many paid and free plagiarism detection tools, for a variety of reasons such as poor research and citation skills, language problems, and underdeveloped academic skills. This paper may provide useful feedback to students, researchers, and faculty members in understanding the difference between similarity index and plagiarism.

17 citations

01 Jan 2008
TL;DR: This paper presents a statement-based plagiarism detection approach in Arabic scripts using a fuzzy-set IR model, and shows that fuzzy-set IR successfully detected not only exact but also similar statements that have different structure.
Abstract: The nature of Arabic language structure calls for fuzzy or vague concepts to reveal dishonest practices in Arabic documents. In this paper, we present a statement-based plagiarism detection approach in Arabic scripts using a fuzzy-set IR model. The degree of similarity is calculated and compared to a threshold value to judge whether two statements are the same or different. We built a corpus in which all stop words were removed and the remaining words were stemmed, as is typical for Arabic IR. The corpus has 100 documents with 4367 statements in total. Five query documents with about 250 plagiarized statements were constructed and tested. Experimental results show that fuzzy-set IR successfully detected not only exact but also similar statements that have different structure. However, our Arabic fuzzy-set model approach does not handle the case of rewording with different synonyms/antonyms, a deficiency that will lead to future work on modeling the system using an Arabic thesaurus. Keywords: fuzzy-set information retrieval; Arabic; plagiarism detection

17 citations
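
Comparing a statement-level similarity degree against a threshold can be sketched with a common formulation of the fuzzy-set IR model, in which each term has a fuzzy degree of membership in a statement. The abstract above does not give the paper's formulas, so the membership and similarity definitions, the symmetric `min` decision rule, the threshold of 0.7, and the toy correlation table are all assumptions. English tokens stand in for stemmed Arabic terms.

```python
# Toy, hypothetical term-term correlation factors in [0, 1]; a real system
# would estimate them from a corpus (for example from term co-occurrence).
CORRELATION = {
    frozenset(["exam", "grade"]): 0.4,
}


def correlation(a, b):
    """Correlation factor between two terms; identical terms correlate fully."""
    if a == b:
        return 1.0
    return CORRELATION.get(frozenset([a, b]), 0.0)


def membership(word, statement):
    """Fuzzy degree to which `word` belongs to `statement` (a list of terms)."""
    product = 1.0
    for term in statement:
        product *= 1.0 - correlation(word, term)
    return 1.0 - product


def similarity(s1, s2):
    """Average membership of the terms of s1 in s2 (asymmetric)."""
    if not s1:
        return 0.0
    return sum(membership(w, s2) for w in s1) / len(s1)


def same_statement(s1, s2, threshold=0.7):
    """Judge two statements equal if both directions reach the threshold."""
    return min(similarity(s1, s2), similarity(s2, s1)) >= threshold
```

Because the score ignores word order, `same_statement(["student", "copy", "report"], ["report", "student", "copy"])` is true under these assumptions, which mirrors the abstract's claim about detecting similar statements with different structure.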


Network Information
Related Topics (5)
Active learning
42.3K papers, 1.1M citations
78% related
The Internet
213.2K papers, 3.8M citations
77% related
Software development
73.8K papers, 1.4M citations
77% related
Graph (abstract data type)
69.9K papers, 1.2M citations
76% related
Deep learning
79.8K papers, 2.1M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125