Showing papers on "Plagiarism detection published in 2023"

PDF

Open Access

Posted Content•DOI•

Will ChatGPT get you caught? Rethinking of Plagiarism Detection

[...]

08 Feb 2023

TL;DR: In this paper , the authors explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT, and compare the results with two popular plagiarism detection tools.

...read moreread less

Abstract: The rise of Artificial Intelligence (AI) technology and its impact on education has been a topic of growing concern in recent years. The new generation AI systems such as chatbots have become more accessible on the Internet and stronger in terms of capabilities. The use of chatbots, particularly ChatGPT, for generating academic essays at schools and colleges has sparked fears among scholars. This study aims to explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT. To this end, two popular plagiarism detection tools were used to evaluate the originality of 50 essays generated by ChatGPT on various topics. Our results manifest that ChatGPT has a great po-tential to generate sophisticated text outputs without being well caught by the plagiarism check software. In other words, ChatGPT can create content on many topics with high originality as if they were written by someone. These findings align with the recent concerns about students using chatbots for an easy shortcut to success with minimal or no effort. Moreover, ChatGPT was asked to verify if the essays were generated by itself, as an additional measure of plagiarism check, and it showed superior performance compared to the tradi-tional plagiarism-detection tools. The paper discusses the need for institutions to consider appropriate measures to mitigate potential plagiarism issues and advise on the ongoing debate surrounding the impact of AI technology on education. Further implications are discussed in the paper.

...read moreread less

7 citations

Journal Article•DOI•

Plagiarism software now able to detect students using ChatGPT

[...]

Manas Dave

01 May 2023-British Dental Journal

3 citations

Posted Content•DOI•

Will ChatGPT get you caught? Rethinking of Plagiarism Detection

[...]

08 Feb 2023

TL;DR: In this paper , the authors explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT, and compare the results with traditional plagiarism detection tools.

...read moreread less

Abstract: The rise of Artificial Intelligence (AI) technology and its impact on education has been a topic of growing concern in recent years. The new generation AI systems such as chatbots have become more accessible on the Internet and stronger in terms of capabilities. The use of chatbots, particularly ChatGPT, for generating academic essays at schools and colleges has sparked fears among scholars. This study aims to explore the originality of contents produced by one of the most popular AI chatbots, ChatGPT. To this end, two popular plagiarism detection tools were used to evaluate the originality of 50 essays generated by ChatGPT on various topics. Our results manifest that ChatGPT has a great potential to generate sophisticated text outputs without being well caught by the plagiarism check software. In other words, ChatGPT can create content on many topics with high originality as if they were written by someone. These findings align with the recent concerns about students using chatbots for an easy shortcut to success with minimal or no effort. Moreover, ChatGPT was asked to verify if the essays were generated by itself, as an additional measure of plagiarism check, and it showed superior performance compared to the traditional plagiarism-detection tools. The paper discusses the need for institutions to consider appropriate measures to mitigate potential plagiarism issues and advise on the ongoing debate surrounding the impact of AI technology on education. Further implications are discussed in the paper.

...read moreread less

3 citations

Posted Content•DOI•

Identifying Machine-Paraphrased Plagiarism

[...]

01 Feb 2023

TL;DR: This paper used pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models to detect machine-paraphrased text.

...read moreread less

Abstract: Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers , graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing technique, Longformer, achieved an average F1 score of 80.99% (F1=99.68% for SpinBot and F1=71.64% for Spinner-Chief cases), while human evaluators achieved F1=78.4% for SpinBot and F1=65.6% for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems , such as Turnitin and PlagScan. To facilitate future research, all data 3 , code 4 , and two web applications 56 showcasing our contributions are openly available.

...read moreread less

1 citations

Journal Article•DOI•

Academic Integrity: Preventing Students’ Plagiarism with TURNITIN

[...]

Ismail Ismail, Umiyati Jabri

22 Feb 2023-Edumaspul : jurnal pendidikan

TL;DR: Turnitin's plagiarism detecting software was used to detect plagiarism in student papers as discussed by the authors , and the results showed that the average level of plagiarism among students decreased by 18.81%. In terms of students' perceptions of using Turnitin as a standard way of submitting their final assignments and to get feedback, the overall student reaction to the system used was positive.

...read moreread less

Abstract: The problem of plagiarism in writing scientific articles is intellectual dishonesty and received a lot of attention during this research, and relatively few students understand about plagiarism. A total of 16 respondents from final year students of the English education department involved in this research. The single group pretest-posttest comparative method was used to assess an action to determine the performance gap between the two time periods, before and after the intervention. Student papers were first submitted to measure their level of plagiarism without their knowledge. The results showed that Muhammadiyah Enrekang students practiced plagiarism on average 50.88%. Subsequently, the students were introduced by Turnitin's plagiarism detecting software and advised to examine their writing using software. The learning model intervention is then carried out with development training. The results showed that the average level of plagiarism among students decreased by 18.81%. In terms of students' perceptions of using Turnitin as a standard way of submitting their final assignments and to get feedback, the overall student reaction to the system used was positive. To avoid plagiarism, a more systematic approach should be taken by the University towards the problem of academic dishonesty, and in particular by students for the specific reasons why they practice plagiarism.

...read moreread less

1 citations

Journal Article•DOI•

Reporting less coincidental similarity to educate students about programming plagiarism and collusion

[...]

Oscar Karnalim, Simon, William J. Chivers

21 Feb 2023-Computer Science Education

1 citations

Journal Article•DOI•

AssignmentWatch: An automated detection and alert tool for reducing academic misconduct associated with file-sharing websites

[...]

Rick Somers, Sam Cunningham, Sarah Dart, Sheona Thomson, Caslon Chua, Edmund Ian Marcus Pickering - Show less +2 more

01 Jan 2023-IEEE Transactions on Learning Technologies

TL;DR: AssignmentWatch as mentioned in this paper is a software tool that supports academic integrity by actively monitoring for the upload of assessment content online, with a focus on file-sharing websites and reducing the time burden on educators by automating the detection process.

...read moreread less

Abstract: Academic misconduct stemming from file-sharing websites is an increasingly prevalent challenge in tertiary education, including information technology and engineering disciplines. Current plagiarism detection methods (e.g. text matching) are largely ineffective for combatting misconduct in programming and mathematics based assessments. For these reasons, the development of an effective, automated monitoring tool would provide significant benefit in the struggle against misconduct. To address this challenge, this paper reports an innovative software tool named AssignmentWatch which supports academic integrity by actively monitoring for the upload of assessment content online, with a focus on file-sharing websites. This monitoring and alert notification system reduces the time burden on educators by automating the detection process. The design of AssignmentWatch focuses on early detection which enables educators to take early action. This could include preventative education, or removal of content before it is utilized by students. Through this, AssignmentWatch can reduce incidents of misconduct. The software tool is open source and made freely available to the community. AssignmentWatch was validated under controlled conditions, followed by field testing in 30 assignments, across 16 subjects and 5 higher education institutions. AssignmentWatch was found to effectively detect uploads on a variety of websites, generally within 24 hours. In field testing, AssignmentWatch received positive feedback from educators with 8 out of 10 educators stating AssignmentWatch aided academic integrity and 7 out of 10 users stating AssignmentWatch helped identify assignment content online.

...read moreread less

1 citations

Proceedings Article•DOI•

[...]

06 Jan 2023

TL;DR: In this paper , a plagiarism detection strategy was proposed to detect code plagiarism in SQL queries by semantically evaluating raw student query submissions from SQL courses which are offered every semester.

...read moreread less

Abstract: The Structured Query Language is the de facto language for defining, and manipulating data in a relational database. Thus, its mastery is important for students in computer science related discipline. Ergo, most universities offer more different courses that enable students to acquire SQL skill. However, this objective is plagued by code plagiarism, a major problem affecting the academic community. While plagiarism detection in other languages are detectable, detecting copied code in SQL is a difficult task to solve as most of the queries are relatively same, which makes plagiarism detection strategies ineffective when the objects are SQL queries. Research efforts in natural language processing has seen the development of several strategies that has facilitated complex evaluation of text strings. In this endavour, we liverage semantic similarity, a method that enables the evaluation of the semantic textual similarity between text strings, and the idea of distance between words, and the likelyness of their meaning to detect plagiarised SQL queries by semantically evaluating raw student query submissions from our SQL courses which are offered every semester. Result show that the semantic similarity strategy was able to detect code similarity, which translated to plagiarism in a considerable umber of submissions. In all, we describe in this paper, our plagiarism detection strategy, the limitations of our strategy, possible means that may be effective at addressing these limitations.

...read moreread less

1 citations

Journal Article•DOI•

Winnowing Algorithm: A Powerful Tool for Identifying Plagiarism in Assignments

[...]

01 Jun 2023-Journal of Trends in Computer Science and Smart Technology

TL;DR: In this article , a plagiarism detection system using Rolling Hash function has been proposed to detect plagiarism on student assignments. And the similarity value is calculated using Jaccard coefficient and the test results show the combinations of parameters (n-gram, window length and the base prime number) for the successful implementation of the system.

...read moreread less

Abstract: Plagiarism refers to using other ideas or works as their own without giving proper acknowledgment. The act of plagiarism is inappropriate and untrue for many reasons, especially in the academic world. Academicians are aware of this and try to avoid the act of plagiarism by any means necessary. In the present context, the digital way of teaching and learning is in practice which has more chance of plagiarized content. This research provides plagiarism detection features due to the lack of such features in digital-based teaching-learning activities. This proposed system handles the document in text format and uses Winnowing Algorithm for fingerprinting the assignment documents, and the hashing technique chosen for this algorithm is the Rolling Hash function. The similarity value is calculated using Jaccard coefficient. The test results show the combinations of parameters (n-gram, window length, and the base prime number) for the successful implementation of the system. The system successfully detects plagiarism on student assignments. The overall system is developed by using Python Web Framework Django and MySQL as a database.

...read moreread less

1 citations

Journal Article•DOI•

Testing of support tools to detect plagiarism in academic Japanese texts

[...]

Tolga Özşen, İrem Saka, Özgür İlhan Çelik, Salim Razi, Senem Çente Akkan, Dita Dlabolová - Show less +2 more

24 Mar 2023-Education and Information Technologies

1 citations

Journal Article•DOI•

Towards Reliable Code Plagiarism Detection: A Survey on Software Clone Detection

[...]

17 Feb 2023

Posted Content•DOI•

TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

[...]

22 May 2023

TL;DR: TEIMMA as discussed by the authors annotates the reuse of text, images, and mathematical formulae in a document pair, which is particularly useful to develop plagiarism detection algorithms. But real-world content reuse is often obfuscated, which makes it challenging to identify such cases.

...read moreread less

Abstract: This demo paper presents the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair -- TEIMMA. Annotating content reuse is particularly useful to develop plagiarism detection algorithms. Real-world content reuse is often obfuscated, which makes it challenging to identify such cases. TEIMMA allows entering the obfuscation type to enable novel classifications for confirmed cases of plagiarism. It enables recording different reuse types for text, images, and mathematical formulae in HTML and supports users by visualizing the content reuse in a document pair using similarity detection methods for text and math.

...read moreread less

Journal Article•DOI•

Scholarly Communication and Machine-Generated Text: Is it Finally AI vs AI in Plagiarism Detection?

[...]

Patit Paban Santra

01 Jul 2023-SRELS Journal of Information Management

TL;DR: The authors used GPT (Generative Pre-Trained Transformer) language model-based AI writing tools to create a set of 80 academic writing samples based on the eight themes of the experiential sessions of the LTC 2023.

...read moreread less

Abstract: This study utilizes GPT (Generative Pre-Trained Transformer) language model-based AI writing tools to create a set of 80 academic writing samples based on the eight themes of the experiential sessions of the LTC 2023. These samples, each between 2000 and 2500 words long, are then analyzed using both conventional plagiarism detection tools and selected AI detection tools. The study finds that traditional syntactic similarity-based anti-plagiarism tools struggle to detect AI-generated text due to the differences in syntax and structure between machine-generated and human-written text. However, the researchers discovered that AI detector tools can be used to catch AI-generated content based on specific characteristics that are typical of machine-generated text. The paper concludes by posing the question of whether we are entering an era in which AI detectors will be used to prevent AI-generated content from entering the scholarly communication process. This research sheds light on the challenges associated with AI-generated content in the academic research literature and offers a potential solution for detecting and preventing plagiarism in this context.

...read moreread less

Journal Article•DOI•

Comparison study of unsupervised paraphrase detection: Deep learning—The key for semantic similarity detection

[...]

Tedo Vrbanec, Ana Meštrović

22 Jun 2023-Expert Systems

TL;DR: In this article , a comparative study identified the most efficient methods for unsupervised paraphrased document detection using similarity measures alone or combined with deep learning (DL) models and proved the hypothesis that some DL models are more successful than the best statistically-based methods in that task.

...read moreread less

Abstract: Automatic detection of concealed plagiarism in the form of paraphrases is a difficult task, and finding a successful unsupervised approach for paraphrase detection is necessary as a precondition to change that. This comparative study identified the most efficient methods for unsupervised paraphrased document detection using similarity measures alone or combined with Deep Learning (DL) models. It proved the hypothesis that some DL models are more successful than the best statistically-based methods in that task. Many experiments were carried out, and their results were compared. The text similarities between documents are obtained from 60 different methods using five paraphrase corpora, including the new one made by authors, as an important original contribution. Some DL models achieved significantly better results than those obtained by the best statistical methods, especially pre-trained transformer-based language models with average values of Accuracy and F1 of 85.8% and 88.3%, respectively, with top values of 99.9% and 98.4% for Accuracy and F1 on some corpora. These results are even better than those of supervised and combined approaches. Therefore, here presented results prove that detecting concealed plagiarism becomes an attainable goal. This study highlighted those language models with the best overall results for paraphrase detection as best suited for further research. The study also discussed the choice of similarity/distance measure paired with embeddings produced by DL models and some advantages of using cosine similarity as the fastest measure. For 60 different methods, complexity has been defined in O notation. Times needed for their implementation have also been presented. The article's results and conclusions are a firm base for future semantic similarity, paraphrasing, and plagiarism detection studies, clearly marking state-of-the-art tools and methods.

...read moreread less

Book Chapter•DOI•

Paraphrase Detection in Indian Languages Using Deep Learning

[...]

Nevim Aygun

01 Jan 2023

Journal Article•DOI•

The plagiarism pandemic: Inspection of academic dishonesty during the COVID-19 outbreak using originality software

[...]

Yovav Eshet

21 Jun 2023-Education and Information Technologies

Journal Article•DOI•

Automatic paraphrasing tools: an unexpected consequence of addressing student plagiarism and the impact of COVID in distance education settings

[...]

Rubén Comas Forgas, Thomas Lancaster, Elvira Curiel Marín, Carmen Touza Garma

01 Jan 2023-Práxis Educativa

TL;DR: In this paper , the authors used search engine analytics with data from SEMrush and Google Trends to estimate the level of interest in online paraphrasing tools, focusing on the period 2016 to 2020 and the four countries: the USA, UK, Canada and Australia.

...read moreread less

Abstract: Text matching tools employed to detect plagiarism are widely used in universities, but their availability may have pushed students to find ways to evade detection. One such method is the use of automatic paraphrasing software, where assignments can be rewritten with little effort required by students. This paper uses the search engine analytics methodology with data from SEMrush and Google Trends to estimate the level of interest in online automatic paraphrasing tools, focusing on the period 2016 to 2020 and the four countries: the USA, UK, Canada and Australia. The results show a concerning trend, with the number of searches for such tools growing during the period, especially during COVID-19, and notable increases observed during the months where assessment periods take place in universities. The method employed in this study opens up a new avenue of analysis to enrich and supplement the existing knowledge in the field of academic integrity research. The data obtained demonstrates that faculty should be alert for student use of automatic paraphrasing tools and that academic integrity interventions need to be in place across the sector to address this problem.

...read moreread less

Proceedings Article•DOI•

Current Trends in the Search for Similarities in Source Codes with an Application in the Field of Plagiarism and Clone Detection

[...]

24 May 2023

Posted Content•DOI•

Testing of Detection Tools for AI-Generated Text

[...]

21 Jun 2023

TL;DR: In this article , the authors examine the general functionality of detection tools for artificial intelligence generated text and evaluate them based on accuracy and error type analysis, and conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AIgenerated text.

...read moreread less

Abstract: Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for artificial intelligence generated text and evaluates them based on accuracy and error type analysis. Specifically, the study seeks to answer research questions about whether existing detection tools can reliably differentiate between human-written text and ChatGPT-generated text, and whether machine translation and content obfuscation techniques affect the detection of AIgenerated text. The research covers 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck) that are widely used in the academic setting. The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AIgenerated text. Furthermore, content obfuscation techniques significantly worsen the performance of tools. The study makes several significant contributions. First, it summarises up-to-date similar scientific and non-scientific efforts in the field. Second, it presents the result of one of the most comprehensive tests conducted so far, based on a rigorous research methodology, an original document set, and a broad coverage of tools. Third, it discusses the implications and drawbacks of using detection tools for AI-generated text in academic settings.

...read moreread less

Journal Article•DOI•

Plagiarism detection process using AI

[...]

25 Apr 2023

Journal Article•DOI•

Postgraduate students' perception of plagiarism, awareness and use of Turnitin text-matching software.

[...]

Isaac Nketsiah, Osman Imoro, Kwaku Anhwere Barfi

24 Jan 2023-Accountability in Research

TL;DR: Turnitin text-matching software has been widely used by many academic institutions in Ghana as one of the solutions to improving students' and faculty academic writing and a solution for detecting incidences of plagiarism as mentioned in this paper .

...read moreread less

Abstract: Plagiarism is a highly discussed issue in higher education institutions in recent times. Turnitin text-matching software has widely been adopted by many academic institutions in Ghana as one of the solutions to improving students' and faculty academic writing and a solution for detecting incidences of plagiarism. There has been little empirical research into what students actually know about plagiarism and their lived experiences of text-matching technology, despite the fact that a lot of research has looked at attitudes, motivations, and demographic characteristics related to academic dishonesty. This study used an online Google form for data collection. We enrolled 1054 postgraduate students of the University of Cape Coast. The data collected was analysed using SPSS version 21.0, and the proposed hypothesis was tested using Structural Equation Modeling. Findings show that there was no statistically significant relationship between postgraduate students' academic levels and their perception of plagiarism. However, there is significant relationship between postgraduate students' perception of plagiarism and their use of Turnitin. There is also statistically significant relationship between postgraduate students' awareness of Turnitin and its use. This calls for increased awareness creation and sensitization, which can be accomplished through scientific writing workshops, focused on inculcating ethical research practices into students.

...read moreread less

Journal Article•DOI•

An effective text plagiarism detection system based on feature selection and SVM techniques

[...]

Mohamed A. El-Rashidy, R El Ghobashy Mohamed, Nawal El-Fishawy, Marwa A. Shouman

16 May 2023-Multimedia Tools and Applications

TL;DR: In this paper , a new plagiarism detection system is proposed to extract the most effective sentence similarity features and construct hyperplane equation of the selected features to distinguish the similarity cases with the highest accuracy.

...read moreread less

Abstract: Abstract Text plagiarism has greatly spread in the recent years, it becomes a common problem in several fields such as research manuscripts, textbooks, patents, academic circles, etc. There are many sentence similarity features were used to detect plagiarism, but each of them is not discriminative to differentiate the similarity cases. This causes the discovery of lexical, syntactic and semantic text plagiarism types to be a challenging problem. Therefore, a new plagiarism detection system is proposed to extract the most effective sentence similarity features and construct hyperplane equation of the selected features to distinguish the similarity cases with the highest accuracy. It consists of three phases; the first phase is used to preprocess the documents. The second phase is depended on two paths, the first path is based on traditional paragraph level comparison, and the second path is based on the computed hyperplane equation using Support Vector Machine (SVM) and Chi-square techniques. The third phase is used to extract the best plagiarized segment. The proposed system is evaluated on several benchmark datasets. The experimental results showed that the proposed system obtained a significant superiority in the performance compared to the systems with a higher ranking in the recent years. The proposed system achieved the best values 89.12% and 92.91% of the Plagdet scores, 89.34% and 92.95% of the F-measure scores on the complete test corpus of PAN 2013 and PAN 2014 datasets, respectively.

...read moreread less

Journal Article•DOI•

The plagiarism checker using machine learning

[...]

14 Apr 2023-Indian Scientific Journal Of Research In Engineering And Management

TL;DR: In this paper , a plagiarism checker application is developed based on machine learning as its core that searches a vast database for plagiarized content, which converts data, for example, text data, into a list of numbers, thus allowing various operations to be performed on the converted data.

...read moreread less

Abstract: Plagiarism is one of the most increasing problems in many fields; the academic field is one of them. It comes in various forms, from replacing a word with its synonym to sentence modification, transformation, and many more. Though humans are known for their efficient working ability, they may not be able to detect plagiarism in all scenarios accurately. They can’t see similarities against more than one million online documents in seconds. So, concluding that plagiarism detection is tedious and time-consuming work for a human, it would be nice to have a plagiarism checker to do plagiarism detection for us. In this project, a plagiarism checker application is developed based on machine learning as its core that searches a vast database for plagiarized content. Using the vector embeddings concept, a plagiarism checker application is developed, which converts data, for example, text data, into a list of numbers, thus allowing various operations to be performed on the converted data. Vectors are helpful because when we present real-world entities like audio, images, text, etc., as vector embeddings, the semantic similarity between these entities can be quantified by how close they’re to each other as points in vectors. Models are trained to translate entities into vectors; NLP is commonly used for such training. These International Journal of Scientific Research in Engineering and Management (IJSREM) Volume: 07 Issue: 04 | April - 2023 Impact Factor: 8.176 ISSN: 2582-3930 © 2023, IJSREM | www.ijsrem.com DOI: 10.55041/IJSREM18949 | Page 2 vector embeddings will be added to the pre-processed database, which will then be ready to be used for our similarity check. The machine learning-based application will take text as input from the user, check the text against the database, and return all the articles from which the input text could be plagiarized, along with the match score. Keywords: Plagiarism detection, machine learning, vector embeddings, NLP.

...read moreread less

Journal Article•DOI•

Named entity recognition and dependency parsing for better concept extraction in summary obfuscation detection

[...]

Umar Nur Taufiq, Reza Pulungan, Yohanes Suyanto

01 Jan 2023-Expert systems with applications

TL;DR: This paper proposed a new approach for summary obfuscation detection based on named entity recognition and dependency parsing, which is straightforward but accurate and easy to analyze compared to genetic algorithm-based methods.

...read moreread less

Abstract: Summary obfuscation is a type of idea plagiarism where a summary of a text document is inserted into another text document so that it is more difficult to detect with ordinary plagiarism detection methods. Various methods have been developed to overcome this problem, one of which is based on genetic algorithms. This paper proposes a new approach for summary obfuscation detection based on named entity recognition and dependency parsing, which is straightforward but accurate and easy to analyze compared to genetic algorithm-based methods. The proposed method successfully detects summary obfuscation at the document level more accurately than existing genetic algorithm-based methods. Our method produced accuracy at sentence level up to more than 84% for specific benchmark and threshold cases. In addition, we have also tested our proposed method on other types of plagiarism, and the resulting accuracy is excellent.

...read moreread less

Book Chapter•DOI•

Detection of Plagiarism in Contextual Meaning Using Transformer Model and Community Detection Algorithm

[...]

Kiyofumi Takabatake¹, Jaclyn Y. Hung¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Jan 2023

TL;DR: In this article , the authors used Transformer models and unsupervised algorithms for community detection methodologies to detect plagiarism in content between two articles and then determined the degree of plagiarism between the articles using the score of the number of communities existing in the two articles.

...read moreread less

Abstract: The portrayal of someone else’s original thoughts or work as one’s own without giving the author credit is known as plagiarism. A review revealed that the COVID epidemic, which nearly brought the world to an end, had a significant impact on the quality of academic work published. Of the 310 publications in infected journals that were examined, 41.6% were found to be plagiarized. Additionally, it was noted that technology was the cause of it. In order to compare the similarities of two articles while maintaining contextual value throughout and not just merely using words, this paper focuses on detecting plagiarism in content between two articles using Transformer models and unsupervised algorithms for community detection methodologies. This protects not only the presentation but also the ideas in the original content. The basic inputs to the proposed system are any two articles, into each of which the entire pipeline is fitted. The outputs are then used to determine the degree of plagiarism between the articles. Word embeddings are created using the BERT transformer model, and communities inside the embeddings are found using the Louvain Community discovery technique. A determination of plagiarism is made using the score of the number of communities existing in the two articles.

...read moreread less

Journal Article•DOI•

Applying Coding Behavior Features to Student Plagiarism Detection on Programming Assignments

[...]

Zheng Li, Yuting Zhang, Yong Liu, Yonghao Wu, Shumei Wu - Show less +1 more

07 Apr 2023-Journal of Circuits, Systems, and Computers

TL;DR: Wang et al. as discussed by the authors proposed a plagiarism detection method by analyzing behavioral features of students during the coding process, which extracted five behavioral features based on students' programming habits and used a feature ranking-based suspiciousness algorithm to obtain the possibility of student plagiarism.

...read moreread less

Abstract: In programming education, the result of plagiarism detection is a crucial criterion for assessing whether or not students can pass course exams. Recently, the prevalent methods for detecting student plagiarism have been proposed by analyzing source code. These methods extract features (such as token, abstract syntax tree and control flow graph) from the source code, examine the similarity of codes using various similarity detection methods, and then perform plagiarism detection based on a predefined plagiarism threshold. However, these previous methods for plagiarism detection have some problems. First, they are less effective in detecting code modification related to structure. Second, they require a considerable number of training data, which demand high computing time and space. Third, they cannot determine whether students plagiarize in time. We propose a novel plagiarism detection method by analyzing the behavioral features of students during the coding process. Specifically, we extract five behavioral features based on students’ programming habits. Then, we use a feature ranking-based suspiciousness algorithm to obtain the possibility of student plagiarism. Based on our proposed method, we develop the Online Integrated Programming Platform. To evaluate the accuracy of our method, we conduct a series of experiments. Final experimental results indicate that our method achieves promising results with Accuracy, Precision, Recall and [Formula: see text] values of 0.95, 0.90, 0.95 and 0.92, respectively. Finally, we also analyze the correlation between whether students plagiarized and their regular and final grades, which can further verify the effectiveness of our proposed method.

...read moreread less

Posted Content•DOI•

A Simple and Effective Method of Cross-Lingual Plagiarism Detection

[...]

03 Apr 2023

TL;DR: This article presented a cross-lingual plagiarism detection method applicable to a large number of languages, including under-resourced languages, which leverages open multilingual thesauri for candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis.

...read moreread less

Abstract: We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The presented approach leverages open multilingual thesauri for candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. The method does not rely on machine translation and word sense disambiguation when in use, and therefore is suitable for a large number of languages, including under-resourced languages. The effectiveness of the proposed approach is demonstrated for several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian languages.

...read moreread less

Posted Content•DOI•

A Machine Learning based approach of for Plagiarism Detection

[...]

01 Feb 2023

TL;DR: A Machine Learning based approach of for Plagiarism Detection is presented in this article , where the authors use a machine learning based approach for the detection of plagiarism in Wikipedia articles.

...read moreread less

Abstract: A Machine Learning based approach of for Plagiarism Detection

...read moreread less

Journal Article•DOI•

Plagiarism Detection System

[...]

Ashish Jain, Anmol Pandey

28 Apr 2023-International Journal of Computer Applications

Journal Article•DOI•

Plagiarism vs Similarity Index: A Critical Insight

[...]

Sandesh Narayan Somnache

09 Jan 2023-German Journal of Pharmaceuticals and Biomaterials

TL;DR: In the early days, plagiarism detection was challenging for the publisher due to the unavailability of sophisticated screening technology for reviewing manuscripts against published hard copies of the articles as mentioned in this paper , which may also explain the increased number of retractions of scientific papers due to plagiarism.

...read moreread less

Abstract: The word Plagiarism is derived from the Latin term “plagiarius”, meaning “kidnapper” [1]. Plagiarism severely violates publication ethics and professional conduct [2]. It may be defined as an unethical intentional or unintentional piracy of someone else idea/s or text without acknowledgement [1]. Intentional Plagiarism usually occurs when some educational credentials, professional promotions, or economic benefits might benefit the concerned author(s)—unintentional plagiarism results either from negligence or lack of awareness about plagiarism [3].The first incidence of plagiarism was detected in the year 1979 in a scientific paper. A later number of papers were found to be plagiarised [4]. A report published in 2018 showed an increased number of retractions of scientific papers in the last two decades due to plagiarism [5]. The primary reason for the increase in plagiarism by the scientific community could be a mandatory requirement to publish for employment and promotions. In addition, lack of skill in scientific writing and stringent policies related to plagiarism [4]. The availability of advanced text formatting tools and free access to scientific information may also reason for increased cases of plagiarism [6]. In the early days, plagiarism detection was challenging for the publisher due to the unavailability of sophisticated screening technology for reviewing manuscripts against published hard copies of the articles. Advanced tools for detecting plagiarism, such as iThenticate (Crossref), Turnitin, Grammarly, and Dupli Checker, are available to compare manuscripts with published articles [3,4]. Recently, iParadigms has developed a plagiarism detection tool for individual authors to screen individual manuscripts against an extensive live database of scholarly literature [7].

...read moreread less