Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal Article
TL;DR: A framework for the improved detection of plagiarism is proposed that focuses on the integration of social network information, information from the Web, and an advanced semantically enriched visualization of information about authors and documents that enables the exploration of obtained data by searching for advanced patterns of plagiarism.
Abstract: The prevalence of different kinds of electronic devices and the volume of content on the Web have increased the amount of plagiarism, which is considered an unethical act. To detect and prevent these acts efficiently, we have to improve today's methods of discovering plagiarism. The paper presents a research study in which a framework for the improved detection of plagiarism is proposed. The framework focuses on the integration of social network information, information from the Web, and an advanced semantically enriched visualization of information about authors and documents that enables the exploration of obtained data by searching for advanced patterns of plagiarism. To support the proposed framework, a special software tool was also developed. The statistical evaluation confirmed that the employment of social network analysis and advanced visualization techniques led to improvements in the confirmation and investigation stages of the plagiarism detection process, thereby enhancing its overall efficiency.

5 citations

01 Jan 2008
TL;DR: An efficient text plagiarism detection system is developed and Latent Semantic Indexing (LSI) is employed to build semantic structure from a set of documents, and this knowledge is used to guide plagiarism queries.
Abstract: Summary: I aim to develop an efficient text plagiarism detection system. Currently used systems concentrate on copy detection, and as such are incapable of detecting finer cases of plagiarism, which include stealing of ideas rather than exact words. I want to employ Latent Semantic Indexing (LSI) to build semantic structure from a set of documents, and use this knowledge to guide plagiarism queries. LSI serves both as a tool for transforming text data into a smaller, conceptual space and consequently as a performance booster. However, the resulting document representation is very dense, in the sense that each concept is assigned a non-zero real valued number. This poses a problem to efficient querying, because the commonly used technique of inverted index files is not applicable. Popular space-partitioning and data-partitioning indexing techniques also prove inadequate, due to their poor scalability with regard to the VS dimensionality. A choice of an improvement over linear scan called VA-File is considered. An improvement to internal system design clarity in the form of segmenting the documents according to topics before further processing is introduced. This process is also hoped to improve retrieval performance. A novel combination of these general methods into a system, their modifications and performance assessment is proposed to be the subject of my thesis.
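
The abstract above notes that dense concept vectors rule out inverted indexes and that a VA-File is considered as an improvement over a plain linear scan. The following is a minimal sketch of that general idea, not the thesis author's implementation: each dimension is quantised into a few cells, and a cheap lower bound on the distance to a vector's cell is used to skip exact distance computations. The function names (`build_va_file`, `lower_bound`, `nearest`), the Euclidean metric, and the random test vectors are all assumptions made for illustration.

```python
import math
import random


def build_va_file(vectors, bits=2):
    """Quantise every dimension into 2**bits equal-width cells (hypothetical helper)."""
    dims = len(vectors[0])
    cells = 2 ** bits
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    # Fall back to width 1.0 when a dimension is constant, to avoid division by zero.
    width = [(hi[d] - lo[d]) / cells or 1.0 for d in range(dims)]
    approx = [
        [min(int((v[d] - lo[d]) / width[d]), cells - 1) for d in range(dims)]
        for v in vectors
    ]
    return lo, width, approx


def lower_bound(query, cell, lo, width):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    total = 0.0
    for d, c in enumerate(cell):
        cell_lo = lo[d] + c * width[d]
        cell_hi = cell_lo + width[d]
        if query[d] < cell_lo:
            total += (cell_lo - query[d]) ** 2
        elif query[d] > cell_hi:
            total += (query[d] - cell_hi) ** 2
    return math.sqrt(total)


def nearest(query, vectors, va):
    """Scan the approximations; compute an exact distance only when the bound allows it."""
    lo, width, approx = va
    best_idx, best_dist, exact_checks = -1, math.inf, 0
    for i, cell in enumerate(approx):
        if lower_bound(query, cell, lo, width) >= best_dist:
            continue  # this vector cannot beat the current best match
        exact_checks += 1
        dist = math.dist(query, vectors[i])
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist, exact_checks


# Toy stand-in for dense LSI document vectors (random data, purely illustrative).
random.seed(1)
docs = [[random.random() for _ in range(8)] for _ in range(100)]
va = build_va_file(docs, bits=3)
idx, dist, checks = nearest([0.5] * 8, docs, va)
print(f"nearest document {idx}, exact distance computations: {checks} of {len(docs)}")
```

The scan still touches every approximation, so it remains linear, but the approximations are much smaller than the full vectors and most exact distance computations can be skipped. How much is saved depends on the data and on the number of bits per dimension.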

5 citations

Proceedings Article
01 Nov 2010
TL;DR: This work presents the design and development of a web-based system that supports cross-language similarity analysis and plagiarism detection, and introduces the idea of query document reduction via summarisation.
Abstract: This work presents the design and development of a web-based system that supports cross-language similarity analysis and plagiarism detection. A suspicious document d_q in a language L_q is submitted to the system via a PHP web-based interface. The system accepts the text either as an upload or pasted directly into a text area. To lighten large texts and provide an ideal set of queries, we introduce the idea of query document reduction via summarisation. Our proposed system utilises a fuzzy swarm-based summarisation tool originally built in Java. The summary is then used as a query to find similar web resources in languages L_x other than L_q via a dictionary-based translation. Thereafter, a detailed similarity analysis across the languages L_q and L_x is performed and a user-friendly report of results is produced. The report includes a global similarity score for the whole document, which assures high flexibility of utilisation.
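
The dictionary-based translation and global similarity score described above can be illustrated with a rough sketch. Everything here is an assumption for illustration: the toy Spanish-to-English dictionary `TOY_DICT`, the helper names, and the choice of Jaccard overlap as the similarity measure. The abstract does not say which measure the system uses, and the summarisation and web retrieval steps are omitted.

```python
import re

# Hypothetical toy bilingual dictionary; a real system would use a full lexicon.
TOY_DICT = {
    "el": "the", "la": "the", "gato": "cat", "negro": "black",
    "duerme": "sleeps", "en": "in", "casa": "house",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def translate(tokens, dictionary):
    """Word-by-word translation; unknown words are kept unchanged."""
    return [dictionary.get(token, token) for token in tokens]


def global_similarity(query_tokens, candidate_tokens):
    """Jaccard overlap of the two vocabularies (an assumed, simple measure)."""
    a, b = set(query_tokens), set(candidate_tokens)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


suspicious = "El gato negro duerme en la casa"          # document d_q in language L_q
candidate = "The black cat sleeps in the house"         # web resource in language L_x
translated = translate(tokenize(suspicious), TOY_DICT)
print(global_similarity(translated, tokenize(candidate)))
```

Word-by-word lookup ignores word order, inflection, and ambiguity, so a score like this is best read as a coarse first filter before any detailed passage-level comparison.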

5 citations

Book Chapter
TL;DR: The topics that will be covered in the RuSSIR 2014 course on Author Profiling and Plagiarism Detection (APPD) are introduced and the results of the shared tasks on author profiling (gender and age identification) and plagiarism detection that the authors help to organise are discussed.
Abstract: In this paper we introduce the topics that we will cover in the RuSSIR 2014 course on Author Profiling and Plagiarism Detection (APPD). Author profiling distinguishes between classes of authors by studying how language is shared by classes of people. This task helps in identifying profiling aspects such as gender, age, native language, or even personality type. In the case of the plagiarism detection task we are not interested in studying how language is shared. On the contrary, given a document, we are interested in investigating whether the writing style changes, in order to unveil text inconsistencies, i.e., unexpected irregularities throughout the document such as changes in vocabulary, style and text complexity. In fact, when it is not possible to retrieve the source document(s) from which text was plagiarised, the intrinsic analysis of the suspicious document is the only way to find evidence of plagiarism. The difficulty in retrieving the source of plagiarism could be due to the fact that the documents are not available on the web or that the plagiarised text fragments were obfuscated via paraphrasing or translation (in case the source document was in another language). In this overview, we also discuss the results of the shared tasks on author profiling (gender and age identification) and plagiarism detection that we help to organise at the PAN Lab on Uncovering Plagiarism, Authorship, and Social Software Misuse (http://pan.webis.de).
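
Intrinsic analysis, as described above, looks for segments whose style departs from the rest of the same document. A minimal sketch of that idea follows, assuming a single hypothetical feature (average word length as a crude proxy for text complexity) and a simple z-score threshold. Real systems use much richer stylometric features, and neither the feature nor the threshold comes from the course overview.

```python
import re
import statistics


def average_word_length(segment):
    """Crude complexity proxy: mean number of letters per word."""
    words = re.findall(r"[a-z']+", segment.lower())
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def flag_style_outliers(segments, threshold=1.5):
    """Return indices of segments whose feature deviates strongly from the document mean."""
    values = [average_word_length(segment) for segment in segments]
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # perfectly uniform style, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


segments = [
    "The cat sat on the mat and the dog ran to the door.",
    "We had tea and a bun, then we went out for a walk.",
    "Epistemological considerations notwithstanding, institutional "
    "methodologies necessitate comprehensive reconsideration.",
    "It was hot, so we sat in the sun and had a nap.",
    "The sky was red and the sea was calm at the end of day.",
]
print(flag_style_outliers(segments))
```

A flagged segment is only a hint that the style changed. It is not proof of plagiarism, since quotations, formulas, or a deliberate shift in register produce the same signal.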

5 citations

Journal Article
TL;DR: The architecture and concepts of a real-world document retrieval system, which is a part of a general anti-plagiarism software, are described and key approaches of source retrieval are compared.
Abstract: Plagiarism has become a serious problem, mainly because of electronically available documents. Online document retrieval is an important part of a modern anti-plagiarism tool. This paper describes the architecture and concepts of a real-world document retrieval system, which is a part of a general anti-plagiarism software. Up-to-date systems for plagiarism detection are discussed from the source retrieval perspective. The key approaches to source retrieval are compared. The system recommendations stem from the design, implementation, and several years of operational experience of a nationwide plagiarism solution at Masaryk University in the Czech Republic. The design can be adapted to many situations. Proper usage of such systems contributes to the gradual improvement of the quality of student theses.

5 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125