scispace - formally typeset
Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Proceedings Article (DOI)
16 Sep 2014
TL;DR: SimSeerX is introduced, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents.
Abstract: The need to find similar documents occurs in many settings, such as plagiarism detection or research paper recommendation. Manually constructing queries to find similar documents may be overly complex, which motivates the use of whole documents as queries. This paper introduces SimSeerX, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents. Key to the design of SimSeerX is that it is able to work with multiple similarity functions and document collections. We present the architecture and interface of SimSeerX, show its applicability with 3 different similarity functions, and demonstrate its scalability on a collection of 3.5 million academic documents.

10 citations
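To illustrate the idea of using a whole document as the query with interchangeable similarity functions, here is a minimal sketch. It is not SimSeerX's implementation: the two similarity functions (term-frequency cosine and Jaccard over word 3-shingles), the tokenizer, and the names `rank_similar`, `cosine_tf`, and `jaccard_shingles` are all illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Any function mapping (query_doc, candidate_doc) -> score can be plugged in.
Similarity = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    # Deliberately simple lowercase word tokenizer (an assumption for illustration).
    return re.findall(r"\w+", text.lower())


def cosine_tf(a: str, b: str) -> float:
    # Cosine similarity between raw term-frequency vectors of the two documents.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
    # Jaccard overlap of word k-shingles (contiguous k-word windows).
    def shingles(text: str) -> set:
        toks = tokenize(text)
        return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_similar(
    query_doc: str,
    collection: Dict[str, str],
    similarity: Similarity,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    # Score every document against the whole query document, best first.
    scored = [(doc_id, similarity(query_doc, text)) for doc_id, text in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "d1": "plagiarism detection with whole documents as queries",
    "d2": "research paper recommendation using citation graphs",
    "d3": "detecting plagiarism by using whole documents as queries",
}
query = "using whole documents as queries for plagiarism detection"
print(rank_similar(query, docs, cosine_tf, top_k=2))  # d1 is ranked ahead of d3
```

Passing the similarity function as a parameter keeps the ranking loop independent of the scoring method, which mirrors the abstract's point that the engine works with multiple similarity functions. A real system over millions of documents would need an index rather than this linear scan.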

Book Chapter (DOI)
08 Apr 2017
TL;DR: Two alternative approaches for detecting plagiarism in homoglyph-obfuscated texts are presented: the first approach utilizes the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed using normalized Hamming distance to match homoglyph-obfuscated words with source words.
Abstract: Homoglyphs can be used for disguising plagiarized text by replacing letters in source texts with visually identical letters from other scripts. Most current plagiarism detection systems are not able to detect plagiarism when text has been obfuscated using homoglyphs. In this work, we present two alternative approaches for detecting plagiarism in homoglyph obfuscated texts. The first approach utilizes the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed using normalized hamming distance to match homoglyph obfuscated words with source words. Empirical testing on datasets from PAN-2015 shows that both approaches perform equally well for plagiarism detection in homoglyph obfuscated texts.

10 citations
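Both approaches described above can be sketched briefly. This is an illustration, not the authors' code: `CONFUSABLES` is a tiny hand-picked subset standing in for the full Unicode confusables list (UTS #39), and the match threshold of 0.75 and the names `normalize_homoglyphs`, `hamming_similarity`, and `match_word` are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Tiny illustrative subset of confusable characters (Cyrillic/Greek -> Latin).
# The real Unicode confusables list is far larger; this is an assumption.
CONFUSABLES: Dict[str, str] = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0445": "x",  # Cyrillic small letter ha
    "\u03bf": "o",  # Greek small letter omicron
}


def normalize_homoglyphs(text: str) -> str:
    # Approach 1: replace each known homoglyph with its Latin look-alike.
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def hamming_similarity(a: str, b: str) -> float:
    # Approach 2: 1 - normalized Hamming distance. Hamming distance is only
    # defined for equal-length strings, so other pairs score 0 here.
    if len(a) != len(b) or not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 1.0 - mismatches / len(a)


def match_word(word: str, source_words: Iterable[str], threshold: float = 0.75) -> Optional[str]:
    # Return the most similar source word if it reaches the hypothetical threshold.
    best, best_score = None, 0.0
    for candidate in source_words:
        score = hamming_similarity(word, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# "plagiarism" with both Latin "a" letters swapped for Cyrillic U+0430.
obfuscated = "pl\u0430gi\u0430rism"
print(normalize_homoglyphs(obfuscated) == "plagiarism")
print(match_word(obfuscated, ["detection", "plagiarism"]))
```

The first approach needs a complete confusables table but yields exact matches; the second needs no table, because a homoglyph substitution keeps the word length unchanged and only shifts the Hamming distance.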

01 Jan 2003
TL;DR: Electronic submission of student assignments certainly provides many advantages for the faculty member and graders, and paperless transactions are especially useful when the number of submissions is large and the assignments must be distributed to multiple locations.
Abstract: And so it is with grading assignments that have been submitted electronically. Electronic submission of student assignments certainly provides many advantages for the faculty member and graders. For instance, electronic submissions are easier to manage and keep track of than their paper counterparts, particularly as the number of submissions gets large. Submissions can be time-stamped automatically and archived, thus minimizing the potential for disputes over lateness and lost assignments and/or grades. Furthermore, archives can help resolve issues involving academic dishonesty and/or plagiarism. Finally, paperless transactions are especially useful when the number of submissions is large and the assignments must be distributed to multiple locations (such as to teaching assistants, graders, and plagiarism detection software).

10 citations

Proceedings Article (DOI)
03 Aug 2017
TL;DR: This paper investigates cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
Abstract: This paper is an in-depth investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages, and text sizes). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods, while analyzing in depth the correlations across document styles and languages.

10 citations

Book Chapter (DOI)
TL;DR: A code plagiarism detection method based on the abstract syntax tree (AST) is studied: it pre-formats the code, performs lexical and syntactic analysis to obtain the corresponding AST, calculates the similarity of the resulting code sequences, and produces a code plagiarism detection report.
Abstract: In this paper, a code plagiarism detection method based on the abstract syntax tree (AST) is studied. It pre-formats the code and performs lexical and syntactic analysis to obtain the corresponding AST. It then traverses the AST to generate code sequences, calculates the similarity of the code sequences, and produces a code plagiarism detection report. Test results verify the effectiveness of the method.

9 citations
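The AST pipeline above (parse, traverse, compare sequences, report) can be sketched for Python source using the standard `ast` module. This is an assumption-laden illustration rather than the paper's method: the traversal order (`ast.walk`, breadth-first), the use of `difflib.SequenceMatcher` as the sequence similarity, the 0.8 report threshold, and the function names are all chosen here for demonstration.

```python
import ast
import difflib
from typing import List


def ast_sequence(source: str) -> List[str]:
    # ast.parse performs the lexical and syntactic analysis; comments and
    # whitespace are dropped. Keeping only node type names means renamed
    # identifiers and changed literals do not affect the sequence.
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def ast_similarity(code_a: str, code_b: str) -> float:
    # Similarity of the two node-type sequences, in [0, 1].
    seq_a, seq_b = ast_sequence(code_a), ast_sequence(code_b)
    return difflib.SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def report(code_a: str, code_b: str, threshold: float = 0.8) -> str:
    # Minimal "detection report"; the threshold is a hypothetical choice.
    score = ast_similarity(code_a, code_b)
    verdict = "possible plagiarism" if score >= threshold else "no strong evidence"
    return f"similarity={score:.2f}: {verdict}"


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = (
    "def add_all(values):\n"
    "    acc = 0  # running sum\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)
print(report(original, renamed))
print(report(original, "print('hi')"))
```

Because only node types are compared, the renamed and commented copy produces the same sequence as the original, which is the kind of superficial disguise an AST-based method is meant to see through.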


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations, 78% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Software development: 73.8K papers, 1.4M citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    59
2022    126
2021    83
2020    118
2019    130
2018    125