Topic
Plagiarism detection
About: Plagiarism detection is a research topic. Over its lifetime, 1790 publications have been published on this topic, receiving 24740 citations.
Papers
16 Sep 2014
TL;DR: SimSeerX is introduced, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents.
Abstract: The need to find similar documents occurs in many settings, such as in plagiarism detection or research paper recommendation. Manually constructing queries to find similar documents may be overly complex, thus motivating the use of whole documents as queries. This paper introduces SimSeerX, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents. Key to the design of SimSeerX is that it is able to work with multiple similarity functions and document collections. We present the architecture and interface of SimSeerX, show its applicability with 3 different similarity functions and demonstrate its scalability on a collection of 3.5 million academic documents.
10 citations
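The idea of using a whole document as the query, with interchangeable similarity functions, can be sketched as follows. This is a minimal illustration, not SimSeerX's actual architecture: the function names (`rank_similar`, `cosine_tf`, `jaccard_shingles`) and the two similarity measures are assumptions chosen for the example, since the abstract does not name the three functions it evaluates.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_tf(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors (illustrative choice)."""
    ta, tb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of word k-shingles (illustrative choice)."""
    def shingles(text: str) -> set:
        t = tokens(text)
        return {tuple(t[i:i + k]) for i in range(len(t) - k + 1)}

    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_similar(
    query_doc: str,
    collection: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every document against the whole query document, best match first."""
    scored = [(doc_id, similarity(query_doc, text)) for doc_id, text in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing the similarity function as a parameter keeps the ranking loop unchanged when a different measure or collection is swapped in. A linear scan like this would not scale to millions of documents; a real system would need an index.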
08 Apr 2017
TL;DR: Two alternative approaches for detecting plagiarism in homoglyph obfuscated texts are presented: the first approach utilizes the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed using normalized Hamming distance to match homoglyph obfuscated words with source words.
Abstract: Homoglyphs can be used for disguising plagiarized text by replacing letters in source texts with visually identical letters from other scripts. Most current plagiarism detection systems are not able to detect plagiarism when text has been obfuscated using homoglyphs. In this work, we present two alternative approaches for detecting plagiarism in homoglyph obfuscated texts. The first approach utilizes the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed using normalized Hamming distance to match homoglyph obfuscated words with source words. Empirical testing on datasets from PAN-2015 shows that both approaches perform equally well for plagiarism detection in homoglyph obfuscated texts.
10 citations
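Both approaches described in the abstract above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: the `CONFUSABLES` table is a tiny hand-picked subset rather than the full Unicode confusables list, and the function names, the equal-length restriction, and the `threshold` value are hypothetical choices made for the example.

```python
from typing import Iterable, Optional

# Assumption: a small hand-picked subset of look-alike characters.
# The real Unicode confusables data is far larger.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small omicron
}


def normalize_homoglyphs(text: str) -> str:
    """Approach 1: replace known homoglyphs with their Latin look-alikes."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def normalized_hamming(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length words differ.

    Assumption: words of different length get the maximum distance 1.0,
    because Hamming distance is only defined for equal lengths.
    """
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def match_source_word(
    word: str, source_words: Iterable[str], threshold: float = 0.5
) -> Optional[str]:
    """Approach 2: return the closest source word within the threshold, else None."""
    best, best_dist = None, 1.0
    for candidate in source_words:
        dist = normalized_hamming(word, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best if best_dist <= threshold else None
```

The first function needs a character table but yields clean text that any downstream detector can consume. The second needs no table, but it compares words one at a time against a source vocabulary.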
01 Jan 2003
TL;DR: Electronic submission of student assignments certainly provides many advantages for the faculty member and graders, and paperless transactions are especially useful when the number of submissions is large and the assignments must be distributed to multiple locations.
Abstract: And so it is with grading assignments that have been submitted electronically. Electronic submission of student assignments certainly provides many advantages for the faculty member and graders. For instance, electronic submissions are easier to manage and keep track of than their paper counterparts, particularly as the number of submissions gets large. Submissions can be time-stamped automatically and archived, thus minimizing the potential for disputes over lateness and lost assignments and/or grades. Furthermore, archives can help resolve issues involving academic dishonesty and/or plagiarism. Finally, paperless transactions are especially useful when the number of submissions is large and the assignments must be distributed to multiple locations (such as to teaching assistants, graders, and plagiarism detection software).
10 citations
03 Aug 2017
TL;DR: This paper investigates cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
Abstract: This paper presents an in-depth investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
10 citations
TL;DR: An AST-based code plagiarism detection method is studied that pre-formats the code, performs lexical and syntax analysis to obtain the corresponding AST, calculates the similarity of the resulting code sequences and produces a code plagiarism detection report.
Abstract: In this paper, a code plagiarism detection method based on the abstract syntax tree (AST) is studied. It pre-formats the code, performs lexical and syntax analysis and obtains the corresponding AST. Then it traverses the AST to generate code sequences, calculates the similarity of the code sequences and produces the code plagiarism detection report. Test results verify the effectiveness of the method.
9 citations
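The pipeline in the abstract above (parse to an AST, traverse it into a sequence, compare sequences) might look roughly like the sketch below. The abstract names neither the target language nor the similarity measure, so everything here is an assumption made for illustration: Python's standard `ast` module stands in for the parser, a pre-order list of node type names stands in for the "code sequence", and `difflib.SequenceMatcher` stands in for the similarity calculation. The function names are hypothetical.

```python
import ast
import difflib
from typing import List


def node_sequence(source: str) -> List[str]:
    """Parse source code and return its AST node type names in pre-order.

    Parsing discards comments and layout, and recording only node types
    ignores identifier names and literal values.
    """
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sequence


def ast_similarity(source_a: str, source_b: str) -> float:
    """Similarity in [0, 1] between the node sequences of two programs."""
    matcher = difflib.SequenceMatcher(
        None, node_sequence(source_a), node_sequence(source_b), autojunk=False
    )
    return matcher.ratio()
```

Because only node types are compared, renaming variables or adding comments leaves the score at 1.0, which is the kind of disguise a purely textual comparison would miss. Reordering statements or rewriting control flow would still lower the score in this simple form.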