Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag
Citations
State-of-the-art in detecting academic plagiarism
Academic Plagiarism Detection: A Systematic Literature Review
Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence
On the mono- and cross-language detection of text reuse and plagiarism
Evaluating Link-based Recommendations for Wikipedia
References
Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas
The Oxford companion to the English language
Overview of the 2nd International Competition on Plagiarism Detection
Methods for identifying versioned and plagiarized documents
Frequently Asked Questions
Q2. What is the strength of the existing PDS?
Whereas the strength of existing plagiarism detection systems (PDS) lies in detecting plagiarism at the sentence level by identifying similar or identical consecutive words, the strength of the citation-based approach lies in identifying translation plagiarism, idea plagiarism, and disguised paraphrasing.
Q3. What system is used to detect copy&paste plagiarism?
[...] which usually scores among the top 3 PDS in the HTW comparisons [10]; the freely available Ferret system [11] (both of these systems use fingerprinting detection); and WCopyfind [2], a PDS that employs substring matching.
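To make the idea of fingerprinting detection more concrete, the following is a minimal sketch that fingerprints a text as its set of word trigrams and compares two fingerprints by their overlap. It is an illustration of the general technique only, not the implementation of any system named above; the function names `word_ngrams` and `resemblance`, the trigram size, and the sample sentences are all assumptions made for this example.

```python
import re
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams used as the text's fingerprint."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(text_a: str, text_b: str, n: int = 3) -> float:
    """Jaccard overlap of two fingerprints (0.0 = disjoint, 1.0 = identical)."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts (not taken from the evaluated thesis).
source = "the quick brown fox jumps over the lazy dog"
copied = "yesterday the quick brown fox jumps over a fence"
unrelated = "citation patterns reveal structural similarity"

print(resemblance(source, copied))     # shares several trigrams with the source
print(resemblance(source, unrelated))  # shares none
```

A score like this only rises when consecutive words are reused, which is why such approaches suit copy&paste cases and struggle with translated or heavily paraphrased text.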
Q4. What is the main purpose of the PAN-PC evaluation corpus?
The PAN-PC evaluation corpus mainly contains artificially plagiarized sections that were created and partially obfuscated through automated methods such as translation, random shuffles, or semantic substitutions of terms.
Q5. What is the purpose of the GuttenPlag project?
After the popular politician repudiated the accusations as “abstruse”, volunteers initiated the GuttenPlag project [1] to crowd-source the investigation and determine the true amount of plagiarism present in the work.
Q6. What are the strengths of the text-based approach?
Text-based PDS excel at detecting local forms of plagiarism, such as short passages of copied or only slightly paraphrased text.
Q7. What is the meaning of citation patterns?
The authors refer to citation patterns as subsequences in the citation tuples of two texts that (partially) consist of shared references and are therefore similar to each other.
Q8. What system was used to identify copy&paste plagiarism?
Since the two last-mentioned systems depend on the local availability of possible source documents, all digitally available sources identified by the GuttenPlag project were collected and used.
Q9. What system can be used to identify copy&paste plagiarism?
The text-based PDS, especially Ferret and WCopyfind, which work with local document comparisons, deliver good results for identifying copy&paste plagiarism given that the sources are available, as in their case.
Q10. What is the degree of similarity between citation patterns?
The degree of similarity between citation patterns depends, among other factors, mainly on the number of shared references (bibliographic coupling strength) and on the extent to which the order of the included citations, as well as their distance from each other, is similar.
Note: Citations are short strings in the body of scientific texts representing sources contained in the bibliography, whereas references denote entries in the bibliography.
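The two factors named in this answer can be illustrated with a small sketch. It assumes each document is reduced to the sequence of references it cites, in order of appearance; the first function counts shared references (bibliographic coupling strength), and the second recovers a longest common citation sequence using the standard longest-common-subsequence dynamic program. The function names and the sample sequences are hypothetical, and this is not presented as the authors' implementation.

```python
from typing import List, Sequence


def bibliographic_coupling_strength(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of distinct references cited by both documents."""
    return len(set(a) & set(b))


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """One longest sequence of citations occurring in the same order in both texts.

    Gaps between matching citations are allowed (classic LCS).
    """
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one sequence.
    i = j = 0
    result: List[str] = []
    while i < rows and j < cols:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical citation sequences; each label stands for one reference.
doc_a = ["r1", "r2", "r3", "r2", "r4", "r5"]
doc_b = ["r7", "r1", "r3", "r6", "r4", "r5"]

print(bibliographic_coupling_strength(doc_a, doc_b))
print(longest_common_citation_sequence(doc_a, doc_b))
```

Coupling strength ignores order entirely, while the common sequence rewards citations that appear in the same order; neither captures the distance between citations, which would need positional information in addition to the labels.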