Open Access
CitePlag : A Citation-based Plagiarism Detection System Prototype
TLDR
Presents CitePlag, an open-source prototype of a citation-based plagiarism detection system that evaluates the citations of academic documents as language-independent markers for detecting plagiarism.
Abstract:
This paper presents an open-source prototype of a citation-based plagiarism detection system called CitePlag. The underlying idea of the system is to evaluate the citations of academic documents as language-independent markers to detect plagiarism. CitePlag uses three different detection algorithms that analyze the citation sequences of academic documents for similar patterns that may indicate text or ideas taken from other sources without proper attribution. To compute document similarity scores, the algorithms consider multiple citation-related factors, such as the proximity and order of citations within the text and their probability of co-occurrence. We present technical details of CitePlag’s detection algorithms and the acquisition of test data from the PubMed Central Open Access Subset. Future development of the prototype will focus on enlarging the reference database by enabling the system to process more document and citation formats. Improving CitePlag’s detection algorithms and scoring functions to reduce the number of false positives is another major goal. Eventually, we plan to integrate text-based detection algorithms alongside the citation-based detection algorithms within CitePlag.
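To make the idea of comparing citation order concrete, the following is a minimal sketch of one possible order-based measure: the longest common subsequence of two citation sequences. The abstract does not specify CitePlag's three algorithms or its scoring functions, so the function names, the example reference identifiers, and the normalization by the shorter sequence are all assumptions made for illustration, not the system's actual implementation.

```python
from typing import Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Each sequence lists the identifiers of cited sources in the order
    in which they are cited in the document's text.
    """
    # prev[j] is the LCS length of the processed prefix of `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for cited_a in a:
        curr = [0]
        for j, cited_b in enumerate(b, start=1):
            if cited_a == cited_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score: LCS length divided by the shorter sequence length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return longest_common_citation_sequence(a, b) / shorter


# Hypothetical citation sequences of two documents.
doc_a = ["ref1", "ref2", "ref3", "ref4", "ref5"]
doc_b = ["ref2", "ref9", "ref3", "ref5"]

print(longest_common_citation_sequence(doc_a, doc_b))  # 3
print(order_similarity(doc_a, doc_b))                  # 0.75
```

Because the comparison uses only source identifiers, it does not depend on the language of either document. A measure like this ignores how close citations are to each other in the text, so a proximity-aware score would need additional position information.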
Citations
Journal Article
Systematic review automation technologies
TL;DR: A detailed survey of the state of the art of information systems designed to support or automate individual tasks in the systematic review, and in particular systematic reviews of randomized controlled clinical trials, reveals trends that see the convergence of several parallel research projects.
Journal Article
State-of-the-art in detecting academic plagiarism
Norman Meuschke, Bela Gipp
TL;DR: In the future, plagiarism detection systems may benefit from combining traditional character-based detection methods with emerging detection approaches, including intrinsic, cross-lingual and citation-based plagiarism detection.
Journal Article
Study on extrinsic text plagiarism detection techniques and tools
K Vani, Deepa Gupta
TL;DR: The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art, and an overview of some of the available detection software tools, their features and detection efficiency is given.
Journal Article
Comparing and combining Content- and Citation-based approaches for plagiarism detection
TL;DR: This work compares content- and citation-based approaches for plagiarism detection to evaluate whether they are complementary and whether their combination can improve detection quality, and concludes that a combination of the methods can be beneficial.
Journal Article
An academic Arabic corpus for plagiarism detection: design, construction and experimentation
TL;DR: The design and construction of an Arabic PD reference corpus that is dedicated to academic language and a database for the detection of plagiarism in student assignments, reports, and dissertations is discussed.
References
Journal Article
The Matthew effect in science. The reward and communication systems of science are considered.
TL;DR: The psychosocial conditions and mechanisms underlying the Matthew effect are examined and a correlation between the redundancy function of multiple discoveries and the focalizing function of eminent men of science is found—a function which is reinforced by the great value these men place upon finding basic problems and by their self-assurance.
Proceedings Article
ParsCit: an Open-source CRF Reference String Parsing Package
TL;DR: ParsCit is described: a freely available, open-source reference string parsing package that wraps a trained conditional random field model with added functionality to identify reference strings in a plain text file and to retrieve the citation contexts.
Proceedings Article
An Evaluation Framework for Plagiarism Detection
TL;DR: Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Citation Proximity Analysis (CPA) : A New Approach for Identifying Related Work Based on Co-Citation Analysis
Bela Gipp, Jöran Beel
TL;DR: The approach called Citation Proximity Analysis (CPA) is a further development of co-citation analysis, but in addition, considers the proximity of citations to each other within an article’s full-text.
Journal Article
Déjà vu—A study of duplicate citations in Medline
Mounir Errami, Justin M. Hicks, Wayne Fisher, David Trusty, Jonathan D. Wren, Tara C. Long, Harold R. Garner
TL;DR: Using text similarity searches, a database of manually verified duplicate citations was created to study author publication behavior and found that 0.04% of the citations with no shared authors were highly similar and are thus potential cases of plagiarism.
Related Papers (5)
Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence
Bela Gipp, Norman Meuschke