Open Access

CitePlag: A Citation-based Plagiarism Detection System Prototype

TLDR
Presents CitePlag, an open-source prototype of a citation-based plagiarism detection system that evaluates the citations of academic documents as language-independent markers to detect plagiarism.
Abstract
This paper presents CitePlag, an open-source prototype of a citation-based plagiarism detection system. The underlying idea is to evaluate the citations in academic documents as language-independent markers for detecting plagiarism. CitePlag uses three detection algorithms that analyze the citation sequences of academic documents for similar patterns that may indicate text or ideas taken from other sources without proper attribution. To compute document similarity scores, the algorithms consider multiple citation-related factors, such as the proximity and order of citations within the text and their probability of co-occurrence. We present technical details of CitePlag’s detection algorithms and of the acquisition of test data from the PubMed Central Open Access Subset. Future work will enlarge the reference database by enabling the system to process more document and citation formats. Another major goal is to improve CitePlag’s detection algorithms and scoring functions to reduce the number of false positives. Eventually, we plan to integrate text-based detection algorithms alongside the citation-based ones within CitePlag.
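
To make the idea of comparing citation order concrete, the following is a minimal sketch, not CitePlag's actual implementation. The abstract does not name the three detection algorithms, so this example assumes a simple longest-common-subsequence measure over citation identifiers. The function names, the normalization by the shorter sequence, and the citation keys are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def longest_common_citation_subsequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the citations shared by both documents in the same order (gaps allowed)."""
    # Standard dynamic-programming LCS, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for cited_in_a in a:
        curr = [0]
        for j, cited_in_b in enumerate(b, start=1):
            if cited_in_a == cited_in_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score: shared ordered citations divided by the shorter sequence length."""
    if not a or not b:
        return 0.0
    return longest_common_citation_subsequence(a, b) / min(len(a), len(b))


# Hypothetical citation sequences, listed in order of appearance in each text.
doc_a = ["Smith2001", "Lee2005", "Kim2009", "Garcia2010"]
doc_b = ["Lee2005", "Chen2003", "Kim2009", "Garcia2010", "Wu2011"]

score = order_similarity(doc_a, doc_b)
print(f"order similarity: {score:.2f}")
```

Because the comparison uses only citation identifiers, it works regardless of the language the two documents are written in, which is the property the abstract highlights. A fuller scoring function would also weigh proximity and co-occurrence probability, which this sketch leaves out.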

