Journal Article

A coarse-to-fine framework to efficiently thwart plagiarism

Haijun Zhang, +1 more
01 Feb 2011, Vol. 44, Iss. 2, pp. 471-487
TLDR
The Earth Mover's Distance (EMD) is employed to retrieve relevant documents, which markedly shrinks the search domain. The paper reports that the proposed approach is accurate and computationally efficient for plagiarism detection (PD).
About
This article was published in Pattern Recognition on 2011-02-01. It has received 32 citations to date. The article focuses on the topics: Plagiarism detection & Document retrieval.


Citations
Journal Article

Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

TL;DR: A new taxonomy of plagiarism is presented that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view, and supports deep understanding of different linguistic patterns in committing plagiarism.
Journal Article

Tree2Vector: Learning a Vectorial Representation for Tree-Structured Data

TL;DR: An efficient learning framework for transforming tree-structured data into vectorial representations is presented and a locality-sensitive reconstruction method is introduced to model a process, in which each parent node is assumed to be reconstructed by its children.
Proceedings Article

On the mono- and cross-language detection of text reuse and plagiarism

TL;DR: The aim of this PhD thesis is to address three of the main problems in the development of better models for automatic plagiarism detection: the adequate identification of good potential sources for a given suspicious text, the detection of plagiarism despite modifications, and the generation of standard collections of cases of plagiarism and text reuse.
Journal Article

Comparing and combining Content- and Citation-based approaches for plagiarism detection

TL;DR: This work compares content- and citation-based approaches for plagiarism detection, with the goal of evaluating whether they are complementary and whether their combination can improve the quality of the detection, and concludes that a combination of the methods can be beneficial.
Posted Content

Plagiarism: Taxonomy, Tools and Detection Techniques.

TL;DR: A taxonomy of various plagiarism forms is presented, with a discussion of each form, to highlight a list of issues and research challenges related to this evolving research problem.
References
Journal Article

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Introduction to Modern Information Retrieval

Book

Modern Information Retrieval

TL;DR: The authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date, student-oriented treatment of the subject.