Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.

Papers

Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection.

Leilei Kong, Yong Han, Zhongyuan Han, Yu Haihao, Qibo Wang, Tinglei Zhang, Haoliang Qi

01 Jan 2014

TL;DR: In this paper, a ranking model based on Ranking SVM is proposed to rank groups of query keywords so that source retrieval achieves a higher F measure.

Institutions: Heilongjiang Institute of Technology, China; Harbin Engineering University, China; Harbin Institute of Technology, China

Abstract: For the task of source retrieval, the target is to retrieve all plagiarized sources while minimizing retrieval costs. It has become standard for plagiarism detection to retrieve plagiarism sources with query keywords selected from the suspicious document. This paper regards the keyword selection problem as learning a ranking model that chooses the keyword extraction method for each suspicious document segment. Four basic methods are used in our ranking function: BM25, TFIDF, TF and EW. A ranking model based on Ranking SVM is then proposed to rank the query keyword groups that contribute to a higher evaluation measure F. In our ranking model, achieving the best F measure for source retrieval is the target of learning to rank, and a range of statistical features are fused to search for better query keyword groups.
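As a rough illustration of what extracting query keywords from a suspicious document segment can look like, the sketch below scores the terms of one segment with two of the methods the abstract names, TF and TFIDF, and keeps the top-scoring terms as a query. It is not the authors' system: the function names, the smoothed IDF formula, the tie-breaking rule and the toy segments are all assumptions, and the Ranking SVM step that chooses between keyword groups is left out.

```python
import math
from collections import Counter


def tf_scores(segment_tokens):
    """Relative term frequency of each token within one segment."""
    counts = Counter(segment_tokens)
    total = len(segment_tokens)
    return {term: n / total for term, n in counts.items()}


def tfidf_scores(segment_tokens, all_segments):
    """TF weighted by a smoothed inverse document frequency over segments.

    The smoothing used here is an assumption; the source does not give a formula.
    """
    tf = tf_scores(segment_tokens)
    n_segments = len(all_segments)
    scores = {}
    for term, value in tf.items():
        df = sum(1 for seg in all_segments if term in seg)
        idf = math.log((1 + n_segments) / (1 + df)) + 1.0
        scores[term] = value * idf
    return scores


def top_keywords(scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy segments, invented for illustration only.
segments = [
    "plagiarism detection retrieves sources for suspicious text".split(),
    "query keywords are selected from the suspicious text".split(),
    "the ranking model chooses a keyword extraction method".split(),
]
tf_query = top_keywords(tf_scores(segments[0]))
tfidf_query = top_keywords(tfidf_scores(segments[0], segments))
```

A learning-to-rank model in the spirit of the abstract would take several such candidate keyword groups per segment and learn which one tends to give the best retrieval F measure.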

10 citations

Proceedings Article

TextFlow: A Text Similarity Measure based on Continuous Sequences

Yassine Mrabet¹, Halil Kilicoglu¹, Dina Demner-Fushman¹

National Institutes of Health¹

01 Jan 2017

TL;DR: A novel text similarity measure called TextFlow is presented. Inspired by a common representation in DNA sequence alignment algorithms, it represents input text pairs as continuous curves and uses both the actual position of the words and sequence matching to compute the similarity value.

Abstract: Text similarity measures are used in multiple tasks such as plagiarism detection, information ranking and recognition of paraphrases and textual entailment. While recent advances in deep learning highlighted the relevance of sequential models in natural language generation, existing similarity measures do not fully exploit the sequential nature of language. Examples of such similarity measures include n-gram and skip-gram overlap, which rely on distinct slices of the input texts. In this paper we present a novel text similarity measure inspired by a common representation in DNA sequence alignment algorithms. The new measure, called TextFlow, represents input text pairs as continuous curves and uses both the actual position of the words and sequence matching to compute the similarity value. Our experiments on 8 different datasets show very encouraging results in paraphrase detection, textual entailment recognition and ranking relevance.
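The baseline measures the abstract mentions, n-gram and skip-gram overlap, can be sketched as follows. This is not TextFlow itself, whose curve-based formulation is not given here. The Jaccard coefficient as the overlap score, the `max_skip` parameter and the example sentences are assumptions made for illustration.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Set of contiguous n-token tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def skip_bigrams(tokens, max_skip):
    """Set of ordered token pairs with at most max_skip tokens between them."""
    pairs = set()
    for i, j in combinations(range(len(tokens)), 2):
        if j - i - 1 <= max_skip:
            pairs.add((tokens[i], tokens[j]))
    return pairs


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


x = "the cat sat on the mat".split()
y = "the cat lay on the mat".split()
bigram_sim = jaccard(ngrams(x, 2), ngrams(y, 2))
skip_sim = jaccard(skip_bigrams(x, 1), skip_bigrams(y, 1))
```

Both scores treat each text as an unordered set of slices, so where a matching slice occurs in the text has no effect on the result. This is the limitation the abstract points to.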

10 citations

Proceedings Article

Arabic Plagiarism Detection Using Word Correlation in N-Grams with K-Overlapping Approach, Working Notes for PAN-AraPlagDet at FIRE 2015.

Salha Alzahrani¹

Taif University¹

01 Jan 2015

TL;DR: Although this system can detect some means of obfuscation, such as restructuring or rewording of a few phrases, it might not work with handmade paraphrases; future work is to improve the candidate retrieval stage and include semantic-based metrics in the detection stage.

Abstract: This report explains our Arabic plagiarism detection system, which we used to submit our run to the AraPlagDet competition at FIRE 2015. The system was constructed in four main stages. The first is pre-processing, which includes tokenisation and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using K-gram fingerprinting and the Jaccard coefficient. In the third, suspicious documents are compared in depth with the associated candidate documents. This stage entails computing the similarity between constructed N-grams with K-overlapping, where N and K were experimentally set to 8 and 3, respectively. The similarity between N-gram pairs was computed based on word correlations. Each word was compared with the words in the candidate N-gram and given a correlation of 1 if they matched. Correlation values were averaged and then compared to a threshold. The last stage is post-processing, whereby consecutive N-grams were joined to form united plagiarised segments. Our performance measures on the training corpus were encouraging (recall=0.829, precision=0.843, granularity=1.11). Recall on the test collection was unfortunately lower (recall=0.530), but precision and granularity remained consistent with the training set (precision=0.831, granularity=1.18). This drop in recall may be because our candidate retrieval stage retrieves only documents that share copied fragments, whereas some plagiarised documents contain no exact-copied cases. Although this system can detect some means of obfuscation, such as restructuring or rewording of a few phrases, it might not work with handmade paraphrases. Our future work is to improve the candidate retrieval stage and include semantic-based metrics in the detection stage.
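The detailed-comparison and post-processing stages described in the abstract can be sketched roughly as below. This is one possible reading rather than the author's implementation. It assumes that "K-overlapping" means consecutive N-grams share K tokens, and the threshold of 0.8 is hypothetical because the report's value is not stated here. Pre-processing and candidate retrieval are omitted, and all function names are invented for the example.

```python
def overlapping_ngrams(tokens, n=8, k=3):
    """Windows of n tokens whose neighbours share k tokens (assumed reading).

    Trailing tokens that do not fill a whole step are not covered.
    """
    step = n - k
    last_start = max(len(tokens) - n, 0)
    return [(start, tokens[start:start + n])
            for start in range(0, last_start + 1, step)]


def word_correlation(suspicious_gram, candidate_gram):
    """Average of per-word scores: 1 if the word occurs in the candidate n-gram, else 0."""
    candidate_words = set(candidate_gram)
    hits = [1 if word in candidate_words else 0 for word in suspicious_gram]
    return sum(hits) / len(hits) if hits else 0.0


def detect_segments(suspicious, candidate, n=8, k=3, threshold=0.8):
    """Return merged (start, end) token spans of the suspicious text that match."""
    candidate_grams = [gram for _, gram in overlapping_ngrams(candidate, n, k)]
    spans = []
    for start, gram in overlapping_ngrams(suspicious, n, k):
        best = max((word_correlation(gram, c) for c in candidate_grams),
                   default=0.0)
        if best < threshold:
            continue
        end = start + len(gram)
        if spans and start <= spans[-1][1]:
            # Join consecutive or overlapping matched n-grams into one segment.
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
    return spans


# Toy data: 18 source tokens copied after 5 unrelated tokens.
source = [f"w{i}" for i in range(18)]
suspicious = [f"f{i}" for i in range(5)] + source
spans = detect_segments(suspicious, source)
```

Because the per-word score only checks membership in the candidate N-gram, reordering words inside a window does not lower the correlation. This fits the abstract's remark that some restructuring is detected, while paraphrases that replace most of the words are not.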

10 citations

Journal Article

Hanane Ezzikouri, Mohammed Erritali, Mohamed Oukessou

01 Feb 2016, Indonesian Journal of Electrical Engineering and Computer Science

TL;DR: This paper presents an application programming interface for several semantic relatedness/similarity metrics that measure semantic similarity/distance between multilingual words and concepts, for later use on sentences and paragraphs in Cross Language Plagiarism Detection (CLPD).

Abstract: Utterances in natural language are generally highly ambiguous, and a unique interpretation can usually be determined only by taking into account the context in which the utterance occurred. Automatically determining the correct sense of a polysemous word is a complicated problem, especially in multilingual corpora. This paper presents an application programming interface for several semantic relatedness/similarity metrics that measure semantic similarity/distance between multilingual words and concepts, for later use on sentences and paragraphs in Cross Language Plagiarism Detection (CLPD), using WordNet for the English-French and English-Arabic multilingual plagiarism cases.

10 citations

Proceedings Article

What do we need to know about clones? Deriving information needs from user goals

Hamid Abdul Basit¹, Muhammad Hammad, Stan Jarzabek, Rainer Koschke²

Lahore University of Management Sciences¹, University of Bremen²

06 Mar 2015

TL;DR: This paper takes a first step toward gathering clone information needs from descriptions of user goals; the results are useful for stakeholders such as programmers, managers, tool developers, and researchers.

Abstract: Clone detection can be used to achieve diverse objectives such as refactoring, program understanding, bug localization, and plagiarism detection. Each goal takes a different perspective on clone information needs. Different clone detection tools report different information about clones. To gauge the suitability of a given clone detector for a particular user objective, we need to determine which of the information needs implied by the objective the clone detector addresses. In this paper, we take a first step toward gathering clone information needs from the description of user goals. The results of our analysis are useful for various stakeholders such as programmers, managers, tool developers, and researchers.

10 citations

Performance

Metrics

Papers: 1,976

Citations: 29,005

No. of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125
