
Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.

[Chart: Papers published on a yearly basis, 1981–2023]

Papers


Journal Article

An efficient classification approach in imbalanced datasets for intrinsic plagiarism detection

Andrianna Polydouri¹, Eleni Vathi¹, Georgios Siolas¹, Andreas Stafylopatis¹

National and Kapodistrian University of Athens¹

01 Sep 2020 - Evolving Systems

TL;DR: This work treats, for the first time, data imbalance as a crucial parameter of the problem, experiments with various balancing techniques, and combines the proposed features and imbalanced-dataset treatment with various classification methods.

Abstract: The ever-increasing volume of information due to the widespread use of computers and the web has made effective plagiarism detection methods a necessity. Plagiarism can be found in many settings and forms: in literature, in academic papers, even in programming code. Intrinsic plagiarism detection is the task of discovering plagiarized passages in a text document by identifying stylistic changes and inconsistencies within the document itself, given that no reference corpus is available. The main idea is to profile the style of the original author and mark the passages that seem to differ significantly. In this work, we follow a supervised machine learning classification approach. We consider, for the first time, imbalanced data as a crucial parameter of the problem and experiment with various balancing techniques. Apart from this, we propose some novel stylistic features. We combine our features and imbalanced dataset treatment with various classification methods. Our detection system is tested on the data corpora of the PAN Webis intrinsic plagiarism detection shared tasks. It is compared to the best-performing detection systems on these datasets and surpasses their best scores.

9 citations
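
The style-profiling idea in the abstract (profile the author's style, then mark passages that differ significantly) can be illustrated with a minimal, unsupervised Python sketch. It is not the authors' supervised classifier: the three features (average word length, function-word ratio, type-token ratio), the tiny function-word list, and the z-score threshold of 2.0 are all illustrative assumptions. Each segment is scored by its largest absolute z-score against the document-wide profile, and segments above the threshold are flagged as possible style changes.

```python
import re
import statistics

# Illustrative (hypothetical) function-word list; real systems use far larger ones.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "it", "for"}


def segment_features(segment: str) -> list[float]:
    """Return [average word length, function-word ratio, type-token ratio]."""
    words = re.findall(r"[a-z']+", segment.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    func_ratio = sum(w in FUNCTION_WORDS for w in words) / len(words)
    type_token = len(set(words)) / len(words)
    return [avg_word_len, func_ratio, type_token]


def max_abs_zscore(values: list[float], means: list[float], stdevs: list[float]) -> float:
    """Largest |z| over all features; features with zero spread are ignored."""
    scores = [abs(v - m) / s for v, m, s in zip(values, means, stdevs) if s > 0]
    return max(scores, default=0.0)


def flag_style_outliers(segments: list[str], threshold: float = 2.0) -> list[int]:
    """Indices of segments whose style deviates strongly from the document profile."""
    if not segments:
        return []
    feats = [segment_features(s) for s in segments]
    columns = list(zip(*feats))  # one tuple of values per feature
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [i for i, f in enumerate(feats) if max_abs_zscore(f, means, stdevs) > threshold]


doc_segments = ["the cat sat on the mat and it was happy"] * 9 + [
    "Phenomenological epistemologies notwithstanding, quantum chromodynamics predominates"
]
print(flag_style_outliers(doc_segments))  # [9]
```

Because the profile is computed with the population standard deviation over all segments, a single deviant segment among n can reach a z-score of at most √(n−1), so documents split into five or fewer segments are never flagged at a threshold of 2.0. A leave-one-out profile, or a trained classifier as in the paper, avoids that limitation.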

Book Chapter

Semantic duplicate identification with parsing and machine learning

Sven Hartrumpf¹, Tim vor der Brück¹, Christian Eichhorn²

Rolf C. Hagen Group¹, Technical University of Dortmund²

06 Sep 2010

TL;DR: The deep duplicate recognizer is combined with two shallow duplicate recognizers to guarantee high recall for texts that are not fully parsable, and the combined approach increases precision considerably compared with traditional shallow methods.

Abstract: Identifying duplicate texts is important in many areas, such as plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and treat each text only as a token list. In this work, however, we describe a deep, semantically oriented method based on semantic networks derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved using a specialized index. To detect many kinds of paraphrases, the semantic networks of a candidate text are varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Important phenomena occurring in difficult duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora is explained briefly. The deep duplicate recognizer is combined with two shallow duplicate recognizers to guarantee high recall for texts that are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably in comparison to traditional shallow methods.

9 citations
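
For contrast with the deep method, a surface-oriented recognizer of the kind this abstract describes, one that sees each text only as a token list, can be sketched as word-trigram Jaccard similarity. This is an illustrative assumption: the abstract does not say which two shallow recognizers were combined with the deep one, and the trigram size is arbitrary.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of lowercase word n-grams; empty if the text has fewer than n tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B| of the two texts' word n-gram sets."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


print(ngram_jaccard("the quick brown fox jumps over", "the quick brown fox sleeps"))  # 0.4
print(ngram_jaccard("the quick brown fox jumps", "the fast brown fox leaps"))  # 0.0
```

The second pair is a synonym-substituting paraphrase that scores 0.0: the token-list view has no notion of meaning, which is the blind spot that the semantic, inference-based approach is meant to address.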

Experiments in Electronic Plagiarism Detection

Caroline Lyon, Ruth Barrett, James A. Malcolm

01 Jan 2003

9 citations

Proceedings Article

Online plagiarism detection through exploiting lexical, syntactic, and semantic information

Wan-Yu Lin¹, Nanyun Peng², Chun-Chao Yen¹, Shou-De Lin¹

National Taiwan University¹, Peking University²

10 Jul 2012

TL;DR: A framework is introduced that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences.

Abstract: In this paper, we introduce a framework that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We establish an ensemble framework to combine the predictions of each model. Results demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms and commercial software.

9 citations
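
The abstract says only that an ensemble framework combines the predictions of each model. One common way to do that, weighted soft voting over per-model scores, is sketched below; the model names, weights, and 0.5 threshold are hypothetical, and the paper may use a different combination scheme.

```python
from dataclasses import dataclass


@dataclass
class ModelVote:
    name: str            # hypothetical label, e.g. "lexical", "syntactic", "semantic"
    score: float         # model's estimated probability that the text is plagiarized
    weight: float = 1.0  # relative trust in this model (an assumption, not from the paper)


def soft_vote(votes: list[ModelVote], threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted mean of model scores, plus the thresholded plagiarism decision."""
    total_weight = sum(v.weight for v in votes)
    if total_weight <= 0:
        raise ValueError("model weights must sum to a positive value")
    combined = sum(v.score * v.weight for v in votes) / total_weight
    return combined, combined >= threshold


votes = [
    ModelVote("lexical", 0.9),
    ModelVote("syntactic", 0.6),
    ModelVote("semantic", 0.2, weight=3.0),
]
score, is_plagiarized = soft_vote(votes)
print(f"{score:.2f} {is_plagiarized}")  # 0.42 False
```

Here a heavily weighted semantic model outvotes two confident surface-level models. Soft voting keeps each model's confidence rather than only its yes/no label; training a meta-classifier on the model outputs (stacking) is a common alternative.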

Journal Article

Plagiarism Detection by using Karp-Rabin and String Matching Algorithm Together

Sonawane Kiran Shivaji, Prabhudeva S

22 Apr 2015 - International Journal of Computer Applications

TL;DR: The authors focus on practical assignments (projects) as well as written documents submitted by students to a college or university; their algorithm divides submitted articles into small pieces and scans them, comparing each against databases connected to the server on the internet.

Abstract: In today's world, copying something from other sources and claiming it as one's own contribution is a crime. It is also a major problem in academia, where students at the UG, PG, or even PhD level copy parts of original documents and publish them under their own name without proper permission from the author or developer. Many software tools exist to assist with the monotonous and time-consuming task of tracing plagiarism, because identifying the owner of a whole text is practically difficult, even impossible, for markers. In our presentation, we focus on practical assignments (projects) as well as written documents submitted by students to a college or university. Because of this crucial task and the steadily increasing research in different fields, people in industry and academia demand such software to detect whether submitted articles, books, and national or international papers are genuine. In this paper, our algorithm divides submitted articles into small pieces and scans them, comparing each against databases connected to the server on the internet. Some existing work compares submitted articles with previously submitted articles, i.e., with an existing database.

9 citations
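
The title pairs Karp-Rabin hashing with string matching, and the abstract says the submission is divided into small pieces that are compared against stored sources. A minimal sketch of that idea: split the submission into fixed-length character chunks and locate each chunk in a source text with the Rabin-Karp rolling hash, confirming every hash hit with a direct comparison to rule out collisions. The chunk length, hash base, and modulus are illustrative assumptions, and the single in-memory source string stands in for the paper's connected databases.

```python
def rabin_karp_find(pattern: str, text: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """All start offsets of pattern in text, using a rolling polynomial hash."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


def matched_chunks(submission: str, source: str, chunk_len: int = 20) -> list[tuple[int, list[int]]]:
    """(chunk offset in submission, offsets in source) for every chunk found in source."""
    results = []
    for start in range(0, len(submission) - chunk_len + 1, chunk_len):
        chunk = submission[start:start + chunk_len]
        hits = rabin_karp_find(chunk, source)
        if hits:
            results.append((start, hits))
    return results


print(matched_chunks("hello world foo bar", "say hello world now", chunk_len=5))
# [(0, [4]), (5, [9])]
```

Non-overlapping chunks keep the number of searches proportional to the submission length divided by `chunk_len`; overlapping windows or word-level shingles catch more partial matches at a higher cost.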

Metrics

Papers: 1,976
Citations: 29,005

Number of papers in the topic in previous years
Year	Papers
2023	59
2022	126
2021	83
2020	118
2019	130
2018	125
