Topic

Plagiarism detection

About: Plagiarism detection is a research topic. Over the lifetime, 1790 publications have been published within this topic receiving 24740 citations.


Papers
Journal Article
TL;DR: The experiments revealed that the best retrieval performance is obtained after removal of in-code comments and applying a combined weighting scheme based on term frequencies, normalized term frequencies, and a cosine-based document normalization.
Abstract: Latent Semantic Analysis (LSA) is an intelligent information retrieval technique that uses mathematical algorithms for analyzing large corpora of text and revealing the underlying semantic information of documents. LSA is a highly parameterized statistical method, and its effectiveness is driven by the setting of its parameters, which are adjusted based on the task to which it is applied. This paper discusses and evaluates the importance of parameterization for LSA-based similarity detection of source-code documents, and the applicability of LSA as a technique for source-code plagiarism detection when its parameters are appropriately tuned. The parameters involve preprocessing techniques, weighting approaches, and parameter tweaking inherent to LSA processing – in particular, the choice of how many dimensions to retain in the SVD-based reduction step. The experiments revealed that the best retrieval performance is obtained after removal of in-code comments (Java comment blocks) and applying a combined weighting scheme based on term frequencies, normalized term frequencies, and a cosine-based document normalization. Furthermore, the use of similarity thresholds (instead of mere rankings) requires the use of a higher number of dimensions. Summary: The paper analyzes the LSA method specifically with regard to source-code plagiarism.
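The LSA pipeline described in the abstract (a term-document matrix with cosine-style document normalization, a truncated SVD, and cosine comparison in the reduced space) can be sketched as follows. The toy corpus, the Java comment-stripping regex, and all function names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of LSA-based source-code similarity (assumed setup,
# not the paper's exact parameterization).
import re
import numpy as np

def tokenize(code):
    # Strip Java-style comments first (the preprocessing step the paper
    # found helpful), then split on identifier boundaries.
    code = re.sub(r"/\*.*?\*/|//[^\n]*", " ", code, flags=re.S)
    return re.findall(r"[A-Za-z_]\w*", code.lower())

def lsa_similarity(docs, k=2):
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0
    # Cosine-based document normalization: unit-length columns.
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    # Truncated SVD: keep only the k largest singular dimensions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    D = (np.diag(s[:k]) @ Vt[:k]).T          # documents in LSA space
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    return D @ D.T                           # pairwise cosine similarities

docs = [
    "int sum = 0; for (int i = 0; i < n; i++) sum += a[i]; // total",
    "int total = 0; for (int j = 0; j < n; j++) total += a[j];",
    "String s = reader.readLine(); System.out.println(s);",
]
S = lsa_similarity(docs)
```

With comments stripped and identifiers renamed, the first two (structurally identical) snippets still land close together in the reduced space, while the unrelated third snippet stays far from both.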

23 citations

Proceedings ArticleDOI
18 Oct 2007
TL;DR: It is found that plagiarism that cannot be detected by the traditional methods can be identified by this new approach, and application of natural language processing can help to resolve this kind of problem.
Abstract: The problem of plagiarism has existed for a long time, but with the advance of information technology it has become worse, because many electronic versions of published materials are available to everyone. The Web is an important and common source for plagiarism. Some plagiarism detection programs (such as Turnitin) were developed to deal with this problem. To determine whether an article is copied from the Web or other electronic sources, a plagiarism detection program must calculate the similarity between two articles. However, it is often difficult to detect plagiarism accurately after the copied content has been modified. For example, it is possible to simply replace a word with its synonym (e.g. "program" -- "software") and change the entire sentence structure. Most plagiarism detection programs can only compare whether two words are lexically identical and count how many matched words there are in a paper. Thus, if the copied material is modified deliberately, it becomes difficult to detect plagiarism.

Application of natural language processing can help to resolve this kind of problem: the underlying syntactic structure and semantic meaning of two sentences can be compared to reveal their similarity. There are several steps in the matching procedure. First, a thesaurus (or lexical hierarchy) is consulted to find the synonyms, broader terms, and narrower terms used in the paper being checked; WordNet is a typical example of a thesaurus that can be used for this purpose. Then the paper is compared with the documents in the database. If it is suspected that the paper contains some content from the database, its sentences may be parsed to construct parse trees and semantic representations for further detailed comparison. The system uses a context-free grammar and a case grammar to represent the syntactic structure and semantic meaning of sentences.

It is found that plagiarism that cannot be detected by the traditional methods can be identified by this new approach.
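The synonym-substitution problem the abstract describes can be illustrated with a minimal sketch. A tiny hand-made synonym table stands in for a real thesaurus such as WordNet, and the sentences and function names are hypothetical examples, not from the paper.

```python
# Toy sketch of synonym-aware matching: map each word to a canonical
# representative before comparing, so "program" -> "software" style
# substitutions no longer lower the similarity score.
import re

# Stand-in for a thesaurus lookup (assumed, not a real WordNet API).
SYNONYMS = {
    "software": "program",
    "application": "program",
    "altered": "modified",
    "changed": "modified",
}

def normalize(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    return [SYNONYMS.get(w, w) for w in words]

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

original = "The program was modified by the student"
copied   = "The software was altered by the student"

naive_score   = jaccard(original.lower().split(), copied.lower().split())
synonym_score = jaccard(normalize(original), normalize(copied))
```

Lexical matching alone scores the pair as only partially similar, while the synonym-normalized comparison recognizes the two sentences as equivalent.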

23 citations

Proceedings Article
01 Jan 2007
TL;DR: In this work it is shown how a natural language parser can be used to fight against basic plagiarism hiding methods.
Abstract: The design of plagiarism detection systems has been the subject of numerous works over the last decades, and various advanced file-to-file comparison techniques have been developed. However, most existing systems aimed at natural-language texts do not perform any significant preprocessing of the input documents, so in many cases it is possible to hide the presence of plagiarism by utilizing some simple techniques. In this work we show how a natural language parser can be used to fight against basic plagiarism-hiding methods.
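A full natural-language parser is beyond a short sketch, but the underlying idea of normalizing documents before comparison (so that, for example, simple word reordering no longer hides a match) can be illustrated as follows. The stop-word list and fingerprinting scheme are assumptions for illustration, not the authors' method.

```python
# Toy sketch of order-insensitive preprocessing: reduce each sentence to
# a sorted bag of content words before fingerprinting, so reordering the
# words of a copied sentence does not defeat the match.
import re

STOPWORDS = {"the", "a", "an", "is", "was", "by", "of"}

def fingerprint(text):
    sentences = re.split(r"[.!?]+", text.lower())
    out = set()
    for s in sentences:
        words = sorted(w for w in re.findall(r"[a-z]+", s)
                       if w not in STOPWORDS)
        if words:
            out.add(" ".join(words))
    return out

src    = "The experiment was repeated by the students."
hidden = "By the students the experiment was repeated."
```

The two sentences differ as raw strings, yet produce identical fingerprints, so the reordered copy is still flagged.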

23 citations

Proceedings ArticleDOI
29 May 2012
TL;DR: Evaluation results using real student program assignments show the effectiveness of the proposed method for automatically detecting plagiarism, i.e. illegal copying, among a set of programs submitted by students in elementary programming courses.
Abstract: This paper proposes a method to automatically detect plagiarism, i.e. illegal copies, among a set of programs submitted by students in elementary programming courses. In such courses, programming assignments are so simple that submitted programs are very short and similar to each other; existing plagiarism detection methods may therefore yield many false positives. The proposed method addresses this problem by using three types of similarity: code, comment, and inconsistence. The inconsistence similarity, a unique feature of the method, improves the precision and recall ratios and helps to find evidence of plagiarism. Evaluation results using real student program assignments show the effectiveness of the proposed method.
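The idea of scoring code and comments separately and then combining them can be sketched as below. The Jaccard token measure, the weights, and the comment-stripping regex are illustrative assumptions; the paper's distinctive inconsistence similarity is not reproduced here.

```python
# Illustrative sketch of a combined code/comment similarity for short
# student programs (assumed formulation, not the authors' exact one).
import re

COMMENT_RE = r"/\*.*?\*/|//[^\n]*"

def split_code_comments(src):
    comments = re.findall(COMMENT_RE, src, flags=re.S)
    code = re.sub(COMMENT_RE, " ", src, flags=re.S)
    return code, " ".join(comments)

def token_sim(a, b):
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)  # Jaccard over token sets

def combined_similarity(p1, p2, w_code=0.6, w_comment=0.4):
    c1, m1 = split_code_comments(p1)
    c2, m2 = split_code_comments(p2)
    return w_code * token_sim(c1, c2) + w_comment * token_sim(m1, m2)

a = "int s = 0; // sum the array\nfor (int i = 0; i < n; i++) s += a[i];"
b = "int t = 0; // sum the array\nfor (int k = 0; k < n; k++) t += a[k];"
c = 'System.out.println("hello"); // greet the user'
```

A renamed-identifier copy that keeps the original comments scores much higher against the original than an unrelated program does, which is the kind of signal that raw text comparison on very short programs tends to miss.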

23 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations (78% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Software development: 73.8K papers, 1.4M citations (77% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (76% related)
Deep learning: 79.8K papers, 2.1M citations (76% related)
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2023   59
2022   126
2021   83
2020   118
2019   130
2018   125