scispace - formally typeset

Plagiarism detection

About: Plagiarism detection is a research topic. Over its lifetime, 1,790 publications have been published within this topic, receiving 24,740 citations.


Papers
Journal ArticleDOI
10 Oct 2016
TL;DR: Plagiarism was a common occurrence among manuscripts submitted for publication to a major American specialty medical journal and most manuscripts with plagiarized material were submitted from countries in which English was not an official language.
Abstract: Plagiarism is common and threatens the integrity of the scientific literature. However, its detection is time-consuming and difficult, presenting challenges to editors and publishers who are entrusted with ensuring the integrity of the published literature. In this study, the extent of plagiarism in manuscripts submitted to a major specialty medical journal was documented. We manually curated submitted manuscripts and deemed that an article contained plagiarism if at least one sentence had 80% of its words copied from another published paper. Commercial plagiarism detection software was also used, and its settings were optimized. Among 400 consecutively submitted manuscripts, 17% of submissions contained unacceptable levels of plagiarized material, and 82% of the plagiarized manuscripts were submitted from countries where English was not an official language. Using the most commonly employed commercial plagiarism detection software, sensitivity and specificity were studied with regard to the generated plagiarism score. The cutoff score maximizing both sensitivity and specificity was 15% (sensitivity 84.8%, specificity 80.5%). Plagiarism was thus a common occurrence among manuscripts submitted for publication to a major American specialty medical journal, and most manuscripts with plagiarized material were submitted from countries in which English was not an official language. The use of commercial plagiarism detection software can be optimized by selecting a cutoff score that reflects the desired sensitivity and specificity.
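The cutoff-selection idea in this abstract, picking the plagiarism score that best balances sensitivity and specificity, can be sketched as a small search over candidate cutoffs. This is a hypothetical illustration: the scores, labels, and function names below are invented for the sketch, not the study's actual data or tooling.

```python
# Hypothetical sketch: choose a plagiarism-score cutoff that balances
# sensitivity and specificity, in the spirit of the study above.

def sensitivity_specificity(scores, labels, cutoff):
    """labels: True means the manuscript actually contains plagiarism;
    a score at or above the cutoff is flagged as plagiarized."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def best_cutoff(scores, labels, candidates):
    # Maximize sensitivity + specificity (equivalent to Youden's J).
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores, labels, c)))
```

With manually curated labels as ground truth, sweeping `best_cutoff` over the software's similarity scores reproduces the kind of trade-off the study reports (e.g. a 15% cutoff giving sensitivity 84.8% and specificity 80.5%).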

38 citations

Journal ArticleDOI
TL;DR: The taxonomy of machine learning-based binary code analysis is provided, the recent advances and key findings on the topic are described, and the thoughts for future directions on this topic are presented.
Abstract: Binary code analysis is crucial in various software engineering tasks, such as malware detection, code refactoring, and plagiarism detection. With the rapid growth of software complexity and the increasing number of heterogeneous computing platforms, binary analysis is particularly critical and more important than ever. Traditionally adopted techniques for binary code analysis face multiple challenges, such as the need for cross-platform analysis, high scalability and speed, and improved fidelity, to name a few. To meet these challenges, machine learning-based binary code analysis frameworks have attracted substantial attention due to their automated feature extraction and the drastically reduced manual effort they require on large-scale programs. In this paper, we provide a taxonomy of machine learning-based binary code analysis, describe the recent advances and key findings on the topic, and discuss the key challenges and opportunities. Finally, we present our thoughts on future directions for this topic.

38 citations

Proceedings ArticleDOI
19 Jul 2010
TL;DR: The aim of this PhD thesis is to address three of the main problems in developing better models for automatic plagiarism detection: the adequate identification of good potential sources for a given suspicious text, the detection of plagiarism despite modifications, and the generation of standard collections of plagiarism and text-reuse cases.
Abstract: Plagiarism, the unacknowledged reuse of text, has increased in recent years due to the large amount of text readily available. For instance, recent studies claim that nowadays a high rate of student reports include plagiarism, making manual plagiarism detection practically infeasible. Automatic plagiarism detection tools assist experts in analysing documents for plagiarism. Nevertheless, the lack of standard collections with cases of plagiarism has prevented accurate comparison between models, making differences hard to appreciate. Seminal efforts on the detection of text reuse [2] have fostered the composition of standard resources for the accurate evaluation and comparison of methods. The aim of this PhD thesis is to address three of the main problems in the development of better models for automatic plagiarism detection: (i) the adequate identification of good potential sources for a given suspicious text; (ii) the detection of plagiarism despite modifications, such as word substitution and paraphrasing (special stress is given to cross-language plagiarism); and (iii) the generation of standard collections of cases of plagiarism and text reuse in order to provide a framework for accurate comparison of models. Regarding difficulties (i) and (ii), we have carried out preliminary experiments over the METER corpus [2]. Given a suspicious document dq and a collection of potential source documents D, the process is divided into two steps. First, a small subset of potential source documents D* in D is retrieved. The documents d in D* are the most related to dq and, therefore, the most likely to contain the sources of its plagiarised fragments. We performed this stage on the basis of the Kullback-Leibler distance, computed over a subsample of the documents' vocabularies. Afterwards, a detailed analysis is carried out comparing dq to every d in D* in order to identify potential cases of plagiarism and their sources.
This comparison was made on the basis of word n-grams, considering n = {2, 3}. These n-gram levels are flexible enough to properly retrieve plagiarised fragments and their sources despite modifications [1]. The result is offered to the user, who makes the final decision. Further experiments were done in both stages in order to compare other similarity measures, such as the cosine measure, the Jaccard coefficient, and diverse fingerprinting and probabilistic models. One of the main weaknesses of currently available models is that they are unable to detect cross-language plagiarism. Approaching the detection of this kind of plagiarism is highly relevant, as most published information is written in English, and authors writing in other languages may find it attractive to make use of direct translations. Our experiments, carried out over parallel and comparable corpora, show that "standard" cross-language information retrieval models are not enough. In fact, if the analysed source and target languages are related in some way (common linguistic ancestors or technical vocabulary), a simple comparison based on character n-grams seems to be a good option. However, in those cases where the relation between the languages involved is weaker, other models, such as those based on statistical machine translation, are necessary [3]. We plan to perform further experiments, mainly to approach the detection of cross-language plagiarism. To do so, we will use the corpora developed under the framework of the PAN competition on plagiarism detection (cf. PAN@CLEF: http://pan.webis.de). Models that consider cross-language thesauri and comparison of cognates will also be applied.
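The second stage described in this abstract, comparing a suspicious document against each retrieved candidate via word n-grams with n = {2, 3}, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation; the function names and the containment measure are choices made for the sketch.

```python
# Minimal sketch of n-gram comparison between a suspicious text and a
# candidate source, as in the second stage above. Tokenization here is a
# plain whitespace split; a real system would normalize punctuation etc.

def word_ngrams(text, n):
    """Set of word n-grams of the text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the
    candidate source; high values suggest copied or lightly edited text."""
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)

src = "plagiarism is the unacknowledged reuse of text"
susp = "plagiarism is the unacknowledged reuse of somebody's text"
score = ngram_containment(susp, src, n=2)  # high overlap despite an insertion
```

Bigrams tolerate small edits (here a single inserted word) better than exact sentence matching, which is why low n-gram orders are used to recover modified plagiarised fragments.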

37 citations

Journal ArticleDOI
TL;DR: It is observed that using the online compiler and the plagiarism detection tool reduces the time and effort needed to assess programming assignments, deters the authors' students from plagiarizing, and increases their success in their programming-based Data Structures course.
Abstract: In this study, an online compiler and a source code plagiarism detection tool have been integrated into the Moodle based distance education system of our Computer Engineering department. For this purpose the Moodle system has been extended with the GCC compiler and the Moss source code plagiarism detection tool. We observed that using the online compiler and the plagiarism detection tool reduces the time and effort needed for the assessment of programming assignments, deters our students from plagiarizing, and increases their success in their programming-based Data Structures course. © 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:363–373, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21606

37 citations

Journal ArticleDOI
TL;DR: Regular usage of professional plagiarism detection tools for similarity checks with critical interpretation by the editorial team at the pre-review stage will certainly help in reducing the menace of plagiarism in submitted manuscripts.
Abstract: Plagiarism is one of the most serious forms of scientific misconduct prevalent today and is an important reason for a significant proportion of manuscript rejections and retractions of published articles. It is time for the medical fraternity to unanimously adopt a 'zero tolerance' policy towards this menace. While the responsibility for ensuring a plagiarism-free manuscript primarily lies with the authors, editors cannot absolve themselves of their accountability. The only way for an author to write a plagiarism-free manuscript is to write the article in his/her own words, literally and figuratively. This article discusses the various types of plagiarism, reasons for the increasingly reported instances of plagiarism, the pros and cons of plagiarism detection tools, and the role of authors and editors in preventing plagiarism in a submitted manuscript. Regular use of professional plagiarism detection tools for similarity checks, with critical interpretation by the editorial team at the pre-review stage, will certainly help in reducing the menace of plagiarism in submitted manuscripts.

37 citations


Network Information
Related Topics (5)
Active learning: 42.3K papers, 1.1M citations (78% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Software development: 73.8K papers, 1.4M citations (77% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (76% related)
Deep learning: 79.8K papers, 2.1M citations (76% related)
Performance
Metrics
No. of papers in the topic in previous years:
Year | Papers
2023 | 59
2022 | 126
2021 | 83
2020 | 118
2019 | 130
2018 | 125