Author

Andreas Eiselt

Bio: Andreas Eiselt is an academic researcher from Bauhaus University, Weimar. The author has contributed to research on digital audio broadcasting and plagiarism detection, has an h-index of 5, and has co-authored 10 publications receiving 606 citations.

Papers
Proceedings Article
01 Jan 2011
TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

419 citations
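
The unified retrieval process the overview summarizes boils down to shortlisting candidate source documents for a suspicious document, then comparing passages in detail. A minimal sketch of the candidate-retrieval step follows; the function names, n-gram length, and threshold are illustrative assumptions, not any of the evaluated PAN systems.

```python
# Candidate retrieval via shared character n-gram fingerprints:
# shortlist sources overlapping the suspicious document enough to
# warrant a detailed passage-level comparison.

def char_ngrams(text, n=8):
    """Set of overlapping character n-grams after whitespace normalization."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two n-gram sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def candidate_sources(suspicious, sources, n=8, threshold=0.15):
    """Return ids of source documents to pass to detailed comparison."""
    susp = char_ngrams(suspicious, n)
    return [doc_id for doc_id, text in sources.items()
            if jaccard(susp, char_ngrams(text, n)) >= threshold]
```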

01 Jan 2009
TL;DR: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

152 citations

01 Jan 2009
TL;DR: An exhaustive comparison of similarity estimation models is carried out in order to determine which one performs better on different levels of granularity and languages (English, German, Spanish, and Hindi).
Abstract: Measuring the similarity of texts is a common task in the detection of co-derivatives, plagiarism, and information flow. In general, the objective is to locate those fragments of a document that are derived from another text. We have carried out an exhaustive comparison of similarity estimation models in order to determine which one performs better on different levels of granularity and languages (English, German, Spanish, and Hindi). In connection with the comparison we introduce a publicly available corpus specially suited for this task. Furthermore, we introduce some modifications to well-known algorithms in order to demonstrate their applicability to this task. Among other things, our experiments show the strengths and weaknesses of the different models with respect to the granularity of the processed texts.

20 citations
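
A hedged sketch of one model in the family the paper compares: cosine similarity over word n-gram frequency vectors. The choice of n, the weighting, and the names below are illustrative, not the paper's exact model set.

```python
from collections import Counter
import math

def word_ngrams(text, n=3):
    """Frequency vector of word n-grams."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Varying n and comparing whole documents versus sentence-sized fragments is what "different levels of granularity" amounts to in practice.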

Dataset (DOI)
10 Sep 2009
TL;DR: The PAN plagiarism corpus 2009 (PAN-PC-09) is a corpus for the evaluation of automatic plagiarism detection algorithms and can be used free of charge for research purposes.
Abstract: This corpus is outdated. Please use its successor PAN-PC-11: https://doi.org/10.5281/zenodo.3250095 The PAN plagiarism corpus 2009 (PAN-PC-09) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge. The PAN-PC-09 contains documents in which artificial plagiarism has been inserted automatically. The plagiarism cases have been constructed using a so-called random plagiarist, a computer program which constructs plagiarism according to a number of random variables. The variables include the percentage of plagiarism in the whole corpus, the percentage of plagiarism per document, the length of a single plagiarized section, and the degree of obfuscation per plagiarized section.

16 citations
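
The "random plagiarist" idea lends itself to a short sketch: sample the construction variables (passage length, insertion point, obfuscation degree) at random and splice a source passage into a host document. Everything below, including the crude word-shuffle obfuscation, is an illustrative assumption rather than the actual corpus-generation code.

```python
import random

def random_case(host, source, min_len=50, max_len=500, obfuscation=0.2):
    """host/source are word lists; returns (new_host, (offset, length))."""
    length = random.randint(min_len, min(max_len, len(source)))
    start = random.randrange(len(source) - length + 1)
    passage = source[start:start + length]
    # Crude obfuscation: shuffle a random fraction of the passage's words.
    idx = random.sample(range(length), int(obfuscation * length))
    shuffled = [passage[i] for i in idx]
    random.shuffle(shuffled)
    for i, w in zip(idx, shuffled):
        passage[i] = w
    pos = random.randrange(len(host) + 1)
    return host[:pos] + passage + host[pos:], (pos, length)
```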

Proceedings Article
01 Oct 2013
TL;DR: This paper proposes a two-step strategy in which binary entity/non-entity labels from a first classification step inform the categorization of query terms into a pre-defined set of 28 named entity classes, outperforming a one-step traditional baseline by more than 10%.
Abstract: Named entity recognition in queries is the task of identifying sequences of terms in search queries that refer to a unique concept. This problem is attracting increasing attention, since the lack of context in short queries makes this task difficult for full-text off-the-shelf named entity recognizers. In this paper, we propose to deal with this problem in a two-step fashion. The first step classifies each query term as either a plain token or part of a named entity. The second step takes advantage of these binary labels for categorizing query terms into a pre-defined set of 28 named entity classes. Our results show that our two-step strategy is promising, outperforming a one-step traditional baseline by more than 10%.

16 citations
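
A sketch of the two-step strategy as described: step one tags each query term as inside or outside a named entity, and step two uses that binary label as an additional feature when assigning one of the 28 entity classes. The features and classifiers here are assumptions for illustration, not the paper's setup.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def term_features(terms, i):
    """Per-term features for a tokenized query (illustrative set)."""
    return {"term": terms[i].lower(),
            "capitalized": terms[i][:1].isupper(),
            "position": i,
            "prev": terms[i - 1].lower() if i > 0 else "<s>"}

# Step 1: is this term part of a named entity? (binary labels, 0/1)
step1 = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

# Step 2: which of the 28 entity classes? Step 1's prediction is fed
# in as an extra feature, which is the core of the two-step idea.
step2 = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

def step2_features(terms, i):
    feats = term_features(terms, i)
    feats["in_entity"] = int(step1.predict([term_features(terms, i)])[0])
    return feats

# Training sketch: fit step1 on per-term features and binary labels,
# then fit step2 on step2_features(...) and the 28-class labels.
```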


Cited by
Proceedings Article
01 Jan 2011
TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

419 citations

Proceedings Article
23 Aug 2010
TL;DR: Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Abstract: We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

327 citations
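
The framework's measures combine into an overall score, plagdet: the harmonic mean of precision and recall, discounted by granularity, where granularity penalizes detectors that report a single plagiarism case as many fragments. A minimal sketch, assuming the component measures are already computed:

```python
import math

def plagdet(precision, recall, granularity):
    """plagdet = F1 / log2(1 + granularity); granularity >= 1 is the
    average number of detections covering one plagiarism case."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)
```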

01 Jan 2013
TL;DR: The framework and results for the Author Profiling task at PAN 2013 are presented, describing the evaluation framework used to measure the participants' performance on the problem of identifying age and gender from anonymous texts.
Abstract: The PAN task on author profiling has been organised in the framework of the WIQ-EI IRSES project (Grant No. 269180) within the FP 7 Marie Curie People Framework of the European Commission. We would like to thank Atribus by Corex for sponsoring the award for the winning team. We thank Julio Gonzalo, Jorge Carrillo and Damiano Spina from UNED for helping with the Twitter subcorpus. The work of the first author was partially funded by Autoritas Consulting SA and by Ministerio de Economía y Competitividad de España under grants ECOPORTUNITY IPT-2012-1220-430000 and CSO2013-43054-R. The work of the second author was carried out in the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.

290 citations
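
As framed in the TL;DR, author profiling reduces to supervised text classification with age group and gender as targets. A minimal sketch under that framing; the features and model are illustrative assumptions, not the shared task's systems.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def profiling_model():
    """One classifier per target; tf-idf over word unigrams and bigrams."""
    return make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000))

gender_clf = profiling_model()   # gender_clf.fit(texts, gender_labels)
age_clf = profiling_model()      # age_clf.fit(texts, age_group_labels)
```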

Journal Article (DOI)
01 Mar 2012
TL;DR: A new taxonomy of plagiarism is presented that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view, and supports deep understanding of different linguistic patterns in committing plagiarism.
Abstract: Plagiarism can take many forms, ranging from copying texts to adopting ideas without giving credit to their originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of different linguistic patterns in committing plagiarism, for example, rewriting texts into semantically equivalent forms with different words and organization, shortening texts through concept generalization and specification, and adopting the ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with the plagiarism types listed in the taxonomy. We conduct an extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copied text but fail to detect intelligent plagiarism in which ideas are presented in different words.

275 citations
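
One concrete instance of the character-n-gram (CNG) family surveyed above is profile-based dissimilarity in the style of Keselj et al., comparing relative frequencies of each text's most common n-grams. The profile size and n below are illustrative choices.

```python
from collections import Counter

def cng_profile(text, n=3, size=500):
    """Relative frequencies of the `size` most common character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.most_common(size)}

def cng_dissimilarity(p, q):
    """Lower means more similar writing style."""
    return sum(((p.get(g, 0.0) - q.get(g, 0.0)) /
                ((p.get(g, 0.0) + q.get(g, 0.0)) / 2)) ** 2
               for g in p.keys() | q.keys())
```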

Book Chapter (DOI)
15 Sep 2014
TL;DR: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling; together with last year's submissions, it forms the largest collection of software for these tasks to date.
Abstract: This paper reports on the PAN 2014 evaluation lab which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN's tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of their run output. To deal with the organizational overhead involved in handling software submissions, the TIRA experimentation platform significantly reduces the workload for both participants and organizers, while the submitted software is kept in a running state. This year, we addressed the question of responsibility for the successful execution of submitted software in order to put participants back in charge of executing their software at our site. In sum, 57 pieces of software were submitted to our lab; together with the 58 software submissions of last year, this forms the largest collection of software for our three tasks to date, all of which is readily available for further analysis. The report concludes with a brief summary of each task.

171 citations