
Showing papers by "Walter Daelemans published in 2018"


Journal ArticleDOI
08 Oct 2018-PLOS ONE
TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

231 citations
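
The setup described above maps naturally onto a scikit-learn pipeline. Below is a minimal sketch of a linear SVM text classifier with n-gram features and F1-scored hyperparameter search; the data, feature set, and parameter grid are illustrative placeholders, not the paper's actual configuration.

    # Minimal sketch of a linear-SVM text classifier in the spirit of the
    # setup above. Posts, labels, and the grid are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    posts = [
        "nobody likes you, just leave",
        "you are such a loser lol",
        "everyone at school hates you",
        "see you at practice tomorrow!",
        "happy birthday, have a great day",
        "can you send me the homework?",
    ]
    labels = [1, 1, 1, 0, 0, 0]  # 1 = bullying-related, 0 = not

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
        ("svm", LinearSVC()),
    ])

    # Hyperparameter optimisation on the regularisation strength,
    # scored with F1 as in the paper (3 folds only for the toy data).
    grid = GridSearchCV(pipeline, {"svm__C": [0.01, 0.1, 1, 10]},
                        scoring="f1", cv=3)
    grid.fit(posts, labels)
    print(grid.best_params_, grid.best_score_)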


01 Jan 2018
TL;DR: This edition of PAN studies two tasks, the novel task of cross-domain authorship attribution, where the texts of known and unknown authorship belong to different domains, and style change detection, where single-author and multi-author texts are to be distinguished.
Abstract: Author identification attempts to reveal the authors behind texts. It is an emerging area of research associated with applications in literary research, cyber-security, forensics, and social media analysis. In this edition of PAN, we study two tasks: the novel task of cross-domain authorship attribution, where the texts of known and unknown authorship belong to different domains, and style change detection, where single-author and multi-author texts are to be distinguished. For the former task, we make use of fanfiction texts, a large part of contemporary fiction written by non-professional authors who are inspired by specific well-known works, which enables us to control the domain of texts for the first time. We describe a new corpus of fanfiction texts covering five languages (English, French, Italian, Polish, and Spanish). For the latter, a new data set of Q&As covering multiple topics in English is introduced. We received 11 submissions for the cross-domain authorship attribution task and 5 submissions for the style change detection task. A survey of participant methods and analytical evaluation results are presented in this paper.

84 citations


Proceedings ArticleDOI
26 Mar 2018
TL;DR: The authors presented a new dataset for machine comprehension in the medical domain using clinical case reports with around 100,000 gap-filling queries about these cases and applied several baselines and state-of-the-art neural readers to the dataset, and observed a considerable gap in performance (20% F1) between the best human and machine readers.
Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using domain knowledge and object tracking are the most frequently required skills, and that recognizing omitted information and spatio-temporal reasoning are the most difficult for the machines.

68 citations
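
Gap-filling queries of the kind described are typically constructed by blanking out an entity mention in the text and keeping it as the answer. A toy sketch of that construction, with a hypothetical case text and entity list standing in for the paper's clinical annotations:

    # Toy sketch of gap-filling (cloze) query construction. The case text
    # and entity list are hypothetical; the paper derives some 100,000
    # such queries from clinical case reports.
    def make_cloze_queries(sentences, entities, placeholder="@placeholder"):
        """Blank out each entity mention and keep it as the answer."""
        queries = []
        for sent in sentences:
            for ent in entities:
                if ent in sent:
                    queries.append({
                        "query": sent.replace(ent, placeholder),
                        "answer": ent,
                    })
        return queries

    case = ["The patient was treated with warfarin.",
            "An MRI revealed a cortical infarct."]
    entities = ["warfarin", "cortical infarct"]
    for q in make_cloze_queries(case, entities):
        print(q["query"], "->", q["answer"])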


Book ChapterDOI
08 Sep 2018
TL;DR: This paper shows how DCNNs fine-tuned on a large artistic collection outperform the same architectures pre-trained on the ImageNet dataset only, when it comes to the classification of heritage objects from a different dataset.
Abstract: In this paper we investigate whether Deep Convolutional Neural Networks (DCNNs), which have obtained state-of-the-art results on the ImageNet challenge, are able to perform equally well on three different art classification problems. In particular, we assess whether it is beneficial to fine-tune the networks instead of just using them as off-the-shelf feature extractors for a separately trained softmax classifier. Our experiments show that the first approach yields significantly better results and allows the DCNNs to develop new selective attention mechanisms over the images, which provide powerful insights about which pixel regions allow the networks to successfully tackle the proposed classification challenges. Furthermore, we also show that DCNNs fine-tuned on a large artistic collection outperform the same architectures pre-trained on the ImageNet dataset only, when it comes to the classification of heritage objects from a different dataset.

49 citations
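
The two transfer regimes compared in the abstract can be sketched in a few lines of PyTorch; a torchvision ResNet stands in here for the DCNN architectures the paper evaluates:

    # Sketch of the two transfer-learning regimes compared above, with a
    # torchvision ResNet as a stand-in for the paper's DCNNs.
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 10  # placeholder for the art-classification label set

    # (a) Off-the-shelf feature extractor: freeze the pretrained backbone
    # and train only a new softmax classifier on top.
    frozen = models.resnet50(weights="IMAGENET1K_V1")
    for p in frozen.parameters():
        p.requires_grad = False
    frozen.fc = nn.Linear(frozen.fc.in_features, NUM_CLASSES)

    # (b) Fine-tuning: replace the classifier but leave all weights
    # trainable, so the convolutional layers adapt to the artistic data.
    finetuned = models.resnet50(weights="IMAGENET1K_V1")
    finetuned.fc = nn.Linear(finetuned.fc.in_features, NUM_CLASSES)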


Journal ArticleDOI
TL;DR: The utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes is explored and novel techniques to facilitate model interpretability are proposed.

42 citations


Posted Content
TL;DR: A new dataset for machine comprehension in the medical domain using clinical case reports with around 100,000 gap-filling queries is presented, and it is found that inferences using domain knowledge and object tracking are the most frequently required skills.
Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using domain knowledge and object tracking are the most frequently required skills, and that recognizing omitted information and spatio-temporal reasoning are the most difficult for the machines.

28 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: It is found that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80.
Abstract: Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network’s performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://github.com/clips/interpret_with_rules.

14 citations
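
The pipeline lends itself to a compact sketch. Below, an L2-norm weight score stands in for the paper's feature importance computation and a decision tree stands in for the rule induction model; both substitutions are illustrative, not the authors' exact choices.

    # Illustrative sketch of the explanation pipeline described above:
    # (1) score feature importance in the trained network, (2) reweigh
    # the inputs, (3) fit an interpretable rule inducer on the result.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                   # placeholder feature matrix
    y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # placeholder labels

    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

    # Crude importance proxy: L2 norm of each input's first-layer weights.
    importance = np.linalg.norm(net.coefs_[0], axis=1)

    # Reweigh inputs and fit rules to mimic the network's predictions.
    X_weighted = X * importance
    rules = DecisionTreeClassifier(max_depth=3).fit(X_weighted, net.predict(X))
    print(export_text(rules))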


Proceedings Article
01 Aug 2018
TL;DR: This paper describes CLiPS’s submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018 and explores different ways to combine classifiers trained on different feature groups.
Abstract: This paper describes CLiPS’s submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. We explore different ways to combine classifiers trained on different feature groups. Our best system uses two Linear SVM classifiers; one trained on lexical features (word n-grams) and one trained on syntactic features (PoS n-grams). The final prediction of whether a document is in Flemish Dutch or Netherlandic Dutch is made by the classifier that outputs the highest probability for one of the two labels. This confidence vote approach outperforms a meta-classifier on the development data and on the test data.

13 citations
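
The confidence vote itself is only a few lines: each classifier outputs a probability distribution over the two labels and the prediction comes from whichever model is most confident. A sketch with placeholder probabilities (LinearSVC needs calibration, e.g. CalibratedClassifierCV, to produce them):

    # Sketch of the confidence vote between two classifiers: the final
    # label comes from whichever model assigns the highest probability
    # to either class. The probability arrays here are placeholders.
    import numpy as np

    def confidence_vote(proba_lexical, proba_syntactic):
        """Pick, per document, the prediction of the more confident model.

        Both arguments are (n_docs, 2) arrays of class probabilities,
        e.g. from CalibratedClassifierCV(LinearSVC()).predict_proba().
        """
        stacked = np.stack([proba_lexical, proba_syntactic])  # (2, n, 2)
        winner = stacked.max(axis=2).argmax(axis=0)   # most confident model
        return stacked[winner, np.arange(stacked.shape[1])].argmax(axis=1)

    p_lex = np.array([[0.55, 0.45], [0.30, 0.70]])
    p_syn = np.array([[0.20, 0.80], [0.60, 0.40]])
    print(confidence_vote(p_lex, p_syn))  # [1 1]: syn wins doc 0, lex doc 1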


Posted Content
TL;DR: This report presents a study of eight corpora of online hate speech, by demonstrating the NLP techniques that were used to collect and analyze the jihadist, extremist, racist, and sexist content.
Abstract: In this report, we present a study of eight corpora of online hate speech, by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study.

10 citations


Journal ArticleDOI
28 Dec 2018-PLOS ONE
TL;DR: Analysis of distributional properties that facilitate the categorization of words into lexical categories shows how, in order for the learner to see an opportunity to form a category, there needs to be a certain degree of uncertainty in the co-occurrence pattern.
Abstract: This paper analyzes distributional properties that facilitate the categorization of words into lexical categories. First, word-context co-occurrence counts were collected using corpora of transcribed English child-directed speech. Then, an unsupervised k-nearest neighbor algorithm was used to categorize words into lexical categories. The categorization outcome was regressed over three main distributional predictors computed for each word, including frequency, contextual diversity, and average conditional probability given all the co-occurring contexts. Results show that both contextual diversity and frequency have a positive effect while the average conditional probability has a negative effect. This indicates that words are easier to categorize in the face of uncertainty: categorization works best for words which are frequent, diverse, and hard to predict given the co-occurring contexts. This shows how, in order for the learner to see an opportunity to form a category, there needs to be a certain degree of uncertainty in the co-occurrence pattern.

9 citations
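
All three predictors can be read off a word-by-context co-occurrence matrix. A minimal sketch of one plausible way to compute them (the toy counts and the exact formulation of the conditional probability are illustrative):

    # Sketch of the three distributional predictors used in the
    # regression, computed from a toy word-by-context count matrix.
    import numpy as np

    counts = np.array([[10, 5, 0],    # rows: words, columns: contexts
                       [ 2, 2, 2],
                       [ 0, 0, 8]], dtype=float)

    frequency = counts.sum(axis=1)         # total occurrences per word
    diversity = (counts > 0).sum(axis=1)   # number of distinct contexts

    # Average conditional probability P(word | context), averaged over
    # the contexts the word actually occurs in.
    p_word_given_context = counts / counts.sum(axis=0, keepdims=True)
    avg_cond_prob = np.array([
        row[row > 0].mean() for row in p_word_given_context
    ])

    print(frequency, diversity, avg_cond_prob)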


Journal ArticleDOI
01 Jan 2018
TL;DR: The authors studied the impact of Flemish adolescents' social background on non-standard writing and found significant correlations between different aspects of social class (level of education, home language and profession of the parents) and all examined deviations from formal written standard Dutch.
Abstract: In a large corpus (2.9 million tokens) of chat conversations, we studied the impact of Flemish adolescents’ social background on non-standard writing. We found significant correlations between different aspects of social class (level of education, home language and profession of the parents) and all examined deviations from formal written standard Dutch. Clustering several social variables might not only lead to a better operationalization of the complex phenomenon of social class, it certainly allows for discriminating social groups with distinct linguistic practices: lower class teenagers used each of the non-standard features much more often and in some cases in a different way than their upper class peers. Possible explanations concern discrepancies in terms of both linguistic proficiency and linguistic attitudes. Our findings emphasize the importance of including social background as an independent variable in variationist studies on youngsters’ computer-mediated communication.

01 Sep 2018
TL;DR: This article analyzes Flemish adolescents’ non-standard writing practices in a large social media corpus and looks for correlations with the teenagers’ social class; since the social background parameters prove highly correlated, they are combined into one social class label.
Abstract: In a large social media corpus (2.9 million tokens), we analyze Flemish adolescents’ non-standard writing practices and look for correlations with the teenagers’ social class. Three different aspects of adolescents’ social background are included: educational track, parental profession, and home language. Since the data reveal that these parameters are highly correlated, we combine them into one social class label. The different linguistic practices emerging from the analyses demonstrate the crucial impact of social class on adolescent online writing practices. Furthermore, our results nuance classical findings on working class adherence to ‘old vernacular’ by also highlighting working class youth’s strong connection to the online writing culture, or ‘new vernacular’. Finally, we point out the complexity of the social class variable by demonstrating interactions with gender and age, and by examining groups of teenagers whose social background is ambiguous and therefore hard to operationalize.

Proceedings Article
01 Aug 2018
TL;DR: This work elaborates on SentProp, a framework for inducing domain-specific polarities from word embeddings, by evaluating its use for enhancing DuOMan, a general-purpose lexicon, for use in the political domain.
Abstract: Lexicon based methods for sentiment analysis rely on high quality polarity lexicons. In recent years, automatic methods for inducing lexicons have increased the viability of lexicon based methods for polarity classification. SentProp is a framework for inducing domain-specific polarities from word embeddings. We elaborate on SentProp by evaluating its use for enhancing DuOMan, a general-purpose lexicon, for use in the political domain. By adding only top sentiment bearing words from the vocabulary and applying small polarity shifts in the general-purpose lexicon, we increase accuracy in an in-domain classification task. The enhanced lexicon performs worse than the original lexicon in an out-domain task, showing that the words we added and the polarity shifts we applied are domain-specific and do not translate well to an out-domain setting.
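
SentProp proper propagates polarity through a random walk over an embedding-similarity graph; the simplified sketch below uses a plain seed-similarity score to illustrate the enhancement step (adding top sentiment-bearing words and applying small polarity shifts). All names and numbers are placeholders.

    # Simplified sketch of the lexicon-enhancement step. SentProp itself
    # uses a random walk over an embedding-similarity graph; a plain
    # seed-similarity score stands in here. Seeds, embeddings, and the
    # shift size are placeholders.
    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def induce_polarity(word_vec, pos_seeds, neg_seeds):
        """Polarity in [-1, 1]: similarity to positive minus negative seeds."""
        pos = np.mean([cosine(word_vec, s) for s in pos_seeds])
        neg = np.mean([cosine(word_vec, s) for s in neg_seeds])
        return pos - neg

    def enhance_lexicon(lexicon, embeddings, pos_seeds, neg_seeds,
                        top_k=100, shift=0.1):
        scores = {w: induce_polarity(v, pos_seeds, neg_seeds)
                  for w, v in embeddings.items()}
        # Add only the top sentiment-bearing out-of-lexicon words ...
        new = sorted((w for w in scores if w not in lexicon),
                     key=lambda w: abs(scores[w]), reverse=True)[:top_k]
        enhanced = dict(lexicon)
        enhanced.update({w: scores[w] for w in new})
        # ... and apply small domain-specific shifts to existing entries.
        for w in lexicon:
            if w in scores:
                enhanced[w] += shift * np.sign(scores[w])
        return enhanced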

Proceedings ArticleDOI
01 Oct 2018
TL;DR: While the detection of the most theory- and practice-oriented educational tracks seems to be a relatively easy task, the hybrid Technical level appears to be much harder to capture based on online writing style, as expected.
Abstract: We aim to predict Flemish adolescents’ educational track based on their Dutch social media writing. We distinguish between the three main types of Belgian secondary education: General (theory-oriented), Vocational (practice-oriented), and Technical Secondary Education (hybrid). The best results are obtained with a Naive Bayes model, i.e. an F-score of 0.68 (std. dev. 0.05) in 10-fold cross-validation experiments on the training data and an F-score of 0.60 on unseen data. Many of the most informative features are character n-grams containing specific occurrences of chatspeak phenomena such as emoticons. While the detection of the most theory- and practice-oriented educational tracks seems to be a relatively easy task, the hybrid Technical level appears to be much harder to capture based on online writing style, as expected.
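
A character n-gram Naive Bayes model of the kind reported takes only a few lines in scikit-learn. The sketch below uses placeholder data and 3-fold validation for brevity; the paper's experiments used 10-fold cross-validation on real chat data.

    # Generic sketch of a character n-gram Naive Bayes classifier for
    # educational-track prediction. Texts and labels are placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    texts = [
        "wa doede gij vanavond? :p", "ik snap die wiskunde egt nie",
        "morgen weer stage, zucht", "kzal u straks nog bellen he",
        "examen was egt moeilijk :(", "wie gaat er mee naar de match?",
        "da was egt grappig gisteren xD", "ik moet nog leren voor frans",
        "tot straks op den atelier!",
    ]
    tracks = ["General", "Vocational", "Technical"] * 3

    model = make_pipeline(
        # Character n-grams capture chatspeak phenomena such as emoticons.
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MultinomialNB(),
    )
    scores = cross_val_score(model, texts, tracks, cv=3, scoring="f1_macro")
    print(scores.mean(), scores.std())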

Posted Content
TL;DR: The authors proposed a rule induction model to explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80.
Abstract: Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network's performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://github.com/clips/interpret_with_rules.

Proceedings Article
01 Dec 2018
TL;DR: It is suggested that a carefully composed set of deep features is as informative as surface-feature word and character n-grams, and that combining surface and deep features results in a slight increase in F-score; mainly syntactic features differ between the groups, possibly indicating a less dynamic writing style for adolescents with ASD.
Abstract: One of the main characteristics of individuals with autism spectrum disorder (ASD) is a deficit in social communication. The effects of ASD on both verbal and non-verbal communication are widely researched in this respect. In this exploratory study, we investigate whether texts of Dutch-speaking adolescents with ASD (aged 12-18 years) are (automatically) distinguishable from texts written by typically developing peers. First, we want to reveal whether specific characteristics can be found in the writing style of adolescents with ASD, and secondly, we examine the possibility to use these features in an automated classification task. We look for surface features (word and character n-grams, and simple linguistic metrics), but also for deep linguistic features (namely syntactic, semantic and discourse features). The differences between the ASD group and control group are tested for statistical significance and we show that mainly syntactic features are different among the groups, possibly indicating a less dynamic writing style for adolescents with ASD. For the classification task, a Logistic Regression classifier is used. With a surface feature approach, we could reach an F-score of 72.15%, which is much higher than the random baseline of 50%. However, a pure n-gram-based approach very much relies on content and runs the risk of detecting topics instead of style, which argues for the use of deeper linguistic features. The best combination in the deep feature approach originally reached an F-score of just 62.14%, which could not be boosted by automatic feature selection. However, by taking into account the information from the statistical analysis and merely using the features that were significant or trending, we could equal the surface-feature performance and again reached an F-score of 72.15%. This suggests that a carefully composed set of deep features is as informative as surface-feature word and character n-grams. Moreover, combining surface and deep features resulted in a slight increase in F-score to 72.33%.

Proceedings Article
01 May 2018
TL;DR: This work presents wordkit, a python package which allows users to switch between feature sets and featurizers with a uniform API, allowing for rapid prototyping, and is the first package which integrates a variety of orthographic and phonological featurizer in a single package.
Abstract: The modeling of psycholinguistic phenomena, such as word reading, with machine learning techniques requires the featurization of word stimuli into appropriate orthographic and phonological representations. Critically, the choice of features impacts the performance of machine learning algorithms, and can have important ramifications for the conclusions drawn from a model. As such, featurizing words with a variety of feature sets, without having to resort to using different tools is beneficial in terms of development cost. In this work, we present wordkit, a python package which allows users to switch between feature sets and featurizers with a uniform API, allowing for rapid prototyping. To the best of our knowledge, this is the first package which integrates a variety of orthographic and phonological featurizers in a single package. The package is fully compatible with scikit-learn, and hence can be integrated into a variety of machine learning pipelines. Furthermore, the package is modular and extensible, allowing for the future integration of a large variety of feature sets and featurizers. The package and documentation can be found at github.com/stephantul/wordkit
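
To avoid guessing at wordkit's actual API, the sketch below illustrates the general pattern the package implements: interchangeable featurizers behind one scikit-learn-style transformer interface. The class and its methods are hypothetical, not wordkit's.

    # Illustrative sketch of a uniform, scikit-learn-compatible
    # featurizer interface of the kind wordkit provides. This class is
    # hypothetical, not wordkit's actual API.
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class OneHotCharacterFeaturizer(BaseEstimator, TransformerMixin):
        """Turn words into fixed-length one-hot character matrices."""

        def __init__(self, max_len=10):
            self.max_len = max_len

        def fit(self, words, y=None):
            self.alphabet_ = sorted({c for w in words for c in w})
            self.index_ = {c: i for i, c in enumerate(self.alphabet_)}
            return self

        def transform(self, words):
            out = np.zeros((len(words), self.max_len, len(self.alphabet_)))
            for i, w in enumerate(words):
                for j, c in enumerate(w[: self.max_len]):
                    out[i, j, self.index_[c]] = 1.0
            return out.reshape(len(words), -1)  # flatten for pipelines

    feats = OneHotCharacterFeaturizer().fit(["salt", "slat", "work"])
    print(feats.transform(["salt"]).shape)

Because it subclasses TransformerMixin, a featurizer like this drops into any scikit-learn Pipeline alongside a classifier, which is the interoperability the paper highlights.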

Proceedings Article
01 Dec 2018
TL;DR: This paper explores the differences in propositional behavior between healthy individuals and Alzheimer’s patients, using a newly developed computerized propositional idea density measure for Dutch texts, and provides support to the hypothesis that a slight decrease in propositional idea density over time might be a predictor of cognitive decline in late life.
Abstract: Low linguistic ability has been associated with low cognitive reserve, which might result in the development of Alzheimer’s disease. As a result, propositional idea density (PID), as a measure of linguistic ability in early life, might predict cognitive decline in late life. This paper explores the differences in propositional behavior between healthy individuals and Alzheimer’s patients, using a newly developed computerized propositional idea density measure for Dutch texts. This exploratory study describes an experiment on literary text. We measured the propositional idea density of the works of one author without Alzheimer’s disease (i.e. Elsschot) and one author with attested Alzheimer’s disease (i.e. Claus). Changes in propositional idea density for both authors were compared, as well as the differences in propositional idea density in early life. Analyses from this experiment showed that the propositional idea density in early life of Elsschot was not significantly higher than that of Claus. The propositional idea density of Elsschot significantly increased over time. This change in propositional idea density greatly differed from the slight decrease in propositional idea density of Claus. On the one hand, this study fails to support the hypothesis that a low propositional idea density in early life predicts cognitive decline in late life. On the other hand, the results provide support to the hypothesis that a slight decrease in propositional idea density over time might be a predictor of cognitive decline in late life. However, much more research is needed to corroborate these findings. The propositional idea density software for Dutch is available on request.
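
The authors' Dutch software is available on request; as a rough illustration of the measure itself, PID is commonly approximated (in the CPIDR tradition) as the number of propositions, counted via part-of-speech tags, divided by the number of words:

    # Toy illustration of propositional idea density (PID): propositions
    # per word, approximating propositions by POS tags in the CPIDR
    # tradition. A rough stand-in, not the authors' Dutch software.
    PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}

    def idea_density(tagged_tokens):
        """tagged_tokens: list of (token, universal_pos_tag) pairs."""
        if not tagged_tokens:
            return 0.0
        props = sum(1 for _, tag in tagged_tokens if tag in PROPOSITION_TAGS)
        return props / len(tagged_tokens)

    sentence = [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"),
                ("walked", "VERB"), ("slowly", "ADV"), ("home", "NOUN")]
    print(idea_density(sentence))  # 3 propositions / 6 words = 0.5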

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A new neighborhood measure, rd20, is introduced, which can be used to quantify neighborhood effects over arbitrary feature spaces, and it is shown that feature sets that do not allow for transposition or deletion explain more variance in Reaction Time measurements.
Abstract: We investigate the relation between the transposition and deletion effects in word reading, i.e., the finding that readers can successfully read “SLAT” as “SALT”, or “WRK” as “WORK”, and the neighborhood effect. In particular, we investigate whether lexical orthographic neighborhoods take into account transposition and deletion in determining neighbors. If this is the case, it is more likely that the neighborhood effect takes place early during processing, and does not solely rely on similarity of internal representations. We introduce a new neighborhood measure, rd20, which can be used to quantify neighborhood effects over arbitrary feature spaces. We calculate the rd20 over large sets of words in three languages using various feature sets and show that feature sets that do not allow for transposition or deletion explain more variance in Reaction Time (RT) measurements. We also show that the rd20 can be calculated using the hidden state representations of a Multi-Layer Perceptron, and that these explain less variance than the raw features. We conclude that the neighborhood effect is unlikely to have a perceptual basis, but is more likely to be the result of items co-activating after recognition. All code is available at: www.github.com/clips/conll2018
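
The authors' code is in the linked repository. As a sketch, under the assumption that rd20, by analogy with OLD20, is the mean distance from an item to its 20 nearest neighbours in an arbitrary feature space:

    # Sketch of an rd20-style neighbourhood measure, assuming (by analogy
    # with OLD20) it is the mean distance to the 20 nearest neighbours in
    # an arbitrary feature space. See the linked repository for the
    # authors' actual implementation.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def rd20(features, k=20):
        """Mean distance from each item to its k nearest neighbours."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(features)  # +1: self
        dist, _ = nn.kneighbors(features)
        return dist[:, 1:].mean(axis=1)  # drop the zero self-distance

    vectors = np.random.default_rng(0).random((500, 40))  # placeholder
    print(rd20(vectors)[:5])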



Proceedings ArticleDOI
01 Oct 2018
TL;DR: It is found that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.
Abstract: Recently, segment convolutional neural networks have been proposed for end-to-end relation extraction in the clinical domain, achieving results comparable to or outperforming the approaches with heavy manual feature engineering. In this paper, we analyze the errors made by the neural classifier based on confusion matrices, and then investigate three simple extensions to overcome its limitations. We find that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.
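
Confusion-matrix error analysis of the kind described is straightforward to reproduce. A generic sketch with example clinical relation labels (gold and predicted values are placeholders):

    # Generic sketch of confusion-matrix error analysis for a relation
    # classifier. Labels and predictions are placeholders.
    from sklearn.metrics import confusion_matrix

    labels = ["TrIP", "TrAP", "TrCP", "none"]   # example relation types
    gold = ["TrAP", "TrAP", "none", "TrIP", "TrCP"]
    pred = ["TrAP", "none", "none", "TrAP", "TrCP"]

    cm = confusion_matrix(gold, pred, labels=labels)
    for i, row in enumerate(cm):
        # Off-diagonal counts show which relations the model confuses.
        print(f"{labels[i]:>5}: " + " ".join(f"{n:3d}" for n in row))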