Home
/
Authors
/
Ben Verhoeven

Author

Ben Verhoeven

Bio: Ben Verhoeven is an academic researcher from University of Antwerp. The author has contributed to research in topics: Semantics & Noun. The author has an hindex of 14, co-authored 42 publications receiving 1130 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Automatic detection of cyberbullying in social media text

[...]

Cynthia Van Hee¹, Gilles Jacobs¹, Chris Emmery², Bart Desmet¹, Els Lefever¹, Ben Verhoeven², Guy De Pauw², Walter Daelemans², Veronique Hoste¹ - Show less +5 more•Institutions (2)

Ghent University¹, University of Antwerp²

08 Oct 2018-PLOS ONE

TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullies detection.

...read moreread less

Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

...read moreread less

231 citations

Overview of the 4th Author Profiling Task at PAN 2016: Cross-genre Evaluations

[...]

Francisco M. Rangel Pardo, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, Benno Stein - Show less +2 more

01 Jan 2016

TL;DR: The framework and the results of the Author Profiling task at PAN 2016, to predict age and gender from a cross-genre perspective, are presented.

...read moreread less

Abstract: This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were evaluated.

...read moreread less

184 citations

Proceedings Article•

Detection and Fine-Grained Classification of Cyberbullying Events

[...]

Cynthia Van Hee¹, Els Lefever¹, Ben Verhoeven², Julie Mennes¹, Bart Desmet¹, Guy De Pauw², Walter Daelemans², Veronique Hoste¹ - Show less +4 more•Institutions (2)

Ghent University¹, University of Antwerp²

01 Sep 2015

TL;DR: A new scheme for cyberbullying annotation is developed and applied, which describes the presence and severity of cyberbullies, a post author's role (harasser, victim or bystander) and a number of fine-grained categories related to cyber Bullying, such as insults and threats.

...read moreread less

Abstract: In the current era of online interactions, both positive and negative experiences are abundant on the Web. As in real life, negative experiences can have a serious impact on youngsters. Recent studies have reported cybervictimization rates among teenagers that vary between 20% and 40%. In this paper, we focus on cyberbullying as a particular form of cybervictimization and explore its automatic detection and fine-grained classification. Data containing cyberbullying was collected from the social networking site Ask.fm. We developed and applied a new scheme for cyberbullying annotation, which describes the presence and severity of cyberbullying, a post author's role (harasser, victim or bystander) and a number of fine-grained categories related to cyberbullying, such as insults and threats. We present experimental results on the automatic detection of cyberbullying and explore the feasibility of detecting the more fine-grained cyberbullying categories in online posts. For the first task, an F-score of 55.39% is obtained. We observe that the detection of the fine-grained categories (e.g. threats) is more challenging, presumably due to data sparsity, and because they are often expressed in a subtle and implicit way.

...read moreread less

140 citations

Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering.

[...]

Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, Martin Potthast - Show less +3 more

01 Jan 2017

TL;DR: This edition of PAN focuses on style breach detection and author clustering, two unsupervised authorship analysis tasks, and provides both benchmark data and an evaluation framework to compare different approaches.

...read moreread less

Abstract: Several authorship analysis tasks require the decomposition of a multiauthored text into its authorial components. In this regard two basic prerequisites need to be addressed: (1) style breach detection, i.e., the segmenting of a text into stylistically homogeneous parts, and (2) author clustering, i.e., the grouping of paragraph-length texts by authorship. In the current edition of PAN we focus on these two unsupervised authorship analysis tasks and provide both benchmark data and an evaluation framework to compare different approaches. We received three submissions for the style breach detection task and six submissions for the author clustering task; we analyze the submissions with different baselines while highlighting their strengths and weaknesses.

...read moreread less

113 citations

Overview of the author identification task at PAN 2014

[...]

Efstathios Stamatatos¹, Walter Daelemans², Ben Verhoeven², Benno Stein³, Martin Potthast³, Patrick Juola⁴, Miguel A. Sanchez-Perez⁵, Alberto Barrón-Cedeño⁶ - Show less +4 more•Institutions (6)

University of the Aegean¹, University of Antwerp², Bauhaus University, Weimar³, Duquesne University⁴, Instituto Politécnico Nacional⁵, Polytechnic University of Catalonia⁶

01 Jan 2014

TL;DR: The author identification task at PAN-2014 as mentioned in this paper focused on author verification, where the authors were given a set of documents by the same author along with exactly one document of questioned authorship.

...read moreread less

Abstract: The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013 we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significantly larger corpus was built comprising hundreds of documents in four natural languages (Dutch, English, Greek, and Spanish) and four genres (essays, reviews, novels, opinion articles). In addition, more suitable performance measures are used focusing on the accuracy and the confidence of the predictions as well as the ability of the submitted methods to leave some problems unanswered in case there is great uncertainty. To this end, we adopt the c@1 measure, originally proposed for the question answering task. We received 13 software submissions that were evaluated in the TIRA framework. Analytical evaluation results are presented where one language-independent approach serves as a challenging baseline. Moreover, we continue the successful practice of the PAN labs to examine meta-models based on the combination of all submitted systems. Last but not least, we provide statistical significance tests to demonstrate the important differences between the submitted approaches.

...read moreread less

96 citations

1
2
3
4
…
5
6
7
8
9

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•

Data Mining Practical Machine Learning Tools and Techniques

[...]

อนิรุธ สืบสิงห์

01 Jan 2014-Journal of management science

9,185 citations

Proceedings Article•DOI•

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

[...]

Zeerak Waseem¹, Dirk Hovy¹•Institutions (1)

University of Copenhagen¹

01 Jun 2016

TL;DR: A list of criteria founded in critical race theory is provided, and these are used to annotate a publicly available corpus of more than 16k tweets and present a dictionary based the most indicative words in the data.

...read moreread less

Abstract: Hate speech in the form of racist and sexist remarks are a common occurrence on social media. For that reason, many social media services address the problem of identifying hate speech, but the definition of hate speech varies markedly and is largely a manual effort (BBC, 2015; Lomas, 2015). We provide a list of criteria founded in critical race theory, and use them to annotate a publicly available corpus of more than 16k tweets. We analyze the impact of various extra-linguistic features in conjunction with character n-grams for hatespeech detection. We also present a dictionary based the most indicative words in our data.

...read moreread less

1,368 citations

Proceedings Article•DOI•

Exploiting Unintended Feature Leakage in Collaborative Learning

[...]

Luca Melis¹, Congzheng Song², Emiliano De Cristofaro¹, Vitaly Shmatikov²•Institutions (2)

University College London¹, Cornell University²

19 May 2019

TL;DR: In this article, passive and active inference attacks are proposed to exploit the leakage of information about participants' training data in federated learning, where each participant can infer the presence of exact data points and properties that hold only for a subset of the training data and are independent of the properties of the joint model.

...read moreread less

Abstract: Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.

...read moreread less

1,084 citations

Proceedings Article•DOI•

A Survey on Hate Speech Detection using Natural Language Processing

[...]

Anna Schmidt, Michael Wiegand

01 Apr 2017

TL;DR: A survey on hate speech detection describes key areas that have been explored to automatically recognize these types of utterances using natural language processing and discusses limits of those approaches.

...read moreread less

Abstract: This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.

...read moreread less

1,030 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse