Author

Guy De Pauw

Other affiliations: University of Nairobi
Bio: Guy De Pauw is an academic researcher from the University of Antwerp. The author has contributed to research in topics: Machine translation & Grammar. The author has an h-index of 19 and has co-authored 62 publications receiving 1189 citations. Previous affiliations of Guy De Pauw include the University of Nairobi.


Papers
Journal ArticleDOI
08 Oct 2018-PLOS ONE
TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

231 citations

Proceedings Article
01 Sep 2015
TL;DR: A new scheme for cyberbullying annotation is developed and applied, which describes the presence and severity of cyberbullying, a post author's role (harasser, victim or bystander) and a number of fine-grained categories related to cyberbullying, such as insults and threats.
Abstract: In the current era of online interactions, both positive and negative experiences are abundant on the Web. As in real life, negative experiences can have a serious impact on youngsters. Recent studies have reported cybervictimization rates among teenagers that vary between 20% and 40%. In this paper, we focus on cyberbullying as a particular form of cybervictimization and explore its automatic detection and fine-grained classification. Data containing cyberbullying was collected from the social networking site Ask.fm. We developed and applied a new scheme for cyberbullying annotation, which describes the presence and severity of cyberbullying, a post author's role (harasser, victim or bystander) and a number of fine-grained categories related to cyberbullying, such as insults and threats. We present experimental results on the automatic detection of cyberbullying and explore the feasibility of detecting the more fine-grained cyberbullying categories in online posts. For the first task, an F-score of 55.39% is obtained. We observe that the detection of the fine-grained categories (e.g. threats) is more challenging, presumably due to data sparsity, and because they are often expressed in a subtle and implicit way.

140 citations

Journal ArticleDOI
TL;DR: The findings of this study confirm the traditional view that speech rate is determined by extralinguistic variables, but also suggest that there may be intrinsic tempo differences between language varieties.
Abstract: This paper investigates speech rate in two standard national varieties of Dutch on the basis of 160 conversations of 15 minutes each with native speakers who belong to four different regions in the Netherlands and four in the Dutch-speaking part of Belgium (Flanders). Speech rate was quantified as articulation rate and speaking rate, both expressed as the number of syllables per second (syll/s). The results show a significant effect of speakers' country of origin: subjects in the Netherlands speak 16% faster than subjects in Belgium (articulation: 5.05 vs. 4.23 syll/s, speaking: 4.23 vs. 4.00 syll/s). In addition, the independent variable sex was also found to be significant: on average, men speak 6% faster than women (articulation: 4.79 vs. 4.50 syll/s, speaking: 4.23 vs. 4.01 syll/s). The independent variable age was significant too: younger subjects speak 5% faster than older ones (articulation: 4.78 vs. 4.52 syll/s, speaking: 4.23 vs. 4.01 syll/s). The findings of this study confirm the traditional view that speech rate is determined by extralinguistic variables, but also suggest that there may be intrinsic tempo differences between language varieties.

136 citations

Journal ArticleDOI
25 Nov 2019
TL;DR: In this article, the authors present a study of the online discussion forum Incelsme and its users, involuntary celibates or incels, who see women as the cause of their problems and often use the forum for misogynistic hate speech and other forms of incitement.
Abstract: This paper presents a study of the (now suspended) online discussion forum Incelsme and its users, involuntary celibates or incels, a virtual community of isolated men without a sexual life, who see women as the cause of their problems and often use the forum for misogynistic hate speech and other forms of incitement. Involuntary celibates have attracted media attention and concern, after a killing spree in April 2018 in Toronto, Canada. The aim of this study is to shed light on the group dynamics of the incel community, by applying mixed-methods quantitative and qualitative approaches to analyze how the users of the forum create in-group identity and how they construct major out-groups, particularly women. We investigate the vernacular used by incels, apply automatic profiling techniques to determine who they are, discuss the hate speech posted in the forum, and propose a Deep Learning system that is able to detect instances of misogyny, homophobia, and racism, with approximately 95% accuracy.

109 citations

11 Oct 2015
TL;DR: This paper presents the construction and annotation of a corpus of Dutch social media posts annotated with fine-grained cyberbullying-related text categories, such as insults and threats, and describes the specific participants (harasser, victim or bystander) in a cyberbullying conversation to enhance the analysis of human interactions involving cyberbullying.
Abstract: The recent development of social media poses new challenges to the research community in analyzing online interactions between people. Social networking sites offer great opportunities for connecting with others, but also increase the vulnerability of young people to undesirable phenomena, such as cybervictimization. Recent research reports that on average, 20% to 40% of all teenagers have been victimized online. In this paper, we focus on cyberbullying as a particular form of cybervictimization. Successful prevention depends on the adequate detection of potentially harmful messages. However, given the massive information overload on the Web, there is a need for intelligent systems to identify potential risks automatically. We present the construction and annotation of a corpus of Dutch social media posts annotated with fine-grained cyberbullying-related text categories, such as insults and threats. Also, the specific participants (harasser, victim or bystander) in a cyberbullying conversation are identified to enhance the analysis of human interactions involving cyberbullying. Apart from describing our dataset construction and annotation, we present proof-of-concept experiments on the automatic identification of cyberbullying events and fine-grained cyberbullying categories.

73 citations


Cited by
Proceedings ArticleDOI
01 Apr 2017
TL;DR: A survey on hate speech detection describes key areas that have been explored to automatically recognize these types of utterances using natural language processing and discusses limits of those approaches.
Abstract: This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.

1,030 citations

Journal ArticleDOI

764 citations

Journal ArticleDOI
TL;DR: This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used, and provides a unifying definition of hate speech.
Abstract: The scientific study of hate speech, from a computer science point of view, is recent. This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used. This work also discusses the complexity of the concept of hate speech, defined in many platforms and contexts, and provides a unifying definition. This area has an unquestionable potential for societal impact, particularly in online communities and digital media platforms. The development and systematization of shared resources, such as guidelines, annotated datasets in multiple languages, and algorithms, is a crucial step in advancing the automatic detection of hate speech.

728 citations