Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication
Peter Bourgonje, Julián Moreno-Schneider, Ankit Srivastava, Georg Rehm +3 more
- pp 180-191
TLDR
This contribution evaluates several text classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German), classifying the data according to its annotated characteristics.
Abstract:
The sheer ease with which abusive and hateful utterances can be made online using today’s digital communication technologies (especially social media), typically from the comfort of one’s home and without any immediate negative repercussions, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different data sets we work on were classified with respect to aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.
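The F-scores mentioned in the abstract combine precision and recall into a single number. The following is a minimal sketch of that computation for one positive class, reported on a 0 to 100 scale. The label names (`attack`, `none`), the toy data, and the function name `f_score` are hypothetical, and the abstract does not say which averaging scheme the authors used, so this should not be read as their evaluation setup.

```python
from typing import Sequence


def f_score(gold: Sequence[str], predicted: Sequence[str], positive: str) -> float:
    """Return the F1 score (0-100 scale) for a single positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: annotated as positive and predicted as positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted as positive but annotated otherwise.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: annotated as positive but predicted otherwise.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    if tp == 0:
        # Precision and recall are both zero (or undefined), so F1 is zero.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations and classifier output.
gold = ["attack", "none", "attack", "none", "attack"]
pred = ["attack", "attack", "none", "none", "attack"]
score = f_score(gold, pred, positive="attack")
print(f"F-score: {score:.2f}")
```

With more than two labels (for example racism, sexism and neither), per-class scores like this one are usually averaged, and the choice between macro and weighted averaging can change the reported figure noticeably.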
Citations
Proceedings Article
Detection of Abusive Language: the Problem of Biased Datasets
TL;DR: It is shown that classification scores on popular datasets reported in previous work are much lower under realistic settings in which this bias is reduced, most notably on datasets that are created by focused sampling instead of random sampling.
Proceedings Article
From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles
TL;DR: This work wants to contribute to the debate on how to deal with fake news and related online phenomena with technological means, by providing means to separate related from unrelated headlines and further classifying the related headlines.
Journal Article
Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance
TL;DR: An approach to detecting and visualizing online aggression, a special case of hate speech, on social media, together with a user interface based on a web browser plugin for Facebook and Twitter that visualizes the aggressive comments posted on social media users’ timelines.
Journal Article
Cyberbullying detection on social multimedia using soft computing techniques: a meta-analysis
Akshi Kumar, Nitin Sachdeva +1 more
TL;DR: This work is a systematic literature review to gather, explore, comprehend and analyze the research trends, gaps and prospects of this alliance of using soft computing techniques for cyberbullying detection on social multimedia using a meta-analytic approach.
Proceedings Article
Weakly supervised cyberbullying detection using co-trained ensembles of embedding models
Elaheh Raisi, Bert Huang +1 more
TL;DR: The effectiveness of the approach is evaluated using post-hoc, crowdsourced annotation of Twitter, Ask.fm, and Instagram data, finding that the deep ensembles outperform previous non-deep methods for weakly supervised harassment detection.
References
Proceedings Article
An empirical comparison of supervised learning algorithms
TL;DR: A large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps is presented.
Book
Language and the Internet
TL;DR: Covering a range of Internet genres, including e-mail, chat, and the Web, this is a revealing account of how the Internet is radically changing the way we use language.
Proceedings Article
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
Zeerak Waseem, Dirk Hovy +1 more
TL;DR: A list of criteria founded in critical race theory is provided, and these are used to annotate a publicly available corpus of more than 16k tweets and present a dictionary based on the most indicative words in the data.
Proceedings Article
A Survey on Hate Speech Detection using Natural Language Processing
Anna Schmidt, Michael Wiegand +1 more
TL;DR: A survey on hate speech detection describes key areas that have been explored to automatically recognize these types of utterances using natural language processing and discusses limits of those approaches.
Proceedings Article
Abusive Language Detection in Online User Content
TL;DR: A machine learning based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, and a corpus of user comments annotated for abusive language, the first of its kind.