Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

doi:10.1007/978-3-319-73706-5_15

Open Access Book Chapter

Peter Bourgonje, +3 more

pp. 180-191

TLDR

This contribution evaluates a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German) and focuses on classifying the data according to the annotated characteristics using several text classification algorithms.

Abstract:

The sheer ease with which abusive and hateful utterances can be made online using today's digital communication technologies (especially social media), typically from the comfort of one's home and without any immediate negative repercussions, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different datasets we work on were annotated for aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.
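The abstract reports F-scores but does not name the algorithms, features, or averaging scheme used. As a hedged illustration only, the following minimal Python sketch assumes a multinomial Naive Bayes bag-of-words classifier (a common text-classification baseline chosen here for brevity, not necessarily one of the algorithms evaluated in the chapter) and scores it with binary F1 for a single positive class. The comments and the labels `attack`/`none` are invented for this example and are not drawn from the tweet or Wikipedia Talk datasets; `NaiveBayes`, `tokenize`, and `f1_score` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical toy comments, invented for illustration; NOT from the paper's data.
TRAIN = [
    ("you are an idiot and a moron", "attack"),
    ("shut up you stupid idiot", "attack"),
    ("nobody wants you here moron", "attack"),
    ("thanks for fixing the citation", "none"),
    ("i added a source to the article", "none"),
    ("good point, i agree with the edit", "none"),
]


def tokenize(text):
    """Lowercase, treat commas as spaces, and split on whitespace."""
    return text.lower().replace(",", " ").split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs):
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(n_c / n_docs)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def f1_score(gold, pred, positive):
    """Binary F1 (harmonic mean of precision and recall) for one positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN)
    test = [
        ("what an idiot", "attack"),
        ("thanks for the source", "none"),
        ("you made a good edit", "none"),
    ]
    gold = [label for _, label in test]
    pred = [model.predict(text) for text, _ in test]
    print(pred)  # → ['attack', 'none', 'attack']
    print(round(f1_score(gold, pred, positive="attack"), 4))  # → 0.6667
```

The third test comment is harmless, but with so little training data the toy model labels it `attack` (in part because "you" occurs only in the attack examples). That single false positive halves precision and pulls F1 down to about 0.67. Since F1 is bounded by 1, a value such as 81.58 in the abstract is on a 0-100 scale.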

Citations

Proceedings Article

Detection of Abusive Language: the Problem of Biased Datasets

Michael Wiegand, +2 more

TL;DR: It is shown that classification scores on popular datasets reported in previous work are much lower under realistic settings in which dataset bias is reduced, most notably on datasets that are created by focused sampling instead of random sampling.

Proceedings Article

From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles

Peter Bourgonje, +2 more

TL;DR: This work aims to contribute to the debate on how to deal with fake news and related online phenomena by technological means, by providing a way to separate related from unrelated headlines and to further classify the related headlines.

Journal Article

Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance

Sandip Modha, +4 more

15 Dec 2020

Expert Systems With Applications

TL;DR: Presents an approach to detect and visualize online aggression, a special case of hate speech, on social media, together with a user interface, implemented as a web browser plugin for Facebook and Twitter, that visualizes the aggressive comments posted on social media users' timelines.

Journal Article

Cyberbullying detection on social multimedia using soft computing techniques: a meta-analysis

Akshi Kumar, +1 more

01 Sep 2019

Multimedia Tools and Applications

TL;DR: This work is a systematic literature review that uses a meta-analytic approach to gather, explore, comprehend and analyze the research trends, gaps and prospects in the use of soft computing techniques for cyberbullying detection on social multimedia.

Proceedings Article

Weakly supervised cyberbullying detection using co-trained ensembles of embedding models

Elaheh Raisi, +1 more

TL;DR: The effectiveness of the approach is evaluated using post-hoc, crowdsourced annotation of Twitter, Ask.fm, and Instagram data, finding that the deep ensembles outperform previous non-deep methods for weakly supervised harassment detection.

References

Proceedings Article

An empirical comparison of supervised learning algorithms

Rich Caruana, +1 more

TL;DR: A large-scale empirical comparison between ten supervised learning methods is presented: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps.

Book

Language and the Internet

David Crystal

TL;DR: Covering a range of Internet genres, including e-mail, chat, and the Web, this is a revealing account of how the Internet is radically changing the way people use language.

Proceedings Article

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

Zeerak Waseem, +1 more

TL;DR: A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets; a dictionary based on the most indicative words in the data is also presented.

Proceedings Article

A Survey on Hate Speech Detection using Natural Language Processing

Anna Schmidt, +1 more

TL;DR: This survey on hate speech detection describes key areas that have been explored to automatically recognize hateful utterances using natural language processing, and discusses the limits of those approaches.

Proceedings Article

Abusive Language Detection in Online User Content

Chikashi Nobata, +4 more

TL;DR: Presents a machine learning based method for detecting hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, along with a corpus of user comments annotated for abusive language, the first of its kind.

Related Papers (5)

Abusive Language Detection in Online User Content

Detecting Hate Speech on the World Wide Web

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

Ex Machina: Personal Attacks Seen at Scale

Hate Speech Detection with Comment Embeddings