Open Access Book Chapter

Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

TLDR
This contribution evaluates a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German) and focuses on classifying the data according to the annotated characteristics using several text classification algorithms.
Abstract
The sheer ease with which abusive and hateful utterances can be made online using today’s digital communication technologies (especially social media), typically from the comfort of one’s home and without any immediate negative repercussions, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The data sets we work on were annotated for aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.
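As a reminder of what the reported metric measures, the following is a minimal sketch of how an F-score (here the balanced F1, the harmonic mean of precision and recall) can be computed for one label. The function name, the label names and the toy data are hypothetical and chosen only for illustration; they are not taken from the chapter, which may use a different averaging scheme across labels.

```python
def f_score(gold, predicted, positive_label):
    """Balanced F-score (F1) for a single label, returned on a 0-1 scale."""
    pairs = list(zip(gold, predicted))
    # True positives: items correctly assigned the label.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: items wrongly assigned the label.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: items that carry the label but were missed.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, not data from the chapter.
gold = ["attack", "none", "attack", "none", "attack"]
predicted = ["attack", "attack", "none", "none", "attack"]

score = f_score(gold, predicted, "attack")
print(f"F-score: {100 * score:.2f}")
```

Multiplying by 100 puts the value on the same scale as the figure quoted in the abstract, assuming that figure is a percentage.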



Citations
Proceedings Article

Detection of Abusive Language: the Problem of Biased Datasets

TL;DR: It is shown that classification scores on popular datasets reported in previous work are much lower under realistic settings in which this bias is reduced, most notably on datasets that are created by focused sampling instead of random sampling.
Proceedings Article

From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles

TL;DR: This work wants to contribute to the debate on how to deal with fake news and related online phenomena with technological means, by providing means to separate related from unrelated headlines and further classifying the related headlines.
Journal Article

Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance

TL;DR: An approach to detect and visualize online aggression, a special case of hate speech, on social media is presented, along with a user interface based on a web browser plugin for Facebook and Twitter that visualizes the aggressive comments posted on the social media user’s timelines.
Journal Article

Cyberbullying detection on social multimedia using soft computing techniques: a meta-analysis

TL;DR: This work is a systematic literature review to gather, explore, comprehend and analyze the research trends, gaps and prospects of this alliance of using soft computing techniques for cyberbullying detection on social multimedia using a meta-analytic approach.
Proceedings Article

Weakly supervised cyberbullying detection using co-trained ensembles of embedding models

TL;DR: The effectiveness of the approach is evaluated using post-hoc, crowdsourced annotation of Twitter, Ask.fm, and Instagram data, finding that the deep ensembles outperform previous non-deep methods for weakly supervised harassment detection.
References
Proceedings Article

An empirical comparison of supervised learning algorithms

TL;DR: A large-scale empirical comparison between ten supervised learning methods is presented: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps.
Book

Language and the Internet

TL;DR: Covering a range of Internet genres, including e-mail, chat, and the Web, this is a revealing account of how the Internet is radically changing the way the authors use language.
Proceedings Article

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

TL;DR: A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.
Proceedings Article

A Survey on Hate Speech Detection using Natural Language Processing

TL;DR: A survey on hate speech detection describes key areas that have been explored to automatically recognize these types of utterances using natural language processing and discusses limits of those approaches.
Proceedings Article

Abusive Language Detection in Online User Content

TL;DR: A machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, is presented together with a corpus of user comments annotated for abusive language, the first of its kind.