Open Access · Posted Content

Tackling Online Abuse: A Survey of Automated Abuse Detection Methods

TL;DR
A comprehensive survey of the methods that have been proposed to date for automated abuse detection in the field of natural language processing (NLP), providing a platform for further development of this area.
Abstract
Abuse on the Internet represents an important societal problem of our time. Millions of Internet users face harassment, racism, personal attacks, and other types of abuse on online platforms. The psychological effects of such abuse on individuals can be profound and lasting. Consequently, over the past few years, there has been a substantial research effort towards automated abuse detection in the field of natural language processing (NLP). In this paper, we present a comprehensive survey of the methods that have been proposed to date, thus providing a platform for further development of this area. We describe the existing datasets and review the computational approaches to abuse detection, analyzing their strengths and limitations. We discuss the main trends that emerge, highlight the challenges that remain, outline possible solutions, and propose guidelines for ethics and explainability.


Citations
Book Chapter (DOI)

Overview of GermEval Task 2, 2019 shared task on the identification of offensive language

TL;DR: The second edition of the GermEval Shared Task on the Identification of Offensive Language deals with the classification of German tweets from Twitter, and introduces the classification of offensive tweets as explicit or implicit as a novel subtask.
Proceedings Article (DOI)

HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language.

TL;DR: The results indicate that the proposed models combining BERT with lexical features help improve over a baseline BERT model in many of the in-domain and cross-domain experiments.
Posted Content

AMUSED: An Annotation Framework of Multi-modal Social Media Data

TL;DR: The framework mitigates the difficulties of collecting and annotating social media data by cohesively combining machine and human steps in the data collection process, reducing the workload and problems involved in annotating data from social media platforms.
Proceedings Article (DOI)

Joint Modelling of Emotion and Abusive Language Detection

TL;DR: This paper presents the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other, and shows that incorporating affective features leads to significant improvements in abuse detection performance across datasets.
Proceedings Article (DOI)

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

TL;DR: This article presents a human-and-model-in-the-loop process for dynamically generating datasets and training better-performing, more robust hate detection models; the resulting dataset includes 15,000 challenging perturbations, and each hateful entry carries fine-grained labels for the type and target of hate.
References
Proceedings Article (DOI)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content

Semi-Supervised Classification with Graph Convolutional Networks

TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operate directly on graphs; it outperforms related methods by a significant margin.
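The layer-wise propagation rule at the core of this approach, as given in the GCN paper, can be written as:

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\right),
\qquad \tilde{A} = A + I_N
```

where $\tilde{A}$ is the adjacency matrix with added self-loops, $\tilde{D}$ its degree matrix, $H^{(l)}$ the node representations at layer $l$, $W^{(l)}$ a trainable weight matrix, and $\sigma$ a nonlinearity such as ReLU.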
Posted Content

Inductive Representation Learning on Large Graphs

TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
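A minimal sketch of the inductive idea behind GraphSAGE with a mean aggregator: each node's embedding is built from its own features plus an aggregate of its neighbors' features, so embeddings can be computed for nodes unseen at training time. The function names and the toy graph below are illustrative assumptions, not from the paper, and the trainable weights and nonlinearity are omitted.

```python
def mean_aggregate(node, features, neighbors):
    """Average the feature vectors of a node's neighbors."""
    neigh = [features[n] for n in neighbors[node]]
    dim = len(features[node])
    return [sum(v[i] for v in neigh) / len(neigh) for i in range(dim)]

def sage_layer(features, neighbors):
    """One GraphSAGE-style layer: concatenate each node's own features
    with the mean of its neighbors' features (weights/activation omitted)."""
    return {
        node: features[node] + mean_aggregate(node, features, neighbors)
        for node in features
    }

# Toy graph: 0 - 1 - 2, each node with a 2-dimensional feature vector.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
embeddings = sage_layer(features, neighbors)
```

Because the layer depends only on local features and neighborhoods (not on a fixed embedding table), the same learned aggregator generalizes to previously unseen nodes.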
Journal Article (DOI)

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams, with the word vector computed as the sum of these n-gram representations; this allows models to be trained quickly on large corpora and word representations to be computed for words that did not appear in the training data.
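The subword idea described above can be sketched in a few lines: extract character n-grams with boundary markers, then sum their vectors to form a word vector. The n-gram extraction with `<` and `>` markers follows the paper; the `word_vector` helper and its lookup table are illustrative assumptions (real fastText hashes n-grams into a fixed-size table and trains the vectors).

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with boundary markers as in the paper."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of the word's n-grams; n-grams missing from the
    table contribute nothing. Out-of-vocabulary words still get a vector
    as long as some of their n-grams were seen in training."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        for i, x in enumerate(ngram_vectors.get(g, [0.0] * dim)):
            vec[i] += x
    return vec
```

For example, the 3-grams of "where" are `<wh`, `whe`, `her`, `ere`, `re>`, matching the example given in the paper.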
Proceedings Article

Distributed Representations of Sentences and Documents

TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents; its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.