Open Access · Posted Content

Tackling Online Abuse: A Survey of Automated Abuse Detection Methods

TL;DR
A comprehensive survey of the methods that have been proposed to date for automated abuse detection in the field of natural language processing (NLP), providing a platform for further development of this area.
Abstract
Abuse on the Internet represents an important societal problem of our time. Millions of Internet users face harassment, racism, personal attacks, and other types of abuse on online platforms. The psychological effects of such abuse on individuals can be profound and lasting. Consequently, over the past few years, there has been a substantial research effort towards automated abuse detection in the field of natural language processing (NLP). In this paper, we present a comprehensive survey of the methods that have been proposed to date, thus providing a platform for further development of this area. We describe the existing datasets and review the computational approaches to abuse detection, analyzing their strengths and limitations. We discuss the main trends that emerge, highlight the challenges that remain, outline possible solutions, and propose guidelines for ethics and explainability.


Citations
Book Chapter (DOI)

Overview of GermEval Task 2, 2019 shared task on the identification of offensive language

TL;DR: The second edition of the GermEval Shared Task on the Identification of Offensive Language deals with the classification of German tweets from Twitter, and introduces the classification of offensive tweets as explicit or implicit as a novel subtask.
Proceedings Article (DOI)

HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language.

TL;DR: The results indicate that the proposed models combining BERT with lexical features help improve over a baseline BERT model in many of the in-domain and cross-domain experiments.
Posted Content

AMUSED: An Annotation Framework of Multi-modal Social Media Data

TL;DR: The framework mitigates the difficulties of collecting and annotating social media data by cohesively combining machine and human steps in the data collection process, reducing the workload and problems involved in annotating data from social media platforms.
Proceedings Article (DOI)

Joint Modelling of Emotion and Abusive Language Detection

TL;DR: This paper presents the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other, and shows that incorporating affective features leads to significant improvements in abuse detection performance across datasets.
Proceedings Article (DOI)

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

TL;DR: This article presents a human-and-model-in-the-loop process for dynamically generating datasets and training better-performing, more robust hate detection models; the resulting dataset includes 15,000 challenging perturbations, and each hateful entry carries fine-grained labels for the type and target of hate.
References
Proceedings Article (DOI)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content

Semi-Supervised Classification with Graph Convolutional Networks

TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operate directly on graphs; it outperforms related methods by a significant margin.
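The layer-wise propagation rule at the core of this approach, as given in the GCN paper, can be written as:

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\right),
\qquad \tilde{A} = A + I_N
```

where $\tilde{A}$ is the adjacency matrix with added self-loops, $\tilde{D}$ its degree matrix, $H^{(l)}$ the node representations at layer $l$, $W^{(l)}$ a trainable weight matrix, and $\sigma$ a nonlinearity such as ReLU.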
Posted Content

Inductive Representation Learning on Large Graphs

TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
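A minimal sketch of the inductive idea behind GraphSAGE with a mean aggregator: each node's embedding is built from its own features plus an aggregate of its neighbors' features, so embeddings can be computed for nodes unseen at training time. The function names and the toy graph below are illustrative assumptions, not from the paper, and the trainable weights and nonlinearity are omitted.

```python
def mean_aggregate(node, features, neighbors):
    """Average the feature vectors of a node's neighbors."""
    neigh = [features[n] for n in neighbors[node]]
    dim = len(features[node])
    return [sum(v[i] for v in neigh) / len(neigh) for i in range(dim)]

def sage_layer(features, neighbors):
    """One GraphSAGE-style layer: concatenate each node's own features
    with the mean of its neighbors' features (weights/activation omitted)."""
    return {
        node: features[node] + mean_aggregate(node, features, neighbors)
        for node in features
    }

# Toy graph: 0 - 1 - 2, each node with a 2-dimensional feature vector.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
embeddings = sage_layer(features, neighbors)
```

Because the layer depends only on local features and neighborhoods (not on a fixed embedding table), the same learned aggregator generalizes to previously unseen nodes.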
Journal Article (DOI)

Enriching Word Vectors with Subword Information

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams, with the word vector computed as the sum of these n-gram representations; this allows models to be trained quickly on large corpora and word representations to be computed for words that did not appear in the training data.
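The subword idea described above can be sketched in a few lines: extract character n-grams with boundary markers, then sum their vectors to form a word vector. The n-gram extraction with `<` and `>` markers follows the paper; the `word_vector` helper and its lookup table are illustrative assumptions (real fastText hashes n-grams into a fixed-size table and trains the vectors).

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with boundary markers as in the paper."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of the word's n-grams; n-grams missing from the
    table contribute nothing. Out-of-vocabulary words still get a vector
    as long as some of their n-grams were seen in training."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        for i, x in enumerate(ngram_vectors.get(g, [0.0] * dim)):
            vec[i] += x
    return vec
```

For example, the 3-grams of "where" are `<wh`, `whe`, `her`, `ere`, `re>`, matching the example given in the paper.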
Proceedings Article

Distributed Representations of Sentences and Documents

TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents; its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.