Open Access · Posted Content
Tackling Online Abuse: A Survey of Automated Abuse Detection Methods
TL;DR: A comprehensive survey of the methods that have been proposed to date for automated abuse detection in the field of natural language processing (NLP), providing a platform for further development of this area.
Abstract: Abuse on the Internet represents an important societal problem of our time. Millions of Internet users face harassment, racism, personal attacks, and other types of abuse on online platforms. The psychological effects of such abuse on individuals can be profound and lasting. Consequently, over the past few years, there has been a substantial research effort towards automated abuse detection in the field of natural language processing (NLP). In this paper, we present a comprehensive survey of the methods that have been proposed to date, thus providing a platform for further development of this area. We describe the existing datasets and review the computational approaches to abuse detection, analyzing their strengths and limitations. We discuss the main trends that emerge, highlight the challenges that remain, outline possible solutions, and propose guidelines for ethics and explainability.
Citations
Book Chapter
Overview of GermEval Task 2, 2019 shared task on the identification of offensive language
TL;DR: The second edition of the GermEval Shared Task on the Identification of Offensive Language deals with the classification of German tweets from Twitter, and introduces the classification of offensive tweets as explicit or implicit as a novel subtask.
Proceedings Article
HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language.
TL;DR: The results indicate that the proposed models combining BERT with lexical features help improve over a baseline BERT model in many of the in-domain and cross-domain experiments.
Posted Content
AMUSED: An Annotation Framework of Multi-modal Social Media Data
TL;DR: The framework mitigates the difficulties of collecting and annotating social media data by cohesively combining machine and human steps in the data collection process, reducing the workload and problems involved in annotating data from social media platforms.
Proceedings Article
Joint Modelling of Emotion and Abusive Language Detection
TL;DR: The authors presented the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other, and showed that incorporating affective features leads to significant improvements in abuse detection performance across datasets.
Proceedings Article
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
TL;DR: A human-and-model-in-the-loop process for dynamically generating datasets and training better-performing, more robust hate detection models is presented; the resulting dataset includes 15,000 challenging perturbations, and each hateful entry has fine-grained labels for the type and target of hate.
References
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf, Max Welling
TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operate directly on graphs; it outperforms related methods by a significant margin.
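The layer-wise propagation rule behind this approach can be sketched in a few lines. Below is an illustrative NumPy implementation of a single graph-convolutional layer under the symmetric normalization described in the summary; function and variable names are mine, not from the paper's code.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: (n, n) adjacency matrix without self-loops; features: (n, f_in);
    weights: (f_in, f_out). An illustrative sketch, not the authors' code.
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt             # symmetric normalization
    return np.maximum(0.0, norm @ features @ weights)  # ReLU activation

# Tiny 3-node path graph, 2-d features, identity weights.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
out = gcn_layer(adj, np.eye(3, 2), np.eye(2))
```

Stacking two such layers (with learned `weights`) and a softmax output reproduces the semi-supervised node-classification setup sketched in the paper.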
Posted Content
Inductive Representation Learning on Large Graphs
TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
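The inductive idea above can be illustrated with a minimal sketch of GraphSAGE-style mean aggregation for one node: sample a fixed number of neighbors, average their feature vectors, and concatenate the result with the node's own features. No learned weights or nonlinearity are included, and all names are illustrative rather than taken from the paper's implementation.

```python
import random

def sage_mean_step(node, neighbors, feats, num_samples=2, rng=random.Random(0)):
    """Simplified GraphSAGE mean aggregation for a single node.

    neighbors: dict node -> list of neighbor ids (assumed non-empty);
    feats: dict node -> feature list. Returns [self || mean(sampled neighbors)].
    """
    sampled = rng.sample(neighbors[node], min(num_samples, len(neighbors[node])))
    dim = len(feats[node])
    mean = [sum(feats[nb][i] for nb in sampled) / len(sampled) for i in range(dim)]
    return feats[node] + mean  # list concatenation

# Node 0 with two neighbors that share the same feature vector.
neighbors = {0: [1, 2]}
feats = {0: [0.5, 0.5], 1: [1.0, 0.0], 2: [1.0, 0.0]}
rep = sage_mean_step(0, neighbors, feats)
```

Because aggregation depends only on sampled neighborhood features, the same step applies unchanged to nodes unseen at training time, which is the source of the framework's inductive capability.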
Journal Article
Enriching Word Vectors with Subword Information
TL;DR: A new approach based on the skip-gram model in which each word is represented as a bag of character n-grams, a word's representation being the sum of its n-gram representations; this allows models to be trained quickly on large corpora and word representations to be computed for words that did not appear in the training data.
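The bag-of-character-n-grams idea can be shown directly. The sketch below extracts the n-grams for one word; the boundary markers `<` and `>` follow the paper's convention, while the function name and defaults are mine.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, plus the whole marked word.

    A word vector is then the sum of the vectors of these n-grams, so an
    out-of-vocabulary word still gets a representation from shared subwords.
    """
    marked = f"<{word}>"
    grams = {marked}  # the full word (with boundaries) is kept as a feature
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams

grams = char_ngrams("where")
```

For "where", the trigrams include `<wh`, `whe`, `her`, `ere`, and `re>`; note that the boundary markers keep the subword `her` of "where" distinct from the standalone word `<her>`.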
Proceedings Article
Distributed Representations of Sentences and Documents
Quoc V. Le, Tomas Mikolov
TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Related Papers (5)
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
Zeerak Waseem, Dirk Hovy
A Survey on Hate Speech Detection using Natural Language Processing
Anna Schmidt, Michael Wiegand