Open Access · Posted Content

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

TL;DR
The authors propose a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes, where difficult examples are added to the dataset to make it hard to rely on unimodal signals.
Abstract
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
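Because the task is framed as plain binary classification, scoring any candidate model is straightforward. Below is a minimal sketch of such an evaluation loop; the `predict_proba` callable and the JSONL field names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the evaluation protocol: the task is binary
# (hateful vs. not), so any model emitting a probability per meme
# can be scored the same way. File layout and predict_proba are
# hypothetical placeholders.
import json

from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(jsonl_path, predict_proba):
    """Score a model on a Hateful-Memes-style JSONL split.

    Each line is assumed to hold {"img": ..., "text": ..., "label": 0 or 1};
    predict_proba(img_path, text) returns P(hateful).
    """
    labels, probs = [], []
    with open(jsonl_path) as f:
        for line in f:
            ex = json.loads(line)
            labels.append(ex["label"])
            probs.append(predict_proba(ex["img"], ex["text"]))
    preds = [int(p >= 0.5) for p in probs]
    return {
        "accuracy": accuracy_score(labels, preds),
        "auroc": roc_auc_score(labels, probs),
    }
```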


Citations
Posted Content

Learning Transferable Visual Models From Natural Language Supervision

TL;DR: A pre-training task of predicting which caption goes with which image is used to learn state-of-the-art image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
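For readers unfamiliar with this pre-training setup, here is a rough sketch of the symmetric contrastive (InfoNCE) objective that pairs each image with its caption; the temperature value and embedding shapes are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a CLIP-style contrastive objective: in a batch of N
# (image, text) pairs, row i of each embedding matrix comes from
# the same pair, so the matching caption sits on the diagonal.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (N, d) tensors from separate encoders."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)            # image -> correct caption
    loss_t = F.cross_entropy(logits.t(), targets)        # caption -> correct image
    return (loss_i + loss_t) / 2
```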
Proceedings ArticleDOI

Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages

TL;DR: The HASOC track aims to stimulate the development of hate speech detection for Hindi, German, and English, identifying hate speech in social media using LSTM networks that process word-embedding input.
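As a concrete illustration of the kind of model this summary describes, here is a minimal LSTM-over-word-embeddings binary classifier; the vocabulary size, dimensions, and single-logit head are assumptions for illustration, not the HASOC systems themselves.

```python
# Minimal LSTM text classifier over word embeddings; the final
# hidden state feeds a single logit for hateful / not hateful.
import torch
import torch.nn as nn

class LSTMHateSpeechClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)  # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)  # (batch,) logits
```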
Journal ArticleDOI

Directions in abusive language training data, a systematic review: Garbage in, garbage out.

TL;DR: This paper systematically reviews the creation and content of abusive language training datasets, in conjunction with an open website for cataloguing abusive language data, and synthesizes evidence-based recommendations for practitioners working with this complex and highly diverse data.
Proceedings ArticleDOI

TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

TL;DR: TextOCR is a dataset for arbitrary-shaped scene text detection and recognition, with 900k words annotated on real images from the TextVQA dataset, enabling end-to-end scene-text-based reasoning on an image.
Posted Content

Tackling Online Abuse: A Survey of Automated Abuse Detection Methods

TL;DR: A comprehensive survey of the methods that have been proposed to date for automated abuse detection in the field of natural language processing (NLP), providing a platform for further development of this area.
References
Proceedings Article

Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text

TL;DR: The authors develop a variety of features that capture the creativity of images and the specificity or ambiguity of text, as well as methods that analyze the semantics within and across channels.
Proceedings ArticleDOI

An empirical study on the effectiveness of images in Multimodal Neural Machine Translation

TL;DR: This paper uses an attention mechanism to focus on different parts of the source sentence and gather the most useful information before outputting each target word, achieving state-of-the-art results on the Multi30k dataset.
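To make the attention step concrete, here is a compact sketch in which the decoder state scores every encoder state and takes a weighted average before the next target word is emitted; dot-product scoring is one common choice and may differ from the paper's exact formulation.

```python
# Encoder-decoder attention over source positions: score each
# encoder state against the current decoder state, softmax the
# scores, and return the weighted context vector.
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    """decoder_state: (batch, d); encoder_states: (batch, src_len, d)."""
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)
    weights = F.softmax(scores, dim=-1)    # attention over source positions
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights                # context feeds the next prediction
```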
Proceedings Article

Grounded Textual Entailment

TL;DR: The authors compare blind and visually augmented models of textual entailment and show that visual information is beneficial, but an in-depth error analysis reveals that current multimodal models do not perform "grounding" in an optimal fashion.
Posted Content

Grounded Textual Entailment

TL;DR: This paper argues for a visually grounded version of the textual entailment task, asking whether models can perform better if, in addition to the premise (P) and hypothesis (H), they are also given an image corresponding to the relevant "world" or "situation".
Posted Content

Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text

TL;DR: In this paper, a dataset of advertisement interpretations is collected with the help of workers recruited on Amazon Mechanical Turk, labeling whether the image and slogan in the same visual advertisement form a parallel relationship (conveying the same message without literally saying the same thing) or a non-parallel one.