Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

doi:10.18653/V1/2021.WOAH-1.4

Open access proceedings article

Hannah Rose Kirk and nine co-authors

pp. 26-35

TLDR

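The authors collected hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models trained on the Facebook Hateful Memes dataset. They found that "memes in the wild" differ from the benchmark: their captions must be extracted with OCR, which adds noise, and they are more diverse than traditional memes.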

Abstract:

Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both the text and visual modalities. To address this, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to "memes in the wild". In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset. We find that "memes in the wild" differ in two key aspects: 1) captions must be extracted via OCR, which injects noise and diminishes the performance of multimodal models, and 2) memes are more diverse than "traditional memes", including screenshots of conversations or text on a plain background. This paper thus serves as a reality check for the current benchmark of hateful meme detection and its applicability to detecting real-world hate.
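The first finding concerns the noise that OCR injects into captions. One common way to quantify such noise is word error rate (WER): the word-level edit distance between a reference caption and the OCR output, divided by the number of reference words. The following is a minimal sketch under that assumption only; the abstract does not say which metric, if any, the authors used to measure OCR noise, and the function name `word_error_rate` and the example captions are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: compares a ground-truth caption
    (reference) with an OCR-extracted caption (hypothesis).
    """
    # Fold case and split on whitespace so capitalization alone is not an error.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference caption must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: word missing from OCR output
                curr[j - 1] + 1,     # insertion: spurious word in OCR output
                prev[j - 1] + cost,  # substitution (or exact match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy example: OCR misreads two of six words.
    print(word_error_rate("when you realize it is monday",
                          "wh3n you realise it is monday"))
```

Here two substitutions over six reference words give a WER of 2/6. Because case is folded before comparison, capitalization differences are not counted as errors; a stricter evaluation could drop the `.lower()` calls.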
