Open Access, Proceedings Article

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

TLDR
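The authors collected hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models pre-trained on the Facebook dataset, and found that ‘memes in the wild’ are more diverse than ‘traditional memes’ and that their captions must be extracted via OCR, which injects noise and diminishes the performance of multimodal models.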
Abstract
Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text and visual modalities. To this end, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to ‘memes in the wild’. In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models pre-trained on the Facebook dataset. We find that ‘memes in the wild’ differ in two key aspects: 1) captions must be extracted via OCR, injecting noise and diminishing the performance of multimodal models, and 2) memes are more diverse than ‘traditional memes’, including screenshots of conversations or text on a plain background. This paper thus serves as a reality check for the current benchmark of hateful meme detection and its applicability for detecting real-world hate.


Citations
Book Chapter

IMKG: The Internet Meme Knowledge Graph

TL;DR: The Internet Meme Knowledge Graph (IMKG) as discussed by the authors is an explicit representation with 2 million edges that capture the semantics encoded in the text, vision, and metadata of thousands of media frames and their adaptations as memes.
Proceedings Article

Multi-channel Convolutional Neural Network for Precise Meme Classification

TL;DR: This article proposed a multi-channel convolutional neural network (MC-CNN) for classifying memes and non-memes, which is trained and validated on a challenging dataset with textual attributes, which are also circulated online but rarely accounted for in meme classification tasks.
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Proceedings Article

FaceNet: A unified embedding for face recognition and clustering

TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

TL;DR: It is shown that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men, in commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition.
Proceedings Article

An Overview of the Tesseract OCR Engine

TL;DR: The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview.
Proceedings Article

Automated Hate Speech Detection and the Problem of Offensive Language

TL;DR: This work used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labeled a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.