Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano +9 more
- pp 26-35
TLDR
The authors collect hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models pre-trained on the Facebook dataset, and find that memes in the wild are more diverse than traditional memes.
Abstract:
Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text and visual modalities. To this effect, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to ‘memes in the wild’. In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset. We find that ‘memes in the wild’ differ in two key aspects: 1) Captions must be extracted via OCR, injecting noise and diminishing performance of multimodal models, and 2) Memes are more diverse than ‘traditional memes’, including screenshots of conversations or text on a plain background. This paper thus serves as a reality-check for the current benchmark of hateful meme detection and its applicability for detecting real-world hate.
Citations
Book Chapter
IMKG: The Internet Meme Knowledge Graph
TL;DR: The Internet Meme Knowledge Graph (IMKG) is an explicit representation with 2 million edges that capture the semantics encoded in the text, vision, and metadata of thousands of media frames and their adaptations as memes.
Proceedings Article
Multi-channel Convolutional Neural Network for Precise Meme Classification
TL;DR: This article proposed a multi-channel convolutional neural network (MC-CNN) for classifying memes and non-memes, which is trained and validated on a challenging dataset with textual attributes, which are also circulated online but rarely accounted for in meme classification tasks.
References
Journal Article
Visualizing Data using t-SNE
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Proceedings Article
FaceNet: A unified embedding for face recognition and clustering
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
Joy Buolamwini, Timnit Gebru +1 more
TL;DR: It is shown that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men, in commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition.
Proceedings Article
An Overview of the Tesseract OCR Engine
TL;DR: A comprehensive overview of the Tesseract OCR engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy.
Proceedings Article
Automated Hate Speech Detection and the Problem of Offensive Language
TL;DR: This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.