Open Access · Posted Content
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine
TL;DR: The authors propose a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes, where difficult examples are added to the dataset to make it hard to rely on unimodal signals.
Abstract:
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
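The effect of benign confounders can be sketched with a hypothetical toy example (all strings, image tags, and both "models" below are invented for illustration and are not from the dataset): the same text appears with two different images and opposite labels, so a text-only classifier is capped at 50% accuracy on these pairs, while a classifier that also sees the image is not.

```python
# Toy illustration of "benign confounders": the same text appears with
# different images and opposite labels, so a text-only model cannot
# separate the pair. All examples here are hypothetical.

examples = [
    # (text, image_tag, label)  where label 1 = hateful, 0 = benign
    ("look how many people love you", "desert", 1),
    ("look how many people love you", "crowd", 0),   # benign confounder
    ("smells great today", "skunk", 1),
    ("smells great today", "roses", 0),              # benign confounder
]

def text_only(text):
    # A unimodal model must give one answer per text string, so it can
    # get at most one of each confounder pair right.
    return 1

def multimodal(text, image_tag):
    # A multimodal model can condition on the image as well.
    negative_images = {"desert", "skunk"}
    return 1 if image_tag in negative_images else 0

def accuracy(predict, use_image):
    correct = sum(
        (predict(t, img) if use_image else predict(t)) == y
        for t, img, y in examples
    )
    return correct / len(examples)

print(accuracy(text_only, use_image=False))   # 0.5
print(accuracy(multimodal, use_image=True))   # 1.0
```

On each confounder pair, any function of the text alone must output the same label twice, so unimodal accuracy is capped at 50% there; this is the construction that forces models to use both modalities.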
Citations
Posted Content
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge.
Riza Velioglu, Jewgeni Rose
TL;DR: This article used VisualBERT to detect hate speech in multimodal memes and achieved an accuracy of 0.765 on the challenge test set and placed third out of 3,173 participants.
Journal ArticleDOI
AOMD: An analogy-aware approach to offensive meme detection on social media
TL;DR: Zhang et al. developed a deep-learning-based Analogy-aware Offensive Meme Detection (AOMD) framework to learn the implicit analogy from the multi-modal contents of a meme and effectively detect offensive analogy memes.
Journal ArticleDOI
Combating the hate speech in Thai textual memes
TL;DR: Thai textual meme detection is introduced as a new research problem in Thai natural language processing (Thai-NLP), linking scene text localization, Thai optical character recognition (Thai-OCR), and language understanding.
Posted Content
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
TL;DR: This paper proposed mask-and-predict pre-training on text-only and image-only corpora, and introduced object tags detected by an object recognition model as anchor points to bridge the two modalities.
Posted Content
A Survey on Multimodal Disinformation Detection
Firoj Alam, Stefano Cresci, Tanmoy Chakraborty, Fabrizio Silvestri, Dimiter Dimitrov, Giovanni Da San Martino, Shaden Shaar, Hamed Firooz, Preslav Nakov
TL;DR: This survey covers the state-of-the-art in multimodal disinformation detection across various combinations of modalities: text, images, audio, video, network structure, and temporal information.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, winning 1st place in the ILSVRC 2015 classification task.
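The residual idea in the entry above can be sketched in a toy one-dimensional form (an illustration only, not the paper's convolutional architecture): each block adds a learned residual F(x) to an identity shortcut, so a zero-initialized deep stack starts out as exactly the identity function, which is what makes very deep networks easier to optimize.

```python
# Toy 1-D sketch of residual learning: each block computes x + F(x).
# With zero residual weights, even a 50-block stack is the identity.

def residual_block(x, weight):
    f = weight * x   # stand-in for the learned residual mapping F(x)
    return x + f     # identity shortcut connection

def deep_stack(x, weights):
    for w in weights:
        x = residual_block(x, w)
    return x

print(deep_stack(3.0, [0.0] * 50))  # zero residuals -> identity: 3.0
```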
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
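The adaptive estimates of lower-order moments described above can be sketched as a single-parameter Adam loop. The β and ε values below follow the paper's defaults; the learning rate, step count, and the quadratic objective are illustrative choices, not from the paper.

```python
import math

# Minimal single-parameter Adam sketch (paper defaults for beta1,
# beta2, eps; lr, steps, and the toy objective are illustrative).
def adam_minimize(grad, x, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0  # first- and second-moment estimates
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # biased first moment
        v = beta2 * v + (1 - beta2) * g * g   # biased second moment
        m_hat = m / (1 - beta1 ** t)          # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimizing f(x) = x^2, whose gradient is 2x; Adam drives x toward 0.
x_min = adam_minimize(lambda x: 2 * x, x=5.0)
print(x_min)
```

Because the update is scaled by the square root of the second-moment estimate, the effective step size is roughly bounded by the learning rate regardless of the gradient's magnitude, which is the property the TL;DR's "adaptive estimates of lower-order moments" refers to.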
Book ChapterDOI
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: This work introduces a new dataset aimed at advancing the state-of-the-art in object recognition by placing it in the broader context of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings ArticleDOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.