Open Access · Proceedings Article

ArtEmis: Affective Language for Visual Art

TL;DR: ArtEmis is a large-scale dataset and accompanying set of machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
Abstract
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 455K emotion attributions and explanations from humans, on 80K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at https://artemisdataset.org.
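Below is a minimal sketch of how one might load and inspect annotations of this kind; the file name and column layout (one row per annotator response, with fields like painting, emotion, and utterance) are assumptions based on the description above, not a confirmed schema.

```python
# A minimal sketch of loading ArtEmis-style annotations.
# The path and column names below are assumptions for illustration,
# not the dataset's confirmed schema.
import pandas as pd

df = pd.read_csv("artemis_annotations.csv")  # hypothetical local file

# Each row pairs one artwork with one annotator's dominant emotion
# and a free-form verbal explanation grounded in the image.
print(df[["art_style", "painting", "emotion", "utterance"]].head())

# Distribution of dominant emotions across all annotations.
print(df["emotion"].value_counts(normalize=True))
```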



Citations
Journal Article

Multimodal Learning with Transformers: A Survey

TL;DR: Presents a comprehensive survey of Transformer techniques oriented at multimodal data, together with a discussion of open problems and potential research directions for the community.
Book Chapter

Language-Driven Artistic Style Transfer

TL;DR: Proposes Contrastive Language Visual Artist (CLVA), which learns to extract visual semantics from style instructions and accomplishes language-driven artistic style transfer (LDAST) via a patch-wise style discriminator that considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
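To make the patch-wise idea concrete, here is a generic patch-level discriminator in the PatchGAN spirit: it emits one score per local patch rather than one per image, so style consistency is judged locally. This is an illustrative stand-in under that assumption, not the authors' actual CLVA architecture.

```python
# A generic patch-wise discriminator: convolutions with limited
# receptive field produce a grid of per-patch scores.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, 1, H', W'): a score per local patch

disc = PatchDiscriminator()
scores = disc(torch.randn(1, 3, 256, 256))
print(scores.shape)  # torch.Size([1, 1, 63, 63])
```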
Journal Article

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

TL;DR: ChatCaptioner is an automatic questioning method deployed in image captioning, in which a large language model such as ChatGPT is prompted to ask a series of informative questions about an image to BLIP-2, a strong vision question-answering model.
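The questioning loop can be sketched as below; both model calls are hypothetical stubs standing in for the ChatGPT and BLIP-2 APIs, so this shows only the control flow described in the summary.

```python
# A schematic of the ask-answer loop: a questioner LLM proposes
# questions, a VQA model answers, and the accumulated dialogue would
# finally be summarized into a richer caption.
def ask_question(dialogue: list[str]) -> str:
    # Hypothetical stub: would prompt an LLM with the dialogue so far.
    return "What objects are in the image?"

def answer_question(image, question: str) -> str:
    # Hypothetical stub: would run a VQA model such as BLIP-2.
    return "a dog on a beach"

def chat_caption(image, n_rounds: int = 5) -> list[str]:
    dialogue: list[str] = []
    for _ in range(n_rounds):
        q = ask_question(dialogue)
        a = answer_question(image, q)
        dialogue += [f"Q: {q}", f"A: {a}"]
    return dialogue  # would be summarized by the LLM into a caption

print("\n".join(chat_caption(image=None)))
```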
Journal Article

Exploring CLIP for Assessing the Look and Feel of Images

TL;DR: Goes beyond conventional paradigms by exploring the rich visual-language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models to assess both the quality perception and the abstract perception of images in a zero-shot manner, exploiting these priors to bypass the quality labeling process.
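A simplified sketch of the zero-shot idea: compare an image against an antonym prompt pair and softmax the CLIP image-text similarities into a score. The paper's actual prompt and model design is more elaborate; the prompts and file name below are assumptions for illustration.

```python
# Zero-shot perception scoring with CLIP via an antonym prompt pair.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local file
inputs = processor(text=["Good photo.", "Bad photo."],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, 2) image-text similarities
quality = logits.softmax(dim=-1)[0, 0].item()  # probability of "Good photo."
print(f"zero-shot quality score: {quality:.3f}")
```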
Journal Article

Data-efficient image captioning of fine art paintings via virtual-real semantic alignment training

TL;DR: Proposes a virtual-real semantic alignment training process to address the challenges of painting captioning, achieving significant improvements and higher data efficiency than the baselines in two data-hungry scenarios on all datasets.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: Proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting residual nets won 1st place on the ILSVRC 2015 classification task.
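The core idea is the identity shortcut: the block learns a residual F(x) and adds the input back, which eases optimization of very deep stacks. A minimal basic-block sketch (not the full ResNet architecture):

```python
# A minimal residual block with an identity shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut: learn a residual F(x), output F(x) + x

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```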
Journal Article

Long Short-Term Memory

TL;DR: Introduces a novel, efficient, gradient-based method called long short-term memory (LSTM), which can learn to bridge minimal time lags in excess of 1,000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
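A minimal demonstration of an LSTM carrying state across a sequence far longer than 1,000 steps, using PyTorch's stock implementation (dimensions are illustrative):

```python
# An LSTM run over a 1,200-step sequence; the cell state c_n is the
# internal memory that lets gradients survive long time lags.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(1, 1200, 32)     # a sequence of 1,200 time steps
out, (h_n, c_n) = lstm(x)        # c_n: final cell (memory) state
print(out.shape, c_n.shape)      # torch.Size([1, 1200, 64]) torch.Size([1, 1, 64])
```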
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: Introduces a new database called "ImageNet," a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than then-current image datasets.
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
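The "one additional output layer" recipe, sketched with the Hugging Face Transformers API; the label count and input text are illustrative:

```python
# Fine-tuning setup for BERT: the sequence-classification head is a
# single linear layer on top of the pre-trained encoder.
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 2 labels chosen for illustration

inputs = tokenizer("This painting makes me feel calm.", return_tensors="pt")
logits = model(**inputs).logits  # (1, 2), ready for standard fine-tuning
print(logits.shape)
```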
Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: Proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.
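A quick example of scoring a candidate sentence against a reference with NLTK's BLEU implementation; the sentences are illustrative, and smoothing is added because short sentences often have zero higher-order n-gram matches:

```python
# Sentence-level BLEU with smoothing, via NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "serene", "lake", "at", "dawn"]]  # list of reference token lists
candidate = ["a", "calm", "lake", "at", "dawn"]

score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```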