ArtEmis: Affective Language for Visual Art
Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas J. Guibas
- pp. 11569-11579
TLDR
ArtEmis is a large-scale dataset and accompanying set of machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
Abstract:
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 455K emotion attributions and explanations from humans, on 80K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at https://artemisdataset.org.
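As a concrete illustration of the annotations described in the abstract, each record pairs an artwork with an annotator's dominant emotion and a free-form explanation grounding that choice. The field names and values below are illustrative assumptions, not the dataset's released schema (see https://artemisdataset.org for the actual format):

```python
# Hypothetical sketch of one ArtEmis-style annotation record.
# Field names and values are illustrative assumptions, not the released schema.
annotation = {
    # One of the ~80K WikiArt artworks covered by the dataset.
    "artwork": "vincent-van-gogh_the-starry-night",
    # Dominant emotion the annotator reports feeling for the image.
    "emotion": "awe",
    # Grounded verbal explanation for the emotion choice.
    "explanation": "The swirling sky makes the town below feel small and dreamlike.",
}
```

With 455K such records over 80K artworks, each artwork carries several emotion/explanation pairs, which is what lets the paper study both the objective content and the subjective affective impact of an image.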
Citations
Journal Article
Multimodal Learning with Transformers: A Survey
TL;DR: A comprehensive survey of Transformer techniques oriented at multimodal data and a discussion of open problems and potential research directions for the community are presented.
Book Chapter
Language-Driven Artistic Style Transfer
TL;DR: This article proposes Contrastive Language Visual Artist (CLVA), which learns to extract visual semantics from style instructions and accomplishes LDAST via a patch-wise style discriminator that considers the correlation between language and patches of style images or transferred results to jointly embed style instructions.
Journal Article
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
TL;DR: ChatCaptioner is an automatic questioning method for image captioning in which a large language model, such as ChatGPT, is prompted to ask a series of informative questions about an image to BLIP-2, a strong vision question answering model.
Journal Article
Exploring CLIP for Assessing the Look and Feel of Images
TL;DR: This paper goes beyond conventional paradigms by exploring the rich visual-language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models to assess both the quality perception and the abstract perception of images in a zero-shot manner, bypassing the quality labeling process.
Journal Article
Data-efficient image captioning of fine art paintings via virtual-real semantic alignment training
TL;DR: Zhang et al. propose a virtual-real semantic alignment training process to address the challenges of painting captioning, achieving significant improvements and higher data efficiency than the baselines in two data-hungry scenarios on all datasets.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
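The residual idea summarized in this TL;DR can be sketched in a few lines: rather than learning a desired mapping H(x) directly, a block learns the residual F(x) = H(x) - x and adds the input back through an identity shortcut. Below is a minimal pure-Python sketch; the `double` transform is a toy stand-in for the paper's stacked weight layers, not its actual architecture:

```python
def residual_block(x, transform):
    """y = F(x) + x: the layers fit the residual F, the shortcut carries x."""
    return [xi + fi for xi, fi in zip(x, transform(x))]

def double(v):
    # Toy stand-in for a couple of weight layers (illustration only).
    return [2.0 * xi for xi in v]

print(residual_block([1.0, 2.0], double))  # -> [3.0, 6.0]
```

Because the shortcut is the identity, the block can represent the identity mapping by driving F toward zero, which is what makes very deep networks easier to optimize.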
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called "ImageNet" is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure that is much larger in scale and diversity, and much more accurate, than current image datasets.
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.
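The metric summarized above (BLEU) combines clipped n-gram precisions with a brevity penalty. The following is a simplified single-reference sketch in plain Python, not the authors' implementation; real evaluations typically use multiple references and smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Single-reference sentence BLEU with uniform n-gram weights."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hypothesis, n)
        ref_ngrams = ngrams(reference, n)
        # Clipped count: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * geo_mean
```

A hypothesis identical to the reference scores 1.0, while one sharing no n-grams scores 0.0; captioning papers such as ArtEmis commonly report BLEU alongside human judgments because of this cheap, language-independent behavior.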