Open Access, Proceedings Article

Understanding and Predicting Image Memorability at a Large Scale

TL;DR: This work builds LaMem, the largest annotated image memorability dataset to date, and uses Convolutional Neural Networks to demonstrate that one can now robustly estimate the memorability of images from many different classes, positioning memorability and deep memorability features as prime candidates to estimate the utility of information for cognitive systems.
Abstract
Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data. Here, we introduce a novel experimental procedure to objectively measure human memory, allowing us to build LaMem, the largest annotated image memorability dataset to date (containing 60,000 images from diverse sources). Using Convolutional Neural Networks (CNNs), we show that fine-tuned deep features outperform all other features by a large margin, reaching a rank correlation of 0.64, near human consistency (0.68). Analysis of the responses of the high-level CNN layers shows which objects and regions are positively, and negatively, correlated with memorability, allowing us to create memorability maps for each image and provide a concrete method to perform image memorability manipulation. This work demonstrates that one can now robustly estimate the memorability of images from many different classes, positioning memorability and deep memorability features as prime candidates to estimate the utility of information for cognitive systems. Our model and data are available at: http://memorability.csail.mit.edu.
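
The rank correlation figures quoted in the abstract (0.64 for the model, 0.68 for human consistency) are Spearman-style statistics: they compare the ordering of images by score rather than the raw score values. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The function names and the five example scores are hypothetical and were made up for illustration; they are not LaMem data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over all items tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five hypothetical images (illustrative only).
human_scores = [0.91, 0.62, 0.75, 0.48, 0.83]
predicted_scores = [0.88, 0.70, 0.66, 0.51, 0.79]
rho = spearman(human_scores, predicted_scores)
print(f"rank correlation: {rho:.2f}")
```

Because only the ordering matters, a predictor can be systematically too high or too low on every image and still score well, as long as it ranks the images in roughly the same order as the human measurements.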


Citations
Posted Content

Eye Tracking for Everyone

TL;DR: iTracker, a convolutional neural network for eye tracking, achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.
Proceedings Article

Eye Tracking for Everyone

TL;DR: GazeCapture is introduced as the first large-scale dataset for eye tracking, containing data from over 1450 people and almost 2.5M frames. It is used to train iTracker, a convolutional neural network that achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.
Journal Article

Fine-tuning Convolutional Neural Networks for fine art classification

TL;DR: It is shown that features derived from fine-tuned networks can be employed to retrieve images similar in either style or content, which can be used to enhance capabilities of search systems in different online art collections.

Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola: GANalyze: Toward Visual Definitions of Cognitive Image Properties.

TL;DR: This article introduces a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability; GANs make it possible to generate a manifold of natural-looking images with fine-grained differences in their visual attributes.
Proceedings Article

GANalyze: Toward Visual Definitions of Cognitive Image Properties

TL;DR: A framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability is introduced and it is demonstrated that the same framework can be used to analyze image aesthetics and emotional valence.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network achieves state-of-the-art ImageNet classification performance; the network consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal Article

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.