Journal ArticleDOI
The Dynamic Representation of Scenes
TL;DR: In this paper, it is argued that focused attention provides spatiotemporal coherence for the stable representation of one object at a time, and that the allocation of attention can be co-ordinated to create a virtual representation.
Abstract:
One of the more powerful impressions created by vision is that of a coherent, richly detailed world where everything is present simultaneously. Indeed, this impression is so compelling that we tend to ascribe these properties not only to the external world, but to our internal representations as well. But results from several recent experiments argue against this latter ascription. For example, changes in images of real-world scenes often go unnoticed when made during a saccade, flicker, blink, or movie cut. This “change blindness” provides strong evidence against the idea that our brains contain a picture-like representation of the scene that is everywhere detailed and coherent. How then do we represent a scene? It is argued here that focused attention provides spatiotemporal coherence for the stable representation of one object at a time. It is then argued that the allocation of attention can be co-ordinated to create a “virtual representation”. In such a scheme, a stable object representation is formed...
Citations
Journal ArticleDOI
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
Aude Oliva, Antonio Torralba, +1 more
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that a holistic representation of the scene is enough to predict its probable semantic category.
Proceedings Article
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, +7 more
TL;DR: An attention based model that automatically learns to describe the content of images is introduced that can be trained in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.
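The soft-attention step this TL;DR refers to can be sketched in a few lines of NumPy (a hedged illustration only, not the paper's implementation: the learned MLP scorer is replaced here by a plain dot product between region features and the decoder state):

```python
import numpy as np

def soft_attention_context(regions, state):
    """One step of soft attention: weight image-region features by their
    relevance to the current decoder state, then return the weighted sum.

    regions: (L, D) array of L region feature vectors
    state:   (D,)   current decoder hidden state
    """
    scores = regions @ state                  # (L,) dot-product relevance (stand-in for the learned scorer)
    scores -= scores.max()                    # shift for numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()   # softmax: weights sum to 1
    context = alphas @ regions                # (D,) expected region feature
    return context, alphas

regions = np.eye(3)                   # three one-hot "region features" (toy data)
state = np.array([5.0, 0.0, 0.0])     # decoder state aligned with region 0
ctx, alphas = soft_attention_context(regions, state)
```

Because the weights are a softmax rather than a hard selection, the whole step is differentiable, which is what allows the deterministic variant to be trained with standard backpropagation.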
Posted Content
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, +7 more
TL;DR: This paper proposes an attention-based model that automatically learns to describe the content of images by focusing on salient objects while generating the corresponding words of the output sequence, achieving state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k, and MS COCO.
Posted Content
CBAM: Convolutional Block Attention Module
TL;DR: The proposed Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks, can be integrated into any CNN architecture seamlessly with negligible overhead and is end-to-end trainable along with the base CNN.
Book ChapterDOI
CBAM: Convolutional Block Attention Module
TL;DR: Convolutional Block Attention Module (CBAM) as discussed by the authors is a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, the module sequentially infers attention maps along two separate dimensions, channel and spatial; the attention maps are then multiplied with the input feature map for adaptive feature refinement.
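The sequential channel-then-spatial refinement described in this TL;DR can be sketched in NumPy (an illustrative sketch, not the authors' code: the learned shared MLP and 7×7 convolution are replaced by a fixed random projection and an element-wise sum):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_refine(feat, seed=0):
    """Sketch of CBAM's two-stage refinement of a (C, H, W) feature map."""
    rng = np.random.default_rng(seed)
    C, H, W = feat.shape

    # Channel attention: average- and max-pool over the spatial dims,
    # pass both through a shared projection (stand-in for the learned MLP),
    # sum, and squash each channel gate into (0, 1).
    avg_c = feat.mean(axis=(1, 2))                  # (C,)
    max_c = feat.max(axis=(1, 2))                   # (C,)
    W_shared = rng.standard_normal((C, C)) / np.sqrt(C)
    ch_att = sigmoid(avg_c @ W_shared + max_c @ W_shared)   # (C,)
    feat = feat * ch_att[:, None, None]             # gate each channel

    # Spatial attention: average- and max-pool over the channel dim,
    # combine (stand-in for the paper's 7x7 conv), squash into (0, 1).
    avg_s = feat.mean(axis=0)                       # (H, W)
    max_s = feat.max(axis=0)                        # (H, W)
    sp_att = sigmoid(avg_s + max_s)                 # (H, W)
    return feat * sp_att[None, :, :]                # gate each location

x = np.abs(np.random.default_rng(1).standard_normal((8, 4, 4)))
y = cbam_refine(x)
```

Because both attention maps only rescale the input, the refined map keeps the input's shape, which is what lets the module drop into an existing CNN between any two layers.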
References
Journal ArticleDOI
Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory.
TL;DR: Tests the two-process theory of detection, search, and attention presented by the authors (1977) in a series of experiments, demonstrating the qualitative difference between two modes of information processing: automatic detection and controlled search.
Book
The visual brain in action
TL;DR: Discusses vision from a biological point of view, covering attention, consciousness, and the coordination of behaviour in the primate visual cortex, as well as dissociations between perception and action in normal subjects.
Journal ArticleDOI
Intelligence without representation
TL;DR: Brooks decomposes an intelligent system into independent and parallel activity producers which all interface directly to the world through perception and action, rather than interfacing with each other.
Journal ArticleDOI
Guided Search 2.0: A revised model of visual search
Jeremy M. Wolfe, +1 more
TL;DR: This paper reviews the visual search literature and presents Guided Search 2.0, a revised model of human search behavior in which virtually all aspects of the original Guided Search model have been made more explicit and/or revised in light of new data.
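The core idea of Guided Search, summing bottom-up salience and top-down feature guidance into a single activation map that orders deployments of attention, can be sketched as follows (a toy illustration; the function and its inputs are hypothetical stand-ins for the model's activation maps, and the noise term stands in for the model's activation noise):

```python
import numpy as np

def guided_search_order(bottom_up, top_down, noise_sd=0.0, seed=0):
    """Sketch of Guided Search's guidance stage: combine bottom-up salience
    and top-down feature match into one activation map, then visit items
    in order of decreasing activation."""
    rng = np.random.default_rng(seed)
    activation = bottom_up + top_down + rng.normal(0.0, noise_sd, len(bottom_up))
    return np.argsort(-activation)          # indices, highest activation first

# Item 2 is both visually salient and matches the target template,
# so it is the first candidate attention is deployed to.
order = guided_search_order(np.array([0.1, 0.3, 0.9]),
                            np.array([0.0, 0.2, 0.8]))
```

The ordering, not the raw activations, is what drives behavior here: search ends early when a highly guided item is the target, reproducing efficient "guided" search, while flat activation maps degenerate to serial inspection.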