Showing papers by "Antonio Torralba" published in 2001


Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.

6,882 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A simple probabilistic framework for modeling the relationship between context and object properties is introduced, representing global context information in terms of the spatial layout of spectral components and serving as an effective procedure for context-driven focus of attention and scale selection on real-world scenes.
Abstract: There is general consensus that context can be a rich source of information about an object's identity, location and scale. However, the issue of how to formalize contextual influences is still largely open. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties. We represent global context information in terms of the spatial layout of spectral components. The resulting scheme serves as an effective procedure for context-driven focus of attention and scale selection on real-world scenes. Based on a simple holistic analysis of an image, the scheme is able to accurately predict object locations and sizes.

171 citations


Journal ArticleDOI
TL;DR: This work addresses the question of how the visual system classifies images into face and non-face patterns and focuses on face detection in impoverished images, which allow for an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.
Abstract: The ability to detect faces in images is of critical ecological significance. It is a prerequisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore the information thresholds required for different levels of performance. Our experimental results provide lower bounds on the image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context to face-detection performance. Research reported in this paper was supported in part by funds from the Defense Advanced Research Projects Agency and a Sloan fellowship for neuroscience to PS.

46 citations


Proceedings Article
03 Jan 2001
TL;DR: This paper shows that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search.
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.

27 citations


01 Dec 2001
TL;DR: It is demonstrated that, by recognizing the properties of the structures present in the image, one can infer the scale of the scene, and therefore its absolute mean depth, and illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.
Abstract: In the absence of cues for absolute depth measurements such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene but it will not inform about the actual 'size' of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.

4 citations