
Showing papers by "Antonio Torralba" published in 2005


Proceedings ArticleDOI
17 Oct 2005
TL;DR: Applied to a database of images of isolated objects, the sharing of parts among objects improves detection accuracy when few training examples are available; the hierarchical probabilistic model is also extended to scenes containing multiple objects.
Abstract: We describe a hierarchical probabilistic model for the detection and recognition of objects in cluttered, natural scenes. The model is based on a set of parts which describe the expected appearance and position, in an object-centered coordinate frame, of features detected by a low-level interest operator. Each object category then has its own distribution over these parts, which are shared between objects. We learn the parameters of this model via a Gibbs sampler which uses the graphical model's structure to analytically average over many parameters. Applied to a database of images of isolated objects, the sharing of parts among objects improves detection accuracy when few training examples are available. We also extend this hierarchical framework to scenes containing multiple objects.
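The sharing mechanism this abstract describes can be illustrated with a collapsed Gibbs sampler in the style of topic models. The sketch below is a simplification, not the authors' implementation: it keeps only quantized appearance words and drops the positional part of the model, and all names and hyperparameters are illustrative. The key point is that part-word counts are pooled across categories, which is where sharing helps when a category has few training examples.

```python
# Minimal sketch (not the paper's implementation) of collapsed Gibbs
# sampling for parts shared across object categories. Assumptions:
# features are quantized into V appearance words, each category has a
# Dirichlet-multinomial over K shared parts, and each part has a
# multinomial over appearance words.
import numpy as np

def gibbs_shared_parts(words, cats, K, V, C, alpha=1.0, beta=0.1,
                       n_iters=200, seed=None):
    """words[i]: appearance word of feature i; cats[i]: its category."""
    rng = np.random.default_rng(seed)
    N = len(words)
    z = rng.integers(K, size=N)          # part assignment per feature
    n_ck = np.zeros((C, K))              # category-part counts
    n_kv = np.zeros((K, V))              # part-word counts (shared!)
    for i in range(N):
        n_ck[cats[i], z[i]] += 1
        n_kv[z[i], words[i]] += 1
    for _ in range(n_iters):
        for i in range(N):
            c, w = cats[i], words[i]
            n_ck[c, z[i]] -= 1           # remove feature i's counts
            n_kv[z[i], w] -= 1
            # Collapsed conditional: n_kv pools evidence from every
            # category, which is the source of the sharing benefit.
            p = (n_ck[c] + alpha) * (n_kv[:, w] + beta) \
                / (n_kv.sum(1) + V * beta)
            z[i] = rng.choice(K, p=p / p.sum())
            n_ck[c, z[i]] += 1
            n_kv[z[i], w] += 1
    return z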

358 citations


Journal ArticleDOI
01 Jul 2005
TL;DR: Motion magnification, as the authors describe it, acts like a microscope for visual motion: it amplifies subtle motions in a video sequence, allowing visualization of deformations that would otherwise be invisible.
Abstract: We present motion magnification, a technique that acts like a microscope for visual motion. It can amplify subtle motions in a video sequence, allowing for visualization of deformations that would otherwise be invisible. To achieve motion magnification, we need to accurately measure visual motions, and group the pixels to be modified. After an initial image registration step, we measure motion by a robust analysis of feature point trajectories, and segment pixels based on similarity of position, color, and motion. A novel measure of motion similarity groups even very small motions according to correlation over time, which often relates to physical cause. An outlier mask marks observations not explained by our layered motion model, and those pixels are simply reproduced on the output from the original registered observations. The motion of any selected layer may be magnified by a user-specified amount; texture synthesis fills in unseen "holes" revealed by the amplified motions. The resulting motion-magnified images can reveal or emphasize small motions in the original sequence, as we demonstrate with deformations in load-bearing structures, subtle motions or balancing corrections of people, and "rigid" structures bending under hand pressure.
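As a rough illustration of the measure-then-amplify idea, the sketch below tracks feature points with pyramidal Lucas-Kanade (OpenCV) and overlays each trajectory displaced by a user-chosen factor. It omits the paper's registration, layer segmentation, outlier masking, and texture fill-in; the video filename and magnification factor are placeholders.

```python
# Sketch of "measure motion, then magnify it": track points and draw
# their displacement from frame 0 amplified by a user-chosen factor.
import cv2
import numpy as np

ALPHA = 10.0                      # user-specified magnification factor
cap = cv2.VideoCapture("video.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)
anchors = p0.copy()               # reference positions from frame 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    for a, q, s in zip(anchors.reshape(-1, 2), p1.reshape(-1, 2),
                       st.reshape(-1)):
        if not s:
            continue              # track lost (the paper uses an outlier mask)
        mag = a + (1.0 + ALPHA) * (q - a)   # amplified displacement
        cv2.circle(frame, (int(mag[0]), int(mag[1])), 2, (0, 0, 255), -1)
    cv2.imshow("magnified trajectories", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    prev_gray, p0 = gray, p1

cap.release()
cv2.destroyAllWindows()
```

The actual system goes much further: it segments pixels into motion layers, magnifies a selected layer, and synthesizes texture for the disoccluded regions.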

306 citations


01 Jan 2005
TL;DR: A web-based tool that allows easy image annotation and instant sharing of such annotations is developed and a large dataset that spans many object categories, often containing multiple instances over a wide variety of images is collected.
Abstract: We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state-of-the-art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
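Annotations from the tool are distributed as XML files in which each labeled object is a named polygon. The reader sketched below assumes the commonly seen tag layout (annotation/object/name/polygon/pt/x/y); verify it against your copy of the data.

```python
# Sketch of reading one LabelMe-style annotation file: each labeled
# object is returned as (name, list of polygon vertices).
import xml.etree.ElementTree as ET

def read_labelme(path):
    """Return a list of (object_name, [(x, y), ...]) polygons."""
    root = ET.parse(path).getroot()
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name", default="").strip()
        pts = [(int(pt.findtext("x")), int(pt.findtext("y")))
               for pt in obj.findall("polygon/pt")]
        objects.append((name, pts))
    return objects

if __name__ == "__main__":
    for name, poly in read_labelme("example_annotation.xml"):
        print(f"{name}: {len(poly)} vertices")
```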

203 citations


Proceedings Article
05 Dec 2005
TL;DR: This work develops a hierarchical probabilistic model for the spatial structure of visual scenes based on the transformed Dirichlet process, a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data.
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP's inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images.
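The TDP's generative story can be sketched in a few lines: globally shared mixture components are reused across images, each image instantiates an uncertain number of objects, and each instance applies its own random transformation (here, a 2-D translation). The sketch below forward-samples this process with made-up sizes and scales; it does not reproduce the paper's Gibbs sampler for inference.

```python
# Forward-sampling sketch of the transformed Dirichlet process idea:
# shared parts in an object-centered frame, copied into each image
# under per-instance random translations.
import numpy as np

rng = np.random.default_rng(0)
K = 3                                       # shared parts (truncated DP)
part_means = rng.normal(0, 5, size=(K, 2))  # object-centered part locations
part_weights = rng.dirichlet(np.ones(K))

def sample_image(max_instances=3, feats_per_instance=20):
    """One image: a random number of object instances, each a transformed
    copy of the shared parts; returns feature coordinates in image frame."""
    n_instances = rng.integers(1, max_instances + 1)  # uncertain object count
    feats = []
    for _ in range(n_instances):
        shift = rng.uniform(0, 100, size=2)           # per-instance transform
        for _ in range(feats_per_instance):
            k = rng.choice(K, p=part_weights)         # pick a shared part
            feats.append(part_means[k] + shift + rng.normal(0, 0.5, 2))
    return np.array(feats)

print(sample_image().shape)
```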

169 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The role of contextual priors in guiding visual search is investigated by monitoring eye movements as participants search very familiar scenes for a target object, with the goal of identifying which stage of visual search benefits from those priors.
Abstract: Attention allocation in visual search is known to be influenced by low-level image features, visual scene context and top-down task constraints. Here, we investigate the role of contextual priors in guiding visual search by monitoring eye movements as participants search very familiar scenes for a target object. The goal of the study is to identify which stage of the visual search benefits from contextual priors. Two groups of participants differed in the expectation of target presence associated with a scene. Stronger priors are established when a scene exemplar is always associated with the presence of the target than when the scene is periodically observed with and without the target. In both cases, overall search performance improves over repeated presentations of scenes. An analytic decomposition of the time course of the effect of contextual priors shows a time benefit to the exploration stage of search (scan time) and a decrease in gaze duration on the target. The strength of the contextual relationship modulates the magnitude of gaze duration gain, while the scan time gain constitutes one half of the overall search performance benefit regardless of the probability (50% or 100%) of target presence. These data are discussed in terms of the implications of context-dependent scene processing and its putative role in various stages of visual search.
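The decomposition referred to above separates scan time (exploration before the target is first fixated) from gaze duration on the target. A hypothetical sketch of that bookkeeping follows; the fixation-record format and the rectangular target-region test are invented for illustration, not the study's actual data format.

```python
# Hypothetical split of a trial's search time into scan time and
# gaze duration on the target, given a list of fixations.
def decompose_trial(fixations, target_box):
    """fixations: list of (x, y, duration_ms); target_box: (x0, y0, x1, y1).
    Returns (scan_time_ms, gaze_duration_ms)."""
    x0, y0, x1, y1 = target_box
    scan_time = 0.0
    gaze = 0.0
    on_target = False
    for x, y, dur in fixations:
        if x0 <= x <= x1 and y0 <= y <= y1:
            on_target = True
            gaze += dur           # time spent fixating the target
        elif not on_target:
            scan_time += dur      # exploration before first target fixation
    return scan_time, gaze

trial = [(40, 60, 210), (300, 120, 180), (410, 240, 350)]
print(decompose_trial(trial, target_box=(380, 200, 460, 280)))
```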

30 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work proposes a method for cross-modal inference that simultaneously learns shape recipes between two modalities and estimates missing information by using a prior on image structure gleaned from the alternate modality.
Abstract: In cross-modal inference, we estimate complete fields from noisy and missing observations of one sensory modality using structure found in another sensory modality. This inference problem occurs in several areas including texture reconstruction and reconstruction of geophysical fields. We propose a method for cross-modal inference that simultaneously learns shape recipes between two modalities and estimates missing information by using a prior on image structure gleaned from the alternate modality. In the absence of a physical basis for representing image priors, we use a statistical one that represents correlations in differential features. This is done efficiently using a perturbation sampling scheme. Using just one example of the alternate modality, we produce a factorized ensemble representation of feature correlations that yields efficient solutions to large-sized spatial inference problems. We demonstrate the utility of this approach on cross-modal inference with depth and spectral data.
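A heavily simplified sketch of the cross-modal idea: reconstruct a field from sparse, noisy observations under a prior on differential features, with smoothing weakened wherever the alternate modality itself changes, so the guide's structure shapes the reconstruction. This stands in for, and does not reproduce, the paper's factorized ensemble representation and perturbation sampling scheme; the grid size, weights, and regularization strength are arbitrary.

```python
# Sparse-observation field reconstruction with a differential prior
# weighted by a second modality, solved as one sparse least-squares
# problem.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

n = 32                                            # n x n field
rng = np.random.default_rng(1)
xx, yy = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
truth = np.sin(3 * xx) + yy                       # synthetic "depth" field
guide = truth + rng.normal(0, 0.02, truth.shape)  # stand-in alternate modality
mask = rng.random((n, n)) < 0.1                   # only 10% of cells observed
obs = truth[mask] + rng.normal(0, 0.01, mask.sum())

A = sp.eye(n * n, format="csr")[mask.ravel()]     # observation operator
d = sp.diags([-1.0, 1.0], [0, 1], shape=(n - 1, n))
Dx = sp.kron(sp.eye(n), d).tocsr()                # differences along rows
Dy = sp.kron(d, sp.eye(n)).tocsr()                # differences along columns
# Cross-modal prior: smooth strongly except where the guide changes.
wx = np.exp(-20 * np.abs(Dx @ guide.ravel()))
wy = np.exp(-20 * np.abs(Dy @ guide.ravel()))
lam = 2.0
system = sp.vstack([A, lam * sp.diags(wx) @ Dx, lam * sp.diags(wy) @ Dy])
rhs = np.concatenate([obs, np.zeros(len(wx) + len(wy))])
est = lsqr(system.tocsr(), rhs)[0].reshape(n, n)
print("RMSE:", float(np.sqrt(np.mean((est - truth) ** 2))))
```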

6 citations