
Papers by Trevor Darrell published in 2001


Proceedings ArticleDOI
01 Jan 2001
TL;DR: This work develops a view-normalization approach to multi-view face and gait recognition, showing that it provides greater recognition accuracy than the unnormalized input sequences and that integrated face and gait recognition improves on either modality alone.
Abstract: We develop a view-normalization approach to multi-view face and gait recognition. An image-based visual hull (IBVH) is computed from a set of monocular views and used to render virtual views for tracking and recognition. We determine canonical viewpoints by examining the 3D structure, appearance (texture), and motion of the moving person. For optimal face recognition, we place virtual cameras to capture frontal face appearance; for gait recognition we place virtual cameras to capture a side-view of the person. Multiple cameras can be rendered simultaneously, and camera position is dynamically updated as the person moves through the workspace. Image sequences from each canonical view are passed to an unmodified face or gait recognition algorithm. We show that our approach provides greater recognition accuracy than is obtained using the unnormalized input sequences, and that integrated face and gait recognition provides improved performance over either modality alone. Canonical view estimation, rendering, and recognition have been efficiently implemented and can run at near real-time speeds.
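
The rendering machinery behind the IBVH is beyond a short excerpt, but the canonical-viewpoint geometry the abstract describes can be illustrated on its own. A minimal sketch, assuming the person's 3D trajectory is already tracked; the function name, the fixed camera distance, and the heading window are our own placeholders:

```python
import numpy as np

def canonical_camera_positions(trajectory, distance=3.0):
    """Place virtual cameras relative to a tracked, moving person.

    trajectory: (N, 3) array of body-centroid positions over time.
    Returns (frontal_cam, side_cam) positions for the latest frame:
    the frontal camera faces the person along their heading (for
    face images); the side camera sits perpendicular to it (for
    gait silhouettes).
    """
    pos = trajectory[-1]
    prev = trajectory[max(len(trajectory) - 5, 0)]  # short window for heading
    heading = pos - prev
    heading[2] = 0.0                                # keep cameras level
    heading /= np.linalg.norm(heading) + 1e-9
    side = np.cross(heading, np.array([0.0, 0.0, 1.0]))
    frontal_cam = pos + distance * heading          # ahead of the walker, looking back
    side_cam = pos + distance * side                # beside the walker
    return frontal_cam, side_cam
```

Both virtual cameras would be re-rendered every frame as the person moves, and the resulting frontal and side-view sequences passed unchanged to the face and gait recognizers.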

263 citations


Journal ArticleDOI
TL;DR: It is argued that privacy in context-aware computing, especially in systems with perceptually aware environments, will be quite complex, and that future research will need to consider how regulatory and technical solutions might be co-designed to form a public good.
Abstract: Context-aware computing offers the promise of significant user gains--the ability for systems to adapt more readily to user needs, models, and goals. Dey, Abowd, and Salber (2001 [this special issue]) present a masterful step toward understanding context-aware applications. We examine Dey et al. in the light of privacy issues--that is, individuals' control over their personal data--to highlight some of the thorny issues in context-aware computing that will be upon us soon. We argue that privacy in context-aware computing, especially in systems with perceptually aware environments, will be quite complex. Indeed, privacy forms a co-design space between the social, the technical, and the regulatory. We recognize that Dey et al. is a necessary first step in examining important software engineering concerns, but future research will need to consider how regulatory and technical solutions might be co-designed to form a public good.

234 citations


Proceedings ArticleDOI
01 Feb 2001
TL;DR: In this paper, the authors derive dense stereo models for object tracking by using long-term, extended-dynamic-range imagery and by detecting and interpolating uniform but unoccluded planar regions.
Abstract: In a known environment, objects may be tracked in multiple views using a set of background models. Stereo-based models can be illumination-invariant, but often have undefined values, which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking by using long-term, extended-dynamic-range imagery and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a "late-segmentation" strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic-programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.
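
One simplified reading of the pruned disparity search: rather than sweeping the full disparity range, each pixel is tested only at its stored background disparity, and a poor photometric match there signals an occluding foreground object. A minimal sketch under that reading; the SAD cost, window size, and threshold are assumptions, not the paper's settings:

```python
import numpy as np

def foreground_mask(left, right, bg_disparity, window=5, thresh=30.0):
    """Classify pixels as foreground via pruned disparity search.

    For each pixel, the left/right patches are compared only at the
    stored background disparity.  If they still match there, the
    background is unoccluded; a high matching cost means something
    now sits in front of it.
    """
    h, w = left.shape
    r = window // 2
    mask = np.zeros((h, w), dtype=bool)
    for y in range(r, h - r):
        for x in range(r, w - r):
            d = bg_disparity[y, x]
            if not np.isfinite(d) or x - int(d) < r:   # undefined / off-image
                continue
            d = int(round(d))
            patch_l = left[y-r:y+r+1, x-r:x+r+1].astype(float)
            patch_r = right[y-r:y+r+1, x-d-r:x-d+r+1].astype(float)
            cost = np.mean(np.abs(patch_l - patch_r))  # SAD matching cost
            mask[y, x] = cost > thresh
    return mask
```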

125 citations


Proceedings ArticleDOI
07 May 2001
TL;DR: A rigid transformation (the d-motion) that maps between two disparity images of a rigidly moving object is introduced, its relation to the Euclidean rigid motion is shown, and a motion estimation algorithm is derived.
Abstract: A new method for 3D rigid motion estimation from stereo is proposed in this paper. The appealing feature of this method is that it directly uses the disparity images obtained from stereo matching. We assume that the stereo rig has parallel cameras and, in that case, show the geometric and topological properties of the disparity images. Then we introduce a rigid transformation (called d-motion) that maps two disparity images of a rigidly moving object. We show how it is related to the Euclidean rigid motion and derive a motion estimation algorithm. Experiments show that our approach is simple and more accurate than standard approaches.
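
The paper's exact algebraic form of the d-motion is not reproduced here, but its geometric content follows from the standard parallel-camera disparity model: back-project, apply the Euclidean motion, re-project. A minimal sketch, with focal length f, baseline b, and principal point (cx, cy) assumed known:

```python
import numpy as np

def d_motion(u, v, d, R, t, f, b, cx, cy):
    """Map a disparity-image point (u, v, d) through a rigid motion.

    For parallel cameras with focal length f and baseline b, a pixel
    (u, v) with disparity d back-projects to
        X = (b / d) * [u - cx, v - cy, f].
    Applying the Euclidean motion X' = R X + t and re-projecting
    gives the transformed disparity-image point, i.e. the induced
    transformation in disparity space.
    """
    X = (b / d) * np.array([u - cx, v - cy, f])
    Xp = R @ X + t
    dp = f * b / Xp[2]                  # new disparity
    up = cx + f * Xp[0] / Xp[2]         # new column
    vp = cy + f * Xp[1] / Xp[2]         # new row
    return up, vp, dp
```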

62 citations


Proceedings ArticleDOI
07 May 2001
TL;DR: A class of differential motion trackers that automatically stabilize in finite domains is developed; an approximation to the posterior distribution of pose changes serves as an uncertainty model for parametric motion, helping to arbitrate the use of multiple base frames.
Abstract: We develop a class of differential motion trackers that automatically stabilize when in finite domains. Most differential trackers compute motion only relative to one previous frame, accumulating errors indefinitely. We estimate pose changes between a set of past frames, and develop a probabilistic framework for integrating those estimates. We use an approximation to the posterior distribution of pose changes as an uncertainty model for parametric motion in order to help arbitrate the use of multiple base frames. We demonstrate this framework on a simple 2D translational tracker and a 3D, six-degree-of-freedom tracker.
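
A minimal sketch of the base-frame integration step, shown for a translational pose vector and using precision (inverse-covariance) weighting as the Gaussian-posterior approximation; the variable names and fusion rule are our reading of the abstract, not the paper's code:

```python
import numpy as np

def fuse_pose(base_poses, deltas, covariances):
    """Fuse current-pose estimates obtained from several base frames.

    Base frame i contributes the estimate base_poses[i] + deltas[i],
    where deltas[i] is the differentially tracked pose change and
    covariances[i] its uncertainty.  Under a Gaussian approximation,
    the minimum-variance combination weights each estimate by its
    inverse covariance (precision).
    """
    precision_sum = np.zeros_like(covariances[0])
    weighted_sum = np.zeros_like(base_poses[0], dtype=float)
    for pose0, delta, cov in zip(base_poses, deltas, covariances):
        P = np.linalg.inv(cov)
        precision_sum += P
        weighted_sum += P @ (pose0 + delta)
    fused_cov = np.linalg.inv(precision_sum)
    return fused_cov @ weighted_sum, fused_cov
```

Because well-anchored older base frames keep contributing, drift no longer accumulates indefinitely while the target stays within a bounded region.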

49 citations


Journal ArticleDOI
TL;DR: The use of Gaussian mixtures to model correspondence uncertainties for disparity and image-velocity estimation is introduced; properties of the disparity space are shown, along with how rigid transformations can be represented in it.
Abstract: In this paper we explore a multiple-hypothesis approach to estimating rigid motion from a moving stereo rig. More precisely, we introduce the use of Gaussian mixtures to model correspondence uncertainties for disparity and velocity (optical flow) estimation. We show some properties of the disparity space and show how rigid transformations can be represented in it. We give an algorithm, derived from standard random-sampling robust estimators, that efficiently estimates rigid transformations from multi-hypothesis disparity maps and velocity fields.
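
A minimal sketch of the robust-estimation step, assuming each scene point carries a short list of weighted correspondence hypotheses (one per mixture component); the hypothesis format, scoring rule, and thresholds are our own assumptions:

```python
import numpy as np

def fit_rigid(A, B):
    """Least-squares rigid fit (Kabsch) mapping points A onto B."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def ransac_rigid(hypotheses, n_iters=500, inlier_thresh=0.05):
    """RANSAC over multi-hypothesis 3D correspondences.

    hypotheses: per-point lists of (X_before, X_after, weight), one
    entry per mixture component.  Each iteration samples one component
    for each of three random points, fits a rigid motion, and scores
    it by the summed weight of every point whose best-fitting
    component is an inlier.
    """
    rng = np.random.default_rng(0)
    best_score, best_Rt = -1.0, None
    for _ in range(n_iters):
        idx = rng.choice(len(hypotheses), size=3, replace=False)
        pairs = [hypotheses[i][rng.integers(len(hypotheses[i]))] for i in idx]
        A = np.array([p[0] for p in pairs])
        B = np.array([p[1] for p in pairs])
        R, t = fit_rigid(A, B)
        score = 0.0
        for comps in hypotheses:
            errs = [np.linalg.norm(R @ Xa + t - Xb) for Xa, Xb, _ in comps]
            k = int(np.argmin(errs))
            if errs[k] < inlier_thresh:
                score += comps[k][2]        # credit the component's weight
        if score > best_score:
            best_score, best_Rt = score, (R, t)
    return best_Rt
```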

21 citations


Proceedings ArticleDOI
15 Nov 2001
TL;DR: This work presents an audio-video localization technique that combines the benefits of the two modalities, achieving an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
Abstract: Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
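
A minimal sketch of the steering step, assuming the stereo cameras already supply a 3D source position in the array's coordinate frame; delay-and-sum beamforming with nearest-sample delays stands in for whatever beamformer the test environment actually used:

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs=16000, c=343.0):
    """Steer a microphone array at a visually localized 3D source.

    signals: (M, N) array of M synchronized microphone channels.
    Each channel is advanced by its relative propagation delay from
    the source position, then the channels are averaged; the source
    adds coherently while off-axis interference does not.
    """
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    delays = (dists - dists.min()) / c            # relative delays, seconds
    shifts = np.round(delays * fs).astype(int)    # nearest-sample approximation
    out = np.zeros(signals.shape[1])
    for sig, s in zip(signals, shifts):
        n = len(out) - s
        out[:n] += sig[s:]                        # advance later channels
    return out / len(signals)
```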

20 citations


Proceedings ArticleDOI
15 Nov 2001
TL;DR: This work presents an information-theoretic approach to the fusion of multiple modalities and gives empirical results demonstrating audio-video localization and consistency measurement.
Abstract: Multi-modal fusion is an important, yet challenging task for perceptual user interfaces. Humans routinely perform both simple and complex tasks in which ambiguous auditory and visual data are combined in order to support accurate perception. By contrast, automated approaches for processing multi-modal data sources lag far behind. This is primarily because few methods adequately model the complexity of the audio/visual relationship. We present an information-theoretic approach for fusion of multiple modalities. Furthermore, we discuss a statistical model under which our approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement. We show examples determining where a speaker is within a scene, and whether they are producing the specified audio stream.
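
A minimal sketch of one standard information-theoretic consistency measure: mutual information between an audio feature and per-pixel visual change under a joint-Gaussian assumption, for which I(a; v) = -0.5 log(1 - rho^2). Whether this matches the paper's estimator is an assumption, and the features chosen here are placeholders:

```python
import numpy as np

def audio_video_mi(audio_energy, video, eps=1e-9):
    """Score each pixel by its mutual information with the audio track.

    audio_energy: (T,) per-frame audio energy.
    video: (T, H, W) grayscale frames.  For jointly Gaussian variables,
    I(a; v) = -0.5 * log(1 - rho^2), so the per-pixel correlation with
    the audio gives an MI map; its peak suggests where the speaker is,
    and its magnitude whether audio and video are consistent.
    """
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + eps)
    v = np.abs(np.diff(video.astype(float), axis=0))        # frame-to-frame change
    v = (v - v.mean(0)) / (v.std(0) + eps)
    rho = np.einsum('t,thw->hw', a[1:], v) / (len(a) - 1)   # per-pixel correlation
    rho = np.clip(rho, -0.999, 0.999)
    return -0.5 * np.log(1.0 - rho ** 2)
```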

17 citations


Journal ArticleDOI
TL;DR: A local image transform based on cumulative similarity measures is defined and shown to enable efficient correspondence and tracking near occluding boundaries; results compare this method to traditional least-squares and robust correspondence matching.
Abstract: A local image transform based on cumulative similarity measures is defined and is shown to enable efficient correspondence and tracking near occluding boundaries. Unlike traditional methods, this transform allows correspondences to be found when the only contrast present is the occluding boundary itself and when the sign of contrast along the boundary is possibly reversed. The transform is based on the idea of a cumulative similarity measure that characterizes the shape of local image homogeneity; both the value of the image at a particular point and the shape of the region with locally similar and connected values are captured. This representation is insensitive to structure beyond an occluding boundary but is sensitive to the shape of the boundary itself, which is often an important cue. We show results comparing this method to traditional least-squares and robust correspondence matching.
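
A heavily simplified sketch of a cumulative similarity descriptor in the spirit the abstract describes: similarity to the center value is accumulated multiplicatively along rays, so the response collapses at the first strong contrast and structure beyond the boundary cannot leak in. The ray count, radius, and Gaussian similarity kernel are our assumptions, not the paper's definition:

```python
import numpy as np

def cumulative_similarity(img, y, x, radius=8, sigma=10.0, n_rays=16):
    """Cumulative similarity descriptor at pixel (y, x).

    Along each of n_rays directions, similarity to the center value is
    accumulated multiplicatively; one strong contrast drives a ray's
    accumulator toward zero and keeps it there, so the descriptor
    encodes where each ray hits a boundary while staying blind to
    whatever lies beyond it.
    """
    center = float(img[y, x])
    desc = np.zeros((n_rays, radius))
    for k in range(n_rays):
        theta = 2 * np.pi * k / n_rays
        acc = 1.0
        for r in range(1, radius + 1):
            yy = int(round(y + r * np.sin(theta)))
            xx = int(round(x + r * np.cos(theta)))
            if not (0 <= yy < img.shape[0] and 0 <= xx < img.shape[1]):
                break
            sim = np.exp(-(float(img[yy, xx]) - center) ** 2 / (2 * sigma ** 2))
            acc *= sim                    # cumulative: one boundary kills the ray
            desc[k, r - 1] = acc
    return desc
```

Descriptors from two images would then be compared directly (e.g. by summed squared difference) to establish correspondence near the boundary.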

9 citations


Journal ArticleDOI
TL;DR: In cases where the background pattern is stationary, it is shown how visibility constraints from other views can generate virtual background values at points with no valid depth in the primary view.
Abstract: Visibility constraints can aid the segmentation of foreground objects observed with multiple range images. In our approach, points are defined as foreground if they can be determined to occlude some empty space in the scene. We present an efficient algorithm to estimate foreground points in each range view using explicit epipolar search. In cases where the background pattern is stationary, we show how visibility constraints from other views can generate virtual background values at points with no valid depth in the primary view. We demonstrate the performance of both algorithms for detecting people in indoor office environments.
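
A minimal sketch of the visibility test for a single point, assuming calibrated views with known relative pose T_ab and intrinsics K_b; the paper's explicit epipolar search is reduced here to a single project-and-compare, and the depth margin is a placeholder:

```python
import numpy as np

def occludes_empty_space(point, depth_b, K_b, T_ab, margin=0.05):
    """Test whether a 3D point from view A is foreground w.r.t. view B.

    The point (in A's frame) is transformed into B's frame and
    projected.  If B measures a strictly larger depth along that
    pixel's ray, B sees past the point: the point occupies space that
    B observes to be empty, so it is labeled foreground.
    """
    p_b = T_ab[:3, :3] @ point + T_ab[:3, 3]   # point in B's frame
    if p_b[2] <= 0:
        return False                            # behind camera B
    uv = K_b @ (p_b / p_b[2])
    u, v = int(round(uv[0])), int(round(uv[1]))
    h, w = depth_b.shape
    if not (0 <= v < h and 0 <= u < w):
        return False
    z_seen = depth_b[v, u]
    return np.isfinite(z_seen) and z_seen > p_b[2] + margin
```

Points that pass this test in some other view can also donate "virtual background" depth values to pixels where the primary view's stereo failed, as the abstract describes for stationary backgrounds.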

8 citations