
Showing papers by "Trevor Darrell" published in 2003


Proceedings ArticleDOI
18 Jun 2003
TL;DR: This work presents a method for online rigid object tracking using an adaptive view-based appearance model that has bounded drift and can track objects undergoing large motion for long periods of time when the object's pose trajectory crosses itself.
Abstract: We present a method for online rigid object tracking using an adaptive view-based appearance model. When the object's pose trajectory crosses itself, our tracker has bounded drift and can track objects undergoing large motion for long periods of time. Our tracker registers each incoming frame against the views of the appearance model using a two-frame registration algorithm. Using a linear Gaussian filter, we simultaneously estimate the pose of the object and adjust the view-based model as pose-changes are recovered from the registration algorithm. The adaptive view-based model is populated online with views of the object as it undergoes different orientations in pose space, allowing us to capture non-Lambertian effects. We tested our approach on a real-time rigid object tracking task using stereo cameras and observed an RMS error within the accuracy limit of an attached inertial sensor.

129 citations
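
As a rough illustration of the idea (a sketch under assumptions, not the paper's implementation, with hypothetical names throughout), the following Python fragment maintains a set of key views with pose estimates, registers each incoming frame against the stored view nearest in pose space via a user-supplied two-frame registration routine, and fuses the resulting pose measurement with a linear Gaussian (Kalman) update:

```python
import numpy as np

class ViewBasedTracker:
    """Toy adaptive view-based appearance model: key frames are stored
    with pose estimates, and each incoming frame is registered against
    the stored view nearest in pose space, so drift stays bounded
    whenever the pose trajectory revisits earlier viewpoints."""

    def __init__(self, register_fn, pose_dim=6, new_view_dist=0.3):
        # register_fn(ref_frame, frame) -> (pose_change, covariance);
        # any 6-DOF two-frame registration method could stand in here.
        self.register = register_fn
        self.views = []                      # list of (frame, pose)
        self.pose = np.zeros(pose_dim)
        self.cov = np.eye(pose_dim)
        self.new_view_dist = new_view_dist

    def update(self, frame):
        if not self.views:
            self.views.append((frame, self.pose.copy()))
            return self.pose
        # Register against the stored view closest in pose space.
        ref_frame, ref_pose = min(
            self.views, key=lambda v: np.linalg.norm(v[1] - self.pose))
        delta, meas_cov = self.register(ref_frame, frame)
        z = ref_pose + delta                 # pose measurement
        # Linear Gaussian (Kalman) fusion of prediction and measurement.
        K = self.cov @ np.linalg.inv(self.cov + meas_cov)
        self.pose = self.pose + K @ (z - self.pose)
        self.cov = (np.eye(len(self.pose)) - K) @ self.cov
        # Populate the model online when the pose is sufficiently novel.
        if min(np.linalg.norm(p - self.pose)
               for _, p in self.views) > self.new_view_dist:
            self.views.append((frame, self.pose.copy()))
        return self.pose
```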


Book ChapterDOI
12 Oct 2003
TL;DR: In this paper, the authors discuss the concept of activity zones and suggest that such zones can be used to trigger application actions, retrieve information based on previous context, and present information to users.
Abstract: Location is a primary cue in many context-aware computing systems, and is often represented as a global coordinate, room number, or a set of Euclidean distances to various landmarks. A user’s concept of location, however, is often defined in terms of regions in which similar activities occur. We discuss the concept of such regions, which we call activity zones, and suggest that such zones can be used to trigger application actions, retrieve information based on previous context, and present information to users. We show how to semi-automatically partition a space into activity zones based on patterns of observed user location and motion. We describe our system and two implemented example applications whose behavior is controlled by users’ entry, exit, and presence in the zones.

121 citations
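
A minimal sketch of the zone-finding step, assuming tracked samples of position and velocity and substituting off-the-shelf k-means for whatever clustering the paper actually uses (all names below are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def find_activity_zones(tracks, n_zones=5):
    """tracks: (N, 4) array of [x, y, vx, vy] samples from a person
    tracker. Clustering location-plus-motion features groups the
    observations into candidate activity zones; a user can then merge
    or label them by hand (the 'semi-automatic' step)."""
    labels = KMeans(n_clusters=n_zones, n_init=10).fit_predict(tracks)
    zones = [tracks[labels == k, :2] for k in range(n_zones)]
    # Represent each zone by its spatial centroid and extent.
    return [(z.mean(axis=0), z.std(axis=0)) for z in zones]

def zone_of(position, zones, n_std=2.0):
    """Return the index of the zone containing `position`, or None.
    Zone entry/exit events can then trigger application actions."""
    for i, (center, spread) in enumerate(zones):
        if np.all(np.abs(position - center) <= n_std * spread):
            return i
    return None
```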


Proceedings ArticleDOI
18 Jun 2003
TL;DR: It is shown how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process.
Abstract: We present a Bayesian approach to image-based visual hull reconstruction. The 3D (three-dimensional) shape of an object of a known class is represented by sets of silhouette views simultaneously observed from multiple cameras. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. In our representation, 3D information is implicit in the joint observations of multiple contours from known viewpoints. We model the prior density using a probabilistic principal components analysis-based technique and estimate a maximum a posteriori reconstruction of multi-view contours. The proposed method is applied to a dataset of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown.

88 citations
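
The MAP step can be sketched with standard Gaussian conditioning: under a PPCA prior x ~ N(mu, W Wᵀ + σ²I) and an observation y = x + noise, the posterior mean pulls the noisy contours toward the learned subspace. A minimal version, assuming the parameters come from a PPCA fit on training contour vectors (not the authors' code):

```python
import numpy as np

def ppca_map_reconstruct(y, mu, W, sigma2, noise_var):
    """MAP estimate of a clean multi-view contour vector x given a
    noisy observation y = x + e, e ~ N(0, noise_var * I), under a
    PPCA prior x ~ N(mu, W W^T + sigma2 * I). Shrinking y toward the
    learned subspace suppresses silhouette segmentation errors."""
    d = len(mu)
    C = W @ W.T + sigma2 * np.eye(d)          # PPCA prior covariance
    gain = C @ np.linalg.inv(C + noise_var * np.eye(d))
    return mu + gain @ (y - mu)
```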


Proceedings ArticleDOI
17 Oct 2003
TL;DR: This work presents a method for estimating the absolute pose of a rigid object based on intensity and depth view-based eigenspaces, built across multiple views of example objects of the same class.
Abstract: We present a method for estimating the absolute pose of a rigid object based on intensity and depth view-based eigenspaces, built across multiple views of example objects of the same class. Given an initial frame of an object with unknown pose, we reconstruct a prior model for all views represented in the eigenspaces. For each new frame, we compute the pose-changes between every view of the reconstructed prior model and the new frame. The resulting pose-changes are then combined and used in a Kalman filter update. This approach for pose estimation is user-independent and the prior model can be initialized automatically from any viewpoint of the view-based eigenspaces. To track more robustly over time, we present an extension of this pose estimation technique where we integrate our prior model approach with an adaptive differential tracker. We demonstrate the accuracy of our approach on face pose tracking using stereo cameras.

74 citations
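
One plausible reading of the fusion step, sketched below under assumptions: each view of the reconstructed prior model yields a pose-change measurement with its own covariance, and the measurements are combined by precision weighting before being handed to a standard Kalman filter update:

```python
import numpy as np

def fuse_pose_changes(deltas, covs):
    """Precision-weighted combination of pose-change estimates obtained
    by registering the new frame against each view of the reconstructed
    prior model. Returns the fused estimate and its covariance, which
    can then drive a Kalman filter update."""
    info = sum(np.linalg.inv(S) for S in covs)          # total precision
    mean = np.linalg.solve(info, sum(np.linalg.inv(S) @ d
                                     for d, S in zip(deltas, covs)))
    return mean, np.linalg.inv(info)
```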


Proceedings Article
13 Oct 2003
TL;DR: In this article, a probabilistic shape+structure model is proposed to estimate the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Abstract: We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure" model. The 3D shape of an object class is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras, while structural features of interest on the object are denoted by a number of 3D locations. A prior density over the multi-view shape and corresponding structure is constructed with a mixture of probabilistic principal components analyzers. Given a novel set of contours, we infer the unknown structure parameters from the new shape's Bayesian reconstruction. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and it works even with only a single input view. Using a training set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.

46 citations
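
For a single mixture component, inferring structure from an observed shape reduces to conditioning a joint Gaussian over the stacked [shape; structure] vector. A minimal sketch (variable names are hypothetical):

```python
import numpy as np

def infer_structure(shape_obs, mu, Sigma, n_shape):
    """Conditional Gaussian inference of structure (e.g., 3D joint
    locations) from an observed multi-view contour vector, using the
    joint mean/covariance of one component of the learned
    shape+structure mixture density."""
    mu_s, mu_t = mu[:n_shape], mu[n_shape:]
    S_ss = Sigma[:n_shape, :n_shape]       # shape-shape covariance
    S_ts = Sigma[n_shape:, :n_shape]       # structure-shape covariance
    return mu_t + S_ts @ np.linalg.solve(S_ss, shape_obs - mu_s)
```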


Proceedings ArticleDOI
16 Jun 2003
TL;DR: A probabilistic tracking framework that combines sound and vision to achieve more robust and accurate tracking of multiple objects and accurately reflects the number of people present is presented.
Abstract: In this paper, we present a probabilistic tracking framework that combines sound and vision to achieve more robust and accurate tracking of multiple objects. In a cluttered or noisy scene, our measurements have a non-Gaussian, multi-modal distribution. We apply a particle filter to track multiple people using combined audio and video observations. We have applied our algorithm to the domain of tracking people with a stereo-based visual foreground detection algorithm and audio localization using a beamforming technique. Our model also accurately reflects the number of people present. We test the efficacy of our system on a sequence of multiple people moving and speaking in an indoor environment.

40 citations
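
A bare-bones version of one filter step, assuming independent audio and video likelihood functions over candidate positions (a sketch, not the authors' implementation):

```python
import numpy as np

def particle_filter_step(particles, weights, audio_lik, video_lik,
                         motion_std=0.05, rng=np.random):
    """One step of a particle filter over person positions: diffuse
    particles with a random-walk motion model, reweight by the product
    of (assumed independent) audio and video likelihoods, and resample.
    audio_lik/video_lik map an (N, d) particle array to N likelihood
    values, e.g. from beamforming and stereo foreground detection."""
    particles = particles + motion_std * rng.standard_normal(particles.shape)
    weights = weights * audio_lik(particles) * video_lik(particles)
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```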


Proceedings ArticleDOI
05 Apr 2003
TL;DR: New algorithms for passive, real-time articulated tracking with standard cameras and personal computers are presented, and several different interaction styles are compared, based on an analysis of the space of possible perceptual interface abstractions for full-body navigation and the results of a wizard-of-oz study of user preferences.
Abstract: Navigating virtual environments usually requires a wired interface, game console, or keyboard. The advent of perceptual interface techniques allows a new option: the passive and untethered sensing of users' pose and gesture to allow them to maneuver through virtual worlds. We show new algorithms for passive, real-time articulated tracking with standard cameras and personal computers. Several different interaction styles are compared, based on an analysis of the space of possible perceptual interface abstractions for full-body navigation and the results of a wizard-of-oz study of user preferences. In this demo we show our prototype system with users guiding avatars through a series of 3-D virtual game worlds.

39 citations


Proceedings ArticleDOI
05 Nov 2003
TL;DR: A simple probabilistic framework that combines multiple cues derived from both audio and video information that provides a more robust solution than using any single cue alone is presented.
Abstract: This paper presents a multi-modal approach to locate a speaker in a scene and determine to whom he or she is speaking. We present a simple probabilistic framework that combines multiple cues derived from both audio and video information. A purely visual cue is obtained using a head tracker to identify possible speakers in a scene and provide both their 3-D positions and orientations. In addition, estimates of the audio signal's direction of arrival are obtained with the help of a two-element microphone array. A third cue measures the association between the audio and the tracked regions in the video. Integrating these cues provides a more robust solution than using any single cue alone. The usefulness of our approach is shown in our results for video sequences with two or more people in a prototype interactive kiosk environment.

37 citations
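
A toy version of the cue combination, assuming each cue supplies a per-candidate log-likelihood (function names are hypothetical; the paper's actual model may weight the cues differently):

```python
import numpy as np

def pick_speaker(tracked_heads, doa_loglik, assoc_loglik, visual_loglik):
    """Naive-Bayes-style cue fusion: for each tracked head, sum the
    log-likelihoods that the audio direction of arrival, the
    audio/video association score, and the visual cue each assign to
    that person being the current speaker, then pick the
    best-supported candidate."""
    scores = [doa_loglik(h) + assoc_loglik(h) + visual_loglik(h)
              for h in tracked_heads]
    return int(np.argmax(scores)), scores
```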


Journal ArticleDOI
TL;DR: This component-based architecture creates presence applications using perceptual user interface widgets that automatically convey user states to a remote location or application without user input.
Abstract: Perceptive presence systems automatically convey user states to a remote location or application without user input. Our component-based architecture creates presence applications using perceptual user interface widgets.

16 citations
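
One way such a widget interface might look, as a purely hypothetical Python sketch (the abstract does not give the architecture's real API):

```python
class PresenceWidget:
    """Hypothetical perceptual UI widget: wraps one perceptual cue
    (e.g., face presence, gaze, motion) and pushes state changes to
    subscribers, so applications compose presence behavior from
    widgets rather than talking to sensors directly."""

    def __init__(self):
        self._callbacks = []
        self._state = None

    def on_change(self, callback):
        self._callbacks.append(callback)

    def publish(self, new_state):
        # A sensing pipeline would call this when the user state
        # (present/absent, attentive, etc.) changes.
        if new_state != self._state:
            self._state = new_state
            for cb in self._callbacks:
                cb(new_state)

# Usage (hypothetical): forward desk presence to a remote peer.
# widget = PresenceWidget()
# widget.on_change(lambda state: send_to_remote("desk_presence", state))
```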


Proceedings ArticleDOI
16 Jun 2003
TL;DR: This work describes new algorithms for interacting with 3-D environments using real-time articulated body tracking with standard cameras and personal computers; the method is based on rigid stereo-motion estimation algorithms and uses a linear technique for enforcing articulation constraints.
Abstract: Navigating virtual environments usually requires a wired interface, game console, or keyboard. The advent of perceptual interface techniques allows a new option: the passive and untethered sensing of users' pose and gesture to allow them to maneuver through and manipulate virtual worlds. We describe new algorithms for interacting with 3-D environments using real-time articulated body tracking with standard cameras and personal computers. Our method is based on rigid stereo-motion estimation algorithms and uses a linear technique for enforcing articulation constraints. With our tracking system users can navigate virtual environments using 3-D gesture and body poses. We analyze the space of possible perceptual interface abstractions for full-body navigation, and present a prototype system based on these results. We finally describe an initial evaluation of our prototype system with users guiding avatars through a series of 3-D virtual game worlds.

12 citations
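
The articulation-constraint step admits a simple linear reading: independently estimated per-limb twists are projected onto the subspace where connected limbs predict the same velocity at their shared joint. A sketch under that assumption, with twists expressed in a common frame (not necessarily the paper's exact formulation):

```python
import numpy as np

def skew(w):
    """Cross-product matrix: skew(w) @ p == np.cross(w, p)."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def enforce_joint_constraints(twists, pairs, joints):
    """Project independently estimated per-limb twists (v, w) onto the
    subspace where connected limbs agree at their shared joints, i.e.
    v_a + w_a x p == v_b + w_b x p for each joint p linking limbs
    (a, b). Solved as min ||x - x_hat||^2 s.t. A x = 0 via the KKT
    system."""
    n = len(twists)
    x_hat = np.concatenate([np.concatenate(t) for t in twists])  # 6n
    rows = []
    for (a, b), p in zip(pairs, joints):
        # velocity of limb i at p: v_i + w_i x p = v_i - skew(p) @ w_i
        row = np.zeros((3, 6 * n))
        row[:, 6*a:6*a+3] = np.eye(3)
        row[:, 6*a+3:6*a+6] = -skew(p)
        row[:, 6*b:6*b+3] = -np.eye(3)
        row[:, 6*b+3:6*b+6] = skew(p)
        rows.append(row)
    A = np.vstack(rows)
    lam = np.linalg.solve(A @ A.T, A @ x_hat)
    return (x_hat - A.T @ lam).reshape(n, 6)
```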


Proceedings ArticleDOI
05 Nov 2003
TL;DR: This research focuses on a module that provides parameterized gesture recognition using various machine learning techniques: a support vector classifier is trained to model the boundary of the space of possible gestures, and hidden Markov models are trained on specific gestures.
Abstract: Humans use a combination of gesture and speech to convey meaning, and usually do so without holding a device or pointer. We present a system that incorporates body tracking and gesture recognition for an untethered human-computer interface. This research focuses on a module that provides parameterized gesture recognition, using various machine learning techniques. We train a support vector classifier to model the boundary of the space of possible gestures, and train hidden Markov models on specific gestures. Given a sequence, we can find the start and end of various gestures using the support vector classifier, and find gesture likelihoods and parameters with an HMM. Finally, multimodal recognition is performed using rank-order fusion to merge speech and vision hypotheses.
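
A minimal sketch of the described pipeline using off-the-shelf components (scikit-learn's one-class SVM for the gesture boundary and hmmlearn for the per-gesture models; the data variables are assumed inputs):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from hmmlearn.hmm import GaussianHMM

def train_models(gesture_frames, training_sequences):
    """gesture_frames: (N, d) array of feature vectors drawn from real
    gestures; training_sequences: dict mapping gesture name to a list
    of (T_i, d) feature sequences."""
    # Boundary model: flags which frames of a new sequence look
    # gesture-like, yielding candidate start/end points.
    boundary = OneClassSVM(nu=0.1, gamma="scale").fit(gesture_frames)
    # One HMM per gesture class for likelihoods and parameters.
    hmms = {name: GaussianHMM(n_components=4).fit(
                np.vstack(seqs), [len(s) for s in seqs])
            for name, seqs in training_sequences.items()}
    return boundary, hmms

def classify_segment(segment, hmms):
    """Score a candidate segment under every gesture HMM and return
    the best label with its log-likelihood."""
    scores = {name: h.score(segment) for name, h in hmms.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```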

Proceedings ArticleDOI
06 Jul 2003
TL;DR: A method that learns a joint audio-visual appearance model of a moving subject without hand initialization, using only the associated audio signal to "decide" which object to model and track, demonstrated on a human speaker moving in a scene.
Abstract: Objects of interest are rarely silent or invisible. Analysis of multi-modal signal generation from a single object represents a rich and challenging area for smart sensor arrays. We consider the problem of simultaneously learning an audio and visual appearance model of a moving subject. We present a method which successfully learns such a model without benefit of hand initialization, using only the associated audio signal to "decide" which object to model and track. We are interested in particular in modeling joint audio and video variation, such as produced by a speaking face. We present an algorithm and experimental results for a human speaker moving in a scene.
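
The abstract does not give the algorithm's details; one hedged guess at the audio-driven selection step is to score candidate moving regions by how well their appearance change correlates with the audio energy over time:

```python
import numpy as np

def pick_av_object(region_signals, audio_energy):
    """Audio-guided model selection: among candidate moving regions,
    choose the one whose appearance-change signal correlates best with
    audio energy over time, i.e. let the audio 'decide' which object
    to model and track. region_signals: (n_regions, T) array of
    per-frame pixel-change energy; audio_energy: (T,) array."""
    a = (audio_energy - audio_energy.mean()) / audio_energy.std()
    corrs = []
    for r in region_signals:
        rz = (r - r.mean()) / r.std()
        corrs.append(np.mean(rz * a))
    return int(np.argmax(corrs)), corrs
```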