
Showing papers by "Trevor Darrell" published in 2000


Journal ArticleDOI
TL;DR: This work combines stereo, color, and face detection modules into a single robust system, shows an initial application in an interactive, face-responsive display, and discusses the failure modes of each individual module.
Abstract: We present an approach to real-time person tracking in crowded and/or unknown environments using integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays within the field of view), medium-term (user exits/reenters within minutes), and long term (user returns after hours or days). Short-term tracking is performed using simple region position and size correspondences, while medium and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.
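As an illustration of one stage of this pipeline, below is a minimal sketch (not the authors' implementation) of skin-hue classification restricted to a stereo-derived user silhouette. The rg-chromaticity skin model and its thresholds are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of skin-hue classification inside a stereo-derived user
# silhouette. The rg-chromaticity box and thresholds are illustrative
# assumptions, not the paper's trained skin model.
import numpy as np

def skin_mask(rgb: np.ndarray, silhouette: np.ndarray) -> np.ndarray:
    """Return a boolean mask of likely skin pixels within the silhouette.

    rgb        -- H x W x 3 uint8 image
    silhouette -- H x W boolean mask from dense stereo segmentation
    """
    img = rgb.astype(np.float32)
    total = img.sum(axis=2) + 1e-6           # avoid divide-by-zero
    r = img[..., 0] / total                  # normalized red chromaticity
    g = img[..., 1] / total                  # normalized green chromaticity
    # Illustrative skin-hue box in rg-chromaticity space.
    skin = (r > 0.35) & (r < 0.55) & (g > 0.25) & (g < 0.40)
    return skin & silhouette
```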

435 citations


Proceedings Article
01 Jan 2000
TL;DR: First, the data is projected into a maximally informative, low-dimensional subspace, suitable for density estimation, and the complicated stochastic relationships between the signals are modeled using a nonparametric density estimator.
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a low-level, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
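The two-stage idea (project each modality to a low-dimensional subspace, then fit a nonparametric joint density) can be sketched as follows. The paper learns maximally informative projections; purely for illustration, a simple covariance-maximizing projection stands in here, and scipy's Gaussian kernel density estimator plays the role of the nonparametric model. All names are illustrative.

```python
# A minimal sketch of the two-stage idea: project each modality to 1-D,
# then fit a nonparametric (kernel) estimate of the joint density.
# The projection below maximizes cross-covariance, not the paper's
# mutual-information criterion; it is a stand-in for illustration.
import numpy as np
from scipy.stats import gaussian_kde

def fit_joint_density(audio_feats, video_feats):
    """audio_feats: (N, Da), video_feats: (N, Dv) time-aligned features."""
    A = audio_feats - audio_feats.mean(0)
    V = video_feats - video_feats.mean(0)
    u, _, vt = np.linalg.svd(A.T @ V, full_matrices=False)
    a = A @ u[:, 0]                          # 1-D audio projection
    v = V @ vt[0, :]                         # 1-D video projection
    return gaussian_kde(np.vstack([a, v]))   # nonparametric joint density

# Usage: kde = fit_joint_density(audio, video); kde(points) with points of
# shape (2, M) evaluates the estimated joint density p(a, v).
```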

226 citations


Book ChapterDOI
14 Oct 2000
TL;DR: It is shown how audio utterances from several speakers recorded with a single microphone can be separated into constituent streams, and how the method can help reduce the effect of noise in automatic speech recognition.
Abstract: Audio-based interfaces usually suffer when noise or other acoustic sources are present in the environment. For robust audio recognition, a single source must first be isolated. Existing solutions to this problem generally require special microphone configurations, and often assume prior knowledge of the spurious sources. We have developed new algorithms for segmenting streams of audio-visual information into their constituent sources by exploiting the mutual information present between audio and visual tracks. Automatic face recognition and image motion analysis methods are used to generate visual features for a particular user; empirically these features have high mutual information with audio recorded from that user. We show how audio utterances from several speakers recorded with a single microphone can be separated into constituent streams; we also show how the method can help reduce the effect of noise in automatic speech recognition.
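A minimal sketch of the association step follows: score each on-screen face by the mutual information between its visual motion signal and the audio envelope, and assign the audio to the best-scoring face. The paper's estimator is nonparametric; a closed-form joint-Gaussian score, I(a; v) = -0.5 * log(1 - rho^2), is used here purely for illustration, and feature extraction is assumed to have been done.

```python
# A minimal sketch of audio-to-face association by mutual information.
# The joint-Gaussian MI formula below is a simplifying assumption; the
# paper uses nonparametric density estimates instead.
import numpy as np

def gaussian_mi(a: np.ndarray, v: np.ndarray) -> float:
    """MI (nats) between two 1-D signals under a joint-Gaussian model."""
    rho = np.corrcoef(a, v)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2 + 1e-12)

def pick_speaker(audio_envelope, motion_per_face):
    """motion_per_face: list of 1-D motion-energy signals, one per face."""
    scores = [gaussian_mi(audio_envelope, m) for m in motion_per_face]
    return int(np.argmax(scores))  # index of the face matching the audio
```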

54 citations


Proceedings ArticleDOI
01 Jun 2000
TL;DR: This paper addresses several important issues in the formation of the constraint equations, including updating the body rotation matrix without using a first-order matrix approximation and removing the coupling between the rotation and translation updates.
Abstract: This paper explores several approaches for articulated-pose estimation, assuming that video-rate depth information is available, from either stereo cameras or other sensors. We use these depth measurements in the traditional linear brightness constraint equation, as well as in a depth constraint equation. To capture the joint constraints, we combine the brightness and depth constraints with twist mathematics. We address several important issues in the formation of the constraint equations, including updating the body rotation matrix without using a first-order matrix approximation and removing the coupling between the rotation and translation updates. The resulting constraint equations are linear in a modified parameter set. After solving these linear constraints, a single closed-form non-linear transformation returns the updates to the original pose parameters. We show results for tracking body pose in oblique views of synthetic walking sequences and in moving-camera views of synthetic jumping-jack sequences. We also show results for tracking body pose in side views of a real walking sequence.
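The exact rotation update the abstract alludes to can be made concrete. A minimal sketch, assuming an axis-angle increment w has been estimated from the linearized constraints (variable names are illustrative, not from the paper): the update applies exp([w]_x) via Rodrigues' formula rather than the first-order approximation I + [w]_x.

```python
# A minimal sketch of an exact rotation update: R <- exp([w]_x) R computed
# with Rodrigues' formula, avoiding the first-order I + [w]_x approximation.
import numpy as np

def hat(w: np.ndarray) -> np.ndarray:
    """Skew-symmetric matrix [w]_x such that [w]_x @ v = w x v."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def update_rotation(R: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Exact update of rotation matrix R by axis-angle increment w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return R
    K = hat(w / theta)
    expw = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return expw @ R
```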

30 citations


01 Jan 2000
TL;DR: A tracking system will need to automatically initialize a model from the video, track persons for long periods of time (possibly tens of minutes), and be able to recover if it loses track.
Abstract: Motivation: The ability to recover this information (articulated body pose) would be extremely useful in applications such as virtual reality, remote human identification (gait analysis), non-intrusive medical diagnostics, and others. The common way to model the human body for this purpose is a kinematic tree parametrized by the sizes of the limbs and the joint angles. The tracking system will need to automatically initialize a model from the video, track persons for long periods of time (possibly tens of minutes), and be able to recover if it loses track.
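A minimal sketch of such a kinematic-tree body model, parametrized by limb lengths and joint angles, with forward kinematics chaining transforms from a root. Planar (2-D) joints are an illustrative simplification of the full model described above.

```python
# A minimal sketch of a kinematic tree: each limb carries a length and a
# joint angle relative to its parent; forward kinematics chains 2-D
# transforms from the root. Planar joints simplify the full 3-D model.
import numpy as np

class Limb:
    def __init__(self, length, angle, children=()):
        self.length = length      # limb size parameter
        self.angle = angle        # joint angle (radians, relative to parent)
        self.children = list(children)

def joint_positions(limb, origin=np.zeros(2), parent_angle=0.0, out=None):
    """Collect world-frame endpoint positions of every limb in the tree."""
    if out is None:
        out = []
    a = parent_angle + limb.angle
    end = origin + limb.length * np.array([np.cos(a), np.sin(a)])
    out.append(end)
    for child in limb.children:
        joint_positions(child, end, a, out)
    return out

# Example: a torso with two single-segment arms.
body = Limb(0.5, np.pi / 2, [Limb(0.3, +2.0), Limb(0.3, -2.0)])
```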

01 Jan 2000
TL;DR: The system demonstrates the capabilities of a solely vision-based approach to tracking and understanding people and their movements.
Abstract: Motivation: Systems that can track and understand people have a wide variety of commercial applications. Computers of the future are predicted to interact more naturally with humans than they do now: instead of the desktop paradigm, in which humans communicate by typing, they will understand human speech and movements. Our system demonstrates the capabilities of a solely vision-based system for these ends.