scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A Bayesian computer vision system for modeling human interactions

TL;DR: A real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task and demonstrates the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.
Abstract: We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task. The system deals in particularly with detecting when interactions between people occur and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both components employing a statistical Bayesian approach. We propose and compare two different state-based learning architectures, namely, HMMs and CHMMs for modeling behaviors and interactions. Finally, a synthetic "Alife-style" training system is used to develop flexible prior models for recognizing human interactions. We demonstrate the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends to discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

5,318 citations


Cites background or methods from "A Bayesian computer vision system f..."

  • ...Results of two point correspondence algorithms. (a) Tracking using the algorithm proposed by Veenman et al. [2001] in the rotating dish sequence color segmentation was used to detect black dots on a white dish ( c ©2001 IEEE)....

    [...]

  • ...Background Modeling Mixture of Gaussians[Stauffer and Grimson 2000], Eigenbackground[Oliver et al. 2000], Wall flower [Toyama et al....

    [...]

  • ...Background Modeling Mixture of Gaussians[Stauffer and Grimson 2000], Eigenbackground[Oliver et al. 2000], Wall .ower [Toyama et al. 1999], Dynamic texture background [Monnet et al. 2003]....

    [...]

Journal ArticleDOI
TL;DR: This paper focuses on motion tracking and shows how one can use observed motion to learn patterns of activity in a site and create a hierarchical binary-tree classification of the representations within a sequence.
Abstract: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.

3,631 citations


Cites background from "A Bayesian computer vision system f..."

  • ...Changes in scene lighting can cause problems for many backgrounding methods....

    [...]

Journal ArticleDOI
TL;DR: The context for socially interactive robots is discussed, emphasizing the relationship to other research fields and the different forms of “social robots”, and a taxonomy of design methods and system components used to build socially interactive Robots is presented.

2,869 citations

Journal ArticleDOI
TL;DR: This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.

2,738 citations


Cites background from "A Bayesian computer vision system f..."

  • ...Oliver et al. [270] also use a pixel’s neighbors to represent it....

    [...]

  • ...Year First author Initialisation Tracking Pose estimation Recognition 2000 Barron [26] 2000 Buades [45] 2000 Chang * [56] * 2000 Davis [84] 2000 Deutscher * [90] 2000 Felzenszwalb [104] 2000 Haritaoglu * [137] * * 2000 Howe * [153] 2000 Ivanov * [170] 2000 Karaulova * * [188] 2000 Khan * [193] 2000 Oliver [270] 2000 Ormoneit [274] * * 2000 Ricquebourg [304] * 2000 Stauffer * [355] 2000 Takahashi [359] * 2000 Taylor [361] * 2000 Trivedi [367] 2000 Trivedi * [368] 2000 Zhao [417] ∑ Total=20 2 8 8 2...

    [...]

  • ...[270] also use a pixel’s neighbors to represent it....

    [...]

Proceedings ArticleDOI
10 Oct 2004
TL;DR: A review of the main methods and an original categorisation based on speed, memory requirements and accuracy can effectively guide the designer to select the most suitable method for a given application in a principled way.
Abstract: Background subtraction is a widely used approach for detecting moving objects from static cameras. Many different methods have been proposed over the recent years and both the novice and the expert can be confused about their benefits and limitations. In order to overcome this problem, this paper provides a review of the main methods and an original categorisation based on speed, memory requirements and accuracy. Such a review can effectively guide the designer to select the most suitable method for a given application in a principled way. Methods reviewed include parametric and non-parametric background density estimates and spatial correlation approaches.

2,346 citations


Cites background from "A Bayesian computer vision system f..."

  • ...In [13], however, it is not explicitly specified what images should be part of the initial sample, and whether and how such a model should be updated over time....

    [...]

  • ...In [13], however, the authors report good results with lower computational load than a Mixture of Ganssians...

    [...]

References
More filters
Journal ArticleDOI
Lawrence R. Rabiner1
01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described. >

21,819 citations

Journal ArticleDOI
TL;DR: Pfinder is a real-time system for tracking people and interpreting their behavior that uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions.
Abstract: Pfinder is a real-time system for tracking people and interpreting their behavior. It runs at 10 Hz on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions. Pfinder has been successfully used in a wide range of applications including wireless interfaces, video databases, and low-bandwidth coding.

4,280 citations

Proceedings ArticleDOI
14 Oct 1996
TL;DR: Pfinder uses a multi-class statistical model of color and shape to obtain a 2-D representation of head and hands in a wide range of viewing conditions, useful for applications such as wireless interfaces, video databases, and low-bandwidth coding.
Abstract: Pfinder is a real-time system for tracking and interpretation of people. It runs on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multi-class statistical model of color and shape to obtain a 2-D representation of head and hands in a wide range of viewing conditions. These representations are useful for applications such as wireless interfaces, video databases, and low-bandwidth coding, without cumbersome wires or attached sensors.

1,951 citations

Journal ArticleDOI
27 Nov 1995
TL;DR: A generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner, and a structured approximation in which the the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model.
Abstract: Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable—the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward–backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach‘s chorales and show that factorial HMMs can capture statistical structure in this data set which an unconstrained HMM cannot.

1,384 citations

Proceedings ArticleDOI
17 Jun 1997
TL;DR: Algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions are presented.
Abstract: We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm and a clear Bayesian semantics. However the Markovian framework makes strong restrictive assumptions about the system generating the signal-that it is a single process having a small number of states and an extremely limited state memory. The single-process model is often inappropriate for vision (and speech) applications, resulting in low ceilings on model performance. Coupled HMMs provide an efficient way to resolve many of these problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions.

1,181 citations


"A Bayesian computer vision system f..." refers background in this paper

  • ...…without addi tional training Another potential problem arises when a com pletely new pattern of behavior is presented to the system After the system has been trained at a few di erent sites previously unobserved behaviors will be by de nition rare and unusual To account for such novel…...

    [...]