Journal ArticleDOI

Video object tracking using adaptive Kalman filter

01 Dec 2006-Journal of Visual Communication and Image Representation (Academic Press, Inc.)-Vol. 17, Iss: 6, pp 1190-1208
TL;DR: The proposed method has the robust ability to track the moving object in consecutive frames under several kinds of real-world complex situations, such as the moving object disappearing totally or partially due to occlusion by other objects, fast object motion, changing lighting, changes in the direction and orientation of the moving object, and sudden changes in the object's velocity.
About: This article is published in Journal of Visual Communication and Image Representation. The article was published on 2006-12-01. It has received 314 citations till now. The article focuses on the topics: Video tracking & Kalman filter.
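To illustrate the core idea of Kalman-filter tracking through occlusion, here is a minimal sketch: a constant-velocity filter that coasts through occluded frames by running only the prediction step. This is an assumption-laden toy, not the paper's actual adaptive scheme (which tunes the noise parameters from occlusion estimates); the function name, state layout, and `q`/`r` values are all illustrative.

```python
# Toy 1-D constant-velocity Kalman tracker (illustrative sketch only; the
# paper's adaptive filter tunes its noise covariances from occlusion
# estimates, which is simplified here to skipping the update step when no
# measurement is available).

def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Track position from noisy measurements; None marks an occluded frame."""
    x, v = 0.0, 0.0                       # state: position, velocity
    P = [[1.0, 0.0], [0.0, 1.0]]          # state covariance
    estimates = []
    for z in measurements:
        # --- predict: x <- A x, P <- A P A^T + Q, with A = [[1, dt], [0, 1]]
        x, v = x + dt * v, v
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # --- update: skipped while the object is occluded (z is None)
        if z is not None:
            S = P[0][0] + r                        # innovation covariance
            K0, K1 = P[0][0] / S, P[1][0] / S      # Kalman gain (H = [1, 0])
            y = z - x                              # innovation
            x, v = x + K0 * y, v + K1 * y
            P = [[(1 - K0) * P[0][0], (1 - K0) * P[0][1]],
                 [P[1][0] - K1 * P[0][0], P[1][1] - K1 * P[0][1]]]
        estimates.append(x)
    return estimates
```

During the `None` frames the estimate keeps moving at the last inferred velocity, which is why such trackers can reacquire a target after a short total occlusion.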
Citations
Posted Content
TL;DR: The Encoder-Recurrent-Decoder (ERD) model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers, extending previous Long Short Term Memory models in the literature to jointly learn representations and their dynamics.
Abstract: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drifting for long periods of time. For human pose labeling, ERD outperforms a per frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400ms and outperforms a first order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show such representation learning is crucial for both labeling and prediction in space-time. We find this is a distinguishing feature between the spatio-temporal visual domain in comparison to 1D text, speech or handwriting, where straightforward hard coded representations have shown excellent results when directly combined with recurrent units.

570 citations


Cites methods from "Video object tracking using adaptiv..."

  • ...Parametric temporal filters such as Kalman filtering [47], HMMs or Gaussian processes for activity specific dynamics [39, 19, 28] generally use simple, linear dynamics models for prediction....


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, the Encoder-Recurrent-Decoder (ERD) model is proposed for recognition and prediction of human body pose in videos and motion capture, which is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers.
Abstract: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drifting for long periods of time. For human pose labeling, ERD outperforms a per frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400ms and outperforms a first order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show such representation learning is crucial for both labeling and prediction in space-time. We find this is a distinguishing feature between the spatio-temporal visual domain in comparison to 1D text, speech or handwriting, where straightforward hard coded representations have shown excellent results when directly combined with recurrent units [31].

546 citations

Proceedings ArticleDOI
20 Jun 2010
TL;DR: A feature-based algorithm using a Kalman filter motion model is proposed to handle multiple-object tracking; results show that it achieves efficient tracking of multiple moving objects under confusing situations.
Abstract: It is important to maintain the identity of multiple targets while tracking them in some applications such as behavior understanding. However, unsatisfying tracking results may be produced under various real-time conditions, including inter-object occlusion, occlusion of objects by background obstacles, and splits and merges, all of which are observed when objects are tracked in real time. In this paper, a feature-based algorithm using a Kalman filter motion model is proposed to handle multiple-object tracking. The system is fully automatic and requires no manual input of any kind for initialization of tracking. By establishing a Kalman filter motion model with the centroid and area features of moving objects in a single fixed-camera monitoring scene, and using detection information to judge whether a merge or split has occurred, a cost function can be calculated to solve the correspondence problem after a split happens. The proposed algorithm is validated on human and vehicle image sequences. The results show that it achieves efficient tracking of multiple moving objects under confusing situations.
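The association step described in this abstract can be sketched as a cost function between each track's Kalman-predicted features and the new detections. The weights, greedy matching strategy, and function names below are assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical cost-function association step: each track's predicted
# (centroid, area) is matched greedily to the cheapest unused detection.
# The cost weights w_pos and w_area are illustrative, not from the paper.

def associate(predictions, detections, w_pos=1.0, w_area=0.01):
    """Greedily match track predictions to detections.

    predictions, detections: lists of (cx, cy, area) tuples.
    Returns a dict {track_index: detection_index}.
    """
    def cost(p, d):
        # Euclidean centroid distance plus a weighted area difference.
        return (w_pos * ((p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2) ** 0.5
                + w_area * abs(p[2] - d[2]))

    matches, used = {}, set()
    for ti, p in enumerate(predictions):
        best = min((di for di in range(len(detections)) if di not in used),
                   key=lambda di: cost(p, detections[di]), default=None)
        if best is not None:
            matches[ti] = best
            used.add(best)
    return matches
```

In practice, globally optimal assignment (e.g. the Hungarian algorithm) is usually preferred over the greedy pass shown here, since greedy matching can cascade errors when targets cross.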

185 citations


Cites methods from "Video object tracking using adaptiv..."

  • ...Kalman Filter for Multi-object Tracking Describing the object’s geometric features can include location, shape and center of mass (centroid)[10], etc....


Journal ArticleDOI
TL;DR: A heterogeneous association graph is constructed that fuses high-level detections and low-level image evidence for target association and the novel idea of adaptive weights is proposed to analyze the contribution between motion and appearance.
Abstract: Tracking-by-detection is one of the most popular approaches to tracking multiple objects in which the detector plays an important role. Sometimes, detector failures caused by occlusions or various poses are unavoidable and lead to tracking failure. To cope with this problem, we construct a heterogeneous association graph that fuses high-level detections and low-level image evidence for target association. Compared with other methods using low-level information, our proposed heterogeneous association fusion (HAF) tracker is less sensitive to particular parameters and is easier to extend and implement. We use the fused association graph to build track trees for HAF and solve them by the multiple hypotheses tracking framework, which has been proven to be competitive by introducing efficient pruning strategies. In addition, the novel idea of adaptive weights is proposed to analyze the contribution between motion and appearance. We also evaluated our results on the MOT challenge benchmarks and achieved state-of-the-art results on the MOT Challenge 2017.

116 citations


Cites background from "Video object tracking using adaptiv..."

  • ...Kalman filters [4], [5] and particle filters [6], [7] are widely applied in real tracking applications....


Proceedings Article
01 Jan 2016
TL;DR: This paper explores how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination").
Abstract: The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents. In this paper, we explore how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination"). Our models directly process raw visual input, and use a novel object-centric prediction formulation based on visual glimpses centered on objects (fixations) to enforce translational invariance of the learned physical laws. The agent gathers training data through random interaction with a collection of different environments, and the resulting model can then be used to plan goal-directed actions in novel environments that the agent has not seen before. We demonstrate that our agent can accurately plan actions for playing a simulated billiards game, which requires pushing a ball into a target position or into collision with another ball.

112 citations

References
BookDOI
29 Nov 1995
TL;DR: The discrete Kalman filter as mentioned in this paper is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error.
Abstract: In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. The purpose of this paper is to provide a practical introduction to the discrete Kalman filter. This introduction includes a description and some discussion of the basic discrete Kalman filter, a derivation, description and some discussion of the extended Kalman filter, and a relatively simple (tangible) example with real numbers & results.
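The recursion this introduction describes takes the standard two-phase form, a time update (predict) followed by a measurement update (correct):

```latex
% Discrete Kalman filter recursion (standard form, as in common
% introductory treatments such as the paper summarized above).
% Time update (predict):
\hat{x}_k^- = A\,\hat{x}_{k-1} + B\,u_{k-1}, \qquad
P_k^- = A\,P_{k-1}A^\top + Q
% Measurement update (correct):
K_k = P_k^- H^\top \left(H P_k^- H^\top + R\right)^{-1}, \qquad
\hat{x}_k = \hat{x}_k^- + K_k\,\bigl(z_k - H\,\hat{x}_k^-\bigr), \qquad
P_k = (I - K_k H)\,P_k^-
```

Here $A$ is the state-transition model, $H$ the measurement model, $Q$ and $R$ the process and measurement noise covariances, and $K_k$ the Kalman gain; the recursion minimizes the mean squared estimation error under linear-Gaussian assumptions.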

2,811 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of computer vision-based human motion capture literature from the past two decades is presented, with a general overview based on a taxonomy of system functionalities, broken down into four processes: initialization, tracking, pose estimation, and recognition.

1,917 citations

Proceedings ArticleDOI
19 Oct 1998
TL;DR: An end-to-end method for extracting moving targets from a real-time video stream, classifying them into predefined categories according to image-based properties, and then robustly tracking them is described.
Abstract: This paper describes an end-to-end method for extracting moving targets from a real-time video stream, classifying them into predefined categories according to image-based properties, and then robustly tracking them. Moving targets are detected using the pixel-wise difference between consecutive image frames. A classification metric is applied to these targets with a temporal consistency constraint to classify them into three categories: human, vehicle, or background clutter. Once classified, targets are tracked by a combination of temporal differencing and template matching. The resulting system robustly identifies targets of interest, rejects background clutter, and continually tracks over large distances and periods of time despite occlusions, appearance changes, and cessation of target motion.
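The pixel-wise frame-differencing step mentioned above can be sketched as follows. This is a toy version under simple assumptions (2-D lists of grayscale values, a hand-picked threshold, hypothetical function name); the real system layers temporal consistency, classification, and template matching on top:

```python
# Toy frame-differencing motion detector: flag every pixel whose grayscale
# intensity changed by more than `threshold` between consecutive frames.

def detect_motion(prev_frame, curr_frame, threshold=25):
    """Return (row, col) coordinates of pixels that changed significantly.

    Frames are 2-D lists of grayscale values in [0, 255].
    """
    moving = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(q - p) > threshold:
                moving.append((r, c))
    return moving
```

Production systems typically follow this with morphological filtering and connected-component grouping so that isolated noisy pixels are not reported as targets.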

1,278 citations

Journal ArticleDOI
TL;DR: A framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects to provide robustness in the face of image outliers, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.
Abstract: We propose a framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects. The model adapts to slowly changing appearance, and it maintains a natural measure of the stability of the observed image structure during tracking. By identifying stable properties of appearance, we can weight them more heavily for motion estimation, while less stable properties can be proportionately downweighted. The appearance model involves a mixture of stable image structure, learned over long time courses, along with two-frame motion information and an outlier process. An online EM-algorithm is used to adapt the appearance model parameters over time. An implementation of this approach is developed for an appearance model based on the filter responses from a steerable pyramid. This model is used in a motion-based tracking algorithm to provide robustness in the face of image outliers, such as those caused by occlusions, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.

1,142 citations

Book ChapterDOI
26 Jun 2000
TL;DR: A probabilistic method for tracking 3D articulated human figures in monocular image sequences that relies only on a frame-to-frame assumption of brightness constancy and hence is able to track people under changing viewpoints, in grayscale image sequences, and with complex unknown backgrounds.
Abstract: A probabilistic method for tracking 3D articulated human figures in monocular image sequences is presented. Within a Bayesian framework, we define a generative model of image appearance, a robust likelihood function based on image graylevel differences, and a prior probability distribution over pose and joint angles that models how humans move. The posterior probability distribution over model parameters is represented using a discrete set of samples and is propagated over time using particle filtering. The approach extends previous work on parameterized optical flow estimation to exploit a complex 3D articulated motion model. It also extends previous work on human motion tracking by including a perspective camera model, by modeling limb self occlusion, and by recovering 3D motion from a monocular sequence. The explicit posterior probability distribution represents ambiguities due to image matching, model singularities, and perspective projection. The method relies only on a frame-to-frame assumption of brightness constancy and hence is able to track people under changing viewpoints, in grayscale image sequences, and with complex unknown backgrounds.
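The sample-based posterior propagation described here is the bootstrap particle filter. A minimal 1-D sketch is shown below; this is an illustration only (the paper's state is a high-dimensional articulated pose with a learned motion prior, not a scalar), and the function name and noise parameters are assumptions:

```python
import math
import random

# Minimal 1-D bootstrap particle filter: propagate samples through a
# diffusion motion model, weight by a Gaussian observation likelihood,
# report the posterior mean, then resample.

def particle_filter(measurements, n=500, motion_std=0.5, obs_std=1.0, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in measurements:
        # Propagate each particle with random-walk motion noise.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight by the Gaussian likelihood of the measurement.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior mean estimate, then multinomial resampling.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates
```

Unlike a Kalman filter, the sample set can represent the multi-modal ambiguities the abstract mentions (image matching, model singularities, perspective projection), at the cost of needing many particles as state dimension grows.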

692 citations