Journal ArticleDOI

Video object tracking using adaptive Kalman filter

01 Dec 2006-Journal of Visual Communication and Image Representation (Academic Press, Inc.)-Vol. 17, Iss: 6, pp 1190-1208
TL;DR: The proposed method has the robust ability to track the moving object in consecutive frames under several kinds of real-world complex situations, such as the moving object disappearing totally or partially due to occlusion by other objects, fast object motion, changing lighting, changes in the direction and orientation of the moving object, and sudden changes in the object's velocity.
About: This article is published in Journal of Visual Communication and Image Representation. The article was published on 2006-12-01. It has received 314 citations till now. The article focuses on the topics: Video tracking & Kalman filter.
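To illustrate the core idea of Kalman-filter tracking through occlusion, here is a minimal sketch: a constant-velocity filter that coasts through occluded frames by running only the prediction step. This is an assumption-laden toy, not the paper's actual adaptive scheme (which tunes the noise parameters from occlusion estimates); the function name, state layout, and `q`/`r` values are all illustrative.

```python
# Toy 1-D constant-velocity Kalman tracker (illustrative sketch only; the
# paper's adaptive filter tunes its noise covariances from occlusion
# estimates, which is simplified here to skipping the update step when no
# measurement is available).

def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Track position from noisy measurements; None marks an occluded frame."""
    x, v = 0.0, 0.0                       # state: position, velocity
    P = [[1.0, 0.0], [0.0, 1.0]]          # state covariance
    estimates = []
    for z in measurements:
        # --- predict: x <- A x, P <- A P A^T + Q, with A = [[1, dt], [0, 1]]
        x, v = x + dt * v, v
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # --- update: skipped while the object is occluded (z is None)
        if z is not None:
            S = P[0][0] + r                        # innovation covariance
            K0, K1 = P[0][0] / S, P[1][0] / S      # Kalman gain (H = [1, 0])
            y = z - x                              # innovation
            x, v = x + K0 * y, v + K1 * y
            P = [[(1 - K0) * P[0][0], (1 - K0) * P[0][1]],
                 [P[1][0] - K1 * P[0][0], P[1][1] - K1 * P[0][1]]]
        estimates.append(x)
    return estimates
```

During the `None` frames the estimate keeps moving at the last inferred velocity, which is why such trackers can reacquire a target after a short total occlusion.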
Citations
Posted Content
TL;DR: The Encoder-Recurrent-Decoder (ERD) model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers, extending previous Long Short Term Memory models in the literature to jointly learn representations and their dynamics.
Abstract: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drifting for long periods of time. For human pose labeling, ERD outperforms a per frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400ms and outperforms a first order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show such representation learning is crucial for both labeling and prediction in space-time. We find this is a distinguishing feature between the spatio-temporal visual domain in comparison to 1D text, speech or handwriting, where straightforward hard coded representations have shown excellent results when directly combined with recurrent units.

570 citations


Cites methods from "Video object tracking using adaptiv..."

  • ...Parametric temporal filters such as Kalman filtering [47], HMMs or Gaussian processes for activity specific dynamics [39, 19, 28] generally use simple, linear dynamics models for prediction....


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, the Encoder-Recurrent-Decoder (ERD) model is proposed for recognition and prediction of human body pose in videos and motion capture, which is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers.
Abstract: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drifting for long periods of time. For human pose labeling, ERD outperforms a per frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400ms and outperforms a first order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show such representation learning is crucial for both labeling and prediction in space-time. We find this is a distinguishing feature between the spatio-temporal visual domain in comparison to 1D text, speech or handwriting, where straightforward hard coded representations have shown excellent results when directly combined with recurrent units [31].

546 citations

Proceedings ArticleDOI
20 Jun 2010
TL;DR: A feature-based algorithm using a Kalman filter motion model is proposed to handle multiple-object tracking; results show that it achieves efficient tracking of multiple moving objects under confusing situations.
Abstract: It is important to maintain the identity of multiple targets while tracking them in some applications such as behavior understanding. However, unsatisfying tracking results may be produced under various real-time conditions, including inter-object occlusion, occlusion of objects by background obstacles, and splits and merges, all of which are observed when objects are tracked in real time. In this paper, a feature-based algorithm using a Kalman filter motion model is proposed to handle multiple-object tracking. The system is fully automatic and requires no manual input of any kind for initialization of tracking. By establishing a Kalman filter motion model with the centroid and area features of moving objects in a single fixed-camera monitoring scene, and using detection information to judge whether a merge or split has occurred, a cost function can be calculated to solve the correspondence problem after a split happens. The proposed algorithm is validated on human and vehicle image sequences. The results show that it achieves efficient tracking of multiple moving objects under confusing situations.
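The association step described in this abstract can be sketched as a cost function between each track's Kalman-predicted features and the new detections. The weights, greedy matching strategy, and function names below are assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical cost-function association step: each track's predicted
# (centroid, area) is matched greedily to the cheapest unused detection.
# The cost weights w_pos and w_area are illustrative, not from the paper.

def associate(predictions, detections, w_pos=1.0, w_area=0.01):
    """Greedily match track predictions to detections.

    predictions, detections: lists of (cx, cy, area) tuples.
    Returns a dict {track_index: detection_index}.
    """
    def cost(p, d):
        # Euclidean centroid distance plus a weighted area difference.
        return (w_pos * ((p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2) ** 0.5
                + w_area * abs(p[2] - d[2]))

    matches, used = {}, set()
    for ti, p in enumerate(predictions):
        best = min((di for di in range(len(detections)) if di not in used),
                   key=lambda di: cost(p, detections[di]), default=None)
        if best is not None:
            matches[ti] = best
            used.add(best)
    return matches
```

In practice, globally optimal assignment (e.g. the Hungarian algorithm) is usually preferred over the greedy pass shown here, since greedy matching can cascade errors when targets cross.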

185 citations


Cites methods from "Video object tracking using adaptiv..."

  • ...Kalman Filter for Multi-object Tracking Describing the object’s geometric features can include location, shape and center of mass (centroid)[10], etc....


Journal ArticleDOI
TL;DR: A heterogeneous association graph is constructed that fuses high-level detections and low-level image evidence for target association and the novel idea of adaptive weights is proposed to analyze the contribution between motion and appearance.
Abstract: Tracking-by-detection is one of the most popular approaches to tracking multiple objects in which the detector plays an important role. Sometimes, detector failures caused by occlusions or various poses are unavoidable and lead to tracking failure. To cope with this problem, we construct a heterogeneous association graph that fuses high-level detections and low-level image evidence for target association. Compared with other methods using low-level information, our proposed heterogeneous association fusion (HAF) tracker is less sensitive to particular parameters and is easier to extend and implement. We use the fused association graph to build track trees for HAF and solve them by the multiple hypotheses tracking framework, which has been proven to be competitive by introducing efficient pruning strategies. In addition, the novel idea of adaptive weights is proposed to analyze the contribution between motion and appearance. We also evaluated our results on the MOT challenge benchmarks and achieved state-of-the-art results on the MOT Challenge 2017.

116 citations


Cites background from "Video object tracking using adaptiv..."

  • ...Kalman filters [4], [5] and particle filters [6], [7] are widely applied in real tracking applications....


Proceedings Article
01 Jan 2016
TL;DR: This paper explores how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination").
Abstract: The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents. In this paper, we explore how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination"). Our models directly process raw visual input, and use a novel object-centric prediction formulation based on visual glimpses centered on objects (fixations) to enforce translational invariance of the learned physical laws. The agent gathers training data through random interaction with a collection of different environments, and the resulting model can then be used to plan goal-directed actions in novel environments that the agent has not seen before. We demonstrate that our agent can accurately plan actions for playing a simulated billiards game, which requires pushing a ball into a target position or into collision with another ball.

112 citations

References
BookDOI
29 Nov 1995
TL;DR: The discrete Kalman filter as mentioned in this paper is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error.
Abstract: In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. The purpose of this paper is to provide a practical introduction to the discrete Kalman filter. This introduction includes a description and some discussion of the basic discrete Kalman filter, a derivation, description and some discussion of the extended Kalman filter, and a relatively simple (tangible) example with real numbers & results.
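The recursion this introduction describes takes the standard two-phase form, a time update (predict) followed by a measurement update (correct):

```latex
% Discrete Kalman filter recursion (standard form, as in common
% introductory treatments such as the paper summarized above).
% Time update (predict):
\hat{x}_k^- = A\,\hat{x}_{k-1} + B\,u_{k-1}, \qquad
P_k^- = A\,P_{k-1}A^\top + Q
% Measurement update (correct):
K_k = P_k^- H^\top \left(H P_k^- H^\top + R\right)^{-1}, \qquad
\hat{x}_k = \hat{x}_k^- + K_k\,\bigl(z_k - H\,\hat{x}_k^-\bigr), \qquad
P_k = (I - K_k H)\,P_k^-
```

Here $A$ is the state-transition model, $H$ the measurement model, $Q$ and $R$ the process and measurement noise covariances, and $K_k$ the Kalman gain; the recursion minimizes the mean squared estimation error under linear-Gaussian assumptions.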

2,811 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of computer vision-based human motion capture literature from the past two decades is presented, with a general overview based on a taxonomy of system functionalities, broken down into four processes: initialization, tracking, pose estimation, and recognition.

1,917 citations

Proceedings ArticleDOI
19 Oct 1998
TL;DR: An end-to-end method for extracting moving targets from a real-time video stream, classifying them into predefined categories according to image-based properties, and then robustly tracking them is described.
Abstract: This paper describes an end-to-end method for extracting moving targets from a real-time video stream, classifying them into predefined categories according to image-based properties, and then robustly tracking them. Moving targets are detected using the pixel-wise difference between consecutive image frames. A classification metric is applied to these targets with a temporal consistency constraint to classify them into three categories: human, vehicle, or background clutter. Once classified, targets are tracked by a combination of temporal differencing and template matching. The resulting system robustly identifies targets of interest, rejects background clutter, and continually tracks over large distances and periods of time despite occlusions, appearance changes, and cessation of target motion.
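The pixel-wise frame-differencing step mentioned above can be sketched as follows. This is a toy version under simple assumptions (2-D lists of grayscale values, a hand-picked threshold, hypothetical function name); the real system layers temporal consistency, classification, and template matching on top:

```python
# Toy frame-differencing motion detector: flag every pixel whose grayscale
# intensity changed by more than `threshold` between consecutive frames.

def detect_motion(prev_frame, curr_frame, threshold=25):
    """Return (row, col) coordinates of pixels that changed significantly.

    Frames are 2-D lists of grayscale values in [0, 255].
    """
    moving = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(q - p) > threshold:
                moving.append((r, c))
    return moving
```

Production systems typically follow this with morphological filtering and connected-component grouping so that isolated noisy pixels are not reported as targets.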

1,278 citations

Journal ArticleDOI
TL;DR: A framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects to provide robustness in the face of image outliers, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.
Abstract: We propose a framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects. The model adapts to slowly changing appearance, and it maintains a natural measure of the stability of the observed image structure during tracking. By identifying stable properties of appearance, we can weight them more heavily for motion estimation, while less stable properties can be proportionately downweighted. The appearance model involves a mixture of stable image structure, learned over long time courses, along with two-frame motion information and an outlier process. An online EM-algorithm is used to adapt the appearance model parameters over time. An implementation of this approach is developed for an appearance model based on the filter responses from a steerable pyramid. This model is used in a motion-based tracking algorithm to provide robustness in the face of image outliers, such as those caused by occlusions, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.

1,142 citations

Book ChapterDOI
26 Jun 2000
TL;DR: A probabilistic method for tracking 3D articulated human figures in monocular image sequences that relies only on a frame-to-frame assumption of brightness constancy and hence is able to track people under changing viewpoints, in grayscale image sequences, and with complex unknown backgrounds.
Abstract: A probabilistic method for tracking 3D articulated human figures in monocular image sequences is presented. Within a Bayesian framework, we define a generative model of image appearance, a robust likelihood function based on image graylevel differences, and a prior probability distribution over pose and joint angles that models how humans move. The posterior probability distribution over model parameters is represented using a discrete set of samples and is propagated over time using particle filtering. The approach extends previous work on parameterized optical flow estimation to exploit a complex 3D articulated motion model. It also extends previous work on human motion tracking by including a perspective camera model, by modeling limb self occlusion, and by recovering 3D motion from a monocular sequence. The explicit posterior probability distribution represents ambiguities due to image matching, model singularities, and perspective projection. The method relies only on a frame-to-frame assumption of brightness constancy and hence is able to track people under changing viewpoints, in grayscale image sequences, and with complex unknown backgrounds.
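The sample-based posterior propagation described here is the bootstrap particle filter. A minimal 1-D sketch is shown below; this is an illustration only (the paper's state is a high-dimensional articulated pose with a learned motion prior, not a scalar), and the function name and noise parameters are assumptions:

```python
import math
import random

# Minimal 1-D bootstrap particle filter: propagate samples through a
# diffusion motion model, weight by a Gaussian observation likelihood,
# report the posterior mean, then resample.

def particle_filter(measurements, n=500, motion_std=0.5, obs_std=1.0, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in measurements:
        # Propagate each particle with random-walk motion noise.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight by the Gaussian likelihood of the measurement.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior mean estimate, then multinomial resampling.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates
```

Unlike a Kalman filter, the sample set can represent the multi-modal ambiguities the abstract mentions (image matching, model singularities, perspective projection), at the cost of needing many particles as state dimension grows.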

692 citations