scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Stable multi-target tracking in real-time surveillance video

20 Jun 2011-pp 3457-3464
TL;DR: This work presents a multi-target tracking system that is designed specifically for the provision of stable and accurate head location estimates and uses a more principled approach based on a Minimal Description Length (MDL) objective which accurately models the affinity between observations.
Abstract: The majority of existing pedestrian trackers concentrate on maintaining the identities of targets, however systems for remote biometric analysis or activity recognition in surveillance video often require stable bounding-boxes around pedestrians rather than approximate locations. We present a multi-target tracking system that is designed specifically for the provision of stable and accurate head location estimates. By performing data association over a sliding window of frames, we are able to correct many data association errors and fill in gaps where observations are missed. The approach is multi-threaded and combines asynchronous HOG detections with simultaneous KLT tracking and Markov-Chain Monte-Carlo Data Association (MCM-CDA) to provide guaranteed real-time tracking in high definition video. Where previous approaches have used ad-hoc models for data association, we use a more principled approach based on a Minimal Description Length (MDL) objective which accurately models the affinity between observations. We demonstrate by qualitative and quantitative evaluation that the system is capable of providing precise location estimates for large crowds of pedestrians in real-time. To facilitate future performance comparisons, we make a new dataset with hand annotated ground truth head locations publicly available.
Citations
More filters
Journal ArticleDOI
TL;DR: It is demonstrated that trackers can be evaluated objectively by survival curves, Kaplan Meier statistics, and Grubs testing, and it is found that in the evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score.
Abstract: There is a large variety of trackers, which have been proposed in the literature during the last two decades with some mixed success. Object tracking in realistic scenarios is a difficult problem, therefore, it remains a most active area of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers have been evaluated typically on less than ten videos, or on the special purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan Meier statistics, and Grubs testing. We find that in the evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.

1,604 citations


Cites background from "Stable multi-target tracking in rea..."

  • ...This category comprise of three videos from [1] with a fast-moving motorbike in the desert, a low contrast recording of a car on the highway, and a car-chase; three videos from the 3DPeS dataset [28] with varying illumination conditions and with complex object motion; one video from the dataset in [107] with surveillance of multiple people; and three complex videos from YouTube....

    [...]

Posted Content
TL;DR: With MOTChallenge, the work toward a novel multiple object tracking benchmark aimed to address issues of standardization, and the way toward a unified evaluation framework for a more meaningful quantification of multi-target tracking is described.
Abstract: In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object short-term tracking, and stereo estimation. Despite potential pitfalls of such benchmarks, they have proved to be extremely helpful to advance the state of the art in the respective area. Interestingly, there has been rather limited work on the standardization of quantitative benchmarks for multiple target tracking. One of the few exceptions is the well-known PETS dataset, targeted primarily at surveillance applications. Despite being widely used, it is often applied inconsistently, for example involving using different subsets of the available data, different ways of training the models, or differing evaluation scripts. This paper describes our work toward a novel multiple object tracking benchmark aimed to address such issues. We discuss the challenges of creating such a framework, collecting existing and new data, gathering state-of-the-art methods to be tested on the datasets, and finally creating a unified evaluation system. With MOTChallenge we aim to pave the way toward a unified evaluation framework for a more meaningful quantification of multi-target tracking.

667 citations


Cites methods from "Stable multi-target tracking in rea..."

  • ...9 yes static high cloudy [10] ADL-Rundle-1 30 1920x1080 500 (00:17) 32 9306 18....

    [...]

  • ...For the 4 sequences filmed using a static camera, AVG-TownCentre, PETS09-S2L1, PETS09-S2L2 and TUDStadtmitte, the calibration files from the sources [5], [10], [20] are used to compute a 2D homography between the image plane and the ground plane....

    [...]

Journal ArticleDOI
TL;DR: This work proposes an alternative formulation of multitarget tracking as minimization of a continuous energy that focuses on designing an energy that corresponds to a more complete representation of the problem, rather than one that is amenable to global optimization.
Abstract: Many recent advances in multiple target tracking aim at finding a (nearly) optimal set of trajectories within a temporal window. To handle the large space of possible trajectory hypotheses, it is typically reduced to a finite set by some form of data-driven or regular discretization. In this work, we propose an alternative formulation of multitarget tracking as minimization of a continuous energy. Contrary to recent approaches, we focus on designing an energy that corresponds to a more complete representation of the problem, rather than one that is amenable to global optimization. Besides the image evidence, the energy function takes into account physical constraints, such as target dynamics, mutual exclusion, and track persistence. In addition, partial image evidence is handled with explicit occlusion reasoning, and different targets are disambiguated with an appearance model. To nevertheless find strong local minima of the proposed nonconvex energy, we construct a suitable optimization scheme that alternates between continuous conjugate gradient descent and discrete transdimensional jump moves. These moves, which are executed such that they always reduce the energy, allow the search to escape weak minima and explore a much larger portion of the search space of varying dimensionality. We demonstrate the validity of our approach with an extensive quantitative evaluation on several public data sets.

616 citations

Journal ArticleDOI
TL;DR: In this article, the ability of intelligent autonomous systems to perceive, understand, and anticipate human behavior becomes increasingly important in a growing number of intelligent systems in human environments, and the ability to do so is discussed.
Abstract: With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand, and anticipate human behavior becomes increasingly important. Spec...

547 citations

Book ChapterDOI
07 Oct 2012
TL;DR: This work proposes an approach to data association which incorporates both motion and appearance in a global manner and utilizes Generalized Minimum Clique Graphs to solve the optimization problem of the data association method.
Abstract: Data association is an essential component of any human tracking system. The majority of current methods, such as bipartite matching, incorporate a limited-temporal-locality of the sequence into the data association problem, which makes them inherently prone to IDswitches and difficulties caused by long-term occlusion, cluttered background, and crowded scenes.We propose an approach to data association which incorporates both motion and appearance in a global manner. Unlike limited-temporal-locality methods which incorporate a few frames into the data association problem, we incorporate the whole temporal span and solve the data association problem for one object at a time, while implicitly incorporating the rest of the objects. In order to achieve this, we utilize Generalized Minimum Clique Graphs to solve the optimization problem of our data association method. Our proposed method yields a better formulated approach to data association which is supported by our superior results. Experiments show the proposed method makes significant improvements in tracking in the diverse sequences of Town Center [1], TUD-crossing [2], TUD-Stadtmitte [2], PETS2009 [3], and a new sequence called Parking Lot compared to the state of the art methods.

411 citations

References
More filters
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations


"Stable multi-target tracking in rea..." refers methods in this paper

  • ...We make motion estimates by following corner features with pyramidal Kanade-Lucas-Tomasi (KLT) tracking [15, 8]....

    [...]

Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is taster because it examines far fewer potential matches between the images than existing techniques Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted tor use in a stereo vision system.

12,944 citations

01 Jan 1991

2,443 citations

Journal ArticleDOI
TL;DR: This work introduces two intuitive and general metrics to allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations and their ability to consistently label objects over time.
Abstract: Simultaneous tracking of multiple persons in real-world environments is an active research field and several approaches have been proposed, based on a variety of features and algorithms. Recently, there has been a growing interest in organizing systematic evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring the performance of multiple object trackers still makes it hard to compare their results. In this work, we introduce two intuitive and general metrics to allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations and their ability to consistently label objects over time. These metrics have been extensively used in two large-scale international evaluations, the 2006 and 2007 CLEAR evaluations, to measure and compare the performance of multiple object trackers for a wide variety of tracking tasks. Selected performance results are presented and the advantages and drawbacks of the presented metrics are discussed based on the experience gained during the evaluations.

2,286 citations


"Stable multi-target tracking in rea..." refers background or methods in this paper

  • ...Multiple Object Tracking Accuracy (MOTA) is a combined measure which takes into account false positives, false negatives and identity switches (see [5] for details)....

    [...]

  • ...The Multiple Object Tracking Precision (MOTP) measures the precision with which objects are located using the intersection of the estimated region with the ground truth region....

    [...]

  • ...A comprehensive evaluation of the realtime tracker on two different datasets is performed using the standard CLEAR MOT [5] evaluation criteria....

    [...]

  • ...Both resulted in a significant reduction in the MOTA and either the precision or recall....

    [...]

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A learning-based hierarchical approach of multi-target tracking from a single camera by progressively associating detection responses into longer and longer track fragments (tracklets) and finally the desired target trajectories by virtue of a HybridBoost algorithm.
Abstract: We propose a learning-based hierarchical approach of multi-target tracking from a single camera by progressively associating detection responses into longer and longer track fragments (tracklets) and finally the desired target trajectories. To define tracklet affinity for association, most previous work relies on heuristically selected parametric models; while our approach is able to automatically select among various features and corresponding non-parametric models, and combine them to maximize the discriminative power on training data by virtue of a HybridBoost algorithm. A hybrid loss function is used in this algorithm because the association of tracklet is formulated as a joint problem of ranking and classification: the ranking part aims to rank correct tracklet associations higher than other alternatives; the classification part is responsible to reject wrong associations when no further association should be done. Experiments are carried out by tracking pedestrians in challenging datasets. We compare our approach with state-of-the-art algorithms to show its improvement in terms of tracking accuracy.

637 citations


"Stable multi-target tracking in rea..." refers background in this paper

  • ...The first group covers feed-forward systems which use only current and past observations to estimate the current state [7, 1, 2]. The second group covers data association based methods which also use future information to estimate the current state, allowing ambiguities to be more easily resolved at the cost of increased latency [13, 11, 12 , 3, 20]....

    [...]