Journal ArticleDOI

Consistent labeling of tracked objects in multiple cameras with overlapping fields of view

01 Oct 2003-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 25, Iss: 10, pp 1355-1360
TL;DR: It is shown that, if the FOV lines are known, it is possible to disambiguate between multiple possibilities for correspondence, and once these lines are initialized, the homography between the views can also be recovered.
Abstract: We address the issue of tracking moving objects in an environment covered by multiple uncalibrated cameras with overlapping fields of view, typical of most surveillance setups. In such a scenario, it is essential to establish correspondence between tracks of the same object, seen in different cameras, to recover complete information about the object. We call this the problem of consistent labeling of objects when seen in multiple cameras. We employ a novel approach of finding the limits of field of view (FOV) of each camera as visible in the other cameras. We show that, if the FOV lines are known, it is possible to disambiguate between multiple possibilities for correspondence. We present a method to automatically recover these lines by observing motion in the environment. Furthermore, once these lines are initialized, the homography between the views can also be recovered. We present results on indoor and outdoor sequences containing persons and vehicles.
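
The FOV-line idea lends itself to a compact sketch. Assuming a FOV line of camera 2 is already known in camera 1's image as line coefficients (a, b, c) of a*x + b*y + c = 0 (a hypothetical representation; the function names below are illustrative, not from the paper), a side-of-line distance test resolves which camera-1 track corresponds to a track that just appeared in camera 2:

```python
import numpy as np

def side_of_line(point, line):
    """Signed distance (in pixels) of an image point from a FOV line
    given as coefficients (a, b, c) of a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return (a * x + b * y + c) / np.hypot(a, b)

def match_new_track(candidates_cam1, fov_line_of_cam2_in_cam1):
    """When a track first appears in camera 2, the corresponding
    camera-1 track should be the one nearest camera 2's FOV line,
    since that object has just crossed into camera 2's view."""
    dists = [abs(side_of_line(p, fov_line_of_cam2_in_cam1))
             for p in candidates_cam1]
    return int(np.argmin(dists))

# Hypothetical setup: camera 2's FOV boundary appears in camera 1
# as the vertical line x = 100, i.e. 1*x + 0*y - 100 = 0.
line = (1.0, 0.0, -100.0)
tracks_cam1 = [(30.0, 40.0), (102.0, 60.0), (250.0, 80.0)]
print(match_new_track(tracks_cam1, line))  # prints 1
```

The track at (102, 60) sits essentially on the FOV line, so it is selected; the same test on entry/exit events is what lets the lines themselves be learned from observed motion.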
Citations
Journal ArticleDOI
TL;DR: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends to discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

5,318 citations


Cites methods from "Consistent labeling of tracked obje..."

  • ...2001; Cai and Aggarwal 1999] or computed automatically [Lee et al. 2000; Khan and Shah 2003] from the...


Journal ArticleDOI
TL;DR: It is demonstrated that the generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori.
Abstract: Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.

865 citations

Journal ArticleDOI
TL;DR: This paper reviews the recent development of relevant technologies from the perspectives of computer vision and pattern recognition, and discusses how to face emerging challenges of intelligent multi-camera video surveillance.

695 citations


Cites methods from "Consistent labeling of tracked obje..."

  • ...Khan and Shah (2003) propose a method to automatically recover FOV lines, which are the boundaries of the FOV of a camera in another camera views, by observing the motions of objects....


Proceedings ArticleDOI
13 Oct 2003
TL;DR: This work presents a novel approach for establishing object correspondence across non-overlapping cameras, which exploits the redundancy in the paths that people and cars tend to follow, e.g. roads, walkways or corridors, using motion trends and object appearance.
Abstract: Conventional tracking approaches assume proximity in space, time and appearance of objects in successive observations. However, observations of objects are often widely separated in time and space when viewed from multiple non-overlapping cameras. To address this problem, we present a novel approach for establishing object correspondence across non-overlapping cameras. Our multicamera tracking algorithm exploits the redundancy in paths that people and cars tend to follow, e.g. roads, walkways or corridors, by using motion trends and appearance of objects to establish correspondence. Our system does not require any inter-camera calibration; instead, the system learns the camera topology and path probabilities of objects using Parzen windows during a training phase. Once the training is complete, correspondences are assigned using the maximum a posteriori (MAP) estimation framework. The learned parameters are updated with changing trajectory patterns. Experiments with real-world videos are reported, which validate the proposed approach.
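
The train-then-MAP pipeline can be illustrated with a toy sketch. All names and numbers below are hypothetical, and a one-dimensional Gaussian Parzen window over inter-camera travel times stands in for the full kernel the paper learns over space, time, and velocity:

```python
import numpy as np

def parzen_density(x, samples, bandwidth=2.0):
    """Gaussian Parzen-window estimate of p(x) from training samples."""
    z = (x - np.asarray(samples, dtype=float)) / bandwidth
    return float(np.mean(np.exp(-0.5 * z**2)) / (bandwidth * np.sqrt(2 * np.pi)))

def map_correspondence(candidate_times, appearance_sims, training_times):
    """Score each candidate track from the first camera by
    p(travel time) * appearance similarity; return the MAP index."""
    scores = [parzen_density(t, training_times) * s
              for t, s in zip(candidate_times, appearance_sims)]
    return int(np.argmax(scores))

# Training phase: objects typically take about 5 s between the cameras.
training_times = [4.0, 5.0, 5.0, 6.0, 5.5]
# Two candidates: one exited 5.2 s ago, one 20 s ago.
print(map_correspondence([5.2, 20.0], [0.8, 0.9], training_times))  # prints 0
```

The candidate with the slightly lower appearance similarity still wins because its travel time is far more probable under the learned density, which is exactly the trade-off the MAP framework encodes.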

531 citations

Journal ArticleDOI
TL;DR: A planar homographic occupancy constraint is developed that fuses foreground likelihood information from multiple views, to resolve occlusions and localize people on a reference scene plane in the framework of plane to plane homologies.
Abstract: Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multi-view approach to solving this problem. In our approach we neither detect nor track objects from any single camera or camera pair; rather evidence is gathered from all the cameras into a synergistic framework and detection and tracking results are propagated back to each view. Unlike other multi-view approaches that require fully calibrated views our approach is purely image-based and uses only 2D constructs. To this end we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views, to resolve occlusions and localize people on a reference scene plane. For greater robustness this process is extended to multiple planes parallel to the reference plane in the framework of plane to plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior, to assign higher fusion weight to views with lesser clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results with detailed qualitative and quantitative analysis, are demonstrated in challenging multi-view, crowded scenes.
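
The occupancy constraint can be caricatured in a few lines. This is a toy sketch with nearest-neighbour warping and hypothetical names; the paper's actual pipeline adds multiple parallel planes, clutter-weighted fusion, and graph-cut segmentation:

```python
import numpy as np

def warp_to_reference(likelihood, H, out_shape):
    """Nearest-neighbour warp of a per-pixel foreground-likelihood map
    into reference-plane coordinates; H maps reference -> view pixels."""
    out = np.zeros(out_shape)
    h, w = likelihood.shape
    for yr in range(out_shape[0]):
        for xr in range(out_shape[1]):
            p = H @ np.array([xr, yr, 1.0])
            xi, yi = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= yi < h and 0 <= xi < w:
                out[yr, xr] = likelihood[yi, xi]
    return out

def occupancy(likelihoods, homographies, out_shape):
    """A ground-plane cell is occupied only if every view sees
    foreground there, so the warped likelihoods are multiplied."""
    fused = np.ones(out_shape)
    for L, H in zip(likelihoods, homographies):
        fused *= warp_to_reference(L, H, out_shape)
    return fused

# Toy example: two views already aligned with the reference plane
# (identity homographies); both see foreground only at cell (1, 1).
I = np.eye(3)
L1 = np.zeros((3, 3)); L1[1, 1] = 0.9
L2 = np.zeros((3, 3)); L2[1, 1] = 0.8
print(round(float(occupancy([L1, L2], [I, I], (3, 3))[1, 1]), 2))  # prints 0.72
```

The multiplicative fusion is what makes a phantom (foreground in one view only, e.g. behind an occluder) vanish on the reference plane.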

369 citations


Cites background from "Consistent labeling of tracked obje..."

  • ...In spite of the current body of knowledge, we believe monocular methods have limited ability to handle occlusions involving several objects, generally two or three, because the single viewpoint is intrinsically unable to observe the hidden areas....


References
Journal ArticleDOI
TL;DR: This paper focuses on motion tracking and shows how one can use observed motion to learn patterns of activity in a site and create a hierarchical binary-tree classification of the representations within a sequence.
Abstract: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.
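
The per-pixel mixture model can be sketched for a single grayscale pixel. This is a simplified toy with illustrative parameter values; the full tracker runs such a model at every pixel in real time and additionally ranks components by weight and variance:

```python
import numpy as np

class PixelMoG:
    """Simplified adaptive mixture of Gaussians for one grayscale pixel."""
    def __init__(self, k=3, alpha=0.05, var0=225.0, match_sigmas=2.5):
        self.w = np.full(k, 1.0 / k)          # component weights
        self.mu = np.linspace(0.0, 255.0, k)  # component means
        self.var = np.full(k, var0)           # component variances
        self.alpha, self.match_sigmas, self.var0 = alpha, match_sigmas, var0

    def update(self, x):
        """Online update; True means x is explained by a dominant
        (background) component, False flags it as foreground."""
        matched = np.abs(x - self.mu) < self.match_sigmas * np.sqrt(self.var)
        if matched.any():
            i = int(np.argmax(matched * self.w))   # best matched component
            self.mu[i] += self.alpha * (x - self.mu[i])
            self.var[i] += self.alpha * ((x - self.mu[i]) ** 2 - self.var[i])
            self.w = (1 - self.alpha) * self.w
            self.w[i] += self.alpha
        else:
            i = int(np.argmin(self.w))             # replace weakest component
            self.mu[i], self.var[i] = x, self.var0
        self.w /= self.w.sum()
        return bool(matched.any() and self.w[i] > 1.0 / len(self.w))

pixel = PixelMoG()
for _ in range(50):          # pixel observes a stable value -> background
    pixel.update(100.0)
print(pixel.update(100.0), pixel.update(250.0))  # prints True False
```

A value seen repeatedly accumulates weight and becomes background, while a sudden outlier matches only a low-weight component (or none) and is flagged as foreground, which is how the model absorbs repetitive clutter motion without swallowing real objects.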

3,631 citations


Additional excerpts

  • ...camera-multiple-object tracking [ 13 , 15] is a problem that has received considerable attention in...


Journal ArticleDOI
01 Oct 2001
TL;DR: This paper presents an overview of the issues and algorithms involved in creating this semiautonomous, multicamera surveillance system and its potential to improve the situational awareness of security providers and decision makers.
Abstract: The Video Surveillance and Monitoring (VSAM) team at Carnegie Mellon University (CMU) has developed an end-to-end, multicamera surveillance system that allows a single human operator to monitor activities in a cluttered environment using a distributed network of active video sensors. Video understanding algorithms have been developed to automatically detect people and vehicles, seamlessly track them using a network of cooperating active sensors, determine their three-dimensional locations with respect to a geospatial site model, and present this information to a human operator who controls the system through a graphical user interface. The goal is to automatically collect and disseminate real-time information to improve the situational awareness of security providers and decision makers. The feasibility of real-time video surveillance has been demonstrated within a multicamera testbed system developed on the campus of CMU. This paper presents an overview of the issues and algorithms involved in creating this semiautonomous, multicamera surveillance system.

693 citations


"Consistent labeling of tracked obje..." refers background in this paper

  • ...Index Terms—Tracking, multiple cameras, multiperspective video, surveillance, camera handoff, sensor fusion....


Proceedings ArticleDOI
13 Oct 2003
TL;DR: This work presents a novel approach for establishing object correspondence across non-overlapping cameras, which exploits the redundancy in the paths that people and cars tend to follow, e.g. roads, walkways or corridors, using motion trends and object appearance.
Abstract: Conventional tracking approaches assume proximity in space, time and appearance of objects in successive observations. However, observations of objects are often widely separated in time and space when viewed from multiple non-overlapping cameras. To address this problem, we present a novel approach for establishing object correspondence across non-overlapping cameras. Our multicamera tracking algorithm exploits the redundancy in paths that people and cars tend to follow, e.g. roads, walkways or corridors, by using motion trends and appearance of objects to establish correspondence. Our system does not require any inter-camera calibration; instead, the system learns the camera topology and path probabilities of objects using Parzen windows during a training phase. Once the training is complete, correspondences are assigned using the maximum a posteriori (MAP) estimation framework. The learned parameters are updated with changing trajectory patterns. Experiments with real-world videos are reported, which validate the proposed approach.

531 citations


"Consistent labeling of tracked obje..." refers background in this paper

  • ...Most of the information needed can be extracted by observing motion over a period of time....


  • ...0162-8828/03/$17.00 ß 2003 IEEE Published by the IEEE Computer Society in the general surveillance scenario....

    [...]

Proceedings ArticleDOI
19 Oct 1998
TL;DR: A process is described for analysing the motion of a human target in a video stream, where a "star" skeleton is produced and two motion cues are determined: body posture, and cyclic motion of skeleton segments.
Abstract: In this paper a process is described for analysing the motion of a human target in a video stream. Moving targets are detected and their boundaries extracted. From these, a "star" skeleton is produced. Two motion cues are determined from this skeletonization: body posture, and cyclic motion of skeleton segments. These cues are used to determine human activities such as walking or running, and even potentially, the target's gait. Unlike other methods, this does not require an a priori human model, or a large number of "pixels on target". Furthermore, it is computationally inexpensive, and thus ideal for real-world video applications such as outdoor video surveillance.
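
The skeletonization step can be sketched as follows. In this toy version (hypothetical names; the paper works on extracted target boundaries), distances from the target centroid to its boundary are smoothed and local maxima mark the "star" extremities:

```python
import numpy as np

def star_skeleton(boundary, smooth=5):
    """Return the target centroid and the boundary points whose
    (smoothed) distance from the centroid is a local maximum --
    the extremal points of the 'star' skeleton."""
    boundary = np.asarray(boundary, dtype=float)
    centroid = boundary.mean(axis=0)
    d = np.linalg.norm(boundary - centroid, axis=1)
    # circular moving-average smoothing of the distance signal
    pad = np.concatenate([d[-smooth:], d, d[:smooth]])
    ds = np.convolve(pad, np.ones(smooth) / smooth, mode='same')[smooth:-smooth]
    n = len(ds)
    peaks = [i for i in range(n)
             if ds[i] > ds[i - 1] and ds[i] >= ds[(i + 1) % n]]
    return centroid, boundary[peaks]

# Toy silhouette: a closed curve with three lobes (three extremities).
theta = np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False)
r = 10.0 + 5.0 * np.cos(3.0 * theta)
curve = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
centroid, tips = star_skeleton(curve)
print(len(tips))  # prints 3
```

Connecting the centroid to those tips yields the star; the angle of the lowest tips over time gives the cyclic leg-motion cue used to separate walking from running.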

464 citations


"Consistent labeling of tracked obje..." refers background in this paper

  • ...Results of experiments with both indoor and outdoor sequences were presented....


Proceedings ArticleDOI
05 Dec 2002
TL;DR: This method provides the solution to some of the common problems that are not addressed by most background subtraction algorithms, such as fast illumination changes, repositioning of static background objects, and initialization of background model with moving objects present in the scene.
Abstract: We present a background subtraction method that uses multiple cues to detect objects robustly in adverse conditions. The algorithm consists of three distinct levels, i.e., pixel level, region level and frame level. At the pixel level, statistical models of gradients and color are separately used to classify each pixel as belonging to background or foreground. In the region level, foreground pixels obtained from the color based subtraction are grouped into regions and gradient based subtraction is then used to make inferences about the validity of these regions. Pixel based models are updated based on decisions made at the region level. Finally, frame level analysis is performed to detect global illumination changes. Our method provides the solution to some of the common problems that are not addressed by most background subtraction algorithms, such as fast illumination changes, repositioning of static background objects, and initialization of background model with moving objects present in the scene.
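
The three-level structure can be caricatured with a minimal sketch. Names and thresholds are hypothetical, and the region level is simplified to a single whole-mask check in place of per-connected-component analysis:

```python
import numpy as np

def pixel_level(frame, bg_color, bg_grad, t_color=30.0, t_grad=20.0):
    """Pixel level: classify each pixel separately by color and by
    gradient magnitude against the background model."""
    fg_color = np.abs(frame - bg_color) > t_color
    gy, gx = np.gradient(frame)
    fg_grad = np.abs(np.hypot(gx, gy) - bg_grad) > t_grad
    return fg_color, fg_grad

def region_level(fg_color, fg_grad, min_support=0.1):
    """Region level (simplified to one region): keep the color-based
    foreground only if enough of it is backed by gradient evidence."""
    if not fg_color.any():
        return fg_color
    support = (fg_color & fg_grad).sum() / fg_color.sum()
    return fg_color if support >= min_support else np.zeros_like(fg_color)

def frame_level(fg_mask, max_fraction=0.5):
    """Frame level: an implausibly large foreground area signals a
    global illumination change rather than real objects."""
    return bool(fg_mask.mean() <= max_fraction)

bg = np.zeros((8, 8))
frame = bg.copy(); frame[2:4, 2:4] = 100.0   # a small bright object
fgc, fgg = pixel_level(frame, bg, np.zeros((8, 8)))
mask = region_level(fgc, fgg)
print(int(mask.sum()), frame_level(mask))  # prints 4 True
```

The gradient cue is what lets the region level reject color-only changes (such as a moved static object leaving a "ghost"), while the frame level guards the model against fast global illumination changes.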

462 citations


"Consistent labeling of tracked obje..." refers background in this paper

  • ...Results of experiments with both indoor and outdoor sequences were presented....
