




TL;DR: An optimization problem to select set of cameras and inference dependencies for each person which attempts to minimize the computational cost under given performance constraints is presented and results show the efficiency of COST in improving the performance of such systems and reducing the computational resources required.
Abstract: Development of multiple camera based vision systems for analysis of dynamic objects such as humans is challenging due to occlusions and similarity in the appearance of a person with the background and other people- visual "confusion". Since occlusion and confusion depends on the presence of other people in the scene, it leads to a dependency structure where there are often loops in the resulting Bayesian network. While approaches such as loopy belief propagation can be used for inference, they are computationally expensive and convergence is not guaranteed in many situations. We present a unified approach, COST, that reasons about such dependencies and yields an order for the inference of each person in a group of people and a set of cameras to be used for inferences for a person. Using the probabilistic distribution of the positions and appearances of people, COST performs visibility and confusion analysis for each part of each person and computes the amount of information that can be computed with and without more accurate estimation of the positions of other people. We present an optimization problem to select set of cameras and inference dependencies for each person which attempts to minimize the computational cost under given performance constraints. Results show the efficiency of COST in improving the performance of such systems and reducing the computational resources required.
Did you find this useful? Give us your feedback
...read more
69 citations
...We conduct a detailed performance analysis with data captured on practical multi-camera systems with multiple people observed over the network....
[...]
65 citations
...ing occlusions and visual confusion [18]....
[...]
38 citations
35 citations
31 citations
...2) with the ground truth data, the result of [11] and [15]....
[...]
...(a) 3D reconstruction with 15 views at frame 199 (b) 8-view tracking result comparison with methods in [11], [15] and the ground truth data....
[...]
...[11, 15] for providing us the 16-camera dataset....
[...]
...The LAB sequence [11] with poor image contrast is also processed....
[...]
...Recent tracking efforts also use 2D probabilistic occlusion reasoning to improve object localization [11]....
[...]
1,539 citations
1,470 citations
...We show that COST not only yields a reduction in computational time compared to approaches such as Expectation Maximization (EM) or Loopy Belief Propagation (LBP) [16], but also shows quantitative improvement in the pose/position estimation due to camera selection....
[...]
442 citations
...M2Tracker is a system that segments, detects and tracks multiple people on a ground plane in a cluttered scene [15]....
[...]
...While occlusion has been considered to some extent(for weighted fusion) in some papers [15, 13], confusion due to appearance similarities has not been previously considered....
[...]
...Most of the position estimation algorithms constrain the motion to a ground plane and perform inference by first segmenting the people in each view and then using data fusion techniques to obtain an estimate of the 3D locations of each person [15, 12, 13, 6]....
[...]
383 citations
286 citations
...al [4] proposed an information theoretic based approach where the view which leads to maximum reduction in entropy is chosen....
[...]
The algorithm cycles between using segmentation to estimate people’s ground plane positions and using ground plane position estimates to obtain segmentations; the process is iterated until stable.
The number of occluded voxels that can be added due to dependencies depends on the selection of dependencies and the accuracy of the position estimate of the occluder.
A naive approach (by considering all pairwise interactions of all parts of all people) would involve constructing a large Bayesian network with loops; however, this results in an intractable optimization problem.
Let dV be a differential volume element(voxel) which might be included in part j of person i. The Occluder Region, Ωk(dV ), of a differential element dV in camera k is defined as the 3D region in which another person, l, must be present so that dV would not be visible in camera k (See Fig 3).
The weight cl,m is proportional to the probability of the part (l,m) lying in the confuser space and being visible:cl,m = 1Z ∫ Ck(dV ) P (EO k (dA))P (El,m(dV1))dV1 (6)where Z is a normalizing factor.
These approaches reduce the complexity by annihilating small probabilities [11] or removing weak dependencies [14] and arcs [21].
Typical goals of such an analysis are to recover the position, orientation or the pose of each or some subset of the people in the scene.
the error in estimating the position of person i using the stereo pair (k1, k2) is approximated by5In M2Tracker, visibility does not vary with height and hence ground plane analysis of visibility can be performed instead of 3D modelingEi(k1, k2) = (1 − f̃(θk1,k2)Sk1i Sk2i ) (11)where θk1,k2 is the angle between the viewing directions of cameras k1 and k2 on the ground plane.