
COST: An Approach for Camera Selection and Multi-Object Inference Ordering in Dynamic Scenes
Abhinav Gupta
Dept. of Computer Science
University of Maryland
College Park, MD, USA
agupta@cs.umd.edu
Anurag Mittal
Dept. of Comp. Sc. and Engg.
IIT Madras
Chennai, India
amittal@cse.iitm.ernet.in
Larry S. Davis
Dept. of Computer Science
University of Maryland
College Park, MD, USA
lsd@cs.umd.edu
Abstract
Development of multiple-camera vision systems for the analysis of dynamic objects such as humans is challenging due to occlusions and to similarity in the appearance of a person with the background and other people, which we call visual "confusion". Since occlusion and confusion depend on the presence of other people in the scene, they lead to a dependency structure in which there are often loops in the resulting Bayesian network. While approaches such as loopy belief propagation can be used for inference, they are computationally expensive and convergence is not guaranteed in many situations.

We present a unified approach, COST, that reasons about such dependencies and yields an order for the inference of each person in a group of people, along with a set of cameras to be used for the inferences for each person. Using the probabilistic distribution of the positions and appearances of people, COST performs visibility and confusion analysis for each part of each person and computes the amount of information that can be extracted with and without more accurate estimation of the positions of other people. We formulate an optimization problem that selects a set of cameras and inference dependencies for each person so as to minimize the computational cost under given performance constraints. Results show the efficiency of COST in improving the performance of such systems and in reducing the computational resources required.
1. Introduction
We consider the problem of multi-perspective analysis of moving people in crowded situations. Typical goals of such an analysis are to recover the position, orientation or pose of each of the people in the scene, or of some subset of them. The analysis is difficult due to occlusions and to appearance similarities of people with one another or with the background against which they are viewed. We refer to errors arising from appearance similarities as "confusions". In multiple-camera systems, information fusion needs to be sensitive to occlusions and confusions. (COST stands for Confusion and Occlusion analysis for Selections based on Tasks.)
Our goal is to develop principled methods to "select" the camera(s) in which there is less occlusion and confusion for a particular person in order to infer that person's position or pose (see Figure 1). Additionally, we seek to identify the parts of the image where such occlusion and confusion occur and to use this information in the inference process. However, determining those regions of occlusion and confusion depends on the positions and poses of other people in the scene. This leads to a dependency structure for the inference of the position/pose of the people present in the scene, as illustrated graphically in Figure 2(b). A Bayesian network for such multi-object inference will generally have loops. Those loops can be eliminated by appropriate selection of cameras and by dropping inference dependencies that are not expected to yield significant information, as shown in the example in Figure 2(c).

We present COST, a framework to reason about such dependencies that produces an inference order for multi-person, multi-perspective pose/position estimation. We additionally identify a set of cameras and the parts of the acquired images to be analyzed for each person. We show that COST not only yields a reduction in computational time compared to approaches such as Expectation Maximization (EM) or Loopy Belief Propagation (LBP) [16], but also a quantitative improvement in the pose/position estimates due to camera selection.
1.1. Related Work
There are many multi-perspective vision algorithms that analyze crowded scenes for either person position estimation or pose estimation. Most position estimation algorithms constrain the motion to a ground plane and perform inference by first segmenting the people in each view and then using data fusion techniques to obtain an estimate of the 3D location of each person [15, 12, 13, 6]. While occlusion has been considered to some extent (for weighted fusion) in some papers [15, 13], confusion due to appearance similarities has not been previously considered. Additionally, most earlier work either ignores the inference dependencies or uses all of them, which makes the computation costly.

Previous work on pose estimation has only considered self-occlusion of one body part by another of the same person [9, 8, 19, 20]; occlusion of one person by another, leading to inference dependencies between people and their parts, has not been addressed.

Figure 1. Segmentation results and median-line determination for a person in three different views (panels 1a-3b). In view 1 there is neither occlusion nor confusion, while views 2 and 3 exhibit occlusion and confusion, respectively. If the median lines are used for person position estimation as in [13, 10], without occlusion and confusion reasoning, we might mistakenly use the median lines shown in (2b) and (3b).
Figure 2. (a) A multiple-person scenario with 3 people and 3 cameras. (b) The dependency graph obtained if all cameras are used for the estimation of all people. An edge A → B represents the information flow from A to B in the inference process; hence the estimation of B depends on the estimation of A. In this scenario, the estimation of B depends on A due to occlusion in camera 1, and the estimations of A and C depend on B and A, respectively, due to occlusions in cameras 3 and 2. (c) The dependency graph obtained if cameras are selected using COST; the cameras selected for estimating each person are shown in the respective node. Since camera 1 is not used for the estimation of B, the estimation of B becomes independent of A. Additionally, if the degree of occlusion of C due to A is small (that is, one cannot generate significant information for the estimation of C using the estimate of the location or pose of A), then one can also eliminate the dependency edge A → C without strongly affecting the accuracy of the result. Such elimination can be critical for loop removal when there are not enough cameras in which a person is isolated and discriminable.
A naive approach, considering all pairwise interactions of all parts of all people, would involve constructing a large Bayesian network with loops, resulting in an intractable optimization problem. We show how many of the loops in the Bayesian network can be eliminated by selecting the best cameras and the most important inference dependencies.
A related problem of sensor selection and information fusion has been studied in the fields of sensor networks and distributed computing. The problem is to selectively choose sensors so that the information gained compensates for the costs associated with gathering it. An optimal solution using such an information-theoretic approach requires evaluating all possible combinations, making the problem NP-hard. Denzler et al. [4] proposed an information-theoretic approach in which the view that leads to the maximum reduction in entropy is chosen. Since the computation of mutual information requires exponential time, approximate [23] and heuristic [22] algorithms have also been proposed. Other approaches in this field include the use of look-up tables [17] or utility functions [2] in the selection of camera views.
These information-theoretic approaches consider only geometric analysis based on the fields of view of the cameras when computing mutual information. However, even though two cameras might have overlapping fields of view, they can still provide different information due to occlusion and confusion. While [5] presents an approach for camera selection in the presence of occlusions, COST performs visibility and discriminability analysis in conjunction with reasoning about dependencies for camera selection.
Bayesian belief networks are an important mechanism for representation and reasoning under uncertainty. For a given belief net, even finding an approximate solution is NP-hard [3]. Our approach is related to model simplification methods (see [7]), which simplify the model until exact methods become feasible. These approaches reduce complexity by annihilating small probabilities [11] or by removing weak dependencies [14] and arcs [21].

Our approach is complementary to these. COST's loop removal procedure is primarily based on camera selection, which removes redundant and unreliable information in multi-perspective vision systems. Additionally, while previous approaches assume that the weights of the dependencies are given, our approach considers occlusion and confusion in the different cameras and removes loops based on this information.
The paper is organized as follows. Section 2 describes how visibility and confusion factors for an object are computed. Section 3 models the information available in views and from dependencies. Section 4 presents our optimization framework and a heuristic approach for fast approximate inference. Section 5 presents experimental results.
2. Computing Occlusion and Confusion
2.1. Computing Visibility
To estimate a property of a given person or object from a given camera, that person or object must be (at least partially) visible from that camera. But one person's visibility depends on the poses of the other people in the scene, which are generally known only probabilistically. This leads us to compute visibility probabilistically. Specifically, we compute the probability of visibility of each part of a person in each camera based on probabilistic estimates of the poses of all other people in the scene. To develop a generic formulation, consider an n-part model for a person, where n is one for simple position estimation or ten for full-body pose estimation.
Let $dV$ be a differential volume element (voxel) which might be included in part $j$ of person $i$. The occluder region, $\Omega_k(dV)$, of a differential element $dV$ in camera $k$ is defined as the 3D region in which another person, $l$, must be present for $dV$ not to be visible in camera $k$ (see Fig. 3). We also define the following events:

$E_{i,j}(dV)$ = the event that part $j$ of person $i$ includes $dV$;¹
$EO^k_{l,m}(dV)$ = the event that part $m$ of person $l$ intersects $\Omega_k(dV)$ (we write $\overline{EO}^k_{l,m}(dV)$ for its complement);
$\overline{EO}^k(dV)$ = the event that no person intersects $\Omega_k(dV)$.

The expected visibility of a part, that is, the expected number of visible voxels contained in that part, is then given by

$$E_v(i,j,k) = \int_V P(\overline{EO}^k(dV))\, P(E_{i,j}(dV))\, dV \qquad (1)$$

The probability that part $m$ of person $l$ does not occlude $dV$ is the probability that part $m$ does not contain any of the voxels belonging to the set $\Omega_k(dV)$. Therefore, that probability is given by

$$P(\overline{EO}^k_{l,m}(dV)) = \prod_{dV_1 \in \Omega_k(dV)} \left[ 1 - P(E_{l,m}(dV_1)) \right] \qquad (2)$$

The probability that no part of any person is in the occluder region is then given by²

$$P(\overline{EO}^k(dV)) = \prod_{(l,m)} P(\overline{EO}^k_{l,m}(dV)) \qquad (3)$$

¹A part $j$ can include many such voxels.
²By considering occlusion of a part $(i, j)$ by itself, we implicitly select surface voxels instead of interior voxels; interior voxels would be occluded by the surface voxels and would not be considered.
Furthermore, in a tracking scenario new people can enter the scene. In this case we must also consider the occlusions they are likely to introduce and how the expected visibility changes to account for them. We assume there is a fixed and known set of locations, which we refer to as "portals", through which a new person enters or an existing person leaves the scene. Let $E_{new}(dV)$ be the event that a new person is present in voxel $dV$. The likelihood of this event, $P(E_{new}(dV))$, is the product of the likelihood that a portal is nearby (represented as a prior probability $P_p(E_{new}(dV))$) and the image likelihood that a new person is seen in the region, $P_L(E_{new}(dV))$. Therefore, $P(\overline{EO}^k(dV))$ is given by:
Figure 3. Schematic diagram showing $\Omega_k(dV)$ and $C_k(dV)$ projected onto the ground plane. Because of discretization, $\Omega_k(dV)$ and $C_k(dV)$ represent the sets of voxels in which another object must be present for occlusion or confusion, respectively, to occur.
$$P(\overline{EO}^k(dV)) = \prod_{dV_1 \in \Omega_k(dV)} \Big( \prod_{(l,m)} \left[ 1 - P(E_{l,m}(dV_1)) \right] \Big) \left( 1 - P(E_{new}(dV_1)) \right) \qquad (4)$$
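To make the discrete computation concrete, the following sketch (our own illustrative code, not the authors' implementation; all function and variable names are assumptions) evaluates Eqs. (2)-(4) and the expected visibility of Eq. (1) on a discretized voxel grid. Here `p_part[(l, m)]` maps a voxel to $P(E_{l,m}(dV))$, `p_new` maps a voxel to $P(E_{new}(dV))$, and `occluder_region(v, k)` enumerates the voxels of $\Omega_k(dV)$.

```python
def p_unoccluded(v, k, p_part, p_new, occluder_region):
    """P(EObar_k(dV)): probability that no tracked person and no newly
    entered person intersects the occluder region Omega_k(dV) of voxel v
    in camera k (Eqs. 2-4, with the integral replaced by a product over
    a discrete voxel grid)."""
    prob = 1.0
    for v1 in occluder_region(v, k):
        # Eqs. (2)-(3): no part (l, m) of any tracked person occupies v1 ...
        for part_probs in p_part.values():
            prob *= 1.0 - part_probs.get(v1, 0.0)
        # Eq. (4): ... and no new person entering through a portal does either
        prob *= 1.0 - p_new.get(v1, 0.0)
    return prob


def expected_visibility(i, j, k, p_part, p_new, voxels, occluder_region):
    """Eq. (1): expected number of visible voxels of part j of person i
    in camera k, summed over the voxel grid."""
    return sum(p_unoccluded(v, k, p_part, p_new, occluder_region)
               * p_part[(i, j)].get(v, 0.0)
               for v in voxels)
```

In this discrete form, the product over $\Omega_k(dV)$ makes explicit that occlusion by any part of any person, including people newly entering through portals, reduces the expected visibility.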
2.2. Computing Confusion
Although a person (or some part of one) might be visible in view $k$, the view might still not be helpful for estimating the pose because of "camouflage": the person's appearance may be too similar to the background or to other person(s) occluded by him. Due to such "confusion" with the "background", segmenting the person accurately becomes problematic, and most pose inferences degrade as segmentation quality decreases.

Again, consider a differential element $dV$ that a part $(i, j)$ may contain. To compute the discriminability of $dV$, we determine the parts which can cause confusion. The confuser space $C_k(dV)$ of an element $dV$ is defined as the region where the presence of a part $(l, m)$ would cause confusion in the classification of a pixel formed by the projection of part $(i, j)$ from $dV$ (see Figure 3). The amount of confusion is proportional to the similarity in appearance of the two parts. We define the discriminability of a part $(i, j)$ in a view $k$, $D_k(i,j)$, as:
$$D_k(i,j) = \sum_{(l,m)} c_{l,m}\, d(a^k_{i,j}, a^k_{l,m}) + c_0\, d(a^k_{i,j}, B_k) \qquad (5)$$

where $a^k_{i,j}$ denotes the appearance of part $(i, j)$ in view $k$, $B_k$ the appearance of the background, $d$ a distance metric between appearances, and $c$ the corresponding weight. For example, if appearance is represented as a histogram, then $d$ could be the dot product of the two histograms or the earth mover's distance. The weight $c_{l,m}$ is proportional to the probability of the part $(l, m)$ lying in the confuser space and being visible:

$$c_{l,m} = \frac{1}{Z} \int_{C_k(dV)} P(\overline{EO}^k(dV_1))\, P(E_{l,m}(dV_1))\, dV_1 \qquad (6)$$

where $Z$ is a normalizing factor. Hence, the expected number of discriminable voxels in view $k$ contained in part $(i, j)$ is given by:

$$I_k(i,j) = \int_V P(\overline{EO}^k(dV))\, D_k(i,j)\, P(E_{i,j}(dV))\, dV \qquad (7)$$
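A corresponding sketch for Eqs. (5)-(7) follows (again our own hedged illustration; `appearance`, `background`, the distance function `d`, and the helper names are assumptions, and the appearance models could be, for example, color histograms):

```python
def discriminability(i, j, k, appearance, background, c, c0, d):
    """Eq. (5): D_k(i, j) as a weighted sum of appearance distances between
    part (i, j) and its potential confusers (l, m), plus the distance to
    the background B_k of view k."""
    a_ij = appearance[(i, j)][k]
    confuser_term = sum(c_lm * d(a_ij, appearance[(l, m)][k])
                        for (l, m), c_lm in c.items())
    return confuser_term + c0 * d(a_ij, background[k])


def confuser_weight(l, m, k, v, p_part, p_unocc, confuser_region, Z):
    """Eq. (6): weight c_{l,m}, proportional to the probability that part
    (l, m) lies in the confuser space C_k(dV) of voxel v and is visible."""
    return sum(p_unocc(v1, k) * p_part[(l, m)].get(v1, 0.0)
               for v1 in confuser_region(v, k)) / Z


def expected_information(i, j, k, p_part, p_unocc, D, voxels):
    """Eq. (7): expected number of visible *and* discriminable voxels of
    part (i, j) in view k; D(i, j) evaluates Eq. (5)."""
    return sum(p_unocc(v, k) * D(i, j) * p_part[(i, j)].get(v, 0.0)
               for v in voxels)
```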
3. Information in Views and Dependencies
3.1. Model for Information Content
In order to perform inference reliably for some part of a given person using some view, that part should, ideally, not be occluded in that view and should not be "confused" with the background or other parts. The accuracy of the inference will depend upon both the degrees of occlusion and confusion, as discussed in the previous section. It will also depend on the uncertainty of such occlusion and confusion. We present a simple model for measuring the information available in a view regarding a part for the task of pose estimation. We say that a specific voxel belonging to a person is informative in some view if and only if it is both visible and discriminable. The information available about a specific part in a given view is then taken as the expected number of visible and discriminable voxels in that view.
3.2. Information from Dependencies
Inference decisions can be improved if estimates of the pose/appearance characteristics of the occluders and confusers are used. Such information can be employed in a variety of ways; an example for the position estimation problem is shown in Figure 4. Here the inference of a person's position involves constructing a median line through the silhouette of the person and computing that line's intersection with the ground plane using calibration information. Figure 4(b) shows the segmentation of the person constructed from the visible and discriminable voxels; however, the estimate of the median line is inaccurate when only these voxels are used (see the magenta voxels on the ground plane and median line 1 based on them). If we additionally use the position of the occluder, we can identify the occluded regions (see the light blue region in Figure 4(d)). The segmentation in the occluded region is then based on position priors, which yields a better estimate of the median line, as shown in Figure 4(c).
Figure 4. Importance of using occlusion information before fusion: (a) the original image; (b) occlusion-unaware segmentation and object inference; (c) occlusion-aware segmentation and inference; (d) the ground-plane situation of the scenario. The black boundary shows the actual voxels contained in the person. In case (b), only the magenta voxels are used for median-line estimation (line 1). In case (c), a combination of the magenta and blue voxels is used to estimate the median line (line 2). The true median line is (3).
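The median-line computation itself is simple. The sketch below is our own minimal version under stated assumptions (binary masks as NumPy arrays, a per-pixel position prior, and a 0.5 threshold chosen purely for illustration); it shows the occlusion-aware variant of Figure 4(c), where pixels in the known occluded region are filled in from the position prior before the median line is taken.

```python
import numpy as np

def occlusion_aware_mask(informative, occluded, prior, thresh=0.5):
    """Keep pixels classified from image evidence (the magenta voxels of
    Fig. 4) and, inside the region occluded by an already-estimated
    occluder, add pixels whose position prior is high (the blue voxels)."""
    return informative | (occluded & (prior > thresh))

def median_column(mask):
    """A simple stand-in for the silhouette's vertical median line: the
    median image column of the silhouette pixels, to be intersected with
    the ground plane via calibration."""
    rows, cols = np.nonzero(mask)
    return float(np.median(cols))
```

A call like `median_column(occlusion_aware_mask(seg, occ, prior))` then corresponds to median line 2 of Figure 4, whereas `median_column(seg)` alone corresponds to line 1.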
The inference of a part's position depends on the information about its occluders and confusers; the more accurate our information about the occluders and confusers, the more accurate our estimate will be. Thus, accurate inference of a part's position depends upon the inference of its occluders and confusers. Such dependencies can be represented in a dependency graph (see Fig. 2). Using the poses of other people in the inference process can, however, lead to loops in the Bayesian network. Additionally, using information from dependencies may involve expensive computation. Our goal is to avoid introducing edges into the dependency graph which either do not carry sufficient information or introduce loops in the Bayesian network. We do this as follows: for each possible occluder or confuser $l$, we associate a binary decision variable $\nu^k_{i,l}$ which represents whether knowledge about the pose of person $l$ is to be used in the inference of the pose of person $i$ from view $k$.³ If there is no edge from node $l$ (the node representing person $l$) to node $i$ in the dependency graph, then $\forall k,\ \nu^k_{i,l} = 0$. Given some selection of edges to include in the dependency graph, the total amount of information, $I^k_{i,j}$, that an algorithm can extract in view $k$ about a part $(i, j)$ using the estimates of its dependencies can be determined. This, however, also depends on the accuracy of the estimates of the dependencies.

³In our model, dependencies are between people, not parts; we use the estimate of person $l$ to estimate the locations of all parts of person $i$.
4. The Optimization Problem
Given the amount of information available (with and without dependencies) regarding each person in each camera, we estimate binary decision variables $\mu^k_i$ and $\nu^k_{i,l}$, which represent, respectively, whether or not camera $k$ will be used in the inference of person $i$ ($\mu^k_i$) and, if so, whether to use the estimate of the pose of person $l$ when estimating the pose of person $i$ (that is, whether or not to include the edge from node $l$ to node $i$ in the Bayesian network). For instance, in Figure 2 the decision variables ($\mu^1_C$, $\mu^2_C$, $\nu^2_{C,A}$) will be set to true for person C. We would like to minimize the computational cost while guaranteeing that the expected error in the estimate of the pose of person $i$ is below $\eta_i$ (termed a "performance constraint"). Thus, the optimization problem can be formulated as
$$\min_{\{\mu_i, \nu_i\}} \sum_i J_i(\mu_i, \nu_i) \quad \text{such that} \quad e_i(\mu_i, \nu_i) \le \eta_i \;\; \forall i \qquad (8)$$
where $e_i$ represents the expected error in the estimate of the pose of person $i$ and $J_i$ represents the cost of computing that estimate. This model also supports attention-based surveillance, when $\exists i$ such that $\forall j \neq i$, $\eta_i \ll \eta_j$; in such a case, most of the computational resources are devoted to estimating the pose of a distinguished person.

The optimization problem stated above is NP-hard and belongs to the class of subset selection problems [18]. While approaches such as simulated annealing can be used for the optimization, much faster heuristic approaches can be employed.
4.1. A Heuristic Based Optimization Approach
We present a heuristic-based, greedy algorithm for the optimization problem. We build the dependency graph $G$ by adding nodes to $G$ one by one. Each node represents a person and the set of cameras selected for estimating the pose of that person. The edges incident on a node represent the dependencies to be used in the estimation (an edge $l \rightarrow i$ indicates that the pose of person $l$ is used to estimate the pose of person $i$).

At each iteration, we compute the minimum cost⁴ of estimation of each person $i$ by selecting the best possible settings of the decision variables ($\mu_i$ and $\nu_i$). However, to avoid loops in $G$, we require that dependencies be selected from the set of nodes already present in $G$ and that they introduce no loops in the Bayesian network. The person with the lowest cost of estimation is then added to $G$. In the next iteration, the costs of estimation are re-computed, since the newly introduced node can now be used as a dependency for the remaining people.

⁴If the performance constraint for a person cannot be satisfied, we take that person's cost of estimation to be ∞.
The algorithm is illustrated in Figure 5. At iteration 1, the minimum costs of computation are B = 2 (using cameras 1, 2 and no dependency), A = ∞ (A needs a dependency on either B or C for its performance constraint to be satisfied; since the dependency graph at t = 0 is empty, A cannot use any dependency), and C = ∞ (C also needs the estimate of B for its performance constraint to be satisfied), so B is added first. At iteration 2, the computation costs become A = 8 (using cameras 1, 2, 3 and the dependency from B) and C = 3 (using cameras 2, 3 and the dependency from B); hence C is added at iteration 2. At iteration 3, the new minimum computation cost for A is 4 (using cameras 2, 3 and the dependency from C; the dependency from B is not included since the performance constraint of A is satisfied without it).
Figure 5. A sample scenario to illustrate the heuristic algorithm.
To compute the minimum cost of estimation for each remaining person at each iteration, one could exhaustively search the space of possible camera and dependency selections. However, such an approach requires time exponential in the number of cameras, and so becomes infeasible when the number of cameras is large. Instead, we use a greedy approach: we start by selecting a minimal set of cameras (two, for example, if pose is to be estimated by stereo; one if position is estimated by intersecting a median line with the ground plane) and add cameras and dependencies one at a time, based on the increase in the cost of computation and the reduction in expected error, until the performance constraints are satisfied.
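The following sketch captures the outer loop of this greedy construction (a hedged illustration only; `min_cost` is a hypothetical helper that returns the cheapest constraint-satisfying camera/dependency selection for a person given the current graph, or an infinite cost when no selection satisfies the constraint, per footnote 4):

```python
import math

def build_dependency_graph(people, min_cost):
    """Greedy construction of the dependency graph G (Section 4.1): at each
    iteration, add the person whose performance constraint is met most
    cheaply using only dependencies on people already in G (so the
    resulting Bayesian network stays loop-free), then re-compute costs."""
    graph = {}        # person -> (selected cameras, selected dependencies)
    order = []        # inference order in which people were added
    remaining = set(people)
    while remaining:
        # min_cost(p, graph) returns a (cost, cameras, dependencies) tuple
        cost, cams, deps, person = min(
            (min_cost(p, graph) + (p,) for p in remaining),
            key=lambda t: t[0])
        if math.isinf(cost):  # no remaining person can meet its constraint
            break
        graph[person] = (cams, deps)
        order.append(person)
        remaining.remove(person)
    return order, graph
```

`order` then gives the resulting inference ordering, which would be (B, C, A) in the example of Figure 5.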
5. Experiments
We next demonstrate how COST can be applied to multiple-camera tracking algorithms.
5.1. Tracking People on a Ground Plane
5.1.1 Framework
We applied COST to a variant of M2Tracker, a system that segments, detects and tracks multiple people on a ground plane in a cluttered scene [15]. The algorithm cycles between using segmentation to estimate people's ground plane positions and using the ground plane position estimates to obtain segmentations; the process is iterated until stable. In M2Tracker, all people are segmented in all cameras, and the segmentations are then combined using a wide-baseline stereo reconstruction algorithm for position estimation. In COST, selected people are segmented in selected views, namely those for which $\mu^k_i = 1$. To use the estimate of the position of an occluder, we first segment the occluder and then classify the pixels in the occluded region based on the prior probabilities alone.
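As a sketch of this modified segmentation step (our own hedged pseudocode; the array names, the per-pixel likelihood representation, and the 0.5 decision threshold are all assumptions), a view is processed only when COST selected it, and the occluded region is classified from the position prior alone while the rest of the view uses image evidence:

```python
import numpy as np

def segment_person(i, k, mu, image_likelihood, occluded, prior):
    """Segment person i in view k only if camera k was selected by COST
    (mu[i][k] == 1). Pixels inside the region occluded by an
    already-estimated occluder are classified from the position prior
    alone; all other pixels are classified from the image likelihood."""
    if not mu[i][k]:
        return None  # this view is not used for person i
    return np.where(occluded, prior > 0.5, image_likelihood > 0.5)
```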

References (partial list preserved by the extraction):
Loopy belief propagation for approximate inference: an empirical study.
M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene.
A multiview approach to tracking people in crowded scenes using a planar homography constraint.
Entropy-based sensor selection heuristic for target localization.
Frequently Asked Questions

Q1. What have the authors contributed in "COST: An Approach for Camera Selection and Multi-Object Inference Ordering in Dynamic Scenes"?

The authors present a unified approach, COST, that reasons about such dependencies and yields an order for the inference of each person in a group of people and a set of cameras to be used for the inferences for a person. They present an optimization problem to select a set of cameras and inference dependencies for each person which attempts to minimize the computational cost under given performance constraints.

Additional answers preserved without their questions:

The number of occluded voxels that can be added due to dependencies depends on the selection of dependencies and the accuracy of the position estimate of the occluder.

The error in estimating the position of person $i$ using the stereo pair $(k_1, k_2)$ is approximated by⁵

$$E_i(k_1, k_2) = 1 - \tilde{f}(\theta_{k_1,k_2})\, S^{k_1}_i S^{k_2}_i \qquad (11)$$

where $\theta_{k_1,k_2}$ is the angle between the viewing directions of cameras $k_1$ and $k_2$ on the ground plane.

⁵In M2Tracker, visibility does not vary with height, and hence ground-plane analysis of visibility can be performed instead of 3D modeling.