Avoiding the "streetlight effect": tracking by exploring likelihood modes
References
A method for registration of 3-D shapes
CONDENSATION: Conditional Density Propagation for Visual Tracking
Similarity Search in High Dimensions via Hashing
Articulated body motion capture by annealed particle filtering
Fast pose estimation with parameter-sensitive hashing
Frequently Asked Questions
Q2. Why may a large number of hypotheses be needed?
Since the peaks in the compatibility function between images and pose are sharp [19], and dynamics are highly uncertain (except for very structured cases such as walking), a large number of hypotheses may have to be generated in order to locate the actual pose.
Q3. What single-frame pose estimation algorithm is used?
The single-frame pose estimation algorithm of [16] uses parameter-sensitive hashing (PSH) to retrieve several database examples whose poses are similar to that of the input image, followed by robust regression.
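The two-stage structure (retrieve similar examples, then regress robustly) can be sketched as follows. This is a minimal illustration under loud assumptions: brute-force k-nearest-neighbour search stands in for PSH, and a Huber-reweighted weighted mean of the retrieved poses stands in for the robust regression of [16]. The function `estimate_pose` and its parameters are hypothetical.

```python
import numpy as np

def estimate_pose(query, features, poses, k=5, n_iter=10, delta=1.0):
    """Hypothetical single-frame estimator: kNN retrieval + robust weighted mean.

    Assumption: brute-force kNN replaces PSH, and an iteratively reweighted
    (Huber) mean of the neighbours' poses replaces the robust regression of [16].
    """
    # Stage 1: retrieve the k examples whose features are closest to the query.
    dists = np.linalg.norm(features - query, axis=1)
    idx = np.argsort(dists)[:k]
    nbr_poses, nbr_d = poses[idx], dists[idx]
    # Base weights: examples closer in feature space count more.
    base_w = 1.0 / (nbr_d + 1e-9)
    est = np.average(nbr_poses, axis=0, weights=base_w)
    # Stage 2: Huber IRLS, down-weighting neighbours whose pose is far from the estimate.
    for _ in range(n_iter):
        r = np.linalg.norm(nbr_poses - est, axis=1)
        huber_w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        est = np.average(nbr_poses, axis=0, weights=base_w * huber_w)
    return est, idx
```

The Huber weights keep neighbours with small pose residuals at full weight and shrink the influence of outlying ones, which is the role robust regression plays when some retrieved examples look similar in the image but differ in pose.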
Q4. How do the authors obtain the temporal prior?
The authors obtain the temporal prior by propagating modes of the posterior computed at the previous time step through a weak dynamics model.
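A minimal sketch of this propagation step, assuming the posterior at the previous time step is a Gaussian mixture and that the weak dynamics model is a random walk $x_t = x_{t-1} + w$, $w \sim \mathcal{N}(0, Q)$ (consistent with the "simple diffusion dynamics" mentioned in Q7). Under that assumption each mode keeps its weight and mean, and its covariance grows by $Q$. The function name `propagate_modes` is hypothetical.

```python
import numpy as np

def propagate_modes(weights, means, covs, Q):
    """Push each mode of a Gaussian-mixture posterior through random-walk dynamics.

    Assumed model: x_t = x_{t-1} + w, w ~ N(0, Q). Each component N(mu, S)
    becomes N(mu, S + Q); mixture weights are carried over (renormalized).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)   # shape (K, d, d)
    return weights / weights.sum(), means.copy(), covs + Q  # Q broadcasts over the K modes
```

Because diffusion only widens each mode, the resulting temporal prior is wide whenever $Q$ is large relative to the mode covariances, which is the regime discussed in Q8.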
Q5. How many bins does the edge direction histogram (EDH) have?
For the 200 × 200 pixel images in their database, with 3 scales (8, 16 and 32 pixels) and a location step size of half the scale, the EDH consisted of N = 13,076 bins.
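The quoted count can be reproduced under two assumptions that the answer does not state: each histogram cell has 4 edge-orientation bins, and a scale $s$ yields $\lfloor 200 / (s/2) \rfloor$ cell positions along each image axis. Under these assumptions, $N = 4 \times (50^2 + 25^2 + 12^2) = 4 \times 3269 = 13{,}076$. The helper `edh_bin_count` is hypothetical.

```python
def edh_bin_count(image_size=200, scales=(8, 16, 32), orientations=4):
    """Count EDH bins under an assumed layout (4 orientations, floor-division grid)."""
    total = 0
    for scale in scales:
        step = scale // 2                 # location step = half the scale
        positions = image_size // step    # assumed cell positions per axis
        total += orientations * positions ** 2
    return total

print(edh_bin_count())  # 13076
```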
Q6. How can an adequate likelihood model be defined?
In the case of monocular data, an adequate likelihood model could be defined [17] by the reprojection error of the 3D articulated model onto the images.
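As a sketch of such a reprojection-error likelihood, assume the articulated model is summarized by the 3D positions of its joints in camera coordinates, the camera is an ideal pinhole with a hypothetical focal length, and observed 2D joint locations are corrupted by independent isotropic Gaussian pixel noise. The log-likelihood is then, up to a constant, the negative squared reprojection error scaled by $1/(2\sigma^2)$. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def project(points_3d, focal=500.0, center=(0.0, 0.0)):
    """Pinhole projection of N x 3 camera-frame points to N x 2 pixel coordinates (assumed camera)."""
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([focal * X / Z + center[0], focal * Y / Z + center[1]], axis=1)

def reprojection_log_likelihood(joints_3d, observed_2d, sigma=2.0, focal=500.0):
    """log p(y | x) up to an additive constant, assuming i.i.d. Gaussian pixel noise."""
    residual = project(joints_3d, focal) - observed_2d
    return -np.sum(residual ** 2) / (2.0 * sigma ** 2)
```

Small values of `sigma` make this likelihood sharply peaked, which matches the "sharp but multi-modal" likelihood described in Q11.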
Q7. What form does the posterior take when the prior and the likelihood are Gaussian mixtures?
If the posterior distribution at the previous time step (and thus the temporal prior, as the authors assume simple diffusion dynamics) is estimated as a mixture of K Gaussians, and the likelihood is a sum of L Gaussians, then it is reasonable to expect that the posterior estimate at the current time step will be a mixture of L × K Gaussians.
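The L × K count follows from the Gaussian product identity $\mathcal{N}(x; a, A)\,\mathcal{N}(x; b, B) = \mathcal{N}(a; b, A + B)\,\mathcal{N}(x; c, C)$ with $C = (A^{-1} + B^{-1})^{-1}$ and $c = C(A^{-1}a + B^{-1}b)$: every (prior, likelihood) pair contributes one Gaussian. The sketch below works in 1-D for readability (an assumption; the pose space is high-dimensional) and uses hypothetical names.

```python
import math
from itertools import product

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def multiply_mixtures(prior, likelihood):
    """Multiply two 1-D Gaussian mixtures given as lists of (weight, mean, var).

    Returns the K x L components of the product, with weights normalized to 1.
    """
    out = []
    for (wa, a, va), (wb, b, vb) in product(prior, likelihood):
        var = 1.0 / (1.0 / va + 1.0 / vb)            # combined variance
        mean = var * (a / va + b / vb)               # precision-weighted mean
        scale = wa * wb * normal_pdf(a, b, va + vb)  # overlap of the two factors
        out.append((scale, mean, var))
    total = sum(w for w, _, _ in out)
    return [(w / total, m, v) for w, m, v in out]
```

Multiplying a 2-component prior by a 3-component likelihood returns 6 components; without pruning or merging, the number of modes would keep growing from frame to frame.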
Q8. How can the posterior be estimated when the temporal prior is wide?
The authors show that when the temporal prior is wide (i.e. the noise covariance is much greater than the covariance of the likelihood modes), the posterior can be estimated simply by modifying the weights of the likelihood Gaussians according to the prior.
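Why the reweighting works: in the product identity, if the prior component variance $A$ is much larger than the likelihood variance $B$, the product component collapses onto the likelihood mode ($C \approx B$, $c \approx b$), and its weight becomes proportional to the likelihood weight times the prior density at the likelihood mean. The 1-D sketch below applies that approximation; names and the 1-D setting are assumptions for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def reweight_likelihood_modes(prior, likelihood):
    """Wide-prior approximation of the posterior.

    Keeps each likelihood Gaussian's mean and variance and rescales its weight
    by the prior density at its mean. prior, likelihood: lists of (weight, mean, var).
    """
    def prior_pdf(x):
        return sum(w * normal_pdf(x, m, v) for w, m, v in prior)
    scaled = [(w * prior_pdf(m), m, v) for w, m, v in likelihood]
    total = sum(w for w, _, _ in scaled)
    return [(w / total, m, v) for w, m, v in scaled]
```

With a prior variance of 100 and likelihood variances of 0.01, the approximate weights agree with the exact product weights to well within 1e-3, while the cost drops from K × L components to L.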
Q9. What does local optimization need in order to succeed?
In order for local optimization to succeed, it is important to select starting pose hypotheses that are sufficiently close to the modes.
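A toy 1-D illustration of this sensitivity, using a hypothetical sharp bimodal likelihood with modes at -2 and 3 and plain gradient ascent on its logarithm (computed stably via responsibilities). The optimizer climbs to whichever mode's basin the start lies in: starts at -0.5 and 0.0 end at -2, while starts at 1.0 and 2.5 end at 3. The mode positions, variance and step size are illustrative assumptions.

```python
import math

MODES = [(-2.0, 1.0), (3.0, 0.6)]  # hypothetical (mean, weight) pairs
VAR = 0.25                         # shared, small variance: sharp peaks

def grad_log_lik(x):
    """d/dx of log sum_i w_i exp(-(x - mu_i)^2 / (2 VAR)), computed stably."""
    logs = [math.log(w) - (x - mu) ** 2 / (2 * VAR) for mu, w in MODES]
    top = max(logs)
    resp = [math.exp(l - top) for l in logs]  # unnormalized responsibilities
    total = sum(resp)
    return sum(r / total * -(x - mu) / VAR for r, (mu, _) in zip(resp, MODES))

def gradient_ascent(x0, step=0.05, iters=200):
    x = x0
    for _ in range(iters):
        x += step * grad_log_lik(x)
    return x

for start in (-0.5, 0.0, 1.0, 2.5):
    print(start, round(gradient_ascent(start), 3))
```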
Q10. How is the local search initialized?
The authors apply a local search algorithm using initializations $\{x_{\text{init}}\}$ taken both from the centers $\mu_i^{t-1}$ of the modes of the likelihood $p(y_{t-1} \mid x_{t-1})$ at the previous time step and from pose estimates provided by a global search algorithm such as PSH.
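A minimal sketch of this multi-start scheme in 1-D: initializations are pooled from the previous mode centers and from a list of global-search candidates (a hypothetical stand-in for PSH output), a derivative-free hill climber (an assumed choice of local optimizer) is run from each, and converged points closer than a tolerance are merged into one mode. Function names and the toy likelihood are illustrative.

```python
import math

def hill_climb(f, x0, step=0.5, tol=1e-6):
    """Derivative-free 1-D local search: move while f improves, halve the step otherwise."""
    x = x0
    while step > tol:
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            step /= 2.0
    return x

def explore_modes(f, prev_mode_centers, global_candidates, merge_tol=1e-2):
    """Run local search from every initialization and keep the distinct converged modes."""
    modes = []
    for x0 in list(prev_mode_centers) + list(global_candidates):
        x = hill_climb(f, x0)
        if all(abs(x - m) > merge_tol for m in modes):
            modes.append(x)
    return sorted(modes)
```

For a likelihood with sharp modes at -2, 1 and 4, previous centers [-1.8, 1.2] and global candidates [3.7, 0.9] recover all three modes; the mode at 4 is found only because the global search proposed a nearby start.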
Q11. How is the tracking problem stated in probabilistic terms?
When posed in probabilistic terms, the problem is the following: the pose likelihood is sharp but multi-modal, and the (dynamics-based) temporal prior is wide.
Q12. Why not obtain initial hypotheses by sampling?
While it is possible to generate initial hypotheses from the wide temporal prior [19, 5, 17], or by uniformly sampling the pose space, both methods would require drawing a large number of samples to obtain a hypothesis adequately close to the mode.
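A back-of-the-envelope sketch of why the sample count explodes: if "adequately close" means within ±ε of the mode in each of d pose dimensions, each draw succeeds with probability p, and the expected number of draws until the first success is 1/p. For uniform sampling over $[-R, R]^d$, $p = (\varepsilon/R)^d$; for a prior $\mathcal{N}(\text{mode}, \sigma^2 I)$ centered exactly on the mode (the best case), $p = \operatorname{erf}(\varepsilon/(\sigma\sqrt{2}))^d$. The box-shaped neighbourhood and the numbers are illustrative assumptions.

```python
import math

def samples_needed_uniform(eps, half_width, dim):
    """Expected draws from U([-R, R]^d) until one lands within +/-eps of the mode per axis."""
    p = (eps / half_width) ** dim
    return 1.0 / p

def samples_needed_gaussian(eps, sigma, dim):
    """Same, for draws from N(mode, sigma^2 I): a wide prior centered on the mode."""
    p_1d = math.erf(eps / (sigma * math.sqrt(2.0)))  # P(|x - mode| <= eps) in one dimension
    return 1.0 / p_1d ** dim
```

Even in the favourable Gaussian case, a tolerance of ε = 0.1σ in d = 10 dimensions requires on the order of 10^11 draws.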
Q13. What are some of the challenges for articulated tracking algorithms?
Some of the sequences contain many challenges for articulated tracking algorithms, including perspective effects (e.g. images taken from a 45 degree angle, hands moving very close to the camera), multiple self-occlusions (e.g. body turned on the side, completely hiding one of the arms), partial visibility (e.g. arms out of the field of view of the camera) and fast motions.
Q14. What are the advantages of such local optimization algorithms?
Such algorithms are proven to converge (when initialized close to the solution) and are less computationally intensive than standard optimization techniques.
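One well-known local algorithm with this behaviour is ICP, introduced in "A method for registration of 3-D shapes" from the reference list; whether it is exactly the algorithm meant here is an assumption. Below is a minimal 2-D rigid ICP sketch (not the authors' articulated variant): each iteration matches every point to its nearest neighbour and solves for the best rigid transform in closed form via SVD (the Kabsch method). Function names are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~ dst_i (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # reject reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Rigidly align point set src to dst; returns the transformed copy of src."""
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current source point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matches = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matches)
        cur = cur @ R.T + t
    return cur
```

Each iteration cannot increase the matching error, which is why ICP converges, but only to the nearest local minimum: started too far from the true alignment, the nearest-neighbour matches are wrong and it settles elsewhere.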
Q15. How can the method search a large region of the pose space?
In contrast to classic sampling approaches, the authors' method can explore a much larger region of the pose space, since searching a vast number of examples with an approximate nearest-neighbor search and refining a few modes is much more efficient than maintaining a particle set of sufficient size.
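To show why querying a large example database is cheap, here is a minimal random-hyperplane LSH index. This is a swapped-in scheme (it hashes by the signs of random projections and approximates angular similarity); it is neither the hashing method of "Similarity Search in High Dimensions via Hashing" nor PSH. A query is compared exactly only against examples that share a bucket with it in at least one table, rather than against the whole database. The class and its parameters are hypothetical.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH: each table keys a vector by the sign pattern of n_bits projections."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = None

    def _keys(self, x):
        # One hashable key (tuple of 0/1 bits) per table.
        return [tuple((p @ x > 0).astype(int)) for p in self.planes]

    def index(self, data):
        self.data = np.asarray(data, dtype=float)
        for i, x in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(x)):
                table[key].append(i)

    def query(self, q, k=5):
        """Return up to k candidate indices, ranked by exact distance among bucket hits."""
        cand = set()
        for table, key in zip(self.tables, self._keys(q)):
            cand.update(table.get(key, []))
        cand = np.fromiter(cand, dtype=int)
        if cand.size == 0:
            return cand
        d = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(d)[:k]]
```

With 8 bits per table, each bucket holds on average about 1/256 of the database, so the exact distance computations touch only a small fraction of the examples; the returned candidates would then be refined by local search.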