Q2. What are the future works in "Improved data association and occlusion handling for vision-based people tracking by mobile robots" ?
Such a solution has obvious pitfalls that should be addressed in future work, such as proper handling of misclassification errors, wrong assignments after occlusions, and uniformly dressed people.
Q3. How long does it take to remove particles from a tracker?
The authors keep the particles of a totally occluded tracker for a short time (a value of 8 frames is used) so that, when quick occlusions occur, the velocity of the particles may still allow the occlusion to be resolved.
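This grace period can be sketched as a per-tracker counter. The 8-frame limit comes from the text; the `Tracker` class and the occlusion test around it are hypothetical illustration, not the authors' implementation.

```python
# Sketch: keep a fully occluded tracker alive for a few frames so that
# its particles' velocities can carry it through a quick occlusion.
# The 8-frame limit is from the paper; the surrounding structure is assumed.
OCCLUSION_TIMEOUT = 8  # frames a totally occluded tracker survives

class Tracker:
    def __init__(self):
        self.occluded_frames = 0

    def update_occlusion(self, fully_occluded: bool) -> bool:
        """Return True if the tracker should be kept, False to remove it."""
        if fully_occluded:
            self.occluded_frames += 1
        else:
            self.occluded_frames = 0  # visible again: reset the counter
        return self.occluded_frames <= OCCLUSION_TIMEOUT
```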
Q4. How did the authors obtain the ground truth data?
To obtain the ground truth data the authors used a flood-fill segmentation algorithm corrected afterwards by hand using the ViPER-GT tool [3].
Q5. How is the weight of the detection particles penalised?
To avoid multiple detections in the same or similar regions, the weight of detection particles is penalised by a factor ψ_d < 1 in cases where particles cross already detected areas.
Q6. What are the two kinds of metric used to indicate the quality of the tracking procedure?
The authors use two kinds of metrics that indicate the quality of the tracking procedure: detection metrics (counting persons) and localisation metrics (area matching).
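The paper does not spell out the area-matching formula here; intersection-over-union is a common instance of such a localisation metric, so the following is a sketch under that assumption.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

    Offered as one plausible area-matching localisation metric; the paper's
    exact definition may differ.
    """
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```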
Q7. What is the weight update equation for established tracking filters?
The weight update equation for established tracking filters is changed to w^i_t ∝ p(z_t | x_t = x^i_t)·ψ, where ψ = e^(−ρ·g_im) and g_im expresses the amount of overlap between particle i and region m, which is multiplied by a factor ρ in the exponent of the penalty term.
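This penalised update can be written directly, assuming the likelihood p(z_t | x_t = x^i_t) and the overlap g_im are already computed; the function name is illustrative.

```python
import math

def penalised_weight(likelihood: float, overlap: float, rho: float) -> float:
    """w^i_t ∝ p(z_t | x_t = x^i_t) * psi with psi = exp(-rho * overlap).

    `likelihood` is p(z_t | x_t = x^i_t), `overlap` is g_im (amount of
    overlap between particle i and region m), and `rho` scales the overlap
    in the exponent of the penalty term.
    """
    psi = math.exp(-rho * overlap)
    return likelihood * psi
```

With no overlap the penalty vanishes (ψ = 1); larger overlaps shrink the weight exponentially.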
Q8. How long does it take to calculate a step of the tracking procedure?
Calculating one step of the tracking procedure takes about twice as long when using all three moments compared to the tracker based on thermal information only (around 30 Hz on a 2.00 GHz processor when using 1000 samples).
Q9. What is the weight update equation for the ith detection particle?
The weight update equation for the ith detection particle is modified to w^i_t ∝ p(z_t | x_t = x^i_t)·ψ, where ψ = ψ_d if particle i overlaps with other detected regions and ψ = 1 otherwise.
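A minimal sketch of this rule follows; the default value 0.5 for ψ_d is an assumed placeholder (the paper only requires ψ_d < 1), and the function name is illustrative.

```python
def detection_weight(likelihood: float, overlaps_detected: bool,
                     psi_d: float = 0.5) -> float:
    """w^i_t ∝ p(z_t | x_t = x^i_t) * psi.

    psi = psi_d (< 1) if the particle overlaps an already detected region,
    psi = 1 otherwise. The value 0.5 for psi_d is a placeholder, not from
    the paper.
    """
    psi = psi_d if overlaps_detected else 1.0
    return likelihood * psi
```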
Q10. What is the fitness value for each sample i?
A fitness value f^i for each sample i is then calculated as the sum of all gradients multiplied with individual weights α_j for each region: f^i = Σ_{j=1}^{m} α_j·∆^i_j.
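The weighted sum can be sketched as follows, assuming the m per-region gradients ∆^i_j and weights α_j are already available as sequences; the function name is illustrative.

```python
def fitness(gradients, weights):
    """f^i = sum over j = 1..m of alpha_j * Delta^i_j.

    `gradients` holds the m region gradients Delta^i_j for sample i,
    `weights` the corresponding region weights alpha_j.
    """
    assert len(gradients) == len(weights), "one weight per region"
    return sum(a * d for a, d in zip(weights, gradients))
```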
Q11. What is the importance of the ellipse?
To calculate the importance weight w^i_t of a sample i with state x^i_t, the authors divide the ellipses into m = 7 different regions (see Fig. 2), and for each region j the image gradient ∆^i_j between pixels in the inner and outer parts of the ellipse is calculated.
Q12. How can the authors determine the region corresponding to a person on the colour image?
By using the affine transformation the authors are able to determine the region corresponding to a person on the colour image (see Fig. 3).
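The paper does not give the transformation's coefficients; as a sketch, mapping a point from the thermal image to the colour image with a 2×3 affine matrix [A | t] (values assumed to come from an offline calibration) looks like this:

```python
import numpy as np

def map_point(affine: np.ndarray, point) -> np.ndarray:
    """Apply a 2x3 affine matrix [A | t] to a 2D point: p' = A @ p + t.

    The matrix is assumed to be calibrated between the thermal and
    colour cameras; any concrete values here are illustrative only.
    """
    A, t = affine[:, :2], affine[:, 2]
    return A @ np.asarray(point, dtype=float) + t
```

Mapping the corner points of a person's bounding region this way yields the corresponding region in the colour image.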
Q13. How is the order of persons determined?
The order of the persons from front to back is then determined by a sort procedure requiring M_O · log(M_O) comparisons, where M_O specifies the number of overlapping persons.
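Any comparison sort achieves this M_O · log(M_O) bound; the sketch below assumes each person carries a single scalar depth score standing in for the combined thermal/colour ordering features, which is a simplification of the paper's feature set.

```python
def front_to_back(persons):
    """Sort overlapping persons front-to-back with a comparison sort,
    which needs on the order of M_O * log(M_O) comparisons.

    Each person is a (person_id, depth_score) pair; a smaller depth score
    means closer to the camera. The scalar score is an assumed stand-in
    for the paper's thermal and colour ordering features.
    """
    return sorted(persons, key=lambda p: p[1])
```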
Q14. What is the trade-off between time requirements and performance of the tracker?
A good trade-off between time requirements and tracker performance for their setup is a representation using just the first moment of the colour distribution (46% more time compared to the gradient-based tracker).
Q15. What features are used to indicate the order of overlapping persons in the image?
There are several features that could indicate the order of overlapping persons in the image, from which the authors have chosen a set of three thermal and three colour features.