PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Citations
Struck: Structured Output Tracking with Kernels
Staple: Complementary Learners for Real-Time Tracking
A Novel Performance Evaluation Methodology for Single-Target Trackers
Learning Video Object Segmentation from Static Images
The Visual Object Tracking VOT2014 challenge results
References
The Pascal Visual Object Classes (VOC) Challenge
CONDENSATION—Conditional Density Propagation for Visual Tracking
"GrabCut": interactive foreground extraction using iterated graph cuts
Generalizing the Hough transform to detect arbitrary shapes
Real-time human pose recognition in parts from single depth images
Frequently Asked Questions (18)
Q2. How do they use the graph-cut algorithm?
By back-projecting the patches that voted for the object centre, the authors initialise a graph-cut algorithm to segment foreground from background.
Q3. What is the problem with tracking arbitrary objects?
Given a video stream, tracking arbitrary objects that are non-rigid, moving or static, rotating and deforming, partially occluded, under changing illumination and without any prior knowledge is a challenging task.
Q4. How many videos are in the second dataset?
The second dataset is composed of 11 videos (around 2,500 frames) showing moving objects that undergo considerable rigid and non-rigid deformations.
Q5. How do the authors make the estimation more robust?
In order to incorporate the segmentation of the previous video frame at time t−1 and to make the estimation more robust, the authors use a recursive Bayesian formulation, where, at time t, each pixel (in the search window) is assigned the foreground probability

p(c_t = 1 | y_{1:t}) = Z^{-1} \, p(y_t | c_t = 1) \sum_{c'_{t-1}} p(c_t = 1 | c'_{t-1}) \, p(c'_{t-1} | y_{1:t-1}),  (2)

where Z is a normalisation constant that makes the probabilities sum to 1.
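One step of this per-pixel recursion can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the transition probability `p_stay` are assumptions, and the colour likelihoods are taken as given inputs (in the paper they come from the foreground/background histograms).

```python
def update_foreground_prob(prior_fg, lik_fg, lik_bg, p_stay=0.8):
    """One step of the per-pixel recursive Bayesian filter of Eq. (2):
    p(c_t=1 | y_1:t) = Z^-1 p(y_t | c_t=1) * sum_c' p(c_t=1 | c') p(c' | y_1:t-1).

    prior_fg : p(c_{t-1}=1 | y_{1:t-1}) for the pixel (or an array of pixels)
    lik_fg, lik_bg : colour likelihoods p(y_t | c_t=1) and p(y_t | c_t=0)
    p_stay : illustrative transition probability p(c_t = c' | c'); the paper
             does not prescribe this value here.
    """
    # Prediction: sum over the previous class c' of p(c_t=1 | c') p(c' | y_{1:t-1})
    pred_fg = p_stay * prior_fg + (1.0 - p_stay) * (1.0 - prior_fg)
    # Correction by the colour likelihoods, then normalisation (the Z^-1 term)
    post_fg = lik_fg * pred_fg
    post_bg = lik_bg * (1.0 - pred_fg)
    return post_fg / (post_fg + post_bg)
```

With an uninformative prior of 0.5 the prediction stays at 0.5, so the posterior is driven entirely by the likelihood ratio.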
Q6. What is the recent version of the VTD method?
A sparse set of templates has also been used by Liu et al. [27], but with smaller image patches of object parts, and by Kwon et al. [23] in their Visual Tracking Decomposition (VTD) method.
Q7. What is the main purpose of the l1 tracker?
In order to cope with changing appearance, Mei et al. [28] introduced the l1 tracker that is based on a sparse set of appearance templates that are collected during tracking and used in the observation model of a particle filter.
Q8. What is the common method used for visual object tracking?
Earlier works [21, 11, 32, 30, 41, 18] on visual object tracking mostly consider a bounding box (or some other simple geometric model) representation of the object to track, and often a global appearance model is used.
Q9. What is the way to track a non-rigid object?
Bibby et al. [8] propose an adaptive probabilistic framework separating the tracking of non-rigid objects into registration and level-set segmentation, where posterior probabilities are computed at the pixel level.
Q10. What are the advantages of using a classical method to track objects?
These classical methods are very robust to some degree of appearance change and local deformations (as in face tracking), and also allow for a fast implementation.
Q11. What is the advantage of using a pixel-based Hough model?
This has the following advantages:
• pixel-based descriptors are more suitable for detecting objects that are extremely small in the image (e.g. for far-field vision),
• the feature space is relatively small and depends little (or not at all) on spatial neighbourhood, which makes training and updating the model easier and more coherent with the object's appearance changes,
• training and applying the detector is extremely fast, as it can be implemented with look-up tables.
Q12. What is the p(ct) of the background histogram?
It is computed dynamically at each frame by a simple reliability measure, defined as the proportion of pixels in the search window that change from foreground to background or vice versa, i.e. that cross the threshold p(c_x = 1 | y) = 0.5.
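That flip proportion can be computed in a few lines. This is a sketch under the stated definition only; the function name is made up, and the inputs are the per-pixel foreground probabilities of two consecutive frames restricted to the search window.

```python
import numpy as np

def label_flip_rate(prob_prev, prob_curr, thresh=0.5):
    """Proportion of pixels whose foreground probability crosses the 0.5
    threshold between consecutive frames, i.e. whose hard label flips from
    foreground to background or vice versa.  Returns a value in [0, 1]."""
    flipped = (prob_prev > thresh) != (prob_curr > thresh)
    return float(np.mean(flipped))
```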
Q13. What is the training process for constructing D?
Training consists of constructing D: each pixel I(x) in the given bounding box produces a displacement vector d_z (arrows in Fig. 2) corresponding to its quantised value z_x and pointing to the centre of the bounding box.
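The construction of D can be sketched as a lookup table from quantised pixel values to displacement vectors. This is a minimal illustration under simplifying assumptions: a single grey-level channel with uniform quantisation into `n_bins` bins, and made-up function and variable names (the paper uses colour and gradient features).

```python
import numpy as np
from collections import defaultdict

def train_displacement_table(frame, bbox, n_bins=16):
    """Build D: quantised pixel value z -> list of displacement vectors
    pointing from the pixel to the bounding-box centre.

    frame : 2-D array of grey values in [0, 256) (simplified feature)
    bbox  : (x0, y0, w, h) of the object's bounding box
    """
    x0, y0, w, h = bbox
    cx, cy = x0 + w / 2.0, y0 + h / 2.0          # bounding-box centre
    D = defaultdict(list)
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            z = int(frame[y, x]) * n_bins // 256  # quantised value z_x
            D[z].append((cx - x, cy - y))         # displacement vector d_z
    return D
```

Because the table is indexed directly by the quantised value, both training and later voting reduce to array lookups, which is what makes the detector so fast.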
Q14. How can a video frame be detected?
In a new video frame, the object can be detected by letting each pixel I(x) inside the search window vote according to the displacement vectors D_z corresponding to its quantised value z_x.
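The voting step can then be sketched as follows, again under the simplified grey-level quantisation and with hypothetical names; each pixel casts one vote per stored displacement, and the object centre is taken as the peak of the accumulated vote map.

```python
import numpy as np

def hough_vote(frame, table, search_window, n_bins=16):
    """Let every pixel in the search window vote for the object centre
    using a trained displacement table {z: [(dx, dy), ...]}.
    Returns the vote map and the position of its peak (x, y)."""
    x0, y0, w, h = search_window
    votes = np.zeros(frame.shape)
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            z = int(frame[y, x]) * n_bins // 256        # quantised value z_x
            for dx, dy in table.get(z, ()):             # displacements D_z
                vx, vy = int(round(x + dx)), int(round(y + dy))
                if 0 <= vy < votes.shape[0] and 0 <= vx < votes.shape[1]:
                    votes[vy, vx] += 1                  # cast one vote
    cy, cx = np.unravel_index(np.argmax(votes), votes.shape)
    return votes, (cx, cy)
```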
Q15. What is the main difficulty of learning a robust model from consecutive video frames?
When no prior knowledge about the object’s shape and appearance is available, one of the main difficulties is to incrementally learn a robust model from consecutive video frames.
Q16. what is the way to track objects in video?
Their algorithm is very fast compared to existing methods, which makes it suitable for real-time applications, for tasks where many objects need to be tracked at the same time, or where large amounts of data need to be processed (e.g. video indexing).
Q17. What is the drawback of using a pixel-based Hough model?
One drawback of using a pixel-based Hough model is when the object’s image region contains primarily pixels of very similar colours (and gradients).
Q18. What is the class of the pixel at position x at time t?
Let c_{t,x} ∈ {0, 1} be the class of the pixel at position x at time t: 0 for background and 1 for foreground, and let y_{1:t,x} be the pixel's colour observations from time 1 to t.