2.1 depth estimation of frames in image sequences using motion occlusions
Summary (2 min read)
1 Introduction
- Depth perception in human vision emerges from several depth cues.
- Most of the published approaches make use of two (or more) points of view to compute the disparity as it offers a reliable cue for depth estimation [2].
- In contrast, references [6, 7] attempt to retrieve a full depth map from a monocular image sequence, under assumptions or restrictions about the scene structure that may not hold in typical sequences.
- The work [8] assigns figure/ground (f/g) labels to detected occlusion boundaries.
- First, the optical flow is used in Section 2 to introduce motion information for the BPT [9] construction and in Section 3 to estimate (dis)occluded points.
2 Optical Flow and Image Representation
- To process a frame It, the previous frame It−1 and the following frame It+1 are used.
- For two given temporal indices a, b, the optical flow vector wa,b maps each pixel of Ia to one pixel in Ib.
- Iteratively, the two most similar neighboring regions according to a predefined distance are merged and the process is repeated until only one region is left.
- The BPT describes a set of regions organized in a tree structure and this hierarchical structure represents the inclusion relationship between regions.
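The iterative merging that builds the BPT can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the `build_bpt` function, and the distance (absolute difference of a single mean feature) are assumptions made for the example, whereas the paper's distance combines color, area, shape and motion information.

```python
from dataclasses import dataclass


@dataclass
class Node:
    mean: float            # hypothetical region feature, e.g. mean intensity
    area: int
    children: tuple = ()   # () for a leaf, (left_id, right_id) for a merge


def build_bpt(leaves, adjacency):
    """leaves: {id: (mean, area)}; adjacency: iterable of (id_a, id_b) pairs.

    Returns (nodes, root_id). Assumes the adjacency graph is connected.
    """
    nodes = {i: Node(m, a) for i, (m, a) in leaves.items()}
    adj = {frozenset(e) for e in adjacency}
    next_id = max(nodes) + 1
    while adj:
        def dist(pair):
            # Assumed distance: difference of region means (illustrative only).
            a, b = pair
            return abs(nodes[a].mean - nodes[b].mean)

        # Pick the most similar pair of neighboring regions (ties by id).
        a, b = min((tuple(sorted(e)) for e in adj), key=lambda p: (dist(p), p))
        area = nodes[a].area + nodes[b].area
        mean = (nodes[a].mean * nodes[a].area + nodes[b].mean * nodes[b].area) / area
        nodes[next_id] = Node(mean, area, (a, b))

        # Neighbors of a or b become neighbors of the merged region.
        rewired = set()
        for e in adj:
            e2 = frozenset(next_id if r in (a, b) else r for r in e)
            if len(e2) == 2:          # drops the merged edge {a, b} itself
                rewired.add(e2)
        adj = rewired
        next_id += 1
    return nodes, next_id - 1
```

With N leaf regions the tree holds 2N − 1 nodes, and each internal node records the two regions it contains, which is the inclusion relationship described above.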
3 Motion Occlusions from Optical Flow
- When only one point of view is available, humans take advantage of monocular depth cues to retrieve the scene structure: motion parallax and motion occlusions.
- Motion parallax assumes still scenes, and it is able to retrieve the absolute depth.
- Since motion occlusions appear in more situations and do not make any assumptions, they are selected here.
- The pixel with the maximum D(px, pm) value is taken to be the occluded pixel.
- Occluded and disoccluded pixels may be useful to some extent (e.g. to improve optical flow estimation, [12]).
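One way to picture the occlusion test is to look for pixels of one frame that the flow sends to the same pixel of the next frame, and to keep the worse match as the occluded one. The sketch below makes several assumptions that this summary does not confirm: the function name `occluded_pixels` is hypothetical, flow targets are rounded to the nearest pixel, and D is taken to be the absolute intensity difference between a source pixel and its target (the paper's actual definition of D is not given here).

```python
from collections import defaultdict


def occluded_pixels(frame_a, frame_b, flow):
    """frame_a, frame_b: 2-D lists of intensities; flow[y][x] = (dx, dy).

    Returns the set of (y, x) pixels of frame_a estimated as occluded.
    """
    h, w = len(frame_a), len(frame_a[0])

    # Group source pixels of frame_a by the frame_b pixel they map to.
    hits = defaultdict(list)
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx, ty = round(x + dx), round(y + dy)
            if 0 <= tx < w and 0 <= ty < h:
                hits[(ty, tx)].append((y, x))

    occluded = set()
    for (ty, tx), sources in hits.items():
        if len(sources) < 2:
            continue  # no collision, so no occlusion evidence at this target
        # Assumed D: intensity mismatch between source and shared target.
        worst = max(sources, key=lambda p: abs(frame_a[p[0]][p[1]] - frame_b[ty][tx]))
        occluded.add(worst)
    return occluded
```

For example, if a bright object moves one pixel to the right over a static background, the background pixel it covers collides with the object pixel at the same target and has the larger mismatch, so it is the one reported as occluded.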
4 Depth Order Retrieval
- Once the optical flow is estimated and the BPT is constructed, the last step of the system is to retrieve a suitable partition to depth order its regions.
- There are many ways to obtain a partition from a hierarchical representation [14, 15, 9].
- Since raw optical flows are not reliable at (dis)occluded points, a first step finds a partition Pf.
- When the occlusion relations are estimated, the second step finds a second partition Pd attempting to maintain occluded-occluding pairs in different regions.
- Obtaining Pf and Pd is performed using the same energy minimization algorithm.
4.1 General Energy Minimization on BPTs
- If that is the case, Algorithm 1 uses dynamic programming (Viterbi like) to find the optimal x∗.
- Figure captions: a small BPT with the nodes forming the pruning x3 marked in green; center panel: keyframe with occluded (red) and occluding pixels overlaid.
- The modeled flow w̃t,qRi is estimated by robust regression [17] for each region Ri.
- Occlusion relations estimation: with Pf and a flow model available for each region, occlusion relations can be reliably estimated.
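The dynamic-programming search for an optimal pruning can be illustrated with a short bottom-up recursion. This is a sketch under a stated assumption, namely that the energy of a partition is the sum of independent per-region energies; it is not a transcription of the paper's Algorithm 1, and the names `best_pruning`, `children` and `energy` are hypothetical.

```python
def best_pruning(node, children, energy):
    """Return (minimum energy, list of node ids) for the subtree rooted at node.

    children: {node_id: (left_id, right_id)} for internal nodes only.
    energy:   {node_id: energy of keeping that node as a single region}.
    """
    kids = children.get(node, ())
    if not kids:
        return energy[node], [node]     # a leaf can only represent itself

    # Best achievable energy if this node is split into its children.
    split_cost, split_cut = 0.0, []
    for k in kids:
        cost, cut = best_pruning(k, children, energy)
        split_cost += cost
        split_cut += cut

    # Keep the node as one region if that is no worse than splitting it.
    if energy[node] <= split_cost:
        return energy[node], [node]
    return split_cost, split_cut
```

Each node is visited once, so the cost is linear in the number of BPT nodes. A usage example on a seven-node tree:

```python
children = {6: (4, 5), 4: (0, 1), 5: (2, 3)}
energy = {0: 1, 1: 1, 2: 3, 3: 3, 4: 1.5, 5: 7, 6: 10}
cost, cut = best_pruning(6, children, energy)   # cost 7.5, regions [4, 2, 3]
```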
4.2 Depth ordering
- The vertices V represent the regions of Pd and the edges E represent occlusion relations between regions.
- Each edge has weight pi = Nab/No, where Nab is the number of occlusion relations between the two regions.
- The algorithm iteratively finds low-confidence occlusion relations and breaks cycles.
- Once all cycles have been removed in G, a topological partial sort [18] is applied and each region is assigned a depth order.
- Regions that have no depth relation are assigned the depth of their most similar adjacent region, according to the distance used in the BPT construction.
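The cycle breaking and depth assignment can be sketched as below. The details are assumptions made for illustration: edges point from the occluding region to the occluded one, the weakest edge of each detected cycle is the one removed, and depth is assigned by a layered topological sort (0 is the closest layer). The function names `find_cycle` and `depth_order` are hypothetical, and regions without any occlusion relation are not handled here.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle as a list, or None if acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state, path = {}, []      # state: 1 = on current DFS path, 2 = finished

    def dfs(u):
        state[u] = 1
        path.append(u)
        for v in graph[u]:
            if state.get(v) == 1:                  # back edge closes a cycle
                cyc = path[path.index(v):] + [v]
                return list(zip(cyc, cyc[1:]))
            if v not in state:
                found = dfs(v)
                if found:
                    return found
        state[u] = 2
        path.pop()
        return None

    for n in graph:
        if n not in state:
            found = dfs(n)
            if found:
                return found
    return None


def depth_order(weights):
    """weights: {(occluder, occluded): confidence}. Returns {region: depth}."""
    edges = dict(weights)
    # Assumed policy: drop the least confident relation of each cycle found.
    while (cycle := find_cycle(edges)) is not None:
        del edges[min(cycle, key=lambda e: edges[e])]

    nodes = {n for e in weights for n in e}
    depth = {n: 0 for n in nodes}
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1

    # Kahn-style sort: a region lies one layer behind its deepest occluder.
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        for a, b in edges:
            if a == u:
                depth[b] = max(depth[b], depth[u] + 1)
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return depth
```

For instance, with relations A over B (0.6), B over C (0.3) and a contradictory C over A (0.1), the last edge is removed and the regions receive depths A = 0, B = 1, C = 2.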
5 Results
- The evaluation of the system is performed at keyframes of several sequences, comparing the assigned f/g contours against the ground-truth assignments.
- When two depth planes meet, the side of the contour belonging to the closest region is labeled figure and the other side ground (see Figure 6).
- The datasets are the Carnegie Mellon Dataset (CMU) [19] and the Berkeley Dataset (BDS) [8].
- It can be seen in Table 1 that the proposed system outperforms the one presented in [8], showing that motion occlusions are a reliable cue for depth ordering.
- In spite of the simplicity of the optical flow estimation algorithm, occlusion points were reliably estimated.
6 Conclusions
- A system inferring the relative depth order of the different regions of a frame relying only on motion occlusion has been described.
- Combining a variational approach for optical flow estimation with a region-based representation of the image, the authors have developed a reliable system to detect occlusion relations and to create depth-ordered partitions using only these depth cues.
- Comparison with the state of the art shows that motion occlusions are very reliable cues.
- There are many possible extensions to the proposed system.
- The authors also believe that occlusions caused by motion can be propagated throughout the sequence to infer a consistent depth ordering across multiple frames.
References
19 citations
"2.1 depth estimation of frames in i..." refers background in this paper
...References [4, 5] estimate a layered image representation of the scene....
[...]
12 citations
"2.1 depth estimation of frames in i..." refers methods in this paper
...The datasets are the Carnegie Mellon Dataset (CMU) [19] and the Berkeley Dataset (BDS) [8]....
[...]
...Results contain sequences with ground-truth data (30 for the CMU, 42 for the BDS)....
[...]
7 citations
"2.1 depth estimation of frames in i..." refers background or methods in this paper
...To this purpose, the algorithm defined in [11] is used....
[...]
...Once the optical flows are computed, a BPT is built [11]....
[...]
...Although state of the art results on these cues [11] show that they are less reliable than motion occlusions, they could be a good complement to the system....
[...]
...Although the construction process is an active field of study, it is not the main purpose of this paper and we chose the distance defined in [11] to build the BPT: the region distance is defined using color, area, shape and motion information....
[...]