TL;DR: This paper proposes a system to depth order regions of a frame belonging to a monocular image sequence, where regions are ordered according to their relative depth using the previous and following frames.
Abstract: This paper proposes a system to depth order regions of a frame belonging to a monocular image sequence. For a given frame, regions are ordered according to their relative depth using the previous and following frames. The algorithm estimates occluded and disoccluded pixels belonging to the central frame. Afterwards, a Binary Partition Tree (BPT) is constructed to obtain a hierarchical, region based representation of the image. The final depth partition is obtained by means of energy minimization on the BPT. To achieve a global depth ordering from local occlusion cues, a depth order graph is constructed and used to eliminate contradictory local cues. Results of the system are evaluated and compared with state of the art figure/ground labeling systems on several datasets, showing promising results.
Depth perception in human vision emerges from several depth cues.
Most of the published approaches make use of two (or more) points of view to compute the disparity as it offers a reliable cue for depth estimation [2].
In contrast, references [6, 7] attempt to retrieve a full depth map from a monocular image sequence, under assumptions or restrictions about the scene structure that may not be fulfilled in typical sequences.
The work [8] assigns figure/ground (f/g) labels to detected occlusion boundaries.
First, the optical flow is used in Section 2 to introduce motion information for the BPT [9] construction and in Section 3 to estimate (dis)occluded points.
2 Optical Flow and Image Representation
For a given frame It, the previous It−1 and following It+1 frames are used.
For two given temporal indices a, b, the optical flow vector wa,b maps each pixel of Ia to one pixel in Ib.
Iteratively, the two most similar neighboring regions according to a predefined distance are merged and the process is repeated until only one region is left.
The BPT describes a set of regions organized in a tree structure and this hierarchical structure represents the inclusion relationship between regions.
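The merging loop described above can be sketched as follows. This is only an illustration under simplifying assumptions: regions are modeled by a mean value and a pixel count, and the distance between neighbors is the absolute difference of their means. The name `build_bpt` and this toy region model are hypothetical; the distance used in the paper also involves motion information and is not reproduced here.

```python
from itertools import count


def build_bpt(means, sizes, adjacency):
    """Build a Binary Partition Tree by greedy pairwise merging.

    means, sizes: dicts leaf_id -> mean value / pixel count (toy region model).
    adjacency: set of frozenset({a, b}) pairs of neighboring leaf regions,
               assumed to form a connected graph.
    Returns (children, root), where children maps each merged node id to the
    pair of nodes it was built from.
    """
    means, sizes = dict(means), dict(sizes)
    adjacency = set(adjacency)
    children = {}
    next_id = count(max(means) + 1)
    root = next(iter(means))  # covers the single-region case
    while adjacency:
        # Most similar neighboring pair (assumed distance: |mean difference|);
        # the sorted ids only make tie-breaking deterministic.
        pair = min(
            adjacency,
            key=lambda e: (abs(means[min(e)] - means[max(e)]), sorted(e)),
        )
        a, b = sorted(pair)
        new = next(next_id)
        children[new] = (a, b)
        total = sizes[a] + sizes[b]
        means[new] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[new] = total
        # Reconnect the neighbors of a and b to the merged node.
        updated = set()
        for e in adjacency:
            if e == pair:
                continue
            rest = e - pair
            if len(rest) == 2:
                updated.add(e)
            else:
                (other,) = rest
                updated.add(frozenset({other, new}))
        adjacency = updated
        root = new
    return children, root
```

For a chain of four regions with means 10, 12, 50 and 53, the two dark regions merge first, then the two bright ones, and the root finally joins both groups, so the tree encodes the inclusion relationship between regions at every scale.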
3 Motion Occlusions from Optical Flow
When only one point of view is available, humans take advantage of monocular depth cues to retrieve the scene structure: motion parallax and motion occlusions.
Motion parallax assumes still scenes, and it is able to retrieve the absolute depth.
Since motion occlusions appear in more situations and do not make any assumptions, they are selected here.
The pixel with maximum D(px, pm) value is decided to be the occluded pixel.
Occluded and disoccluded pixels may be useful to some extent (e.g. to improve optical flow estimation, [12]).
4 Depth Order Retrieval
Once the optical flow is estimated and the BPT is constructed, the last step of the system is to retrieve a suitable partition to depth order its regions.
There are many ways to obtain a partition from a hierarchical representation [14, 15, 9].
Since raw optical flows are not reliable at (dis)occluded points, a first step allows us to find a partition.
When the occlusion relations are estimated, the second step finds a second partition Pd attempting to maintain occluded-occluding pairs in different regions.
Obtaining Pf and Pd is performed using the same energy minimization algorithm.
4.1 General Energy Minimization on BPTs
If that is the case, Algorithm 1 uses dynamic programming (Viterbi like) to find the optimal x∗.
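A bottom-up dynamic program of this kind can be sketched as follows. It is not a transcription of Algorithm 1: the sketch assumes that the energy of a partition is the sum of independent per-region energies, and the function name `optimal_pruning` and the `energy` dictionary are hypothetical. At every node it compares the cost of keeping the node as one region against the best cost of its two subtrees.

```python
def optimal_pruning(children, energy, root):
    """Return (best_energy, regions) minimizing the summed energy over all
    partitions obtainable by pruning the tree.

    children: dict node -> (left, right) for internal nodes; leaves are absent.
    energy: dict node -> energy of keeping that node as a single region.
    """
    best = {}  # node -> (optimal energy of its subtree, regions achieving it)
    # Iterative post-order traversal, so deep trees do not hit the recursion limit.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if node not in children:
            best[node] = (energy[node], [node])
        elif not expanded:
            stack.append((node, True))
            stack.extend((child, False) for child in children[node])
        else:
            left, right = children[node]
            split_cost = best[left][0] + best[right][0]
            if energy[node] <= split_cost:
                # Keeping the node whole is at least as good as any split below it.
                best[node] = (energy[node], [node])
            else:
                best[node] = (split_cost, best[left][1] + best[right][1])
    return best[root]
```

Each node is visited once, so the cost is linear in the number of BPT nodes. Ties are resolved in favor of the larger region, which is a design choice of this sketch.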
Small BPT with the nodes forming the pruning x3 marked in green.
Keyframe with occluded (red) and occluding pixels overlaid.
The modeled flow w̃t,qRi is estimated by robust regression [17] for each region Ri.
Occlusion relations estimation. With Pf and a flow model available for each region, occlusion relations can be reliably estimated.
4.2 Depth ordering
The vertices V represent the regions of D and the edges E represent occlusion relations between regions.
The weight pi = Nab/No where Nab is the number of occlusion relations between both regions.
It iteratively finds low-confidence occlusion relations and breaks cycles.
Once all cycles have been removed in G, a topological partial sort [18] is applied and each region is assigned a depth order.
Regions which have no depth relation are assigned the depth of their most similar adjacent region according to the distance in the BPT construction.
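The cycle breaking and ordering steps can be illustrated with the sketch below. It makes several assumptions: the confidence of each relation is given directly as a number, each cycle is broken by dropping its least confident edge, and the depth level of a region is the length of the longest chain of regions occluding it. The names `find_cycle` and `depth_order` are hypothetical. Regions left without any relation simply get level 0 here, whereas the system described above assigns them the depth of their most similar adjacent region.

```python
def find_cycle(edges):
    """Return a list of edges forming a directed cycle, or None if acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {v: 0 for v in graph}  # 0 = unvisited, 1 = on current path, 2 = done
    path = []

    def visit(v):
        state[v] = 1
        path.append(v)
        for w in graph[v]:
            if state[w] == 1:  # back edge closes a cycle
                cyc = path[path.index(w):] + [w]
                return list(zip(cyc, cyc[1:]))
            if state[w] == 0:
                found = visit(w)
                if found:
                    return found
        path.pop()
        state[v] = 2
        return None

    for v in sorted(graph):
        if state[v] == 0:
            found = visit(v)
            if found:
                return found
    return None


def depth_order(confidence):
    """confidence: dict (occluder, occluded) -> confidence of that relation.
    Returns dict region -> depth level, 0 being closest to the camera."""
    edges = dict(confidence)
    # Break every cycle by dropping its least confident relation.
    while (cycle := find_cycle(edges)) is not None:
        weakest = min(cycle, key=lambda e: edges[e])
        del edges[weakest]
    # Kahn-style topological sort; level = longest chain of occluders above.
    nodes = {v for e in confidence for v in e}
    indegree = {v: 0 for v in nodes}
    for _, b in edges:
        indegree[b] += 1
    level = {v: 0 for v in nodes}
    ready = sorted(v for v in nodes if indegree[v] == 0)
    while ready:
        v = ready.pop()
        for a, b in edges:
            if a == v:
                level[b] = max(level[b], level[v] + 1)
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return level
```

With relations A over B (0.9), B over C (0.8), A over C (0.6) and a contradictory C over A (0.1), the weak C over A relation is discarded and the regions are ordered A, B, C from closest to farthest.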
5 Results
The evaluation of the system is performed at keyframes of several sequences, comparing the assigned f/g contours against the ground-truth assignments.
When two depth planes meet, the part of the contour belonging to the closest region is assigned figure, or ground otherwise, see Figure 6.
The datasets are the Carnegie Mellon Dataset (CMU) [19] and the Berkeley Dataset (BDS) [8].
It can be seen in Table 1 that the proposed system outperforms the one presented in [8], showing that motion occlusions are a reliable cue for depth ordering.
In spite of the simplicity of the optical flow estimation algorithm, occlusion points were reliably estimated.
6 Conclusions
A system inferring the relative depth order of the different regions of a frame relying only on motion occlusion has been described.
Combining a variational approach for optical flow estimation and a region based representation of the image, the authors have developed a reliable system to detect occlusion relations and to create depth ordered partitions using only these depth cues.
Comparison with the state of the art shows that motion occlusions are very reliable cues.
There are many possible extensions to the proposed system.
The authors also believe that occlusions caused by motion can be propagated throughout the sequence to infer a consistent depth ordering across multiple frames.
TL;DR: The proposed algorithm outperforms existing hierarchical video segmentation algorithms, provides more stable and precise regions, and relies on different models and associated metrics to deal with color and motion information.
Abstract: As an early stage of video processing, we introduce an iterative trajectory merging algorithm that produces a region-based and hierarchical representation of the video sequence, called the Trajectory Binary Partition Tree (BPT). From this representation, many analysis and graph cut techniques can be used to extract partitions or objects that are useful in the context of specific applications. In order to define trajectories and to create a precise merging algorithm, color and motion cues have to be used. Both types of information are very useful to characterize objects but present strong differences of behavior in the spatial and the temporal dimensions. On the one hand, scenes and objects are rich in their spatial color distributions, but these distributions are rather stable over time. Object motion, on the other hand, presents simple structures and low spatial variability but may change from frame to frame. The proposed algorithm takes into account this key difference and relies on different models and associated metrics to deal with color and motion information. We show that the proposed algorithm outperforms existing hierarchical video segmentation algorithms and provides more stable and precise regions.
51 citations
Cites background or methods from "2.1 depth estimation of frames in i..."
...Once the tree is constructed, it can be processed in many different ways through graph cut to extract several partitions or through region analysis to extract meaningful objects in the scene [2]....
[...]
...Therefore we consider the trajectory region color model to be an adaptive histogram (signature) described by at most n = 8 dominant colors in the Lab color space [2]....
[...]
...Nevertheless, instead of simply following the merging sequence, graph cut techniques can be applied on the tree to recover useful objects for a given application [2]....
TL;DR: A system to estimate the depth order of regions belonging to a monocular image sequence using a hierarchical, region-based representation of the image by means of a binary tree; a depth order graph is constructed to achieve a global and consistent depth ordering.
Abstract: This study proposes a system to estimate the depth order of regions belonging to a monocular image sequence. For each frame, the regions are ordered according to their relative depth using information from the previous and following frames. The algorithm estimates occlusions relying on a hierarchical region-based representation of the image by means of a binary tree. This representation is used to define the final depth order partition which is obtained through an energy minimisation process. Finally, to achieve a global and consistent depth ordering, a depth order graph is constructed and used to eliminate contradictory local cues. The system is evaluated and compared with the state-of-the-art figure/ground labelling systems showing very good results.
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher:
The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition, this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects.
In its new edition, Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity, and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage.
As in the classic first edition, this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further, the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds.
Each chapter presents an algorithm, a design technique, an application area, or a related topic. The chapters are not dependent on one another, so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally, the new edition offers a 25% increase over the first edition in the number of problems, giving the book 155 problems and over 900 exercises that reinforce the concepts the students are learning.
TL;DR: This article illustrates how a programming language assists the programmer through a language similar to Logo, in which programs are developed to draw geometric pictures.
Abstract: The primary purpose of a programming language is to assist the programmer in the practice of her art. Each language is either designed for a class of problems or supports a different style of programming. In other words, a programming language turns the computer into a ‘virtual machine’ whose features and capabilities are unlimited. In this article, we illustrate these aspects through a language similar to Logo. Programs are developed to draw geometric pictures using this language.
TL;DR: By proving that this scheme implements a coarse-to-fine warping strategy, this work gives a theoretical foundation for warping which has been used on a mainly experimental basis so far and demonstrates its excellent robustness under noise.
Abstract: We study an energy functional for computing optical flow that combines three assumptions: a brightness constancy assumption, a gradient constancy assumption, and a discontinuity-preserving spatio-temporal smoothness constraint. In order to allow for large displacements, linearisations in the two data terms are strictly avoided. We present a consistent numerical scheme based on two nested fixed point iterations. By proving that this scheme implements a coarse-to-fine warping strategy, we give a theoretical foundation for warping which has been used on a mainly experimental basis so far. Our evaluation demonstrates that the novel method gives significantly smaller angular errors than previous techniques for optical flow estimation. We show that it is fairly insensitive to parameter variations, and we demonstrate its excellent robustness under noise.
2,902 citations
"2.1 depth estimation of frames in i..." refers background or methods in this paper
...flows wt,t−1, w can be estimated using [10]....
[...]
...Fitting the flows and finding occlusion relations As stated in Section 3, the algorithm [10] does not provide reliable flow values at (dis)occluded points....
TL;DR: A novel method for unsupervised class segmentation on a set of images that alternates between segmenting object instances and learning a class model based on a segmentation energy defined over all images at the same time, which can be optimized efficiently by techniques used before in interactive segmentation.
Abstract: We propose a novel method for unsupervised class segmentation on a set of images. It alternates between segmenting object instances and learning a class model. The method is based on a segmentation energy defined over all images at the same time, which can be optimized efficiently by techniques used before in interactive segmentation. Over iterations, our method progressively learns a class model by integrating observations over all images. In addition to appearance, this model captures the location and shape of the class with respect to an automatically determined coordinate frame common across images. This frame allows us to build stronger shape and location models, similar to those used in object class detection. Our method is inspired by interactive segmentation methods [1], but it is fully automatic and learns models characteristic for the object class rather than specific to one particular object/image. We experimentally demonstrate on the Caltech4, Caltech101, and Weizmann horses datasets that our method (a) transfers class knowledge across images and this improves results compared to segmenting every image independently; (b) outperforms Grabcut [1] for the task of unsupervised segmentation; (c) offers competitive performance compared to the state-of-the-art in unsupervised segmentation and in particular it outperforms the topic model [2].
Q1. What contributions have the authors mentioned in the paper "2.1 depth estimation of frames in image sequences using motion occlusions"?
This paper proposes a system to depth order regions of a frame belonging to a monocular image sequence. For a given frame, regions are ordered according to their relative depth using the previous and following frames. Results of the system are evaluated and compared with state of the art figure/ground labeling systems on several datasets, showing promising results.