TL;DR: An environmental representation approach based on hybrid metric and topological maps is proposed as a key component for mobile robot navigation, and an uncertainty propagation model is formulated for outlier rejection and data fusion.
Abstract: This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. Focus is placed on an ego-centric pose graph structure that uses keyframes to capture the local properties of the scene. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense yet compact representation of the environment, neighbouring augmented spheres are fused into a single representation. To this end, an uncertainty propagation model is formulated for outlier rejection and data fusion, enhanced with the notion of landmark stability over time. Finally, our algorithm is tested thoroughly on a newly developed wide-angle 360° field of view (FOV) spherical sensor, where improvements in trajectory drift, compactness and tracking error are demonstrated.
Visual mapping is a required capability for autonomous robots and a key component for long term navigation and localisation.
On the other hand, this representation is prone to drift, which becomes significant over extended trajectories.
This representation not only provides a good level of abstraction of the environment, but also makes common tasks such as homing, navigation, exploration and path planning more efficient.
Accurate and compact 3D environment modelling and reconstruction has drawn increasing interest within the vision and robotics communities over the years, as it is perceived as a vital tool for visual SLAM techniques in realising tasks such as localisation, navigation, exploration and path planning [3].
II. METHODOLOGY AND CONTRIBUTIONS
The authors' aim is to build ego-centric topometric maps represented as a graph of keyframes, spanned by spherical RGBD nodes.
This not only reduces data redundancy but also helps in suppressing sensor noise, whilst contributing significantly to drift reduction.
This work is directly related to the two previous works [4] and [12].
The uncertainty model is then presented, followed by the fusion stage.
III. PRELIMINARIES
The basic environment representation consists of a set of spheres acquired over time, together with a set of rigid transforms T ∈ SE(3) connecting adjacent spheres (e.g. Tij links Sj and Si) – this representation is well described in [14].
The inverse transform g−1 corresponds to the spherical projection model.
Point correspondences between spheres are given by the warping function w, under observability conditions at different viewpoints.
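As an illustration, a minimal numpy sketch of such a warping between two sphere frames, under the assumption that each pixel stores a unit bearing direction (the back-projection g⁻¹) and a depth value (function names are illustrative, not the paper's notation):

```python
import numpy as np

def warp(bearing, depth, R, t):
    """Transfer one spherical pixel into another sphere's frame.

    bearing : (3,) unit view direction of the pixel (from g^-1)
    depth   : range along that direction
    R, t    : rotation (3x3) and translation (3,) of T in SE(3)
    """
    q = depth * bearing          # back-project to a 3D point
    qw = R @ q + t               # rigid transform into the target frame
    dw = np.linalg.norm(qw)      # warped depth
    return qw / dw, dw           # warped bearing (projection g) and depth
```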
In the following, spherical RGBD registration and keyframe based environment representations are introduced.
A. Spherical Registration and Keyframe Selection
The relative location between raw spheres is obtained using a visual spherical registration procedure [12] [7].
The linearization of the over-determined system of equations (2) leads to a classic iterative Least Mean Squares (ILMS) solution.
Furthermore, for computational efficiency, one can choose a subset of more informative pixels (salient points) that yields enough constraints over the 6 DOF without compromising the accuracy of the pose estimate.
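A minimal sketch of one such iteratively re-weighted least-squares update, assuming the stacked residual vector r and its 6-column Jacobian J from the linearised system (2) are available; the MAD-based scale, the Huber constant and the saliency selection by Jacobian magnitude are illustrative assumptions:

```python
import numpy as np

def huber_weights(r, k=1.345):
    """Huber M-estimator weights on normalised residuals r."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, k))

def ilms_step(J, r, n_salient=None):
    """One iteratively re-weighted least-squares update of the 6-DOF
    pose increment, solving the weighted normal equations of J x = -r."""
    if n_salient is not None:
        # keep only the most informative (salient) rows
        idx = np.argsort(-np.linalg.norm(J, axis=1))[:n_salient]
        J, r = J[idx], r[idx]
    scale = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust scale (MAD)
    w = huber_weights(r / scale)
    JW = J * w[:, None]
    return -np.linalg.solve(JW.T @ J, JW.T @ r)     # Gauss-Newton step
```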
This simple registration procedure, applied to each sequential pair of spheres, makes it possible to represent the scene structure, but it is subject to cumulative visual odometry (VO) errors and to scalability issues due to long-term frame accumulation.
A criterion based on a differential entropy approach [9] is applied in this work for keyframe selection.
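A sketch of this entropy-based test in the spirit of [9], assuming the 6×6 covariance of the estimated pose is available; the threshold value alpha_min is an illustrative assumption (the paper notes it is tuned heuristically):

```python
import numpy as np

def differential_entropy(cov):
    """Differential entropy of a Gaussian estimate:
    h = 0.5 * ln((2*pi*e)^n * det(cov))."""
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def spawn_new_keyframe(cov_current, cov_reference, alpha_min=0.9):
    """Entropy-ratio keyframe test: create a new keyframe when the
    ratio alpha drops below a heuristically tuned threshold."""
    alpha = differential_entropy(cov_current) / differential_entropy(cov_reference)
    return alpha < alpha_min
```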
IV. SPHERICAL UNCERTAINTY PROPAGATION AND MODEL FUSION
The authors' approach to topometric map building is an ego-centric representation operating locally on sensor data.
The concept of proximity used to combine information is evaluated mainly with the entropy similarity criterion after the registration procedure.
Instead of performing a complete bundle adjustment over all parameters, including poses and structure, for the full set of raw spheres close to S∗, the procedure is carried out incrementally in two stages.
A. Warped Sphere Uncertainty
This section aims to represent the confidence of the elements in Sw, which clearly depends on the combination of a priori pixel position, depth and pose errors over a set of geometric and projective operations – the warping function as in (1).
Before introducing these two terms, let us represent the uncertainty due to the combined errors on the pose T and on a Cartesian 3D point q.
The uncertainty index σ²_Dw is then the normalised covariance given by:

σ²_Dw(p) = σ²_Dt(p) / (qw(p,T)⊤ qw(p,T))²    (8)

Finally, under the assumption of Lambertian surfaces, the photometric component is simply Iw(p∗) = I(w(p∗,T)) and its uncertainty σ²_I is set by a robust weighting function on the error using Huber's M-estimator as in [14].
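The first-order propagation behind these terms can be sketched as follows, under the assumption of a left-multiplied small-twist perturbation of T ordered [translation; rotation]; the perturbation convention and function names are assumptions, not the paper's notation:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def warped_point_cov(q, R, t, cov_q, cov_pose):
    """First-order covariance of qw = R q + t, combining the 3x3
    point covariance cov_q with the 6x6 pose covariance cov_pose."""
    qw = R @ q + t
    J_pose = np.hstack([np.eye(3), -skew(qw)])     # d qw / d twist
    cov_qw = R @ cov_q @ R.T + J_pose @ cov_pose @ J_pose.T
    # the normalised depth index of (8) would then use sigma2_Dt / (qw @ qw)**2
    return qw, cov_qw
```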
B. Spherical Keyframe Fusion
To compare the values of the current model S∗ with those of the transformed observation Sw, a probabilistic test is performed to exclude outlier pixel measurements from Sw, allowing fusion to occur only if the raw observation agrees with its corresponding value in S∗.
Hence, the tuples A = {D∗, Dw} and B = {I∗, Iw} are defined as the sets of model-predicted and measured depth and intensity values respectively.
Finally, let the class c : D∗(p) = Dw(p) denote the case where the measured value agrees with its corresponding model prediction.
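A minimal per-pixel sketch of such a gated fusion, assuming independent Gaussian depth errors; the Mahalanobis gate and the inverse-variance update are illustrative assumptions, and the paper's actual test over the classes defined on A and B may differ:

```python
import math

def fuse_pixel(d_model, var_model, d_obs, var_obs, gate=2.0):
    """Gated depth fusion: the observation is fused only if it agrees
    with the model prediction (class c: D*(p) = Dw(p)) within 'gate'
    standard deviations; otherwise it is rejected as an outlier."""
    if abs(d_model - d_obs) > gate * math.sqrt(var_model + var_obs):
        return d_model, var_model                    # outlier: keep model
    var = 1.0 / (1.0 / var_model + 1.0 / var_obs)    # inverse-variance fusion
    d = var * (d_model / var_model + d_obs / var_obs)
    return d, var
```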
C. Dynamic 3D points filtering
So far, the problem of data fusion of consistent estimates in a local model has been addressed.
These points exhibit erratic behaviours along the trajectory and as a matter of fact, they are highly unstable.
There are however different levels of “dynamicity” as mentioned in [11].
Points/landmarks observed can exhibit a gradual degradation over time, while others may undergo a sudden, abrupt change – the case of an occlusion, for example.
The probabilistic framework for data association developed in section IV-B is well suited to filtering out such inconsistent data.
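A possible stability bookkeeping over time, sketched under the assumption of a simple per-landmark score with asymmetric gain and penalty (all parameters are illustrative, not the paper's):

```python
def update_stability(score, consistent, gain=1, penalty=2, score_max=10):
    """Track landmark stability over time: a consistent observation
    (per the test of section IV-B) raises the score slowly, an
    inconsistent one (occlusion, motion) lowers it faster; a point
    whose score reaches zero is marked for removal."""
    if consistent:
        return min(score + gain, score_max)
    return max(score - penalty, 0)
```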
D. Application to Saliency map
Instead of naively dropping points below a certain threshold (e.g. p < 0.8), they are better pruned by means of a saliency map [13].
The underlying procedure is outlined in Algorithm 1.
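A hypothetical sketch of such saliency-guided pruning; combining the agreement probability with a saliency score [13] by a simple product, as well as the keep ratio, are assumptions for illustration only:

```python
import numpy as np

def prune_by_saliency(probs, saliency, keep_ratio=0.5):
    """Rank points by agreement probability times saliency and return
    the indices of the fraction to keep, instead of applying a hard
    threshold such as p < 0.8."""
    order = np.argsort(-(probs * saliency))     # best candidates first
    return order[: int(keep_ratio * len(order))]
```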
This happens when the keyframe criterion based on an entropy ratio α [7][9] is met.
Firstly, the notion of uncertainty is incorporated in spherical pixel tracking.
Eventually, between an updated model at time t0 and the following re-initialised one at tn, information is optimally shared between the two.
V. EXPERIMENTS AND RESULTS
A new sensor for a large field of view RGBD image acquisition has been used in this work.
The chosen configuration offers the advantage of creating full 360° RGBD images of the scene isometrically, i.e. the same solid angle is assigned to each pixel.
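Such an equal-solid-angle layout can be sketched by sampling uniformly in azimuth and in the cosine of the elevation, since dΩ = dφ·d(cos θ); this is a minimal numpy illustration of the principle, not the sensor's actual pixel layout:

```python
import numpy as np

def isometric_sphere_grid(rows, cols):
    """Unit bearing directions on a grid where every pixel subtends
    the same solid angle: uniform steps in azimuth and in cos(theta)."""
    z = 1.0 - (np.arange(rows) + 0.5) * 2.0 / rows    # uniform in cos(theta)
    phi = (np.arange(cols) + 0.5) * 2.0 * np.pi / cols
    Z, PHI = np.meshgrid(z, phi, indexing="ij")
    r = np.sqrt(1.0 - Z**2)
    return np.stack([r * np.cos(PHI), r * np.sin(PHI), Z], axis=-1)
```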
The authors' experimental test bench consists of the spherical sensor mounted on a mobile experimental platform and driven around an indoor office building environment during a first learning phase, whilst spherical RGBD data is acquired online and registered in a database.
This is even more emphasised by inspecting the 3D structure of the reconstructed environment, as shown in figure (4), where the two images correspond to the sequences with and without fusion (methods 1 and 2 respectively).
The threshold for α is generally heuristically tuned.
VI. CONCLUSION
A framework for hybrid metric and topological maps in a single compact skeletal pose graph representation has been proposed.
Two methods have been evaluated experimentally, and the importance of data fusion has been highlighted, with the benefits of reduced data noise, redundancy and tracking drift, as well as a compact environment representation using keyframes.
The authors' activities are centred around building robust and stable environment representations for lifelong navigation and map building.
TL;DR: In this article, the relative poses between RGB-D cameras with minimal overlapping fields of view are estimated using descriptor-based patterns that provide well-matched 2D keypoints.
Abstract: This paper presents a novel method to estimate the relative poses between RGB-D cameras with minimal overlapping fields of view. This calibration problem is relevant to applications such as indoor 3D mapping and robot navigation that can benefit from a wider field of view using multiple RGB-D cameras. The proposed approach relies on descriptor-based patterns to provide well-matched 2D keypoints in the case of a minimal overlapping field of view between cameras. Integrating the matched 2D keypoints with corresponding depth values, a set of 3D matched keypoints are constructed to calibrate multiple RGB-D cameras. Experiments validated the accuracy and efficiency of the proposed calibration approach.
TL;DR: An approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both the photometric and geometric information available from the augmented spherical database.
Abstract: Our aim is concentrated around building ego-centric topometric maps represented as a graph of keyframe nodes which can be efficiently used by autonomous agents. The keyframe nodes, which combine a spherical image and a depth map (augmented visual sphere), synthesise information collected in a local area of space by an embedded acquisition system. The representation of the global environment consists of a collection of augmented visual spheres that provide the necessary coverage of an operational area. A "pose" graph that links these spheres together in six degrees of freedom also defines the domain potentially exploitable for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both photometric and geometric information available from our augmented spherical database; how to determine the quantity and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and update the dense information of the augmented spheres; how to compactly represent the information contained in the augmented sphere to ensure robustness, accuracy and stability along an explored trajectory by making use of saliency maps.
7 citations
Cites methods from "A compact spherical RGBD keyframe-b..."
...Part of the conceptual formulation and experimental evaluations of this chapter has been published in [Gokhool et al. 2015]....
[...]
...The formulation and the corresponding experimental evaluation were partly presented at an international conference [Gokhool et al. 2015]....
TL;DR: This paper presents a methodology to combine information from a sequence of RGB-D spherical views acquired by a home-made multi-stereo device in order to improve the computed depth images both in terms of accuracy and completeness.
Abstract: This paper presents a methodology to combine information from a sequence of RGB-D spherical views acquired by a home-made multi-stereo device in order to improve the computed depth images both in terms of accuracy and completeness. This methodology is embedded in a larger visual mapping framework aiming to produce accurate and dense topometric urban maps. Our method is based on two main filtering stages. Firstly, we perform a segmentation process considering both geometric and photometric image constraints, followed by a regularization step (spatial-integration). We then proceed to a fusion stage where the geometric information is further refined by considering the depth images of nearby frames (temporal integration). This methodology can be applied to other projective models, such as perspective stereo images. Our approach is evaluated within the frameworks of image registration, localization and mapping, demonstrating higher accuracy and larger convergence domains over different datasets.
6 citations
Cites methods from "A compact spherical RGBD keyframe-b..."
...For this, we follow the approach of [5] which is briefly summarized for the sake of completeness....
[...]
...As stated previously, the presented methodology is directly related to [10] and [5]....
[...]
...The fusion relies on our previous work [5], where coherent regularized frames are merged in a single keyframe, taking into account the related uncertainties and their co-visibility....
[...]
...By last, we extend the method proposed in [5] by exploiting the rigidity of neighbourhoods through a joint depth – color segmentation, which has a clear improvement in the maps produced, specially when considering the more challenging data coming from stereo outdoor sequences....
TL;DR: This work proposes a new approach to update maps pertaining to large-scale dynamic environments with semantics, which is able to build a stable representation with only two observations of the environment.
Abstract: Mapping evolving environments requires an update mechanism to efficiently deal with dynamic objects. In this context, we propose a new approach to update maps pertaining to large-scale dynamic environments with semantics. While previous works mainly rely on large amounts of observations, the proposed framework is able to build a stable representation with only two observations of the environment. To do this, scene understanding is used to detect dynamic objects and to recover the labels of the occluded parts of the scene through an inference process which takes into account both spatial context and a class occlusion model. Our method was evaluated on a database acquired at two different times with an interval of three years in a large dynamic outdoor environment. The results point out the ability to retrieve the hidden classes with a precision score of 0.98. Performance in terms of localisation is also improved.
5 citations
Additional excerpts
...At a larger scale, all submaps are positioned in the scene thanks to a dense visual odometry method presented in [11] and constitute a global graph of the environment....
TL;DR: This work proposes an activation function based on the conditioning of the RGB and ICP point-to-plane error terms that strengthens the geometric error influence in the first coarse iterations, while the intensity data term dominates in the finer increments.
Abstract: Dense direct RGB-D registration methods are widely used in tasks ranging from localization and tracking to 3D scene reconstruction. This work addresses a peculiar aspect which drastically limits the applicability of direct registration, namely the weakness of the convergence domain. First, we propose an activation function based on the conditioning of the RGB and ICP point-to-plane error terms. This function strengthens the geometric error influence in the first coarse iterations, while the intensity data term dominates in the finer increments. The information gathered from the geometric and photometric cost functions is not only considered for improving the system observability, but for exploiting the different convergence properties and convexity of each data term. Next, we develop a set of strategies as a flexible regularization and a pixel saliency selection to further improve the quality and robustness of this approach. The methodology is formulated for a generic warping model and results are given using perspective and spherical sensor models. Finally, our method is validated in different RGB-D spherical datasets, including both indoor and outdoor real sequences and using the KITTI VO/SLAM benchmark dataset. We show that the different proposed techniques (weighted activation function, regularization, saliency pixel selection), lead to faster convergence and larger convergence domains, which are the main limitations to the use of direct methods.
4 citations
Cites methods from "A compact spherical RGBD keyframe-b..."
...Finally, our regularization method, which is an extension of [24] [25], is directly related to [26] which perform a region growing using simultaneously intensity and geometric contours....
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real- time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
4,184 citations
"A compact spherical RGBD keyframe-b..." refers methods in this paper
...Space discretisation using volumetric methods such as signed distance functions [17], combined with sparse representations using octrees [18], though having received widespread attention recently due to their reconstruction quality, does present certain caveats....
TL;DR: The calibration of the Kinect sensor is discussed, and an analysis of the accuracy and resolution of its depth data is provided, based on a mathematical model of depth measurement from disparity.
Abstract: Consumer-grade range cameras such as the Kinect sensor have the potential to be used in mapping applications where accuracy requirements are less strict. To realize this potential insight into the geometric quality of the data acquired by the sensor is essential. In this paper we discuss the calibration of the Kinect sensor, and provide an analysis of the accuracy and resolution of its depth data. Based on a mathematical model of depth measurement from disparity a theoretical error analysis is presented, which provides an insight into the factors influencing the accuracy of the data. Experimental results show that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimeters up to about 4 cm at the maximum range of the sensor. The quality of the data is also found to be influenced by the low resolution of the depth measurements.
1,671 citations
"A compact spherical RGBD keyframe-b..." refers background in this paper
...The basic error model for the raw depth is proportional to the fourth power of the depth itself: σ²_ρ ∝ ρ⁴, which can be applied to both stereopsis and active depth measurement systems (for instance see [10] for details)....
TL;DR: This paper proposes a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels, and proposes an entropy-based similarity measure for keyframe selection and loop closure detection.
Abstract: In this paper, we propose a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels. In contrast to sparse, feature-based methods, this allows us to better exploit the available information in the image data which leads to higher pose accuracy. Furthermore, we propose an entropy-based similarity measure for keyframe selection and loop closure detection. From all successful matches, we build up a graph that we optimize using the g2o framework. We evaluated our approach extensively on publicly available benchmark datasets, and found that it performs well in scenes with low texture as well as low structure. In direct comparison to several state-of-the-art methods, our approach yields a significantly lower trajectory error. We release our software as open-source.
897 citations
"A compact spherical RGBD keyframe-b..." refers background or methods in this paper
...This happens when the Keyframe criteria based on an entropy ratio α [7][9] is reached....
[...]
...Furthermore, performing frame to frame registration introduces drift in the trajectory due to uncertainty in the estimated pose as pointed out in [9]....
[...]
...A criteria based on differential entropy approach [9] has been applied in this work for keyframe selection....
TL;DR: A viewpoint-based approach for the quick fusion of multiple stereo depth maps by selecting depth estimates for each pixel that minimize violations of visibility constraints and thus remove errors and inconsistencies from the depth maps to produce a consistent surface.
Abstract: We present a viewpoint-based approach for the quick fusion of multiple stereo depth maps. Our method selects depth estimates for each pixel that minimize violations of visibility constraints and thus remove errors and inconsistencies from the depth maps to produce a consistent surface. We advocate a two-stage process in which the first stage generates potentially noisy, overlapping depth maps from a set of calibrated images and the second stage fuses these depth maps to obtain an integrated surface with higher accuracy, suppressed noise, and reduced redundancy. We show that by dividing the processing into two stages we are able to achieve a very high throughput because we are able to use a computationally cheap stereo algorithm and because this architecture is amenable to hardware-accelerated (GPU) implementations. A rigorous formulation based on the notion of stability of a depth estimate is presented first. It aims to determine the validity of a depth estimate by rendering multiple depth maps into the reference view as well as rendering the reference depth map into the other views in order to detect occlusions and free-space violations. We also present an approximate alternative formulation that selects and validates only one hypothesis based on confidence. Both formulations enable us to perform video-based reconstruction at up to 25 frames per second. We show results on the multi-view stereo evaluation benchmark datasets and several outdoors video sequences. Extensive quantitative analysis is performed using an accurately surveyed model of a real building as ground truth.
396 citations
"A compact spherical RGBD keyframe-b..." refers methods in this paper
...To provide alternatives, a different approach consisting of multi-view rendering was proposed in [15]....
TL;DR: This paper presents a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an RGB-D sensor that is more accurate and robust than the iterated closest point algorithm (ICP) used by KinectFusion, and yields often a comparable accuracy at much higher speed to feature-based bundle adjustment methods such asRGB-D SLAM.
Abstract: The ability to quickly acquire 3D models is an essential capability needed in many disciplines including robotics, computer vision, geodesy, and architecture. In this paper we present a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an RGB-D sensor. We show that by representing the geometry with a signed distance function (SDF), the camera pose can be efficiently estimated by directly minimizing the error of the depth images on the SDF. As the SDF contains the distances to the surface for each voxel, the pose optimization can be carried out extremely fast. By iteratively estimating the camera poses and integrating the RGB-D data in the voxel grid, a detailed reconstruction of an indoor environment can be achieved. We present reconstructions of several rooms using a hand-held sensor and from onboard an autonomous quadrocopter. Our extensive evaluation on publicly available benchmark data shows that our approach is more accurate and robust than the iterated closest point algorithm (ICP) used by KinectFusion, and yields often a comparable accuracy at much higher speed to feature-based bundle adjustment methods such as RGB-D SLAM for up to medium-sized scenes.
234 citations
"A compact spherical RGBD keyframe-b..." refers methods in this paper
...In this context, accurate and compact 3D environment modelling and reconstruction has drawn increased interests within the vision and robotics community over the years as it is perceived as a vital tool for Visual SLAM techniques in realising tasks such as localisation, navigation, exploration and path planning [3]....
Q1. What contributions have the authors mentioned in the paper "A compact spherical rgbd keyframe-based representation" ?
This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense yet compact representation of the environment, neighbouring augmented spheres are fused into a single representation.