Dense Omnidirectional RGB-D Mapping of Large-scale Outdoor Environments for Real-time Localization and Autonomous Navigation
Summary
1 Introduction
- Autonomous navigation in complex, dynamic, and unstructured environments over large distances is an active field in robotics.
- This representation allows online localisation and navigation to be performed efficiently using state-of-the-art dense registration techniques (Comport et al., 2007; Meilland et al., 2010; Meilland et al., 2012).
- This paper has been structured in such a way that the relevant literature review is included at the beginning of each section.
- Finally, Section 6 will detail several field experiments.
2 Visual localisation
- This feature extraction and matching process is often badly conditioned, noisy and not robust, and therefore it must rely on higher level robust estimation techniques and on filtering.
- Direct approaches (image-based), however, do not rely on this feature extraction and matching process.
- The camera motion is directly estimated by minimising a non-linear intensity error between images, via a parametric warping function.
- In (Comport et al., 2007) direct approaches were generalised to use the full-image densely and track 6 dof pose using stereo cameras whilst mapping the environment through dense stereo matching.
2.1 Direct 3D registration
- Under the full brightness consistency assumption, the current image intensities can be warped via novel view synthesis (Avidan and Shashua, 1997) onto the reference view such that I∗(P∗) = I(w(T; Z∗, P∗)) (eq. 2), where the warping function w(·) transfers the current image intensities onto the reference pixels P∗ via the depth map Z∗.
- This warping function depends on the acquisition sensor, and can be, for example, a perspective projection, an omnidirectional projection or a spherical projection.
- It is now supposed that only a close approximation T̂ of the true pose is available, so the pose must be estimated iteratively.
- A robust weighting function rejects outliers such as moving objects and local illumination changes between the images.
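As a concrete illustration, the warp of equation (2) can be sketched for a simple pinhole model (the paper also supports omnidirectional and spherical projections). All names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def warp_to_reference(I_cur, Z_ref, K, T):
    """Sketch of eq. (2): warp current image intensities onto the reference
    view via the reference depth map Z_ref.

    I_cur: current grayscale image (H, W)
    Z_ref: reference depth map (H, W)
    K:     3x3 pinhole intrinsics (an assumption; the paper also uses
           spherical projections)
    T:     4x4 pose of the current camera w.r.t. the reference frame
    """
    H, W = Z_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN
    # Back-project reference pixels to 3D using the depth map.
    P = np.linalg.inv(K) @ pix * Z_ref.reshape(1, -1)
    # Transform into the current camera frame and re-project.
    Pc = T[:3, :3] @ P + T[:3, 3:4]
    q = K @ Pc
    uq = np.round(q[0] / q[2]).astype(int)
    vq = np.round(q[1] / q[2]).astype(int)
    # Sample with nearest-neighbour, marking out-of-bounds pixels invalid.
    valid = (uq >= 0) & (uq < W) & (vq >= 0) & (vq < H) & (q[2] > 0)
    warped = np.full(H * W, np.nan)
    warped[valid] = I_cur[vq[valid], uq[valid]]
    return warped.reshape(H, W)
```

With an identity pose the warp returns the current image unchanged, which is a useful sanity check before plugging the function into a minimisation loop.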
2.2 Non-linear minimisation
- An efficient second-order minimisation (ESM) is employed (Malis, 2004), which allows most parts of the minimisation to be pre-computed directly on the reference image.
- The pose estimate is updated at each step by a homogeneous transformation: T̂ ← T̂T(x) (eq. 7), and the minimisation is iterated until the increments on x are sufficiently small.
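The update rule of equation (7) composes the current estimate with the homogeneous transformation of a 6-vector increment. A minimal sketch of the two building blocks, assuming the standard SE(3) exponential map and a plain weighted Gauss-Newton step (ESM proper additionally averages the reference and current Jacobians):

```python
import numpy as np

def se3_exp(x):
    """Exponential map: 6-vector twist x = (v, w) -> 4x4 homogeneous T(x)."""
    v, w = x[:3], x[3:]
    th = np.linalg.norm(w)
    Wx = np.array([[0., -w[2], w[1]],
                   [w[2], 0., -w[0]],
                   [-w[1], w[0], 0.]])
    if th < 1e-9:                      # small-angle approximation
        R, V = np.eye(3) + Wx, np.eye(3)
    else:
        A = np.sin(th) / th
        B = (1 - np.cos(th)) / th**2
        C = (1 - A) / th**2
        R = np.eye(3) + A * Wx + B * (Wx @ Wx)
        V = np.eye(3) + B * Wx + C * (Wx @ Wx)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def gauss_newton_step(J, e, W=None):
    """One weighted Gauss-Newton increment: x = -(J^T W J)^-1 J^T W e."""
    W = np.eye(len(e)) if W is None else W
    return -np.linalg.solve(J.T @ W @ J, J.T @ W @ e)

# Eq. (7): T_hat <- T_hat @ se3_exp(x), iterated until x is sufficiently small.
```

The loop terminates on the norm of the increment x rather than on the residual, matching the stopping criterion described in the text.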
2.3 Multiple key-frame registration
- In Section 2.1, only one reference image was considered.
- This approach is highly incremental and therefore prone to drift and to errors from the linearisation approximation.
- If more than one reference image is available then it is possible to consider global bundle adjustment approaches (Triggs et al., 2000), however, these approaches are not computationally efficient for real-time applications.
- Furthermore, in large-scale mapping approaches, the connectivity between the various camera poses is relatively sparse, meaning that performing global bundle adjustment is not efficient.
- Alternatively, it is possible to consider sliding window bundle adjustment (Mouragnon et al., 2006; Sibley et al., 2008), which improves performance at little cost.
2.4 Multi-resolution approach
- One of the major drawbacks of direct approaches, and more generally of iterative approaches, is that the initial pose T̂ must be close to the solution T̃ to converge.
- The convergence domain and the algorithm performance can be improved using a coarse-to-fine optimisation strategy.
- This is achieved using multi-resolution image pyramids (e.g. constructed by Gaussian filtering and sub-sampling (Burt and Adelson, 1983)).
- The minimisation begins at the lowest resolution and the result is used to initialise the next level repeatedly until the highest resolution is reached.
- This greatly improves the convergence domain/speed and some local minima can be avoided.
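The coarse-to-fine scheme above can be sketched as follows; plain 2×2 averaging stands in for the Gaussian filtering of Burt and Adelson, and the loop structure is illustrative:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Multi-resolution pyramid by smoothing + sub-sampling (2x2 averaging
    here, as a stand-in for Gaussian filtering)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]
        pyr.append(0.25 * (a[0::2, 0::2] + a[1::2, 0::2] +
                           a[0::2, 1::2] + a[1::2, 1::2]))
    return pyr  # pyr[0] is the finest level, pyr[-1] the coarsest

# Coarse-to-fine: minimise at pyr[-1] first, then use the resulting pose
# estimate to initialise the minimisation at the next finer level.
```

Starting at the coarsest level both enlarges the convergence basin and reduces the number of expensive full-resolution iterations.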
3.1 Parametric models
- Many approaches in the literature use parametric models to efficiently represent the environment and store prior information about the scene.
- Several studies have been conducted to improve localisation techniques for larger scale environments using parametric models.
- This type of representation is heavily influenced by 3D modelling tools from the computer graphics community, which aim at rendering visually pleasing scenes rather than at representing the sensor data as faithfully as possible.
3.2 Image-based models
- Key-frames, acquired during a learning phase, are stored in a database, and the sensor data remains in its raw, unmodified form.
- The local information contained in the key-frames is then used to estimate the position of a camera navigating within the pre-mapped environment.
- Unlike the parametric methods, these maps are much simpler to acquire and they provide greater accuracy and robustness since the raw sensor data has not been approximated and a direct image error can be minimised.
- In the literature, the proposed quantitative localisation methods using key-frames all rely on feature-based techniques (Royer et al., 2005; Courbon et al., 2008).
3.3 Local dense representation: augmented visual sphere
- In the representation proposed in this paper, a local area in space is defined by an augmented spherical key-frame which captures the light field of all viewing directions from a particular point in 3D space along with its depth map.
- This extends robust and accurate state-of-the-art dense full-image approaches (Comport et al., 2007) to their most general form (i.e. all viewing directions are densely modelled).
- This image is obtained from the custom camera system presented in Section 4.3 by warping multiple images onto the sphere.
- These points have been sampled uniformly on the sphere as in (Meilland et al., 2010).
- ZS are the depths associated with each pixel, which have been obtained from dense stereo matching.
3.4 Global topologic representation: omnidirectional key-frame graph
- The main advantages of this spherical representation are:
- It enhances the performance of direct registration techniques (accuracy, robustness, convergence speed and domain of convergence).
- The map can be accessed in constant time independently of the graph size.
- A spherical representation provides all local view directions and therefore provides a general representation that can encode data from different kinds of sensors such as perspective cameras, multi-view cameras, omnidirectional cameras and laser range finders.
- Full-view sensors condition the observability of 3D motion well (Baker et al., 2001) which greatly improves robustness.
3.5 Global spherical key-frame positioning
- Considering S∗, an augmented sphere defined in Section 3.3, the objective is to compute the pose between a reference sphere S∗ and the next one S.
- The localisation problem is solved using a dense visual odometry approach which uses a direct 3D image registration technique that provides accurate pose estimation.
- Since this is a local optimisation approach it is assumed that the camera frame-rate is high (30Hz) and that inter-frame displacements are small.
- The direct minimisation procedure was given in Section 2.
- The main difference with respect to equation (4) being that the sphere is sampled, uniformly if possible, within polar spherical coordinates.
3.5.1 Spherical node selection
- Indeed, the vertices of the graph should be carefully placed in the world so as to represent the environment with little redundancy.
- In practice, the trajectory of the acquisition system along a sequence is computed by integrating elementary displacements estimated from successive spherical registration.
- Therefore, a new reference sphere is placed according to the Median Absolute Deviation (MAD): λ < Median(|e − Median(e)|) (eq. 12), where e is the vector of intensity errors between the current sphere and the previous set of reference spheres defined in (11) at the end of the minimisation.
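The criterion of equation (12) is simple to state in code; this sketch assumes a scalar threshold `lam` supplied by the user:

```python
import numpy as np

def needs_new_keyframe(e, lam):
    """Keyframe criterion of eq. (12): spawn a new reference sphere when the
    Median Absolute Deviation of the intensity error vector e exceeds lam."""
    mad = np.median(np.abs(e - np.median(e)))
    return mad > lam
```

Because the MAD is a robust dispersion measure, a few outlier pixels (e.g. a passing car) do not trigger a spurious new keyframe; only a genuine change in appearance does.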
3.5.2 Loop closure detection and drift compensation
- Since a dense spherical visual odometry technique was used, small amounts of drift are integrated along the sequence (typically 1%), which could cause the final graph to become inconsistent or redundant (i.e. a route mapped twice in opposite directions).
- To correct the drift and remove redundant spheres, the spherical loop closing technique proposed by (Chapoulie et al., 2011) was used.
- This method is based on SIFT descriptors augmented by their distribution across the sphere which is represented by histograms.
- The middle image corresponds to the corrected trajectory taking loop closures into account.
- The bottom image details a place where loop closures are detected from perpendicular points of view.
3.6 Information selection
- The essence of direct approaches is to minimise pixel intensities directly between images.
- One of them is that if spheres are constructed from a panoramic multi-camera system at high resolution, the number of pixels to warp at each iteration of the real-time minimisation process could be extremely large.
- The aim being to select points which best condition the six degrees of freedom of the pose.
- The first 3 images (|J1|, |J2| and |J3|) show the translational part: close 3D measurements in the scene have a strong gradient.
- During the learning phase, J(x̃) is pre-computed for each reference image and all pixels are sorted according to (15).
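A simplified sketch of this saliency-based selection: each pixel gets one row of the pose Jacobian, and the strongest rows are kept. The paper balances all six degrees of freedom explicitly; here a single magnitude score is used as an illustrative simplification:

```python
import numpy as np

def select_salient_pixels(J, n_keep):
    """Keep the n_keep pixels whose Jacobian rows best constrain the pose.

    J: (N, 6) array, one row per pixel (only 2 columns in the toy test).
    Returns the indices of the selected pixels, strongest first.
    """
    score = np.linalg.norm(J, axis=1)          # per-pixel saliency score
    idx = np.argsort(score)[::-1][:n_keep]     # strongest gradients first
    return idx
```

Since the selection is done once per reference sphere during the learning phase, the online minimisation only ever warps the retained subset of pixels.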
4 Spherical sensors
- The spherical key-frame model presented in the previous section requires an adequate system for acquiring these types of measurements.
- As such, several classic omnidirectional and panoramic camera systems are analysed for this purpose and two new multi-view RGB-D camera systems are presented.
4.1 Photometric sphere
- Capturing omnidirectional views of the environment can be achieved using either omnidirectional catadioptric cameras (Nayar, 1997) or with multiple perspective cameras.
- It, however, suffers from a low and non-uniform spatial resolution, and a limited vertical field of view (half-sphere, < 90°).
- This kind of sensor is therefore not efficient for outdoor mapping.
- Multi-view stitching techniques build high-resolution panoramic images by stitching and blending perspective images together (Szeliski, 2006).
- Those images can be acquired using several cameras (Baker et al., 2001), or from a sequence of images (Lovegrove and Davison, 2010).
4.2 Depth sphere acquisition
- Passive systems are capable of simultaneously capturing a dense depth-map from cameras alone without any additional system.
- Multi-view spherical systems have been also widely studied in the literature, however, it is generally assumed that all cameras view the same object and that the cameras are all converging to that point (Kutulakos and Seitz, 2000).
- Active systems require projecting laser or light patterns onto the environment to obtain dense depth measurements.
- Active RGB-D sensors such as the Microsoft Kinect and Asus Xtion are currently popular in the robotics community.
- This kind of sensor is, however, sensitive to sunlight and is therefore limited to indoor mapping.
4.3 Spherical acquisition system
- To acquire augmented visual spheres, for large-scale environment mapping, a first multi-camera system has been initially proposed in (Meilland et al., 2011a).
- This system will not be detailed here but the interested reader can refer to that article.
- This compact configuration allowed spherical photometric panoramas to be built with depth information associated with each pixel of the sphere.
- Dense matching is much more robust since the sensor is composed of three classic fronto-parallel stereo pairs back to back.
- In practice those effects are only visible for very close objects (e.g. < 1 m), which is not an issue when mapping outdoor environments.
5 Real-time localisation
- This section aims at presenting several techniques that have been used to improve the real-time performance of 6 dof pose estimation with respect to the previously described dense key-frame graph of augmented spherical images.
- First of all, it will be shown how the graph is used for real-time monocular pose estimation in an asymmetric manner (i.e. a single perspective image is compared to the closest augmented spherical image).
- Following that a multiple reference sphere localisation that provides a smooth localisation will be proposed along with a technique that improves robustness in dynamic environments.
- Finally, an approach that first estimates rotation alone before refining the full pose will be given, which greatly improves convergence speed.
5.1.1 Online sphere selection
- It is assumed that during online navigation, a current image I, captured by a generic camera (e.g. monocular, stereo or omnidirectional) and an initial guess T̂G of the current camera position within the graph are available.
- Contrary to non-spherical approaches, a sphere provides all viewing directions and therefore it is not necessary to consider the rotational distance (to ensure image overlap).
- In particular this avoids choosing a reference sphere that has similar rotation but large translational difference which induces self occlusions of buildings and also differences in image resolution caused by distance (which affects direct registration methods).
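Because a sphere covers all viewing directions, the sphere selection reduces to a nearest-neighbour query on translation alone. A sketch (the paper's criterion (17) may include further terms; this shows only the translational distance argument):

```python
import numpy as np

def closest_sphere(t_cur, sphere_positions):
    """Select the reference sphere by translational distance only: rotational
    distance is irrelevant since every sphere provides all view directions.

    t_cur:            (3,) current camera position estimate
    sphere_positions: (N, 3) positions of the graph's spheres
    """
    d = np.linalg.norm(sphere_positions - t_cur, axis=1)
    return int(np.argmin(d))
```

Keeping the translational distance small limits both self-occlusions and resolution mismatch between the live image and the reference sphere.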
5.1.2 Efficient minimisation
- The warping function w(·) transfers the current image intensities onto the reference sphere via both the perspective and spherical warping functions given in equations (26) and (29), using the depth information ZS of the reference sphere.
- The function s(·), introduced in Section 3.6, selects only informative pixels, with respect to a saliency map WS which is already pre-computed on the reference sphere and stored in the graph.
- This selection speeds up the tracking algorithm without degrading either the observability of 3D motion or the accuracy.
5.1.3 Smooth localisation
- During this localisation phase, the live current image is permanently registered onto the closest reference sphere which is retained for localisation until a new sphere is selected according to criterion (17).
- This can create discontinuities in the trajectory when a new reference sphere is chosen.
- Note that (8) can easily be expanded to more reference spheres, but it appears experimentally that two spheres are sufficient for smooth and robust localisation.
- Indeed, using reference images that are too distant from the current view introduces additional occlusions, which reduces the efficiency of the minimisation.
- Figure 9 compares a single node localisation (red) with a multi-node localisation (blue).
5.2 Efficient estimation of local 3D rotations
- Indeed, for a pure rotation, the pixel’s motion is independent of the scene geometry.
- For a vision-based localisation algorithm, large rotations between successive frames are a frequent cause of failure.
- Since the coarse images are generated by successive Gaussian filtering and sub-sampling, rotations clearly dominate at those levels and translations can be neglected.
- This error can be efficiently minimised as described in Section 2.2, allowing large angular motions to be recovered quickly.
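The key property exploited here is that, for a pure rotation, the warp is the depth-independent infinite homography H = K R K⁻¹. A sketch for a pinhole camera (the warp direction and sampling scheme are illustrative choices):

```python
import numpy as np

def rotation_warp(I, K, R):
    """Pure-rotation warp via the infinite homography H = K R K^-1: pixel
    motion is independent of scene depth, so no depth map is required."""
    H = K @ R @ np.linalg.inv(K)
    h, w = I.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    q = H @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    uq = np.round(q[0] / q[2]).astype(int)
    vq = np.round(q[1] / q[2]).astype(int)
    ok = (uq >= 0) & (uq < w) & (vq >= 0) & (vq < h)
    out = np.full(h * w, np.nan)            # out-of-view pixels stay invalid
    out[ok] = I[vq[ok], uq[ok]]
    return out.reshape(h, w)
```

Minimising the intensity error of this rotation-only warp at a coarse pyramid level gives a cheap rotation estimate that then seeds the full 6 dof minimisation.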
5.3 Dynamic environment mapping
- In practice outdoor environments have very challenging dynamic effects that up until now have been handled as outliers, however, when both extreme illumination variations and dynamic moving objects are evolving within the scene the robust estimator breaks down.
- The current image is registered both with the model I∗ and with the last registered image Iwt−1.
- In this case the model-based error (eMB) criterion ensures that the pose estimation does not drift but cannot handle illumination change, while the visual odometry error (eVO) is invariant to the large illumination change (inter-frame variation at 45 Hz is considered negligible).
- The scene contains both global illumination changes and local illumination changes, as well as dynamic occlusions (e.g. moving cars, bicycle, pedestrians).
- The associated M-estimator weights DMB reject most of the information.
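The dual-error idea can be sketched by stacking the two residuals, each with its own robust weights. Huber weights are used here as an illustrative choice; the paper's robust function and the exact combination of eMB and eVO may differ:

```python
import numpy as np

def huber_weights(e, k=1.345):
    """M-estimator weights (Huber), with a MAD-based scale estimate."""
    s = np.median(np.abs(e - np.median(e))) / 0.6745 + 1e-12
    a = np.abs(e) / s
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def stacked_error(e_MB, e_VO):
    """Stack the model-based error e_MB (drift-free, but sensitive to
    illumination) with the visual-odometry error e_VO (invariant to slow
    illumination change), each with its own weight vector."""
    D = np.concatenate([huber_weights(e_MB), huber_weights(e_VO)])
    e = np.concatenate([e_MB, e_VO])
    return e, D
```

When strong illumination change makes the weights DMB collapse, the visual-odometry part of the stacked system keeps the estimation constrained, and vice versa for drift.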
5.4 Initialisation and tracking failure
- To handle initialisation and tracking failure, a simple technique is employed.
- The current camera pose with respect to the reference sphere is supposed to be close to the identity.
- Dense registration is then performed with all the spheres of the graph, using the smallest image size in the pyramid.
- Although this initialisation step does not scale well to large databases, it needs less than 30 ms per reference sphere, i.e. 9 seconds for a graph of 300 spheres.
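The recovery step amounts to a brute-force registration against every sphere at the coarsest pyramid level. In this sketch the full dense registration is replaced by a plain robust intensity residual, purely for illustration:

```python
import numpy as np

def relocalise(I_coarse, sphere_thumbnails):
    """Tracking-failure recovery sketch: compare the coarsest current image
    against every sphere in the graph and return the index of the best fit.
    The per-sphere dense registration of the paper is replaced here by a
    simple median absolute intensity residual."""
    errs = [np.median(np.abs(I_coarse - S)) for S in sphere_thumbnails]
    return int(np.argmin(errs))
```

Because only the smallest pyramid level is compared, the per-sphere cost stays low (under 30 ms in the paper), which makes an exhaustive scan feasible for graphs of a few hundred spheres.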
6 Field experiments
- Several large scale field experiments involving each step of the proposed framework are presented.
- Firstly dense spherical environment mapping experiments are presented.
- Following this, real-time localisation and autonomous navigation experiments are detailed in the context of a national urban autonomous navigation challenge.
- Due to the overall complexity of the system and the breakup into a map learning phase and an online real-time navigation phase, it is suggested that the reader view the video associated with this paper before proceeding.
- The video illustrating these experiments can be found with the paper submission or alternatively at the following address.
6.1 Implementation
- A real-time implementation of the online localisation algorithm has been made in C++.
- Most parts of the code are SSE-optimised (Streaming SIMD Extensions) in order to take advantage of the native vector instructions of modern processors.
- It involves computing a Gaussian pyramid of the current image, warping the current image, computing robust statistics (median, MAD) using histograms, and computing the weighted Gauss-Newton Hessian and gradient approximations (see Section 2.2).
- The pose increment is then obtained using Cholesky factorisation of the 6× 6 matrix, and the minimisation is iterated until convergence.
- Since the localisation algorithm is based on an iterative minimisation, the number of iterations can be directly adjusted with the desired camera frame-rate.
6.2 Storage cost and memory management
- For each sphere a 3 level multi-resolution pyramid is also pre-computed.
- A better memory compression could be achieved using a lossless image compression algorithm.
- Whilst the camera moves in the environment, a parallel CPU thread is used to stream out the farthest spheres and to stream in new spheres with respect to the current camera pose.
- This allows seamless navigation in very large maps.
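The streaming policy can be sketched as keeping only the nearest spheres resident; the paper runs this in a parallel CPU thread, while this illustrative version just computes which spheres to stream in and out:

```python
import numpy as np

def update_working_set(t_cur, all_positions, loaded, capacity):
    """Memory-management sketch: keep the `capacity` spheres closest to the
    current pose loaded in RAM.

    t_cur:         current camera position
    all_positions: (N, d) positions of every sphere in the database
    loaded:        set of indices currently resident in memory
    capacity:      maximum number of resident spheres
    """
    d = np.linalg.norm(all_positions - t_cur, axis=1)
    wanted = set(np.argsort(d)[:capacity].tolist())
    stream_in = wanted - loaded    # spheres to load from disk
    stream_out = loaded - wanted   # spheres to evict
    return wanted, stream_in, stream_out
```

Running this in a background thread hides the disk latency, so the real-time localisation loop only ever touches spheres that are already resident.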
6.3.3 Asymmetric real-time localisation
- One of the main advantages of a spherical key-frame graph is that any type of online sensor can be used to estimate the relative pose with respect to this environment model.
- It can be seen that using a spherical image allows the camera to be localised in all viewing directions.
- The algorithm was then validated on a subset graph of 12 spheres extracted from the results presented in section 6.3.1.
- The real-time 6 dof localisation method was also validated on large-scale stereo sequences captured in uncontrolled urban environments in the 12th district of Paris for the purpose of the French national ANR-CityVIP project.
- The path is locally different from the learnt trajectory (the car overtakes, the turns are taken much more widely, etc.).
6.4 CityVIP autonomous driving challenge
- This section reports the autonomous navigation results obtained during the final challenge of the French ANR CityVIP project.
- This project was conducted by several leading French research teams from June 2008 to December 2011 and its aim was to develop autonomous transportation vehicles for urban environments.
- The goal of the challenge was to autonomously follow a reference trajectory over large scales in a completely uncontrolled environment.
- The entry presented here was based on a learnt environment map and trajectory acquired during a manual teaching phase.
- In a first part, the platform will be presented before introducing the control law that was used for trajectory following.
6.4.1 Platform
- The platform used for the experiments is an electric Cycab vehicle, which can be controlled using longitudinal velocity and steering angle.
- A single Firewire camera, placed on the top of the vehicle, is used for online localisation.
- All the computations and control commands are performed on a standard Core i7 laptop, which stores the pre-computed graph of augmented spheres.
6.4.2 Control aspects
- For autonomous driving, the aim is to automatically follow a reference trajectory U generated locally around the learnt graph.
- In the proposed case, a virtual vehicle is followed and used to generate an error in translation and orientation that can be regulated using state feedback.
- Longitudinal velocity is controlled using a proportional feedback on the longitudinal error and steering angle depends on yaw and transversal errors.
- These errors are obtained by projecting the current vehicle position onto the closest edge of the reference trajectory.
- The reference position is then selected by translating the projected position along the trajectory by a distance d.
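The reference selection above can be sketched as follows: project the vehicle position onto the closest trajectory edge, then move the reference a look-ahead distance d along the path. The frame conventions and error decomposition here are illustrative assumptions, not the paper's exact control law:

```python
import numpy as np

def trajectory_errors(p, yaw, traj, d):
    """Compute longitudinal, transversal and yaw errors w.r.t. a look-ahead
    reference placed a distance d past the projection of p onto the path.

    p:    (2,) vehicle position
    yaw:  vehicle heading (rad)
    traj: (N, 2) polyline of the reference trajectory (non-degenerate edges)
    d:    look-ahead distance along the path
    """
    best = (None, np.inf, None)
    for a, b in zip(traj[:-1], traj[1:]):
        ab = b - a                          # point-to-segment projection
        s = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        proj = a + s * ab
        dist = np.linalg.norm(p - proj)
        if dist < best[1]:
            best = (proj, dist, ab)
    proj, _, ab = best
    ref = proj + d * ab / np.linalg.norm(ab)    # look-ahead reference point
    heading = np.arctan2(ab[1], ab[0])
    err = ref - p
    # Express the position error in the path frame; yaw error is relative
    # to the local path heading.
    e_long = np.cos(heading) * err[0] + np.sin(heading) * err[1]
    e_trans = -np.sin(heading) * err[0] + np.cos(heading) * err[1]
    e_yaw = heading - yaw
    return e_long, e_trans, e_yaw
```

As the text describes, the longitudinal error then drives a proportional velocity feedback, while the steering angle is computed from the yaw and transversal errors.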
6.4.3 Challenge preparation and testing
- Preliminary autonomous navigation results were conducted at INRIA Sophia Antipolis in order to test the entire autonomous navigation pipeline.
- The following result reports autonomous driving on a small trajectory of 100 meters, which was manually generated around a pre-learnt key-frame graph.
- The reference vehicle longitudinal velocity was set to 1.4 m/s. Figure 18 plots the longitudinal and transversal errors with respect to time.
- It can be seen that the vehicle was able to follow the reference trajectory, keeping a longitudinal error close to zero, whilst the transversal error is less than 25 cm.
- Note that the peaks in transversal error correspond to the 90-degree turns, which can be avoided with a better setting of the control gains of equation (25).
6.4.4 Final CityVIP challenge
- The following section will present one of the many experiments conducted during the final CityVIP challenge, which lasted one whole week at Place de Jaude in the city centre of Clermont-Ferrand, France.
- Over the course of one week, various trials were conducted in a large range of highly varying situations, allowing validation of the proposed approach.
- Figure 19 shows the desired trajectory in black, and the trajectory followed autonomously by the vehicle in red.
- As can be seen in Figures (a), (b) and (c), the accuracy of the localisation method allows navigation in narrow corridors, whilst the M-estimator ensures robustness to occlusions such as pedestrians.
- These effects can be seen on the red trajectory around the landmark (e).
6.4.5 Discussions on the experiments
- As mentioned in the previous section, visual navigation in large open-spaces is challenging since translations are not accurately estimated.
- It could be interesting to combine a wide angle camera for robustness with a long focal camera for accuracy.
- A second scenario has shown one limitation of the visual localisation approach.
- It can be seen that the strong gradients generated by the shadows are selected by the saliency selection algorithm.
Frequently Asked Questions (16)
Q2. What future works have the authors mentioned in the paper "Dense omnidirectional rgb-d mapping of large scale outdoor environments for real-time localisation and autonomous navigation" ?
Future work will aim at improving the database construction by geo-referencing the spherical graph in a GIS (Geographic Information System), which may contain higher-level information such as free space. This geolocation will allow advanced path-planning algorithms to be used. To further improve the autonomous navigation solution, it would be interesting to fuse vision-based localisation with GPS signals and inertial measurements. Another important feature that will be studied is life-long mapping: since the environment is not static, it is necessary to update the database with new geometric and photometric information gathered during the online phase.
Q3. What is the way to recover the position of the spheres of the graph?
To accurately recover the position of the spheres of the graph with respect to one another, a 6 degrees-of-freedom spherical localisation model is used, based on an accurate dense localisation (Comport et al., 2010; Meilland et al., 2010).
Q4. How do you obtain realistic computer graphics models?
Realistic computer graphics models are obtained by texture mapping onto 3D planar façades which have been extracted from the environment.
Q5. What is the way to reduce the number of iterations?
Since the localisation algorithm is based on an iterative minimisation, the number of iterations can be directly adjusted with the desired camera frame-rate.
Q6. Why is it suggested that the reader view the video associated with this paper before proceeding?
Due to the overall complexity of the system and the breakup into a map learning phase and an online real-time navigation phase, it is suggested that the reader view the video associated with this paper before proceeding.
Q7. How many iterations is used for each level of the pyramid?
The 6 dof localisation is performed using a 3-level image pyramid with respectively 3, 5 and 8 iterations per level (from coarse to fine).
Q8. How does the MAD increase when the environment is closer to the camera?
When the environment is far from the camera viewpoint (0 < ∆ < 8 and 16 < ∆ < 25), the MAD increases slowly, due to very small changes between the images.
Q9. How many pixels are used in the localisation algorithm?
Using a small fraction of salient pixels (i.e. 60k pixels), the localisation algorithm runs flawlessly at 45 Hz with an input image of 800×600 pixels.
Q10. What is the main drawback of global bundle adjustment?
In large-scale mapping approaches, the connectivity between the various camera poses is relatively sparse, meaning that performing global bundle adjustment is not efficient.
Q11. What is the proposed representation of the 3D world?
In particular, the proposed representation is composed of a graph of augmented visual spheres containing omnidirectional RGB-D information (colour and depth) for many spatial locations.
Q12. What is the effect of a high camera frame rate on the localisation algorithm?
As it was highlighted in (Handa et al., 2012), a high camera frame-rate reduces the inter-frame camera motion, allowing to perform less iterations until convergence.
Q13. How is the trajectory of the acquisition system calculated?
In practice, the trajectory of the acquisition system along a sequence is computed by integrating elementary displacements estimated from successive spherical registration.
Q14. What is the procedure to build an augmented sphere with this system?
The procedure to build an augmented sphere with this system can be summarised as: 1. Calibration of the extrinsic and intrinsic parameters of each camera with a checker-board, via bundle adjustment; this calibration step is only performed once since the system is rigid.
Q15. How should the vertices of the graph be placed?
The vertices of the graph should be carefully placed in the world so as to represent the environment with little redundancy.
Q16. What is the way to estimate the 3D rotation of the camera?
To ensure a better initialisation, a direct registration technique is employed to estimate the local inter-frame 3D rotation of the camera as in (Mei et al., 2010; Lovegrove and Davison, 2010; Newcombe et al., 2011b).