Evaluating the Capability of OpenStreetMap for Estimating Vehicle Localization Error
Summary
Introduction
- Recent studies suggest that when using map-based localization methods, the representation and layout of real-world phenomena within the prebuilt map is a source of error.
- To date, the investigations have been limited to 3D point clouds and normal distribution (ND) maps.
- This paper explores the potential of using OpenStreetMap (OSM) as a proxy to estimate vehicle localization error.
- Specifically, the experiment uses random forest regression to estimate mean 3D localization error from map matching using LiDAR scans and ND maps.
- Six map evaluation factors were defined for 2D geographic information in a vector format.
- Initial results for a 2 km path in Shinjuku, Tokyo, show that vehicle localization error can be estimated with 56.3% model prediction accuracy using only two existing OSM data layers.
- Within a tunnel or urban canyon environment, even with a locally and globally accurate map, the lack of longitudinal features in the map can cause localization error in the moving direction.
- In addition, the large size of point cloud data can make it challenging to manage.
- The resulting work can enable the estimation of localization error without requiring the collection of data to create a prebuilt map.
II. BACKGROUND
- While Global Navigation Satellite Systems can provide a position with meter-level accuracy, this is insufficient for autonomous driving applications.
- Subsequently, the LiDAR scan from an autonomous vehicle can then be used to match against the prebuilt map to obtain a position.
- There is yet to be an international standard for required accuracy.
- As guidance, the Japanese government’s Cross-ministerial Strategic Innovation Promotion (SIP) Program recommends an accuracy of less than 25 cm.
B. Sources of localization error for LiDAR map matching
- Sources of localization error for map matching using LiDAR can be divided broadly into four categories: 1) Input scan; 2) Matching algorithm; 3) Dynamic phenomena, and; 4) Prebuilt map.
- The lower-end Velodyne VLP-16 has 16 laser transmitting and receiving sensors with a range of up to 100 m.
- Errors can also be introduced in the post-processing where the input scan is required to be downsampled for the matching algorithm.
- During the localization phase, some phenomena or features may have moved or shifted.
- In these cases, it is the physical attributes of the phenomena and their representation in the map that are the source of localization error.
C. Quantifying the sources of map-derived errors
- To quantify the sources of map-derived errors, Javanmardi et al. [4] defined four criteria to evaluate a map’s capability for vehicle localization: 1) feature sufficiency; 2) layout; 3) local similarity, and; 4) representation quality.
- An insufficient number of features in the vicinity (such as found in open rural areas) may result in lower localization accuracy.
- Even if there are many high-quality features nearby, the quality of the matching degrades if they are all concentrated in one direction.
- The closer the map is to reality, the more accurate the prediction.
- A flat wall can be highly abstracted as a single line but still have high representation quality.
D. OpenStreetMap
- First introduced in 2004, OpenStreetMap is an open source collaborative project providing user-generated street maps of the world.
- In the past, commercial and governmental mapping organizations have also donated data towards the project.
- There is a set of commonly used tags for primary features which operate as an ‘informal standard’ [9].
- While fire departments, police stations and post offices are well represented, temples and shrines are underrepresented.
- In fact, in certain areas, OSM is more complete and more accurate (for both location and semantic information) than corresponding proprietary datasets [12], [13].
A. Study area
- The study area is Shinjuku, Tokyo, Japan.
- The architecture is relatively heterogeneous, with buildings ranging from low-rise shops to multi-story structures.
- Similarly, roads vary from single narrow lanes to wide multi-lane carriageways.
B. Mapping data
- Mapping data from OSM was extracted from the geofabrik.de server.
- Note that each tag can be represented by a maximum of three layers, one for each geometry primitive (point, line, polygon).
- On wide multi-lane roads or within the rural environment, buildings may not be present or visible by the LiDAR scanner.
- Linear features, such as guard rails and traffic barriers, are classed as ‘barrier_line’.
- Creating these ‘completed’ layers allowed the capability of OSM to be assessed in a scenario where users had correctly and fully mapped all required features.
C. Localization error data
- For localization error, data from a previous experiment was used.
- From this point cloud map, an ND map was generated with a 2.0 m grid size.
- Next, during the localization phase, a second scan was obtained with the laser scan range limited to 20 m.
- Thirdly, to evaluate the map factors, sample points were selected along the experiment path at 1.0 m intervals.
- For each sample point, mean 3D localization error was calculated by averaging the localization error from 441 initial guesses at different positions evenly distributed around the sample point, at 0.2 m intervals within a range of 2 m.
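The averaging scheme above (441 initial guesses = a 21 × 21 grid at 0.2 m spacing within ±2 m of the sample point) can be sketched as follows. This is a sketch only: the `localize` callable standing in for the NDT matching step, and the square-grid interpretation of "evenly distributed", are assumptions consistent with the stated counts.

```python
import numpy as np

def mean_localization_error(sample_point, localize, step=0.2, half_range=2.0):
    """Average the 3D localization error over a grid of initial guesses.

    `localize` is a hypothetical callable that runs map matching from a
    given initial-guess position and returns the resulting 3D error.
    """
    # Offsets -2.0, -1.8, ..., 2.0 -> 21 values per axis, 441 guesses total.
    offsets = np.arange(-half_range, half_range + step / 2, step)
    errors = []
    for dx in offsets:
        for dy in offsets:
            guess = (sample_point[0] + dx, sample_point[1] + dy)
            errors.append(localize(guess))
    assert len(errors) == 441  # 21 x 21 initial guesses
    return float(np.mean(errors))
```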
D. Formulating map evaluation factors
- Javanmardi et al. [4] proposed ten map evaluation factors to quantify the capability of an ND map for vehicle self-localization.
- Unlike the four map evaluation criteria, the map factors were designed specifically to evaluate 3D ND maps and cannot be easily transferred to another mapping format without modification.
- Local similarity and representation quality were not considered in this study, due to the high level of abstraction of OSM and the lack of a higher quality dataset available for comparison.
- To model the behavior of the laser scanner and localization process, two auxiliary layers were produced: view lines and view polygons.
E. Factors for feature sufficiency criterion
- Unlike 3D ND map formats, there are three geometry primitives for 2D vector mapping: points, lines, and polygons.
- To account for the differences between the geometric representations, feature count was calculated based on the intersection points between the layer and the view lines or view polygons.
- At these end points of the view lines, it is assumed that the sensor cannot ‘see’ any further past the opaque building walls.
- To remedy this, points were generated along the intersection with the view lines at 10 cm intervals, i.e., the portion of a view line which overlaps with any polygon barrier was converted into a series of points.
- This, however, does not affect the factors, as each layer is considered separately during the calculation.
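The 10 cm densification step can be sketched in pure Python. `points_along_segment` is a hypothetical helper operating on one already-computed overlap segment; in the real pipeline each view line would first be clipped against the barrier polygons to obtain such segments.

```python
import math

def points_along_segment(p0, p1, spacing=0.1):
    """Convert one overlap segment of a view line into points at fixed
    spacing (10 cm by default), endpoints included.

    Assumes the segment is at least one spacing long.
    """
    length = math.dist(p0, p1)
    n = max(1, int(round(length / spacing)))  # number of intervals
    return [
        (p0[0] + (p1[0] - p0[0]) * i / n,
         p0[1] + (p1[1] - p0[1]) * i / n)
        for i in range(n + 1)
    ]
```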
1) Angular dispersion of features
- Angular dispersion is a measure of the arrangement of features around the sample point.
- For this factor, the dispersion of the intersection points between the view lines or view polygons and the respective layer was calculated.
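The summary does not give the exact dispersion formula; one common choice from circular statistics is 1 − R, where R is the mean resultant length of the bearings from the sample point to the intersection points. The sketch below uses that convention as an assumption.

```python
import math

def angular_dispersion(sample_point, intersection_points):
    """Dispersion of intersection points around a sample point.

    Returns 1 - R, where R is the mean resultant length of the bearing
    angles: ~0 when points cluster in one direction, ~1 when they are
    spread evenly around the sample point.
    """
    angles = [math.atan2(y - sample_point[1], x - sample_point[0])
              for x, y in intersection_points]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return 1.0 - math.hypot(c, s)
```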
2) View line mean length
- For each sample point, the mean length for every view line which intersected with the building layer was calculated.
- Non-intersecting view lines were not considered, as they would skew the metric.
3) Area of view polygon
- The area of the view polygon represents the theoretical area that the laser scanner can ‘see’.
- A small area suggests that there are many building features in the vicinity to localize against.
4) Compactness of view polygon
- A compact field of view could suggest that there are a lot of building features nearby to localize against.
- To measure compactness, the ratio between the area of the view polygon and the area of a circle with the same radius was calculated.
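A minimal sketch of this ratio follows. Which radius the circle uses (e.g. the 20 m limited scan range) is not stated in the summary and is an assumption here.

```python
import math

def view_polygon_compactness(polygon_area, radius=20.0):
    """Ratio of the view-polygon area to the area of a circle of the
    same radius (here assumed to be the 20 m scan range)."""
    return polygon_area / (math.pi * radius ** 2)
```

An unobstructed scanner would see a full disc (ratio 1.0); nearby buildings clip the view polygon and lower the ratio.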
5) The variance of building face direction
- Javanmardi et al. [4] suggest that if the features in the local vicinity face a greater variety of directions, then the positioning uncertainty decreases.
- To measure this, at all points of intersection between the view lines and the building layer, the normal of the face was calculated.
- Subsequently, for each sample point, the variance of all the normal vectors was calculated.
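This factor can be sketched as below. The normal orientation convention is an assumption, and plain variance of angles ignores wrap-around at ±π (a circular variance may be more robust), but the summary states that the variance of the normal vectors was used.

```python
import math

def face_normal_angle(p0, p1):
    """Angle of a normal to a building face segment, obtained by rotating
    the face direction vector by 90 degrees (orientation is an assumption)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return math.atan2(dx, -dy)

def normal_variance(faces):
    """Variance of face-normal angles for the faces hit by view lines."""
    angles = [face_normal_angle(p0, p1) for p0, p1 in faces]
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)
```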
G. Random forest regression
- To obtain a prediction of localization error, random forest regression was used.
- It uses multiple decision trees and bootstrap aggregation to form a prediction.
- Random forest regression aims to reduce variance in predictions while maintaining a low bias, thus controlling overfitting.
- For the implementation, scikit-learn (a machine learning Python library) was used.
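A minimal scikit-learn sketch of this pipeline follows. The synthetic factor table stands in for the real OSM-derived factors, and the train/test split, tree count, and random seeds are assumptions; the metrics shown are the ones reported later in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Synthetic stand-in for the factor table: rows = sample points,
# columns = map evaluation factors (feature count, angular dispersion, ...).
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, 500)  # synthetic error values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("R2:  ", r2_score(y_test, pred))
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
print("MAE: ", mean_absolute_error(y_test, pred))
# Per-factor importances (sum to 1), used later to rank the factors.
print("importances:", model.feature_importances_)
```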
A. General results
- Six models were evaluated, using different combinations of data layers from OSM.
- Model 1 used just the buildings layer directly from OSM.
- Model 3 included the completed polygon barriers.
- To evaluate the goodness of fit, the R-squared (R2), root mean square error (RMSE), mean absolute error (MAE), and accuracy are presented.
- The models and their prediction errors are presented in TABLE IV.
B. Factor importance
- One of the advantages of random forest regression is the ability to evaluate the importance of different variables used for prediction.
- TABLE V. shows an overview of the importance of each factor and data layer, with the two most important variables in bold.
- Note that for models 2 to 5, the feature count of polygon barriers is the first or second most important factor for predicting localization error (ranging from 0.18 to 0.38).
- With the inclusion of point barriers data in models 5 and 6, the angular dispersion factor has high importance (0.20 and 0.22).
- Aside from model 1, the importance of factors based on view line and view polygon geometry is below 0.1.
A. The capability of OpenStreetMap for estimating vehicle localization error
- The results show that using both the buildings and polygon barriers layers from OSM, the model can achieve 56.5% prediction accuracy.
- By improving the completeness of the polygon barriers, the model prediction accuracy is then increased to 62.3%.
- The work described in this paper shows that it is possible to estimate vehicle localization error with reasonable success.
- Further work is required to devise a more appropriate accuracy metric which better describes a model’s ability to detect these peaks in localization error, and thus its suitability for the autonomous driving application.
B. Comparison with other studies
- It is important to bear in mind the difference in methodology – while [5] uses principal component regression, this study uses random forest regression.
- Further work is therefore required to confirm if OpenStreetMap can truly achieve comparable results with existing approaches.
C. The importance of different factors and data layers
- Of the factors, feature count is the most important for all six models.
- There are several possible explanations for this result.
- Firstly, the middle half of the experiment route is populated by many polygon barriers.
- In addition, in this central section of the experiment path, the combination of wide multi-lane roads, the restricted 20 m scanner range, and building setback means that buildings cannot necessarily be ‘seen’ by the LiDAR sensor.
- In other areas where there are no point or polygon barriers, other data layers such as buildings and line barriers could become more important.
D. The impact of data quality of OSM
- As discussed in Section II.D, OpenStreetMap suffers from data quality issues such as inconsistency and incompleteness.
- This is encouraging and could suggest that as the quality of OSM improves over time, its capability to estimate localization error also increases.
- From a data handling and computation perspective, OSM is much ‘lighter’ than point cloud maps.
- Its high level of abstraction, however, means feature-scale phenomena such as local similarity cannot be easily evaluated.
- Beyond OpenStreetMap, the same factors could be used with better quality 2D GI.
F. Generalizability of the model
- The model is currently only trained on data from a relatively short path, resulting in relatively low generalizability.
- The addition of more data from multiple paths should reduce any random patterns that may previously have appeared predictive.
- This would, in turn, enable the use of OSM to predict localization error beyond the modeled area.
- Coupled with the wide coverage of OSM, it could theoretically be possible to predict localization error for large areas, anywhere in the world, without collecting data to create a prebuilt map, as the mapping data is readily available.
G. Alternatives to random forest regression
- Random forest regression is only one machine learning approach for prediction.
- In some cases, alternative approaches such as support vector machine or gradient boosting machines could outperform random forest regression and provide a more generalizable model.
- These approaches, however, also require additional manual hyperparameter tuning.
H. Source of localization error
- As described in Section II.B, there are many sources of localization error for LiDAR map matching.
- The precise sources of localization error, however, remain difficult to ascertain.
- Within the experiment, multiple measures were taken to ensure the reduction of non-map errors, so that any remaining localization error evaluated was directly related to the map as much as possible.
- Alternatively, artificial objects could be installed in the environment to improve map matching performance.
- Furthermore, knowing the specific location of areas of high localization error could inform semi-autonomous vehicle systems of where to disengage autopilot and to prepare for human driver takeover in a timely manner.
I. Future work
- Future work is required in adding more data layers from OSM, as well as other layers which may not currently be available, such as curb information, road markings, and landmarks.