An Automated Method for Large-Scale, Ground-Based City Model Acquisition
Summary
1. Introduction
- Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry.
- The resulting resolution of the models is only in the meter range, and without manual intervention, the resulting accuracy is often poor.
- More specifically, it is necessary to determine the pose of successive laser scans and camera images in a global coordinate system with centimeter and sub-degree accuracy in order to reconstruct a consistent model.
- What makes their problem different from indoor localization is the scale of the environment, because distances involved in making 3D models for cities are large compared to the range of the laser scans.
2. System Overview
- The data acquisition system is mounted on a truck and consists of two parts: a sensor module and a processing unit (Früh and Zakhor, 2001a).
- The processing unit consists of a dual processor PC, large hard disk drives, and additional electronics for power supply and signal shaping; the sensor module consists of two SICK 2D laser scanners, a digital camera, and a heading sensor.
- Both 2D scanners face the same side of the street.
- Figure 1 shows the experimental setup for their data acquisition.
- Figure 2 shows a picture of the truck with rack and equipment.
3. Relative Position Estimation and Path Computation
- The authors compute relative pose estimates and an initial path by matching successive horizontal laser scans.
- First, the authors use both lines and single points of the reference scan for matching; second, they do not treat the problem of eliminating erroneous correspondences separately from the matching; rather, they consider it directly in the computation of a match quality function by using robust least squares, as will be seen later.
- Thus, only the ‘good’ scan points contribute to this quality function, and the authors do not have to eliminate outliers prior to the matching.
- Even small errors in the relative estimates, especially inaccurate angles, accumulate to significant global pose errors over long driving periods.
4. Global Maps from Aerial Images or DSM
- The authors derive a global edge map either from an aerial photo or from a DSM, and define a congruence coefficient as a measure of the match between ground-based scans and the global edge map.
- The basic idea behind their position correction is that objects seen from the road-based data acquisition must in principle also be visible in the aerial photos or the DSM.
- Making the assumption that the position of building facades and building footprints are identical or at least sufficiently similar, one can expect that the shapes of the horizontal laser scans match edges in the aerial image or the DSM.
- Essentially, the authors use the airborne edge maps as an occupancy grid for global localization, as it has been done for mobile robots with floor occupancy grids in indoor environments (Konolige and Chou, 1999; Thrun, 2001).
4.1. Edge Maps from Aerial Photos
- While perspective corrected photos with a 1-meter resolution are readily available from USGS, the authors choose to use a higher contrast aerial photograph obtained by Vexcel Corporation, CO, with 1-foot resolution.
- (2) The photos and the scans were not taken at the same time, so the content can potentially be different.
- In particular dynamic objects such as cars or buses can cause mismatches.
- (3) Visible in the image are not only the building edges, but also many non-3D edges such as road stripes or crosswalk borders.
- Especially problematic are shadows, because they result in very strong edges.
4.2. Edge Map from DSM
- In their case, a DSM with a one-meter resolution was created using airborne laser scans acquired by Airborne 1 Corp., CA.
- Using a DSM as a source of a global edge map has several advantages over aerial images:
- Furthermore, the intensity of an edge is not dependent on the altitude of the building; what matters is whether a discontinuity exceeds the threshold z_edge.
- Nevertheless, it is not advisable to directly use the z value at a DSM location, since the airborne laser captures overhanging trees and cars on the road at the time of the data acquisition, resulting in z-values of up to several meters above the actual street level for some locations.
- Thus, the authors need to create a smooth, dense Digital Terrain Map (DTM) that contains only the altitude of the street level, and they do so in the following manner: Starting with a blank DTM, they first copy all available ground pixels into the DTM.
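The thresholding idea for the DSM edge map can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dsm_edge_map`, the 4-neighbor comparison, and the choice to mark both pixels on either side of a discontinuity are assumptions made for the example.

```python
import numpy as np

def dsm_edge_map(dsm: np.ndarray, z_edge: float) -> np.ndarray:
    """Return a binary edge map from a height grid (hypothetical helper).

    A pixel is marked as an edge if the height difference to one of its
    4-neighbors exceeds z_edge. The edge value does not depend on how
    large the discontinuity is, only on whether it exceeds the threshold.
    """
    edges = np.zeros(dsm.shape, dtype=bool)
    # Height jumps between vertically adjacent pixels, shape (H-1, W)
    jump_v = np.abs(np.diff(dsm, axis=0)) > z_edge
    # Height jumps between horizontally adjacent pixels, shape (H, W-1)
    jump_h = np.abs(np.diff(dsm, axis=1)) > z_edge
    # Assumption: mark the pixels on both sides of each jump
    edges[:-1, :] |= jump_v
    edges[1:, :] |= jump_v
    edges[:, :-1] |= jump_h
    edges[:, 1:] |= jump_h
    return edges
```

With a flat grid that steps up by 10 m between two columns and a threshold of 3 m, only the two columns adjacent to the step are marked.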
4.3. Congruence Coefficient
- Note that in the edge map created from aerial images, I(x, y) is obtained with a Sobel edge detector and is thus proportional to the intensity discontinuity in the image.
- While this is in accordance with the observation that depth discontinuities often result in sharp intensity discontinuities, it is important that no thresholding is applied to the edge image, so as to also utilize depth discontinuities that are less visible in the images.
- One can regard the ensemble of laser scan points in the local coordinate system as a second edge image; from this point of view, Eq. (7) essentially computes the correlation between the two edge images as a function of translation and rotation.
- For the same scan, different global position parameters (x, y, θ) yield different coefficients c(x, y, θ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.
- Figure 10 shows an example for a congruence coefficient for two different pose parameters.
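The correlation view of the congruence coefficient can be illustrated with a short sketch. Eq. (7) is not reproduced in this summary, so the unnormalized sum used here is an assumption, as are the function name `congruence`, the nearest-pixel lookup, and the `resolution` parameter (map units per pixel).

```python
import numpy as np

def congruence(edge_map, scan_xy, x, y, theta, resolution=1.0):
    """Sum edge-map intensities under the scan points for pose (x, y, theta).

    edge_map : 2D array indexed as [row (y), column (x)]
    scan_xy  : (N, 2) array of scan points in the local scanner frame
    """
    c, s = np.cos(theta), np.sin(theta)
    # Rotate and translate the local scan points into map coordinates
    gx = x + c * scan_xy[:, 0] - s * scan_xy[:, 1]
    gy = y + s * scan_xy[:, 0] + c * scan_xy[:, 1]
    col = np.rint(gx / resolution).astype(int)
    row = np.rint(gy / resolution).astype(int)
    h, w = edge_map.shape
    # Points that fall outside the map contribute nothing
    inside = (row >= 0) & (row < h) & (col >= 0) & (col < w)
    return float(edge_map[row[inside], col[inside]].sum())
```

Evaluating this function over a set of candidate poses and keeping the pose with the largest value corresponds to picking the best match between scan and edge map.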
5. Global Correction Based on Monte-Carlo-Localization
- The authors propose MCL as a robust way to obtain globally correct pose estimates for the acquisition vehicle by using the edge map and the horizontal laser scans.
- A motion phase and a perception phase are performed iteratively, both modeled as a stochastic process.
- These particles are propagated over time using a combination of sequential importance sampling and resampling steps, in short referred to as sampling-importance-resampling.
- Since orientation angle errors have already been corrected, this new path has considerably more accurate x and y values than the initial scan matching path.
- If a DSM is used as the global reference, the path computation can be extended to the 6 DOF necessary in hilly areas: utilizing the additional altitude information the DTM provides, altitude and pitch can be estimated in a simple manner.
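One iteration of the motion, perception, and resampling phases described in this section might look like the sketch below. It is a generic sampling-importance-resampling step under stated assumptions, not the authors' code: the Gaussian motion noise, the `weight_fn` callback (for example, a congruence coefficient evaluated on the edge map), and all names are hypothetical.

```python
import numpy as np

def mcl_step(particles, motion, motion_sigma, weight_fn, rng):
    """One MCL iteration over an (N, 3) array of poses [x, y, theta].

    motion       : relative step (u, v, phi) from scan matching
    motion_sigma : per-component noise standard deviations (assumed Gaussian)
    weight_fn    : maps one pose to a non-negative importance factor
    rng          : numpy.random.Generator
    """
    n = len(particles)
    u, v, phi = motion
    # Motion phase: apply the relative step in each particle's own frame
    c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    moved = particles.astype(float)
    moved[:, 0] += c * u - s * v
    moved[:, 1] += s * u + c * v
    moved[:, 2] += phi
    moved += rng.normal(0.0, motion_sigma, size=moved.shape)
    # Perception phase: compute an importance factor for every particle
    weights = np.array([weight_fn(p) for p in moved], dtype=float)
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(n, 1.0 / n)
    # Resampling: pass particles on with probability proportional to weight
    idx = rng.choice(n, size=n, p=weights)
    return moved[idx]
```

The particle count stays constant across iterations in this sketch; the summary reports that 5,000 to 10,000 particles were sufficient in areas with clear building structures.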
6. 3D Model Generation
- Once the pose of the vehicle and thus the laser scanners is known, the generation of a 3D point cloud is straightforward.
- The authors calculate the 3D coordinates of the vertical scan points by applying a coordinate transformation from the local to the world coordinate system.
- The structure of the resulting point cloud is given by scan number and angle, and therefore each vertex has defined neighbors, thus facilitating further processing significantly.
- The authors calibrate the camera before their measurements and determine the transformation between its coordinate system and the laser coordinate system.
- Since these facade models have been brought into perfect registration with either aerial photo or DSM, they can eventually be merged with models derived from this same airborne data.
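The local-to-world transformation of vertical scan points can be sketched as follows. The geometry is an assumption made for illustration: the vertical scan plane is taken to be perpendicular to the driving direction and to look to the right of the vehicle, and each scan point is given by a range and an elevation angle. The names `vertical_scan_to_world` and `sensor_height` are hypothetical.

```python
import numpy as np

def vertical_scan_to_world(ranges, angles, pose, sensor_height=0.0):
    """Convert one vertical scan to world-frame 3D points.

    ranges, angles : 1D arrays (meters, radians above the horizontal)
    pose           : vehicle pose (x, y, theta) in the world frame
    Returns an (N, 3) array; row i corresponds to scan angle i, so the
    point cloud keeps its scan-number / angle ordering.
    """
    x, y, theta = pose
    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)
    lateral = ranges * np.cos(angles)                 # distance sideways
    height = ranges * np.sin(angles) + sensor_height  # height above ground
    # Unit vector pointing to the right of the heading (cos t, sin t)
    wx = x + lateral * np.sin(theta)
    wy = y - lateral * np.cos(theta)
    return np.column_stack([wx, wy, height])
```

Stacking the outputs for successive scans yields a grid-like structure in which each vertex has well-defined neighbors along the scan and across scans.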
7. Results
- The ground-based data was acquired during a 37-minute drive in Berkeley, California, for which the speed was limited only by normal traffic conditions during business hours and the 25 mph speed limit imposed by the city of Berkeley.
- The authors have applied scan matching and initial path computation to the entire driven path.
- Since superimposing a digital roadmap revealed that the photo can only be considered a metric map in the rather flat part within the dashed rectangle, the authors can only correct the 6.7 km long path segment in that area.
- The authors have applied the MCL correction with different numbers of particles, and found that in areas with clear building structures, it is possible to track the path with 5,000 to 10,000 particles.
- Furthermore, as seen in Fig. 18 in a close-up view, scan points for the same area align with each other even if they are taken during two different passes.
8. Conclusions
- The authors have proposed a method for acquiring ground-based 3D building facade models, which uses an acquisition vehicle equipped with two 2D laser scanners.
- The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments.
- Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.
- Finally, the reconstructed raw models are visually not perfect.
- Foreground objects appear cluttered and visually not pleasing since only their front side is captured, and facades contain large holes due to occlusions or reflecting glass surfaces.
References
17,598 citations
3,521 citations
2,159 citations
"An Automated Method for Large-Scale..." refers background in this paper
...Debevec et al. (1996) proposes to reconstruct buildings based on few camera images in a semiautomated way....
[...]
1,945 citations
"An Automated Method for Large-Scale..." refers background in this paper
...In MCL, also known as particle filtering or as condensation algorithm, a large number of random samples, or particles, is utilized to represent probability distributions (Fox et al., 2000; Thrun et al., 2001)....
[...]
1,452 citations
"An Automated Method for Large-Scale..." refers background in this paper
...Lu and Milios (1997b), Thrun et al. (1998b) and Gutmann and Konolige (1999) have investigated simultaneous map building and localization in indoor environments by establishing cross-consistency over multiple 2D laser scans, without the use of a global map....
[...]
...In principle, it is possible to extend the consistent pose estimation idea of Lu and Milios (1997b) by the additional constraint that the resulting global pose must be within the range of Sk ....
[...]
Frequently Asked Questions (17)
Q2. What future work have the authors mentioned in the paper "An automated method for large-scale, ground-based city model acquisition"?
If desired, it is straightforward to extend their approach to rural areas by including GPS information during the MCL perception phase. Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all. For downtown areas which can be separated into a facade and a foreground layer, the authors have proposed processing algorithms for removing cluttered foreground objects and completing occlusion holes in order to obtain visually pleasing facade models (Früh and Zakhor, 2002).
Q3. What is the advantage of using an aerial photo or a DSM over GPS?
Another advantage of using an aerial photo or a DSM over GPS is that the airborne data can potentially be used to derive 3D models of a city from a bird's-eye view, which can then be merged with the 3D facade models obtained from ground-level laser scans.
Q4. What are some of the applications of 3D models of urban environments?
Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry.
Q5. What is the way to determine the quality of alignment?
Taking one scan as reference scan, the authors maximize Q = f(u, v, ϕ), which computes the quality of alignment as a function of a given displacement u, v and rotation ϕ of the scans against each other.
Q6. How are the camera and laser scanners synchronized?
Camera and laser scanners are synchronized by trigger signals and are mounted in a rigid configuration on the sensor platform.
Q7. What is the simplest way to reconstruct the driven 2D path?
To reconstruct the driven 2D path, the authors start with an initial position (x0, y0, θ0), perform a scan match for each step k, and concatenate the steps (uk, vk, ϕk) to form a path.
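The concatenation of relative steps into a path can be sketched as a simple dead-reckoning loop. This is a minimal illustration that assumes each step (u, v, ϕ) is expressed in the coordinate frame of the previous pose; the function name `concatenate_steps` is hypothetical.

```python
import math

def concatenate_steps(start, steps):
    """Chain relative scan-matching steps into a list of global poses.

    start : initial pose (x0, y0, theta0)
    steps : iterable of relative steps (u, v, phi), each assumed to be
            given in the frame of the pose it starts from
    """
    x, y, theta = start
    path = [(x, y, theta)]
    for u, v, phi in steps:
        # Rotate the local displacement into the global frame, then add it
        x += u * math.cos(theta) - v * math.sin(theta)
        y += u * math.sin(theta) + v * math.cos(theta)
        theta += phi
        path.append((x, y, theta))
    return path
```

Because every pose depends on all previous steps, a small error in one rotation ϕ shifts every later position, which is why the relative estimates alone drift over long drives and a global correction is needed.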
Q8. What is the advantage of using a photo or a DSM as a global map?
Since both photos and DSM can provide a geometrically correct view of an entire city, it is conceivable to use them as a global map in order to arrive at global position without use of GPS devices.
Q9. How can the authors use the MCL technique to localize the acquisition vehicle?
The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments.
Q10. How can the authors determine the relative pose of a truck?
Since horizontal scans are taken continuously during driving and hence overlap substantially, the relative pose between the two capture positions can be determined by matching their corresponding laser scans, as shown in Fig.
Q11. Why are the facades on the backside of buildings not captured?
With their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.
Q12. What is the way to limit the position of the particles?
Especially for the edge map from the aerial photo, it is reasonable to use the registered digital roadmap in order to restrict positions of the particles to within a few-meter-wide strip around roads.
Q13. How do the authors suppress erroneous point-to-line correspondences?
In order to suppress erroneous point-to-line correspondences, the authors use robust least squares (Triggs et al., 2000) and compute Q as follows:
$$Q(u, v, \varphi) = \sum_j \exp\left(-\frac{d_{\min}\left(p'_j(u, v, \varphi)\right)^2}{2\,\sigma_s^2}\right) \qquad (3)$$
where $\sigma_s^2$ is the variance of the laser distance measurement, specified by the manufacturer.
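The quality function of Eq. (3) can be evaluated numerically as in the sketch below. It assumes the reference scan is represented as line segments between successive scan points and that d_min is the smallest point-to-segment distance; the paper also uses single points of the reference scan, which this simplified example omits. All function names are hypothetical.

```python
import numpy as np

def min_dist_to_segments(p, a, b):
    """Smallest distance from point p to the segments a[i] -> b[i]."""
    ab = b - a
    sq_len = np.einsum("ij,ij->i", ab, ab)
    # Projection parameter along each segment, clamped to the segment ends
    t = np.einsum("ij,ij->i", p - a, ab) / np.where(sq_len > 0, sq_len, 1.0)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.sqrt(((p - closest) ** 2).sum(axis=1)).min()

def match_quality(ref_pts, scan_pts, u, v, phi, sigma_s):
    """Evaluate Q(u, v, phi) for a scan against a reference scan."""
    c, s = np.cos(phi), np.sin(phi)
    rot = np.array([[c, -s], [s, c]])
    # Transform the second scan by the candidate rotation and displacement
    moved = scan_pts @ rot.T + np.array([u, v])
    a, b = ref_pts[:-1], ref_pts[1:]
    d = np.array([min_dist_to_segments(p, a, b) for p in moved])
    # Far-away points contribute almost nothing, so outliers are suppressed
    return float(np.exp(-d ** 2 / (2.0 * sigma_s ** 2)).sum())
```

Each well-aligned point contributes close to 1 and each outlier close to 0, so Q roughly counts the points that agree with the reference scan; maximizing Q over (u, v, ϕ) gives the relative pose estimate.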
Q14. What is the importance factor of each particle in set Sk used for?
As such, the importance factor of each particle is used in the resampling phase to compute the set Sk+1 from set Sk in the following way: A given particle in set Sk is passed along to set Sk+1 with probability proportional to its importance factor.
Q15. Why is it not necessary to impose a constraint of cross consistency across all scan points?
Due to the fact that the authors have used the same global map for correcting all parts of the path, it is not necessary to explicitly impose the constraint of cross consistency across all scan points; this justifies their computationally simple approach of correcting the relative path estimates with the particle sets Sk.
Q16. How do the authors find the pose of a truck?
The authors will address the first issue by linear interpolation between scan points, and the two others by utilizing robust least squares as an outlier-tolerant way to find the pose with the smallest possible discrepancy.
Q17. How does the congruence coefficient vary for the same scan?
For the same scan, different global position parameters (x, y, θ) yield different coefficients c(x, y, θ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.