Journal ArticleDOI

An Automated Method for Large-Scale, Ground-Based City Model Acquisition

01 Oct 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 1, pp 5-24
TL;DR: An automated method for fast, ground-based acquisition of large-scale 3D city models by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based horizontal laser scans are matched.
Abstract: In this paper, we describe an automated method for fast, ground-based acquisition of large-scale 3D city models. Our experimental set up consists of a truck equipped with one camera and two fast, inexpensive 2D laser scanners, being driven on city streets under normal traffic conditions. One scanner is mounted vertically to capture building facades, and the other one is mounted horizontally. Successive horizontal scans are matched with each other in order to determine an estimate of the vehicle's motion, and relative motion estimates are concatenated to form an initial path. Assuming that features such as buildings are visible from both ground-based and airborne view, this initial path is globally corrected by Monte-Carlo Localization techniques. Specifically, the final global pose is obtained by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based horizontal laser scans are matched. A fairly accurate, textured 3D model of the downtown Berkeley area has been acquired in a matter of minutes, limited only by traffic conditions during the data acquisition phase. Subsequent automated processing time to accurately localize the acquisition vehicle is 235 minutes for a 37-minute, 10.2 km drive, i.e. 23 minutes per kilometer.

Summary (4 min read)

1. Introduction

  • Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry.
  • For airborne remote-sensing approaches, the resulting resolution of the models is only in the meter range, and without manual intervention, the resulting accuracy is often poor.
  • More specifically, it is necessary to determine the pose of successive laser scans and camera images in a global coordinate system with centimeter and sub-degree accuracy in order to reconstruct a consistent model.
  • What makes their problem different from indoor localization is the scale of the environment, because distances involved in making 3D models for cities are large compared to the range of the laser scans.

2. System Overview

  • The data acquisition system is mounted on a truck and consists of two parts: a sensor module and a processing unit (Früh and Zakhor, 2001a).
  • The processing unit consists of a dual processor PC, large hard disk drives, and additional electronics for power supply and signal shaping; the sensor module consists of two SICK 2D laser scanners, a digital camera, and a heading sensor.
  • Both 2D scanners face the same side of the street.
  • Figure 1 shows the experimental setup for their data acquisition.
  • Figure 2 shows a picture of the truck with rack and equipment.

3. Relative Position Estimation and Path Computation

  • The authors compute relative pose estimates and an initial path by matching successive horizontal laser scans.
  • First, the authors use both lines and single points of the reference scan for matching; second, they do not treat the problem of eliminating erroneous correspondences separately from the matching; rather, they consider it directly in the computation of a match quality function by using robust least squares, as will be seen later.
  • Thus, only the ‘good’ scan points contribute to this quality function, and the authors do not have to eliminate outliers prior to the matching.
  • Even small errors in the relative estimates, especially inaccurate angles, accumulate to significant global pose errors over long driving periods.

4. Global Maps from Aerial Images or DSM

  • The authors derive a global edge map either from an aerial photo or from a DSM, and define a congruence coefficient as a measure of the match between ground-based scans and the global edge map.
  • The basic idea behind their position correction is that objects seen from the road-based data acquisition must in principle also be visible in the aerial photos or the DSM.
  • Making the assumption that the position of building facades and building footprints are identical or at least sufficiently similar, one can expect that the shapes of the horizontal laser scans match edges in the aerial image or the DSM.
  • Essentially, the authors use the airborne edge maps as an occupancy grid for global localization, as it has been done for mobile robots with floor occupancy grids in indoor environments (Konolige and Chou, 1999; Thrun, 2001).

4.1. Edge Maps from Aerial Photos

  • While perspective-corrected photos with a 1-meter resolution are readily available from USGS, the authors chose to use a higher-contrast aerial photograph obtained by Vexcel Corporation, CO, with 1-foot resolution.
  • (2) The photos and the scans were not taken at the same time, so the content can potentially be different.
  • In particular dynamic objects such as cars or buses can cause mismatches.
  • (3) Visible in the image are not only the building edges, but also many non-3D edges such as road stripes or crosswalk borders.
  • Especially problematic are shadows, because they result in very strong edges.

4.2. Edge Map from DSM

  • In their case, a DSM with a one-meter resolution was created using airborne laser scans acquired by Airborne 1 Corp., CA.
  • Using a DSM as a source of a global edge map has several advantages over aerial images.
  • Furthermore, the intensity of an edge is not dependent on the altitude of the building; what matters is whether a discontinuity exceeds the threshold z_edge.
  • Nevertheless, it is not advisable to directly use the z value at a DSM location, since the airborne laser captures overhanging trees and cars on the road at the time of the data acquisition, resulting in z-values of up to several meters above the actual street level for some locations.
  • Thus, the authors need to create a smooth, dense Digital Terrain Map (DTM) that contains only the altitude of the street level, and they do so in the following manner: Starting with a blank DTM, they first copy all available ground pixels into the DTM.
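The summary only sketches this step; a minimal numpy sketch of the idea is below. The ground classification is taken as given, and the hole-filling rule (iteratively averaging already-known 4-neighbors) is an assumption for illustration, since the paper's exact procedure is not quoted here.

```python
import numpy as np

def make_dtm(dsm, ground_mask, max_iters=100):
    """Build a DTM from DSM pixels classified as ground; remaining
    holes are filled by repeatedly averaging already-known 4-neighbors.
    The fill rule is an assumption; the paper only states that ground
    pixels are copied into a blank DTM first."""
    dtm = np.where(ground_mask, dsm, np.nan)
    for _ in range(max_iters):
        holes = np.isnan(dtm)
        if not holes.any():
            break
        padded = np.pad(dtm, 1, constant_values=np.nan)
        # up, down, left, right neighbors of every cell
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        known = ~np.isnan(neigh)
        cnt = known.sum(axis=0)
        total = np.where(known, neigh, 0.0).sum(axis=0)
        mean = np.divide(total, cnt, out=np.full(dtm.shape, np.nan),
                         where=cnt > 0)
        fill = holes & (cnt > 0)
        dtm[fill] = mean[fill]
    return dtm
```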

4.3. Congruence Coefficient

  • Note that in the edge map created from aerial images, I (x, y) is obtained with a Sobel edge detector and thus proportional to the intensity discontinuity in the image.
  • While this is in accordance with the observation that depth discontinuities often result in sharp intensity discontinuities, it is important that no thresholding is applied to the edge image, so as to also utilize depth discontinuities that are less visible in the images.
  • One can regard the ensemble of laser scan points in the local coordinate system as a second edge image; from this point of view, Eq. (7) essentially computes the correlation between the two edge images as a function of translation and rotation.
  • For the same scan, different global position parameters (x, y, θ ) yield different coefficients c(x, y, θ ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.
  • Figure 10 shows an example for a congruence coefficient for two different pose parameters.
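The correlation view of the congruence coefficient can be sketched as follows: place the scan points at a candidate global pose and sum the edge-map intensities under them. The nearest-pixel lookup and the x→column / y→row indexing are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def congruence(edge_map, scan_pts, x, y, theta):
    """Score a candidate global pose (x, y, theta) by summing edge-map
    intensities at the transformed scan points; a larger value means a
    better match between scan and edge map."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    world = scan_pts @ R.T + np.array([x, y])   # local -> global frame
    cols = np.round(world[:, 0]).astype(int)    # x -> column (assumed)
    rows = np.round(world[:, 1]).astype(int)    # y -> row (assumed)
    h, w = edge_map.shape
    ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    return float(edge_map[rows[ok], cols[ok]].sum())
```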

5. Global Correction Based on Monte-Carlo-Localization

  • The authors propose MCL as a robust way to obtain globally correct pose estimates for the acquisition vehicle by using the edge map and the horizontal laser scans.
  • A motion phase and a perception phase are performed iteratively, both modeled as a stochastic process.
  • These particles are propagated over time using a combination of sequential importance sampling and resampling steps, in short referred to as sampling-importance-resampling.
  • Since orientation angle errors have already been corrected, this new path has considerably more accurate x and y values than the initial scan matching path.
  • If the authors use a DSM as a global reference, one can extend the path computation to the 6 DOF necessary in hilly areas: utilizing the additional altitude information the DTM provides, altitude and pitch can be estimated in a simple manner.
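The iterated motion and perception phases can be sketched as a single sampling-importance-resampling step over (x, y, θ) particles. The Gaussian noise model and its sigmas are illustrative assumptions, not the paper's calibrated values; the weight function stands in for the scan/edge-map congruence.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcl_step(particles, step, weight_fn, motion_sigma=(0.05, 0.05, 0.01)):
    """One SIR iteration: apply the relative step (du, dv, dphi) in each
    particle's own frame plus noise (motion phase), then weight each
    pose and resample in proportion (perception phase)."""
    du, dv, dphi = step
    th = particles[:, 2]
    moved = particles + np.column_stack([
        du * np.cos(th) - dv * np.sin(th),
        du * np.sin(th) + dv * np.cos(th),
        np.full_like(th, dphi),
    ]) + rng.normal(0.0, motion_sigma, particles.shape)
    w = np.maximum([weight_fn(p) for p in moved], 1e-12)
    idx = rng.choice(len(moved), size=len(moved), p=w / w.sum())
    return moved[idx]
```

Particles far from map edges receive near-zero weight and die out in resampling, so the cloud concentrates on globally consistent poses.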

6. 3D Model Generation

  • Once the pose of the vehicle and thus the laser scanners is known, the generation of a 3D point cloud is straightforward.
  • The authors calculate the 3D coordinates of the vertical scan points by applying a coordinate transformation from the local to the world coordinate system.
  • The structure of the resulting point cloud is given by scan number and angle, and therefore each vertex has defined neighbors, thus facilitating further processing significantly.
  • The authors calibrate the camera before their measurements and determine the transformation between its coordinate system and the laser coordinate system.
  • Since these facade models have been brought into perfect registration with either aerial photo or DSM, they can eventually be merged with models derived from this same airborne data.

7. Results

  • The ground-based data was acquired during a 37-minute drive in Berkeley, California, for which the speed was only limited by the normal traffic conditions during business hours and the speed limit of 25 mph imposed by the city of Berkeley.
  • The authors have applied scan matching and initial path computation to the entire driven path.
  • Since superimposing a digital roadmap revealed that the photo can only be considered a metric map in the rather flat part within the dashed rectangle, the authors can only correct the 6.7 km long path segment in that area.
  • The authors have applied the MCL correction with different numbers of particles, and found that in areas with clear building structures, it is possible to track the path with 5,000 to 10,000 particles.
  • Furthermore, as seen in Fig. 18 in a close-up view, scan points for the same area align with each other even if they are taken during two different passes.

8. Conclusions

  • The authors have proposed a method for acquiring ground-based 3D building facade models, which uses an acquisition vehicle equipped with two 2D laser scanners.
  • The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments.
  • Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.
  • Finally, the reconstructed raw models are not visually perfect.
  • Foreground objects appear cluttered and visually unappealing since only their front side is captured, and facades contain large holes due to occlusions or reflecting glass surfaces.

Figures (24)


International Journal of Computer Vision 60(1), 5–24, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
An Automated Method for Large-Scale, Ground-Based City
Model Acquisition
CHRISTIAN FRÜH AND AVIDEH ZAKHOR
Video and Image Processing Laboratory, University of California, Berkeley
frueh@eecs.berkeley.edu
avz@eecs.berkeley.edu
Received December 27, 2002; Revised December 11, 2003; Accepted December 11, 2003
Abstract. In this paper, we describe an automated method for fast, ground-based acquisition of large-scale 3D city
models. Our experimental set up consists of a truck equipped with one camera and two fast, inexpensive 2D laser
scanners, being driven on city streets under normal traffic conditions. One scanner is mounted vertically to capture
building facades, and the other one is mounted horizontally. Successive horizontal scans are matched with each
other in order to determine an estimate of the vehicle’s motion, and relative motion estimates are concatenated to
form an initial path. Assuming that features such as buildings are visible from both ground-based and airborne view,
this initial path is globally corrected by Monte-Carlo Localization techniques. Specifically, the final global pose is
obtained by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based
horizontal laser scans are matched. A fairly accurate, textured 3D model of the downtown Berkeley area has been acquired
in a matter of minutes, limited only by traffic conditions during the data acquisition phase. Subsequent automated
processing time to accurately localize the acquisition vehicle is 235 minutes for a 37-minute, 10.2 km drive, i.e.
23 minutes per kilometer.
Keywords: laser scanning, navigation, self-localization, mobile robots, 3D modeling, Monte-Carlo localization
1. Introduction
Three-dimensional models of urban environments are
used in applications such as urban planning, virtual re-
ality, and propagation simulation of radio waves for
the cell phone industry. Currently, acquisition of 3D
city models is difficult and time consuming. Existing
large-scale models typically take months to create, and
usually require significant manual intervention (Chan
et al., 1998). This process is not only prohibitively ex-
pensive, but also is unsuitable in applications where
a 3D snapshot of a city is needed within a short time,
e.g. for disaster management or for monitoring changes
over time.
There exist a variety of approaches to creating 3D
models of cities from an airborne view via remote sens-
ing. Manual or automated matching of stereo images
can be used to obtain a Digital Surface Model (DSM),
i.e. a grid-like representation of the elevation level, and
3D models (Frere et al., 1998; Huertas et al., 1999).
In recent years, advances in resolution and accuracy
of Synthetic Aperture Radar (SAR) and airborne laser
scanners have also rendered them suitable for the gen-
eration of DSMs and 3D models (Brenner et al., 2001;
Maas, 2001). Although these methods can be reason-
ably fast, the resulting resolution of the models is only
in the meter range, and without manual intervention,
the resulting accuracy is often poor. Specifically, they
lack the level of detail that is required for realistic vir-
tual walk-throughs or drive-throughs.
While acquiring detailed building models from a
ground-level view has been addressed in previous
work, these attempts have been limited to one or a few
buildings. Debevec et al. (1996) proposes to reconstruct

buildings based on few camera images in a semi-
automated way. Similarly, it is conceivable to apply
other vision-based approaches mainly designed for in-
door scenes to outdoor environments, such as structure-
from-motion methods (Koch et al., 1999) or variations
(Dellaert et al., 2000; Szeliski and Kang, 1995), or
voxel-based approaches (Seitz and Dyer, 1997), but
varying lighting conditions, the scale of the environ-
ment, and the complexity of outdoor scenes with many
trees and glass surfaces pose enormous challenges to
purely vision-based methods.
Stamos and Allen (2000) use a 3D laser scanner and
Thrun et al. (2000) and Hähnel et al. (2001) use 2D
laser scanners mounted on a mobile robot to achieve
complete automation, but the time required for data ac-
quisition of an entire city is prohibitively large; in addi-
tion, the reliability of autonomous mobile robots in out-
door environments is a critical issue. Antone and Teller
(2000) propose an approach based on high-resolution
half-spherical images, but data has to be acquired in
a stop-and-go fashion. While pose computation is so-
phisticated in this approach, the purely image-based
model reconstruction is difficult and the obtained mod-
els are rather simple. Kawasaki et al. (1999) suggest a
method, which uses a video stream for texturing an al-
ready existing 3D model. Zhao and Shibasaki (1999)
use a vertical laser scanner in a car–based approach.
While this enables the traversal of large-scale city envi-
ronments in reasonable time, their localization is based
on the Global Positioning System (GPS). GPS is by
far the most common source of global position esti-
mates in outdoor environments; however, it has sev-
eral drawbacks: First, GPS tends to fail in dense ur-
ban environments, particularly in urban canyons where
few satellites are visible. Second, multi-path reflec-
tions can result in completely erroneous readouts, or
can decrease accuracy substantially. Third, a differ-
ential GPS system that could fulfill the hard accu-
racy requirements of ground-based modeling is quite
expensive.
On the other hand, digital roadmaps and perspective-
corrected aerial photos are widely available; similarly,
for an increasing number of urban areas, DSMs can
be found. Since both photos and DSM can provide
a geometrically correct view of an entire city, it is
conceivable to use them as a global map in order
to arrive at global position without use of GPS de-
vices. Another advantage of using an aerial photo or
a DSM over GPS is that the airborne data can poten-
tially be used to derive 3D models of a city from a
bird’s eye view, which can then be merged with the
3D facade models obtained from ground level laser
scans.
In this paper, we propose “drive-by scanning” as a
method that is capable of rapidly acquiring 3D geome-
try and texture data of an entire city at the ground level
by using a configuration of two fast, inexpensive 2D
laser scanners and a digital camera. This data acquisi-
tion system is mounted on a truck moving at normal
speed on public roads, collecting data to be processed
offline. This approach has the advantage that data can
be acquired continuously, rather than in a stop-and-
go fashion, and is therefore much faster than existing
methods based on 3D scanners. While 3D scans overlap
and can hence be accurately registered with each other
if an initial pose is known approximately, this is not
possible in our approach, since vertical 2D scans are
parallel and do not overlap, hence posing more strin-
gent accuracy requirements for the localization of the
vehicle and its acquisition devices. More specifically,
it is necessary to determine the pose of successive laser
scans and camera images in a global coordinate sys-
tem with centimeter and sub-degree accuracy in order
to reconstruct a consistent model.
In our approach, rather than GPS, we use an aerial
image or an airborne DSM to precisely reconstruct
the path of the acquisition vehicle in offline compu-
tations: First, relative position changes are computed
with centimeter accuracy by matching successive hor-
izontal laser scans against each other, and are concate-
nated to reconstruct an initial path estimate. Since small
errors and occasional mismatches can accumulate to a
significantly large level after longer driving periods,
Monte-Carlo-Localization (MCL) is then used in con-
junction with an airborne map to correct global pose.
Our approach is to match features observed in both the
laser scans and the aerial image or the DSM, whereby
the airborne data can be regarded as a global map onto
which the ground-based scan points have to be regis-
tered.
This problem is in many ways similar to the localiza-
tion of mobile robots, and as such, several approaches
have been developed in this context: Cox (1991) pro-
posed to match scan points from a laser scanner with
the lines of a manually created a-priori-map of an
indoor environment. Lu and Milios (1994) proposed
the iterative dual correspondence or IDC algorithm,
which is based on the matching-range-point rule. Gut-
mann and Schlegel (1996) focused on matching two
scans, and proposed the use of a line filter for both

reference and second scans. There are several prob-
abilistic attempts to localize a robot in a given edge
map of the environment: Jensfelt and Kristensen (1999)
and Roumeliotis and Bekey (2000) propose multi-
hypotheses Kalman filters, which represent the pose
belief as mixtures of Gaussians. Markov Localization,
which assumes static environments, has been applied
in Russell and Norvig (1995), Simmons and Koenig
(1995) and Fox et al. (1999). In grid-based Markov lo-
calization, the parameter space is partitioned into grid
cells, each representing the probability in a parameter
“cube” by a floating point value, e.g. in Burgard (1996)
and Fox et al. (1999). In MCL, also known as particle
filtering or as condensation algorithm, a large number
of random samples, or particles, is utilized to represent
probability distributions (Fox et al., 2000; Thrun et al.,
2001).
Lu and Milios (1997b), Thrun et al. (1998b) and Gut-
mann and Konolige (1999) have investigated simulta-
neous map building and localization in indoor environ-
ments by establishing cross-consistency over multiple
2D laser scans, without the use of a global map. How-
ever, these methods are not applicable to outdoor scale,
since their complexity usually increases as O(n²) with
n denoting the number of scans. Additionally, cities are
extremely cyclic environments, with often no cross-
correspondences to other scans during long driving pe-
riods. What makes our problem different from indoor
localization is the scale of the environment, because
distances involved in making 3D models for cities are
large compared to the range of the laser scans. Fur-
thermore, our approach is to obtain relative motion not
from odometry, but from scan-to-scan matching, and
the global map derived from a photo or a DSM does
not have the same quality as a CAD ground plan of an
indoor environment. Finally, while indoor localization
is usually a 2D problem, in this paper, we recover a
6-degree-of-freedom (DOF) pose in case of a DSM as
a global map.
The outline of this paper is as follows: Section 2
describes the system overview and Section 3 the data
acquisition system. Section 4 is devoted to the relative
pose estimation and initial path computation based on
laser scan matching. In Section 5, we address global
map generation from an aerial image or a DSM, and in
Section 6, we discuss pose correction based on regis-
tration of laser scans with the map by means of MCL.
Section 7 briefly outlines the model generation, and we
finally show the results for a 10-kilometer drive in an
actual city environment in Section 8.
Figure 1. Experimental setup.
2. System Overview
The data acquisition system is mounted on a truck and
consists of two parts: a sensor module and a process-
ing unit (Früh and Zakhor, 2001a). The processing unit
consists of a dual processor PC, large hard disk drives,
and additional electronics for power supply and signal
shaping; the sensor module consists of two SICK 2D
laser scanners, a digital camera, and a heading sensor.
It is mounted on a rack at a height of approximately 3.6
meters, in order to avoid moving obstacles such as cars
and pedestrians in the direct view. The scanners have
a 180° field of view with a resolution of 1°, a range of
80 meters and an accuracy of ±3.5 centimeters. Both
2D scanners face the same side of the street. One is
mounted vertically with the scanning plane orthogonal
to the driving direction, and the other is mounted hori-
zontally with the scanning plane parallel to the ground.
Figure 1 shows the experimental setup for our data ac-
quisition.
The vertical scanner detects the shape of the build-
ing facades as we drive by, and therefore we refer to
our method as drive-by scanning; the horizontal scan-
ner operates in a plane parallel to the ground and is
used for pose estimation as described in this paper. The
camera’s line of sight is the intersection between the
orthogonal scanning planes. All sensors are synchro-
nized with each other and acquire data at prespecified
times. Figure 2 shows a picture of the truck with rack
and equipment.
3. Relative Position Estimation
and Path Computation
In this section, we compute relative pose estimates and
an initial path by matching successive horizontal laser
scans. A popular approach for matching two 3D scans

Figure 2. Truck with acquisition equipment.
is the Iterative Closest Point (ICP) algorithm (Besl and
McKay, 1992), and its 2D equivalent Iterative Dual
Correspondence (IDC) (Lu and Milios, 1994). Both al-
gorithms work directly on the scan points rather than on
extracted features, and iteratively refine pose and point-
to-point correspondences in order to converge to a final
pose estimate. Our scan matching approach is similar
to the line-based ones in Cox (1991) and Gutmann and
Schlegel (1996), except for two modifications: First,
we use both lines and single points of the reference
scan for matching; second, we do not treat the problem
of eliminating erroneous correspondences separately
from the matching; rather, we consider it directly in
the computation of a match quality function by using
robust least squares, as will be seen later.
We first introduce a Cartesian world coordinate
system [x, y, z] where x, y defines the geographical
Figure 3. Two scans before matching (left) and after matching (right).
ground plane and z the altitude as shown in Fig. 1. We
also define a truck coordinate system [u,v] which is
implied by the horizontal laser scanner. In a flat envi-
ronment, a 2D pose of the truck and its local coordinate
system can be entirely described by the two coordinates
x, y and the yaw orientation angle θ; in this case, the
u-v coordinate system is assumed to be parallel to the
x-y plane as shown in Fig. 1.
Since horizontal scans are taken continuously dur-
ing driving and hence overlap substantially, the relative
pose between the two capture positions can be deter-
mined by matching their corresponding laser scans,
as shown in Fig. 3. We assume that the two scans
have maximum congruence for the particular translation Δt = (Δu, Δv) and rotation Δϕ, which exactly
compensate for the motion between the two acquisition
times. In practice however, scans taken from different
positions do not match perfectly because of different
sample density on objects, occlusions, and measure-
ment noise. We will address the first issue by linear in-
terpolation between scan points, and the two others by
utilizing robust least squares as an outlier-tolerant way
to find the pose with the smallest possible discrepancy.
Taking one scan as reference scan, we maximize
Q = f(Δu, Δv, Δϕ), which computes the quality of alignment as a function of a given displacement Δu, Δv and rotation Δϕ of the scans against each
other. More specifically, we perform the following
steps: First, we compute a set of lines l_i from the reference scan by connecting adjacent scan points provided
their discontinuity is below a threshold; this results in a
line strip approximation that may also contain isolated
points as infinitely short lines. In this fashion, the ref-
erence scan is transformed into an edge map to which
the second scan can be registered.
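The line-strip construction just described can be sketched as follows; the discontinuity threshold value is an assumption, and isolated points are kept as zero-length segments as the text specifies.

```python
import numpy as np

def build_line_strips(scan_pts, max_gap=0.5):
    """Connect adjacent scan points into segments when their separation
    is below max_gap (threshold value assumed); points linked to neither
    neighbor are kept as 'infinitely short' zero-length segments."""
    n = len(scan_pts)
    segments = [(scan_pts[i], scan_pts[i + 1]) for i in range(n - 1)
                if np.linalg.norm(scan_pts[i + 1] - scan_pts[i]) < max_gap]
    for i in range(n):
        left = i > 0 and np.linalg.norm(scan_pts[i] - scan_pts[i - 1]) < max_gap
        right = i < n - 1 and np.linalg.norm(scan_pts[i + 1] - scan_pts[i]) < max_gap
        if not (left or right):
            segments.append((scan_pts[i], scan_pts[i]))  # isolated point
    return segments
```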
Given a translation vector Δt = (Δu, Δv) and a 2×2 rotation matrix R(Δϕ) with rotation angle Δϕ, we transform the points p_j of the second scan to the points p′_j according to

p′_j(Δu, Δv, Δϕ) = R(Δϕ) · p_j + Δt    (1)
Then, for each point p′_j, we compute the Euclidean distance d(p′_j, l_i) to each line segment l_i and set d_min to:

d_min(p′_j(Δu, Δv, Δϕ)) = min_i {d(p′_j, l_i)}.    (2)

Intuitively, d_min is the distance between p′_j and the closest point on any of the lines in the reference scan.
Distance measurement noise of the scanners can be
approximately modeled as Gaussian, but there are ad-
ditionally extreme errors due to occlusion effects or
multi-reflections, which prohibit using the sum of dis-
tance squares as an error function. Thus, in order to
suppress erroneous point-to-line correspondences, we
use robust least squares (Triggs et al., 2000) and com-
pute Q as follows:
Q(Δu, Δv, Δϕ) = Σ_j exp( −d_min(p′_j(Δu, Δv, Δϕ))² / (2 · σ_s²) )    (3)

where σ_s² is the variance of the laser distance measurement, specified by the manufacturer. This equation
takes into account the distribution of the distance mea-
surement values, while suppressing deviations beyond
this distribution as outliers. It has least squares-like be-
havior in the near range, but does not take into account
points that are far away from any line. Thus, only the
‘good’ scan points contribute to this quality function,
and we do not have to eliminate outliers prior to the
matching. The block diagram of this quality computa-
tion is shown in Fig. 4.
Figure 4. Block diagram of quality computation.
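The quality computation of Eqs. (1)–(3) can be sketched directly; the point-to-segment distance helper and the σ_s value used here are illustrative assumptions.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment a-b (zero-length ok)."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def quality(segments, scan_pts, du, dv, dphi, sigma=0.035):
    """Eq. (3): transform the second scan by (du, dv, dphi), then sum a
    Gaussian kernel of each point's distance to the nearest reference
    segment; far-away points contribute ~0 instead of dominating, so
    outliers need not be removed beforehand (sigma is assumed here)."""
    c, s = np.cos(dphi), np.sin(dphi)
    R = np.array([[c, -s], [s, c]])
    t = np.array([du, dv])
    q = 0.0
    for p in scan_pts:
        pp = R @ p + t                                                 # Eq. (1)
        dmin = min(point_segment_dist(pp, a, b) for a, b in segments)  # Eq. (2)
        q += np.exp(-dmin ** 2 / (2.0 * sigma ** 2))                   # robust kernel
    return q
```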
The parameters (Δu, Δv, Δϕ) for the best match between a scan pair are found by optimizing Q. Steepest descent search methods have the advantage of finding the minimum fast, but due to noise and erroneous point-to-line assignments, they can become trapped in local minima if not started from a “good” initial point. Therefore, we use a combined method of sampling the parameter space and discrete steepest descent, where we first sample the parameter space in coarse steps and then refine the search around the minimum by steepest descent. Hence, we obtain a relative pose estimate (Δu, Δv, Δϕ) between two scans, to which we refer in the following as a “step”. For a series of successive scans indexed by k, we can compute a series of steps (Δu_k, Δv_k, Δϕ_k), denoting the relative pose between scan k and scan k + 1.
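The combined coarse-sampling and local refinement strategy can be sketched as hill-climbing on Q (equivalently, descent on −Q); the grid density and step size here are assumed values, not the paper's.

```python
import numpy as np

def coarse_to_fine(q_fn, ranges, coarse_steps=9, fine_step=0.01, iters=200):
    """Maximize q_fn(du, dv, dphi): sample a coarse grid over the given
    (lo, hi) ranges, then climb from the best sample in fine_step
    increments until no axis-aligned move improves the score."""
    axes = [np.linspace(lo, hi, coarse_steps) for lo, hi in ranges]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    best = max(grid, key=lambda v: q_fn(*v))
    moves = fine_step * np.vstack([np.eye(3), -np.eye(3)])  # +/- each axis
    for _ in range(iters):
        nxt = max((best + d for d in moves), key=lambda v: q_fn(*v))
        if q_fn(*nxt) <= q_fn(*best):
            break  # local optimum of the sampled neighborhood
        best = nxt
    return best
```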
To reconstruct the driven 2D path, we start with an initial position (x_0, y_0, θ_0), perform a scan match for each step k, and concatenate the steps (Δu_k, Δv_k, Δϕ_k) to form a path. For non-flat areas, this 2D computation
can result in an apparent source of error in length: The
scan-to-scan matching estimates for the 3-DOF relative
motion, i.e. 2D translation and rotation, are given in
the scanner’s local scanning plane. If the vehicle is
on a slope, this local coordinate system is tilted at an
angle towards the global (x, y) plane, and hence the
translation should strictly speaking be corrected with
the cosine of the pitch angle. Fortunately, the stretching
effect is small, and the relative length error is given by:
l_err / l = 1 − cos(pitch) ≈ 1 − (1 − pitch²/2) = pitch²/2    (4)
While for an impressive 10% slope, the relative error
is 0.5%, for a moderate 2% slope, it only amounts to
0.06%. Thus, it turns out that this error is easily within
the correction capability of our global localization in-
troduced in the next section. Hence, we utilize the rel-
ative estimates from the scan-to-scan matching as if
they were given parallel to the ground plane, and con-
catenate the steps (Δu_k, Δv_k, Δϕ_k), so that the next global pose (x_{k+1}, y_{k+1}, θ_{k+1}) of the path is computed iteratively as follows:
x_{k+1} = x_k + Δu_k · cos(θ_k) − Δv_k · sin(θ_k)
y_{k+1} = y_k + Δu_k · sin(θ_k) + Δv_k · cos(θ_k)    (5)
θ_{k+1} = θ_k + Δϕ_k
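The iteration of Eq. (5) amounts to rotating each step into the global frame and accumulating; a direct sketch:

```python
import numpy as np

def concatenate_steps(x0, y0, theta0, steps):
    """Integrate relative steps (du_k, dv_k, dphi_k) into global poses
    per Eq. (5); returns the list of (x, y, theta) along the path."""
    path = [(x0, y0, theta0)]
    x, y, th = x0, y0, theta0
    for du, dv, dphi in steps:
        x += du * np.cos(th) - dv * np.sin(th)
        y += du * np.sin(th) + dv * np.cos(th)
        th += dphi
        path.append((x, y, th))
    return path
```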
As the frequency of horizontal scans is 75 Hz, assum-
ing the maximum city driving speed of 25 miles per

Citations
Proceedings ArticleDOI
01 Sep 2009
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo sharing sites. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms, designed to maximize parallelism at each stage in the pipeline and minimize serialization bottlenecks. It is designed to scale gracefully with both the size of the problem and the amount of available computation. We have experimented with a variety of alternative algorithms at each stage of the pipeline and report on which ones work best in a parallel computing environment. Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.

1,454 citations


Cites methods from "An Automated Method for Large-Scale..."

  • ...City-scale 3D reconstruction has been explored previously in the computer vision literature [12, 2, 6 , 21] and is now widely deployed e.g., in Google Earth and Microsoft’s Virtual Earth....


Journal ArticleDOI
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day.

1,307 citations


Additional excerpts

  • ...edu City-scale 3D reconstruction has been explored previously in the computer vision literature [12, 2, 6, 21] and is now widely deployed e....


Journal ArticleDOI
TL;DR: A system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes that extends existing algorithms to meet the robustness and variability necessary to operate out of the lab and shows results on real video sequences comprising hundreds of thousands of frames.
Abstract: The paper presents a system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes. The system collects video streams, as well as GPS and inertia measurements in order to place the reconstructed models in geo-registered coordinates. It is designed using current state of the art real-time modules for all processing steps. It employs commodity graphics hardware and standard CPU's to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab. To account for the large dynamic range of outdoor videos the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The required accuracy for many applications is achieved with a two-step stereo reconstruction process exploiting the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.

846 citations

Journal ArticleDOI
TL;DR: The goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field.
Abstract: This paper provides a comprehensive overview of urban reconstruction. While there exists a considerable body of literature, this topic is still under active research. The work reviewed in this survey stems from the following three research communities: computer graphics, computer vision and photogrammetry and remote sensing. Our goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field. Further, we would like to bring the mentioned research communities to even more interdisciplinary work, since the reconstruction problem itself is by far not solved.

445 citations

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This paper proposes a new representation denoted as multi-level surface maps (MLS maps) which allows to store multiple surfaces in each cell of the grid and is well-suited for representing large-scale outdoor environments.
Abstract: To operate outdoors or on non-flat surfaces, mobile robots need appropriate data structures that provide a compact representation of the environment and at the same time support important tasks such as path planning and localization. One such representation that has been frequently used in the past are elevation maps which store in each cell of a discrete grid the height of the surface in the corresponding area. Whereas elevation maps provide a compact representation, they lack the ability to represent vertical structures or even multiple levels. In this paper, we propose a new representation denoted as multi-level surface maps (MLS maps). Our approach allows to store multiple surfaces in each cell of the grid. This enables a mobile robot to model environments with structures like bridges, underpasses, buildings or mines. Additionally, they allow to represent vertical structures. Throughout this paper we present algorithms for updating these maps based on sensory input, to match maps calculated from two different scans, and to solve the loop-closing problem given such maps. Experiments carried out with a real robot in an outdoor environment demonstrate that our approach is well-suited for representing large-scale outdoor environments.

381 citations


Cites background from "An Automated Method for Large-Scale..."

  • ...[5] C. Früh and A. Zakhor....


  • ...Früh and Zakhor [5] apply a similar idea to the problem of learning large-scale models of outdoor environments....


References
Journal ArticleDOI
Paul J. Besl, H. D. McKay
TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.
Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces. >

17,598 citations

Book ChapterDOI
21 Sep 1999
TL;DR: A survey of the theory and methods of photogrammetric bundle adjustment can be found in this article, with a focus on general robust cost functions rather than restricting attention to traditional nonlinear least squares.
Abstract: This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.

3,521 citations

Proceedings ArticleDOI
01 Aug 1996
TL;DR: This work presents a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs, which combines both geometry-based and image-based techniques, and presents view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models.
Abstract: We present a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs. Our modeling approach, which combines both geometry-based and image-based techniques, has two components. The first component is a photogrammetric modeling method which facilitates the recovery of the basic geometry of the photographed scene. Our photogrammetric modeling approach is effective, convenient, and robust because it exploits the constraints that are characteristic of architectural scenes. The second component is a model-based stereo algorithm, which recovers how the real scene deviates from the basic model. By making use of the model, our stereo technique robustly recovers accurate depth from widely-spaced image pairs. Consequently, our approach can model large architectural environments with far fewer photographs than current image-based modeling approaches. For producing renderings, we present view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models. Our approach can be used to recover models for use in either geometry-based or image-based rendering systems. We present results that demonstrate our approach’s ability to create realistic renderings of architectural scenes from viewpoints far from the original photographs. CR Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding Modeling and recovery of physical attributes; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism Color, shading, shadowing, and texture; I.4.8 [Image Processing]: Scene Analysis Stereo; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).

2,159 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...Debevec et al. (1996) proposes to reconstruct buildings based on few camera images in a semiautomated way....


Journal ArticleDOI
TL;DR: A more robust algorithm is developed called MixtureMCL, which integrates two complementary ways of generating samples in the estimation of Monte Carlo Localization algorithms, and is applied to mobile robots equipped with range finders.

1,945 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...In MCL, also known as particle filtering or as condensation algorithm, a large number of random samples, or particles, is utilized to represent probability distributions (Fox et al., 2000; Thrun et al., 2001)....


Journal ArticleDOI
TL;DR: The problem of consistent registration of multiple frames of measurements (range scans), together with the related issues of representation and manipulation of spatial uncertainties, is studied; the approach is to maintain all the local frames of data as well as the relative spatial relationships between local frames.
Abstract: A robot exploring an unknown environment may need to build a world model from sensor measurements. In order to integrate all the frames of sensor data, it is essential to align the data properly. An incremental approach has been typically used in the past, in which each local frame of data is aligned to a cumulative global model, and then merged to the model. Because different parts of the model are updated independently while there are errors in the registration, such an approach may result in an inconsistent model. In this paper, we study the problem of consistent registration of multiple frames of measurements (range scans), together with the related issues of representation and manipulation of spatial uncertainties. Our approach is to maintain all the local frames of data as well as the relative spatial relationships between local frames. These spatial relationships are modeled as random variables and are derived from matching pairwise scans or from odometry. Then we formulate a procedure based on the maximum likelihood criterion to optimally combine all the spatial relations. Consistency is achieved by using all the spatial relations as constraints to solve for the data frame poses simultaneously. Experiments with both simulated and real data will be presented.

1,452 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...Lu and Milios (1997b), Thrun et al. (1998b) and Gutmann and Konolige (1999) have investigated simultaneous map building and localization in indoor environments by establishing cross-consistency over multiple 2D laser scans, without the use of a global map....


  • ...In principle, it is possible to extend the consistent pose estimation idea of Lu and Milios (1997b) by the additional constraint that the resulting global pose must be within the range of Sk ....


Frequently Asked Questions (17)
Q1. What are the contributions mentioned in the paper "An automated method for large-scale, ground-based city model acquisition" ?

In this paper, the authors describe an automated method for fast, ground-based acquisition of large-scale 3D city models. 

If desired, it is straightforward to extend their approach to rural areas by including GPS information during the MCL perception phase. Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all. For downtown areas which can be separated into a facade and a foreground layer, the authors have proposed processing algorithms for removing cluttered foreground objects and completing occlusion holes in order to obtain visually pleasing facade models (Früh and Zakhor, 2002).

Another advantage of using an aerial photo or a DSM over GPS is that the airborne data can potentially be used to derive 3D models of a city from a bird's eye view, which can then be merged with the 3D facade models obtained from ground-level laser scans.

Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry. 

Taking one scan as the reference scan, the authors maximize Q = f(Δu, Δv, Δϕ), which computes the quality of alignment as a function of a given displacement (Δu, Δv) and rotation Δϕ of the scans against each other.

The camera and laser scanners are synchronized by trigger signals and are mounted in a rigid configuration on the sensor platform.

To reconstruct the driven 2D path, the authors start with an initial position (x_0, y_0, θ_0), perform a scan match for each step k, and concatenate the steps (Δu_k, Δv_k, Δϕ_k) to form a path.

Since both photos and DSMs can provide a geometrically correct view of an entire city, it is conceivable to use them as a global map in order to arrive at a global position without the use of GPS devices.

The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments. 

Since horizontal scans are taken continuously during driving and hence overlap substantially, the relative pose between the two capture positions can be determined by matching their corresponding laser scans, as shown in Fig.

With their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.

Especially for the edge map from the aerial photo, it is reasonable to use the registered digital roadmap in order to restrict positions of the particles to within a few-meter-wide strip around roads.

In order to suppress erroneous point-to-line correspondences, the authors use robust least squares (Triggs et al., 2000) and compute Q as follows:

Q(Δu, Δv, Δϕ) = Σ_j exp( −d_min(p′_j(Δu, Δv, Δϕ))² / (2 · σ_s²) )    (3)

where σ_s² is the variance of the laser distance measurement, specified by the manufacturer.
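A minimal sketch of such a robust score, assuming the reference scan has already been converted into 2D line segments; the helper names, the default σ_s value, and the data layout are illustrative, not taken from the paper:

```python
import math

def point_segment_dist(p, a, b):
    """Minimal distance from point p to the line segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def quality(points, segments, du, dv, dphi, sigma_s=0.03):
    """Robust alignment score Q(du, dv, dphi) in the spirit of Eq. (3):
    each transformed scan point contributes exp(-d_min^2 / (2*sigma_s^2)),
    so gross outliers are softly ignored rather than dominating the fit."""
    c, s = math.cos(dphi), math.sin(dphi)
    q = 0.0
    for x, y in points:
        # p'_j: the scan point under the candidate displacement and rotation.
        xp, yp = c * x - s * y + du, s * x + c * y + dv
        d = min(point_segment_dist((xp, yp), a, b) for a, b in segments)
        q += math.exp(-d * d / (2.0 * sigma_s ** 2))
    return q
```

Because each point's contribution saturates near zero for large distances, a few erroneous point-to-line correspondences barely change Q, unlike in plain least squares where their squared distances would dominate.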

As such, the importance factor of each particle is used in the resampling phase to compute the set S_{k+1} from the set S_k in the following way: a given particle in set S_k is passed along to set S_{k+1} with probability proportional to its importance factor.
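This resampling step can be sketched as follows; the function name and the use of Python's random.choices are illustrative assumptions, since the paper does not prescribe a particular implementation:

```python
import random

def resample(particles, weights, rng=random):
    """Importance resampling step of MCL: form S_{k+1} by drawing
    len(particles) samples from S_k, each particle chosen with
    probability proportional to its importance factor
    (duplicates are expected and intended)."""
    return rng.choices(particles, weights=weights, k=len(particles))

# A particle with zero importance factor never survives, while
# high-weight particles are duplicated, concentrating the set
# around globally consistent poses.
survivors = resample(["pose_a", "pose_b"], [1.0, 0.0])
```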

Because the authors have used the same global map for correcting all parts of the path, it is not necessary to explicitly impose the constraint of cross-consistency across all scan points; this justifies their computationally simple approach of correcting the relative path estimates with the particle sets S_k.

The authors address the first issue by linear interpolation between scan points, and the other two by utilizing robust least squares as an outlier-tolerant way to find the pose with the smallest possible discrepancy.

For the same scan, different global position parameters (x, y, θ) yield different coefficients c(x, y, θ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.
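One plausible realization of such a scoring scheme can be sketched as follows; the sparse-grid map representation, the cell size, and the function name are assumptions for illustration only, not details from the paper:

```python
import math

def edge_map_score(scan_points, edge_map, x, y, theta, cell=0.3):
    """Hypothetical illustration: transform scan points by the candidate
    global pose (x, y, theta) and sum the edge strengths of the map
    cells they land in; the best pose maximizes this coefficient.
    `edge_map` maps (row, col) cells to edge strength, `cell` is the
    assumed cell size in meters."""
    c, s = math.cos(theta), math.sin(theta)
    score = 0.0
    for px, py in scan_points:
        gx = c * px - s * py + x
        gy = s * px + c * py + y
        score += edge_map.get((int(gy // cell), int(gx // cell)), 0.0)
    return score

# A sparse edge map with a single strong cell near the origin:
emap = {(0, 0): 1.0}
best = edge_map_score([(0.1, 0.1)], emap, 0.0, 0.0, 0.0)  # hits the edge cell
```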