Journal ArticleDOI

An Automated Method for Large-Scale, Ground-Based City Model Acquisition

01 Oct 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 1, pp 5-24
TL;DR: An automated method for fast, ground-based acquisition of large-scale 3D city models by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based horizontal laser scans are matched.
Abstract: In this paper, we describe an automated method for fast, ground-based acquisition of large-scale 3D city models. Our experimental set up consists of a truck equipped with one camera and two fast, inexpensive 2D laser scanners, being driven on city streets under normal traffic conditions. One scanner is mounted vertically to capture building facades, and the other one is mounted horizontally. Successive horizontal scans are matched with each other in order to determine an estimate of the vehicle's motion, and relative motion estimates are concatenated to form an initial path. Assuming that features such as buildings are visible from both ground-based and airborne view, this initial path is globally corrected by Monte-Carlo Localization techniques. Specifically, the final global pose is obtained by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based horizontal laser scans are matched. A fairly accurate, textured 3D model of the downtown Berkeley area has been acquired in a matter of minutes, limited only by traffic conditions during the data acquisition phase. Subsequent automated processing time to accurately localize the acquisition vehicle is 235 minutes for a 37-minute, 10.2 km drive, i.e. 23 minutes per kilometer.

Summary (4 min read)

1. Introduction

  • Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry.
  • For airborne remote-sensing approaches, the resulting resolution of the models is only in the meter range, and without manual intervention, the resulting accuracy is often poor.
  • More specifically, it is necessary to determine the pose of successive laser scans and camera images in a global coordinate system with centimeter and sub-degree accuracy in order to reconstruct a consistent model.
  • What makes their problem different from indoor localization is the scale of the environment, because distances involved in making 3D models for cities are large compared to the range of the laser scans.

2. System Overview

  • The data acquisition system is mounted on a truck and consists of two parts: a sensor module and a processing unit (Früh and Zakhor, 2001a).
  • The processing unit consists of a dual processor PC, large hard disk drives, and additional electronics for power supply and signal shaping; the sensor module consists of two SICK 2D laser scanners, a digital camera, and a heading sensor.
  • Both 2D scanners face the same side of the street.
  • Figure 1 shows the experimental setup for their data acquisition.
  • Figure 2 shows a picture of the truck with rack and equipment.

3. Relative Position Estimation and Path Computation

  • The authors compute relative pose estimates and an initial path by matching successive horizontal laser scans.
  • First, the authors use both lines and single points of the reference scan for matching; second, they do not treat the problem of eliminating erroneous correspondences separately from the matching; rather, they consider it directly in the computation of a match quality function by using robust least squares, as will be seen later.
  • Thus, only the ‘good’ scan points contribute to this quality function, and the authors do not have to eliminate outliers prior to the matching.
  • Even small errors in the relative estimates, especially inaccurate angles, accumulate to significant global pose errors over long driving periods.

4. Global Maps from Aerial Images or DSM

  • The authors derive a global edge map either from an aerial photo or from a DSM, and define a congruence coefficient as a measure of the match between ground-based scans and the global edge map.
  • The basic idea behind their position correction is that objects seen from the road-based data acquisition must in principle also be visible in the aerial photos or the DSM.
  • Making the assumption that the position of building facades and building footprints are identical or at least sufficiently similar, one can expect that the shapes of the horizontal laser scans match edges in the aerial image or the DSM.
  • Essentially, the authors use the airborne edge maps as an occupancy grid for global localization, as it has been done for mobile robots with floor occupancy grids in indoor environments (Konolige and Chou, 1999; Thrun, 2001).

4.1. Edge Maps from Aerial Photos

  • While perspective-corrected photos with a 1-meter resolution are readily available from USGS, the authors chose to use a higher-contrast aerial photograph obtained by Vexcel Corporation, CO, with 1-foot resolution.
  • (2) The photos and the scans were not taken at the same time, so the content can potentially be different.
  • In particular dynamic objects such as cars or buses can cause mismatches.
  • (3) Visible in the image are not only the building edges, but also many non-3D edges such as road stripes or crosswalk borders.
  • Especially problematic are shadows, because they result in very strong edges.

4.2. Edge Map from DSM

  • In their case, a DSM with a one-meter resolution was created using airborne laser scans acquired by Airborne 1 Corp., CA.
  • Using a DSM as a source of a global edge map has several advantages over aerial images.
  • Furthermore, the intensity of an edge is not dependent on the altitude of the building; what matters is whether a discontinuity exceeds the threshold z_edge.
  • Nevertheless, it is not advisable to directly use the z value at a DSM location, since the airborne laser captures overhanging trees and cars on the road at the time of the data acquisition, resulting in z-values of up to several meters above the actual street level for some locations.
  • Thus, the authors need to create a smooth, dense Digital Terrain Map (DTM) that contains only the altitude of the street level, and they do so in the following manner: Starting with a blank DTM, they first copy all available ground pixels into the DTM.
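The summary only sketches this step; a minimal numpy sketch of the idea is below. The ground classification is taken as given, and the hole-filling rule (iteratively averaging already-known 4-neighbors) is an assumption for illustration, since the paper's exact procedure is not quoted here.

```python
import numpy as np

def make_dtm(dsm, ground_mask, max_iters=100):
    """Build a DTM from DSM pixels classified as ground; remaining
    holes are filled by repeatedly averaging already-known 4-neighbors.
    The fill rule is an assumption; the paper only states that ground
    pixels are copied into a blank DTM first."""
    dtm = np.where(ground_mask, dsm, np.nan)
    for _ in range(max_iters):
        holes = np.isnan(dtm)
        if not holes.any():
            break
        padded = np.pad(dtm, 1, constant_values=np.nan)
        # up, down, left, right neighbors of every cell
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        known = ~np.isnan(neigh)
        cnt = known.sum(axis=0)
        total = np.where(known, neigh, 0.0).sum(axis=0)
        mean = np.divide(total, cnt, out=np.full(dtm.shape, np.nan),
                         where=cnt > 0)
        fill = holes & (cnt > 0)
        dtm[fill] = mean[fill]
    return dtm
```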

4.3. Congruence Coefficient

  • Note that in the edge map created from aerial images, I (x, y) is obtained with a Sobel edge detector and thus proportional to the intensity discontinuity in the image.
  • While this is in accordance with the observation that depth discontinuities often result in sharp intensity discontinuities, it is important that no thresholding is applied to the edge image, so as to also utilize depth discontinuities that are less visible in the images.
  • One can regard the ensemble of laser scan points in the local coordinate system as a second edge image; from this point of view, Eq. (7) essentially computes the correlation between the two edge images as a function of translation and rotation.
  • For the same scan, different global position parameters (x, y, θ ) yield different coefficients c(x, y, θ ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.
  • Figure 10 shows an example for a congruence coefficient for two different pose parameters.
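The correlation view of the congruence coefficient can be sketched as follows: place the scan points at a candidate global pose and sum the edge-map intensities under them. The nearest-pixel lookup and the x→column / y→row indexing are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def congruence(edge_map, scan_pts, x, y, theta):
    """Score a candidate global pose (x, y, theta) by summing edge-map
    intensities at the transformed scan points; a larger value means a
    better match between scan and edge map."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    world = scan_pts @ R.T + np.array([x, y])   # local -> global frame
    cols = np.round(world[:, 0]).astype(int)    # x -> column (assumed)
    rows = np.round(world[:, 1]).astype(int)    # y -> row (assumed)
    h, w = edge_map.shape
    ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    return float(edge_map[rows[ok], cols[ok]].sum())
```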

5. Global Correction Based on Monte-Carlo-Localization

  • The authors propose MCL as a robust way to obtain globally correct pose estimates for the acquisition vehicle by using the edge map and the horizontal laser scans.
  • A motion phase and a perception phase are performed iteratively, both modeled as a stochastic process.
  • These particles are propagated over time using a combination of sequential importance sampling and resampling steps, in short referred to as sampling-importance-resampling.
  • Since orientation angle errors have already been corrected, this new path has considerably more accurate x and y values than the initial scan matching path.
  • If the authors use a DSM as a global reference, one can extend the path computation to the 6 DOF necessary in hilly areas: utilizing the additional altitude information the DTM provides, altitude and pitch can be estimated in a simple manner.
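The iterated motion and perception phases can be sketched as a single sampling-importance-resampling step over (x, y, θ) particles. The Gaussian noise model and its sigmas are illustrative assumptions, not the paper's calibrated values; the weight function stands in for the scan/edge-map congruence.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcl_step(particles, step, weight_fn, motion_sigma=(0.05, 0.05, 0.01)):
    """One SIR iteration: apply the relative step (du, dv, dphi) in each
    particle's own frame plus noise (motion phase), then weight each
    pose and resample in proportion (perception phase)."""
    du, dv, dphi = step
    th = particles[:, 2]
    moved = particles + np.column_stack([
        du * np.cos(th) - dv * np.sin(th),
        du * np.sin(th) + dv * np.cos(th),
        np.full_like(th, dphi),
    ]) + rng.normal(0.0, motion_sigma, particles.shape)
    w = np.maximum([weight_fn(p) for p in moved], 1e-12)
    idx = rng.choice(len(moved), size=len(moved), p=w / w.sum())
    return moved[idx]
```

Particles far from map edges receive near-zero weight and die out in resampling, so the cloud concentrates on globally consistent poses.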

6. 3D Model Generation

  • Once the pose of the vehicle and thus the laser scanners is known, the generation of a 3D point cloud is straightforward.
  • The authors calculate the 3D coordinates of the vertical scan points by applying a coordinate transformation from the local to the world coordinate system.
  • The structure of the resulting point cloud is given by scan number and angle, and therefore each vertex has defined neighbors, thus facilitating further processing significantly.
  • The authors calibrate the camera before their measurements and determine the transformation between its coordinate system and the laser coordinate system.
  • Since these facade models have been brought into perfect registration with either aerial photo or DSM, they can eventually be merged with models derived from this same airborne data.

7. Results

  • The ground-based data was acquired during a 37-minute drive in Berkeley, California, for which the speed was only limited by the normal traffic conditions during business hours and the speed limit of 25 mph imposed by the city of Berkeley.
  • The authors have applied scan matching and initial path computation to the entire driven path.
  • Since superimposing a digital roadmap revealed that the photo can only be considered a metric map in the rather flat part within the dashed rectangle, the authors can only correct the 6.7 km long path segment in that area.
  • The authors have applied the MCL correction with different numbers of particles, and found that in areas with clear building structures, it is possible to track the path with 5,000 to 10,000 particles.
  • Furthermore, as seen in Fig. 18 in a close-up view, scan points for the same area align with each other even if they are taken during two different passes.

8. Conclusions

  • The authors have proposed a method for acquiring ground-based 3D building facade models, which uses an acquisition vehicle equipped with two 2D laser scanners.
  • The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments.
  • Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.
  • Finally, the reconstructed raw models are not visually perfect.
  • Foreground objects appear cluttered and visually unappealing since only their front side is captured, and facades contain large holes due to occlusions or reflecting glass surfaces.

Figures (24)


International Journal of Computer Vision 60(1), 5–24, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
An Automated Method for Large-Scale, Ground-Based City
Model Acquisition
CHRISTIAN FRÜH AND AVIDEH ZAKHOR
Video and Image Processing Laboratory, University of California, Berkeley
frueh@eecs.berkeley.edu
avz@eecs.berkeley.edu
Received December 27, 2002; Revised December 11, 2003; Accepted December 11, 2003
Abstract. In this paper, we describe an automated method for fast, ground-based acquisition of large-scale 3D city
models. Our experimental set up consists of a truck equipped with one camera and two fast, inexpensive 2D laser
scanners, being driven on city streets under normal traffic conditions. One scanner is mounted vertically to capture
building facades, and the other one is mounted horizontally. Successive horizontal scans are matched with each
other in order to determine an estimate of the vehicle’s motion, and relative motion estimates are concatenated to
form an initial path. Assuming that features such as buildings are visible from both ground-based and airborne view,
this initial path is globally corrected by Monte-Carlo Localization techniques. Specifically, the final global pose is
obtained by utilizing an aerial photograph or a Digital Surface Model as a global map, to which the ground-based
horizontal laser scans are matched. A fairly accurate, textured 3D model of the downtown Berkeley area has been acquired
in a matter of minutes, limited only by traffic conditions during the data acquisition phase. Subsequent automated
processing time to accurately localize the acquisition vehicle is 235 minutes for a 37-minute, 10.2 km drive, i.e.
23 minutes per kilometer.
Keywords: laser scanning, navigation, self-localization, mobile robots, 3D modeling, Monte-Carlo localization
1. Introduction
Three-dimensional models of urban environments are
used in applications such as urban planning, virtual re-
ality, and propagation simulation of radio waves for
the cell phone industry. Currently, acquisition of 3D
city models is difficult and time consuming. Existing
large-scale models typically take months to create, and
usually require significant manual intervention (Chan
et al., 1998). This process is not only prohibitively ex-
pensive, but also is unsuitable in applications where
a 3D snapshot of a city is needed within a short time,
e.g. for disaster management or for monitoring changes
over time.
There exist a variety of approaches to creating 3D
models of cities from an airborne view via remote sens-
ing. Manual or automated matching of stereo images
can be used to obtain a Digital Surface Model (DSM),
i.e. a grid-like representation of the elevation level, and
3D models (Frere et al., 1998; Huertas et al., 1999).
In recent years, advances in resolution and accuracy
of Synthetic Aperture Radar (SAR) and airborne laser
scanners have also rendered them suitable for the gen-
eration of DSMs and 3D models (Brenner et al., 2001;
Maas, 2001). Although these methods can be reason-
ably fast, the resulting resolution of the models is only
in the meter range, and without manual intervention,
the resulting accuracy is often poor. Specifically, they
lack the level of detail that is required for realistic vir-
tual walk-throughs or drive-throughs.
While acquiring detailed building models from a
ground-level view has been addressed in previous
work, these attempts have been limited to one or a few
buildings. Debevec et al. (1996) proposes to reconstruct

buildings based on few camera images in a semi-
automated way. Similarly, it is conceivable to apply
other vision-based approaches mainly designed for in-
door scenes to outdoor environments, such as structure-
from-motion methods (Koch et al., 1999) or variations
(Dellaert et al., 2000; Szeliski and Kang, 1995), or
voxel-based approaches (Seitz and Dyer, 1997), but
varying lighting conditions, the scale of the environ-
ment, and the complexity of outdoor scenes with many
trees and glass surfaces pose enormous challenges to
purely vision-based methods.
Stamos and Allen (2000) use a 3D laser scanner and
Thrun et al. (2000) and Hähnel et al. (2001) use 2D
laser scanners mounted on a mobile robot to achieve
complete automation, but the time required for data ac-
quisition of an entire city is prohibitively large; in addi-
tion, the reliability of autonomous mobile robots in out-
door environments is a critical issue. Antone and Teller
(2000) propose an approach based on high-resolution
half-spherical images, but data has to be acquired in
a stop-and-go fashion. While pose computation is so-
phisticated in this approach, the purely image-based
model reconstruction is difficult and the obtained mod-
els are rather simple. Kawasaki et al. (1999) suggest a
method, which uses a video stream for texturing an al-
ready existing 3D model. Zhao and Shibasaki (1999)
use a vertical laser scanner in a car–based approach.
While this enables the traversal of large-scale city envi-
ronments in reasonable time, their localization is based
on the Global Positioning System (GPS). GPS is by
far the most common source of global position esti-
mates in outdoor environments; however, it has sev-
eral drawbacks: First, GPS tends to fail in dense ur-
ban environments, particularly in urban canyons where
few satellites are visible. Second, multi-path reflec-
tions can result in completely erroneous readouts, or
can decrease accuracy substantially. Third, a differ-
ential GPS system that could fulfill the hard accu-
racy requirements of ground-based modeling is quite
expensive.
On the other hand, digital roadmaps and perspective-
corrected aerial photos are widely available; similarly,
for an increasing number of urban areas, DSMs can
be found. Since both photos and DSM can provide
a geometrically correct view of an entire city, it is
conceivable to use them as a global map in order
to arrive at global position without use of GPS de-
vices. Another advantage of using an aerial photo or
a DSM over GPS is that the airborne data can poten-
tially be used to derive 3D models of a city from a
bird’s eye view, which can then be merged with the
3D facade models obtained from ground level laser
scans.
In this paper, we propose “drive-by scanning” as a
method that is capable of rapidly acquiring 3D geome-
try and texture data of an entire city at the ground level
by using a configuration of two fast, inexpensive 2D
laser scanners and a digital camera. This data acquisi-
tion system is mounted on a truck moving at normal
speed on public roads, collecting data to be processed
offline. This approach has the advantage that data can
be acquired continuously, rather than in a stop-and-
go fashion, and is therefore much faster than existing
methods based on 3D scanners. While 3D scans overlap
and can hence be accurately registered with each other
if an initial pose is known approximately, this is not
possible in our approach, since vertical 2D scans are
parallel and do not overlap, hence posing more strin-
gent accuracy requirements for the localization of the
vehicle and its acquisition devices. More specifically,
it is necessary to determine the pose of successive laser
scans and camera images in a global coordinate sys-
tem with centimeter and sub-degree accuracy in order
to reconstruct a consistent model.
In our approach, rather than GPS, we use an aerial
image or an airborne DSM to precisely reconstruct
the path of the acquisition vehicle in offline compu-
tations: First, relative position changes are computed
with centimeter accuracy by matching successive hor-
izontal laser scans against each other, and are concate-
nated to reconstruct an initial path estimate. Since small
errors and occasional mismatches can accumulate to a
significantly large level after longer driving periods,
Monte-Carlo-Localization (MCL) is then used in con-
junction with an airborne map to correct global pose.
Our approach is to match features observed in both the
laser scans and the aerial image or the DSM, whereby
the airborne data can be regarded as a global map onto
which the ground-based scan points have to be regis-
tered.
This problem is in many ways similar to the localiza-
tion of mobile robots, and as such, several approaches
have been developed in this context: Cox (1991) pro-
posed to match scan points from a laser scanner with
the lines of a manually created a-priori-map of an
indoor environment. Lu and Milios (1994) proposed
the iterative dual correspondence or IDC algorithm,
which is based on the matching-range-point rule. Gut-
mann and Schlegel (1996) focused on matching two
scans, and proposed the use of a line filter for both

reference and second scans. There are several prob-
abilistic attempts to localize a robot in a given edge
map of the environment: Jensfelt and Kristensen (1999)
and Roumeliotis and Bekey (2000) propose multi-
hypotheses Kalman filters, which represent the pose
belief as mixtures of Gaussians. Markov Localization,
which assumes static environments, has been applied
in Russell and Norvig (1995), Simmons and Koenig
(1995) and Fox et al. (1999). In grid-based Markov lo-
calization, the parameter space is partitioned into grid
cells, each representing the probability in a parameter
“cube” by a floating point value, e.g. in Burgard (1996)
and Fox et al. (1999). In MCL, also known as particle
filtering or as condensation algorithm, a large number
of random samples, or particles, is utilized to represent
probability distributions (Fox et al., 2000; Thrun et al.,
2001).
Lu and Milios (1997b), Thrun et al. (1998b) and Gut-
mann and Konolige (1999) have investigated simulta-
neous map building and localization in indoor environ-
ments by establishing cross-consistency over multiple
2D laser scans, without the use of a global map. How-
ever, these methods are not applicable to outdoor scale,
since their complexity usually increases as O(n²) with
n denoting the number of scans. Additionally, cities are
extremely cyclic environments, with often no cross-
correspondences to other scans during long driving pe-
riods. What makes our problem different from indoor
localization is the scale of the environment, because
distances involved in making 3D models for cities are
large compared to the range of the laser scans. Fur-
thermore, our approach is to obtain relative motion not
from odometry, but from scan-to-scan matching, and
the global map derived from a photo or a DSM does
not have the same quality as a CAD ground plan of an
indoor environment. Finally, while indoor localization
is usually a 2D problem, in this paper, we recover a
6-degree-of-freedom (DOF) pose in case of a DSM as
a global map.
The outline of this paper is as follows: Section 2
describes the system overview and Section 3 the data
acquisition system. Section 4 is devoted to the relative
pose estimation and initial path computation based on
laser scan matching. In Section 5, we address global
map generation from an aerial image or a DSM, and in
Section 6, we discuss pose correction based on regis-
tration of laser scans with the map by means of MCL.
Section 7 briefly outlines the model generation, and we
finally show the results for a 10-kilometer drive in an
actual city environment in Section 8.
Figure 1. Experimental setup.
2. System Overview
The data acquisition system is mounted on a truck and
consists of two parts: a sensor module and a process-
ing unit (Früh and Zakhor, 2001a). The processing unit
consists of a dual processor PC, large hard disk drives,
and additional electronics for power supply and signal
shaping; the sensor module consists of two SICK 2D
laser scanners, a digital camera, and a heading sensor.
It is mounted on a rack at a height of approximately 3.6
meters, in order to avoid moving obstacles such as cars
and pedestrians in the direct view. The scanners have
a 180° field of view with a resolution of 1°, a range of
80 meters and an accuracy of ±3.5 centimeters. Both
2D scanners face the same side of the street. One is
mounted vertically with the scanning plane orthogonal
to the driving direction, and the other is mounted hori-
zontally with the scanning plane parallel to the ground.
Figure 1 shows the experimental setup for our data ac-
quisition.
The vertical scanner detects the shape of the build-
ing facades as we drive by, and therefore we refer to
our method as drive-by scanning; the horizontal scan-
ner operates in a plane parallel to the ground and is
used for pose estimation as described in this paper. The
camera’s line of sight is the intersection between the
orthogonal scanning planes. All sensors are synchro-
nized with each other and acquire data at prespecified
times. Figure 2 shows a picture of the truck with rack
and equipment.
3. Relative Position Estimation
and Path Computation
In this section, we compute relative pose estimates and
an initial path by matching successive horizontal laser
scans. A popular approach for matching two 3D scans

Figure 2. Truck with acquisition equipment.
is the Iterative Closest Point (ICP) algorithm (Besl and
McKay, 1992), and its 2D equivalent Iterative Dual
Correspondence (IDC) (Lu and Milios, 1994). Both al-
gorithms work directly on the scan points rather than on
extracted features, and iteratively refine pose and point-
to-point correspondences in order to converge to a final
pose estimate. Our scan matching approach is similar
to the line-based ones in Cox (1991) and Gutmann and
Schlegel (1996), except for two modifications: First,
we use both lines and single points of the reference
scan for matching; second, we do not treat the problem
of eliminating erroneous correspondences separately
from the matching; rather, we consider it directly in
the computation of a match quality function by using
robust least squares, as will be seen later.
We first introduce a Cartesian world coordinate
system [x, y, z] where x, y defines the geographical
Figure 3. Two scans before matching (left) and after matching (right).
ground plane and z the altitude as shown in Fig. 1. We
also define a truck coordinate system [u,v] which is
implied by the horizontal laser scanner. In a flat envi-
ronment, a 2D pose of the truck and its local coordinate
system can be entirely described by the two coordinates
x, y and the yaw orientation angle θ; in this case, the
u-v coordinate system is assumed to be parallel to the
x-y plane as shown in Fig. 1.
Since horizontal scans are taken continuously dur-
ing driving and hence overlap substantially, the relative
pose between the two capture positions can be deter-
mined by matching their corresponding laser scans,
as shown in Fig. 3. We assume that the two scans
have maximum congruence for the particular translation Δt = (Δu, Δv) and rotation Δϕ, which exactly
compensate for the motion between the two acquisition
times. In practice however, scans taken from different
positions do not match perfectly because of different
sample density on objects, occlusions, and measure-
ment noise. We will address the first issue by linear in-
terpolation between scan points, and the two others by
utilizing robust least squares as an outlier-tolerant way
to find the pose with the smallest possible discrepancy.
Taking one scan as reference scan, we maximize
Q = f(Δu, Δv, Δϕ), which computes the quality of alignment as a function of a given displacement Δu, Δv and rotation Δϕ of the scans against each
other. More specifically, we perform the following
steps: First, we compute a set of lines l_i from the reference scan by connecting adjacent scan points provided
their discontinuity is below a threshold; this results in a
line strip approximation that may also contain isolated
points as infinitely short lines. In this fashion, the ref-
erence scan is transformed into an edge map to which
the second scan can be registered.
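The line-strip construction just described can be sketched as follows; the discontinuity threshold value is an assumption, and isolated points are kept as zero-length segments as the text specifies.

```python
import numpy as np

def build_line_strips(scan_pts, max_gap=0.5):
    """Connect adjacent scan points into segments when their separation
    is below max_gap (threshold value assumed); points linked to neither
    neighbor are kept as 'infinitely short' zero-length segments."""
    n = len(scan_pts)
    segments = [(scan_pts[i], scan_pts[i + 1]) for i in range(n - 1)
                if np.linalg.norm(scan_pts[i + 1] - scan_pts[i]) < max_gap]
    for i in range(n):
        left = i > 0 and np.linalg.norm(scan_pts[i] - scan_pts[i - 1]) < max_gap
        right = i < n - 1 and np.linalg.norm(scan_pts[i + 1] - scan_pts[i]) < max_gap
        if not (left or right):
            segments.append((scan_pts[i], scan_pts[i]))  # isolated point
    return segments
```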
Given a translation vector Δt = (Δu, Δv) and a 2×2 rotation matrix R(Δϕ) with rotation angle Δϕ, we transform the points p_j of the second scan to the points p′_j according to

p′_j(Δu, Δv, Δϕ) = R(Δϕ) · p_j + Δt    (1)
Then, for each point p′_j, we compute the Euclidean distance d(p′_j, l_i) to each line segment l_i and set d_min to:

d_min(p′_j(Δu, Δv, Δϕ)) = min_i {d(p′_j, l_i)}.    (2)

Intuitively, d_min is the distance between p′_j and the closest point on any of the lines in the reference scan.
Distance measurement noise of the scanners can be
approximately modeled as Gaussian, but there are ad-
ditionally extreme errors due to occlusion effects or
multi-reflections, which prohibit using the sum of dis-
tance squares as an error function. Thus, in order to
suppress erroneous point-to-line correspondences, we
use robust least squares (Triggs et al., 2000) and com-
pute Q as follows:
Q(Δu, Δv, Δϕ) = Σ_j exp( −d_min(p′_j(Δu, Δv, Δϕ))² / (2 · σ_s²) )    (3)

where σ_s² is the variance of the laser distance measurement, specified by the manufacturer. This equation
takes into account the distribution of the distance mea-
surement values, while suppressing deviations beyond
this distribution as outliers. It has least squares-like be-
havior in the near range, but does not take into account
points that are far away from any line. Thus, only the
‘good’ scan points contribute to this quality function,
and we do not have to eliminate outliers prior to the
matching. The block diagram of this quality computa-
tion is shown in Fig. 4.
Figure 4. Block diagram of quality computation.
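The quality computation of Eqs. (1)–(3) can be sketched directly; the point-to-segment distance helper and the σ_s value used here are illustrative assumptions.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment a-b (zero-length ok)."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def quality(segments, scan_pts, du, dv, dphi, sigma=0.035):
    """Eq. (3): transform the second scan by (du, dv, dphi), then sum a
    Gaussian kernel of each point's distance to the nearest reference
    segment; far-away points contribute ~0 instead of dominating, so
    outliers need not be removed beforehand (sigma is assumed here)."""
    c, s = np.cos(dphi), np.sin(dphi)
    R = np.array([[c, -s], [s, c]])
    t = np.array([du, dv])
    q = 0.0
    for p in scan_pts:
        pp = R @ p + t                                                 # Eq. (1)
        dmin = min(point_segment_dist(pp, a, b) for a, b in segments)  # Eq. (2)
        q += np.exp(-dmin ** 2 / (2.0 * sigma ** 2))                   # robust kernel
    return q
```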
The parameters (Δu, Δv, Δϕ) for the best match between a scan pair are found by optimizing Q. Steepest descent search methods have the advantage of finding the minimum fast, but due to noise and erroneous point-to-line assignments, they can become trapped in local minima if not started from a “good” initial point. Therefore, we use a combined method of sampling the parameter space and discrete steepest descent, where we first sample the parameter space in coarse steps and then refine the search around the minimum by steepest descent. Hence, we obtain a relative pose estimate (Δu, Δv, Δϕ) between two scans, to which we refer in the following as a “step”. For a series of successive scans indexed by k, we can compute a series of steps (Δu_k, Δv_k, Δϕ_k), denoting the relative pose between scan k and scan k + 1.
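The combined coarse-sampling and local refinement strategy can be sketched as hill-climbing on Q (equivalently, descent on −Q); the grid density and step size here are assumed values, not the paper's.

```python
import numpy as np

def coarse_to_fine(q_fn, ranges, coarse_steps=9, fine_step=0.01, iters=200):
    """Maximize q_fn(du, dv, dphi): sample a coarse grid over the given
    (lo, hi) ranges, then climb from the best sample in fine_step
    increments until no axis-aligned move improves the score."""
    axes = [np.linspace(lo, hi, coarse_steps) for lo, hi in ranges]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    best = max(grid, key=lambda v: q_fn(*v))
    moves = fine_step * np.vstack([np.eye(3), -np.eye(3)])  # +/- each axis
    for _ in range(iters):
        nxt = max((best + d for d in moves), key=lambda v: q_fn(*v))
        if q_fn(*nxt) <= q_fn(*best):
            break  # local optimum of the sampled neighborhood
        best = nxt
    return best
```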
To reconstruct the driven 2D path, we start with an initial position (x_0, y_0, θ_0), perform a scan match for each step k, and concatenate the steps (Δu_k, Δv_k, Δϕ_k) to form a path. For non-flat areas, this 2D computation
can result in an apparent source of error in length: The
scan-to-scan matching estimates for the 3-DOF relative
motion, i.e. 2D translation and rotation, are given in
the scanner’s local scanning plane. If the vehicle is
on a slope, this local coordinate system is tilted at an
angle towards the global (x, y) plane, and hence the
translation should strictly speaking be corrected with
the cosine of the pitch angle. Fortunately, the stretching
effect is small, and the relative length error is given by:
l_err / l = 1 − cos(pitch) ≈ 1 − (1 − pitch²/2) = pitch²/2    (4)
While for an impressive 10% slope, the relative error
is 0.5%, for a moderate 2% slope, it only amounts to
0.06%. Thus, it turns out that this error is easily within
the correction capability of our global localization in-
troduced in the next section. Hence, we utilize the rel-
ative estimates from the scan-to-scan matching as if
they were given parallel to the ground plane, and con-
catenate the steps (Δu_k, Δv_k, Δϕ_k), so that the next global pose (x_{k+1}, y_{k+1}, θ_{k+1}) of the path is computed iteratively as follows:
x_{k+1} = x_k + Δu_k · cos(θ_k) − Δv_k · sin(θ_k)
y_{k+1} = y_k + Δu_k · sin(θ_k) + Δv_k · cos(θ_k)    (5)
θ_{k+1} = θ_k + Δϕ_k
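The iteration of Eq. (5) amounts to rotating each step into the global frame and accumulating; a direct sketch:

```python
import numpy as np

def concatenate_steps(x0, y0, theta0, steps):
    """Integrate relative steps (du_k, dv_k, dphi_k) into global poses
    per Eq. (5); returns the list of (x, y, theta) along the path."""
    path = [(x0, y0, theta0)]
    x, y, th = x0, y0, theta0
    for du, dv, dphi in steps:
        x += du * np.cos(th) - dv * np.sin(th)
        y += du * np.sin(th) + dv * np.cos(th)
        th += dphi
        path.append((x, y, th))
    return path
```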
As the frequency of horizontal scans is 75 Hz, assum-
ing the maximum city driving speed of 25 miles per

Citations
Proceedings ArticleDOI
01 Sep 2009
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo sharing sites. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms, designed to maximize parallelism at each stage in the pipeline and minimize serialization bottlenecks. It is designed to scale gracefully with both the size of the problem and the amount of available computation. We have experimented with a variety of alternative algorithms at each stage of the pipeline and report on which ones work best in a parallel computing environment. Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.

1,454 citations


Cites methods from "An Automated Method for Large-Scale..."

  • ...City-scale 3D reconstruction has been explored previously in the computer vision literature [12, 2, 6 , 21] and is now widely deployed e.g., in Google Earth and Microsoft’s Virtual Earth....


Journal ArticleDOI
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day.

1,307 citations


Additional excerpts

  • ...edu City-scale 3D reconstruction has been explored previously in the computer vision literature [12, 2, 6, 21] and is now widely deployed e....


Journal ArticleDOI
TL;DR: A system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes that extends existing algorithms to meet the robustness and variability necessary to operate out of the lab and shows results on real video sequences comprising hundreds of thousands of frames.
Abstract: The paper presents a system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes. The system collects video streams, as well as GPS and inertia measurements in order to place the reconstructed models in geo-registered coordinates. It is designed using current state of the art real-time modules for all processing steps. It employs commodity graphics hardware and standard CPU's to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab. To account for the large dynamic range of outdoor videos the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The required accuracy for many applications is achieved with a two-step stereo reconstruction process exploiting the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.

846 citations

Journal ArticleDOI
TL;DR: The goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field.
Abstract: This paper provides a comprehensive overview of urban reconstruction. While there exists a considerable body of literature, this topic is still under active research. The work reviewed in this survey stems from the following three research communities: computer graphics, computer vision and photogrammetry and remote sensing. Our goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field. Further, we would like to bring the mentioned research communities to even more interdisciplinary work, since the reconstruction problem itself is by far not solved.

445 citations

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This paper proposes a new representation denoted as multi-level surface maps (MLS maps) which allows to store multiple surfaces in each cell of the grid and is well-suited for representing large-scale outdoor environments.
Abstract: To operate outdoors or on non-flat surfaces, mobile robots need appropriate data structures that provide a compact representation of the environment and at the same time support important tasks such as path planning and localization. One such representation that has been frequently used in the past are elevation maps which store in each cell of a discrete grid the height of the surface in the corresponding area. Whereas elevation maps provide a compact representation, they lack the ability to represent vertical structures or even multiple levels. In this paper, we propose a new representation denoted as multi-level surface maps (MLS maps). Our approach allows to store multiple surfaces in each cell of the grid. This enables a mobile robot to model environments with structures like bridges, underpasses, buildings or mines. Additionally, they allow to represent vertical structures. Throughout this paper we present algorithms for updating these maps based on sensory input, to match maps calculated from two different scans, and to solve the loop-closing problem given such maps. Experiments carried out with a real robot in an outdoor environment demonstrate that our approach is well-suited for representing large-scale outdoor environments.

381 citations


Cites background from "An Automated Method for Large-Scale..."

  • ...[5] C. Früh and A. Zakhor....


  • ...Früh and Zakhor [5] apply a similar idea to the problem of learning large-scale models of outdoor environments....


References
Journal ArticleDOI
Paul J. Besl, H. D. McKay
TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.
Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces. >

17,598 citations

Book ChapterDOI
21 Sep 1999
TL;DR: A survey of the theory and methods of photogrammetric bundle adjustment can be found in this article, with a focus on general robust cost functions rather than restricting attention to traditional nonlinear least squares.
Abstract: This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.

3,521 citations

Proceedings ArticleDOI
01 Aug 1996
TL;DR: This work presents a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs, which combines both geometry-based and image-based techniques, and presents view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models.
Abstract: We present a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs. Our modeling approach, which combines both geometry-based and image-based techniques, has two components. The first component is a photogrammetric modeling method which facilitates the recovery of the basic geometry of the photographed scene. Our photogrammetric modeling approach is effective, convenient, and robust because it exploits the constraints that are characteristic of architectural scenes. The second component is a model-based stereo algorithm, which recovers how the real scene deviates from the basic model. By making use of the model, our stereo technique robustly recovers accurate depth from widely-spaced image pairs. Consequently, our approach can model large architectural environments with far fewer photographs than current image-based modeling approaches. For producing renderings, we present view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models. Our approach can be used to recover models for use in either geometry-based or image-based rendering systems. We present results that demonstrate our approach’s ability to create realistic renderings of architectural scenes from viewpoints far from the original photographs. CR Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding Modeling and recovery of physical attributes; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism Color, shading, shadowing, and texture; I.4.8 [Image Processing]: Scene Analysis Stereo; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).

2,159 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...Debevec et al. (1996) proposes to reconstruct buildings based on few camera images in a semiautomated way....


Journal ArticleDOI
TL;DR: A more robust algorithm is developed called MixtureMCL, which integrates two complementary ways of generating samples in the estimation of Monte Carlo Localization algorithms, and is applied to mobile robots equipped with range finders.

1,945 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...In MCL, also known as particle filtering or as condensation algorithm, a large number of random samples, or particles, is utilized to represent probability distributions (Fox et al., 2000; Thrun et al., 2001)....


Journal ArticleDOI
TL;DR: The problem of consistent registration of multiple frames of measurements (range scans), together with the related issues of representation and manipulation of spatial uncertainties, is studied; the approach is to maintain all the local frames of data as well as the relative spatial relationships between local frames.
Abstract: A robot exploring an unknown environment may need to build a world model from sensor measurements. In order to integrate all the frames of sensor data, it is essential to align the data properly. An incremental approach has been typically used in the past, in which each local frame of data is aligned to a cumulative global model, and then merged to the model. Because different parts of the model are updated independently while there are errors in the registration, such an approach may result in an inconsistent model. In this paper, we study the problem of consistent registration of multiple frames of measurements (range scans), together with the related issues of representation and manipulation of spatial uncertainties. Our approach is to maintain all the local frames of data as well as the relative spatial relationships between local frames. These spatial relationships are modeled as random variables and are derived from matching pairwise scans or from odometry. Then we formulate a procedure based on the maximum likelihood criterion to optimally combine all the spatial relations. Consistency is achieved by using all the spatial relations as constraints to solve for the data frame poses simultaneously. Experiments with both simulated and real data will be presented.

1,452 citations


"An Automated Method for Large-Scale..." refers background in this paper

  • ...Lu and Milios (1997b), Thrun et al. (1998b) and Gutmann and Konolige (1999) have investigated simultaneous map building and localization in indoor environments by establishing cross-consistency over multiple 2D laser scans, without the use of a global map....


  • ...In principle, it is possible to extend the consistent pose estimation idea of Lu and Milios (1997b) by the additional constraint that the resulting global pose must be within the range of Sk ....


Frequently Asked Questions (17)
Q1. What are the contributions mentioned in the paper "An automated method for large-scale, ground-based city model acquisition" ?

In this paper, the authors describe an automated method for fast, ground-based acquisition of large-scale 3D city models. 

If desired, it is straightforward to extend their approach to rural areas by including GPS information during the MCL perception phase. Furthermore, with their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all. For downtown areas which can be separated into a facade and a foreground layer, the authors have proposed processing algorithms for removing cluttered foreground objects and completing occlusion holes in order to obtain visually pleasing facade models (Früh and Zakhor, 2002).

Another advantage of using an aerial photo or a DSM over GPS is that the airborne data can potentially be used to derive 3D models of a city from a bird's eye view, which can then be merged with the 3D facade models obtained from ground-level laser scans.

Three-dimensional models of urban environments are used in applications such as urban planning, virtual reality, and propagation simulation of radio waves for the cell phone industry. 

Taking one scan as the reference scan, the authors maximize Q = f(Δu, Δv, Δϕ), which computes the quality of alignment as a function of a given displacement (Δu, Δv) and rotation Δϕ of the scans against each other.

The camera and laser scanners are synchronized by trigger signals and are mounted in a rigid configuration on the sensor platform.

To reconstruct the driven 2D path, the authors start with an initial position (x_0, y_0, θ_0), perform a scan match for each step k, and concatenate the steps (Δu_k, Δv_k, Δϕ_k) to form a path.

Since both photos and DSMs can provide a geometrically correct view of an entire city, it is conceivable to use them as a global map in order to arrive at a global position without the use of GPS devices.

The authors have demonstrated that scan matching and MCL techniques in conjunction with an aerial photo or a DSM as global map are capable of accurately localizing their acquisition vehicle in complex urban environments. 

Since horizontal scans are taken continuously during driving and hence overlap substantially, the relative pose between the two capture positions can be determined by matching their corresponding laser scans, as shown in Fig.

With their truck-based system, the authors are only capable of driving on roads; hence the facades on the backsides of buildings cannot be captured at all.

Especially for the edge map from the aerial photo, it is reasonable to use the registered digital roadmap in order to restrict positions of the particles to within a few-meter-wide strip around roads.

In order to suppress erroneous point-to-line correspondences, the authors use robust least squares (Triggs et al., 2000) and compute Q as follows:

Q(Δu, Δv, Δϕ) = Σ_j exp( −d_min(p′_j(Δu, Δv, Δϕ))² / (2 · σ_s²) )    (3)

where σ_s² is the variance of the laser distance measurement, specified by the manufacturer.
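A minimal sketch of such a robust score, assuming the reference scan has already been converted into 2D line segments; the helper names, the default σ_s value, and the data layout are illustrative, not taken from the paper:

```python
import math

def point_segment_dist(p, a, b):
    """Minimal distance from point p to the line segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def quality(points, segments, du, dv, dphi, sigma_s=0.03):
    """Robust alignment score Q(du, dv, dphi) in the spirit of Eq. (3):
    each transformed scan point contributes exp(-d_min^2 / (2*sigma_s^2)),
    so gross outliers are softly ignored rather than dominating the fit."""
    c, s = math.cos(dphi), math.sin(dphi)
    q = 0.0
    for x, y in points:
        # p'_j: the scan point under the candidate displacement and rotation.
        xp, yp = c * x - s * y + du, s * x + c * y + dv
        d = min(point_segment_dist((xp, yp), a, b) for a, b in segments)
        q += math.exp(-d * d / (2.0 * sigma_s ** 2))
    return q
```

Because each point's contribution saturates near zero for large distances, a few erroneous point-to-line correspondences barely change Q, unlike in plain least squares where their squared distances would dominate.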

As such, the importance factor of each particle is used in the resampling phase to compute the set S_{k+1} from the set S_k in the following way: a given particle in set S_k is passed along to set S_{k+1} with probability proportional to its importance factor.
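This resampling step can be sketched as follows; the function name and the use of Python's random.choices are illustrative assumptions, since the paper does not prescribe a particular implementation:

```python
import random

def resample(particles, weights, rng=random):
    """Importance resampling step of MCL: form S_{k+1} by drawing
    len(particles) samples from S_k, each particle chosen with
    probability proportional to its importance factor
    (duplicates are expected and intended)."""
    return rng.choices(particles, weights=weights, k=len(particles))

# A particle with zero importance factor never survives, while
# high-weight particles are duplicated, concentrating the set
# around globally consistent poses.
survivors = resample(["pose_a", "pose_b"], [1.0, 0.0])
```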

Because the authors have used the same global map for correcting all parts of the path, it is not necessary to explicitly impose the constraint of cross-consistency across all scan points; this justifies their computationally simple approach of correcting the relative path estimates with the particle sets S_k.

The authors address the first issue by linear interpolation between scan points, and the other two by utilizing robust least squares as an outlier-tolerant way to find the pose with the smallest possible discrepancy.

For the same scan, different global position parameters (x, y, θ) yield different coefficients c(x, y, θ); the largest coefficient is obtained for the parameter set with the best match between scan and edge map.
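One plausible realization of such a scoring scheme can be sketched as follows; the sparse-grid map representation, the cell size, and the function name are assumptions for illustration only, not details from the paper:

```python
import math

def edge_map_score(scan_points, edge_map, x, y, theta, cell=0.3):
    """Hypothetical illustration: transform scan points by the candidate
    global pose (x, y, theta) and sum the edge strengths of the map
    cells they land in; the best pose maximizes this coefficient.
    `edge_map` maps (row, col) cells to edge strength, `cell` is the
    assumed cell size in meters."""
    c, s = math.cos(theta), math.sin(theta)
    score = 0.0
    for px, py in scan_points:
        gx = c * px - s * py + x
        gy = s * px + c * py + y
        score += edge_map.get((int(gy // cell), int(gx // cell)), 0.0)
    return score

# A sparse edge map with a single strong cell near the origin:
emap = {(0, 0): 1.0}
best = edge_map_score([(0.1, 0.1)], emap, 0.0, 0.0, 0.0)  # hits the edge cell
```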