
Kalman Filter-based Algorithms for Estimating Depth from Image Sequences

TL;DR: A new, pixel-based (iconic) algorithm estimates depth and depth uncertainty at each pixel and incrementally refines these estimates over time; it can serve as a useful and general framework for low-level dynamic vision.
Abstract
Using known camera motion to estimate depth from image sequences is an important problem in robot vision. Many applications of depth-from-motion, including navigation and manipulation, require algorithms that can estimate depth in an on-line, incremental fashion. This requires a representation that records the uncertainty in depth estimates and a mechanism that integrates new measurements with existing depth estimates to reduce the uncertainty over time. Kalman filtering provides this mechanism. Previous applications of Kalman filtering to depth-from-motion have been limited to estimating depth at the location of a sparse set of features. In this paper, we introduce a new, pixel-based (iconic) algorithm that estimates depth and depth uncertainty at each pixel and incrementally refines these estimates over time. We describe the algorithm and contrast its formulation and performance to that of a feature-based Kalman filtering algorithm. We compare the performance of the two approaches by analyzing their theoretical convergence rates, by conducting quantitative experiments with images of a flat poster, and by conducting qualitative experiments with images of a realistic outdoor-scene model. The results show that the new method is an effective way to extract depth from lateral camera translations. This approach can be extended to incorporate general motion and to integrate other sources of information, such as stereo. The algorithms we have developed, which combine Kalman filtering with iconic descriptions of depth, therefore can serve as a useful and general framework for low-level dynamic vision.



International Journal of Computer Vision, 3, 209-236 (1989)
© 1989 Kluwer Academic Publishers. Manufactured in The Netherlands.
Kalman Filter-based Algorithms for Estimating Depth from Image Sequences
LARRY MATTHIES AND TAKEO KANADE
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213;
Schlumberger Palo Alto Research, 3340 Hillview Ave., Palo Alto, CA 94304
RICHARD SZELISKI
Digital Equipment Corporation, 1 Kendall Square, Building 700, Cambridge, MA 02139
1 Introduction
Using known camera motion to estimate depth from image sequences is important in many applications of computer vision to robot navigation and manipulation. In these applications, depth-from-motion can be used by itself, as part of a multimodal sensing strategy, or as a way to guide stereo matching. Many applications require a depth estimation algorithm that operates in an on-line, incremental fashion. To develop such an algorithm, we require a depth representation that includes not only the current depth estimate, but also an estimate of the uncertainty in the current depth estimate.
Previous work [3, 5, 9, 10, 16, 17, 25] has identified Kalman filtering as a viable framework for this problem, because it incorporates representations of uncertainty and provides a mechanism for incrementally reducing uncertainty over time. To date, applications of this framework have largely been restricted to estimating the positions of a sparse set of trackable features, such as points or line segments. While this is adequate for many robotics applications, it requires reliable feature extraction and it fails to describe large areas of the image. Another line of work has addressed the problem of extracting dense displacement or depth estimates from image sequences. However, these previous approaches have either been restricted to two-frame analysis [1] or have used batch processing of the image sequence, for example via spatiotemporal filtering [11].
In this paper we introduce a new, pixel-based (iconic) approach to incremental depth estimation and compare it mathematically and experimentally to a feature-based approach we developed previously [16]. The new approach represents depth and depth variance at every pixel and uses Kalman filtering to extrapolate and update the pixel-based depth representation. The algorithm uses correlation to measure the optical flow and to estimate the variance in the flow, then uses the known camera motion to convert the flow field into a depth map. It then uses the Kalman filter to generate an updated depth map from a weighted combination of the new measurements and the prior depth estimates. Regularization is employed to smooth the depth map

and to fill in the underconstrained areas. The resulting algorithm is parallel, uniform, and can take advantage of mesh-connected or multiresolution (pyramidal) processing architectures.
The remainder of this paper is structured as follows. In the next section, we give a brief review of Kalman filtering and introduce our overall approach to Kalman filtering of depth. Next, we review the equations of motion, present a simple camera model, and examine the potential accuracy of the method by analyzing its sensitivity to the direction of camera motion. We then describe our new, pixel-based depth-from-motion algorithm and review the formulation of the feature-based algorithm. Next, we analyze the theoretical accuracy of both methods, compare them both to the theoretical accuracy of stereo matching, and verify this analysis experimentally using images of a flat scene. We then show the performance of both methods on images of realistic outdoor scene models. In the final section, we discuss the promise and the problems involved in extending the method to arbitrary motion. We also conclude that the ideas and results presented apply directly to the much broader problem of integrating depth information from multiple sources.
2 Estimation Framework
The depth-from-motion algorithms described in this paper use image sequences with small frame-to-frame camera motion [4]. Small motion minimizes the correspondence problem between successive images, but sacrifices depth resolution because of the small baseline between consecutive image pairs. This problem can be overcome by integrating information over the course of the image sequence. For many applications, it is desirable to process the images incrementally by generating updated depth estimates after each new image is acquired, instead of processing many images together in a batch. The incremental approach offers real-time operation and requires less storage, since only the current estimates of depth and depth uncertainty need to be stored.

The Kalman filter is a powerful technique for doing incremental, real-time estimation in dynamic systems. It allows for the integration of information over time and is robust with respect to both system and sensor noise. In this section, we first present the notation and the equations of the Kalman filter, along with a simple example. We then sketch the application of this framework to motion-sequence processing and discuss those parts of the framework that are common to both the iconic and the feature-based algorithms. The details of these algorithms are given in sections 4 and 5, respectively.
2.1. Kalman Filter
The Kalman filter is a Bayesian estimation technique used to track stochastic dynamic systems being observed with noisy sensors. The filter is based on three separate probabilistic models, as shown in table 1. The first model, the system model, describes the evolution over time of the current state vector $u_t$. The transition between states is characterized by the known transition matrix $\Phi_t$ and the addition of Gaussian noise with a covariance $Q_t$. The second model, the measurement (or sensor) model, relates the measurement vector $d_t$ to the current state through a measurement matrix $H_t$ and the addition of Gaussian noise with a covariance $R_t$. The third model, the prior model, describes the knowledge about the system state $\hat{u}_0$ and its covariance $P_0$ before the first measurement is taken. The sensor and process noise are assumed to be uncorrelated.
Table 1. Kalman filter equations.

Models:
  system model:         $u_t = \Phi_{t-1} u_{t-1} + \eta_t, \quad \eta_t \sim N(0, Q_t)$
  measurement model:    $d_t = H_t u_t + \xi_t, \quad \xi_t \sim N(0, R_t)$
  prior model:          $E[u_0] = \hat{u}_0, \quad \mathrm{Cov}[u_0] = P_0$
  (other assumptions):  $E[\eta_t \xi_t^T] = 0$

Prediction phase:
  state estimate extrapolation:    $\hat{u}_t^- = \Phi_{t-1} \hat{u}_{t-1}^+$
  state covariance extrapolation:  $P_t^- = \Phi_{t-1} P_{t-1}^+ \Phi_{t-1}^T + Q_{t-1}$

Update phase:
  state estimate update:    $\hat{u}_t^+ = \hat{u}_t^- + K_t [d_t - H_t \hat{u}_t^-]$
  state covariance update:  $P_t^+ = [I - K_t H_t] P_t^-$
  Kalman gain matrix:       $K_t = P_t^- H_t^T [H_t P_t^- H_t^T + R_t]^{-1}$

For the ball-tracking example, the state vector consists of the ball's 3D position and velocity together with a trailing constant 1 that injects gravity into the dynamics, and the system and measurement matrices are

$$\Phi_t = \begin{bmatrix} 1 & 0 & 0 & \Delta t & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & \Delta t & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & \Delta t & 0 \\ 0 & 0 & 0 & 1-\beta & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1-\beta & 0 & -g\Delta t \\ 0 & 0 & 0 & 0 & 0 & 1-\beta & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$H_t = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
which maps the state $u$ to the measurement $d$. The uncertainty in the sensed ball position can be modeled by a $2 \times 2$ covariance matrix $R_t$.
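As a concrete illustration, these model matrices can be built in a few lines of NumPy. This is a sketch, not code from the paper; the values of $\Delta t$, the drag coefficient $\beta$, $g$, and the entries of $R_t$ are assumed:

```python
import numpy as np

dt, beta, g = 1.0 / 30.0, 0.05, 9.8   # illustrative constants, not from the paper

# State: [x, y, z, vx, vy, vz, 1]; the trailing constant 1 injects gravity.
Phi = np.eye(7)
Phi[0:3, 3:6] = dt * np.eye(3)            # positions integrate velocities
Phi[3:6, 3:6] = (1.0 - beta) * np.eye(3)  # velocities decay with drag beta
Phi[4, 6] = -g * dt                       # gravity pulls on the vertical velocity

# Only the 2D sensed ball position (x, y) is observed.
H = np.zeros((2, 7))
H[0, 0] = H[1, 1] = 1.0

R = np.diag([0.01, 0.01])  # 2x2 measurement noise covariance (assumed values)
```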
Once the system, measurement, and prior models have been specified (i.e., the upper third of table 1), the Kalman filter algorithm follows from the formulation in the lower two thirds of table 1. The algorithm operates in two phases: extrapolation (prediction) and update (correction). At time $t$, the previous state and covariance estimates, $\hat{u}_{t-1}^+$ and $P_{t-1}^+$, are extrapolated to predict the current state $\hat{u}_t^-$ and covariance $P_t^-$. The predicted covariance is used to compute the new Kalman gain matrix $K_t$ and the updated covariance matrix $P_t^+$. Finally, the measurement residual $d_t - H_t\hat{u}_t^-$ is weighted by the gain matrix $K_t$ and added to the predicted state $\hat{u}_t^-$ to yield the updated state $\hat{u}_t^+$. A block diagram for the Kalman filter is given in figure 1.
Fig. 1. Kalman filter block diagram.
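The two phases of table 1 translate directly into code. A minimal sketch (the function and variable names are ours, not the paper's):

```python
import numpy as np

def kalman_predict(u, P, Phi, Q):
    """Extrapolate the state estimate and covariance (prediction phase)."""
    u_pred = Phi @ u
    P_pred = Phi @ P @ Phi.T + Q
    return u_pred, P_pred

def kalman_update(u_pred, P_pred, d, H, R):
    """Correct the prediction with measurement d (update phase)."""
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain matrix
    u_new = u_pred + K @ (d - H @ u_pred)   # weighted measurement residual
    P_new = (np.eye(len(u_pred)) - K @ H) @ P_pred
    return u_new, P_new
```

One predict/update cycle runs per measurement; for the ball example above, `d` would be the sensed image position at each frame.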
2.2. Application to Depth from Motion
To apply the Kalman filter estimation framework to the depth-from-motion problem, we specialize each of the three models (system, measurement, and prior) and define the implementations of the extrapolation and update stages. This section briefly previews how these components are chosen for the two depth-from-motion algorithms described in this paper. The details of the implementation are left to sections 4 and 5.
The first step in designing a Kalman filter is to specify the elements of the state vector. The iconic depth-from-motion algorithm estimates the depth at each pixel in the current image, so the state vector in this case is the entire depth map.¹ Thus, the diagonal elements of the state covariance matrix $P_t$ are the variances of the depth estimates at each pixel. As discussed shortly, we implicitly use off-diagonal elements of the inverse covariance matrix $P_t^{-1}$ as part of the update stage of the filter, but do not explicitly model them anywhere in the algorithm because of the large size of the matrix. For the feature-based approach, which tracks edge elements through the image sequence, the state consists of a 3D position vector for each feature. We model the full covariance matrix of each individual feature, but treat separate features as independent.
The system model in both approaches is based on the same motion equations (section 3.1), but the implementations of the extrapolation and update stages differ because of the differences in the underlying representations. For the iconic method, the extrapolation stage uses the depth map estimated for the current frame, together with knowledge of the camera motion, to predict the depth and depth variance at each pixel in the next frame. Similarly, the update stage uses measurements of depth at each pixel to update the depth and variance estimates at each pixel. For the feature-based method, the extrapolation stage predicts the position vector and covariance matrix of each feature for the next image, then uses measurements of the image coordinates of the feature to update the position vector and the covariance matrix. Details of the measurement models for each algorithm will be discussed later.
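Since the iconic method retains only the diagonal of $P_t$, its update stage reduces to an independent scalar Kalman filter at each pixel. A minimal sketch of that per-pixel combination (array names are ours):

```python
import numpy as np

def iconic_update(d_pred, var_pred, d_meas, var_meas):
    """Scalar Kalman update applied at every pixel of a disparity map.

    d_pred, var_pred: predicted disparity map and its variance map.
    d_meas, var_meas: new disparity measurements and their variances.
    All arguments are 2D arrays of the same shape.
    """
    gain = var_pred / (var_pred + var_meas)    # per-pixel Kalman gain
    d_new = d_pred + gain * (d_meas - d_pred)  # weighted combination
    var_new = (1.0 - gain) * var_pred          # uncertainty shrinks over time
    return d_new, var_new
```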
Finally, the prior model can be used to embed prior knowledge about the scene. For the iconic method, for example, smoothness constraints requiring nearby image points to have similar disparity can be modeled easily by off-diagonal elements of the inverse of the prior covariance matrix $P_0$ [29]. Our algorithm incorporates this knowledge as part of a smoothing operation that follows the state update stage. Similar concepts may be applicable to modeling figural continuity [20, 24] in the edge-tracking approach, that is, the constraint that connected edges must match connected edges; however, we have not pursued this possibility.

¹Our actual implementation uses inverse depth (called "disparity"); see section 4.
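As an illustration of how such smoothness terms enter $P_0^{-1}$ (a sketch of the idea, not the paper's implementation), consider a 1D disparity signal with first-difference penalties; `weight` is an assumed parameter:

```python
import numpy as np

def smoothness_precision(n, weight):
    """Inverse prior covariance encoding weight * (d[i+1] - d[i])^2 penalties.

    Each first-difference term adds to two diagonal entries and two
    off-diagonal entries of the precision (inverse covariance) matrix,
    coupling neighboring disparity estimates.
    """
    P_inv = np.zeros((n, n))
    for i in range(n - 1):
        P_inv[i, i] += weight
        P_inv[i + 1, i + 1] += weight
        P_inv[i, i + 1] -= weight
        P_inv[i + 1, i] -= weight
    return P_inv
```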
3 Motion Equations and Camera Model
Our system and measurement models are based on the equations relating scene depth and camera motion to the induced image flow. In this section, we review these equations for an idealized camera (focal length = 1) and show how to use a simple calibration model to relate the idealized equations to real cameras. We also derive an expression for the relative uncertainty in depth estimates obtained from lateral versus forward camera translation. This expression shows concretely the effects of camera motion on depth uncertainty and reinforces the need for modeling the uncertainty in computed depth.
3.1. Equations of Motion
If the inter-frame camera motion is sufficiently small, the resulting optical flow can be expressed to a good approximation in terms of the instantaneous camera velocity [6, 13, 33]. We will specify this in terms of a translational velocity $T$ and an angular velocity $R$. In the camera coordinate frame (figure 2), the motion of a 3D point $P$ is described by the equation

$$\frac{dP}{dt} = -T - R \times P$$

Expanding this into components yields

$$\frac{dX}{dt} = -T_x - R_y Z + R_z Y$$
$$\frac{dY}{dt} = -T_y - R_z X + R_x Z \qquad [1]$$
$$\frac{dZ}{dt} = -T_z - R_x Y + R_y X$$

Now, projecting $(X, Y, Z)$ onto an ideal, unit focal length image,

$$x = \frac{X}{Z}, \qquad y = \frac{Y}{Z}$$

taking the derivatives of $(x, y)$ with respect to time, and substituting in from equation (1) leads to the familiar equations of optical flow [33]:

$$\Delta x = \frac{-T_x + x T_z}{Z} + x y R_x - (1 + x^2) R_y + y R_z$$
$$\Delta y = \frac{-T_y + y T_z}{Z} + (1 + y^2) R_x - x y R_y - x R_z \qquad [2]$$
These equations relate the depth $Z$ of the point to the camera motion $T$, $R$ and the induced image displacements or optical flow $[\Delta x\ \Delta y]^T$. We will use these equations to measure depth, given the camera motion and optical flow, and to predict the change in the depth map between frames. Note that parameterizing (2) in terms of the inverse depth $d = 1/Z$ makes the equations linear in the "depth" variable. Since this leads to a simpler estimation formulation, we will use this parameterization in the balance of the paper.
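Equation (2), written with the inverse depth $d$, evaluates directly to code; the linearity in $d$ is visible in the two translational terms. A sketch (the function name is ours):

```python
def flow_from_motion(x, y, d, T, R):
    """Image flow at ideal coordinates (x, y) per equation (2).

    d is the inverse depth 1/Z; T = (Tx, Ty, Tz) is the translational
    velocity and R = (Rx, Ry, Rz) the angular velocity.
    """
    Tx, Ty, Tz = T
    Rx, Ry, Rz = R
    dx = (-Tx + x * Tz) * d + x * y * Rx - (1 + x ** 2) * Ry + y * Rz
    dy = (-Ty + y * Tz) * d + (1 + y ** 2) * Rx - x * y * Ry - x * Rz
    return dx, dy
```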
Fig. 2. Camera model. CP is the center of projection.
3.2. Camera Model
Relating the ideal flow equations to real measurements requires a camera model. If optical distortions are not severe, a pin-hole camera model will suffice. In this paper we adopt a model similar to that originated by Sobel [n] (figure 2). This model specifies the origin $(c_x, c_y)$ of the image coordinate system and a pair of scale factors $(s_x, s_y)$ that combine the focal length and image aspect ratio. Denoting the actual image coordinates with a subscript $a$, the projection onto the actual image is summarized by the equation

$$\begin{bmatrix} x_a \\ y_a \\ 1 \end{bmatrix} = \frac{CP}{Z}, \qquad C = \begin{bmatrix} s_x & 0 & c_x \\ 0 & s_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad [3]$$

$C$ is known as the collimation matrix. Thus, the ideal image coordinates $(x, y)$ are related to the actual image coordinates by

$$x_a = s_x x + c_x$$
$$y_a = s_y y + c_y$$

Equations in the balance of the paper will primarily use ideal image coordinates for clarity. These equations can be re-expressed in terms of actual coordinates using the transformations above.
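The resulting conversions between ideal and actual coordinates are one-liners; a sketch using the model parameters above (function names are ours):

```python
def ideal_to_actual(x, y, sx, sy, cx, cy):
    """Apply the collimation model: ideal -> actual image coordinates."""
    return sx * x + cx, sy * y + cy

def actual_to_ideal(xa, ya, sx, sy, cx, cy):
    """Invert the collimation model: actual -> ideal image coordinates."""
    return (xa - cx) / sx, (ya - cy) / sy
```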
3.3. Sensitivity Analysis
Before describing our Kalman filter algorithms, we will analyze the effect of different camera motions on the uncertainty in depth estimates. Given specific descriptions of real cameras and scenes, we can obtain bounds on the estimation accuracy of depth-from-motion algorithms using perturbation or covariance analysis techniques based on first-order Taylor expansions [8]. For example, if we solve the motion equations for the inverse depth $d$ in terms of the optical flow, camera motion, and camera model,

$$d = F(\Delta x, \Delta y, T, R, c_x, c_y, s_x, s_y) \qquad [4]$$

then the uncertainty in depth arising from uncertainty in flow, motion, and calibration can be expressed by

$$\delta d = J_f\,\delta f + J_m\,\delta m + J_c\,\delta c$$

where $J_f$, $J_m$, and $J_c$ are the Jacobians of (4) with respect to the flow, motion, and calibration parameters, respectively, and $\delta f$, $\delta m$, and $\delta c$ are perturbations of the respective parameters. We will use this methodology to draw some conclusions about the relative accuracy of depth estimates obtained from different classes of motion.
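This first-order propagation can be sketched numerically: approximate the Jacobian of $F$ by finite differences and propagate the parameter covariance. Here `F`, the parameter ordering, and `eps` are our assumptions for illustration:

```python
import numpy as np

def propagate_variance(F, params, cov, eps=1e-6):
    """First-order variance of the scalar d = F(params).

    Estimates the Jacobian J of F by central differences, then returns
    var(d) ~= J @ cov @ J (a first-order Taylor expansion).
    """
    p = np.asarray(params, dtype=float)
    J = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        J[i] = (F(p + step) - F(p - step)) / (2.0 * eps)
    return float(J @ cov @ J)
```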
It is well known that camera rotation provides no depth information. Furthermore, for a translating camera, the accuracy of depth estimates increases with increasing distance of image features from the focus of expansion (FOE), the point in the image where the translation vector $T$ pierces the image. This implies that the 'best' translations are parallel to the image plane and that the 'worst' are forward along the camera axis. We will give a short derivation that demonstrates the relative accuracy obtainable from forward and lateral camera translation. The effects of measurement uncertainty on depth-from-motion calculations are also examined in [26].
For clarity, we consider only one-dimensional flow induced by translation along the $X$ or $Z$ axes. For an ideal camera, lateral motion induces the flow

$$\Delta x_l = \frac{-T_x}{Z} \qquad [6]$$

whereas forward motion induces the flow

$$\Delta x_f = \frac{x T_z}{Z} \qquad [7]$$

The inverse depth (or disparity) in each case is

$$d_l = \frac{-\Delta x_l}{T_x}, \qquad d_f = \frac{\Delta x_f}{x T_z}$$

Therefore, perturbations of $\delta x_l$ and $\delta x_f$ in the flow measurements $\Delta x_l$ and $\Delta x_f$ yield the following perturbations in the disparity estimates:

$$\delta d_l = \frac{-\delta x_l}{T_x}, \qquad \delta d_f = \frac{\delta x_f}{x T_z} \qquad [8]$$

These equations give the error in the inverse depth as a function of the error in the measured image displacement, the amount of camera motion, and the position of the feature in the field of view. Since we are interested in comparing forward and lateral motions, a good way to visualize these equations is to plot the relative depth uncertainty, $\delta d_f/\delta d_l$. Assuming that the flow perturbations $\delta x_l$ and $\delta x_f$ are equal, the relative uncertainty is

$$\frac{\delta d_f}{\delta d_l} = \frac{T_x}{x T_z}$$

The image coordinate $x$ indicates where the object appears in the field of view. Figure 3 shows that $x$ equals the tangent of the angle $\theta$ between the object and the camera axis. The formula for the relative uncertainty is thus

$$\frac{\delta d_f}{\delta d_l} = \frac{T_x}{T_z \tan \theta} \qquad [9]$$
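Equation (9) in code, convenient for plotting the relative uncertainty across the field of view (a sketch; the function name is ours):

```python
import math

def relative_uncertainty(Tx, Tz, theta):
    """delta_d_f / delta_d_l = Tx / (Tz * tan(theta)), per equation (9).

    The ratio grows without bound as theta -> 0: near the focus of
    expansion, forward translation yields far less depth accuracy than
    an equal lateral translation.
    """
    return Tx / (Tz * math.tan(theta))
```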

Citations
Journal ArticleDOI

A taxonomy and evaluation of dense two-frame stereo correspondence algorithms

TL;DR: This paper has designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms.
Journal ArticleDOI

CONDENSATION—Conditional Density Propagation for Visual Tracking

TL;DR: The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set.
Journal ArticleDOI

Object tracking: A survey

TL;DR: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, identify new trends, and discuss the important issues related to tracking, including the use of appropriate image features, selection of motion models, and detection of objects.
Book

Computer Vision: Algorithms and Applications

TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Journal ArticleDOI

Shape and motion from image streams under orthography: a factorization method

TL;DR: In this paper, the singular value decomposition (SVD) technique is used to factor the measurement matrix into two matrices which represent object shape and camera rotation respectively, and two of the three translation components are computed in a preprocessing stage.
References
Journal ArticleDOI

A Computational Approach to Edge Detection

TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Journal ArticleDOI

Determining optical flow

TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the Optical Flow for a number of synthetic image sequences.
Book

Applied Optimal Estimation

Arthur Gelb
TL;DR: This is the first book on optimal estimation that places its major emphasis on practical applications, treating the subject more from an engineering than a mathematical orientation; the theory and practice of optimal estimation are presented.
BookDOI

Spacecraft attitude determination and control

TL;DR: In this paper, the first comprehensive presentation of data, theory, and practice in attitude analysis is presented, including orthographic globe projections to eliminate confusion in vector drawings and a presentation of new geometrical procedures for mission analysis and attitude accuracy studies which can eliminate many complex simulations.