International Journal of Computer Vision, 2, 51-76 (1988)
© 1988 Kluwer Academic Publishers. Manufactured in The Netherlands
Direct Methods for Recovering Motion*

BERTHOLD K.P. HORN AND E.J. WELDON JR.
Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822
Abstract

We have developed direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known. Some of these methods are based on the minimization of the difference between the observed time derivative of brightness and that predicted from the spatial brightness gradient, given the estimated motion. We minimize the integral of the square of this difference taken over the image region of interest. Other methods presented here exploit the fact that surfaces have to be in front of the observer in order to be seen.

We do not establish point correspondences, nor do we estimate the optical flow. We use only first-order derivatives of the image brightness, and we do not assume an analytic form for the surface. We show that the field of view should be large to accurately recover the components of motion in the direction toward the image region. We also demonstrate the importance of points where the time derivative of brightness is small and discuss difficulties resulting from very large depth ranges. We emphasize the need for adequate filtering of the image data before sampling to avoid aliasing, in both the spatial and temporal dimensions.
1. Introduction
In this paper we consider the problem of determining the motion of a monocular observer moving with respect to a rigid, unknown world. We use a least-squares, as opposed to a discrete, method of solving for the motion parameters; our method uses all of the points in a two-image sequence and does not attempt to establish correspondence between the images. Hence the method is relatively robust to quantization error, noise, illumination gradients, and other effects.

So far, we can determine the observer motion in two special cases:
*This research was supported by the National Science Foundation under Grant No. DMC85-11966. Additional support was provided by NASA (Grant No. GSFC 5-1162) and by the Veterans Administration. BKPH on leave from the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.
• when the motion is pure rotation,
• when the motion is pure translation or when the rotational component of the motion is known.
At this writing we have not developed a direct method that is applicable to arbitrary motion.
1.1 Earlier Work
In the continuous or least-squares approach to motion vision, motion parameters are found that are consistent with the observed motion of the entire image. Bruss and Horn [1] use this approach to calculate motion parameters assuming that the optical flow is known at each point. Adiv [2] uses the approach of Bruss and Horn to segment the scene into independently moving planar objects; he shows that given the optical flow, segmentation can be performed and the motion calculated. Negahdaripour and Horn [3] eschew the use of optical flow and calculate the observer's motion
directly from the spatial and temporal derivatives of the image brightness, assuming a planar world. The advantage of this direct approach, which we also use here, is that certain computational difficulties inherent in the calculation of optical flow are avoided. In particular, it is not necessary to make the usual assumption that the optical flow field is smooth, an assumption that is violated near object boundaries, necessitating flow segmentation.
Waxman and Ullman [4] and Waxman and Wohn [5] also avoid the discrete approach to motion vision; their techniques make use of first and second derivatives of the optical flow to compute both the motion parameters and the structure of the imaged world. In the interests of developing methods that can be implemented, the techniques presented in this paper avoid the use of second- and higher-order derivatives.
1.2 Summary of the Paper
One of our approaches to the motion vision problem can be summarized as follows: Given the observer motion and the spatial brightness function of the image one can predict the time derivative of brightness at each point in the image. We find the motion that minimizes the integral of the square of the difference between this predicted value and the observed time derivative. The integral is taken over the image region of interest, which, in the discussion here, is usually taken to be the whole image.
We use auxiliary vectors derived from the derivatives of brightness and the image position that occur in the basic brightness change constraint equation. Study of the distribution of the directions of these vectors on the unit sphere suggests specific algorithms and also helps uncover relationships between accuracy and parameters of the imaging situation.
We have developed a simple, robust algorithm for recovering the angular velocity vector in the case of pure rotation. This algorithm involves solving three linear equations in the three unknown components of the rotation vector. The coefficients of the equations are moments of components of one of the auxiliary vectors over the given image region. We show that the accuracy of the recovered component of rotation about the direction toward the image region is poor relative to the other components, unless the image region subtends a substantial solid angle.
We have developed several algorithms for recovering the translational velocity in the case of pure translation. These algorithms exploit the constraint that objects have to be in front of the camera in order to be imaged. This constraint leads to a nonlinear constrained optimization problem. The performance of these algorithms depends on a number of factors, including:
• the angle subtended by the image, i.e., the field of view,
• the direction of motion relative to the optical axis,
• the depth range,
• the distribution of brightness gradients,
• the noise in the estimated time derivative of brightness,
• the noise in the estimated spatial gradient of brightness, and
• the number of picture cells considered.
We have not yet been able to select a "best" algorithm from the set developed, since one may be more accurate under one set of circumstances while another is better in a different situation. Also, the better algorithms tend to require more computation, and some do not lend themselves to parallel implementation. Further study using real image data will be needed to determine the range of applicability of each algorithm.
We found a strong dependence of the accuracy of recovery of certain components of the motion on the size of the field of view. This is in concert with other reports describing difficulties with small fields of view, such as references [?] and [5].
1.3 Comments on Sampling, Filtering, and Aliasing
Work with real image data has demonstrated the need to take care in filtering and sampling. The estimates of the spatial gradient and the time derivative are sensitive to aliasing effects resulting from inadequate low-pass filtering before sampling. This
is easily overlooked, particularly in the time direction. It is usually a mistake, for example, to simply pick every nth frame out of an image sequence. At the very least, n consecutive frames should be averaged before sampling in order to reduce the high-frequency components. One may object to the "smearing" introduced by this technique, but a series of widely separated snapshots typically does not obey the conditions of the sampling theorem, and as a result the estimates of the derivatives may contain large errors.

This, of course, is nothing new, since the same considerations apply when one tries to estimate the optical flow using first derivatives of image brightness (Horn and Schunck [6]). It is important to remember that the filtering must be applied before sampling; once the data has been sampled, the damage has been done.
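To make this recommendation concrete, here is a minimal sketch (not from the paper) that averages n consecutive frames before picking one frame per group; the box filter used here is simply the crudest adequate choice, and any temporal low-pass filter applied before subsampling would serve the same purpose:

    import numpy as np

    def temporally_subsample(frames, n):
        """Average every n consecutive frames before subsampling in time.
        `frames` is a sequence of 2-D arrays.  A simple box filter is used
        purely for illustration; the point, as discussed in section 1.3, is
        that the low-pass filtering happens *before* the data is sampled."""
        frames = np.asarray(frames, dtype=float)
        usable = (len(frames) // n) * n          # drop the incomplete tail group
        groups = frames[:usable].reshape(-1, n, *frames.shape[1:])
        return groups.mean(axis=1)               # one low-passed frame per group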
2 The Brightness-Change Constraint Equation
Following Longuet-Higgins and Prazdny [7] and Bruss and Horn [1] we use a viewer-based coordinate system. Figure 1 depicts the system under consideration. A world point

    R = (X, Y, Z)^T          (1)

is imaged at

    r = (x, y, 1)^T          (2)
Fig. 1. The viewer-centered coordinate system. The translational velocity of the camera is t = (U, V, W)^T, while the rotational component is ω = (A, B, C)^T.
That is, the image plane has equation Z = 1. The origin is at the projection center and the Z-axis runs along the optical axis. The X- and Y-axes are parallel to the x- and y-axes of the image plane. Image coordinates are measured relative to the principal point, the point (0, 0, 1)^T where the optical axis pierces the image plane. The points r and R are related by the perspective projection equation

    r = (x, y, 1)^T = R / (R · ẑ)          (3)

with

    Z = R · ẑ          (4)

and where ẑ denotes the unit vector in the Z direction.
Suppose the observer moves with instantaneous translational velocity t = (U, V, W)^T and instantaneous rotational velocity ω = (A, B, C)^T relative to a fixed environment; then the time derivative of the vector R can be written as

    R_t = -t - ω × R          (5)
The motion of the world point R results in motion of the corresponding image point; the value of this motion field is given by

    r_t = d/dt ( R / (R · ẑ) ) = [ R_t (R · ẑ) - (R_t · ẑ) R ] / (R · ẑ)^2          (6)
This can also be expressed as

    r_t = [ ẑ × (R_t × r) ] / (R · ẑ)          (7)

since a × (b × c) = (c · a) b - (a · b) c.
Substituting equation (5) into this result gives (see Negahdaripour and Horn [3]):

    r_t = -ẑ × { r × [ r × ω - (1 / (R · ẑ)) t ] }          (8)

In component form this can be expressed as
    r_t = ( (-U + xW)/Z + Axy - B(x^2 + 1) + Cy,
            (-V + yW)/Z - Bxy + A(y^2 + 1) - Cx,
            0 )^T          (9)

a result first obtained by Longuet-Higgins and Prazdny [7].
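As an illustrative aside (not part of the paper), equation (9) can be evaluated directly at a picture cell; the sketch below assumes a focal length of one and image coordinates measured from the principal point:

    def motion_field(x, y, Z, t, w):
        """Evaluate the motion field of equation (9) at image point (x, y)
        with depth Z, translation t = (U, V, W) and rotation w = (A, B, C)."""
        U, V, W = t
        A, B, C = w
        u = (-U + x * W) / Z + A * x * y - B * (x * x + 1) + C * y
        v = (-V + y * W) / Z - B * x * y + A * (y * y + 1) - C * x
        return (u, v)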
This shows how, given the world motion, the motion field can be calculated for every image point. If we assume that the brightness of a small surface patch is not changed by motion, then expansion of the total derivative of brightness E leads to

    (∂E/∂x)(dx/dt) + (∂E/∂y)(dy/dt) + ∂E/∂t = 0          (10)

(The applicability of the constant brightness assumption is discussed in Appendix A.) Denoting the vector (∂E/∂x, ∂E/∂y, 0)^T = (E_x, E_y, 0)^T by E_r and ∂E/∂t by E_t permits us to express this result more compactly in the form

    E_r · r_t + E_t = 0          (11)
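In practice E_r and E_t must be estimated from sampled image data. The sketch below uses one common estimator, first differences averaged over a 2x2x2 cube of samples in the spirit of Horn and Schunck [6]; the paper itself does not prescribe a particular estimator, so this is only an illustration:

    import numpy as np

    def brightness_derivatives(E0, E1):
        """Estimate Ex, Ey, Et from two consecutive frames E0 and E1 (2-D
        arrays) by averaging first differences over a 2x2x2 cube of samples
        (cf. Horn and Schunck [6]).  The returned arrays are one sample
        smaller than the input in each image dimension."""
        def corners(E):
            return E[:-1, :-1], E[:-1, 1:], E[1:, :-1], E[1:, 1:]
        a0, b0, c0, d0 = corners(E0)
        a1, b1, c1, d1 = corners(E1)
        Ex = 0.25 * ((b0 - a0) + (d0 - c0) + (b1 - a1) + (d1 - c1))   # d/dx (columns)
        Ey = 0.25 * ((c0 - a0) + (d0 - b0) + (c1 - a1) + (d1 - b1))   # d/dy (rows)
        Et = 0.25 * ((a1 - a0) + (b1 - b0) + (c1 - c0) + (d1 - d0))   # d/dt
        return Ex, Ey, Et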
Substituting equation (8) into equation (11) and rearranging gives

    E_t - { [ (E_r × ẑ) × r ] × r } · ω + (1 / (R · ẑ)) [ (E_r × ẑ) × r ] · t = 0          (12)
To simplify this expression we let

    s = (E_r × ẑ) × r          (13)

and

    v = -s × r          (14)

so equation (12) reduces to the brightness change constraint equation of Negahdaripour and Horn [3], namely

    v · ω + (1 / (R · ẑ)) s · t = -E_t          (15)
The vectors s and v can be expressed in component form as

    s = ( -E_x,
          -E_y,
          xE_x + yE_y )^T

and

    v = (  E_y + y(xE_x + yE_y),
          -E_x - x(xE_x + yE_y),
           yE_x - xE_y )^T          (16)
Note that s · r = 0, v · r = 0, and s · v = 0. These three vectors thus form an orthogonal triad. The vectors s and v are inherent properties of the image. Note that the projection of s into the image plane is just the (negative) gradient of the image. Also, the quantity s indicates the directions in which translation of a given magnitude will contribute maximally to the temporal brightness change of a given picture cell. The quantity v plays a similar role for rotation.
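As an illustration (not from the paper), the auxiliary vectors can be computed at every picture cell directly from equation (16); the identities s · r = v · r = s · v = 0 provide a convenient numerical check:

    import numpy as np

    def auxiliary_vectors(Ex, Ey, x, y):
        """Compute the auxiliary vectors s and v of equation (16) at every
        picture cell.  Ex, Ey are the spatial brightness derivatives and x, y
        the image coordinates (focal length 1, origin at the principal
        point).  Arrays of shape (..., 3) are returned."""
        g = x * Ex + y * Ey                       # the term x*Ex + y*Ey
        s = np.stack([-Ex, -Ey, g], axis=-1)
        v = np.stack([Ey + y * g, -Ex - x * g, y * Ex - x * Ey], axis=-1)
        return s, v

    # Numerical check of the orthogonal triad at one cell, e.g.:
    #   r = np.array([x, y, 1.0]);  abs(np.dot(s_cell, r)) should be ~0.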
3 Solving the Brightness Change Constraint Equation
Equation (15) relates the observer motion (t, ω), the depth of the world R · ẑ = Z(x, y), and certain measurable quantities of the image (s, v).¹ In general, it is not possible to solve for the first two of these given the last. Some interesting special cases are addressed in this paper and in Negahdaripour and Horn [3]; these are:

i. Known depth: In section 3.1 we show that given Z, s, and v, the quantities t and ω can be calculated in closed form using a least-squares method.
ii. Pure rotation (||t|| = 0): In section 3.2 we show that given v, the rotation vector ω can be calculated in closed form.
iii. Pure translation or known rotation: In section 3.3 we present a least-squares method for determining t. Once t is known, the brightness change constraint equation can be used to find the depth at each picture cell:

    Z = R · ẑ = -s · t / (E_t + v · ω)          (17)

(An illustrative numerical sketch of this depth recovery is given below, after the list.)
iv. Planar world: Negahdaripour and Horn [3] present a closed-form solution for t, ω, and the normal n of the world plane.

v. Quadratic patches: Negahdaripour [8] gives a closed-form solution in the case that a portion of the world can be represented as a quadratic form.

¹We do not discuss here related methods using optical flow, such as those of Bruss and Horn [1].
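The following sketch (an illustration under the assumptions above, not one of the paper's algorithms) applies equation (17) per picture cell once t and ω are known; cells where the denominator E_t + v · ω is near zero carry little depth information and are masked here by an ad hoc threshold:

    import numpy as np

    def depth_from_motion(Ex, Ey, Et, x, y, t, w, eps=1e-6):
        """Recover depth Z = -s.t / (Et + v.w), equation (17), at every
        picture cell from the brightness derivatives and a known motion
        (t, w).  Cells where the denominator is tiny are returned as NaN;
        that threshold is an illustrative choice, not a prescription of the
        paper."""
        g = x * Ex + y * Ey
        s = np.stack([-Ex, -Ey, g], axis=-1)                           # equation (16)
        v = np.stack([Ey + y * g, -Ex - x * g, y * Ex - x * Ey], axis=-1)
        num = -(s @ np.asarray(t, dtype=float))
        den = Et + (v @ np.asarray(w, dtype=float))
        return np.where(np.abs(den) > eps, num / den, np.nan)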
In this paper we consider various integrals over an image region thought to correspond to a single rigid object in motion relative to the viewer. In the simplest case, the observer is moving relative to a static environment and the whole image can be used. The size of the field of view has a strong effect on the accuracy of the determination of the components of motion along the optical axis. When we need to estimate this accuracy, we will, for convenience, assume a circular image of radius r_v. This corresponds to a conical field of view with half angle θ_v, where r_v = tan θ_v, since we have assumed that the focal length equals one. (We assume that 0 < θ_v < π/2.)

We will show that the field of view should be large. Although orthographic projection usually simplifies machine vision problems, this is one case in which we welcome the effects of perspective "distortion"!
3.1 Depth Known
When depth is known, it is straightforward to recover the motion. (Depth may have been obtained using a binocular stereo system or some kind of range finder.) We cannot, in general, find a motion to satisfy the brightness change constraint equation at every picture cell, because of noise in the measurements. Instead we minimize

    ∬ [ E_t + v · ω + (1/Z) s · t ]^2 dx dy          (18)

Differentiating with respect to ω and t and setting the results equal to zero leads to the pair of vector equations:
    [ ∬ (1/Z)^2 s s^T dx dy ] t + [ ∬ (1/Z) s v^T dx dy ] ω = - ∬ E_t (1/Z) s dx dy

    [ ∬ (1/Z) v s^T dx dy ] t + [ ∬ v v^T dx dy ] ω = - ∬ E_t v dx dy          (19)
This is a set of six linear equations in six unknowns with a symmetric coefficient matrix. (The equations can be solved by partitioning in order to reduce the computational effort.) The coefficients are all integrals of products of components of (1/Z)s and v. It may be useful to note that

    trace(s v^T) = trace(v s^T) = s · v = 0          (20)
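By way of illustration (a sketch, not the paper's implementation), the system of equation (19) can be assembled by approximating the integrals with sums of per-cell outer products; note that setting t = 0, so that the depth-dependent term vanishes, reduces it to the three-equation pure-rotation system mentioned in section 1.2:

    import numpy as np

    def motion_from_depth(Ex, Ey, Et, x, y, Z):
        """Assemble and solve the 6x6 linear system of equation (19) for
        (t, w), approximating the integrals by sums over picture cells.
        Inputs are arrays over the image region; Z is the known depth.
        A sketch only: weighting and conditioning are not addressed."""
        g = x * Ex + y * Ey
        s = np.stack([-Ex, -Ey, g], axis=-1).reshape(-1, 3)            # equation (16)
        v = np.stack([Ey + y * g, -Ex - x * g,
                      y * Ex - x * Ey], axis=-1).reshape(-1, 3)
        sz = s / np.asarray(Z, dtype=float).reshape(-1, 1)             # (1/Z) s
        et = Et.reshape(-1)
        A = np.block([[sz.T @ sz, sz.T @ v],
                      [v.T @ sz,  v.T @ v]])                           # symmetric 6x6
        b = -np.concatenate([sz.T @ et, v.T @ et])
        tw = np.linalg.solve(A, b)
        return tw[:3], tw[3:]                                          # t, w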
We could have obtained slightly different equations for ω and t if we had chosen to weight the integrand in equation (18) differently. We study the special case in which ||t|| = 0 and the special case in which ||ω|| = 0 later.
One application of the above result is to "dynamic stereo." A binocular stereo system can provide disparity estimates from which 1/Z can be calculated. The above equations can then be used to solve for the motion, provided estimates of the derivatives of image brightness are also supplied. The correspondence problem of binocular stereo has, unfortunately, been found to be a difficult one. It would represent the major computational burden in a dynamic stereo system.

We hope that motion vision research will eventually lead to simpler methods for recovering depth than those used for binocular stereo, although they are likely to be relatively inaccurate when based only on instantaneous translational and rotational velocity estimates.