
Direct methods for recovering motion

TLDR
Direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known are developed.



International Journal of Computer Vision, 2, 51-76 (1988)
© 1988 Kluwer Academic Publishers. Manufactured in The Netherlands
Direct Methods for Recovering Motion*

BERTHOLD K.P. HORN‡ AND E.J. WELDON JR.
Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822

Abstract
We have developed direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known. Some of these methods are based on the minimization of the difference between the observed time derivative of brightness and that predicted from the spatial brightness gradient, given the estimated motion. We minimize the integral of the square of this difference taken over the image region of interest. Other methods presented here exploit the fact that surfaces have to be in front of the observer in order to be seen. We do not establish point correspondences, nor do we estimate the optical flow. We use only first-order derivatives of the image brightness, and we do not assume an analytic form for the surface. We show that the field of view should be large to accurately recover the components of motion in the direction toward the image region. We also demonstrate the importance of points where the time derivative of brightness is small and discuss difficulties resulting from very large depth ranges. We emphasize the need for adequate filtering of the image data before sampling to avoid aliasing, in both the spatial and temporal dimensions.
1. Introduction
In this paper we consider the problem of determining the motion of a monocular observer moving with respect to a rigid, unknown world. We use a least-squares, as opposed to a discrete, method of solving for the motion parameters; our method uses all of the points in a two-image sequence and does not attempt to establish correspondence between the images. Hence the method is relatively robust to quantization error, noise, illumination gradients, and other effects.
So far, we can determine the observer motion in two special cases:

• when the motion is pure rotation,
• when the motion is pure translation or when the rotational component of the motion is known.

At this writing we have not developed a direct method that is applicable to arbitrary motion.

*This research was supported by the National Science Foundation under Grant No. DMC85-11966. Additional support was provided by NASA (Grant No. GSFC 5-1162) and by the Veterans Administration.

‡BKPH on leave from the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.
1.1 Earlier Work
In the continuous or least-squares approach to motion vision, motion parameters are found that are consistent with the observed motion of the entire image. Bruss and Horn [1] use this approach to calculate motion parameters assuming that the optical flow is known at each point. Adiv [2] uses the approach of Bruss and Horn to segment the scene into independently moving planar objects; he shows that given the optical flow, segmentation can be performed and the motion calculated. Negahdaripour and Horn [3] eschew the use of optical flow and calculate the observer's motion directly from the spatial and temporal derivatives of the image brightness, assuming a planar world. The advantage of this direct approach, which we also use here, is that certain computational difficulties inherent in the calculation of optical flow are avoided. In particular, it is not necessary to make the usual assumption that the optical flow field is smooth, an assumption that is violated near object boundaries, necessitating flow segmentation.
Waxman and Ullman [4] and Waxman and Wohn [5] also avoid the discrete approach to motion vision; their techniques make use of first and second derivatives of the optical flow to compute both the motion parameters and the structure of the imaged world. In the interests of developing methods that can be implemented, the techniques presented in this paper avoid the use of second- and higher-order derivatives.
1.2 Summary of the Paper
One of our approaches to the motion vision problem can be summarized as follows: Given the observer motion and the spatial brightness function of the image, one can predict the time derivative of brightness at each point in the image. We find the motion that minimizes the integral of the square of the difference between this predicted value and the observed time derivative. The integral is taken over the image region of interest, which, in the discussion here, is usually taken to be the whole image.
We use auxiliary vectors derived from the derivatives of brightness and the image position that occur in the basic brightness change constraint equation. Study of the distribution of the directions of these vectors on the unit sphere suggests specific algorithms and also helps uncover relationships between accuracy and parameters of the imaging situation.
We have developed a simple robust algorithm for recovering the angular velocity vector in the case of pure rotation. This algorithm involves solving three linear equations in the three unknown components of the rotation vector. The coefficients of the equations are moments of components of one of the auxiliary vectors over the given image region. We show that the accuracy of the recovered component of rotation about the direction toward the image region is poor relative to the other components, unless the image region subtends a substantial solid angle.
We have developed several algorithms for recovering the translational velocity in the case of pure translation. These algorithms exploit the constraint that objects have to be in front of the camera in order to be imaged. This constraint leads to a nonlinear constrained optimization problem. The performance of these algorithms depends on a number of factors, including:

• the angle subtended by the image, i.e., the field of view,
• the direction of motion relative to the optical axis,
• the depth range,
• the distribution of brightness gradients,
• the noise in the estimated time derivative of brightness,
• the noise in the estimated spatial gradient of brightness, and
• the number of picture cells considered.
We have not yet been able to select a "best
"
algorithm from the set developed, since one ma
y
be more accurate under one set of circumstance
s
while another is better in a different situation
.
Also, the better algorithms tend to require mor
e
computation, and some do not lend themselves t
o
parallel implementation
. Further study using rea
l
image data will be needed to determine the rang
e
of applicability of each algorithm
.
We found a strong dependence of the accuracy of recovery of certain components of the motion on the size of the field of view. This is in concert with other reports describing difficulties with small fields of view, such as references [4] and [5].
1.3 Comments on Sampling, Filtering, and Aliasing
Work with real image data has demonstrated the need to take care in filtering and sampling. The estimates of spatial gradient and time derivatives are sensitive to aliasing effects resulting from inadequate low-pass filtering before sampling. This is easily overlooked, particularly in the time direction. It is usually a mistake, for example, to simply pick every nth frame out of an image sequence. At the very least, n consecutive frames should be averaged before sampling in order to reduce the high-frequency components. One may object to the "smearing" introduced by this technique, but a series of widely separated snapshots typically does not obey the conditions of the sampling theorem, and as a result the estimates of the derivatives may contain large errors.

This, of course, is nothing new, since the same considerations apply when one tries to estimate the optical flow using first derivatives of image brightness (Horn and Schunck [6]). It is important to remember that the filtering must be applied before sampling; once the data has been sampled, the damage has been done.
2. The Brightness-Change Constraint Equation
Following Longuet-Higgins and Prazdny [7] and Bruss and Horn [1], we use a viewer-based coordinate system. Figure 1 depicts the system under consideration. A world point

$$\mathbf{R} = (X, Y, Z)^T \tag{1}$$

is imaged at

$$\mathbf{r} = (x, y, 1)^T \tag{2}$$
Fig. 1. The viewer-centered coordinate system. The translational velocity of the camera is $\mathbf{t} = (U, V, W)^T$, while the rotational component is $\boldsymbol{\omega} = (A, B, C)^T$.
That is, the image plane has equation Z = 1. The origin is at the projection center and the Z-axis runs along the optical axis. The X- and Y-axes are parallel to the x- and y-axes of the image plane. Image coordinates are measured relative to the principal point, the point $(0, 0, 1)^T$ where the optical axis pierces the image plane. The points r and R are related by the perspective projection equation

$$\mathbf{r} = (x, y, 1)^T = \frac{\mathbf{R}}{\mathbf{R}\cdot\hat{\mathbf{z}}} \tag{3}$$

with

$$Z = \mathbf{R}\cdot\hat{\mathbf{z}} \tag{4}$$

and where $\hat{\mathbf{z}}$ denotes the unit vector in the Z direction.
Suppose the observer moves with instantaneous translational velocity $\mathbf{t} = (U, V, W)^T$ and instantaneous rotational velocity $\boldsymbol{\omega} = (A, B, C)^T$ relative to a fixed environment; then the time derivative of the vector R can be written as

$$\mathbf{R}_t = -\mathbf{t} - \boldsymbol{\omega}\times\mathbf{R} \tag{5}$$
The motion of the world point R results in motion of the corresponding image point; the value of this motion field is given by

$$\mathbf{r}_t = \frac{d}{dt}\left(\frac{\mathbf{R}}{\mathbf{R}\cdot\hat{\mathbf{z}}}\right) = \frac{(\mathbf{R}\cdot\hat{\mathbf{z}})\,\mathbf{R}_t - (\mathbf{R}_t\cdot\hat{\mathbf{z}})\,\mathbf{R}}{(\mathbf{R}\cdot\hat{\mathbf{z}})^2} \tag{6}$$

This can also be expressed as

$$\mathbf{r}_t = \frac{\hat{\mathbf{z}}\times(\mathbf{R}_t\times\mathbf{r})}{\mathbf{R}\cdot\hat{\mathbf{z}}} \tag{7}$$

since $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{c}\cdot\mathbf{a})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c}$. Substituting equation (5) into this result gives (see Negahdaripour and Horn [3]):

$$\mathbf{r}_t = -\hat{\mathbf{z}}\times\left[\mathbf{r}\times\left(\mathbf{r}\times\boldsymbol{\omega} - \frac{1}{\mathbf{R}\cdot\hat{\mathbf{z}}}\,\mathbf{t}\right)\right] \tag{8}$$

In component form this can be expressed as

$$\mathbf{r}_t = \begin{pmatrix} \dfrac{-U + xW}{Z} + Axy - B(x^2+1) + Cy \\[1.5ex] \dfrac{-V + yW}{Z} + A(y^2+1) - Bxy - Cx \\[1ex] 0 \end{pmatrix} \tag{9}$$

a result first obtained by Longuet-Higgins and Prazdny [7].
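Equation (9) is straightforward to evaluate; the following sketch (ours, assuming per-pixel depth Z is given) computes the motion field for a specified (t, ω):

```python
import numpy as np

def motion_field(x, y, Z, t, w):
    """Evaluate equation (9): the image motion (x_t, y_t) per pixel.

    x, y: image coordinates (focal length = 1); Z: depth R.z;
    t = (U, V, W): translational velocity; w = (A, B, C): rotation.
    """
    U, V, W = t
    A, B, C = w
    xt = (-U + x * W) / Z + A * x * y - B * (x * x + 1) + C * y
    yt = (-V + y * W) / Z + A * (y * y + 1) - B * x * y - C * x
    return xt, yt
```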
This shows how, given the world motion, the motion field can be calculated for every image point. If we assume that the brightness of a small surface patch is not changed by motion, then expansion of the total derivative of brightness E leads to

$$\frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = 0 \tag{10}$$

(The applicability of the constant brightness assumption is discussed in Appendix A.) Denoting the vector $(\partial E/\partial x, \partial E/\partial y, 0)^T$ by $\mathbf{E}_r$ and $\partial E/\partial t$ by $E_t$ permits us to express this result more compactly in the form

$$\mathbf{E}_r\cdot\mathbf{r}_t + E_t = 0 \tag{11}$$
Substituting equation (8) into this result and rearranging gives

$$E_t - \left\{\left[(\mathbf{E}_r\times\hat{\mathbf{z}})\times\mathbf{r}\right]\times\mathbf{r}\right\}\cdot\boldsymbol{\omega} + \frac{1}{\mathbf{R}\cdot\hat{\mathbf{z}}}\left[(\mathbf{E}_r\times\hat{\mathbf{z}})\times\mathbf{r}\right]\cdot\mathbf{t} = 0 \tag{12}$$

To simplify this expression we let

$$\mathbf{s} = (\mathbf{E}_r\times\hat{\mathbf{z}})\times\mathbf{r} \tag{13}$$

and

$$\mathbf{v} = -\mathbf{s}\times\mathbf{r} \tag{14}$$

so equation (12) reduces to the brightness change constraint equation of Negahdaripour and Horn [3], namely

$$\mathbf{v}\cdot\boldsymbol{\omega} + \frac{\mathbf{s}\cdot\mathbf{t}}{\mathbf{R}\cdot\hat{\mathbf{z}}} = -E_t \tag{15}$$
The vectors s and v can be expressed in component form as

$$\mathbf{s} = \begin{pmatrix} -E_x \\ -E_y \\ xE_x + yE_y \end{pmatrix} \quad\text{and}\quad \mathbf{v} = \begin{pmatrix} E_y + y(xE_x + yE_y) \\ -E_x - x(xE_x + yE_y) \\ yE_x - xE_y \end{pmatrix} \tag{16}$$
Note that $\mathbf{s}\cdot\mathbf{r} = 0$, $\mathbf{v}\cdot\mathbf{r} = 0$, and $\mathbf{s}\cdot\mathbf{v} = 0$. These three vectors thus form an orthogonal triad. The vectors s and v are inherent properties of the image. Note that the projection of s into the image plane is just the (negative) gradient of the image. Also, the quantity s indicates the directions in which translation of a given magnitude will contribute maximally to the temporal brightness change of a given picture cell. The quantity v plays a similar role for rotation.
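As a concrete check of these properties (our sketch, not the authors' code), s and v of equation (16) can be assembled per pixel directly from the brightness derivatives; numerically, s · r, v · r, and s · v all vanish to rounding error:

```python
import numpy as np

def auxiliary_vectors(Ex, Ey, x, y):
    """Per-pixel auxiliary vectors s and v of equation (16).

    Ex, Ey: spatial brightness derivatives; x, y: image coordinates
    (focal length = 1). Returns two arrays of shape (..., 3).
    """
    g = x * Ex + y * Ey                            # x E_x + y E_y
    s = np.stack([-Ex, -Ey, g], axis=-1)
    v = np.stack([Ey + y * g, -Ex - x * g, y * Ex - x * Ey], axis=-1)
    return s, v

# With r = (x, y, 1) per pixel, np.einsum('...i,...i', s, v) and the
# analogous products with r are zero up to floating-point rounding.
```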
3. Solving the Brightness Change Constraint Equation
Equation (15) relates observer motion (t, ω), the depth of the world $\mathbf{R}\cdot\hat{\mathbf{z}} = Z(x, y)$, and certain measurable quantities of the image (s, v). In general, it is not possible to solve for the first two of these given the last. Some interesting special cases are addressed in this paper and in Negahdaripour and Horn [3]¹; these are:

i. Known depth: In section 3.1 we show that given Z, s, and v, the quantities t and ω can be calculated in closed form using a least-squares method.

ii. Pure rotation (||t|| = 0): In section 3.2 we show that given v, the rotation vector ω can be calculated in closed form.

iii. Pure translation or known rotation: In section 3.3 we present a least-squares method for determining t. Once t is known, the brightness change constraint equation can be used to find the depth at each picture cell (as illustrated in the sketch following this list):

$$Z = \mathbf{R}\cdot\hat{\mathbf{z}} = -\frac{\mathbf{s}\cdot\mathbf{t}}{E_t + \mathbf{v}\cdot\boldsymbol{\omega}} \tag{17}$$

iv. Planar world: Negahdaripour and Horn [3] present a closed-form solution for t, ω, and the normal n of the world plane.

v. Quadratic patches: Negahdaripour [8] gives a closed-form solution in the case that a portion of the world can be represented as a quadratic form.

¹We do not discuss here related methods using optical flow, such as those of Bruss and Horn [1].
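For case iii, once the motion is known, equation (17) gives depth pointwise. A minimal sketch (ours; the threshold rejecting near-zero denominators is an arbitrary safeguard against noise, in the spirit of the remarks about points where the time derivative of brightness is small):

```python
import numpy as np

def depth_from_motion(s, v, Et, t, w, eps=1e-6):
    """Per-pixel depth via equation (17): Z = -(s.t) / (E_t + v.w).

    s, v: auxiliary vectors of equation (16), shape (..., 3);
    Et: time derivative of brightness; t, w: known motion.
    Cells where E_t + v.w is tiny constrain depth poorly and are
    returned as NaN.
    """
    num = -(s @ np.asarray(t, dtype=float))
    den = Et + v @ np.asarray(w, dtype=float)
    return num / np.where(np.abs(den) < eps, np.nan, den)
```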
In this paper we consider various integrals over an image region thought to correspond to a single rigid object in motion relative to the viewer. In the simplest case, the observer is moving relative to a static environment and the whole image can be used. The size of the field of view has a strong effect on the accuracy of the determination of the components of motion along the optical axis. When we need to estimate this accuracy, we will, for convenience, assume a circular image of radius $r_v$. This corresponds to a conical field of view with half angle $\theta_v$, where $r_v = \tan\theta_v$, since we have assumed that the focal length equals one. (We assume that $0 < \theta_v < \pi/2$.)

We will show that the field of view should be large. Although orthographic projection usually simplifies machine vision problems, this is one case in which we welcome the effects of perspective "distortion"!
3.1 Depth Known
When depth is known, it is straightforward to recover the motion. (Depth may have been obtained using a binocular stereo system or some kind of range finder.) We cannot, in general, find a motion to satisfy the brightness change constraint equation at every picture cell, because of noise in the measurements. Instead we minimize

$$\iint \left[E_t + \mathbf{v}\cdot\boldsymbol{\omega} + (1/Z)\,\mathbf{s}\cdot\mathbf{t}\right]^2 dx\,dy \tag{18}$$
Differentiating with respect to ω and t and setting the results equal to zero leads to the pair of vector equations:

$$\left[\iint (1/Z)^2\,\mathbf{s}\mathbf{s}^T\,dx\,dy\right]\mathbf{t} + \left[\iint (1/Z)\,\mathbf{s}\mathbf{v}^T\,dx\,dy\right]\boldsymbol{\omega} = -\iint E_t\,(1/Z)\,\mathbf{s}\,dx\,dy$$

$$\left[\iint (1/Z)\,\mathbf{v}\mathbf{s}^T\,dx\,dy\right]\mathbf{t} + \left[\iint \mathbf{v}\mathbf{v}^T\,dx\,dy\right]\boldsymbol{\omega} = -\iint E_t\,\mathbf{v}\,dx\,dy \tag{19}$$
This is a set of six linear equations in six unknowns with a symmetric coefficient matrix. (The equations can be solved by partitioning in order to reduce the computational effort.) The coefficients are all integrals of products of components of (1/Z)s and v. It may be useful to note that

$$\operatorname{trace}(\mathbf{s}\mathbf{v}^T) = \operatorname{trace}(\mathbf{v}\mathbf{s}^T) = \mathbf{s}\cdot\mathbf{v} = 0 \tag{20}$$
We could have obtained slightly different equations for ω and t if we had chosen to weight the integrand in equation (18) differently. We study the special case in which ||t|| = 0 and the special case in which ||ω|| = 0 later.
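In discrete form, the pair of vector equations (19) is a symmetric 6×6 linear system. A minimal sketch of the known-depth solver (ours; it solves the full system directly rather than by the partitioning mentioned above):

```python
import numpy as np

def motion_from_depth(s, v, Et, Z):
    """Solve equations (19) for (t, w) when depth Z is known.

    Each pixel contributes one constraint E_t + v.w + (1/Z) s.t = 0;
    least squares over all pixels yields a symmetric 6x6 system.
    s, v: (N, 3) auxiliary vectors; Et, Z: (N,) arrays.
    """
    M = np.hstack([s / Z[:, None], v])   # rows are ((1/Z) s^T, v^T)
    lhs = M.T @ M                        # symmetric 6x6 coefficient matrix
    rhs = -(M.T @ Et)
    tw = np.linalg.solve(lhs, rhs)
    return tw[:3], tw[3:]                # t = (U, V, W), w = (A, B, C)
```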
One application of the above result is to "dynamic stereo." A binocular stereo system can provide disparity estimates from which 1/Z can be calculated. The above equations can then be used to solve for the motion, provided estimates of the derivatives of image brightness are also supplied. The correspondence problem of binocular stereo has, unfortunately, been found to be a difficult one. It would represent the major computational burden in a dynamic stereo system. We hope that motion vision research will eventually lead to simpler methods for recovering depth than those used for binocular stereo, although they are likely to be relatively inaccurate when based only on instantaneous translational and rotational velocity estimates.

References

Book: Pattern Classification and Scene Analysis
TL;DR: Provides a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

Proceedings ArticleDOI: Determining Optical Flow
TL;DR: A method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image; an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.

Book: Perceptrons: An Introduction to Computational Geometry
TL;DR: The aim of this book is to seek general results from the close study of abstract versions of devices known as perceptrons.

Journal ArticleDOI: The Interpretation of a Moving Retinal Image
TL;DR: It is shown that from a monocular view of a rigid, textured, curved surface it is possible, in principle, to determine the gradient of the surface at any point, and the motion of the eye relative to it, from the velocity field of the changing retinal image and its first and second spatial derivatives.

Book: Elements of Photogrammetry, by Paul R. Wolf