International Journal of Computer Vision, 2, 51-76 (1988)
© 1988 Kluwer Academic Publishers. Manufactured in The Netherlands
Direct Methods for Recovering Motion*

BERTHOLD K.P. HORN AND E.J. WELDON JR.
Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822
Abstract

We have developed direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known. Some of these methods are based on the minimization of the difference between the observed time derivative of brightness and that predicted from the spatial brightness gradient, given the estimated motion. We minimize the integral of the square of this difference taken over the image region of interest. Other methods presented here exploit the fact that surfaces have to be in front of the observer in order to be seen.

We do not establish point correspondences, nor do we estimate the optical flow. We use only first-order derivatives of the image brightness, and we do not assume an analytic form for the surface. We show that the field of view should be large to accurately recover the components of motion in the direction toward the image region. We also demonstrate the importance of points where the time derivative of brightness is small and discuss difficulties resulting from very large depth ranges. We emphasize the need for adequate filtering of the image data before sampling to avoid aliasing, in both the spatial and temporal dimensions.
1. Introduction
In this paper we consider the problem of determining the motion of a monocular observer moving with respect to a rigid, unknown world. We use a least-squares, as opposed to a discrete, method of solving for the motion parameters; our method uses all of the points in a two-image sequence and does not attempt to establish correspondence between the images. Hence the method is relatively robust to quantization error, noise, illumination gradients, and other effects.

So far, we can determine the observer motion in two special cases:
*This research was supported by the National Science Foundation under Grant No. DMC85-11966. Additional support was provided by NASA (Grant No. GSFC 5-1162) and by the Veterans Administration. BKPH on leave from the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.
• when the motion is pure rotation,
• when the motion is pure translation or when the rotational component of the motion is known.
At this writing we have not developed a direct method that is applicable to arbitrary motion.
1.1 Earlier Work
In the continuous or least-squares approach to motion vision, motion parameters are found that are consistent with the observed motion of the entire image. Bruss and Horn [1] use this approach to calculate motion parameters assuming that the optical flow is known at each point. Adiv [2] uses the approach of Bruss and Horn to segment the scene into independently moving planar objects; he shows that given the optical flow, segmentation can be performed and the motion calculated. Negahdaripour and Horn [3] eschew the use of optical flow and calculate the observer's motion
directly from the spatial and temporal derivatives of the image brightness, assuming a planar world. The advantage of this direct approach, which we also use here, is that certain computational difficulties inherent in the calculation of optical flow are avoided. In particular, it is not necessary to make the usual assumption that the optical flow field is smooth, an assumption that is violated near object boundaries, necessitating flow segmentation.
Waxman and Ullman [4] and Waxman and Wohn [5] also avoid the discrete approach to motion vision; their techniques make use of first and second derivatives of the optical flow to compute both the motion parameters and the structure of the imaged world. In the interests of developing methods that can be implemented, the techniques presented in this paper avoid the use of second- and higher-order derivatives.
1.2 Summary of the Paper
One of our approaches to the motion vision problem can be summarized as follows: Given the observer motion and the spatial brightness function of the image one can predict the time derivative of brightness at each point in the image. We find the motion that minimizes the integral of the square of the difference between this predicted value and the observed time derivative. The integral is taken over the image region of interest, which, in the discussion here, is usually taken to be the whole image.
We use auxiliary vectors derived from the derivatives of brightness and the image position that occur in the basic brightness change constraint equation. Study of the distribution of the directions of these vectors on the unit sphere suggests specific algorithms and also helps uncover relationships between accuracy and parameters of the imaging situation.
We have developed a simple, robust algorithm for recovering the angular velocity vector in the case of pure rotation. This algorithm involves solving three linear equations in the three unknown components of the rotation vector. The coefficients of the equations are moments of components of one of the auxiliary vectors over the given image region. We show that the accuracy of the recovered component of rotation about the direction toward the image region is poor relative to the other components, unless the image region subtends a substantial solid angle.
We have developed several algorithms for recovering the translational velocity in the case of pure translation. These algorithms exploit the constraint that objects have to be in front of the camera in order to be imaged. This constraint leads to a nonlinear constrained optimization problem. The performance of these algorithms depends on a number of factors, including:
• the angle subtended by the image, i.e., the field of view,
• the direction of motion relative to the optical axis,
• the depth range,
• the distribution of brightness gradients,
• the noise in the estimated time derivative of brightness,
• the noise in the estimated spatial gradient of brightness, and
• the number of picture cells considered.
We have not yet been able to select a "best" algorithm from the set developed, since one may be more accurate under one set of circumstances while another is better in a different situation. Also, the better algorithms tend to require more computation, and some do not lend themselves to parallel implementation. Further study using real image data will be needed to determine the range of applicability of each algorithm.
We found a strong dependence of the accuracy of recovery of certain components of the motion on the size of the field of view. This is in concert with other reports describing difficulties with small fields of view, such as references [?] and [5].
1.3 Comments on Sampling, Filtering, and Aliasing
Work with real image data has demonstrated the need to take care in filtering and sampling. The estimates of the spatial gradient and the time derivative are sensitive to aliasing effects resulting from inadequate low-pass filtering before sampling. This
is easily overlooked, particularly in the time direction. It is usually a mistake, for example, to simply pick every nth frame out of an image sequence. At the very least, n consecutive frames should be averaged before sampling in order to reduce the high-frequency components. One may object to the "smearing" introduced by this technique, but a series of widely separated snapshots typically does not obey the conditions of the sampling theorem, and as a result the estimates of the derivatives may contain large errors.

This, of course, is nothing new, since the same considerations apply when one tries to estimate the optical flow using first derivatives of image brightness (Horn and Schunck [6]). It is important to remember that the filtering must be applied before sampling; once the data has been sampled, the damage has been done.
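To make this recommendation concrete, here is a minimal sketch (not from the paper) that averages n consecutive frames before picking one frame per group; the box filter used here is simply the crudest adequate choice, and any temporal low-pass filter applied before subsampling would serve the same purpose:

    import numpy as np

    def temporally_subsample(frames, n):
        """Average every n consecutive frames before subsampling in time.
        `frames` is a sequence of 2-D arrays.  A simple box filter is used
        purely for illustration; the point, as discussed in section 1.3, is
        that the low-pass filtering happens *before* the data is sampled."""
        frames = np.asarray(frames, dtype=float)
        usable = (len(frames) // n) * n          # drop the incomplete tail group
        groups = frames[:usable].reshape(-1, n, *frames.shape[1:])
        return groups.mean(axis=1)               # one low-passed frame per group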
2 The Brightness-Change Constraint Equation
Following Longuet-Higgins and Prazdny [7] and Bruss and Horn [1] we use a viewer-based coordinate system. Figure 1 depicts the system under consideration. A world point

    R = (X, Y, Z)^T          (1)

is imaged at

    r = (x, y, 1)^T          (2)
Fig. 1. The viewer-centered coordinate system. The translational velocity of the camera is t = (U, V, W)^T, while the rotational component is ω = (A, B, C)^T.
That is, the image plane has equation Z = 1. The origin is at the projection center and the Z-axis runs along the optical axis. The X- and Y-axes are parallel to the x- and y-axes of the image plane. Image coordinates are measured relative to the principal point, the point (0, 0, 1)^T where the optical axis pierces the image plane. The points r and R are related by the perspective projection equation

    r = (x, y, 1)^T = R / (R · ẑ)          (3)

with

    Z = R · ẑ          (4)

and where ẑ denotes the unit vector in the Z direction.
Suppose the observer moves with instantaneous translational velocity t = (U, V, W)^T and instantaneous rotational velocity ω = (A, B, C)^T relative to a fixed environment; then the time derivative of the vector R can be written as

    R_t = -t - ω × R          (5)
The motion of the world point R results in motion of the corresponding image point; the value of this motion field is given by

    r_t = d/dt ( R / (R · ẑ) ) = [ R_t (R · ẑ) - (R_t · ẑ) R ] / (R · ẑ)^2          (6)
This can also be expressed as

    r_t = [ ẑ × (R_t × r) ] / (R · ẑ)          (7)

since a × (b × c) = (c · a) b - (a · b) c.
Substituting equation (5) into this result gives (see Negahdaripour and Horn [3]):

    r_t = -ẑ × { r × [ r × ω - (1 / (R · ẑ)) t ] }          (8)

In component form this can be expressed as
    r_t = ( (-U + xW)/Z + Axy - B(x^2 + 1) + Cy,
            (-V + yW)/Z - Bxy + A(y^2 + 1) - Cx,
            0 )^T          (9)

a result first obtained by Longuet-Higgins and Prazdny [7].
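As an illustrative aside (not part of the paper), equation (9) can be evaluated directly at a picture cell; the sketch below assumes a focal length of one and image coordinates measured from the principal point:

    def motion_field(x, y, Z, t, w):
        """Evaluate the motion field of equation (9) at image point (x, y)
        with depth Z, translation t = (U, V, W) and rotation w = (A, B, C)."""
        U, V, W = t
        A, B, C = w
        u = (-U + x * W) / Z + A * x * y - B * (x * x + 1) + C * y
        v = (-V + y * W) / Z - B * x * y + A * (y * y + 1) - C * x
        return (u, v)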
This shows how, given the world motion, the motion field can be calculated for every image point. If we assume that the brightness of a small surface patch is not changed by motion, then expansion of the total derivative of brightness E leads to

    (∂E/∂x)(dx/dt) + (∂E/∂y)(dy/dt) + ∂E/∂t = 0          (10)

(The applicability of the constant brightness assumption is discussed in Appendix A.) Denoting the vector (∂E/∂x, ∂E/∂y, 0)^T = (E_x, E_y, 0)^T by E_r and ∂E/∂t by E_t permits us to express this result more compactly in the form

    E_r · r_t + E_t = 0          (11)
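In practice E_r and E_t must be estimated from sampled image data. The sketch below uses one common estimator, first differences averaged over a 2x2x2 cube of samples in the spirit of Horn and Schunck [6]; the paper itself does not prescribe a particular estimator, so this is only an illustration:

    import numpy as np

    def brightness_derivatives(E0, E1):
        """Estimate Ex, Ey, Et from two consecutive frames E0 and E1 (2-D
        arrays) by averaging first differences over a 2x2x2 cube of samples
        (cf. Horn and Schunck [6]).  The returned arrays are one sample
        smaller than the input in each image dimension."""
        def corners(E):
            return E[:-1, :-1], E[:-1, 1:], E[1:, :-1], E[1:, 1:]
        a0, b0, c0, d0 = corners(E0)
        a1, b1, c1, d1 = corners(E1)
        Ex = 0.25 * ((b0 - a0) + (d0 - c0) + (b1 - a1) + (d1 - c1))   # d/dx (columns)
        Ey = 0.25 * ((c0 - a0) + (d0 - b0) + (c1 - a1) + (d1 - b1))   # d/dy (rows)
        Et = 0.25 * ((a1 - a0) + (b1 - b0) + (c1 - c0) + (d1 - d0))   # d/dt
        return Ex, Ey, Et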
Substituting equation (8) into equation (11) and rearranging gives

    E_t - { [ (E_r × ẑ) × r ] × r } · ω + (1 / (R · ẑ)) [ (E_r × ẑ) × r ] · t = 0          (12)
To simplify this expression we let

    s = (E_r × ẑ) × r          (13)

and

    v = -s × r          (14)

so equation (12) reduces to the brightness change constraint equation of Negahdaripour and Horn [3], namely

    v · ω + (1 / (R · ẑ)) s · t = -E_t          (15)
The vectors s and v can be expressed in component form as

    s = ( -E_x,
          -E_y,
          xE_x + yE_y )^T

and

    v = (  E_y + y(xE_x + yE_y),
          -E_x - x(xE_x + yE_y),
           yE_x - xE_y )^T          (16)
Note that s · r = 0, v · r = 0, and s · v = 0. These three vectors thus form an orthogonal triad. The vectors s and v are inherent properties of the image. Note that the projection of s into the image plane is just the (negative) gradient of the image. Also, the quantity s indicates the directions in which translation of a given magnitude will contribute maximally to the temporal brightness change of a given picture cell. The quantity v plays a similar role for rotation.
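As an illustration (not from the paper), the auxiliary vectors can be computed at every picture cell directly from equation (16); the identities s · r = v · r = s · v = 0 provide a convenient numerical check:

    import numpy as np

    def auxiliary_vectors(Ex, Ey, x, y):
        """Compute the auxiliary vectors s and v of equation (16) at every
        picture cell.  Ex, Ey are the spatial brightness derivatives and x, y
        the image coordinates (focal length 1, origin at the principal
        point).  Arrays of shape (..., 3) are returned."""
        g = x * Ex + y * Ey                       # the term x*Ex + y*Ey
        s = np.stack([-Ex, -Ey, g], axis=-1)
        v = np.stack([Ey + y * g, -Ex - x * g, y * Ex - x * Ey], axis=-1)
        return s, v

    # Numerical check of the orthogonal triad at one cell, e.g.:
    #   r = np.array([x, y, 1.0]);  abs(np.dot(s_cell, r)) should be ~0.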
3 Solving the Brightness Change Constraint Equation
Equation (15) relates the observer motion (t, ω), the depth of the world R · ẑ = Z(x, y), and certain measurable quantities of the image (s, v).¹ In general, it is not possible to solve for the first two of these given the last. Some interesting special cases are addressed in this paper and in Negahdaripour and Horn [3]; these are:

i. Known depth: In section 3.1 we show that given Z, s, and v, the quantities t and ω can be calculated in closed form using a least-squares method.
ii. Pure rotation (||t|| = 0): In section 3.2 we show that given v, the rotation vector ω can be calculated in closed form.
iii. Pure translation or known rotation: In section 3.3 we present a least-squares method for determining t. Once t is known, the brightness change constraint equation can be used to find the depth at each picture cell:

    Z = R · ẑ = -s · t / (E_t + v · ω)          (17)

(An illustrative numerical sketch of this depth recovery is given below, after the list.)
iv. Planar world: Negahdaripour and Horn [3] present a closed-form solution for t, ω, and the normal n of the world plane.

v. Quadratic patches: Negahdaripour [8] gives a closed-form solution in the case that a portion of the world can be represented as a quadratic form.

¹We do not discuss here related methods using optical flow, such as those of Bruss and Horn [1].
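The following sketch (an illustration under the assumptions above, not one of the paper's algorithms) applies equation (17) per picture cell once t and ω are known; cells where the denominator E_t + v · ω is near zero carry little depth information and are masked here by an ad hoc threshold:

    import numpy as np

    def depth_from_motion(Ex, Ey, Et, x, y, t, w, eps=1e-6):
        """Recover depth Z = -s.t / (Et + v.w), equation (17), at every
        picture cell from the brightness derivatives and a known motion
        (t, w).  Cells where the denominator is tiny are returned as NaN;
        that threshold is an illustrative choice, not a prescription of the
        paper."""
        g = x * Ex + y * Ey
        s = np.stack([-Ex, -Ey, g], axis=-1)                           # equation (16)
        v = np.stack([Ey + y * g, -Ex - x * g, y * Ex - x * Ey], axis=-1)
        num = -(s @ np.asarray(t, dtype=float))
        den = Et + (v @ np.asarray(w, dtype=float))
        return np.where(np.abs(den) > eps, num / den, np.nan)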
In this paper we consider various integrals over an image region thought to correspond to a single rigid object in motion relative to the viewer. In the simplest case, the observer is moving relative to a static environment and the whole image can be used. The size of the field of view has a strong effect on the accuracy of the determination of the components of motion along the optical axis. When we need to estimate this accuracy, we will, for convenience, assume a circular image of radius r_v. This corresponds to a conical field of view with half angle θ_v, where r_v = tan θ_v, since we have assumed that the focal length equals one. (We assume that 0 < θ_v < π/2.)

We will show that the field of view should be large. Although orthographic projection usually simplifies machine vision problems, this is one case in which we welcome the effects of perspective "distortion"!
3.1 Depth Known
When depth is known, it is straightforward to recover the motion. (Depth may have been obtained using a binocular stereo system or some kind of range finder.) We cannot, in general, find a motion to satisfy the brightness change constraint equation at every picture cell, because of noise in the measurements. Instead we minimize

    ∬ [ E_t + v · ω + (1/Z) s · t ]^2 dx dy          (18)

Differentiating with respect to ω and t and setting the results equal to zero leads to the pair of vector equations:
    [ ∬ (1/Z)^2 s s^T dx dy ] t + [ ∬ (1/Z) s v^T dx dy ] ω = - ∬ E_t (1/Z) s dx dy

    [ ∬ (1/Z) v s^T dx dy ] t + [ ∬ v v^T dx dy ] ω = - ∬ E_t v dx dy          (19)
This is a set of six linear equations in six unknowns with a symmetric coefficient matrix. (The equations can be solved by partitioning in order to reduce the computational effort.) The coefficients are all integrals of products of components of (1/Z)s and v. It may be useful to note that

    trace(s v^T) = trace(v s^T) = s · v = 0          (20)
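By way of illustration (a sketch, not the paper's implementation), the system of equation (19) can be assembled by approximating the integrals with sums of per-cell outer products; note that setting t = 0, so that the depth-dependent term vanishes, reduces it to the three-equation pure-rotation system mentioned in section 1.2:

    import numpy as np

    def motion_from_depth(Ex, Ey, Et, x, y, Z):
        """Assemble and solve the 6x6 linear system of equation (19) for
        (t, w), approximating the integrals by sums over picture cells.
        Inputs are arrays over the image region; Z is the known depth.
        A sketch only: weighting and conditioning are not addressed."""
        g = x * Ex + y * Ey
        s = np.stack([-Ex, -Ey, g], axis=-1).reshape(-1, 3)            # equation (16)
        v = np.stack([Ey + y * g, -Ex - x * g,
                      y * Ex - x * Ey], axis=-1).reshape(-1, 3)
        sz = s / np.asarray(Z, dtype=float).reshape(-1, 1)             # (1/Z) s
        et = Et.reshape(-1)
        A = np.block([[sz.T @ sz, sz.T @ v],
                      [v.T @ sz,  v.T @ v]])                           # symmetric 6x6
        b = -np.concatenate([sz.T @ et, v.T @ et])
        tw = np.linalg.solve(A, b)
        return tw[:3], tw[3:]                                          # t, w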
We could have obtained slightly different equations for ω and t if we had chosen to weight the integrand in equation (18) differently. We study the special case in which ||t|| = 0 and the special case in which ||ω|| = 0 later.
One application of the above result is to "dynamic stereo." A binocular stereo system can provide disparity estimates from which 1/Z can be calculated. The above equations can then be used to solve for the motion, provided estimates of the derivatives of image brightness are also supplied. The correspondence problem of binocular stereo has, unfortunately, been found to be a difficult one. It would represent the major computational burden in a dynamic stereo system.

We hope that motion vision research will eventually lead to simpler methods for recovering depth than those used for binocular stereo, although they are likely to be relatively inaccurate when based only on instantaneous translational and rotational velocity estimates.