284    J. Opt. Soc. Am. A/Vol. 2, No. 2/February 1985

Spatiotemporal energy models for the perception of motion

Edward H. Adelson and James R. Bergen

David Sarnoff Research Center, RCA, Princeton, New Jersey 08540

Received July 9, 1984; accepted October 12, 1984
A motion sequence may be represented as a single pattern in x-y-t space; a velocity of motion corresponds to a three-dimensional orientation in this space. Motion information can be extracted by a system that responds to the oriented spatiotemporal energy. We discuss a class of models for human motion mechanisms in which the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency. The outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy. These responses are then fed into an opponent stage. Energy models can be built from elements that are consistent with known physiology and psychophysics, and they permit a qualitative understanding of a variety of motion phenomena.
1. INTRODUCTION
When we watch a movie, we see a sequence of images in which objects appear at a sequence of positions. Although each frame represents a frozen instant of time, the movie gives us a convincing impression of motion. Somehow the visual system interprets the succession of still images so as to arrive at a perception of a continuously moving scene. This phenomenon represents one form of apparent motion.

How is it that we see apparent motion? One possibility is that our visual system matches up corresponding points in succeeding frames and calculates an inferred velocity based on the distance traveled over the frame interval. Much research on apparent motion has taken the establishment of this correspondence to be the fundamental problem to be solved.1-3 We argue that this correspondence problem can often be bypassed altogether; we take up this argument after discussing various approaches to the problem of motion analysis.
Figure 1a shows a vertical bar, which is presented at a sequence of discrete positions at a sequence of discrete times. In a typical feature-matching model, the visual system is said to (1) find salient features in successive frames; (2) establish a correspondence between them; (3) determine Δx, the distance traveled, and Δt, the time between frames; and, finally, (4) compute the velocity as Δx/Δt. In this example, the features to be matched might be the edges of the bar.

In a typical global matching model, the visual system would perform a match over some large region of the image, in essence performing a template match by sliding the image from one frame to match the image optimally in the next frame. Most cross-correlation models (see, e.g., Lappin and Bell4) are examples of the global matching approach. Once again, Δx and Δt can be determined, and the velocity can be inferred.
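The global matching computation can be sketched in a few lines. The toy example below is illustrative only: the one-dimensional frames, the 0.1-s frame interval, and the helper name estimate_velocity are all assumptions, not taken from the paper. It slides one frame across the next, scores each shift by correlation, and reports the best shift divided by the frame interval, i.e., Δx/Δt.

```python
import numpy as np

def estimate_velocity(frame1, frame2, dt):
    """Toy global matching: slide frame1 over frame2, score each shift by
    correlation, and convert the best-matching shift into a velocity."""
    n = len(frame1)
    best_shift, best_score = 0, -np.inf
    for shift in range(-n // 2, n // 2 + 1):
        score = np.dot(np.roll(frame1, shift), frame2)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift / dt  # velocity = delta_x / delta_t

# A bright bar that moves 3 pixels between frames taken 0.1 s apart.
f1 = np.zeros(64)
f1[20:24] = 1.0
f2 = np.roll(f1, 3)
v = estimate_velocity(f1, f2, dt=0.1)  # 3 pixels / 0.1 s
```

Real cross-correlation models operate on two-dimensional images, and a single global match is exactly what fails when many motions are present at once, as discussed below.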
Matching models are designed to make predictions about stimuli presented as sequences of frames (e.g., movies). Not all stimuli fall naturally into such a description. In an ordinary television, for example, the electron beam illuminates adjacent points in a rapid sequence, sweeping out the even lines of the raster pattern on one field and then returning to fill in the odd lines on the next field (two fields constitute a frame). Should the matching be taken between frames or between fields? For that matter, why should it not be taken between the successively illuminated points themselves? (Note that the motion of the raster itself, which is normally invisible, will become visible if the raster is quite slow.)

Although the answer is not immediately obvious, it is clear that we need to consider the well-known persistence of visual responses (i.e., the temporal filtering imposed by early visual mechanisms) in order to make sense of even the simplest phenomena of apparent motion. The rapidly illuminated points on a television screen are blended together in time, effectively making all the lines of a frame (including both fields) visually present at one time.
One approach to motion modeling, therefore, is to build in a temporal-filtering stage that preprocesses the visual input before it is passed along to the matching system. The resulting model treats the stimulus in both a continuous and a discrete fashion. Filtering is a continuous operation and leads to a continuously varying output, whereas matching is discrete, taking place between images sampled at two particular moments in time. Having been forced to introduce filtering into the model, we would like to make full use of its properties. In fact, filtering can be used to extract the motion information itself, thus rendering the discrete matching stage superfluous.
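The blending produced by visual persistence can be sketched numerically. In the sketch below, the exponential filter shape, the 20-ms time constant, and the 60-Hz field rate are all assumed values chosen for illustration; they are not taken from the paper. A pixel lit only on alternate fields flickers fully on and off, but after temporal low-pass filtering its response barely modulates: the fields are blended together.

```python
import numpy as np

dt = 0.001                               # 1-ms time step
t = np.arange(0.0, 0.1, dt)              # 100 ms of signal
field_rate = 60.0
# Pixel lit only on even fields: a 30-Hz on/off square wave.
stimulus = ((t * field_rate).astype(int) % 2 == 0).astype(float)

tau = 0.02                               # 20-ms persistence (assumed)
kernel = np.exp(-np.arange(0.0, 5 * tau, dt) / tau)
kernel /= kernel.sum()                   # unit-gain low-pass filter
blended = np.convolve(stimulus, kernel)[: len(stimulus)]

depth_in = stimulus.max() - stimulus.min()            # full on/off flicker
depth_out = blended[50:].max() - blended[50:].min()   # after the transient
```

The filtered flicker depth is well under half the input's, which is the sense in which both fields are "visually present at one time."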
There are other reasons for shying away from matching models as they are commonly presented. They can usually make predictions about simple stimuli such as a moving bar, but they may run into trouble when presented with a sequence such as is shown in Fig. 1b. Here, a sequence of vertical random noise patterns is presented. When this sequence is viewed, complex motions are seen, varying from point to point in the image. Different velocities are seen at different positions, and these velocities change rapidly. A feature-matching model has difficulty making predictions because of the familiar problems: What constitutes a feature? What should be matched to what? Most feature-based models are not well enough defined to offer predictions about a stimulus such as that of Fig. 1b. Yet motion is seen, and we would like to believe that this motion percept is generated by the same lawful processes that generate the percept of the moving bar.
Can a global matching model, such as a cross-correlation model, do better? Again, it is hard to know what such a model will predict. Most global matching models have been formulated only to deal with the visibility of single global motions and thus cannot be easily applied to the situation in which many motions are seen at different points in the field.
0740-3232/85/020284-16$02.00 © 1985 Optical Society of America
Fig. 1. a, A sequence of images presented at times t1, t2, and t3 showing a bar moving to the right. b, A sequence of vertical random noise patterns, also shown at three successive instants of time. Motion is seen in each case. The motion percept is simple in a and complex in b, but a motion model should be able to handle both cases.
A number of approaches have recently been developed that can be used with complex inputs such as the dynamic noise of Fig. 1b. Marr and Ullman5 describe a method for extracting the motion of zero crossings in the outputs of linear filters by comparing the sign of the filter output to the sign of its temporal derivative at the zero crossing. A rather different approach has been described by van Santen and Sperling6 in an elaboration of Reichardt's7 model, in which a local correlation (i.e., multiplication) is performed across space and time. In van Santen and Sperling's model, filters tuned for spatial frequency serve as the inputs to the correlator stages. Van Santen and Sperling provide a formal analysis of the model's properties, describe a set of linking assumptions, and show that the model makes correct predictions about a large variety of simple motion displays.
A third approach has been described by Watson and Ahumada8: Motion information is extracted with simple linear filters without a multiplicative stage; the filters are tuned for spatial and temporal frequency as well as velocity, and directional selectivity is achieved by setting up the appropriate phase relationships between an underlying pair of filters. It is notable that this approach achieves directional selectivity without any nonlinearities (although some sort of nonlinearity must, of course, be present at some point for motion detection to occur). Ross and Burr9 have also proposed that the visual system extracts motion information with directionally tuned linear filters. Morgan10 has applied linear-filtering concepts to stroboscopic displays, and Adelson11 has discussed how a number of motion illusions can be understood in terms of mechanisms that respond to the motion energy within particular spatiotemporal-frequency bands.
Although it is not immediately apparent, there are significant formal connections between the linear-filtering approach and the correlational approach of a Reichardt-style model, as has been previously noted.6,12 The topic is taken up in Appendix A; at this point, we simply comment that both types of model can be considered to respond to motion energy within a given spatiotemporal-frequency band (a property that will be discussed at greater length below).
Our interest in this paper is not so much to discuss a particular model as to discuss a general class of models, and not so much to discuss this class as to discuss a general approach to the problem of motion detection. We will consider models closely related to the ones just mentioned: models that are based on a simple low-level analysis of visual information, starting with the outputs of linear filters. This kind of processing is well understood and can be readily applied to any stimulus input. Moreover, it is just the kind of processing that is considered to occur early in the visual pathway, based on a large variety of psychophysical and physiological experiments.13-16
Low-Level Processing in Motion Perception
A low-level approach seems particularly appropriate when one is dealing with motion phenomena that occur with a rapid sequence of presentations. Many investigators have found that these rapid presentations lead to motion percepts that are determined by rather simple low-level properties of the stimuli.
Braddick17 provided evidence for two distinct kinds of motion mechanisms in apparent motion. He called them long-range and short-range mechanisms. The short-range process operates over rather short spatial distances and short time intervals and involves low-level kinds of visual information. The long-range mechanism can operate over large spatial separations and longish time intervals and may involve somewhat higher-level forms of visual information.
Hochberg and Brooks18 also found evidence for two processes in motion perception. They presented a sequence of images containing collections of simple shapes, such as circles, triangles, and squares. Each shape could take one of two motion paths: It could take a short path but change identity (e.g., a triangle could take a short path by turning into a square), or it could take a longer path and retain its identity. At lower presentation rates, the identity of the objects became important, and a triangle would remain a triangle even if it meant taking a longer path. But with rapid presentations, the shorter path length won out, even though it meant abandoning stable object identity.
Sperling19 found that rapid, multiple-presentation motion stimuli gave much more compelling motion than did the slower two-view stimuli of classic apparent-motion experiments. Evidence for a fast, low-level process in motion perception has also been presented by various others.20,21
The models that we develop below are designed to deal with the rapid-presentation situation and are based on the simplest, lowest-level processes that we can use. We will try to avoid the concept of matching altogether.
2. REPRESENTING MOTION IN X-Y-T SPACE
Moving stimuli may be pictured as occupying a three-dimensional space, in which x and y are the two spatial dimensions and t is the temporal dimension. Consider a vertical bar moving continuously to the right, as shown in Fig. 2a. The three-dimensional spatiotemporal diagram is shown in Fig. 2b; the moving bar becomes a slanted slab in this space. If the continuous motion is sampled at discrete times, the result is Fig. 2c, which shows a movie of a moving bar.
In Fig. 3, only the x-t slice of the space is shown (we can ignore the y dimension since a vertical bar is unchanging along the y direction). The moving bar in Fig. 3a becomes a slanted strip. The slant reflects the velocity of the motion. Figure 3b shows the result of sampling the continuous motion. In practice, when one presented the movie corresponding to Fig. 3b, one would leave each frame on for a period of time before replacing it with the next one. Figure 3c shows the spatiotemporal plot of a movie in which each frame lasts almost through the full interval between frames. (In most actual movie projection, a single frame is broken up into several shorter flashes in order to minimize the perception of 24-Hz flicker; for simplicity, we do not consider the case of multiple shuttering here.)
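The x-t representation is easy to construct directly. In the small sketch below (the array sizes, bar width, and speed are arbitrary illustrative choices), drawing a bar at a shifted position in each successive row of an array produces the slanted strip described above, and the strip's slope recovers the velocity.

```python
import numpy as np

nx, nt, width, v = 80, 32, 4, 2        # hypothetical sizes and speed
xt = np.zeros((nt, nx))                # rows are instants of time
for ti in range(nt):
    left = 5 + v * ti                  # the bar's left edge at time ti
    xt[ti, left : left + width] = 1.0  # draw the bar in this row

# The bar traces a slanted strip: its leading edge is a straight line in
# (x, t), and the line's slope in pixels per time step is the velocity.
edges = np.array([np.argmax(row) for row in xt])
slope = edges[1] - edges[0]
```

Printing `xt` (or imaging it) shows the slanted strip directly; a faster bar produces a shallower slant in this row-per-instant convention.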
We know that the sampled motion of Fig. 3c will look similar to the continuous motion of Fig. 3a. Indeed, if the sampling is sufficiently frequent in time, the two stimuli will look identical. Pearson22 has discussed how this may be understood by applying the standard notions of sampling and aliasing to the case of three-dimensional sampling in space and time and considering the spatiotemporal-filtering properties of the human visual system.
The argument, in brief, is this: A continuously moving image has a three-dimensional Fourier spectrum in fx-fy-ft. A sampled version of the display has a different spectrum. The differences between the spectra of the continuous and sampled scenes may be called sampling artifacts (when these artifacts intrude on the spectrum of the original signal, they are known as aliasing components). It is these components that allow an observer to distinguish between a continuous and a sampled display.
The task of a display engineer is therefore to ensure that the artifactual components that are due to sampling are of such low contrast that they are invisible to the human observer. To achieve this goal, it is necessary not to remove the artifactual components altogether but merely to prevent them from reaching threshold visibility. This can be done by appropriately prefiltering, sampling, and postfiltering the moving images.
It is not always easy to assess the visibility of sampling artifacts; one must take into account subthreshold summation between the artifactual components as well as masking by true-image components. However, Watson et al.23 have described a set of conditions under which one may be confident that the artifacts will not be visible. For sufficiently high spatial and temporal frequencies, human contrast sensitivity is zero; that is, components lying outside a certain spatiotemporal-frequency limit (which Watson et al.23 call the window of visibility) cannot be seen regardless of their contrast. If the sampling is sufficiently fine to keep all the spectral energy of the sampling artifacts outside this window, then the artifacts must be invisible.
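A one-dimensional (purely temporal) version of this argument can be checked numerically. In the sketch below, the frame rate, the signal frequency, and the 30-Hz cutoff standing in for the window of visibility are all assumed values. A sample-and-hold "movie" of a slow sinusoid is compared with the continuous signal (delayed by the hold's intrinsic half-frame lag); nearly all the artifact energy then sits near the frame rate and its harmonics, far outside the assumed window.

```python
import numpy as np

fs = 960.0                     # fine time base, Hz (assumed)
t = np.arange(0.0, 1.0, 1 / fs)
f_signal = 4.0                 # temporal frequency of the smooth motion
frame_rate = 60                # movie frame rate
spf = int(fs) // frame_rate    # samples per held frame (16)

continuous = np.sin(2 * np.pi * f_signal * t)
sampled = np.repeat(continuous[::spf], spf)   # sample-and-hold staircase

# Remove the hold's half-frame delay before differencing, so that what
# remains is the sampling artifact proper.
aligned = np.sin(2 * np.pi * f_signal * (t - 0.5 / frame_rate))
artifact = sampled - aligned

power = np.abs(np.fft.rfft(artifact)) ** 2
freqs = np.fft.rfftfreq(len(t), 1 / fs)

# Fraction of artifact energy below the assumed 30-Hz visibility limit:
low_fraction = power[freqs < 30].sum() / power.sum()
```

With these numbers the low-frequency fraction is well under 5%, illustrating why sufficiently fine sampling pushes the artifacts outside the window of visibility.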
Morgan10 has applied frequency-based analyses to the problem of motion interpolation and has described two different approaches. In the first approach, the analysis begins with the extraction of a position signal, i.e., a single number that varies over time. Low-pass filtering is then applied to this signal. Thus the first stage of motion analysis is highly nonlinear (position extraction), and linear filtering follows it. In Morgan's second approach, the filtering is applied directly to the stimulus itself; position is extracted after the filtering has occurred. The present discussion (like that of Pearson and that of Watson et al.) is more closely connected to the second approach than to the first. But one should note that position as such need not be extracted in the computation of motion, as will become clear in what follows.
When temporal sampling is too coarse (as in an old movie), motion tends to look jerky. But motion is still seen. That is, to convey the impression of motion, it is not necessary that a sampled stimulus be indistinguishable from a continuous one. A spatiotemporal-frequency analysis helps one to understand this as well, because a continuous and a sampled stimulus share a great deal of spatiotemporal energy, even if they do not share it all. We can expect the two stimuli to look similar insofar as there are visual mechanisms that respond to the shared energy.
It is sometimes helpful to perform the analysis in the original space-time domain, rather than in the frequency domain.

Fig. 2. a, A picture of a vertical bar moving to the right. b, A spatiotemporal picture of the same stimulus. Time forms the third dimension. c, A spatiotemporal picture of a moving bar sampled in time (i.e., a movie).

Fig. 3. a, An (x, t) plot of a bar moving to the right over time. Time proceeds downward. The vertical dimension is not shown. b, An (x, t) plot of the same bar, sampled in time. c, The sampled motion as displayed in a movie in which each frame remains on until the next one appears. d, Continuous motion after spatiotemporal blurring. e, Sampled motion after spatiotemporal blurring. The middle- and low-frequency information is almost the same for the two stimuli.

Fig. 4. (x, t) plots of moving bars. a, A movie of a bar moving to the right. b, A bar moving to the right continuously. c, The difference (sampling artifacts) between the sampled and continuous motions. d, A movie sampled at a high frame rate. e, Continuous motion. f, The difference between the finely sampled and continuous motion. When the sampling rate is high, the sampling artifacts become difficult or impossible to see.
Figure 4 makes explicit the difference between the sampled and continuous versions of the moving bar. If we simply subtract the continuous pattern (Fig. 4b) from the sampled one (Fig. 4a), we can derive a new spatiotemporal plot of the sampling artifacts, as illustrated in Fig. 4c. Since the difference can be positive or negative, we have displayed it on a gray pedestal, so that gray corresponds to zero, white to positive, and black to negative. Observe that the sampled-motion stimulus of Fig. 4a can be considered to be the sum of the real motion of Fig. 4b and the artifacts of Fig. 4c. That is, we can think of the sampled motion as being continuous motion with sampling noise added to it.
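This decomposition is exact and easy to verify in a toy x-t diagram (the sizes, bar width, and hold interval below are assumed for illustration): the sampled pattern equals the continuous pattern plus a signed artifact image, and the artifact image is zero on the rows where a new frame appears.

```python
import numpy as np

nx, nt, v, hold = 48, 24, 1, 4   # toy sizes; each frame held 4 time steps
x = np.arange(nx)

def bar(left, width=3):
    """One row of an (x, t) diagram: a bright bar with left edge at `left`."""
    return ((x >= left) & (x < left + width)).astype(float)

# Continuous motion: the bar advances v pixels on every fine time step.
continuous = np.stack([bar(2 + v * ti) for ti in range(nt)])

# Movie version: the bar's position is updated only every `hold` steps.
sampled = np.stack([bar(2 + v * (ti - ti % hold)) for ti in range(nt)])

# Signed difference = the sampling artifacts (the gray-pedestal image of
# Fig. 4c: positive where the movie bar is, negative where it should be).
artifacts = sampled - continuous
```

Adding `artifacts` back to `continuous` reproduces `sampled` exactly, which is the sense in which sampled motion is continuous motion plus sampling noise.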
If the motion is sampled more frequently in time, the approximation to continuous motion is improved, as shown in Fig. 4d. In this case, the artifacts (Fig. 4f) have rather little energy in the range of frequencies that we can see. If sampling is made frequent enough, there will plainly come a point at which the artifactual components have so little energy in the visible spatial- and temporal-frequency range that they will become invisible, since the fine spatiotemporal structure of the artifacts will be blurred to invisibility by the spatial and temporal response of the eye. At this point, the continuous and the sampled stimuli will be perfectly indistinguishable.
Again, it is not necessary that the sampled stimulus look identical to the continuous one in order for the motion to look similar. A motion mechanism that responds to low spatial and temporal frequencies will give the same responses to the two stimuli, even if mechanisms sensitive to higher frequencies give different responses.
So far, we have discussed the conditions under which different moving stimuli may be expected to give similar impressions of motion. But we have not discussed how motion information, in itself, might be extracted; this constitutes our next problem.
3. MOTION AS ORIENTATION
Motion can be perceived in continuous or sampled displays, when there is energy of the appropriate spatiotemporal orientation. This is illustrated in Fig. 5, which shows spatiotemporal diagrams of a bar: a, moving quickly to the left; b, moving slowly to the left; c, stationary; d, moving slowly to the right; and e, moving quickly to the right. The velocity varies inversely with the slope. The problem of detecting motion, then, is the problem of detecting spatiotemporal orientation. How can this be done?
We already know a way of detecting orientation in ordinary spatial displays, namely, through the use of oriented receptive fields like those described by Hubel and Wiesel24 and sometimes referred to as bar detectors and edge detectors. Simple cells in visual cortex are now known to act more or less as linear filters: Their receptive-field profiles represent a weighting function, with both positive and negative weights, which may be taken as the spatial impulse response of a linear system.14
If we could construct a cell with a spatiotemporal impulse response that was analogous to a simple cell's spatial impulse response, we would have the situation shown at the bottom of Fig. 5 (cf. Ross and Burr9). The cell's spatiotemporal impulse response is oriented in space and time. In Fig. 5f, it responds well to an edge moving continuously to the right. In Fig. 5g, it responds well to a sampled version of the same stimulus. As far as this hypothetical cell is concerned, both stimuli have substantial rightward-motion energy.

Fig. 5. a-e, (x, t) plots of bars moving to the left or to the right at various speeds (V = -2, -1, 0, 1, 2). f, Motion is like orientation in (x, t), and a spatiotemporally oriented receptive field can be used to detect it. g, The same oriented receptive field can respond to sampled motion just as it responds to continuous motion.
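A hypothetical oriented receptive field of this kind can be written down directly as a Gabor-like weighting function slanted in (x, t); every parameter below (sizes, envelope widths, carrier frequency, preferred velocity) is an illustrative assumption, not a fit to any cell. Correlating it with x-t patterns of bars moving in the two directions shows the direction selectivity described above.

```python
import numpy as np

nx = nt = 21
x = np.arange(nx) - nx // 2              # space, centered
t = (np.arange(nt) - nt // 2)[:, None]   # time runs down the rows
v_pref = 1.0                             # preferred velocity, px per step

# Gabor-like weighting function whose ridges lie along x = v_pref * t,
# i.e., an impulse response oriented in space-time.
envelope = np.exp(-((x - v_pref * t) ** 2) / 18.0 - t ** 2 / 50.0)
rf = envelope * np.cos(0.8 * (x - v_pref * t))

def xt_bar(v):
    """x-t pattern of a thin bar moving at velocity v (px per step)."""
    pattern = np.zeros((nt, nx))
    for ti in range(nt):
        xi = int(round(nx // 2 + v * (ti - nt // 2)))
        if 0 <= xi < nx:
            pattern[ti, xi] = 1.0
    return pattern

# Correlation of the receptive field with rightward vs. leftward motion:
right = float((rf * xt_bar(+1.0)).sum())
left = float((rf * xt_bar(-1.0)).sum())
```

The rightward bar lies along the field's oriented ridge and yields a large response; the leftward bar cuts across the ridges and its contributions largely cancel.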
The models that we will develop will be based on idealized mechanisms; in discussing these mechanisms we will use the terms "unit" and "channel." A unit corresponds roughly to a cell or to a small set of cells working in concert to extract a simple property at one position in the visual field. A channel consists of an array of similar units distributed across the visual field.
In principle, there is no reason why an oriented unit could not be constructed directly. The unit would gather inputs from an array of photoreceptors covering the spatial extent of its receptive field, and it would sum their outputs over time with the appropriate temporal impulse responses. In practice, however, such a unit would be difficult to construct because it would require a different temporal impulse response correctly tailored to each spatial position in the receptive field.
The problem, then, is to construct a unit that responds to spatiotemporal orientation (i.e., motion) and yet that is built out of simple neural mechanisms. In Section 4, we will discuss how such a unit can be built by combining impulse responses that are space-time separable, by using an approach similar to that of Watson and Ahumada.8 For those readers who are not entirely comfortable with these notions, we begin by reviewing space-time separability as well as spatiotemporal impulse responses.
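The space-time-separable construction just mentioned can be previewed in a few lines. In this sketch (Gabor-style profiles with assumed widths and frequencies), the sum of two separable products, an even spatial profile times an even temporal one plus their odd (quadrature) partners, collapses via the identity cos(a)cos(b) + sin(a)sin(b) = cos(a - b) into a single weighting function oriented along x = t.

```python
import numpy as np

x = np.linspace(-3, 3, 61)          # space
t = np.linspace(-3, 3, 61)[:, None]  # time, down the rows

env_x = np.exp(-x ** 2)              # Gaussian envelopes (assumed widths)
env_t = np.exp(-t ** 2)
f1, f2 = env_x * np.cos(2 * x), env_x * np.sin(2 * x)  # even, odd in x
g1, g2 = env_t * np.cos(2 * t), env_t * np.sin(2 * t)  # even, odd in t

# Each term f_i(x) * g_i(t) is space-time separable, yet their sum is an
# oriented receptive field: the cosine terms combine into cos(2x - 2t),
# whose ridges lie along the line x = t in space-time.
oriented = f1 * g1 + f2 * g2
target = np.exp(-x ** 2 - t ** 2) * np.cos(2 * (x - t))
```

Each separable ingredient needs only a single temporal impulse response applied uniformly across space, which is what makes this construction neurally plausible where the directly tailored unit above is not.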
4. SPATIOTEMPORAL IMPULSE RESPONSES
Many cells in the visual system respond (to a good approximation) by performing a weighted integration of the effect of light falling on their receptive field; the receptive-field profile, with its positive and negative lobes, defines the weighting function, or spatial impulse response. Across the top of Fig. 6 is an idealized spatial impulse response from such a cell. Since any spatial pattern can be thought of as a sum of points of light of various intensities packed together side by side, one can easily predict the response of a linear unit to an arbitrary
References

Handbook of Sensory Physiology
Receptive fields of single neurones in the cat's striate cortex
Application of Fourier analysis to the visibility of gratings
The contrast sensitivity of retinal ganglion cells of the cat
The Interpretation of Visual Motion