
A Biologically Inspired Model for the Detection
of External and Internal Head Motions
Stephan Tschechne, Georg Layher, and Heiko Neumann
Institute of Neural Information Processing, University of Ulm,
89069 Ulm, Germany
http://www.uni-ulm.de/in/neuroinformatik.html
Abstract. Non-verbal communication signals are to a large part conveyed by visual motion information of the user's facial components (intrinsic motion) and head (extrinsic motion). An observer perceives the visual flow as a superposition of both types of motions. However, when visual signals are used for training of classifiers for non-articulated communication signals, a decomposition is advantageous. We propose a biologically inspired model that builds upon the known functional organization of cortical motion processing at low and intermediate stages to decompose the composite motion signal. The approach extends previous models to incorporate mechanisms that represent motion gradients and direction sensitivity. The neural models operate on larger spatial scales to capture properties in flow patterns elicited by turning head movements. Center-surround mechanisms build contrast-sensitive cells and detect local facial motion. The model is probed with video samples and outputs occurrences and magnitudes of extrinsic and intrinsic motion patterns.
Keywords: Social Interaction, HCI, Motion Patterns, Cortical Processing
1 Introduction
The recent development of computers shows a clear trend towards companion properties [4, 14]. Systems are expected to improve the efficiency of human-computer interaction (HCI) by adapting to the user's needs, experience, and mood. To achieve this, such systems must be able to interpret the non-verbal communication patterns signalled by the user [2, 5]. These patterns are to a large part contained in visual signals that can be captured by an interaction system and from which they can be derived automatically [9]. Among these visual patterns, bodily and facial expressions contain important cues to emotionally relevant information, and both types can be of either static or dynamic nature. Optic flow has been investigated for emotion and facial analysis earlier [13, 6, 11, 8].
However, when it comes to the analysis of facial motion, extrinsic (head) and intrinsic (facial) movements are superimposed and elicit a composite flow field of respective patterns from the observer's perspective (see Fig. 1). Subsequent classification stages profit from a segregation of these patterns. In [13], an attempt to decompose the facial optical flow into affine as well as higher-order flow patterns in order to segregate the facial motion has been proposed. In [3], head pose and facial expressions are estimated for graphics and animation. Here, an affine deformation with parallax is modelled to fit active contours using singular value decomposition. In [1], the authors propose a multi-stage landmark fitting and tracking method to derive face and head gestures.
In this paper we propose a novel mechanism to detect occurrences of basic components of motion from optic flow. The method is studied using the example of head movements and dynamic facial expressions, both of which cause optic flow at the observer position. We model mechanisms of signal processing in early and intermediate stages of visual cortex to provide a robust automatic decomposition of extrinsic and intrinsic facial motions. We demonstrate how occurrences of extrinsic as well as intrinsic components are robustly derived from an optic flow field. Our approach contrasts with others in that it requires neither face detection nor a previous learning phase. Additionally, processing of multiple persons comes at no extra cost.
Fig. 1. Visual flow at the observer position is a superposition (right) of extrinsic (head, left) and intrinsic (facial, middle) motion. Subsequent processing benefits from separated processing of both sources.
2 Visual Representations of Head Movements
The instantaneous motion of a three-dimensional surface that is sensed by a stationary observer can be represented by the linear combination of its translational and rotational motion components [10], as well as non-rigid motion caused by object deformations. Any of these cause visual motion in the image plane. In this paper we focus on the analysis of facial and head motion of an agent in a communicative act by means of optic flow. This motion is composed of the extrinsic (head) motion caused by translations and rotations and the superimposed internal motion of facial components (intrinsic motion). We assume that any translational extrinsic motion of the head has been compensated to fixate the head in the middle of the image so that the world coordinate system is centered in the moving head.

We aim at spatial processing of the resulting patterns to individually detect the extrinsic and intrinsic components. For a rotation of a simple head model around the Y-axis, an arbitrary surface point P = (x, y, z)^T appears on a rotational path in space (see Fig. 2) with frequency ω and periodic length T. P is uniquely defined by its radius r and vertical component y at time t, when a distance d to the observer is assumed:

$$P(t, r, y) = r \cdot \begin{pmatrix} \cos\omega t \\ y \\ \sin\omega t \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ d \end{pmatrix}, \qquad \omega = \frac{2\pi}{T} \qquad (1)$$
In the following we assume that y = 0 and r = 1. Application of the projection model with x = f · X/Z and y = f · Y/Z, given the focal length f of the camera, yields the projected coordinates $P_{proj}$ in the image plane for the observer:

$$P_{proj}(t, r) = f \cdot \begin{pmatrix} \dfrac{r\cos\omega t}{r\sin\omega t - d} \\[4pt] 0 \end{pmatrix} \qquad (2)$$
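To make the projection model concrete, the following minimal sketch evaluates Eqs. (1) and (2) for a point on the rotating head. The use of NumPy and the parameter values (r = 1, d = 2, f = 1, T = 1) are illustrative assumptions, not values given in the paper.

```python
import numpy as np

def point_on_head(t, r=1.0, y=0.0, d=2.0, T=1.0):
    """3D position of a surface point rotating around the Y-axis, cf. Eq. (1)."""
    omega = 2.0 * np.pi / T
    return np.array([r * np.cos(omega * t),
                     r * y,
                     r * np.sin(omega * t) - d])

def project_x(t, r=1.0, d=2.0, f=1.0, T=1.0):
    """Projected x-coordinate of that point in the image plane, cf. Eq. (2)."""
    omega = 2.0 * np.pi / T
    return f * r * np.cos(omega * t) / (r * np.sin(omega * t) - d)
```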
If we assume constant angular speed, the apparent image speed of the projection of P is

$$\frac{\partial P_{proj}(t, r)}{\partial t} = \begin{pmatrix} \dfrac{r\sin\omega t + d - 1}{r^2\sin^2\omega t - 2dr\sin\omega t + d^2} \\[4pt] 0 \end{pmatrix} \qquad (3)$$
which yields a characteristic motion pattern, see Fig. 2. The apparent motion on the positive half circle, where the facial surface is oriented towards the observer, leads to a generic speed pattern. For a frontal motion from right to left the pattern is composed of an acceleration, followed by maximum frontal motion, and a symmetric deceleration patch. This pattern corresponds to the speed gradients investigated by [12] and is also depicted in Fig. 2. For symmetric and bounded objects, image patches of increasing and decreasing speed are juxtaposed, reflecting the appearance and disappearance of surface markings on a convex rotating body. In Sec. 2.1 we suggest a filtering mechanism tuned to such symmetric arrangements of image motions with symmetric speed changes.
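The speed profile and speed gradient sketched in Fig. 2 can be reproduced numerically by differentiating the projected position over one rotation period. The sketch below uses the same illustrative parameter assumptions as above and numerical differentiation in place of the closed form of Eq. (3).

```python
import numpy as np

# Sample one rotation period and differentiate the projected x-position
# numerically (illustrative parameters: T = 1, r = 1, d = 2, f = 1).
T, r, d, f = 1.0, 1.0, 2.0, 1.0
omega = 2.0 * np.pi / T
t = np.linspace(0.0, T, 1000)

x_proj = f * r * np.cos(omega * t) / (r * np.sin(omega * t) - d)
speed = np.gradient(x_proj, t)                   # apparent image speed, cf. Eq. (3)
speed_gradient = np.gradient(np.abs(speed), t)   # acceleration/deceleration patches

# On the half circle facing the observer the gradient changes sign once:
# an accelerating patch is juxtaposed with a decelerating one, which is
# the arrangement the gradient cells of Sec. 2.1 are tuned to.
```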
Facial motion, on the other hand, is caused by actions of facial muscles, e.g. during verbal communication, eye blinks, or while forming facial expressions. These spatio-temporal changes occur as deformations on a smaller temporal and spatial scale compared to the size of the face and are characterized by changes in motion direction and/or speed relative to the coherent surrounding motion. In order to analyze such intrinsic facial motions, we reasoned that a center-surround mechanism for the filtering of motion patterns within the facial region will indicate occurrences of intrinsic motions, see Sec. 2.2.
2.1 A Model of Cortical Motion Gradient Detection
In the following we describe the implementation details of our model cells for detecting motion patterns that are characteristic of extrinsic motions corresponding to rotations around the X- or Y-axis, respectively, namely patterns containing speed gradients.

Fig. 2. Left: Projection model of one point on a head's surface and its trajectory in the projection when the head is rotating. Middle: X-position over rotation angle of point P. Right: One-periodical plot of speed over rotation angle (dashed) and speed gradient (solid) of a point when rotating on a circular path around the Y-axis. Point P is closest at position 1.0. Due to foreshortening effects, characteristic speed gradients occur where the point approaches or retreats.
All presented detectors need a visual motion field u which is transformed into a log-polar representation, the velocity space V. In velocity space, motion is separably represented by speed $s_p = \log \|u\|$ and direction $\tau_p = \tan^{-1}(u_y / u_x)$. This representation allows selecting image locations containing certain motion directions, speeds, or both, which will be fundamental features for the upcoming design of gradient cells. Filters for speed ψ and direction θ with tuning widths σ are defined by

$$F_{S}(\mu, \nu) = \exp\left(-\frac{(\mu - \psi)^2}{2\sigma^2_{\psi}}\right) \qquad (4)$$

$$F_{\theta}(\mu, \nu) = \exp\left(-\frac{(\nu - \log(\theta))^2}{2\sigma^2_{\theta}}\right) \qquad \forall (\mu, \nu) \in V. \qquad (5)$$
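As an illustration of how such a velocity representation and its tuning filters could be realised, the sketch below converts a dense flow field into log-speed and direction maps and applies Gaussian tunings in the spirit of Eqs. (4) and (5). The array layout, the wrapped angular distance (used here in place of the log mapping of Eq. (5)), and the tuning widths are our assumptions.

```python
import numpy as np

def velocity_space(u):
    """Log-polar representation of a flow field u with shape (H, W, 2)."""
    speed = np.log(np.linalg.norm(u, axis=-1) + 1e-6)    # s_p = log ||u||
    direction = np.arctan2(u[..., 1], u[..., 0])          # tau_p = atan2(u_y, u_x)
    return speed, direction

def tuned_response(speed, direction, psi, theta,
                   sigma_psi=0.5, sigma_theta=0.3):
    """Conjunctive speed/direction tuning, cf. Eqs. (4)-(5)."""
    f_speed = np.exp(-(speed - psi) ** 2 / (2.0 * sigma_psi ** 2))
    d_diff = np.angle(np.exp(1j * (direction - theta)))   # wrapped to [-pi, pi]
    f_dir = np.exp(-d_diff ** 2 / (2.0 * sigma_theta ** 2))
    return f_speed * f_dir                                 # subcell response M_p
```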
Gradient cells $M^{+/-}_p$ at image position p respond when two conditions are satisfied, see Fig. 3: first, conjunctive input configurations need to match their tunings for speed and direction, and second, an increase or decrease in speed must exist along the axis corresponding to their directional preference. This increase or decrease is reflected in a simultaneous stimulation of two speed- and direction-tuned cells that are spatially arranged to build the desired speed gradient. The speed- and direction-tuned subcells $M_p$ are derived from the given motion field by applying a filter F in the velocity space representation. Each cell response incorporates a divisive normalisation component in order to keep responses within bounds.

$$M_p = V_p \cdot F \qquad (6)$$

$$M^{+}_p = M_{p-\Delta p} \cdot M_{p+\Delta p} \cdot \left(\epsilon + M_{p-\Delta p} + M_{p+\Delta p}\right)^{-1} \qquad (0 < \epsilon \ll 1) \qquad (7)$$
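A possible discretisation of the gradient cell in Eq. (7) is sketched below: two subcell maps, one tuned to a lower and one to a higher preferred speed, are sampled at opposite spatial offsets along the preferred direction, AND-gated by multiplication, and divisively normalised. Assigning the low-speed tuning to the offset p − ∆p is our reading of the text; the offset size and the small constant ε are assumptions.

```python
import numpy as np

def gradient_cell(M_low, M_high, delta=3, axis=1, eps=1e-3):
    """Speed-gradient cell M+_p, cf. Eq. (7).

    M_low, M_high: 2D response maps of subcells tuned to a lower and a
    higher preferred speed (same direction tuning). The maps are shifted
    by +/- delta pixels along the preferred direction so that a speed
    increase along that axis drives the cell."""
    a = np.roll(M_low, delta, axis=axis)      # subcell response at p - delta_p
    b = np.roll(M_high, -delta, axis=axis)    # subcell response at p + delta_p
    return (a * b) / (eps + a + b)            # AND-gating + divisive normalisation
```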

Responses to flow gradients of opposite polarity are subsequently combined
by AND-gating to build a curvature detector. These cells operate at spatially
juxtaposed locations with component offsets along their directional preference
depending on the spatial scale of the filter kernels.
$$C_p = M^{+}_{p+\Delta p} \cdot M^{-}_{p-\Delta p} \cdot \left(\epsilon + M^{+}_{p+\Delta p} + M^{-}_{p-\Delta p}\right)^{-1} \qquad (8)$$
In order to make the response more robust and selective to motion direction, this curvature response is combined with the output of a motion integration cell $M_p$. The final response is thus defined by

$$R_p = M^{+}_{p+\Delta p} \cdot M_p \cdot M^{-}_{p-\Delta p} \cdot \left(\epsilon + M^{+}_{p+\Delta p} + M_p + M^{-}_{p-\Delta p}\right)^{-1} \qquad (9)$$
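The same AND-gating scheme extends from the curvature detector of Eq. (8) to the final response of Eq. (9); a sketch of the latter, with the spatial offset and ε again chosen as illustrative assumptions, could look as follows.

```python
import numpy as np

def rotation_response(M_plus, M_minus, M_int, delta=3, axis=1, eps=1e-3):
    """Rotation-pattern detector R_p, cf. Eq. (9): oppositely tuned gradient
    cells at juxtaposed offsets are combined with a motion integration cell
    and divisively normalised."""
    a = np.roll(M_plus, -delta, axis=axis)    # M+ gradient cell at p + delta_p
    b = np.roll(M_minus, delta, axis=axis)    # M- gradient cell at p - delta_p
    return (a * M_int * b) / (eps + a + M_int + b)
```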
Fig. 3. Left: Gradient cell $M^{+}_p$. Middle: The full model cell circuit for detecting rotational motion patterns. Oppositely tuned gradient cells are combined with cells sensitive to uniform motion patterns. Right: Center-surround cell for motion discontinuity detection with two centered subcells with different spatial tunings and integration weights for center and surround.
2.2 Model Mechanisms for Motion Contrast Detection
Local facial motions can be accounted for by mechanisms that operate on a smaller scale within the facial projection onto the image plane. To detect intrinsic motion, we propose cells that are sensitive to local changes in speed and direction. These motion patterns are produced in the facial plane while the person is talking or during other facial actions. Here, we employ a center-surround interaction mechanism of motion-sensitive cells that are able to detect local variations in visual flow, but are insensitive to large uniform flow fields. Such a sensitivity can be generated by cells with antagonistic center-surround motion sensitivity. The input integration of velocity responses is defined by weighted kernels Ω with different spatial scale dimensions operating on responses of motion and speed selective filters. Integration over N directions yields the activation of a direction-insensitive motion contrast cell.
$$D^{sub}_{p,\theta} = V_p \ast F_{\bar{\mu}} \qquad (10)$$
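Since this excerpt only states the subcell integration of Eq. (10), the following is merely a hedged sketch of the antagonistic center-surround interaction described in the text: each direction channel is integrated with a small (center) and a large (surround) Gaussian kernel standing in for the weighting kernels Ω, their difference is rectified, and the result is summed over the N direction channels. The kernel scales and the rectification are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_contrast(direction_channels, sigma_center=2.0, sigma_surround=8.0):
    """Direction-insensitive motion contrast cell, cf. Sec. 2.2.

    direction_channels: list of N 2D maps, one per motion direction
    (e.g. the subcell responses D_sub of Eq. (10))."""
    contrast = np.zeros_like(direction_channels[0])
    for D in direction_channels:
        center = gaussian_filter(D, sigma_center)       # small integration field
        surround = gaussian_filter(D, sigma_surround)   # large integration field
        contrast += np.maximum(center - surround, 0.0)  # antagonistic interaction
    return contrast / len(direction_channels)           # average over N directions
```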