Eye Movement Analysis for Activity Recognition
Using Electrooculography
Andreas Bulling, Student Member, IEEE, Jamie A. Ward,
Hans Gellersen, and Gerhard Tröster, Senior Member, IEEE
Abstract—In this work we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement
data was recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye
movement characteristics from EOG signals - saccades, fixations, and blinks - and propose a method for assessing repetitive patterns
of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum
redundancy maximum relevance feature selection (mRMR). We validate the method using an eight participant study in an office
environment using an example set of five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a
video, and browsing the web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM)
classifier and person-independent (leave-one-person-out) training, we obtain an average precision of 76.1% and recall of 70.5% over
all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on
the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
Index Terms—Ubiquitous computing, Feature evaluation and selection, Pattern analysis, Signal processing.
1 INTRODUCTION
Human activity recognition has become an important application area for pattern recognition. Research in computer vision has traditionally been at the forefront of this work [1], [2]. The growing use of ambient and body-worn sensors has paved the way for other sensing modalities, particularly in the domain of ubiquitous computing. Important advances in activity recognition were achieved using modalities such as body movement and posture [3], sound [4], or interactions between people [5].
There are, however, limitations to current sensor con-
figurations. Accelerometers or gyroscopes, for example,
are limited to sensing physical activity; they cannot
easily be used for detecting predominantly visual tasks,
such as reading, browsing the web, or watching a video.
Common ambient sensors, such as reed switches or light
sensors, are limited in that they only detect basic activity
events, e.g. entering or leaving a room, or switching an
appliance. Further to these limitations, activity sensing
using subtle cues, such as user attention or intention,
remains largely unexplored.
A rich source of information, as yet unused for activity
recognition, is the movement of the eyes. The movement
patterns our eyes perform as we carry out specific activities have the potential to reveal much about the activities themselves - independently of what we are looking at. This includes information on visual tasks, such as reading [6], information on predominantly physical activities, such as driving a car, but also on cognitive processes of visual perception, such as attention [7] or saliency determination [8]. In a similar manner, location or a particular environment may influence our eye movements. Because we use our eyes in almost everything that we do, it is conceivable that eye movements provide useful information for activity recognition.

A. Bulling and G. Tröster are with the Wearable Computing Laboratory, Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, 8092 Zurich, Switzerland. E-mail: {bulling, troester}@ife.ee.ethz.ch

J. A. Ward and H. Gellersen are with the Computing Department, Lancaster University, InfoLab 21, South Drive, Lancaster, United Kingdom, LA1 4WA. E-mail: {j.ward, hwg}@comp.lancs.ac.uk

Corresponding author: A. Bulling, bulling@ife.ee.ethz.ch
Developing sensors to record eye movements in daily
life is still an active topic of research. Mobile settings
call for highly miniaturised, low-power eye trackers with
real-time processing capabilities. These requirements are increasingly addressed by commonly used video-based systems, some of which can now be worn as relatively light headgear. However, these remain expensive, with demanding video processing tasks requiring bulky auxiliary equipment. Electrooculography (EOG) - the measurement technique used in this work - is an inexpensive method for mobile eye movement recording; it is computationally lightweight and can be implemented using wearable sensors [9]. This is crucial with a view to long-term recordings in mobile real-world settings.
1.1 Paper Scope and Contributions
The aim of this work is to assess the feasibility of recog-
nising human activity using eye movement analysis, so-
called eye-based activity recognition (EAR)¹. The specific contributions are: (1) the introduction of eye movement analysis as a new sensing modality for activity recognition; (2) the development and characterisation of new algorithms for detecting three basic eye movement types from EOG signals (saccades, fixations, and blinks) and a method to assess repetitive eye movement patterns; (3) the development and evaluation of 90 features derived from these eye movement types; and (4) the implementation of a method for continuous EAR, and its evaluation using a multi-participant EOG dataset involving a study of five real-world office activities.

¹ An earlier version of this paper was published in [10].
1.2 Paper Organisation
We first survey related work, introduce EOG, and de-
scribe the main eye movement characteristics that we
identify as useful for EAR. We then detail and charac-
terise the recognition methodology: the methods used
for removing drift and noise from EOG signals, and the
algorithms developed for detecting saccades, fixations,
blinks, and for analysing repetitive eye movement pat-
terns. Based on these eye movement characteristics, we
develop 90 features; some directly derived from a partic-
ular characteristic, others devised to capture additional
aspects of eye movement dynamics.
We rank these features using minimum redundancy
maximum relevance feature selection (mRMR) and a
support vector machine (SVM) classifier. To evaluate
both algorithms on a real-world example, we devise an
experiment involving a continuous sequence of five of-
fice activities, plus a period without any specific activity
(the NULL class). Finally, we discuss the findings gained from this experiment and give an outlook on future work.
2 RELATED WORK
2.1 Electrooculography Applications
Eye movement characteristics such as saccades, fixations,
and blinks, as well as deliberate movement patterns
detected in EOG signals, have already been used for
hands-free operation of static human-computer [11] and
human-robot [12] interfaces. EOG-based interfaces have
also been developed for assistive robots [13] or as a
control for an electric wheelchair [14]. Such systems
are intended to be used by physically disabled people
who have extremely limited peripheral mobility but still
retain eye-motor coordination. These studies showed
that EOG is a measurement technique that is inexpensive,
easy to use, reliable, and relatively unobtrusive when
compared to head-worn cameras used in video-based
eye trackers. While these applications all used EOG as a
direct control interface, our approach is to use EOG as a
source of information on a person’s activity.
2.2 Eye Movement Analysis
A growing number of researchers use video-based eye
tracking to study eye movements in natural environ-
ments. This has led to important advances in our understanding of how the brain processes tasks, and of the role that the visual system plays in this [15]. Eye
movement analysis has a long history as a tool to inves-
tigate visual behaviour. In an early study, Hacisalihzade
et al. used Markov processes to model visual fixations of
observers recognising an object [16]. They transformed
fixation sequences into character strings and used the
string edit distance to quantify the similarity of eye
movements. Elhelw et al. used discrete time Markov
chains on sequences of temporal fixations to identify
salient image features that affect the perception of visual
realism [17]. They found that fixation clusters were able
to uncover the features that most attract an observer’s
attention. Dempere-Marco et al. presented a method for
training novices in assessing tomography images [18].
They modelled the assessment behaviour of domain
experts based on the dynamics of their saccadic eye
movements. Salvucci et al. evaluated means for auto-
mated analysis of eye movements [19]. They described
three methods based on sequence-matching and hidden
Markov models that interpreted eye movements as accu-
rately as human experts but in significantly less time.
All of these studies aimed to model visual behaviour
during specific tasks using a small number of well-
known eye movement characteristics. They explored the
link between the task and eye movements, but did not
recognise the task or activity using this information.
2.3 Activity Recognition
In ubiquitous computing, one goal of activity recognition
is to provide information that allows a system to best
assist the user with his or her task [20]. Traditionally,
activity recognition research has focused on gait, posture,
and gesture. Bao et al. used body-worn accelerometers
to detect 20 physical activities, such as cycling, walking
and scrubbing the floor, under real-world conditions [21].
Logan et al. studied a wide range of daily activities, such
as using a dishwasher, or watching television, using a
large variety and number of ambient sensors, including
RFID tags and infra-red motion detectors [22]. Ward et al. investigated the use of wrist-worn accelerometers and mi-
crophones in a wood workshop to detect activities such
as hammering, or cutting wood [4]. Several researchers
investigated the recognition of reading activity in sta-
tionary and mobile settings using different eye tracking
techniques [6], [23]. Our work, however, is the first to
describe and apply a general-purpose architecture for
EAR to the problem of recognising everyday activities.
3 BACKGROUND
3.1 Electrooculography
The eye can be modelled as a dipole with its positive
pole at the cornea and its negative pole at the retina.
Assuming a stable corneo-retinal potential difference, the
eye is the origin of a steady electric potential field. The
electrical signal that can be measured from this field is
called the electrooculogram (EOG).

Fig. 1. Denoised and baseline-drift-removed horizontal (EOG_h) and vertical (EOG_v) signal components. Examples of the three main eye movement types are marked in grey: saccades (S), fixations (F), and blinks (B).
If the eye moves from the centre position towards
the periphery, the retina approaches one electrode while
the cornea approaches the opposing one. This change
in dipole orientation causes a change in the electric
potential field and thus the measured EOG signal ampli-
tude. By analysing these changes, eye movements can
be tracked. Using two pairs of skin electrodes placed
at opposite sides of the eye and an additional reference
electrode on the forehead, two signal components (EOG_h and EOG_v), corresponding to two movement components - a horizontal and a vertical - can be identified. EOG typically shows signal amplitudes ranging from 5 µV/degree to 20 µV/degree and an essential frequency content between 0 Hz and 30 Hz [24].
3.2 Eye Movement Types
To be able to use eye movement analysis for activity
recognition, it is important to understand the different
types of eye movement. We identified three basic eye
movement types that can be easily detected using EOG:
saccades, fixations, and blinks (see Fig. 1).
3.2.1 Saccades
The eyes do not remain still when viewing a visual scene.
Instead, they have to move constantly to build up a
mental “map” from interesting parts of that scene. The
main reason for this is that only a small central region of
the retina, the fovea, is able to perceive with high acuity.
The simultaneous movement of both eyes is called a
saccade. The duration of a saccade depends on the
angular distance the eyes travel during this movement:
the so-called saccade amplitude. Typical characteristics
of saccadic eye movements are 20 degrees for the ampli-
tude, and 10 ms to 100 ms for the duration [25].
3.2.2 Fixations
Fixations are the stationary states of the eyes during
which gaze is held upon a specific location in the visual
scene. Fixations are usually defined as the time between two successive saccades. The average fixation duration lies between 100 ms and 200 ms [26].

Fig. 2. Architecture for eye-based activity recognition on the example of EOG. Light grey indicates EOG signal processing; dark grey indicates use of a sliding window.
3.2.3 Blinks
The frontal part of the cornea is coated with a thin
liquid film, the so-called “precorneal tear film”. To spread
this fluid across the corneal surface, regular opening
and closing of the eyelids, or blinking, is required. The
average blink rate varies between 12 and 19 blinks per
minute while at rest [27]; it is influenced by environ-
mental factors such as relative humidity, temperature
or brightness, but also by physical activity, cognitive
workload, or fatigue [28]. The average blink duration
lies between 100 ms and 400 ms [29].
4 METHODOLOGY
We first provide an overview of the architecture for EAR
used in this work. We then detail our algorithms for
removing baseline drift and noise from EOG signals, for
detecting the three basic eye movement types, and for
analysing repetitive patterns of eye movements. Finally,
we describe the features extracted from these basic eye
movement types, and introduce the minimum redun-
dancy maximum relevance feature selection, and the
support vector machine classifier.
4.1 Recognition Architecture
Fig. 2 shows the overall architecture for EAR. The
methods were all implemented offline using MATLAB

and C. Inputs to the processing chain are the two EOG
signals capturing the horizontal and the vertical eye
movement components. In the first stage, these signals
are processed to remove any artefacts that might hamper
eye movement analysis. In the case of EOG signals, we
apply algorithms for baseline drift and noise removal.
Only this initial processing depends on the particular eye
tracking technique used; all further stages are completely
independent of the underlying type of eye movement
data. In the next stage, three different eye movement
types are detected from the processed eye movement
data: saccades, fixations, and blinks. The corresponding
eye movement events returned by the detection algo-
rithms are the basis for extracting different eye move-
ment features using a sliding window. In the last stage, a
hybrid method selects the most relevant of these features,
and uses them for classification.
4.2 EOG Signal Processing
4.2.1 Baseline Drift Removal
Baseline drift is a slow signal change superposing the
EOG signal but mostly unrelated to eye movements. It
has many possible sources, e.g. interfering background
signals or electrode polarisation [30]. Baseline drift only marginally influences the EOG signal during saccades; all other eye movements, however, are subject to it. In the five-electrode setup used in this work (see Fig. 8), baseline drift may also differ between the horizontal and vertical EOG signal components.
Several approaches to removing baseline drift from electrocardiography (ECG) signals have been proposed (see, for example, [31], [32], [33]). As ECG shows repetitive signal characteristics, these algorithms perform sufficiently well at removing baseline drift. For signals with non-repetitive characteristics, such as EOG, however, developing algorithms for baseline drift removal is still an active area of research. We used an approach based on the
wavelet transform [34]. The algorithm first performed
an approximated multilevel 1-D wavelet decomposition
at level nine using Daubechies wavelets on each EOG
signal component. The reconstructed decomposition co-
efficients gave a baseline drift estimation. Subtracting
this estimation from each original signal component
yielded the corrected signals with reduced drift offset.
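A minimal sketch of this drift-removal step, assuming MATLAB's Wavelet Toolbox; the paper only states that Daubechies wavelets were used, so 'db4' below is a placeholder for the unspecified family member:

```matlab
% Estimate and remove baseline drift from one EOG component.
% Assumes the Wavelet Toolbox; 'db4' is a placeholder wavelet choice.
function eogCorrected = removeBaselineDrift(eog)
    [c, l] = wavedec(eog, 9, 'db4');         % level-9 1-D decomposition
    baseline = wrcoef('a', c, l, 'db4', 9);  % reconstruct approximation only
    eogCorrected = eog - baseline;           % subtract the drift estimate
end
```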
4.2.2 Noise Removal
EOG signals may be corrupted with noise from different
sources, such as the residential power line, the measure-
ment circuitry, electrodes and wires, or other interfering
physiological sources such as electromyographic (EMG)
signals. In addition, simultaneous physical activity may cause the electrodes to lose contact or move on the skin.
As mentioned before, EOG signals are typically non-
repetitive. This prohibits the application of denoising
algorithms that make use of structural and temporal
knowledge about the signal.
Several EOG signal characteristics need to be pre-
served by the denoising. First, the steepness of signal
edges needs to be retained to be able to detect blinks
and saccades. Second, EOG signal amplitudes need to
be preserved to be able to distinguish between different
types and directions of saccadic eye movements. Finally,
denoising filters must not introduce signal artefacts that
may be misinterpreted as saccades or blinks in subse-
quent signal processing steps.
To identify suitable methods for noise removal we
compared three different algorithms on real and syn-
thetic EOG data: a low-pass filter, a filter based on
wavelet shrinkage denoising [35] and a median filter. By
visual inspection of the denoised signal we found that
the median filter performed best; it preserved edge steep-
ness of saccadic eye movements, retained EOG signal
amplitudes, and did not introduce any artificial signal
changes. It is crucial, however, to choose a window size W_mf that is small enough to retain short signal pulses, particularly those caused by blinks. A median filter removes pulses of a width smaller than about half of its window size. By taking into account the average blink duration reported earlier, we fixed W_mf to 150 ms.
4.3 Detection of Basic Eye Movement Types
Different types of eye movements can be detected from
the processed EOG signals. In this work, saccades, fix-
ations, and blinks form the basis of all eye movement
features used for classification. The robustness of the
algorithms for detecting these is key to achieving good
recognition performance. Saccade detection is particu-
larly important because fixation detection, eye move-
ment encoding, and the wordbook analysis are all reliant
on it (see Fig. 2). In the following, we introduce our
saccade and blink detection algorithms and characterise
their performance on EOG signals recorded under con-
strained conditions.
4.3.1 Saccade and Fixation Detection
For saccade detection, we developed the so-called Con-
tinuous Wavelet Transform - Saccade Detection (CWT-SD)
algorithm (see Fig. 3 for an example). Inputs to CWT-SD are the denoised and baseline-drift-removed EOG signal components EOG_h and EOG_v. CWT-SD first computes the continuous 1-D wavelet coefficients at scale 20 using a Haar mother wavelet. Let s be one of these signal components and ψ the mother wavelet. The wavelet coefficient C_b^a of s at scale a and position b is defined as

$$C_b^a(s) = \int_{\mathbb{R}} s(t)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)\,dt.$$
By applying an application-specific threshold th_sd to the coefficients C_i(s) = C_i^{20}(s), CWT-SD creates a vector M with elements M_i:

$$M_i = \begin{cases} -1, & C_i(s) < -th_{sd}, \\ +1, & C_i(s) > th_{sd}, \\ 0, & -th_{sd} \le C_i(s) \le th_{sd}. \end{cases}$$

Fig. 3. Continuous Wavelet Transform - Saccade Detection (CWT-SD) algorithm. (a) Denoised and baseline-drift-removed horizontal EOG signal during reading with an example saccade amplitude (S_A); (b) the transformed wavelet signal (EOG_wl), with application-specific small (±th_small) and large (±th_large) thresholds; (c) marker vectors for distinguishing between small (M_small) and large (M_large) saccades; and (d) an example character encoding for part of the EOG signal.
This step divides EOG_h and EOG_v into saccadic (M = -1, 1) and non-saccadic (fixational) (M = 0) segments. Saccadic segments shorter than 20 ms or longer than 200 ms are removed. These boundaries approximate the typical physiological saccade characteristics described in the literature [25]. CWT-SD then calculates the amplitude and direction of each detected saccade. The saccade amplitude S_A is the difference in EOG signal amplitude before and after the saccade (cf. Fig. 3). The direction is derived from the sign of the corresponding elements in M. Finally, each saccade is encoded into a character representing the combination of amplitude and direction. For example, a small saccade in EOG_h with negative direction is encoded as “r” and a large saccade with positive direction as “L”.
Humans typically alternate between saccades and fixations. This allows us to also use CWT-SD for detecting fixations. The algorithm exploits the fact that gaze remains stable during a fixation, which causes the corresponding gaze points, i.e. the points in the visual scene at which gaze is directed, to cluster closely in time. Fixations can therefore be identified by thresholding on the dispersion of these gaze points [36]. For a segment S of length n comprised of a horizontal S_h and a vertical S_v EOG signal component, the dispersion is calculated as

$$Dispersion(S) = \max(S_h) - \min(S_h) + \max(S_v) - \min(S_v).$$

Initially, all non-saccadic segments are assumed to contain a fixation. The algorithm then drops segments whose dispersion is above a maximum threshold th_fd of 10,000, or whose duration is below a minimum threshold th_fdt of 200 ms. The value of th_fd was derived as part of the CWT-SD evaluation; that of th_fdt approximates the typical average fixation duration reported earlier.
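The dispersion and duration tests on a candidate segment then reduce to a few lines; fs is again an assumed sampling rate, not a value stated in this section:

```matlab
% Decide whether a non-saccadic segment (Sh, Sv) contains a fixation.
function fix = isFixation(Sh, Sv, fs)
    thFd  = 10000;                                     % dispersion threshold
    thFdt = 0.200;                                     % minimum duration [s]
    dispersion = (max(Sh) - min(Sh)) + (max(Sv) - min(Sv));
    fix = (dispersion <= thFd) && (numel(Sh) / fs >= thFdt);
end
```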
A particular activity may require saccadic eye movements of different distances and directions. For example, reading involves a fast sequence of small saccades while scanning each line of text, whereas large saccades are required to jump back to the beginning of the next line. We therefore opted to detect saccades with two different amplitudes, “small” and “large”. This requires two thresholds, th_small and th_large, to divide the range of possible values of C into three bands (see Fig. 3): no saccade (-th_small < C < th_small), small saccade (-th_large < C < -th_small or th_small < C < th_large), and large saccade (C < -th_large or C > th_large). Depending on its peak value, each saccade is then assigned to one of these bands.
To evaluate the CWT-SD algorithm, we performed an experiment with five participants - one female and four male (age: 25-59 years, mean = 36.8, sd = 15.4). To cover effects of differences in electrode placement and skin contact, the experiment was performed on two different days; between days the participants took off the EOG electrodes. A total of twenty recordings were made per participant, 10 per day. Each experiment involved tracking the participants' eyes while they followed a sequence of flashing dots on a computer screen. We used a fixed sequence to simplify labelling of individual saccades. The sequence comprised 10 eye movements consisting of five horizontal and eight vertical saccades. This produced a total of 591 horizontal and 855 vertical saccades.
By matching saccade events with the annotated ground truth we calculated true positives (TP), false positives (FP), and false negatives (FN), and from these, precision (TP/(TP+FP)), recall (TP/(TP+FN)), and the F1 score (2 · precision · recall / (precision + recall)). We then evaluated the F1 score across a sweep of the CWT-SD threshold th_sd = 1...50 (in 50 steps), separately for the horizontal and vertical EOG signal components. Fig. 4 shows the mean F1 score over all five participants, with vertical lines indicating the standard deviation for selected values of th_sd. The figure shows that similar thresholds achieved the top F1 scores of about 0.94 for both signal components. It is interesting to note that the standard deviation across all participants reaches a minimum over a whole range of values around this maximum. This suggests that thresholds close to this point can also be selected while still achieving robust detection performance.
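For reference, a direct transcription of these three measures (nothing here beyond the definitions above):

```matlab
% Precision, recall, and F1 score from matched event counts.
function [precision, recall, f1] = evaluationScores(TP, FP, FN)
    precision = TP / (TP + FP);
    recall    = TP / (TP + FN);
    f1        = 2 * precision * recall / (precision + recall);
end
```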

Figures
Citations
More filters
Journal ArticleDOI

A tutorial on human activity recognition using body-worn inertial sensors

TL;DR: In this paper, the authors provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.

A Tutorial on Human Activity Recognition Using Body-Worn

TL;DR: This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describes the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.
Journal ArticleDOI

Deep learning for healthcare applications based on physiological signals: A review.

TL;DR: This review paper depicts the application of various deep learning algorithms used till recently, but in future it will be used for more healthcare areas to improve the quality of diagnosis.
Journal ArticleDOI

A Comprehensive Survey of Deep Learning for Image Captioning

TL;DR: A comprehensive review of deep learning-based image captioning techniques can be found in this article, where the authors discuss the foundation of the techniques to analyze their performances, strengths, and limitations.
Journal ArticleDOI

EmotionMeter: A Multimodal Framework for Recognizing Human Emotions

TL;DR: The experimental results demonstrate that modality fusion with multimodal deep neural networks can significantly enhance the performance compared with a single modality, and the best mean accuracy of 85.11% is achieved for four emotions.
References
More filters
Journal ArticleDOI

De-noising by soft-thresholding

TL;DR: The authors prove two results about this type of estimator that are unprecedented in several ways: with high probability f/spl circ/*/sub n/ is at least as smooth as f, in any of a wide variety of smoothness measures.
Journal ArticleDOI

Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

TL;DR: In this article, the maximal statistical dependency criterion based on mutual information (mRMR) was proposed to select good features according to the maximal dependency condition. But the problem of feature selection is not solved by directly implementing mRMR.
Journal Article

LIBLINEAR: A Library for Large Linear Classification

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.

Feature selection based on mutual information: criteria ofmax-dependency, max-relevance, and min-redundancy

TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).
Book ChapterDOI

Activity recognition from user-annotated acceleration data

TL;DR: This is the first work to investigate performance of recognition algorithms with multiple, wire-free accelerometers on 20 activities using datasets annotated by the subjects themselves, and suggests that multiple accelerometers aid in recognition.