Evaluation of Gabor-Wavelet-Based Facial Action Unit Recognition in Image Sequences of Increasing Complexity

Ying-li Tian¹, Takeo Kanade², and Jeffrey F. Cohn²,³
¹ IBM T. J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598
² Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
³ Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260
Email: yltian@us.ibm.com, tk@cs.cmu.edu, jeffcohn@pitt.edu
Abstract
Previous work suggests that Gabor-wavelet-based methods can achieve high sensitivity and specificity for emotion-specified expressions (e.g., happy, sad) and single action units (AUs) of the Facial Action Coding System (FACS). This paper evaluates a Gabor-wavelet-based method to recognize AUs in image sequences of increasing complexity. A recognition rate of 83% is obtained for three single AUs when image sequences contain homogeneous subjects and are without observable head motion. The accuracy of AU recognition decreases to 32% when the number of AUs increases to nine and the image sequences consist of AU combinations, head motion, and non-homogeneous subjects. For comparison, an average recognition rate of 87.6% is achieved for the geometry-feature-based method. The best recognition rate, 92.7%, is obtained by combining Gabor wavelets and geometry features.
1. Introduction
For facial feature extraction in expression analysis, there are two main types of approaches: geometric-feature-based methods and appearance-based methods [1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 15, 17, 16, 18, 19]. Geometric facial features represent the shape and locations of facial components (including the mouth, eyes, brows, and nose). The facial components or facial feature points are extracted to form a feature vector that represents the face geometry. In appearance-based methods, image filters, such as Gabor wavelets, are applied to either the whole face or specific regions in a face image to extract a feature vector.
Zhang et al. [20] compared two types of features to recognize expressions: the geometric positions of 34 fiducial points on a face, and 612 Gabor wavelet coefficients extracted from the face image at these 34 fiducial points. The recognition rates for six emotion-specified expressions (e.g., joy and anger) were significantly higher for Gabor wavelet coefficients. Recognition of FACS AUs was not tested. Bartlett et al. [1] compared optical flow, geometric features, and principal component analysis (PCA) to recognize 6 individual upper-face AUs (AU1, AU2, AU4, AU5, AU6, and AU7) without combinations. The best performance was achieved by PCA. Donato et al. [5] compared several techniques for recognizing 6 single upper-face AUs and 6 lower-face AUs. These techniques included optical flow, principal component analysis, independent component analysis, local feature analysis, and Gabor wavelet representation. The best performances were obtained using a Gabor wavelet representation and independent component analysis. All of these systems [1, 5, 20] used a manual step to align each input image with a standard face image using the centers of the eyes and mouth.
Previous work suggests that appearance-based methods (specifically Gabor wavelets) can achieve high sensitivity and specificity for emotion-specified expressions (e.g., happy, sad) [11, 20] and single AUs [5] under four conditions: (1) subjects were homogeneous, either all Japanese or all Euro-American; (2) head motion was excluded; (3) face images were aligned and cropped to a standard size; and (4) emotion-specified expressions or single AUs were recognized. In a multicultural society, expression recognition must be robust to variations in face shape, proportion, and skin color. Facial expressions typically consist of AU combinations, which often occur together with head motion. AUs can occur either singly or in combination. When AUs occur in combination, they may be additive, in which the combination does not change the appearance of the constituent AUs, or non-additive, in which the appearance of the constituents does change. Non-additive AU combinations make recognition more difficult.
In this paper, we investigate the AU recognition accuracy of Gabor wavelets for both single AUs and AU combinations. We also compare the Gabor-wavelet-based method and the geometry-feature-based method for AU recognition
Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR’02)
0-7695-1602-5/02 $17.00 © 2002 IEEE

in a more complex image database than has been used in previous studies of facial expression analysis using Gabor wavelets. The database consists of image sequences from subjects of European, African, and Asian ancestry. Small head motions and multiple AUs are included. For 3 single AUs without head motion, a recognition rate of 83% is obtained for the Gabor-wavelet-based method. When the number of recognized AUs increases to 9 and the image sequences consist of AU combinations, head motions, and non-homogeneous subjects, the accuracy of the Gabor-wavelet-based method decreases to 32%. In comparison, an average recognition rate of 87.6% is achieved for the geometry-feature-based method, and the best recognition rate of 92.7% is obtained by combining the Gabor-wavelet-based method and the geometry-feature-based method.
2. Facial Feature Extraction
Contracting the facial muscles produces changes in both the direction and magnitude of skin surface displacement, and in the appearance of permanent and transient facial features. Examples of permanent features are the eyes, brows, and any furrows that have become permanent with age. Transient features include facial lines and furrows that are not present at rest. To analyze a sequence of images, we assume that the first frame is a neutral expression. After initializing the templates of the permanent features in the first frame, both geometric facial features and Gabor wavelet coefficients are automatically extracted from the whole image sequence. No face cropping or alignment is necessary.
2.1. Geometric facial features
Figure 1. Multi-state models for geometric feature extraction.
To detect and track changes of facial components in near-frontal face images, multi-state models are developed to extract the geometric facial features (Fig. 1). A three-state lip model describes the lip state: open, closed, and tightly closed. A two-state model (open or closed) is used for each of the eyes. Each brow and cheek has a one-state model. Transient facial features, such as nasolabial furrows, have two states: present and absent. Given an image sequence, the region of the face and the approximate locations of individual face features are detected automatically in the initial frame [14]. The contours of the face features and components are then adjusted manually in the initial frame. Both permanent (e.g., brows, eyes, lips) and transient (lines and furrows) face feature changes are automatically detected and tracked in the image sequence. We group 15 parameters that describe shape, motion, eye state, motion of the brow and cheek, and furrows in the upper face. These parameters are geometrically normalized to compensate for image scale and in-plane head motion, based on the two inner corners of the eyes. Details of the geometric feature extraction can be found in [16].
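The normalization step can be sketched as follows. This is a minimal illustration only; the function name, argument layout, and reference distance are our own choices, not details taken from [16]. Given the two inner eye corners, in-plane rotation and image scale are removed from the tracked 2-D feature points:

```python
import numpy as np

def normalize_by_eye_corners(points, left_inner, right_inner, ref_dist=1.0):
    """Remove in-plane rotation and scale from 2-D feature points,
    using the two inner eye corners as the reference (illustrative sketch)."""
    left = np.asarray(left_inner, dtype=float)
    right = np.asarray(right_inner, dtype=float)
    # The midpoint of the inner eye corners becomes the origin.
    origin = (left + right) / 2.0
    # Rotate so the line between the corners is horizontal.
    dx, dy = right - left
    angle = np.arctan2(dy, dx)
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])
    # Scale so the inter-corner distance equals ref_dist.
    scale = ref_dist / np.hypot(dx, dy)
    pts = np.asarray(points, dtype=float)
    return (pts - origin) @ R.T * scale
```

Applied to every frame, such a transform makes the 15 parameters comparable across image scale and in-plane head motion.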
2.2. Gabor wavelets
Figure 2. Locations at which Gabor coefficients are calculated in the upper face.
We use Gabor wavelets to extract the facial appearance changes as a set of multi-scale and multi-orientation coefficients. The Gabor filter may be applied to specific locations on a face or to the whole face image [4, 5, 9, 17, 20]. Following Zhang et al. [20], we use the Gabor filter in a selective way, at particular facial locations instead of over the whole face image.
The response image of the Gabor filter can be written as the correlation of the input image $I(\mathbf{x})$ with the Gabor kernel $p_k(\mathbf{x})$:

$$a_k(\mathbf{x}_0) = \iint I(\mathbf{x})\, p_k(\mathbf{x} - \mathbf{x}_0)\, d\mathbf{x}, \qquad (1)$$

where the Gabor filter $p_k(\mathbf{x})$ can be formulated as [4]:

$$p_k(\mathbf{x}) = \frac{k^2}{\sigma^2} \exp\!\left(-\frac{k^2 \mathbf{x}^2}{2\sigma^2}\right) \left(\exp(i\,\mathbf{k}\cdot\mathbf{x}) - \exp\!\left(-\frac{\sigma^2}{2}\right)\right), \qquad (2)$$
where k is the characteristic wave vector.
In our implementation, 800 Gabor wavelet coefficients are calculated at 20 locations, which are automatically defined based on the geometric features in the upper face (Figure 2). We use $\sigma = \pi$, five spatial frequencies with wavenumbers $k_i = (\frac{\pi}{2}, \frac{\pi}{4}, \frac{\pi}{8}, \frac{\pi}{16}, \frac{\pi}{32})$, and 8 orientations from 0 to $\pi$, differing by $\pi/8$. In general, $p_k(\mathbf{x})$ is complex. In our approach, only the magnitudes are used, because they vary slowly with position while the phases are very sensitive. Therefore, for each location, we have 40 Gabor wavelet coefficients.
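As a concrete illustration of Eqs. (1)–(2) and the 5-scale, 8-orientation bank, the following Python sketch computes the 40 magnitude coefficients at one location. The kernel size, sampling grid, and function names are our own assumptions, not the authors' implementation:

```python
import numpy as np

def gabor_kernel(k_mag, theta, sigma=np.pi, size=33):
    """Complex Gabor kernel p_k(x) with characteristic wave vector
    k = k_mag * (cos theta, sin theta), following Eq. (2) (sketch)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    sq = x**2 + y**2
    kx, ky = k_mag * np.cos(theta), k_mag * np.sin(theta)
    envelope = (k_mag**2 / sigma**2) * np.exp(-k_mag**2 * sq / (2 * sigma**2))
    # DC-free carrier: exp(i k.x) minus the bias term exp(-sigma^2 / 2).
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

def gabor_magnitudes(image, point, wavenumbers, n_orient=8):
    """Magnitude responses at one (row, col) location for all scales
    and orientations -- 40 coefficients for a 5 x 8 bank."""
    r, c = point
    coeffs = []
    for k in wavenumbers:
        for j in range(n_orient):
            kern = gabor_kernel(k, j * np.pi / n_orient)
            half = kern.shape[0] // 2
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            coeffs.append(abs(np.sum(patch * kern)))
    return np.array(coeffs)
```

With `wavenumbers = [np.pi/2, np.pi/4, np.pi/8, np.pi/16, np.pi/32]`, each of the 20 locations yields 40 magnitudes, giving the 800 coefficients described above.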

3. Evaluation of Gabor-Wavelet-Based AU Recognition in Image Sequences of Increasing Complexity
3.1. Experimental Setup
AUs to be Recognized: Figure 3 shows the AUs to be recognized and their Gabor images when the spatial frequency is π/4 in the horizontal orientation. AU 43 (close) and AU 45 (blink) differ from each other in the duration of eye closure. Because AU duration is not considered, and AU 46 (wink) involves closure of only the left or the right eye, we pool AU 43, AU 45, and AU 46 as one unit in this paper. AU 1 (inner brow raise), AU 2 (outer brow raise), and AU 4 (brows pulled together and lowered) describe actions of the brows. Figure 3(h) shows an AU combination.
Database: The Cohn-Kanade expression database [8] is used in our experiments. The database contains image sequences from 210 subjects between the ages of 18 and 50 years: 69% female, 31% male, 81% Euro-American, 13% Afro-American, and 6% other groups. Over 90% of the subjects had no prior experience in FACS. Subjects were instructed by an experimenter to perform single AUs and AU combinations. Subjects sat directly in front of the camera and performed a series of facial behaviors that was recorded in an observation room. Image sequences with in-plane and limited out-of-plane motion are included. The image sequences begin with a neutral face and were digitized into 640×480-pixel arrays with either 8-bit gray-scale or 24-bit color values. Face size varies between 90×80 and 220×200 pixels. No face alignment or cropping is performed.
AU Recognition NNs: We use a three-layer neural network with one hidden layer, trained by standard backpropagation, to recognize AUs. The network is shown in Figure 4 and can be divided into two components. The sub-network shown in Figure 4(a) recognizes AUs using the geometric features alone; its inputs are the 15 geometric feature parameters. The sub-network shown in Figure 4(b) recognizes AUs using Gabor wavelets; its inputs are the Gabor coefficients extracted at the 20 locations. To use both geometric features and regional appearance patterns, the two sub-networks are applied in concert. The outputs are the recognized AUs. Each output unit gives an estimate of the probability that the input image contains the associated AU. The networks are trained to respond to the designated AUs whether they occur singly or in combination; when AUs occur in combination, multiple output nodes are excited.
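The network just described can be sketched as a one-hidden-layer multi-label classifier with sigmoid outputs, one per AU, so that several units can fire for AU combinations. The class name, layer sizes, learning rate, and loss below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

class AUNet:
    """One-hidden-layer network with sigmoid outputs; each output unit
    estimates the probability that the corresponding AU is present
    (illustrative sketch of the paper's architecture)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, X):
        self.h = self._sigmoid(X @ self.W1 + self.b1)
        return self._sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, X, Y, lr=0.5):
        """One step of standard backpropagation on cross-entropy loss."""
        P = self.forward(X)
        d2 = (P - Y) / len(X)                          # output-layer delta
        d1 = (d2 @ self.W2.T) * self.h * (1 - self.h)  # hidden-layer delta
        self.W2 -= lr * self.h.T @ d2
        self.b2 -= lr * d2.sum(axis=0)
        self.W1 -= lr * X.T @ d1
        self.b1 -= lr * d1.sum(axis=0)
```

For the geometric sub-network `n_in` would be 15, for the Gabor sub-network 480; each output can be thresholded at 0.5 to decide AU presence.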
3.2. Experimental Results
First, we report the recognition results of Gabor wavelets for single AUs (AU 41, AU 42, and AU 43). Then, AU recognition by Gabor wavelets for image sequences of increasing complexity is investigated. Because input sequences contain multiple AUs, several outcomes are possible. Correct denotes that the target AUs are recognized. Missed denotes that some but not all of the target AUs are recognized. False denotes that AUs that do not occur are falsely recognized. For comparison, the AU recognition results of the geometric-feature-based method are reported. The best AU recognition results are achieved by combining Gabor wavelets and geometric features.

Figure 3. AUs to be recognized and their Gabor images when the spatial frequency is π/4 in the horizontal orientation: (a) AU0 (neutral); (b) AU41 (lid droop); (c) AU42 (slit); (d) AU43/45/46 (eye close); (e) AU4 (brow lowerer); (f) AU6 (cheek raiser); (g) AU7 (lid tightener); (h) AU1+2+5 (upper lid and brow raiser).

Figure 4. AU recognition neural networks.
AU Recognition of Gabor wavelets for single AUs: In this investigation, we focus on recognition of AU41, AU42, and AU43 by Gabor wavelets. We selected 33 sequences from 21 subjects for training and 17 sequences from 12 subjects for testing. All subjects are Euro-American, without observable head motion. The distribution of the training and test data sets is shown in Table 1.
Table 1. Data distribution of training and test data sets for single AU recognition.

Data set   AU 41   AU 42   AU 43   Total
Train         92      75      74     241
Test          56      40      16     112
Table 2 shows the recognition results for 3 single AUs (AU 41, AU 42, and AU 43) when we use three feature points of the eye and three spatial frequencies of the Gabor wavelet (π/2, π/4, π/8). The average recognition rate is 83%: more specifically, 93% for AU41, 70% for AU42, and 81% for AU43. These rates are comparable to the reliability of different human coders.
AU Recognition of Gabor Wavelets for AU Combinations in Image Sequences of Increasing Complexity: In this evaluation, we test the recognition accuracy of Gabor wavelets for AU combinations in a more complex database. The database consists of 606 image sequences from 107 subjects of European, African, and Asian ancestry. Most
Table 2. Recognition results of single AUs by using Gabor wavelets.

            AU 41   AU 42   AU 43
AU 41          52       4       0
AU 42           4      28       8
AU 43           0       3      13

Recognition rate: 83%
image sequences contain AU combinations, and some include small head motion. We split the image sequences into training (407 sequences from 59 subjects) and testing (199 sequences from 48 subjects) sets to ensure that the same subjects did not appear in both training and testing. Table 3 shows the AU distribution for the training and test sets.
Table 3. AU distribution of training and test data sets in image sequences of increasing complexity.

Dataset   AU0   AU1   AU2   AU4   AU5   AU6   AU7   AU41   AU43
Train     407   163   124   157    80    98    36     74     94
Test      199   104    76    84    60    52    28     20     48
Table 4. AU recognition of Gabor wavelets for AU combinations in image sequences of increasing complexity.

AUs     Total   Correct   Missed   False
AU1       104         4      100       8
AU2        76         0       76       0
AU4        84         8       76       3
AU5        60         0       60       0
AU6        52        25       27       0
AU7        28         0       28       0
AU41       20         0       20       0
AU43       48        38       10       0
AU0       199       140       59     208
Total     671       215      456     219

Average recognition rate: 32%
False alarm rate: 32.6%
In this experiment, a total of 800 Gabor wavelet coefficients, corresponding to 5 scales and 8 orientations, are calculated at 20 specific locations. We found that the 480 Gabor coefficients of the three middle scales perform better than using all 5 scales. The inputs are therefore 480 Gabor coefficients (3 spatial frequencies in 8 orientations, applied at 20 locations). The
recognition results are summarized in Table 4. We achieved average recognition and false alarm rates of 32% and 32.6%, respectively. Recognition is adequate only for AU6, AU43, and AU0. The appearance changes associated with these AUs often occur in specific regions, compared with AU0. For example, crow's-feet wrinkles often appear for AU6, and the eyes look qualitatively different when they are open and closed (AU43). Use of PCA to reduce the dimensionality of the Gabor wavelet coefficients failed to increase recognition accuracy.
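The dimensionality reduction the authors tried can be sketched with a plain covariance-eigenvector PCA; the function name and component count below are our own, and this is a generic illustration rather than the paper's code:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components
    (plain covariance-eigenvector PCA; illustrative sketch)."""
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix, largest eigenvalues first.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]
```

Applied to the 480-dimensional Gabor vectors, such a projection shrinks the network input; as reported above, it did not improve accuracy here.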
AU Recognition of Geometric Features for AU Combinations in Image Sequences of Increasing Complexity: For comparison, using the 15 geometric feature parameters, we achieved average recognition and false alarm rates of 87.6% and 6.4%, respectively (Table 5). Recognition of individual AUs is good, with the exception of AU7. Most instances of AU7 are of low intensity, changing the face image by only 1 or 2 pixels; such changes cannot be captured by the geometry-feature-based method.
Table 5. AU Recognition Using Geometric Features.

AUs     Total   Correct   Missed   False
AU1       104       100        4       0
AU2        76        74        2       4
AU4        84        68       16       5
AU5        60        50       10       8
AU6        52        41       11       5
AU7        28         2       26       0
AU41       20        15        5       7
AU43       48        39        9      10
AU0       199       199        0       4
Total     671       588       83      43

Average recognition rate: 87.6%
False alarm rate: 6.4%
AU Recognition Combining Geometric Features and Gabor Wavelets for AU Combinations in Image Sequences of Increasing Complexity: In this experiment, both geometric features and Gabor wavelets are fed to the network. The inputs are the 15 geometric feature parameters and 480 Gabor coefficients (3 spatial frequencies in 8 orientations, applied at 20 locations). The recognition results are shown in Table 6. In comparison to using either the geometric features or the Gabor wavelets alone, combining the features increases the accuracy of AU recognition: performance improves to 92.7%, from 87.6% and 32%, respectively.
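One plausible reading of feeding both feature sets to the network is a simple concatenation into a single 495-dimensional input vector. This is our assumption for illustration only; the paper's Figure 4 describes two sub-networks applied in concert, and the exact fusion mechanism is not spelled out here:

```python
import numpy as np

def combined_feature_vector(geom_params, gabor_coeffs):
    """Concatenate the 15 geometric parameters with the 480 Gabor
    magnitudes (3 scales x 8 orientations x 20 points) into one
    495-dimensional network input (illustrative assumption)."""
    geom = np.asarray(geom_params, dtype=float).ravel()
    gabor = np.asarray(gabor_coeffs, dtype=float).ravel()
    assert geom.size == 15 and gabor.size == 480
    return np.concatenate([geom, gabor])
```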
4. Conclusion and Discussion
We summarize the AU recognition results using Gabor wavelets alone, geometric features alone, and both together in Figure 5. Three recognition rates for each AU are shown as histograms. The gray histogram shows recognition results based on Gabor wavelets, the dark gray histogram shows results based on geometric features, and the white histogram shows results obtained using both types of features. Using Gabor wavelets alone, recognition is adequate only for AU6, AU43, and AU0. Using geometric features, recognition is consistently good, with the exception of AU7. The results using geometric features alone are consistent with previous research showing high AU recognition rates for this approach. Combining both types of features increased recognition performance for all AUs.
Table 6. AU recognition results by combining Gabor wavelets and geometric features.

AUs     Total   Correct   Missed   False
AU1       104       101        3       4
AU2        76        76        0       6
AU4        84        75        9      11
AU5        60        51        9       8
AU6        52        45        7       7
AU7        28        13       15       0
AU41       20        16        4       3
AU43       48        46        2      11
AU0       199       199        0       1
Total     671       622       49      51

Average recognition rate: 92.7%
False alarm rate: 7.6%
Consistent with previous studies, we found that Gabor wavelets work well for single AU recognition on homogeneous subjects without head motion. However, for recognition of AU combinations in image sequences that include non-homogeneous subjects with small head motions, we were surprised to find relatively poor recognition using this approach. Several factors may account for the difference. First, the previous studies used homogeneous subjects: for instance, Zhang et al. included only Japanese subjects and Donato et al. included only Euro-Americans, whereas we used diverse subjects of European, African, and Asian ancestry. Second, the previous studies recognized emotion-specified expressions or only single AUs; we tested the Gabor-wavelet-based method on both single AUs and AU combinations, including non-additive combinations in which the occurrence of one AU modifies another. Third, the previous studies manually aligned and cropped face images; we omitted this preprocessing step. Our geometric features and the locations at which Gabor coefficients are calculated were robust to head motion. These differences suggest that any advantage of Gabor wavelets in facial expression recognition may depend on manual preprocessing and may fail to generalize to