Q: What contributions have the authors mentioned in the paper "Combined face and gait recognition using alpha matte preprocessing"?

This paper presents advances on the Human ID Gait Challenge. With this fusion gain, the authors show significant performance improvement.

Q: What is the transformation matrix obtained using MDA?

These $(c-1)$-dimensional vectors $z_k$ are obtained as follows: $z_k = U_{mda}\, y_k$, $k = 1, \dots, N$ (4), where $U_{mda}$ is the transformation matrix obtained using MDA.

Q: What is the d′ < d dimensional PCA space?

The projection to the $d' < d$ dimensional PCA space is given by $y_k = U_{pca}(g_k - \bar{g})$, $k = 1, \dots, N$ (3). Here $U_{pca}$ is the $d' \times d$ transformation matrix with the first $d'$ orthonormal basis vectors obtained using PCA on the training set $\{g_1, g_2, \dots, g_N\}$ and $\bar{g} = \frac{1}{N}\sum_{k=1}^{N} g_k$ is the mean of the training set.

(Open Access) Combined face and gait recognition using alpha matte preprocessing (2012) | Martin Hofmann

Q: What future work have the authors mentioned in the paper "Combined face and gait recognition using alpha matte preprocessing"?

For future work, stronger and better face and gait methods should be combined. It can be foreseen that recognition rates could improve even further.

Q: What is the advantage of the splitting of the test sequences?

The splitting of the test sequences has the advantage that, for each sequence, multiple sub faces of each person can be used for classification.

Q: What is the main advantage of behavior based features over other physiologic features?

A major advantage of these behavior based features over other physiologic features is the possibility to identify people from large distances and without the person’s direct cooperation.

Q: Why is there a band on the silhouette?

Due to the nature of the image capturing, there is a band on the silhouette which belongs partially to foreground and partially to background.

Q: What is the way to recognize a face?

Even though face recognition has its performance peak at high resolution frontal face images, it can still be seen that facial profile recognition can contribute to the performance, when combined correctly.

Combined Face and Gait Recognition using Alpha Matte Preprocessing

Martin Hofmann, Stephan M. Schmidt, A. N. Rajagopalan (1,2), Gerhard Rigoll

(1) Institute for Human-Machine Communication, Technische Universität München, Germany

(2) Department of Electrical Engineering, Indian Institute of Technology Madras, India

martin.hofmann@tum.de, stephan.schmidt@mytum.de, raju@iitm.ac.in, rigoll@tum.de

Abstract

This paper presents advances on the Human ID Gait Challenge. Our method is based on combining an improved gait recognition method with an adapted low resolution face recognition method. For this, we experiment with a new automated segmentation technique based on alpha-matting. This allows better construction of feature images used for gait recognition. The same segmentation is also used as a basis for finding and recognizing low-resolution facial profile images in the same database. Both gait and face recognition methods show results comparable to the state of the art. Next, the two approaches are fused (which, to our knowledge, has not yet been done for the Human ID Gait Challenge). With this fusion gain, we show significant performance improvement. Moreover, we reach the highest recognition rates and the largest absolute number of correct detections to date.

1. Introduction

The focus of this paper is on recognizing people from larger distances. At a distance, many typical physiologic features, such as fingerprint, DNA, hand, ear, retina and face, are obscured or cannot be obtained at all. By contrast, behavior based features such as gait features can be extracted from walking people at a distance.

In our approach we make use of gait recognition combined with person identification based on low-resolution face profile images. As such we combine physiologic and behavior based features. We show that both modalities lead to good results on their own. When combining them, we observe a significant improvement in recognition performance, which demonstrates the strength of a multimodal approach.

Primarily our approach is motivated by the success of gait recognition methods for recognition at a distance. In 1967, Murray [11] suggested that if all gait movements are considered, gait is unique. Early studies in 1977 by Cutting and Kozlowski [2] suggest that it is possible to recognize friends from just their way of walking. Later, Stevenage et al. [15] showed that people can be recognized without any information on the body-shape, only using gait features. A major advantage of these behavior based features over other physiologic features is the possibility to identify people from large distances and without the person’s direct cooperation. Also no direct interaction with a sensing device is necessary, which allows for undisclosed identification. Thus gait recognition has great potential in video surveillance, tracking and monitoring.

For low resolution data, gait recognition has its clear advantages. However, in our approach, we also use low resolution face data. Even though face recognition has its performance peak at high resolution frontal face images, it can still be seen that facial profile recognition can contribute to the performance, when combined correctly.

A multitude of gait recognition algorithms (see Table 1) have so far been proposed, which leads to a rich set of results we can compare to. Most of these methods build solely on the binarized silhouette images. However we feel that a lot of identity information gets lost by this early binarization. Thus instead of binarizing, both our face and gait recognition methods build on a novel automated color foreground segmentation method based on alpha-matting. For gait recognition we use the continuous alpha-matte segmentation and show a small increase in performance. To our knowledge so far face recognition has not been applied to the Human ID Gait database [12], so we cannot compare these results directly. When fusing gait and face features we observe a significant performance gain, such that our combined method outperforms the state of the art.

2. Related Work

Generally speaking, there are two kinds of gait recognition methods: model-based methods and model-free methods. Model-based methods [1][21] define a (simplified) human model and match the gait sequences to this model. Gait recognition is then performed on the temporal change of the model parameters, such as leg angles [21]. Those methods are typically very demanding and good results are hard to achieve. Model-free methods [3][5][7][9][12][17][19][20] on the other hand have shown more success in the recent past. Here, the person identity is directly inferred from the features without an intermediate person model. Most methods build on a silhouette extraction for each frame in a gait cycle. Silhouettes are either averaged [3][9][19], or all silhouettes are used simultaneously [7][12][16]. Different classifiers, ranging from nearest neighbor [3] to SVM and HMM [7][16], have been applied with similarly good results.

Figure 1: Left to right: (a) input image; (b) foreground segmentation; (c) tri-state labeling with morphologic operations; (d) alpha matte; (e) final segmentation.

Recently gait recognition has been combined with face recognition [6][10][22]. Typical face recognition methods require a high resolution frontal face image. However, for gait recognition, persons are only captured in low-resolution side view images. In [6], for face recognition, only the final segment of the gait video, where the person is visible in a near-frontal view, is used. In [13], multiple cameras are used to ensure that both the side view and the frontal view are available. To avoid these special cases, face recognition can be performed on the low-resolution side view images [22]. Our approach is similar to the latter, because we also do not depend on specialized data, but instead work directly on the low-resolution side view videos.

For performance evaluation, many databases have been recorded. However, the most popular and widely used database is probably the Human ID Gait database [12]. This database features video sequences of a total of 122 subjects, who walk perpendicular to the camera at a distance. While many methods have been applied to this dataset, so far no fusion method using gait and face has been applied to this database.

3. Segmentation using Alpha Mattes

In this work, we investigate a new segmentation technique which we apply to both gait recognition and face recognition. Current gait recognition methods rely on good segmentation to extract the contour and the silhouettes of the foreground objects. Typically, a background is estimated by calculating the mean and variance of the scene over a certain period. Then the foreground is estimated by finding the pixels with significant deviation from the background model. This leads to a noisy, binary segmentation as depicted in Figure 1(b). However, due to the nature of the image capturing, there is a band on the silhouette which belongs partially to foreground and partially to background. Thus at each pixel $(x, y)$, the image $I$ is modeled as a linear composition of the foreground $F$ and the background $B$:

$$I(x, y) = \alpha(x, y)\, F(x, y) + (1 - \alpha(x, y))\, B(x, y) \tag{1}$$

Here, $\alpha(x, y)$ is the opacity of the pixel at $(x, y)$. $F(x, y)$, $B(x, y)$ and $\alpha(x, y)$ are unknown. For a typical color image with three color channels we thus have 7 unknowns to solve for at each pixel (three each for $F$ and $B$, plus $\alpha$). This kind of problem statement is typical for matting problems. To cope with the high number of unknowns, proximity and smoothness assumptions are made.

Also the typical matting application has a human in the loop who has to provide some scribbles for foreground and background, leading to the so-called trimap. This map contains regions which are definitely foreground ($\alpha(x, y) = 1$), some which are definitely background ($\alpha(x, y) = 0$) and some unknown regions for which the matting method determines $\alpha(x, y)$.

However, for automated gait recognition it is infeasible to have a human in the loop. We therefore automatically generate the trimap from the noisy foreground segmentation. We get the definite-foreground regions ($\alpha(x, y) = 1$) by eroding the foreground segmentation with a circular structuring element with radius $r = 4$. The definite-background regions are obtained by eroding the background region with the same circular structuring element. The resulting trimap is shown in Figure 1(c).
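The trimap construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`disk_offsets`, `erode`, `make_trimap`), the use of `None` for unknown pixels, and the choice to treat everything outside the image as background are all hypothetical.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) covered by a circular structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def erode(mask, radius, outside):
    """Binary erosion of a 2D boolean mask with a disk.

    `outside` is the value assumed for pixels beyond the image border
    (an assumption of this sketch, not something the paper specifies).
    """
    h, w = len(mask), len(mask[0])
    offsets = disk_offsets(radius)

    def value(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else outside

    return [[all(value(y + dy, x + dx) for dy, dx in offsets)
             for x in range(w)] for y in range(h)]


def make_trimap(foreground, radius=4):
    """Return 1.0 (definite foreground), 0.0 (definite background) or None (unknown)."""
    background = [[not v for v in row] for row in foreground]
    sure_fg = erode(foreground, radius, outside=False)
    sure_bg = erode(background, radius, outside=True)
    return [[1.0 if fg else 0.0 if bg else None
             for fg, bg in zip(fg_row, bg_row)]
            for fg_row, bg_row in zip(sure_fg, sure_bg)]
```

The pixels left as `None` form the band around the silhouette for which a matting method would then estimate the opacity.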

For background segmentation we use Gaussian mixture models [14]; for alpha matting we use closed form matting [8].

The resulting foreground segmentation, the alpha-matte, is depicted in Figure 1(d). It can be seen that this segmentation is superior to the initial background segmentation. Holes are closed, erroneous pixels are removed and, most of all, the smooth transition of the foreground to the background is captured. Furthermore, by $F(x, y) = I(x, y) \cdot \alpha(x, y)$ we can approximate a precise color segmentation of the foreground object (see Figure 1(e)). This color segmentation is used for the face recognition part.

4. Gait recognition

4.1. Feature Extraction using α-GEI

For gait recognition we use a method based on the classical Gait Energy Image (GEI) [3]. However, instead of using binary silhouettes, we use the alpha channel from the alpha matting as described in the previous section. We call this the Alpha Gait Energy Image (α-GEI).

In essence, the Alpha Gait Energy Image is an arithmetic mean of the alpha channel. Denote by $\alpha_t$ the alpha matte in frame $t$. Then, the α-GEI $g$ is formally defined as the alpha matte average over one full gait cycle of $n$ frames:

$$g(x, y) = \frac{1}{n} \sum_{t=1}^{n} \alpha_t(x, y) \tag{2}$$
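As a minimal sketch of Equation 2, the α-GEI is a pixel-wise mean over the alpha mattes of one gait cycle. The function name `alpha_gei` and the nested-list image representation are assumptions made for this illustration; gait cycle detection and alignment of the frames are taken as already done.

```python
def alpha_gei(alpha_mattes):
    """Pixel-wise mean of the alpha mattes of one gait cycle.

    `alpha_mattes` is a list of equally sized 2D lists with values in [0, 1].
    """
    if not alpha_mattes:
        raise ValueError("need at least one frame")
    n_frames = len(alpha_mattes)
    height, width = len(alpha_mattes[0]), len(alpha_mattes[0][0])
    gei = [[0.0] * width for _ in range(height)]
    for matte in alpha_mattes:
        for y in range(height):
            for x in range(width):
                gei[y][x] += matte[y][x]
    return [[value / n_frames for value in row] for row in gei]
```

Unlike a GEI built from binary silhouettes, the input values here are continuous, so the soft transition at the silhouette boundary contributes to the average.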

4.2. Feature Space Reduction

The gait energy images $g(x, y)$ have a resolution of 88 × 128 pixels. Thus the feature vector is still large with 11264 dimensions. We apply principal component analysis (PCA) followed by multiple discriminant analysis (MDA) to reduce the size of the feature vector. A combination of PCA and MDA, as proposed in [4], results in the best recognition performance. While PCA seeks a projection that best represents the data, MDA seeks a projection that best separates the data.

Assume that the training set, consisting of $N$ $d$-dimensional training vectors $\{g_1, \dots, g_N\}$, is given. Then the projection to the $d' < d$ dimensional PCA space is given by

$$y_k = U_{pca}\,(g_k - \bar{g}), \quad k = 1, \dots, N \tag{3}$$

Here $U_{pca}$ is the $d' \times d$ transformation matrix with the first $d'$ orthonormal basis vectors obtained using PCA on the training set $\{g_1, \dots, g_N\}$ and $\bar{g} = \frac{1}{N}\sum_{k=1}^{N} g_k$ is the mean of the training set. After PCA, MDA is performed. It is assumed that the reduced vectors $Y = \{y_1, \dots, y_N\}$ belong to $c$ classes. Thus the set of reduced training vectors $Y$ is composed of its $c$ disjoint subsets, $Y = Y_1 \cup Y_2 \cup \dots \cup Y_c$.

The MDA projection has by construction $(c - 1)$ dimensions. These $(c - 1)$-dimensional vectors $z_k$ are obtained as follows:

$$z_k = U_{mda}\, y_k, \quad k = 1, \dots, N \tag{4}$$

where $U_{mda}$ is the transformation matrix obtained using MDA. This matrix results from optimizing the ratio of the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$:

$$J(U_{mda}) = \frac{\left| U_{mda}\, S_B\, U_{mda}^T \right|}{\left| U_{mda}\, S_W\, U_{mda}^T \right|} \tag{5}$$

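Here the within-class scatter matrix $S_W$ is defined as $S_W = \sum_{i=1}^{c} S_i$, with $S_i = \sum_{y \in Y_i} (y - m_i)(y - m_i)^T$ and $m_i = \frac{1}{N_i} \sum_{y \in Y_i} y$, where $N_i = |Y_i|$ is the number of vectors in $Y_i$. The between-class scatter $S_B$ is defined as $S_B = \sum_{i=1}^{c} N_i\, (m_i - m)(m_i - m)^T$, with $m = \frac{1}{N} \sum_{i=1}^{c} N_i\, m_i$.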

Finally, for each Gait Energy Image, the corresponding gait feature vector is computed as follows:

$$z_k = U_{mda}\, U_{pca}\,(g_k - \bar{g}) = T\,(g_k - \bar{g}), \quad k = 1, \dots, N \tag{6}$$
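To make the within-class and between-class scatter matrices used by MDA concrete, the following sketch computes both for a toy data set using only the Python standard library. The helper names (`mean`, `outer`, `add`, `scatter_matrices`) are hypothetical, and the sketch stops before the actual optimization of $U_{mda}$, which would require an eigenvalue solver.

```python
def mean(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]


def add(A, B, scale=1.0):
    """Matrix sum A + scale * B."""
    return [[a + scale * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def scatter_matrices(classes):
    """Return (S_W, S_B) for `classes`, a list with one list of vectors per class."""
    dim = len(classes[0][0])
    m = mean([y for Y_i in classes for y in Y_i])   # overall mean
    S_W = [[0.0] * dim for _ in range(dim)]
    S_B = [[0.0] * dim for _ in range(dim)]
    for Y_i in classes:
        m_i = mean(Y_i)                             # class mean
        for y in Y_i:
            d = [a - b for a, b in zip(y, m_i)]
            S_W = add(S_W, outer(d, d))
        d = [a - b for a, b in zip(m_i, m)]
        S_B = add(S_B, outer(d, d), scale=len(Y_i))
    return S_W, S_B
```

For two classes `[[0, 0], [2, 0]]` and `[[4, 4], [6, 4]]`, the within-class scatter only has spread along the first axis, while the between-class scatter lies along the diagonal that separates the two class means.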

Figure 2: (a) Rough definition of the pre-face around the face region. (b) Registration of the pre-faces using sum of absolute differences.

4.3. Classification

Each class $c$ is modeled with only one vector, which is the mean feature vector

$$\bar{z}_c = \frac{1}{|Z_c|} \sum_{z \in Z_c} z \tag{7}$$

where $Z_c$ is the set of training feature vectors of class $c$. For each α-GEI $\hat{g}_i$ from the test set, we perform the transformation in Equation 6 to get the reduced feature vector $\hat{z}_i$. A distance $D_{gait}(c) = \|\hat{z}_i - \bar{z}_c\|$ using the Euclidean distance measure is defined. It defines, for each sequence $i$, the distance to the $c$-th class. Final person identification using gait then becomes a nearest-neighbor classification. We assign a class label $L_i$ to each test gait image according to

$$L_i = \arg\min_c D_{gait}(c) \tag{8}$$
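The gait classification step amounts to a nearest-class-mean rule, which can be sketched as follows. The function names and the dictionary layout are assumptions for illustration only; the feature vectors are taken to be already reduced via Equation 6.

```python
import math


def class_means(features_by_class):
    """Map each class label to the mean of its training feature vectors."""
    means = {}
    for label, vectors in features_by_class.items():
        n = len(vectors)
        means[label] = [sum(v[i] for v in vectors) / n
                        for i in range(len(vectors[0]))]
    return means


def gait_distances(z_hat, means):
    """Euclidean distance from a test feature vector to every class mean."""
    return {label: math.dist(z_hat, mean) for label, mean in means.items()}


def classify(z_hat, means):
    """Label of the class whose mean is closest to the test vector."""
    distances = gait_distances(z_hat, means)
    return min(distances, key=distances.get)
```

Keeping `gait_distances` separate from `classify` is deliberate in this sketch: the full distance map, not just the winning label, is what a later score-level fusion step needs.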

5. Face recognition

5.1. Pre-faces

In the first part of the algorithm, the gallery set is processed. The goal is to find a 20×20 patch of the face profile of each person. To robustly achieve this and to avoid erroneous segmentations, first for each gallery sequence a pre-face is calculated. To this end, the mean of all frames in a sequence is calculated (similar to GEI), in order to find the person more precisely than using a bounding box. Over this mean image, a 30×40 patch is defined, which is used to cut the region for all frames (see Figure 2).

Because viewing direction and body position change slightly when the person walks across the scene, instead of only extracting one face per sequence, multiple such faces, which are evenly spread over the sequence, are extracted. This ensures that as much information about the person is captured as possible. Thus, always five consecutive pre-faces are combined. Those five pre-faces are registered using sum of absolute differences. After registration, the mean is taken to find the averaged pre-face.

Figure 3: (a) The alpha matte based segmentation; the roughly cropped pre-face and the final face segmentation. (b) Several sub faces of a specific sequence. It can be clearly seen that the appearance of a face changes within the sequence.

Finally, to find the precise head location within the 30×40 pixel pre-face, a simple threshold method is used to find the highest point (top of head) and the left-most point (nose). Using these two points, a 20×20 pixel patch is extracted, which captures the final segmentation of the face. Results of segmentation can be seen in Figure 3. Note that due to the alpha matte preprocessing the segmentations contain only color foreground regions. Disturbing background pixels are eliminated.

The same segmentation is carried out on the test sequences. The splitting of the test sequences has the advantage that, for each sequence, multiple sub faces of each person can be used for classification. This way, multiple aspects of the person are captured and, in addition, the influence of erroneous segmentations is reduced.

5.2. Eigenface Calculation

We apply the classical eigenface method [18] for face recognition. This means that the average face is calculated by taking the mean. This average face is subtracted from the gallery faces and a covariance matrix is estimated from the gallery data. Thus a PCA is performed. In order to capture color information like skin and hair color, all three color channels are appended and used for the calculation of the covariance matrix.

Let $\{f_1, \dots, f_M\}$ be the set of $M$ 20 × 20 × 3 color face patches in the gallery set. Here $M$ is the number of all sub faces, so it is roughly 40 times larger than the number of people in the gallery set. Then the resulting transformation is

$$f_k \mapsto U_{face}\,(f_k - \bar{f}), \quad k = 1, \dots, M \tag{9}$$

where $\bar{f} = \frac{1}{M} \sum_{k=1}^{M} f_k$ is the mean face and $U_{face}$ is the transformation matrix learned by PCA.

5.3. Classification

Face recognition is done similarly to gait recognition. However, instead of having one average gait template, we have several sub faces for each sequence as described above. Typically one would use k-nearest neighbor in such a case. For the later fusion step, however, we need a continuous score for each potential class. Thus for each of the sub faces of a test sequence we calculate the distance to all sub faces of all training sequences (see Figure 4). Out of these matches, we only keep the k nearest matches. Within these k matches, the distances to each comprised class are averaged, thus resulting in a distance $D_{face}(c)$ for each class $c$. If a class $c$ is not comprised in the k best matches at all, then the distance is set to $D_{face}(c) = \infty$. In our experiments we set k = 100; however, the method is not sensitive to this value as long as it is big enough (> 10).
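The scoring scheme above can be sketched as follows. This is one possible reading, stated as an assumption: all distances between test sub faces and gallery sub faces are pooled, the k smallest are kept, and the kept distances are averaged per class. The function name `face_distances` and the `(label, vector)` gallery layout are hypothetical.

```python
import math


def face_distances(test_subfaces, gallery, k=100):
    """Per-class face distance for one test sequence.

    `gallery` is a list of (class_label, feature_vector) pairs and
    `test_subfaces` a list of feature vectors from one test sequence.
    Classes without any match among the k nearest get distance infinity.
    """
    matches = sorted((math.dist(t, g), label)
                     for t in test_subfaces
                     for label, g in gallery)
    per_class = {}
    for distance, label in matches[:k]:
        per_class.setdefault(label, []).append(distance)
    labels = {label for label, _ in gallery}
    return {label: (sum(per_class[label]) / len(per_class[label])
                    if label in per_class else math.inf)
            for label in labels}
```

Returning a score for every gallery class, including `math.inf` for classes that never appear among the k nearest matches, keeps the output shape independent of k.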

For pure face classification, the class $c$ with the minimal distance is chosen:

$$\arg\min_c D_{face}(c) \tag{10}$$

[Figure 4 plot. Axes: eigenvector 1, eigenvector 2. Legend: subfaces test-sequence A, subfaces gallery-sequence B, subfaces gallery-sequence C. Annotations: mean(B) for class B, mean(C) for class C.]

Figure 4: Illustration of the face classification (shown for the first two eigenvectors). For a given test sequence A, the k closest matches are found (here k = 5). Within those top k matches, the class averages (here, to class B and C, respectively) are a measure for the similarity to these classes.

[Figure 5 plot. Axes: recognition rate (ticks 0.1 to 0.9); probe A to probe L, and mean. Legend: gait only, face only, prod. rule, sum rule, max rule.]

Figure 5: Quantitative results on the Human ID Gait database [12]. (1) using only gait information, (2) using only face, (3) fusion using product rule, (4) fusion using sum rule, (5) fusion using max rule.

6. Fusion of Face and Gait

In this work we use score level fusion. This means that the distance scores $D_{gait}$ and $D_{face}$ are combined before decision making. There are multiple ways of fusing the results. We use max, product and sum rules:

$$D_{prod}(c) = D_{gait}(c) \cdot D_{face}(c) \tag{11}$$

$$D_{sum}(c) = D_{gait}(c) + D_{face}(c) \tag{12}$$

$$D_{max}(c) = \max\left(D_{gait}(c),\, D_{face}(c)\right) \tag{13}$$

The distances result from different modalities, thus the values are not directly comparable. Therefore normalization of the vectors is of central importance. Before fusion, the vectors are normalized to have unit length, i.e. $D(c) \leftarrow D(c) / \sum_{\hat{c}} D(\hat{c})$.
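A small sketch of the normalization and the three fusion rules is given below. The function names and the rule keys `"prod"`, `"sum"` and `"max"` are assumptions for this illustration. The sketch also assumes that all distances are finite and positive; infinite face distances for classes outside the k best matches would need separate handling, which is not covered here.

```python
def normalize(distances):
    """Scale a {class: distance} map so that its values sum to one."""
    total = sum(distances.values())
    return {c: d / total for c, d in distances.items()}


def fuse(d_gait, d_face, rule):
    """Score-level fusion of two normalized distance maps over the same classes."""
    g, f = normalize(d_gait), normalize(d_face)
    combine = {
        "prod": lambda a, b: a * b,
        "sum": lambda a, b: a + b,
        "max": max,
    }[rule]
    return {c: combine(g[c], f[c]) for c in g}


def decide(fused):
    """Class with the smallest fused distance."""
    return min(fused, key=fused.get)
```

With gait distances `{"A": 1, "B": 2, "C": 5}` and face distances `{"A": 4, "B": 1, "C": 5}`, gait alone would pick A and face alone would pick B, while all three fused scores favor B, because B is close under both modalities.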

7. Results and Comparison

Figure 5 shows the quantitative results on the Human ID Gait database. It can be seen that fusion using either the product rule or the sum rule greatly improves the recognition rates, except for Probe B, where fusion slightly reduces the recognition rate compared to gait alone, but greatly increases it compared to face recognition alone. The max rule shows inferior performance.

For performance evaluation, we compare our method to several state-of-the-art results. Summarizing results are shown in Table 1 (largely taken from [5]). Here, recognition rates for all 12 experiments, as well as the weighted recognition average, are shown.

It can be seen that our α-GEI (53.6%), which does not use synthetic images as in [3], outperforms the standard GEI (48.2%). This demonstrates the effectiveness of the alpha matte preprocessing, and it can be foreseen that when implementing synthetic images, recognition rates can be improved even further. We cannot compare our α-eigenface method, since currently no other face recognition method has been applied to the Human ID Gait database.

Both our face (54.6%) and our gait recognition method (53.6%) alone cannot compete with the current state of the art. However, when combining these multimodal methods, recognition rates exceed all previous approaches. This shows the importance of simultaneously using multiple modalities and fusing them. It can be seen that simple product and sum rules lead to good fusion results and to a dramatic increase in performance.

8. Conclusion and Outlook

In this work, a new preprocessing method using closed form alpha matting was introduced. It was applied to both face and gait recognition. In order to use this method, which typically requires a "human in the loop", an automated generation of the trimap was presented. Using this preprocessing it was possible to increase the performance of the standard Gait Energy Image.

Combining both the modified face and gait recognition method, it was possible to achieve unprecedented performance results on the Human ID Gait challenge. Similar fusion techniques have currently only been carried out on other (smaller) datasets.

For future work, stronger and better face and gait methods should be combined. It can be foreseen that recognition rates could improve even further.

References

[1] C. BenAbdelkader, R. Cutler, and L. Davis. Stride and cadence as a biometric in automatic person identification and verification. In Proceedings Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pages 372–377. IEEE, 2002.

[2] J. Cutting and L. Kozlowski. Recognizing friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9(5):353–356, 1977.

[3] J. Han and B. Bhanu. Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 316–322, 2006.

[4] P. Huang, C. Harris, and M. Nixon. Recognising humans by gait via parametric canonical space. Journal of Artificial Intelligence in Engineering, 13(4):359–366, November 1999.

[5] Y. Huang, D. Xu, and T.-J. Cham. Face and human gait recognition using image-to-class distance. IEEE Trans. Circuits Syst. Video Techn., 20(3):431–438, 2010.
