Combined face and gait recognition using alpha matte preprocessing

Martin Hofmann¹, Stephan M. Schmidt¹, A. N. Rajagopalan¹,², Gerhard Rigoll¹
¹ Institute for Human-Machine Communication, Technische Universität München, Germany
² Department of Electrical Engineering, Indian Institute of Technology Madras, India
martin.hofmann@tum.de, stephan.schmidt@mytum.de, raju@iitm.ac.in, rigoll@tum.de
Abstract
This paper presents advances on the Human ID Gait Challenge. Our method is based on combining an improved gait recognition method with an adapted low-resolution face recognition method. For this, we experiment with a new automated segmentation technique based on alpha matting. This allows better construction of the feature images used for gait recognition. The same segmentation is also used as a basis for finding and recognizing low-resolution facial profile images in the same database. Both gait and face recognition methods show results comparable to the state of the art. Next, the two approaches are fused (which, to our knowledge, has not yet been done for the Human ID Gait Challenge). With this fusion, we show a significant performance improvement. Moreover, we reach the highest recognition rates and the largest absolute number of correct detections to date.
1. Introduction
The focus of this paper is on recognizing people from larger distances. At a distance, many typical physiologic features, such as fingerprint, DNA, hand, ear, retina and face, are obscured or cannot be obtained at all. By contrast, behavior based features such as gait features can be extracted from walking people at a distance.
In our approach we make use of gait recognition combined with person identification based on low-resolution face profile images. As such we combine physiologic and behavior based features. We show that both modalities lead to good results on their own. When combining them, we observe a significant improvement in recognition performance, which demonstrates the strength of a multimodal approach.
Our approach is primarily motivated by the success of gait recognition methods for recognition at a distance. In 1967, Murray [11] suggested that if all gait movements are considered, gait is unique. Early studies in 1977 by Cutting and Kozlowski [2] suggested that it is possible to recognize friends from just their way of walking. Later, Stevenage et al. [15] showed that people can be recognized without any information on the body shape, using only gait features. A major advantage of these behavior based features over other physiologic features is the possibility to identify people from large distances and without the person's direct cooperation. Also, no direct interaction with a sensing device is necessary, which allows for undisclosed identification. Thus gait recognition has great potential in video surveillance, tracking and monitoring.
For low-resolution data, gait recognition has clear advantages. However, in our approach, we also use low-resolution face data. Even though face recognition performs best on high-resolution frontal face images, facial profile recognition can still contribute to the overall performance when combined correctly.
A multitude of gait recognition algorithms (see Table 1) have so far been proposed, which leads to a rich set of results we can compare to. Most of these methods build solely on binarized silhouette images. However, we feel that a lot of identity information gets lost by this early binarization. Thus, instead of binarizing, both our face and gait recognition methods build on a novel automated color foreground segmentation method based on alpha matting. For gait recognition we use the continuous alpha-matte segmentation and show a small increase in performance. To our knowledge, face recognition has so far not been applied to the Human ID Gait database [12], so we cannot compare these results directly. When fusing gait and face features we observe a significant performance gain, such that our combined method outperforms the state of the art.
2. Related Work
Generally speaking, there are two kinds of gait recognition methods: model-based methods on the one hand and model-free methods on the other. Model-based methods [1][21] define a (simplified) human model and match the gait sequences to this model. Gait recognition is then performed on the temporal change of the model parameters, such as leg angles [21]. Those methods are typically very demanding and good results are hard to achieve. Model-free methods [3][5][7][9][12][17][19][20], on the other hand, have shown more success in the recent past. Here, the person identity is directly inferred from the features without an intermediate person model. Most methods build on a silhouette extraction for each frame in a gait cycle. Silhouettes are either averaged [3][9][19], or all silhouettes are used simultaneously [7][12][16]. Different classifiers, ranging from nearest neighbor [3] to SVM and HMM [7][16], have been applied with similarly good results.
Figure 1: Left to right: (a) input image; (b) foreground segmentation; (c) tri-state labeling with morphologic operations; (d) alpha matte; (e) final segmentation.
Recently, gait recognition has been combined with face recognition [6][10][22]. Typical face recognition methods require a high-resolution frontal face image. However, for gait recognition, persons are only captured in low-resolution side view images. In [6], for face recognition, only the final segment of the gait video, where the person is visible in a near-frontal view, is used. In [13], multiple cameras are used to ensure that both the side view and the frontal view are available. To avoid these special cases, face recognition can be performed on the low-resolution side view images [22]. Our approach is similar to the latter, because we also do not depend on specialized data, but instead work directly on the low-resolution side view videos.
For performance evaluation, many databases have been recorded. However, the most popular and widely used database is probably the Human ID Gait database [12]. This database features video sequences of a total of 122 subjects, who walk perpendicular to the camera at a distance. While many methods have been applied to this dataset, no fusion method using gait and face has so far been applied to it.
3. Segmentation using Alpha Mattes
In this work, we investigate a new segmentation technique which we apply to both gait recognition and face recognition. Current gait recognition methods rely on good segmentation to extract the contour and the silhouettes of the foreground objects. Typically, a background is estimated by calculating the mean and variance of the scene over a certain period. Then the foreground is estimated by finding the pixels with significant deviation from the background model. This leads to a noisy, binary segmentation as depicted in Figure 1b). However, due to the nature of the image capturing, there is a band on the silhouette which belongs partially to foreground and partially to background. Thus at each pixel (x, y), the image I is modeled as a linear composition of the foreground F and the background B:
$$I(x, y) = \alpha(x, y)\,F(x, y) + (1 - \alpha(x, y))\,B(x, y) \tag{1}$$
Here, α(x, y) is the opacity of the pixel at (x, y). F(x, y), B(x, y) and α(x, y) are unknown. For a typical color image with three color channels we thus have 7 unknowns to solve for at each pixel (three for F, three for B and one for α). This kind of problem statement is typical for matting problems. To cope with the high number of unknowns, proximity and smoothness assumptions are made. Also, the typical matting application has a human in the loop who has to provide some scribbles for foreground and background, leading to the so-called trimap. This map contains regions which are definitely foreground (α(x, y) = 1), some which are definitely background (α(x, y) = 0) and some unknown regions for which the matting method determines α(x, y).
However, for automated gait recognition it is infeasible to have a human in the loop. We therefore automatically generate the trimap from the noisy foreground segmentation. We get the definite-foreground regions (α(x, y) = 1) by eroding the foreground segmentation with a circular structure element with radius r = 4. The definite-background regions are obtained by eroding the background region with the same circular structure element. The resulting trimap is shown in Figure 1c).
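The automatic trimap construction can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the function names, the use of `None` for the unknown band, and the choice to ignore neighbours that fall outside the image are all assumptions made for this example.

```python
def disc_offsets(radius):
    """Offsets (dy, dx) covered by a circular structure element."""
    r = radius
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]

def erode(mask, radius):
    """Binary erosion: a pixel survives only if every in-image pixel under
    the disc is set. Neighbours outside the image are ignored (assumption)."""
    h, w = len(mask), len(mask[0])
    offsets = disc_offsets(radius)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                mask[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out

def make_trimap(foreground, radius=4):
    """1.0 = definite foreground, 0.0 = definite background,
    None = unknown band left for the matting step."""
    h, w = len(foreground), len(foreground[0])
    background = [[1 - v for v in row] for row in foreground]
    sure_fg = erode(foreground, radius)   # shrink the noisy foreground
    sure_bg = erode(background, radius)   # shrink the background likewise
    return [[1.0 if sure_fg[y][x] else 0.0 if sure_bg[y][x] else None
             for x in range(w)] for y in range(h)]
```

In practice a library routine such as `scipy.ndimage.binary_erosion` would replace the explicit loops; the pure-Python version only serves to make the two erosions and the resulting unknown band explicit.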
For background segmentation we use Gaussian mixture models [14]; for alpha matting we use closed-form matting [8].
The resulting foreground segmentation, the alpha matte, is depicted in Figure 1d). It can be seen that this segmentation is superior to the initial background segmentation. Holes are closed, erroneous pixels are removed and, most of all, the smooth transition from the foreground to the background is captured. Furthermore, by F(x, y) = I(x, y) · α(x, y) we can approximate a precise color segmentation of the foreground object (see Figure 1e)). This color segmentation is used for the face recognition part.
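The approximation F(x, y) = I(x, y) · α(x, y) amounts to scaling every color pixel by its opacity. A minimal sketch, assuming images are stored as nested lists of RGB tuples and the alpha matte as nested lists of floats in [0, 1] (both storage choices are assumptions of this example):

```python
def color_foreground(image, alpha):
    """Scale every RGB pixel by its opacity: F(x, y) ~ I(x, y) * alpha(x, y)."""
    return [[tuple(channel * a for channel in pixel)
             for pixel, a in zip(image_row, alpha_row)]
            for image_row, alpha_row in zip(image, alpha)]
```

Pixels with α = 1 are kept unchanged, pixels with α = 0 become black, and pixels in the transition band are attenuated in proportion to their opacity.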
4. Gait recognition
4.1. Feature Extraction using α-GEI
For gait recognition we use a method based on the classical Gait Energy Image (GEI) [3]. However, instead of using binary silhouettes, we use the alpha channel from the alpha matting as described in the previous section. We call this the Alpha Gait Energy Image (α-GEI).
In essence, the Alpha Gait Energy Image is an arithmetic mean of the alpha channel. Denote by $\alpha_t$ the alpha matte in frame $t$. Then the α-GEI $g$ is formally defined as the alpha matte average over one full gait cycle of $T$ frames:
$$g(x, y) = \frac{1}{T} \sum_{t=1}^{T} \alpha_t(x, y) \tag{2}$$
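Equation 2 is a pixel-wise mean over the frames of one gait cycle. The sketch below assumes the alpha mattes have already been cropped, scaled to a common size and aligned, and that the frames of one gait cycle have been selected beforehand; neither step is shown, and the function name is illustrative.

```python
def alpha_gei(alpha_mattes):
    """Pixel-wise arithmetic mean of T alpha mattes (Equation 2).

    alpha_mattes: list of T frames, each a 2-D list of opacities in [0, 1].
    """
    T = len(alpha_mattes)
    height, width = len(alpha_mattes[0]), len(alpha_mattes[0][0])
    return [[sum(frame[y][x] for frame in alpha_mattes) / T
             for x in range(width)]
            for y in range(height)]
```

Unlike a GEI built from binary silhouettes, the averaged values here already carry the soft boundary information of each frame.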
4.2. Feature Space Reduction
The gait energy images g(x, y) have a resolution of 88 × 128 pixels. Thus the feature vector is still large, with 11264 dimensions. We apply principal component analysis (PCA) followed by multiple discriminant analysis (MDA) to reduce the size of the feature vector. A combination of PCA and MDA, as proposed in [4], results in the best recognition performance. While PCA seeks a projection that best represents the data, MDA seeks a projection that best separates the data.
Assume that the training set, consisting of $N$ $d$-dimensional training vectors $\{g_1, g_2, \ldots, g_N\}$, is given. Then the projection to the $d' < d$ dimensional PCA space is given by
$$y_k = U_{pca}(g_k - \bar{g}), \quad k = 1, \ldots, N \tag{3}$$
Here $U_{pca}$ is the $d' \times d$ transformation matrix with the first $d'$ orthonormal basis vectors obtained using PCA on the training set $\{g_1, g_2, \ldots, g_N\}$ and $\bar{g} = \frac{1}{N}\sum_{k=1}^{N} g_k$ is the mean of the training set. After PCA, MDA is performed. It is assumed that the reduced vectors $Y = \{y_1, y_2, \ldots, y_N\}$ belong to $c$ classes. Thus the set of reduced training vectors $Y$ is composed of its $c$ disjoint subsets, $Y = Y_1 \cup Y_2 \cup \ldots \cup Y_c$. The MDA projection has by construction $(c - 1)$ dimensions. These $(c - 1)$-dimensional vectors $z_k$ are obtained as follows
$$z_k = U_{mda}\, y_k, \quad k = 1, \ldots, N \tag{4}$$
where $U_{mda}$ is the transformation matrix obtained using MDA. This matrix results from optimizing the ratio of the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$:
$$J(U_{mda}) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|U_{mda}^T S_B U_{mda}|}{|U_{mda}^T S_W U_{mda}|}. \tag{5}$$
Here the within-class scatter matrix $S_W$ is defined as $S_W = \sum_{i=1}^{c} S_i$, with $S_i = \sum_{y \in Y_i} (y - m_i)(y - m_i)^T$ and $m_i = \frac{1}{N_i} \sum_{y \in Y_i} y$, where $N_i = |Y_i|$ is the number of vectors in $Y_i$. The between-class scatter $S_B$ is defined as $S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T$, with $m = \frac{1}{N} \sum_{i=1}^{c} N_i m_i$.
Finally, for each Gait Energy Image, the corresponding gait feature vector is computed by applying the PCA projection of Equation 3 followed by the MDA projection of Equation 4:
$$z_k = U_{mda} U_{pca} (g_k - \bar{g}) = T (g_k - \bar{g}), \quad k = 1, \ldots, N \tag{6}$$
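As a quick consistency check on the dimensions involved (not part of the original derivation), composing Equations 3 and 4 gives the following chain. Only the figure $d = 88 \cdot 128 = 11264$ is taken from the text; $d'$ and $c$ are left symbolic because their values are not stated here.

$$
\underbrace{g_k - \bar{g}}_{d \times 1}
\;\xrightarrow{\;U_{pca}\,:\,d' \times d\;}\;
\underbrace{y_k}_{d' \times 1}
\;\xrightarrow{\;U_{mda}\,:\,(c-1) \times d'\;}\;
\underbrace{z_k}_{(c-1) \times 1},
\qquad d = 88 \cdot 128 = 11264 .
$$

The composed matrix $T$ therefore has size $(c - 1) \times d$, so transforming a test image requires a single matrix-vector product.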
Figure 2: (a) Rough definition of the pre-face around the face region. (b) Registration of the pre-faces using sum of absolute differences.
4.3. Classification
Each class $c$ is modeled with only one vector, which is the mean feature vector $\bar{z}_c$:
$$\bar{z}_c = \frac{1}{|Z_c|} \sum_{z \in Z_c} z, \tag{7}$$
where $Z_c$ denotes the set of training feature vectors of class $c$. For each α-GEI from the test set, $\hat{g}_i$, we perform the transformation in Equation 6 to get the reduced feature vector $\hat{z}_i$. A distance $D_i^{gait}(c) = \|\hat{z}_i - \bar{z}_c\|$ using the Euclidean distance measure is defined. It gives, for every sequence $i$, the distance to the $c$-th class. Final person identification using gait then becomes a nearest-neighbor classification. We assign a class label $L_i$ to each test gait image according to
$$L_i = \arg\min_c D_i^{gait}(c) \tag{8}$$
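Equations 7 and 8 describe a nearest-class-mean classifier. A minimal sketch operating on already reduced feature vectors follows; the function names and the dictionary-based data layout are assumptions of this example.

```python
import math

def class_means(features, labels):
    """Mean feature vector per class (Equation 7)."""
    sums, counts = {}, {}
    for z, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * len(z))
        for j, value in enumerate(z):
            acc[j] += value
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def gait_distances(z_test, means):
    """Euclidean distance from one test vector to every class mean."""
    return {c: math.dist(z_test, m) for c, m in means.items()}

def classify(z_test, means):
    """Label of the class with the smallest distance (Equation 8)."""
    distances = gait_distances(z_test, means)
    return min(distances, key=distances.get)
```

Keeping the full distance dictionary, rather than only the winning label, matters later because the fusion step needs a score for every class.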
5. Face recognition
5.1. Pre-faces
In the first part of the algorithm, the gallery set is processed. The goal is to find a 20×20 patch of the face profile of each person. To robustly achieve this and to avoid erroneous segmentations, first a pre-face is calculated for each gallery sequence. To this end, the mean of all frames in a sequence is calculated (similar to GEI), in order to find the person more precisely than using a bounding box. Over this mean image, a 30×40 patch is defined, which is used to cut the region for all frames (see Figure 2).
Because viewing direction and body position change slightly when the person walks across the scene, instead of extracting only one face per sequence, multiple such faces, which are evenly spread over the sequence, are extracted. This ensures that as much information about the person is captured as possible. Thus, five consecutive pre-faces are always combined. Those five pre-faces are registered using sum of absolute differences. After registration, the mean is taken to find the averaged pre-face.
Figure 3: (a) The alpha matte based segmentation; the roughly cropped pre-face and the final face segmentation. (b) Several sub faces of a specific sequence. It can be clearly seen that the appearance of a face changes within the sequence.
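Registration by sum of absolute differences can be sketched as an exhaustive search over small integer shifts. The search range `max_shift`, the restriction to grayscale patches, and the normalization of the score by the overlap area are assumptions of this example; the text only states that SAD is used.

```python
def sad(ref, mov, dy, dx):
    """Mean absolute difference between ref[y][x] and mov[y + dy][x + dx],
    evaluated on the overlapping region only."""
    h, w = len(ref), len(ref[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += abs(ref[y][x] - mov[yy][xx])
                count += 1
    return total / count  # assumes max_shift is smaller than the patch size

def register(ref, mov, max_shift=3):
    """Integer shift (dy, dx) for which the shifted mov best matches ref."""
    shifts = [(dy, dx)
              for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    return min(shifts, key=lambda s: sad(ref, mov, *s))
```

Once each of the five pre-faces has been shifted onto a common reference, the averaged pre-face is simply their pixel-wise mean.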
Finally, to find the precise head location within the 30×40 pixel pre-face, a simple threshold method is used to find the highest point (top of head) and the left-most point (nose). Using these two points, a 20×20 pixel patch is extracted, which captures the final segmentation of the face. Results of the segmentation can be seen in Figure 3. Note that due to the alpha matte preprocessing the segmentations contain only color foreground regions. Disturbing background pixels are eliminated.
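The head localization step can be sketched as below. The text does not say which quantity is thresholded, so this example assumes the threshold is applied to the opacity of the averaged pre-face, that the person faces left (so the left-most foreground column is the nose), and that the pre-face is at least as large as the patch; the function name and default threshold are illustrative.

```python
def face_box(opacity, size=20, threshold=0.5):
    """Return (top, left, bottom, right) of a size x size face patch.

    opacity: 2-D list of opacities for the averaged pre-face.
    """
    h, w = len(opacity), len(opacity[0])
    rows = [y for y in range(h) if any(v > threshold for v in opacity[y])]
    cols = [x for x in range(w)
            if any(opacity[y][x] > threshold for y in range(h))]
    if not rows:
        raise ValueError("no foreground pixel above threshold")
    top = min(rows[0], h - size)    # highest foreground row (top of head)
    left = min(cols[0], w - size)   # left-most foreground column (nose)
    top, left = max(top, 0), max(left, 0)  # keep the patch inside the image
    return top, left, top + size, left + size
```

The returned box can then be used to cut the 20×20 patch from the color foreground segmentation of each frame.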
The same segmentation is carried out on the test sequences. The splitting of the test sequences has the advantage that for each sequence, multiple sub faces of each person can be used for classification. This way, multiple aspects of the person are captured and, in addition, the influence of erroneous segmentations is reduced.
5.2. Eigenface Calculation
We apply the classical eigenface method [18] for face recognition. This means that the average face is calculated by taking the mean. This average face is subtracted from the gallery faces and a covariance matrix is estimated from the gallery data. Thus a PCA is performed. In order to capture color information like skin and hair color, all three color channels are appended and used for the calculation of the covariance matrix.
Let $\{f_1, f_2, \ldots, f_M\}$ be the set of $M$ color face patches of size 20 × 20 × 3 in the gallery set. Here $M$ is the number of all sub faces, so it is roughly 40 times larger than the number of people in the gallery set. Then the resulting transformation is
$$v_k = U_{face}(f_k - \bar{f}) \tag{9}$$
where $\bar{f} = \frac{1}{M}\sum_{k=1}^{M} f_k$ is the mean face and $U_{face}$ is the transformation matrix learned by PCA.
5.3. Classification
Face recognition is done similarly to gait recognition. However, instead of having one average gait template, we have several sub faces for each sequence as described above. Typically one would use k-nearest neighbor classification in such a case. For the later fusion step, however, we need a continuous score for each potential class. Thus for each of the sub faces of a test sequence we calculate the distance to all sub faces of all training sequences (see Figure 4). Out of these matches, we only keep the k nearest matches. Within these k matches, the distances to each class that occurs are averaged, resulting in a distance $D_i^{face}(c)$ for that class. If a class $c$ does not occur among the k best matches at all, then the distance is set to $D_i^{face}(c) = \infty$. In our experiments we set k = 100; however, the method is not sensitive to this value as long as it is big enough (> 10).
For pure face classification the class $c$ with the minimal distance, $\arg\min_c D_i^{face}(c)$, is taken as the recognition result.
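The per-class face distance described above can be sketched as follows. This example assumes that the k nearest matches are selected from the pooled set of all test-sub-face and gallery-sub-face pairs of one test sequence; the text leaves open whether the selection is pooled or made per sub face. Function and parameter names are illustrative.

```python
import math

def face_distances(test_subfaces, gallery_subfaces, gallery_labels, classes, k=100):
    """Average distance to each class among the k nearest sub-face matches.

    Classes that do not occur among the k nearest matches get infinity.
    """
    matches = [(math.dist(t, g), label)
               for t in test_subfaces
               for g, label in zip(gallery_subfaces, gallery_labels)]
    nearest = sorted(matches, key=lambda m: m[0])[:k]
    per_class = {}
    for distance, label in nearest:
        per_class.setdefault(label, []).append(distance)
    return {c: (sum(per_class[c]) / len(per_class[c])
                if c in per_class else math.inf)
            for c in classes}
```

Taking the class with the smallest value reproduces the pure face classification rule, while the full dictionary provides the continuous per-class scores needed for fusion.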
Figure 4: Illustration of the face classification (shown for the first two eigenvectors). For a given test sequence A, the k closest matches among the sub faces of gallery sequences B and C are found (here k = 5). Within those top k matches, the class averages (here, to class B and C, respectively) are a measure for the similarity to these classes.

Figure 5: Quantitative results on the Human ID Gait database [12], shown as recognition rate (0 to 1) for probes A to L and their mean: (1) using only gait information, (2) using only face, (3) fusion using product rule, (4) fusion using sum rule, (5) fusion using max rule.
6. Fusion of Face and Gait
In this work we use score level fusion. This means that the distance scores $D^{gait}(c)$ and $D^{face}(c)$ are combined before decision making. There are multiple ways of fusing the results. We use the product, sum and max rules:
$$D_i(c) = D_i^{gait}(c) \cdot D_i^{face}(c) \tag{10}$$
$$D_i(c) = D_i^{gait}(c) + D_i^{face}(c) \tag{11}$$
$$D_i(c) = \max\left(D_i^{gait}(c),\, D_i^{face}(c)\right) \tag{12}$$
The distances result from different modalities, thus the values are not directly comparable. Therefore normalization of the distance vectors is of central importance. Before fusion, the vectors are normalized to have unit length, i.e. $D(c) \leftarrow D(c) / \sum_{\hat{c}} D(\hat{c})$.
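The three fusion rules, together with the normalization step, can be sketched as follows. The normalization is taken here as division by the sum over all classes, which is an assumption about the intended formula. The sketch also assumes that all distances are finite and positive; classes with an infinite face distance would need separate handling, which is not shown.

```python
def normalize(distances):
    """Divide every class distance by the sum over all classes."""
    total = sum(distances.values())
    return {c: d / total for c, d in distances.items()}

def fuse(d_gait, d_face, rule="sum"):
    """Combine normalized gait and face distances per class."""
    g, f = normalize(d_gait), normalize(d_face)
    if rule == "product":
        return {c: g[c] * f[c] for c in g}
    if rule == "sum":
        return {c: g[c] + f[c] for c in g}
    if rule == "max":
        return {c: max(g[c], f[c]) for c in g}
    raise ValueError("unknown fusion rule: " + rule)

def decide(fused):
    """Class with the smallest fused distance."""
    return min(fused, key=fused.get)
```

Because the scores are distances, the decision is still an arg-min after fusion, exactly as in the single-modality classifiers.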
7. Results and Comparison
Figure 5 shows the quantitative results on the Human ID Gait database. It can be seen that fusion using either the product rule or the sum rule greatly improves the recognition rates. The exception is probe B, where fusion slightly reduces the recognition rate relative to gait alone, but greatly improves on face recognition alone. The max rule shows inferior performance.
For performance evaluation, we compare our method to several state-of-the-art results. Summarizing results are shown in Table 1 (largely taken from [5]). Here, recognition rates for all 12 experiments, as well as the weighted recognition average, are shown.
It can be seen that our α-GEI (53.6%), which does not use synthetic images as in [3], outperforms the standard GEI (48.2%). This demonstrates the effectiveness of the alpha matte preprocessing, and it can be foreseen that recognition rates can be improved even further when synthetic images are added. We cannot compare our α-eigenface method, since currently no other face recognition method has been applied to the Human ID Gait database.
Neither our face recognition method (54.6%) nor our gait recognition method (53.6%) can compete with the current state of the art on its own. However, when combining these two modalities, recognition rates exceed all previous approaches. This shows the importance of simultaneously using multiple modalities and fusing them. It can be seen that simple product and sum rules lead to good fusion results and to a dramatic increase in performance.
8. Conclusion and Outlook
In this work, a new preprocessing method using closed-form alpha matting was introduced. It was applied to both face and gait recognition. In order to use this method, which typically requires a "human in the loop", an automated generation of the trimap was presented. Using this preprocessing it was possible to increase the performance of the standard Gait Energy Image.
Combining both the modified face and gait recognition methods, it was possible to achieve unprecedented performance results on the Human ID Gait Challenge. Similar fusion techniques have currently only been carried out on other (smaller) datasets.
For future work, stronger and better face and gait methods should be combined. It can be foreseen that recognition rates could improve even further.
References
[1] C. BenAbdelkader, R. Cutler, and L. Davis. Stride and cadence as a biometric in automatic person identification and verification. In Proceedings Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pages 372–377. IEEE, 2002.
[2] J. Cutting and L. Kozlowski. Recognizing friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9(5):353–356, 1977.
[3] J. Han and B. Bhanu. Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 316–322, 2006.
[4] P. Huang, C. Harris, and M. Nixon. Recognising humans by gait via parametric canonical space. Journal of Artificial Intelligence in Engineering, 13(4):359–366, November 1999.
[5] Y. Huang, D. Xu, and T.-J. Cham. Face and human gait recognition using image-to-class distance. IEEE Trans. Circuits Syst. Video Techn., 20(3):431–438, 2010.
