
Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions

Abstract
Making recognition more reliable under uncontrolled lighting conditions is one of the most important challenges for practical face recognition systems. We tackle this by combining the strengths of robust illumination normalization, local texture-based face representations, distance transform based matching, kernel-based feature extraction and multiple feature fusion. Specifically, we make three main contributions: 1) we present a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition; 2) we introduce local ternary patterns (LTP), a generalization of the local binary pattern (LBP) local texture descriptor that is more discriminant and less sensitive to noise in uniform regions, and we show that replacing comparisons based on local spatial histograms with a distance transform based similarity metric further improves the performance of LBP/LTP based face recognition; and 3) we further improve robustness by adding kernel principal component analysis (PCA) feature extraction and incorporating rich local appearance cues from two complementary sources (Gabor wavelets and LBP), showing that the combination is considerably more accurate than either feature set alone. The resulting method provides state-of-the-art performance on three data sets that are widely used for testing recognition under difficult illumination conditions: Extended Yale-B, CAS-PEAL-R1, and Face Recognition Grand Challenge version 2 experiment 4 (FRGC-204). For example, on the challenging FRGC-204 data set it halves the error rate relative to previously published methods, achieving a face verification rate of 88.1% at 0.1% false accept rate. Further experiments show that our preprocessing method outperforms several existing preprocessors for a range of feature sets, data sets and lighting conditions.



Enhanced Local Texture Feature Sets for Face
Recognition Under Difficult Lighting Conditions
Xiaoyang Tan and Bill Triggs
INRIA & Laboratoire Jean Kuntzmann, 655 avenue de l’Europe, Montbonnot 38330, France
{xiaoyang.tan,bill.triggs}@imag.fr
Abstract. Recognition in uncontrolled situations is one of the most important
bottlenecks for practical face recognition systems. We address this by combining
the strengths of robust illumination normalization, local texture based face repre-
sentations and distance transform based matching metrics. Specifically, we make
three main contributions: (i) we present a simple and efficient preprocessing chain
that eliminates most of the effects of changing illumination while still preserving
the essential appearance details that are needed for recognition; (ii) we introduce
Local Ternary Patterns (LTP), a generalization of the Local Binary Pattern (LBP)
local texture descriptor that is more discriminant and less sensitive to noise in
uniform regions; and (iii) we show that replacing local histogramming with a lo-
cal distance transform based similarity metric further improves the performance
of LBP/LTP based face recognition. The resulting method gives state-of-the-art
performance on three popular datasets chosen to test recognition under difficult
illumination conditions: Face Recognition Grand Challenge version 1 experiment
4, Extended Yale-B, and CMU PIE.
1 Introduction
One of the key challenges of face recognition is finding efficient and discriminative fa-
cial appearance descriptors that can counteract large variations in illumination, pose,
facial expression, ageing, partial occlusions and other changes [27]. There are two
main approaches: geometric feature-based descriptors and appearance-based descrip-
tors. Geometric descriptors can be hard to extract reliably under variations in facial
appearance, while appearance-based ones such as eigenfaces tend to blur out small de-
tails owing to residual spatial registration errors. Recently, representations based on
local pooling of local appearance descriptors have drawn increasing attention because
they can capture small appearance details in the descriptors while remaining resistant
to registration errors owing to local pooling. Another motivation is the observation that
human visual perception is well-adapted to extracting and pooling local structural in-
formation (‘micro-patterns’) from images [2]. Methods in this category include Gabor
wavelets [16], local autocorrelation filters [11], and Local Binary Patterns [1].
In this paper we focus on Local Binary Patterns (LBP) and their generalizations.
LBP’s are a computationally efficient nonparametric local image texture descriptor.
They have been used with considerable success in a number of visual recognition tasks
including face recognition [1,2,20]. LBP features are invariant to monotonic gray-level
S.K. Zhou et al. (Eds.): AMFG 2007, LNCS 4778, pp. 168–182, 2007.
© Springer-Verlag Berlin Heidelberg 2007

changes by design and thus are usually considered to require no image preprocessing
before use¹. In fact, LBP itself is sometimes used as a lighting normalization stage for
other methods [12]. However, in practice the reliability of LBP decreases significantly
under large illumination variations (cf. table 3). Lighting effects involve complex local
interactions and the resulting images often violate LBP’s basic assumption that gray-
level changes monotonically. We have addressed this problem by developing a simple
and efficient image preprocessing chain that greatly reduces the influence of illumina-
tion variations, local shadowing and highlights while preserving the elements of visual
appearance that are needed for recognition.
Another limitation of LBP is its sensitivity to random and quantization noise in uni-
form and near-uniform image regions such as the forehead and cheeks. To counter this
we extend LBP to Local Ternary Patterns (LTP), a 3-valued coding that includes a
threshold around zero for improved resistance to noise. LTP inherits most of the other
key advantages of LBP such as computational efficiency.
Current LBP based face recognition methods partition the face image into a grid
of fixed-size cells for the local pooling of texture descriptors (LBP histograms). This
coarse (and typically abrupt) spatial quantization is somewhat arbitrary and not neces-
sarily well adapted to local facial morphology. It inevitably causes some loss of dis-
criminative power. To counter this we use distance transform techniques to create local
texture comparison metrics that have more controlled spatial gradings.
To illustrate the effectiveness of our approach we present experimental results on
three state-of-the-art face recognition datasets containing large lighting variations sim-
ilar to those encountered in natural images taken under uncontrolled conditions: Face
Recognition Grand Challenge version 1 experiment 1.0.4 (‘FRGC-104’) [19]; Extended
Yale illumination face database B (‘Extended Yale-B’) [9,15]; and CMU PIE [22].
2 Related Work
As emphasized by the recent FRVT and FRGC trials [19], illumination variations are
one of the most important bottlenecks for practical face recognition systems. Gener-
ally, one can cope with this in two ways. The first uses training examples to learn a
global model of the possible illumination variations, for example a linear subspace or
manifold model, which then generalizes to the variations seen in new images [5,3]. The
disadvantage is that many training images are required.
The second approach seeks conventional image processing transformations that re-
duce the image to a more “canonical” form in which the variations are suppressed. This
has the merit of easy application to real images and the lack of a need for comprehensive
training data. Given that complete illumination invariants do not exist [7], one must con-
tent oneself with finding representations that are resistant to the most common classes
of natural illumination variations. Most methods exploit the fact that these are typically
characterized by relatively low spatial frequencies. For example, the Multiscale Retinex
(MSR) method of Jobson et al. [13] normalizes the illumination by dividing the image
by a smoothed version of itself. A similar idea (with a different local filter) is used by
¹ One exception is Local Gabor Binary Pattern Histogram Sequences [26], whose Gabor magnitude mapping can be regarded as a special kind of preprocessing for LBP.

Wang et al. [23] in the Self Quotient Image model (SQI). More recently, Chen et al.
[8] improved SQI by using Logarithmic Total Variation (LTV) smoothing, and Gross &
Brajovic (GB) [10] developed an anisotropic smoothing method that relies on the itera-
tive estimation of a blurred version of the original image. Some comparative results for
these and related works can be found in [21].
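The smoothing-and-divide idea behind MSR and SQI can be made concrete with a minimal single-scale sketch. This is an illustration only, not any of the cited implementations: a Gaussian stands in for MSR's multiscale filter bank and SQI's anisotropic local filter, and the sigma and eps values are assumptions chosen for readability.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient(image, sigma=15.0, eps=1e-3):
    """Single-scale quotient-image normalization: I / smooth(I).

    The Gaussian-smoothed image captures the slowly varying
    (low spatial frequency) illumination, which is divided out,
    leaving mostly the higher-frequency reflectance detail.
    """
    img = image.astype(np.float64)
    smoothed = gaussian_filter(img, sigma)
    return img / (smoothed + eps)  # eps guards against division by zero
```

A flat input maps to a nearly constant output, while illumination gradients that are smooth relative to sigma are largely cancelled.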
In this paper we adopt the “canonical form” philosophy, basing our method on a
chain of efficient processing steps that normalize for various effects of the changing
illumination environment. The main advantages of our method are simplicity, compu-
tational efficiency and robustness to lighting changes and other image quality degrada-
tions such as blurring.
We describe our LBP/LTP face descriptors and their distance transform based similarity metric in the next two sections, detailing our preprocessing method in §5 and concluding with experiments and discussion.
3 Local Ternary Patterns
3.1 Local Binary Patterns (LBP)
Ojala et al. [17] introduced the Local Binary Pattern operator in 1996 as a means of
summarizing local gray-level structure. The operator takes a local neighborhood around
each pixel, thresholds the pixels of the neighborhood at the value of the central pixel
and uses the resulting binary-valued image patch as a local image descriptor. It was
originally defined for 3×3 neighborhoods, giving 8 bit codes based on the 8 pixels
around the central one. Formally, the LBP operator takes the form
    LBP(x_c, y_c) = Σ_{n=0}^{7} 2^n s(i_n − i_c)                    (1)

where in this case n runs over the 8 neighbors of the central pixel c, i_c and i_n are the gray-level values at c and n, and s(u) is 1 if u ≥ 0 and 0 otherwise. The LBP encoding process is illustrated in fig. 1.
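For a single 3×3 neighborhood, eq. (1) can be sketched in a few lines of plain Python. This is an illustration rather than the authors' code; the circular ordering of the neighbors (start position and direction) is an arbitrary but fixed choice.

```python
import numpy as np

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch, eq. (1): threshold the 8 neighbors
    at the central gray-level and pack the resulting bits."""
    c = patch[1, 1]
    # the 8 neighbors in a fixed circular order (start/direction arbitrary)
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(2 ** n for n, i_n in enumerate(neighbors) if i_n >= c)
```

Sliding this over every pixel of an image yields the LBP code image that is later pooled into local histograms.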
Two extensions of the original operator were made in [18]. The first defined LBP’s
for neighborhoods of different sizes, thus making it feasible to deal with textures at
different scales. The second defined the so-called uniform patterns: an LBP is ‘uniform’
if it contains at most one 0-1 and one 1-0 transition when viewed as a circular bit string.
For example, the LBP code in fig. 1 is uniform. Uniformity is an important concept in
the LBP methodology, representing primitive structural information such as edges and
corners. Ojala et al. observed that although only 58 of the 256 8-bit patterns are uniform,
Fig.1. Illustration of the basic LBP operator

nearly 90 percent of all observed image neighbourhoods are uniform. In methods that histogram LBP's, the number of bins can thus be significantly reduced by assigning all non-uniform patterns to a single bin, often without losing too much information.
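The uniformity test is a simple circular transition count, which can be sketched as follows (illustrative Python, not the authors' code):

```python
def is_uniform(code):
    """True if the 8-bit pattern has at most one 0->1 and one 1->0
    transition when read as a circular bit string."""
    bits = [(code >> n) & 1 for n in range(8)]
    transitions = sum(bits[n] != bits[(n + 1) % 8] for n in range(8))
    return transitions <= 2
```

Counting over all 256 codes confirms the figure quoted above: exactly 58 patterns are uniform, so a uniform-pattern histogram needs only 59 bins (58 uniform codes plus one shared bin for all non-uniform ones).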
3.2 Local Ternary Patterns (LTP)
LBP’s are resistant to lighting effects in the sense that they are invariant to monotonic gray-level transformations, and they have been shown to have high discriminative power for texture classification [17]. However, because they threshold at exactly the value of the central pixel i_c, they tend to be sensitive to noise, especially in near-uniform image regions. Given that many facial regions are relatively uniform, it is potentially useful to improve the robustness of the underlying descriptors in these areas.
This section extends LBP to 3-valued codes, Local Ternary Patterns, in which gray-levels in a zone of width ±t around i_c are quantized to zero, ones above this are quantized to +1 and ones below it to −1, i.e. the indicator s(u) is replaced by a 3-valued function:

    s′(u, i_c, t) = +1 if u ≥ i_c + t,
                     0 if |u − i_c| < t,                    (2)
                    −1 if u ≤ i_c − t

and the binary LBP code is replaced by a ternary LTP code. Here t is a user-specified threshold (so LTP codes are more resistant to noise, but no longer strictly invariant to gray-level transformations). The LTP encoding procedure is illustrated in fig. 2. Here the threshold t was set to 5, so the tolerance interval is [49, 59].
Fig.2. Illustration of the basic LTP operator
When using LTP for visual matching we could use 3^n-valued codes, but the uniform
pattern argument also applies in the ternary case. For simplicity the experiments below
use a coding scheme that splits each ternary pattern into its positive and negative parts
as illustrated in fig. 3, subsequently treating these as two separate channels of LBP de-
scriptors for which separate histograms and similarity metrics are computed, combining
these only at the end of the computation.
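The ternary coding of eq. (2) and the positive/negative split of fig. 3 can be sketched together (illustrative Python with a hypothetical function name, not the authors' implementation; t = 5 matches the fig. 2 example, and the neighbor ordering is an arbitrary fixed choice):

```python
import numpy as np

def ltp_split_codes(patch, t=5):
    """Encode a 3x3 patch as two 8-bit LBP codes: one built from the +1
    entries of the ternary pattern, one from the -1 entries (eq. (2))."""
    c = patch[1, 1]
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    pos = neg = 0
    for n, u in enumerate(neighbors):
        if u >= c + t:       # ternary value +1 -> positive channel
            pos |= 1 << n
        elif u <= c - t:     # ternary value -1 -> negative channel
            neg |= 1 << n
        # |u - c| < t -> ternary value 0: contributes to neither channel
    return pos, neg
```

Each channel is then histogrammed and matched exactly like an ordinary LBP channel, with the two scores combined only at the end.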
LTP’s bear some similarity to the texture spectrum (TS) technique from the early
1990’s [24]. However TS did not include preprocessing, thresholding, local histograms
or uniform pattern based dimensionality reduction and it was not tested on faces.

Fig.3. An example of the splitting of an LTP code into positive and negative LBP codes
4 Distance Transform Based Similarity Metric
T. Ahonen et al. introduced an LBP based method for face recognition [1] that divides
the face into a regular grid of cells and histograms the uniform LBP’s within each cell,
finally using nearest neighbor classification with the χ² histogram distance for recognition:

    χ²(p, q) = Σ_i (p_i − q_i)² / (p_i + q_i)                    (3)

Here p, q are two image descriptors (histogram vectors). Excellent results were obtained on the FERET dataset.
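Eq. (3) is straightforward to implement; the only subtlety is skipping bins that are empty in both histograms, whose 0/0 term is conventionally taken as 0. A minimal NumPy sketch (not the authors' code):

```python
import numpy as np

def chi2_distance(p, q):
    """Chi-squared histogram distance, eq. (3):
    sum_i (p_i - q_i)^2 / (p_i + q_i), with empty bins contributing 0."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    num = (p - q) ** 2
    den = p + q
    mask = den > 0          # skip bins empty in both histograms
    return np.sum(num[mask] / den[mask])
```

The distance is symmetric and zero only for identical histograms, which makes it a convenient drop-in metric for nearest neighbor classification.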
Possible criticisms of this method are that subdividing the face into a regular grid is
somewhat arbitrary (cells are not necessarily well aligned with facial features), and that
partitioning appearance descriptors into grid cells is likely to cause both aliasing (due to
abrupt spatial quantization) and loss of spatial resolution (as position within a grid cell
is not coded). Given that the aim of coding is to provide illumination- and outlier-robust
appearance-based correspondence with some leeway for small spatial deviations due to
misalignment, it seems more appropriate to use a Hausdorff distance like similarity
metric that takes each LBP or LTP pixel code in image X and tests whether a similar
code appears at a nearby position in image Y , with a weighting that decreases smoothly
with image distance. Such a scheme should be able to achieve discriminant appearance-
based image matching with a well-controllable degree of spatial looseness.
We can achieve this using Distance Transforms [6]. Given a 2-D reference image X, we find its image of LBP or LTP codes and transform this into a set of sparse binary images b_k, one for each possible LBP or LTP code value k (i.e. 59 images for uniform codes). Each b_k specifies the pixel positions at which its particular LBP or LTP code value appears. We then calculate the distance transform image d_k of each b_k. Each pixel of d_k gives the distance to the nearest image X pixel with code k (2D Euclidean distance is used in the experiments below). The distance or similarity metric from image X to image Y is then:

    D(X, Y) = Σ_{pixels (i,j) of Y} w( d_{k_Y(i,j)}^X (i, j) )                    (4)

where k_Y(i, j) is the code value of pixel (i, j) of image Y and w() is a monotonic weighting function penalizing the distance to the nearest X pixel carrying the same code.

Citations

A Completed Modeling of Local Binary Pattern Operator for Texture Classification
Enhanced Computer Vision With Microsoft Kinect Sensor: A Review
PCANet: A Simple Deep Learning Baseline for Image Classification?
WLD: A Robust Local Image Descriptor
References

Histograms of oriented gradients for human detection
Nonlinear total variation based noise removal algorithms
Eigenfaces for recognition
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
Eigenfaces vs. Fisherfaces: recognition using class specific linear projection
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Enhanced local texture feature sets for face recognition under difficult lighting conditions" ?

The authors address this by combining the strengths of robust illumination normalization, local texture based face representations and distance transform based matching metrics. Specifically, the authors make three main contributions: (i) they present a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition; (ii) they introduce Local Ternary Patterns (LTP), a generalization of the Local Binary Pattern (LBP) local texture descriptor that is more discriminant and less sensitive to noise in uniform regions; and (iii) they show that replacing local histogramming with a local distance transform based similarity metric further improves the performance of LBP/LTP based face recognition.

The main advantages of their method are simplicity, computational efficiency and robustness to lighting changes and other image quality degradations such as blurring. 

Their (unoptimized Matlab) implementation takes only about 50 ms to process a 120×120 pixel face image on a 2.8 GHz P4, allowing face preprocessing to be performed in real time. 

Since run time is a critical factor in many practical applications, it is also interesting to consider the computational load of their normalization chain. 

For a given target the transform can be computed and mapped through w() in a preprocessing step, after which matching to any subsequent image takes O(number of pixels) irrespective of the number of code values. 

Possible criticisms of this method are that subdividing the face into a regular grid is somewhat arbitrary (cells are not necessarily well aligned with facial features), and that partitioning appearance descriptors into grid cells is likely to cause both aliasing (due to abrupt spatial quantization) and loss of spatial resolution (as position within a grid cell is not coded). 

for some datasets it also helps to offset the center of the larger filter by 1–2 pixels relative to the center of the smaller one, so that the final prefilter is effectively the sum of a centered DoG and a low pass spatial derivative. 

To reduce their influence on subsequent stages of processing, the authors finally apply a nonlinear function to compress over-large values. 

Ojala et al. observed that although only 58 of the 256 8-bit patterns are uniform, nearly 90 percent of all observed image neighbourhoods are uniform.

The operator takes a local neighborhood around each pixel, thresholds the pixels of the neighborhood at the value of the central pixel and uses the resulting binary-valued image patch as a local image descriptor. 

All of the images undergo the same geometric normalization prior to analysis: conversion to 8 bit gray-scale images; rigid scaling and image rotation to place the centers of the two eyes at fixed positions, using the eye coordinates supplied with the original datasets; and image cropping to 120×120 pixels. 

Each pixel of dk gives the distance to the nearest image X pixel with code k (2D Euclidean distance is used in the experiments below). 

Fig. 8 shows the extent to which standard LBP can be improved by combining the three enhancements proposed in this paper: using preprocessing (PP); replacing LBP with LTP; replacing local histogramming and the χ2 histogram distance with the Distance Transform based similarity metric (DT).