
Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions

Abstract
Making recognition more reliable under uncontrolled lighting conditions is one of the most important challenges for practical face recognition systems. We tackle this by combining the strengths of robust illumination normalization, local texture-based face representations, distance transform based matching, kernel-based feature extraction and multiple feature fusion. Specifically, we make three main contributions: 1) we present a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition; 2) we introduce local ternary patterns (LTP), a generalization of the local binary pattern (LBP) local texture descriptor that is more discriminant and less sensitive to noise in uniform regions, and we show that replacing comparisons based on local spatial histograms with a distance transform based similarity metric further improves the performance of LBP/LTP based face recognition; and 3) we further improve robustness by adding kernel principal component analysis (PCA) feature extraction and incorporating rich local appearance cues from two complementary sources (Gabor wavelets and LBP), showing that the combination is considerably more accurate than either feature set alone. The resulting method provides state-of-the-art performance on three data sets that are widely used for testing recognition under difficult illumination conditions: Extended Yale-B, CAS-PEAL-R1, and Face Recognition Grand Challenge version 2 experiment 4 (FRGC-204). For example, on the challenging FRGC-204 data set it halves the error rate relative to previously published methods, achieving a face verification rate of 88.1% at 0.1% false accept rate. Further experiments show that our preprocessing method outperforms several existing preprocessors for a range of feature sets, data sets and lighting conditions.



Enhanced Local Texture Feature Sets for Face
Recognition Under Difficult Lighting Conditions
Xiaoyang Tan and Bill Triggs
INRIA & Laboratoire Jean Kuntzmann, 655 avenue de l’Europe, Montbonnot 38330, France
{xiaoyang.tan,bill.triggs}@imag.fr
Abstract. Recognition in uncontrolled situations is one of the most important
bottlenecks for practical face recognition systems. We address this by combining
the strengths of robust illumination normalization, local texture based face repre-
sentations and distance transform based matching metrics. Specifically, we make
three main contributions: (i) we present a simple and efficient preprocessing chain
that eliminates most of the effects of changing illumination while still preserving
the essential appearance details that are needed for recognition; (ii) we introduce
Local Ternary Patterns (LTP), a generalization of the Local Binary Pattern (LBP)
local texture descriptor that is more discriminant and less sensitive to noise in
uniform regions; and (iii) we show that replacing local histogramming with a lo-
cal distance transform based similarity metric further improves the performance
of LBP/LTP based face recognition. The resulting method gives state-of-the-art
performance on three popular datasets chosen to test recognition under difficult
illumination conditions: Face Recognition Grand Challenge version 1 experiment
4, Extended Yale-B, and CMU PIE.
1 Introduction
One of the key challenges of face recognition is finding efficient and discriminative fa-
cial appearance descriptors that can counteract large variations in illumination, pose,
facial expression, ageing, partial occlusions and other changes [27]. There are two
main approaches: geometric feature-based descriptors and appearance-based descrip-
tors. Geometric descriptors can be hard to extract reliably under variations in facial
appearance, while appearance-based ones such as eigenfaces tend to blur out small de-
tails owing to residual spatial registration errors. Recently, representations based on
local pooling of local appearance descriptors have drawn increasing attention because
they can capture small appearance details in the descriptors while remaining resistant
to registration errors owing to local pooling. Another motivation is the observation that
human visual perception is well-adapted to extracting and pooling local structural in-
formation (‘micro-patterns’) from images [2]. Methods in this category include Gabor
wavelets [16], local autocorrelation filters [11], and Local Binary Patterns [1].
In this paper we focus on Local Binary Patterns (LBP) and their generalizations.
LBP’s are a computationally efficient nonparametric local image texture descriptor.
They have been used with considerable success in a number of visual recognition tasks
including face recognition [1,2,20]. LBP features are invariant to monotonic gray-level
S.K. Zhou et al. (Eds.): AMFG 2007, LNCS 4778, pp. 168–182, 2007.
© Springer-Verlag Berlin Heidelberg 2007

changes by design and thus are usually considered to require no image preprocessing
before use¹. In fact, LBP itself is sometimes used as a lighting normalization stage for
other methods [12]. However, in practice the reliability of LBP decreases significantly
under large illumination variations (cf. table 3). Lighting effects involve complex local
interactions and the resulting images often violate LBP’s basic assumption that gray-
level changes monotonically. We have addressed this problem by developing a simple
and efficient image preprocessing chain that greatly reduces the influence of illumina-
tion variations, local shadowing and highlights while preserving the elements of visual
appearance that are needed for recognition.
Another limitation of LBP is its sensitivity to random and quantization noise in uni-
form and near-uniform image regions such as the forehead and cheeks. To counter this
we extend LBP to Local Ternary Patterns (LTP), a 3-valued coding that includes a
threshold around zero for improved resistance to noise. LTP inherits most of the other
key advantages of LBP such as computational efficiency.
Current LBP based face recognition methods partition the face image into a grid
of fixed-size cells for the local pooling of texture descriptors (LBP histograms). This
coarse (and typically abrupt) spatial quantization is somewhat arbitrary and not neces-
sarily well adapted to local facial morphology. It inevitably causes some loss of dis-
criminative power. To counter this we use distance transform techniques to create local
texture comparison metrics that have more controlled spatial gradings.
To illustrate the effectiveness of our approach we present experimental results on
three state-of-the-art face recognition datasets containing large lighting variations sim-
ilar to those encountered in natural images taken under uncontrolled conditions: Face
Recognition Grand Challenge version 1 experiment 1.0.4 (‘FRGC-104’) [19]; Extended
Yale illumination face database B (‘Extended Yale-B’) [9,15]; and CMU PIE [22].
2 Related Work
As emphasized by the recent FRVT and FRGC trials [19], illumination variations are
one of the most important bottlenecks for practical face recognition systems. Gener-
ally, one can cope with this in two ways. The first uses training examples to learn a
global model of the possible illumination variations, for example a linear subspace or
manifold model, which then generalizes to the variations seen in new images [5,3]. The
disadvantage is that many training images are required.
The second approach seeks conventional image processing transformations that re-
duce the image to a more “canonical” form in which the variations are suppressed. This
has the merit of easy application to real images and the lack of a need for comprehensive
training data. Given that complete illumination invariants do not exist [7], one must con-
tent oneself with finding representations that are resistant to the most common classes
of natural illumination variations. Most methods exploit the fact that these are typically
characterized by relatively low spatial frequencies. For example, the Multiscale Retinex
(MSR) method of Jobson et al. [13] normalizes the illumination by dividing the image
by a smoothed version of itself. A similar idea (with a different local filter) is used by
¹ One exception is Local Gabor Binary Pattern Histogram Sequences [26], whose Gabor magnitude mapping can be regarded as a special kind of preprocessing for LBP.

Wang et al. [23] in the Self Quotient Image model (SQI). More recently, Chen et al.
[8] improved SQI by using Logarithmic Total Variation (LTV) smoothing, and Gross &
Brajovic (GB) [10] developed an anisotropic smoothing method that relies on the itera-
tive estimation of a blurred version of the original image. Some comparative results for
these and related works can be found in [21].
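The smoothing-and-divide idea behind MSR and SQI can be made concrete with a minimal single-scale sketch. This is an illustration only, not any of the cited implementations: a Gaussian stands in for MSR's multiscale filter bank and SQI's anisotropic local filter, and the sigma and eps values are assumptions chosen for readability.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient(image, sigma=15.0, eps=1e-3):
    """Single-scale quotient-image normalization: I / smooth(I).

    The Gaussian-smoothed image captures the slowly varying
    (low spatial frequency) illumination, which is divided out,
    leaving mostly the higher-frequency reflectance detail.
    """
    img = image.astype(np.float64)
    smoothed = gaussian_filter(img, sigma)
    return img / (smoothed + eps)  # eps guards against division by zero
```

A flat input maps to a nearly constant output, while illumination gradients that are smooth relative to sigma are largely cancelled.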
In this paper we adopt the “canonical form” philosophy, basing our method on a
chain of efficient processing steps that normalize for various effects of the changing
illumination environment. The main advantages of our method are simplicity, compu-
tational efficiency and robustness to lighting changes and other image quality degrada-
tions such as blurring.
We describe our LBP/LTP face descriptors and their distance transform based similarity metric in the next two sections, detailing our preprocessing method in §5 and concluding with experiments and discussion.
3 Local Ternary Patterns
3.1 Local Binary Patterns (LBP)
Ojala et al. [17] introduced the Local Binary Pattern operator in 1996 as a means of
summarizing local gray-level structure. The operator takes a local neighborhood around
each pixel, thresholds the pixels of the neighborhood at the value of the central pixel
and uses the resulting binary-valued image patch as a local image descriptor. It was
originally defined for 3×3 neighborhoods, giving 8 bit codes based on the 8 pixels
around the central one. Formally, the LBP operator takes the form
    LBP(x_c, y_c) = Σ_{n=0}^{7} 2^n s(i_n − i_c)                    (1)

where in this case n runs over the 8 neighbors of the central pixel c, i_c and i_n are the gray-level values at c and n, and s(u) is 1 if u ≥ 0 and 0 otherwise. The LBP encoding process is illustrated in fig. 1.
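For a single 3×3 neighborhood, eq. (1) can be sketched in a few lines of plain Python. This is an illustration rather than the authors' code; the circular ordering of the neighbors (start position and direction) is an arbitrary but fixed choice.

```python
import numpy as np

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch, eq. (1): threshold the 8 neighbors
    at the central gray-level and pack the resulting bits."""
    c = patch[1, 1]
    # the 8 neighbors in a fixed circular order (start/direction arbitrary)
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(2 ** n for n, i_n in enumerate(neighbors) if i_n >= c)
```

Sliding this over every pixel of an image yields the LBP code image that is later pooled into local histograms.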
Two extensions of the original operator were made in [18]. The first defined LBP’s
for neighborhoods of different sizes, thus making it feasible to deal with textures at
different scales. The second defined the so-called uniform patterns: an LBP is ‘uniform’
if it contains at most one 0-1 and one 1-0 transition when viewed as a circular bit string.
For example, the LBP code in fig. 1 is uniform. Uniformity is an important concept in
the LBP methodology, representing primitive structural information such as edges and
corners. Ojala et al. observed that although only 58 of the 256 8-bit patterns are uniform,
Fig.1. Illustration of the basic LBP operator

nearly 90 percent of all observed image neighbourhoods are uniform. In methods that histogram LBP's, the number of bins can thus be significantly reduced by assigning all non-uniform patterns to a single bin, often without losing too much information.
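The uniformity test is a simple circular transition count, which can be sketched as follows (illustrative Python, not the authors' code):

```python
def is_uniform(code):
    """True if the 8-bit pattern has at most one 0->1 and one 1->0
    transition when read as a circular bit string."""
    bits = [(code >> n) & 1 for n in range(8)]
    transitions = sum(bits[n] != bits[(n + 1) % 8] for n in range(8))
    return transitions <= 2
```

Counting over all 256 codes confirms the figure quoted above: exactly 58 patterns are uniform, so a uniform-pattern histogram needs only 59 bins (58 uniform codes plus one shared bin for all non-uniform ones).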
3.2 Local Ternary Patterns (LTP)
LBP’s are resistant to lighting effects in the sense that they are invariant to monotonic gray-level transformations, and they have been shown to have high discriminative power for texture classification [17]. However, because they threshold at exactly the value of the central pixel i_c, they tend to be sensitive to noise, especially in near-uniform image regions. Given that many facial regions are relatively uniform, it is potentially useful to improve the robustness of the underlying descriptors in these areas.
This section extends LBP to 3-valued codes, Local Ternary Patterns, in which gray-levels in a zone of width ±t around i_c are quantized to zero, ones above this are quantized to +1 and ones below it to −1, i.e. the indicator s(u) is replaced by a 3-valued function:

    s′(u, i_c, t) = +1 if u ≥ i_c + t,
                     0 if |u − i_c| < t,                    (2)
                    −1 if u ≤ i_c − t

and the binary LBP code is replaced by a ternary LTP code. Here t is a user-specified threshold (so LTP codes are more resistant to noise, but no longer strictly invariant to gray-level transformations). The LTP encoding procedure is illustrated in fig. 2. Here the threshold t was set to 5, so the tolerance interval is [49, 59].
Fig.2. Illustration of the basic LTP operator
When using LTP for visual matching we could use 3^n-valued codes, but the uniform
pattern argument also applies in the ternary case. For simplicity the experiments below
use a coding scheme that splits each ternary pattern into its positive and negative parts
as illustrated in fig. 3, subsequently treating these as two separate channels of LBP de-
scriptors for which separate histograms and similarity metrics are computed, combining
these only at the end of the computation.
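The ternary coding of eq. (2) and the positive/negative split of fig. 3 can be sketched together (illustrative Python with a hypothetical function name, not the authors' implementation; t = 5 matches the fig. 2 example, and the neighbor ordering is an arbitrary fixed choice):

```python
import numpy as np

def ltp_split_codes(patch, t=5):
    """Encode a 3x3 patch as two 8-bit LBP codes: one built from the +1
    entries of the ternary pattern, one from the -1 entries (eq. (2))."""
    c = patch[1, 1]
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    pos = neg = 0
    for n, u in enumerate(neighbors):
        if u >= c + t:       # ternary value +1 -> positive channel
            pos |= 1 << n
        elif u <= c - t:     # ternary value -1 -> negative channel
            neg |= 1 << n
        # |u - c| < t -> ternary value 0: contributes to neither channel
    return pos, neg
```

Each channel is then histogrammed and matched exactly like an ordinary LBP channel, with the two scores combined only at the end.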
LTP’s bear some similarity to the texture spectrum (TS) technique from the early
1990’s [24]. However TS did not include preprocessing, thresholding, local histograms
or uniform pattern based dimensionality reduction and it was not tested on faces.

Fig.3. An example of the splitting of an LTP code into positive and negative LBP codes
4 Distance Transform Based Similarity Metric
T. Ahonen et al. introduced an LBP based method for face recognition [1] that divides
the face into a regular grid of cells and histograms the uniform LBP’s within each cell,
finally using nearest neighbor classification with the χ² histogram distance for recognition:

    χ²(p, q) = Σ_i (p_i − q_i)² / (p_i + q_i)                    (3)

Here p, q are two image descriptors (histogram vectors). Excellent results were obtained on the FERET dataset.
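Eq. (3) is straightforward to implement; the only subtlety is skipping bins that are empty in both histograms, whose 0/0 term is conventionally taken as 0. A minimal NumPy sketch (not the authors' code):

```python
import numpy as np

def chi2_distance(p, q):
    """Chi-squared histogram distance, eq. (3):
    sum_i (p_i - q_i)^2 / (p_i + q_i), with empty bins contributing 0."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    num = (p - q) ** 2
    den = p + q
    mask = den > 0          # skip bins empty in both histograms
    return np.sum(num[mask] / den[mask])
```

The distance is symmetric and zero only for identical histograms, which makes it a convenient drop-in metric for nearest neighbor classification.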
Possible criticisms of this method are that subdividing the face into a regular grid is
somewhat arbitrary (cells are not necessarily well aligned with facial features), and that
partitioning appearance descriptors into grid cells is likely to cause both aliasing (due to
abrupt spatial quantization) and loss of spatial resolution (as position within a grid cell
is not coded). Given that the aim of coding is to provide illumination- and outlier-robust
appearance-based correspondence with some leeway for small spatial deviations due to
misalignment, it seems more appropriate to use a Hausdorff distance like similarity
metric that takes each LBP or LTP pixel code in image X and tests whether a similar
code appears at a nearby position in image Y , with a weighting that decreases smoothly
with image distance. Such a scheme should be able to achieve discriminant appearance-
based image matching with a well-controllable degree of spatial looseness.
We can achieve this using Distance Transforms [6]. Given a 2-D reference image X, we find its image of LBP or LTP codes and transform this into a set of sparse binary images b_k, one for each possible LBP or LTP code value k (i.e. 59 images for uniform codes). Each b_k specifies the pixel positions at which its particular LBP or LTP code value appears. We then calculate the distance transform image d_k of each b_k. Each pixel of d_k gives the distance to the nearest image X pixel with code k (2D Euclidean distance is used in the experiments below). The distance or similarity metric from image X to image Y is then:

    D(X, Y) = Σ_{pixels (i,j) of Y} w( d_{k_Y(i,j)}^X (i, j) )                    (4)

where k_Y(i, j) is the code value of pixel (i, j) of image Y and w() is a monotonic weighting function penalizing the distance to the nearest X pixel carrying the same code.

Citations

A Completed Modeling of Local Binary Pattern Operator for Texture Classification
Enhanced Computer Vision With Microsoft Kinect Sensor: A Review
PCANet: A Simple Deep Learning Baseline for Image Classification?
WLD: A Robust Local Image Descriptor
References

Histograms of oriented gradients for human detection
Nonlinear total variation based noise removal algorithms
Eigenfaces for recognition
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
Eigenfaces vs. Fisherfaces: recognition using class specific linear projection
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Enhanced local texture feature sets for face recognition under difficult lighting conditions" ?

The authors address this by combining the strengths of robust illumination normalization, local texture based face representations and distance transform based matching metrics. Specifically, the authors make three main contributions: (i) they present a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition; (ii) they introduce Local Ternary Patterns (LTP), a generalization of the Local Binary Pattern (LBP) local texture descriptor that is more discriminant and less sensitive to noise in uniform regions; and (iii) they show that replacing local histogramming with a local distance transform based similarity metric further improves the performance of LBP/LTP based face recognition.

The main advantages of their method are simplicity, computational efficiency and robustness to lighting changes and other image quality degradations such as blurring. 

Their (unoptimized Matlab) implementation takes only about 50 ms to process a 120×120 pixel face image on a 2.8 GHz P4, allowing face preprocessing to be performed in real time. 

Since run time is a critical factor in many practical applications, it is also interesting to consider the computational load of their normalization chain. 

For a given target the transform can be computed and mapped through w() in a preprocessing step, after which matching to any subsequent image takes O(number of pixels) irrespective of the number of code values. 

Possible criticisms of this method are that subdividing the face into a regular grid is somewhat arbitrary (cells are not necessarily well aligned with facial features), and that partitioning appearance descriptors into grid cells is likely to cause both aliasing (due to abrupt spatial quantization) and loss of spatial resolution (as position within a grid cell is not coded). 

for some datasets it also helps to offset the center of the larger filter by 1–2 pixels relative to the center of the smaller one, so that the final prefilter is effectively the sum of a centered DoG and a low pass spatial derivative. 

To reduce their influence on subsequent stages of processing, the authors finally apply a nonlinear function to compress over-large values. 

Ojala et al. observed that although only 58 of the 256 8-bit patterns are uniform, nearly 90 percent of all observed image neighbourhoods are uniform.

The operator takes a local neighborhood around each pixel, thresholds the pixels of the neighborhood at the value of the central pixel and uses the resulting binary-valued image patch as a local image descriptor. 

All of the images undergo the same geometric normalization prior to analysis: conversion to 8 bit gray-scale images; rigid scaling and image rotation to place the centers of the two eyes at fixed positions, using the eye coordinates supplied with the original datasets; and image cropping to 120×120 pixels. 

Each pixel of dk gives the distance to the nearest image X pixel with code k (2D Euclidean distance is used in the experiments below). 

Fig. 8 shows the extent to which standard LBP can be improved by combining the three enhancements proposed in this paper: using preprocessing (PP); replacing LBP with LTP; replacing local histogramming and the χ2 histogram distance with the Distance Transform based similarity metric (DT).