
Heterogeneous Face Recognition Using Kernel Prototype Similarities

01 Jun 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 35, Iss: 6, pp 1410-1422
TL;DR: A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images, and random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem.
Abstract: Heterogeneous face recognition (HFR) involves matching two face images from alternate imaging modalities, such as an infrared image to a photograph or a sketch to a photograph. Accurate HFR systems are of great value in various applications (e.g., forensics and surveillance), where the gallery databases are populated with photographs (e.g., mug shot or passport photographs) but the probe images are often limited to some alternate modality. A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images. The prototype subjects (i.e., the training set) have an image in each modality (probe and gallery), and the similarity of an image is measured against the prototype images from the corresponding modality. The accuracy of this nonlinear prototype representation is improved by projecting the features into a linear discriminant subspace. Random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem. The merits of the proposed approach, called prototype random subspace (P-RS), are demonstrated on four different heterogeneous scenarios: 1) near infrared (NIR) to photograph, 2) thermal to photograph, 3) viewed sketch to photograph, and 4) forensic sketch to photograph.

Summary (6 min read)

1 INTRODUCTION

  • An emerging topic in face recognition is matching between heterogeneous image modalities.
  • Coined heterogeneous face recognition (HFR) [1], the scenario offers potential solutions to many difficult face recognition scenarios.
  • While heterogeneous face recognition can involve matching between any two imaging modalities, the majority of scenarios involve a gallery dataset consisting of visible light photographs.
  • Probe images can be of any other modality, though the practical scenarios of interest to us are infrared images (NIR and thermal) and hand-drawn facial sketches.
  • When a subject’s face can only be acquired in nighttime environments, the use of infrared imaging may be the only modality for acquiring a useful face image of the subject.

2.1 Heterogeneous Face Recognition

  • A flurry of research has emerged providing solutions to various heterogeneous face recognition problems.
  • This began with sketch recognition using viewed sketches, and has continued into other modalities such as near-infrared (NIR) and forensic sketches.
  • Klare and Jain followed this work on NIR to VIS face recognition by also incorporating SIFT feature descriptors and an RS-LDA scheme [10].

2.2 Kernel Prototype Representation

  • The core of the proposed approach involves using a relational feature representation for face images (illustrated in Fig. 2).
  • One key to their framework is that each prototype has a pattern for each image modality.
  • Kernel PCA [21] and Kernel LDA [22], [23] approaches to face recognition have used a similar approach, where a face is represented as the kernel similarity to a collection of prototype images in a high-dimensional space.
  • The biometric indexing scheme by Gyaourova and Ross used similarity scores to a fixed set of references in the face and fingerprint modality [24].
  • These prior works differ from the proposed method because only a single prototype is used per training subject.

2.3 Proposed Method

  • The proposed method presents a new approach to heterogeneous face recognition, and extends existing methods in face recognition.
  • Unlike previous feature-based methods, where an image descriptor invariant to changes between the two HFR modalities was needed, the proposed framework only needs descriptors that are effective within each domain.
  • The accuracy of the HFR system is improved using a random subspace framework in conjunction with linear discriminant analysis (LDA), as described in Section 5.
  • While the authors demonstrate the strength of the proposed framework on many different HFR scenarios, the parameters controlling the framework are the same across all tested scenarios.

3 IMAGE PREPROCESSING AND REPRESENTATION

  • All face images are initially represented using a feature-based representation.
  • The use of local feature descriptors has been argued to closely resemble the postulated representation of the human visual processing system [26], and they have been shown to be well suited for face recognition [27].

3.1 Geometric Normalization

  • The first step in representing face images using feature descriptors is to geometrically normalize the face images with respect to the location of the eyes.
  • This step reduces the effect of scale, rotation, and translation variations.
  • The eye locations for the face images from all modalities are automatically estimated using Cognitec’s FaceVACS SDK [28].
  • The only exceptions are the thermal face images where the eyes are manually located for both the proposed method and the FaceVACS baseline.
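The alignment described above (detailed in the full text: eyes rotated level, interpupillary distance scaled to 75 pixels, and a 200×250 crop with the eyes horizontally centered on row 115) amounts to a 2D similarity transform. The helper below is our own illustrative sketch, not the authors' code; the function name and return convention are assumptions:

```python
import numpy as np

def eye_align_transform(left_eye, right_eye, ipd=75.0,
                        out_w=200, eye_row=115.0):
    """Similarity transform (rotation + scale + translation) mapping
    detected eye centers to the canonical positions used in the paper:
    eyes level, 75 px apart, horizontally centered, on row 115.
    Hypothetical helper for illustration only."""
    le = np.asarray(left_eye, dtype=float)
    re = np.asarray(right_eye, dtype=float)
    d = re - le
    angle = np.arctan2(d[1], d[0])        # in-plane angle between the eyes
    scale = ipd / np.hypot(d[0], d[1])    # scale so the IPD becomes 75 px
    c = np.cos(-angle) * scale
    s = np.sin(-angle) * scale
    R = np.array([[c, -s], [s, c]])
    target_left = np.array([(out_w - ipd) / 2.0, eye_row])
    t = target_left - R @ le
    return R, t                           # a source point p maps to R @ p + t
```

Applying `R @ p + t` to each pixel coordinate (e.g., via an image-warping routine) produces the normalized crop.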

3.2 Image Filtering

  • Face images are filtered with three different image filters.
  • These filters are intended to help compensate for both intensity variations within an image domain (such as nonuniform illumination changes), as well as appearance variations between image domains.
  • The second aspect is of particular importance for the direct random subspace (D-RS) framework (see Section 6).
  • The three image filters used are as follows.

3.2.2 Center-Surround Divisive Normalization (CSDN)

  • Meyers and Wolf [30] introduced the center-surround divisive normalization filter in conjunction with their biologically inspired face recognition framework.
  • The CSDN filter divides the value of each pixel by the mean pixel value in the s × s neighborhood surrounding the pixel.
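A minimal sketch of this filter (the boundary handling and the guard against division by zero are our assumptions; the paper does not specify them):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def csdn(image, s=16):
    """Center-surround divisive normalization: divide each pixel by the
    mean of its s-by-s neighborhood. Sketch of the filter as described;
    reflect-padding at the borders is an assumption."""
    img = np.asarray(image, dtype=np.float64)
    local_mean = uniform_filter(img, size=s, mode='reflect')
    return img / np.maximum(local_mean, 1e-8)  # avoid division by zero
```

On a uniformly lit region the output is close to 1 everywhere, which is what makes the filter suppress slowly varying illumination.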

3.2.3 Gaussian

  • The Gaussian smoothing filter has long been used in image processing applications to remove noise contained in high spatial frequencies while retaining the remainder of the signal.
  • The width of the Gaussian filter was fixed in their implementation.

3.3 Local Descriptor Representation

  • Once an image is geometrically normalized and filtered using one of the three filters, local feature descriptors are extracted from uniformly distributed patches across the face.
  • The authors use two different feature descriptors to represent the face image: the SIFT descriptor [14] and Local Binary Patterns [13].
  • LBP features have a longer history of successful use in face recognition.
  • Each patch overlaps its vertical and horizontal neighbors by 16 pixels.
  • Using uniform patterns at eight sampling locations, as described by Ojala et al. [13], the LBP descriptor yields a 59D feature descriptor.
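The 59 bins arise because 58 of the 256 possible 8-bit codes are "uniform" (at most two 0/1 transitions around the circle) and all remaining codes share one bin. The sketch below uses the simple 3×3 neighborhood for illustration; the authors' exact sampling radius and interpolation are not restated here:

```python
import numpy as np

def uniform_lbp_histogram(patch):
    """59-bin uniform LBP histogram of a grayscale patch, following
    Ojala et al.: the 58 uniform 8-bit patterns get individual bins,
    all non-uniform patterns share one bin. Illustrative sketch."""
    p = np.asarray(patch, dtype=np.float64)
    # 8 neighbors of each interior pixel, clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = p[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offs):
        nb = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        codes |= ((nb >= center).astype(np.int32) << bit)

    def transitions(c):
        bits = [(c >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

    uniform = [c for c in range(256) if transitions(c) <= 2]  # 58 codes
    lut = np.full(256, 58, dtype=np.int32)   # bin 58 = all non-uniform codes
    for i, c in enumerate(uniform):
        lut[c] = i
    hist = np.bincount(lut[codes.ravel()], minlength=59)
    return hist.astype(np.float64) / max(hist.sum(), 1)
```

Concatenating one such histogram per patch gives the per-image LBP representation.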

4.1 Prototype Representation

  • The heterogeneous prototype framework begins with images from the probe and gallery modalities represented by (possibly different) feature descriptors for each of the N image patches, as described in the previous section.
  • The cosine kernel was chosen because it resulted in consistently higher accuracy on all tested scenarios compared to the radial basis function kernel and the polynomial kernel.
  • Additionally, because the feature vectors Φ_P(P) and Φ_G(G) measure the similarity between a test image and the prototype training images, the feature spaces used for similarity computation do not have to be the same for the probe and gallery modalities.
  • F_P^{c,Ds}(I) denotes the prototype similarity of image I when represented using the CSDN image filter and SIFT descriptors.
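The prototype encoding itself is simple: a feature vector is replaced by its vector of cosine similarities to the prototypes of the matching modality. A sketch (variable names are ours, not the paper's):

```python
import numpy as np

def cosine_prototype_representation(x, prototypes):
    """Encode feature vector x as its cosine similarity to each
    prototype feature vector from the SAME modality (sketch of the
    kernel prototype representation in Section 4.1)."""
    P = np.asarray(prototypes, dtype=np.float64)   # n_prototypes x d
    x = np.asarray(x, dtype=np.float64)
    num = P @ x
    den = np.linalg.norm(P, axis=1) * np.linalg.norm(x)
    return num / np.maximum(den, 1e-12)

# A probe is encoded against probe-modality prototypes and a gallery
# image against gallery-modality prototypes; both encodings live in the
# same n_prototypes-dimensional space, so they can be compared directly
# even when the underlying descriptors (or their dimensions) differ.
```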

4.2 Discriminant Analysis

  • After representing the images in the training set T in the aforementioned prototype representation Φ(·), the authors next learn linear subspaces using linear discriminant analysis [33] to enhance the discriminative capabilities of the prototype representation. LDA (and its variants) has consistently demonstrated its ability to improve the accuracy of various recognition algorithms through feature extraction and dimensionality reduction.
  • The authors learn the linear projection matrix W by following the conventional approach for high-dimensional data, namely, by first applying PCA, followed by LDA [33].
  • In all experiments, the PCA step was used to retain 99.0 percent of the variance.
  • Next, the within-class and between-class scatter matrices, S_W and S_B, are computed from the PCA-projected training data.
  • Letting μ denote the mean of X, the final representation for an unseen probe or gallery image I under the prototype framework is W^T(Φ(I) − μ). Subsequent uses of W in this work assume the appropriate removal of the mean from Φ(I), for terseness.
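The conventional PCA-then-LDA pipeline described above can be sketched with scikit-learn (an illustration under our own naming, not the authors' implementation; `n_components=0.99` retains 99 percent of the variance, matching the paper's setting):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def learn_projection(X, y, var=0.99):
    """PCA retaining 99% of the variance, followed by LDA on the
    PCA-projected data: the standard pipeline for high-dimensional
    inputs described in Section 4.2. Illustrative sketch."""
    pca = PCA(n_components=var).fit(X)   # float in (0,1) keeps that
                                         # fraction of variance
    lda = LinearDiscriminantAnalysis().fit(pca.transform(X), y)

    def project(x):
        return lda.transform(pca.transform(np.atleast_2d(x)))

    return project
```

The returned `project` function plays the role of the mean-centered W^T projection: it maps a prototype-representation vector into the discriminant subspace.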

5.1 Motivation

  • The proposed heterogeneous prototype framework uses training data to define the prototypes and to learn the linear subspace projection matrix W .
  • When applying a prototype representation to face recognition, a large number of classes (or subjects) and features are present.
  • Most are designed to handle deficiencies in the subspace W , such as dual-space LDA [34] and direct LDA [37].
  • These methods do not address the issue of too few prototypes for an expressive representation.
  • Their approach combined random subspaces and bagging by sampling both features and training instances.

5.2 Prototype Random Subspaces

  • The prototype random subspace framework uses B different bags (or samples) of the N face patches.
  • Let f(I, b) denote the concatenation of the descriptors from the randomly selected patch indices in b.
  • For terseness the authors have omitted the superscript F and D in the previous equations.
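The bagging step can be sketched as follows; the sampling fraction argument and seed handling are our assumptions (the paper reports its own parameter values in Section 7):

```python
import numpy as np

def sample_patch_bags(n_patches, n_bags, frac, seed=0):
    """Draw B random bags of patch indices without replacement within
    each bag; f(I, b) is then the concatenation of the descriptors at
    the indices in bag b. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(frac * n_patches)))
    return [rng.choice(n_patches, size=k, replace=False)
            for _ in range(n_bags)]
```

A separate discriminant subspace is then learned per bag, and the per-bag match scores are combined, which is how random subspaces mitigate the small-sample-size problem.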

5.4 Score Level Fusion

  • The proposed framework naturally lends itself to fusion of the different feature representations.
  • Given one image filter F and two feature descriptors D1 and D2, one can use the following sum of similarity scores between probe image P and gallery image G: S_{F,D1}^{F,D1}(P,G) + S_{F,D2}^{F,D2}(P,G) + S_{F,D1}^{F,D2}(P,G) + S_{F,D2}^{F,D1}(P,G). Min-max score normalization is performed prior to fusion.
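The sum-rule fusion with per-matcher min-max normalization can be sketched as (function names are ours):

```python
import numpy as np

def minmax(scores):
    """Min-max normalize a score array to the range [0, 1]."""
    s = np.asarray(scores, dtype=np.float64)
    lo, hi = s.min(), s.max()
    return (s - lo) / max(hi - lo, 1e-12)

def fuse(score_lists):
    """Sum-rule fusion of several matchers' scores, each min-max
    normalized first (sketch of the fusion in Section 5.4)."""
    return np.sum([minmax(s) for s in score_lists], axis=0)
```

Each element of `score_lists` would hold one matcher's scores (e.g., one filter/descriptor pairing) for the same probe against the whole gallery.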

6.1 Commercial Matcher

  • The accuracy of the proposed prototype random subspace framework is compared against Cognitec’s FaceVACS [28].
  • Comparing the accuracy of their system against a leading COTS FRS offers an unbiased baseline of a state-of-the-art commercial matcher on each HFR scenario.
  • FaceVACS was chosen because it is considered one of the best commercial face matchers and, in their internal tests, it excels at HFR scenarios (with respect to other commercial matchers).
  • The accuracy of FaceVACS on NIR to VIS [10] and Viewed Sketch to VIS [9] performed on par with some previously published HFR methods.

6.2 Direct Random Subspaces

  • In addition to a commercial face recognition system, the proposed prototype recognition system is also compared against a recognition system that directly measures the difference between probe and gallery images using a common feature descriptor representation.
  • The random subspace framework from [10] is used as the baseline because it is the most similar to the proposed prototype framework, thus helping to isolate the difference between using kernel prototype similarities versus directly measuring the similarity.
  • Further, because most of the datasets tested in Section 7 are in the public domain, the proposed framework may also be compared against any other published method on these datasets.
  • This follows from the fact that f^{F,D1}(I) and f^{F,D2}(I) are of generally different dimensionality and also have a different interpretation.
  • D-RS will be used in conjunction with the six filter/descriptor representations presented in Section 3 (SIFT+DoG, MLBP+CSDN, etc.).

7 EXPERIMENTS

  • The results provided are based on the following parameter values: a patch sampling fraction of 0.1 and B = 30 bags.
  • A cosine kernel was used to compute the prototype similarity and 99.0 percent of the variance was retained in the PCA step of LDA.

7.1 Databases

  • Five different matching scenarios are tested in this paper: four heterogeneous face recognition scenarios and one standard face recognition scenario.
  • Example images from each HFR dataset can be found in Fig. 1.
  • Results shown on each dataset are the average and standard deviation of five random splits of training and testing subjects.
  • In every experiment, no subject that was used in training was used for testing.

7.1.1 Dataset 1—Near-Infrared to Visible (Fig. 1a)

  • The first dataset consists of 200 subjects with probe images captured in the near-infrared spectrum (~780-1,100 nm) and gallery images captured in the visible spectrum.
  • Portions of this dataset are publicly available for download.
  • Only one NIR and one VIS image per subject are used, making the scenario more difficult than previous experiments, which benefited from multiple images per subject in training.
  • The data was split as follows: n_t = 133 subjects were used for the training set T and the remaining 67 subjects were used for testing.

7.1.2 Dataset 2—Thermal to Visible (Fig. 1b)

  • The second dataset is a private dataset collected by the Pinellas County Sheriff’s Office (PCSO) and consists of 1,000 subjects with thermal infrared probe images and visible (mug shot) gallery images.
  • The thermal infrared images were collected using a FLIR Recon III ObservIR camera, which has sensitivity in the range of 3-5 μm.
  • The data was split as follows: n_t = 667 subjects were used for the training set T and the remaining 333 subjects were used for testing.

7.1.3 Dataset 3—Viewed Sketch to Visible (Fig. 1c)

  • The third dataset is the CUHK sketch dataset, which was used by Tang and Wang [3], [5].
  • The CUHK dataset consists of 606 subjects with a viewed sketch image for probe and a visible photograph for gallery.
  • The 606 subjects were split to form a training set T with n_t = 404 subjects, and the remaining 202 subjects were used for testing.

7.1.4 Dataset 4—Forensic Sketch to Visible (Fig. 1d)

  • The fourth and final heterogeneous face dataset consists of real-world forensic sketches and mug shot photos of 159 subjects.
  • Forensic sketches are drawn by an artist based only on an eye witness description of the subject.
  • The forensic sketch dataset is a collection of images from Gibson [45], Taylor [46], the Michigan State Police, and the Pinellas County Sheriff’s Office.
  • Forensic sketches contain incomplete information regarding the subject and are one of the most difficult HFR scenarios because the sketches often do not closely resemble the photograph.
  • The number of subjects used in T is 106, and 53 subjects are used for the test set.

7.1.5 Dataset 5: Standard Face Recognition

  • A fifth nonheterogeneous (i.e., homogeneous) dataset is used to demonstrate the ability of the proposed approach to operate in standard face recognition scenarios as well.
  • The dataset consists of one probe and one gallery photograph of 876 subjects: 117 subjects from the AR dataset [43], 294 from the XM2VTS dataset [44], 193 from the FERET dataset [47], and 272 from a private dataset collected at the University of Notre Dame.

7.2 Results

  • Fig. 6 lists the rank retrieval results of P-RS, D-RS, and FaceVACS for each dataset using the additional 10,000 gallery images for each experiment.
  • Regardless, the improved accuracy using a smaller training set of subjects clearly demonstrates the value of the proposed P-RS method.
  • The lower accuracy of P-RS compared to D-RS on the forensic sketch dataset can be attributed to two factors.
  • As shown, the recognition accuracy generally saturates around 100 prototypes.
  • Using the standard face dataset, Fig. 10a compares the accuracy of P-RS, D-RS, and FaceVACS.

8 SUMMARY

  • A method for heterogeneous face recognition, called Prototype Random Subspaces, is proposed.
  • Probe and gallery images are initially filtered with three different image filters, and two different local feature descriptors are then extracted.
  • A training set acts as a set of prototypes in which each prototype subject has an image in both the gallery and probe modalities.
  • Results were compared against a leading commercial face recognition engine.
  • Tailoring the P-RS parameters and learning weighted fusion schemes for each HFR scenario separately should offer further accuracy improvements.

ACKNOWLEDGMENTS

  • The authors would like to thank Scott McCallum and the rest of his team at the Pinellas County Sheriff’s Office, and Captain Greg Michaud from the Michigan State Police for their gracious support of this research.
  • They would also like to thank Rong Jin and Serhat Bucak for their feedback on this research.
  • This manuscript benefited from the valuable observations provided during the review process.
  • Anil Jain’s research was partially supported by the World Class University (WCU) program funded by the Ministry of Education, Science and Technology through the National Research Foundation of Korea (R31-10008).


Heterogeneous Face Recognition Using
Kernel Prototype Similarities
Brendan F. Klare, Member, IEEE, and Anil K. Jain, Fellow, IEEE
Index Terms—Heterogeneous face recognition, prototypes, nonlinear similarity, discriminant analysis, local descriptors, random
subspaces, thermal image, infrared image, forensic sketch
1 INTRODUCTION
An emerging topic in face recognition is matching
between heterogeneous image modalities. Coined
heterogeneous face recognition (HFR) [1], the scenario offers
potential solutions to many difficult face recognition
scenarios. While heterogeneous face recognition can involve
matching between any two imaging modalities, the majority
of scenarios involve a gallery dataset consisting of visible
light photographs. Probe images can be of any other
modality, though the practical scenarios of interest to us
are infrared images (NIR and thermal) and hand-drawn
facial sketches.
The motivation behind heterogeneous face recognition is
that circumstances exist in which only a particular modality
of a face image is available for querying a large database of
mug shots (visible band face images). For example, when a
subject’s face can only be acquired in nighttime
environments, the use of infrared imaging may be the only
for acquiring a useful face image of the subject. Another
example is situations in which no imaging system was
available to capture the face image of a suspect during a
criminal act. In this case a forensic sketch, drawn by a police
artist based on a verbal description provided by a witness or
the victim, is likely to be the only available source of a face
image. Despite significant progress in the accuracy of face
recognition systems [2], most commercial off-the-shelf
(COTS) face recognition systems (FRS) are not designed to
handle HFR scenarios. The need for face recognition
systems specifically designed for the task of matching
heterogeneous face images is of substantial interest.
This paper proposes a unified approach to heteroge-
neous face recognition that
1. achieves leading accuracy on multiple HFR scenarios,
2. does not necessitate feature descriptors that are
invariant to changes in image modality,
3. facilitates recognition using different feature de-
scriptors in the probe and gallery modalities, and
4. naturally extends to additional HFR scenarios due to
properties 2 and 3 above.
2RELATED WORK
2.1 Heterogeneous Face Recognition
A flurry of research has emerged providing solutions to
various heterogeneous face recognition problems. This
began with sketch recognition using viewed sketches,1
and has continued into other modalities such as near-infrared
(NIR) and forensic sketches. In this section, we will
highlight a representative selection of studies in hetero-
geneous face recognition as well as studies that use kernel-
based approaches for classification.
Tang et al. spearheaded the work in heterogeneous face
recognition with several approaches to synthesize a sketch
from a photograph (or vice versa) [3], [4], [5]. Tang and
Wang initially proposed an eigen-transformation method
. B.F. Klare is with Noblis, 3150 Fairview Park Drive, Falls Church, VA
22042. E-mail: brendan.klare@noblis.org.
. A.K. Jain is with the Department of Computer Science and Engineering,
Michigan State University, Room 3115, 428 S. Shaw Lane, Engineering
Building, East Lansing, MI 48824-1226. E-mail: jain@cse.msu.edu.
Manuscript received 18 Dec. 2011; revised 23 July 2012; accepted 16 Sept.
2012; published online 12 Oct. 2012.
Recommended for acceptance by M. Tistarelli.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2011-12-0905.
Digital Object Identifier no. 10.1109/TPAMI.2012.229.
1. A viewed sketch is a facial sketch drawn while viewing a photograph
of the subject. The scenario is not practical because the photograph itself
could be queried in the FR system.

[3]. Later, Liu et al. performed the transformation using
local linear embedding to estimate the corresponding photo
patch from a sketch patch [4]. Wang and Tang proposed a
Markov random field model for converting a sketch into a
photograph [5]. Other synthesis methods have been
proposed as well [6], [7]. The generative transformation-
based approaches have generally been surpassed in
performance by discriminative feature-based approaches.
A key advantage of synthesis methods is that once a sketch
has been converted to a photograph, matching can be
performed using existing face recognition algorithms. The
proposed prototype framework is similar in spirit to these
methods in that no direct comparison between face images
in the probe and gallery modalities is needed.
A number of discriminative feature-based approaches to
HFR have been proposed [8], [9], [10], [11], [12] which have
shown good matching accuracies in both the sketch and
NIR domains. These approaches first represent face images
using local feature descriptors, such as variants of local
binary patterns (LBPs) [13] and SIFT descriptors [14]. Liao
et al. first used this approach on NIR to VIS face recognition
by processing face images with a difference of Gaussian
(DoG) filter, and encoding them using multiblock local
binary patterns (MB-LBPs). Gentle AdaBoost feature selec-
tion was used in conjunction with R-LDA to improve the
recognition accuracy. Klare and Jain followed this work on
NIR to VIS face recognition by also incorporating SIFT
feature descriptors and an RS-LDA scheme [10]. Bhatt et al.
introduced an extended uniform circular local binary
pattern to the viewed sketch recognition scenario [11].
Klare et al. encoded both viewed sketches and forensic
sketches using SIFT and MLBP feature descriptors, and
performed local feature-based discriminant analysis
(LFDA) to improve the recognition accuracy [9]. Yi et al.
[15] offered a local patch-based method to perform HFR on
partial NIR face images. Zhang et al. extracted local features
and performed recognition between sketches and photos
using coupled information-theoretic encoding [16]. Lei and
Li applied coupled spectral regression (CSR) for NIR to VIS
recognition [12]. In [12], CSR was extended to Kernel CSR,
which is similar to the proposed prototype representation
in this work.
The synthesis method by Li et al. is the only known
method to perform recognition between thermal IR and
visible face images [17]. The only method to perform
recognition between forensic sketches and visible face
images is Klare et al. [9], which is also one of two methods,
to our knowledge, that has been tested on two different
HFR scenarios (viewed sketch and forensic sketch). The
other method is Lin and Tang’s [18] common discriminant
recognition framework, which was applied to viewed
sketches and near-infrared images. In this work, the
proposed prototype random subspace (P-RS) framework
is tested on four different HFR scenarios.
2.2 Kernel Prototype Representation
The core of the proposed approach involves using a
relational feature representation for face images (illustrated
in Fig. 2). By using kernel similarities between a novel face
pattern and a set of prototypes, we are able to exploit the
kernel trick [19], which allows us to generate a high
dimensional, nonlinear representation of a face image using
compact feature vectors.
The benefit of a prototype-based approach is provided
by Balcan et al. [19]. Given access to the data distribution
and a kernel similarity function, a prototype representation
Fig. 1. Example images from each of the four heterogeneous face recognition scenarios tested in our study. The top row contains probe images from
(a) near-infrared, (b) thermal infrared, (c) viewed sketch, and (d) forensic sketch modalities. The bottom row contains the corresponding gallery
photograph (visible band face image, called VIS) of the same subject.
Fig. 2. The proposed face recognition method describes a face as a
vector of kernel similarities to a set of prototypes. Each prototype has
one face image in the probe and gallery modalities.

is shown to approximately maintain the desired properties
of the high-dimensional kernel space in a more efficient
representation by using the kernel trick. While it is not
common to refer to kernel methods as prototype represen-
tations, in this work we emphasize the fact that kernel
methods use a training set of images (which serve as
prototypes) to implicitly estimate the distribution of the
nonlinear feature space. One key to our framework is that
each prototype has a pattern for each image modality.
The proposed kernel prototype approach is similar to the
object recognition method of Quattoni et al. [20]. Kernel
PCA [21] and Kernel LDA [22], [23] approaches to face
recognition have used a similar approach, where a face is
represented as the kernel similarity to a collection of
prototype images in a high-dimensional space. The bio-
metric indexing scheme by Gyaourova and Ross used
similarity scores to a fixed set of references in the face and
fingerprint modality [24].
These prior works differ from the proposed method
because only a single prototype is used per training subject.
By contrast, our approach is designed for heterogeneous
face recognition, and uses two prototype images per subject
(one per modality). Our earlier work [25] utilized a similar
approach that did not exploit the benefit of nonlinear
kernels, but did use a separate pattern from each image
modality (sketch and photo) for each prototype. The kernel
coupled spectral regression by Lei and Li used a similar
approach of representing heterogeneous face images as
nonlinear similarities to a set of prototypes [12].
2.3 Proposed Method
The proposed method presents a new approach to hetero-
geneous face recognition, and extends existing methods in
face recognition. The use of a nonlinear similarity repre-
sentation is well suited to the HFR problem because a set of
training subjects with an image from each modality can be
used as the prototypes and, depending on the modality of a
new image (probe or gallery), the image from each
prototype subject can be selected from the corresponding
modality. Unlike previous feature-based methods, where an
image descriptor invariant to changes between the two HFR
modalities was needed, the proposed framework only
needs descriptors that are effective within each domain.
Further, the proposed method is effective even when
different feature descriptors are used in the probe and
gallery domains. The proposed prototype framework is
described in detail in Section 4.
The accuracy of the HFR system is improved using a
random subspace framework in conjunction with linear
discriminant analysis (LDA), as described in Section 5. The
previous (or baseline) method of feature-based random
subspaces [10] is revisited in Section 6. Experimental results
on four different heterogeneous face recognition scenarios
(thermal, near-infrared, viewed sketch, and forensic sketch)
are provided in Section 7, and all the results are bench-
marked with a commercial face matcher.
While we demonstrate the strength of the proposed
framework on many different HFR scenarios, the parameters
controlling the framework are the same across all tested
scenarios. This shows that the contribution of this work is a
generic framework for improving solutions to the general
HFR problem. Future use of the proposed framework will
benefit from selecting parameters tailored to a specific
scenario; however, that is beyond the scope of this work.
3IMAGE PREPROCESSING AND REPRESENTATION
All face images are initially represented using a feature-
based representation. The use of local feature descriptors has
been argued to closely resemble the postulated representa-
tion of the human visual processing system [26], and they
have been shown to be well suited for face recognition [27].
3.1 Geometric Normalization
The first step in representing face images using feature
descriptors is to geometrically normalize the face images
with respect to the location of the eyes. This step reduces
the effect of scale, rotation, and translation variations. The
eye locations for the face images from all modalities are
automatically estimated using Cognitec’s FaceVACS SDK
[28]. The only exceptions are the thermal face images where
the eyes are manually located for both the proposed method
and the FaceVACS baseline.
Face images are geometrically normalized by 1) perform-
ing planar rotation to set the angle between the eyes to
0 degrees, 2) scaling the images so that the distance between
the two pupils is 75 pixels, and 3) cropping the images to a
height of 250 pixels and a width of 200 pixels, with the eyes
horizontally centered and vertically placed at row 115.
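The normalization step above amounts to a similarity transform fixed by the two eye centers. As an illustration, the following is a minimal numpy sketch (not the authors' implementation; `normalization_transform` is a hypothetical helper name, and eye centers are assumed to be given as (x, y) pixel coordinates):

```python
import numpy as np

def normalization_transform(left_eye, right_eye,
                            out_size=(250, 200), ipd=75, eye_row=115):
    # Build the 2x3 similarity transform mapping the detected eye centers
    # onto the canonical positions used in the paper: inter-pupillary
    # distance of 75 px, eyes on row 115 of a 250x200 (H x W) crop.
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    d = right_eye - left_eye
    angle = np.arctan2(d[1], d[0])           # rotation that levels the eyes
    scale = ipd / np.linalg.norm(d)          # scale so the IPD becomes 75 px
    c, s = np.cos(-angle), np.sin(-angle)
    R = scale * np.array([[c, -s], [s, c]])
    cx = out_size[1] / 2.0                   # eyes horizontally centered
    target_left = np.array([cx - ipd / 2.0, eye_row])
    t = target_left - R @ left_eye           # translation fixing the left eye
    return np.hstack([R, t[:, None]])        # 2x3 affine matrix
```

The returned matrix would then be applied to the image with any standard warping routine.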
3.2 Image Filtering
Face images are filtered with three different image filters.
These filters are intended to help compensate for both
intensity variations within an image domain (such as
nonuniform illumination changes), as well as appearance
variations between image domains. The second aspect is
of particular importance for the direct random subspace
(D-RS) framework (see Section 6). An example of the effects
of each image filter can be seen in Fig. 3.
The three image filters used are as follows.
3.2.1 Difference of Gaussian
A difference of Gaussian image filter has been shown by
Tan and Triggs to improve face recognition performance in
the presence of varying illumination [29], as well as in an
NIR to VIS matching scenario by Liao et al. [8]. A difference
of Gaussian image is generated by convolving an image
with a filter obtained by subtracting a Gaussian filter of
width σ1 from a Gaussian filter of width σ2 (σ2 > σ1). In this
paper, σ1 = 2 and σ2 = 4.

KLARE AND JAIN: HETEROGENEOUS FACE RECOGNITION USING KERNEL PROTOTYPE SIMILARITIES

Fig. 3. Example of thermal probe and visible gallery images after being
filtered by a difference of Gaussian, center-surround divisive normalization,
and Gaussian image filter. The SIFT and MLBP feature descriptors are
extracted from the filtered images, and kernel similarities are computed
within this image descriptor representation.
3.2.2 Center-Surround Divisive Normalization (CSDN)
Meyers and Wolf [30] introduced the center-surround
divisive normalization filter in conjunction with their
biologically inspired face recognition framework. The CSDN
filter divides the value of each pixel by the mean pixel value
in the s × s neighborhood surrounding the pixel. The
nonlinear nature of the CSDN filter is seen as a complement
to the DoG filter. In our implementation, s = 16.
3.2.3 Gaussian
The Gaussian smoothing filter has long been used in image
processing applications to remove noise contained in high
spatial frequencies while retaining the remainder of the
signal. The width of the filter used in our implementation
was σ = 2.
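The three filters can be sketched in a few lines with scipy, using the parameter values reported above (σ1 = 2 and σ2 = 4 for the DoG, s = 16 for the CSDN, σ = 2 for the Gaussian); the function names here are ours, not the authors':

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def dog(img, sigma1=2.0, sigma2=4.0):
    # Difference of Gaussian: subtract the wider Gaussian smoothing
    # from the narrower one (equivalent to convolving with the DoG kernel).
    return gaussian_filter(img, sigma1) - gaussian_filter(img, sigma2)

def csdn(img, s=16, eps=1e-8):
    # Center-surround divisive normalization: divide each pixel by the
    # mean of its s x s neighborhood (eps avoids division by zero).
    return img / (uniform_filter(img, size=s) + eps)

def smooth(img, sigma=2.0):
    # Plain Gaussian smoothing filter.
    return gaussian_filter(img, sigma)
```

On a constant image the DoG output is zero and the CSDN output is one, which is a quick sanity check of the implementations.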
3.3 Local Descriptor Representation
Once an image is geometrically normalized and filtered
using one of the three filters, local feature descriptors are
extracted from uniformly distributed patches across the
face. In this work, we use two different feature descriptors
to represent the face image: the SIFT descriptor [14] and
Local Binary Patterns [13]. The SIFT feature descriptor has
been used effectively in face recognition [27], sketch to VIS
matching [9], and NIR to VIS matching [10]. LBP features
have a longer history of successful use in face recognition.
Ahonen et al. originally proposed their use for face
recognition [31], Li et al. demonstrated their use in NIR to
NIR face matching [32], and they have also been success-
fully applied to several HFR scenarios [8], [9], [10], [11].
The SIFT and LBP feature representations are effective in
describing face images due to their ability to encode the
structure of the face and their stability in the presence of
minor external variations [27]. Each feature descriptor
describes an image patch as a d-dimensional vector that is
normalized to sum to one. The face image is divided into a
set of N overlapping patches of size 32 × 32. Each patch
overlaps its vertical and horizontal neighbors by 16 pixels.
With a face image of size 200 × 250, this results in a total of
N = 154 patches.
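The patch count follows directly from the geometry: a 32 px window stepped by 16 px fits 11 positions across a 200 px width and 14 down a 250 px height, giving 11 × 14 = 154 patches. A small sketch (helper name is ours):

```python
import numpy as np

def patch_grid(img, size=32, step=16):
    # Tile the image with size x size patches whose neighbors
    # overlap by (size - step) pixels in each direction.
    H, W = img.shape
    return [img[y:y + size, x:x + size]
            for y in range(0, H - size + 1, step)
            for x in range(0, W - size + 1, step)]

face = np.zeros((250, 200))        # height x width, as in the paper
print(len(patch_grid(face)))       # 154
```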
Multiscale local binary patterns (MLBP) [9], a variant of
the LBP descriptor, is used in place of LBP in this work.
MLBP is the concatenation of LBP feature descriptors with
radii r ∈ {1, 3, 5, 7}.
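A minimal numpy sketch of the MLBP descriptor follows. It is not the authors' implementation: for simplicity it samples the 8 neighbors on a square ring of radius r (a common integer-offset approximation; the standard LBP uses interpolated circular sampling), then histograms the uniform patterns into 59 bins per radius and concatenates the four radii into a 236-dimensional, per-radius-normalized vector:

```python
import numpy as np

def _uniform_lut(P=8):
    # Map each 8-bit LBP code to one of 58 uniform-pattern labels
    # (at most two 0/1 transitions) or a shared non-uniform bin (58).
    lut, nxt = np.zeros(256, dtype=int), 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(P)]
        trans = sum(bits[i] != bits[(i + 1) % P] for i in range(P))
        if trans <= 2:
            lut[code], nxt = nxt, nxt + 1
        else:
            lut[code] = 58
    return lut

def mlbp(patch, radii=(1, 3, 5, 7)):
    patch = np.asarray(patch, dtype=float)
    lut, feats = _uniform_lut(), []
    for r in radii:
        center = patch[r:-r, r:-r]
        offsets = [(-r, -r), (-r, 0), (-r, r), (0, r),
                   (r, r), (r, 0), (r, -r), (0, -r)]
        code = np.zeros(center.shape, dtype=int)
        for bit, (dy, dx) in enumerate(offsets):
            nb = patch[r + dy:r + dy + center.shape[0],
                       r + dx:r + dx + center.shape[1]]
            code |= (nb >= center).astype(int) << bit
        hist = np.bincount(lut[code].ravel(), minlength=59).astype(float)
        feats.append(hist / (hist.sum() + 1e-8))  # normalize to sum to one
    return np.concatenate(feats)                  # 4 radii x 59 bins = 236-D
```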
Let I be a (normalized and filtered) face image. Let
f^{F,D}(I, a) denote the local feature descriptor extracted from
image I at patch a, 1 ≤ a ≤ N, using image filter F and
feature descriptor D. The DoG, CSDN, and Gaussian image
filters are, respectively, referred to as F_d, F_c, and F_g. The
MLBP and SIFT descriptors are, respectively, referred to as
D_m and D_s. Using 16 histograms and 8 orientation bins, as
described by Lowe [14], the SIFT descriptor yields a
128-dimensional feature descriptor. Using uniform patterns
at eight sampling locations, as described by Ojala et al. [13],
the LBP descriptor yields a 59-dimensional feature descriptor.
This results in a 236-dimensional MLBP feature descriptor
(f^{F,D_m}(I, a) ∈ ℝ^236). Finally, we have

  f^{F,D}(I) = [f^{F,D}(I, 1)^T, ..., f^{F,D}(I, N)^T]^T,    (1)

which is the concatenation of all N feature descriptors.
Thus, f^{F,D_s}(I) ∈ ℝ^{128N} and f^{F,D_m}(I) ∈ ℝ^{236N}.
Using the three filters and two descriptors, we have six
different representations available for face image I, namely,
f^{F_d,D_m}(I), f^{F_c,D_m}(I), f^{F_g,D_m}(I), f^{F_d,D_s}(I),
f^{F_c,D_s}(I), and f^{F_g,D_s}(I).
4 HETEROGENEOUS PROTOTYPE FRAMEWORK

4.1 Prototype Representation
The heterogeneous prototype framework begins with
images from the probe and gallery modalities represented
by (possibly different) feature descriptors for each of the
N image patches, as described in the previous section. For
compactness, let f(I) represent f^{F,D}(I). The similarity
between two images is measured using a kernel function
k : f(I) × f(I) → ℝ.
Let T be a set of training images consisting of n_t subjects.
The training set contains a probe image P_i and gallery
image G_i for each of the n_t subjects. That is,

  T = {P_1, G_1, ..., P_{n_t}, G_{n_t}}.    (2)

For both the probe and gallery modalities, two positive
semi-definite kernel matrices K_P and K_G are computed
between the training subjects. The probe kernel matrix is
K_P ∈ ℝ^{n_t × n_t}, and the gallery kernel matrix is
K_G ∈ ℝ^{n_t × n_t}. The entries in the ith row and jth column
of K_P and K_G are

  K_P(i, j) = k(f(P_i), f(P_j)),    (3)
  K_G(i, j) = k(f(G_i), f(G_j)),    (4)

where k(·, ·) is the kernel similarity function. Results in all
experiments in this work use the cosine kernel function:

  k(f(P_i), f(G_i)) = ⟨f(P_i), f(G_i)⟩ / (‖f(P_i)‖ ‖f(G_i)‖).    (5)

The cosine kernel was chosen because it resulted in
consistently higher accuracy on all tested scenarios compared
to the radial basis function kernel and the polynomial
kernel. Additionally, we preferred the cosine kernel because
it is devoid of parameters.
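The cosine kernel and the per-modality kernel matrices are straightforward to sketch in numpy (function names are ours; a small epsilon guards against zero-norm inputs, which is our addition, not the paper's):

```python
import numpy as np

def cosine_kernel(u, v):
    # Parameter-free cosine similarity between two descriptor vectors.
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def kernel_matrix(feats):
    # n_t x n_t cosine kernel matrix over one modality's training images;
    # rows of `feats` are the descriptor vectors f(.) of the prototypes.
    F = np.asarray(feats, dtype=float)
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    return F @ F.T
```

With row-normalized features the kernel matrix is symmetric with a unit diagonal, as expected of a cosine Gram matrix.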
Let P and G, respectively, be test probe and gallery face
images, i.e., P, G ∉ T. The function φ_P(P) returns a vector
containing the kernel similarity of image P to each image P_i
in T. For gallery image G, φ_G(G) returns a vector of kernel
similarities to the gallery prototypes G_i. Thus, face images
are represented as the relational vector φ_P(P) ∈ ℝ^{n_t} for a
probe image and φ_G(G) ∈ ℝ^{n_t} for a gallery image. More
precisely, we have

  φ_P(P) = [k(f(P), f(P_1)), ..., k(f(P), f(P_{n_t}))]^T,    (6)
  φ_G(G) = [k(f(G), f(G_1)), ..., k(f(G), f(G_{n_t}))]^T.    (7)
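Given any kernel, the prototype representation of (6)-(7) is just the vector of similarities to the n_t training images of the matching modality; a minimal sketch (names are ours):

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def prototype_vector(f_test, prototype_feats, kernel=cosine):
    # Kernel similarity of one test image's descriptor to each of the
    # n_t prototype descriptors from the corresponding modality.
    return np.array([kernel(f_test, f_i) for f_i in prototype_feats])
```

A probe would use the probe-modality prototypes and a gallery image the gallery-modality prototypes, yielding comparable n_t-dimensional vectors.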
Using this prototype-based representation, extreme
inputs to the system (e.g., a nonface image) will cause the
kernel similarity to degenerate to the kernel minimum (0 in
the case of the cosine kernel). This allows the system to
remain stable with respect to scale.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 35, NO. X, XXXXXXX 2013
Additionally, because the feature vectors φ_P(P) and
φ_G(G) are a measure of the similarity between the test image
and the prototype training images, the feature spaces for
similarity computation do not have to be the same for the
probe and gallery modalities. For example, the probe images
could be represented using f^{F,D_s}(P) and the gallery images
could be represented using f^{F,D_m}(G). Despite the fact that
the SIFT and MLBP feature descriptors are heterogeneous
features, the relational representation allows them to be
represented in a common feature space. This is based on the
assumption that

  k(f(P), f(P_i)) ≈ k(f(G), f(G_i)).    (8)

We will next introduce a discriminant subspace technique
to project these prototype features into a linear subspace that
better satisfies (8). When necessary, the tersely presented
notation of φ_P(I) or φ_G(I) will be expanded to the more
verbose notation φ_P^{F,D}(I) or φ_G^{F,D}(I), respectively, in order to
specify which feature descriptor and image filter is initially
being used to represent the image I. For example, φ_P^{F_c,D_s}(I)
denotes the prototype similarity of image I when represented
using the CSDN image filter and SIFT descriptors.
4.2 Discriminant Analysis
After representing the images in the training set T in the
aforementioned prototype representation, we next learn
linear subspaces using linear discriminant analysis [33] to
enhance the discriminative capabilities of the prototype
representation φ(·). LDA (and its variants) has consistently
demonstrated its ability to improve the accuracy of various
recognition algorithms through feature extraction and
dimensionality reduction. The benefits of LDA in the
context of face recognition have been demonstrated on
image pixel representations [33], [34], Gabor features [35],
and image descriptors [8], [9].
We learn the linear projection matrix W by following the
conventional approach for high-dimensional data, namely,
by first applying PCA, followed by LDA [33]. In all
experiments, the PCA step was used to retain 99.0 percent
of the variance. Let X be a matrix whose columns contain
the prototype representation of each image in T:

  X = [φ_P(P_1), φ_G(G_1), ..., φ_P(P_{n_t}), φ_G(G_{n_t})].    (9)

Let X′ denote the mean-centered version of X. The initial
step involves learning the subspace projection matrix W′_1 by
performing principal component analysis (PCA) on X′ to
reduce the dimensionality of the feature space. Next, the
within-class and between-class scatter matrices of W′_1^T X′,
respectively S_W and S_B, are computed. The dimension of
the subspace W′_1 is chosen such that S_W will be of full rank.
The scatter matrices are built using each subject as a class;
thus, one image from the probe modality and one from the
gallery modality represent each class. A more detailed
description of how to compute S_W and S_B is given in [9].
Last, the matrix W′_2 is learned by solving the generalized
eigenvalue problem

  S_B W′_2 = Λ S_W W′_2,    (10)

where Λ is the diagonal matrix of generalized eigenvalues.
This yields the LDA projection matrix W, where

  W = (W′_2^T W′_1^T)^T.    (11)

Letting μ denote the mean of X, the final representation for
an unseen probe or gallery image I using the prototype
framework is W^T(φ(I) − μ). Subsequent uses of W in this
work will assume the appropriate removal of the mean μ
from φ(I) for terseness.
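The PCA-then-LDA pipeline can be sketched end to end in numpy. This is a toy sketch under our own simplifications, not the authors' implementation: the PCA step keeps 99 percent of the variance via an SVD, and a small ridge term is added to S_W so the generalized eigenproblem stays solvable on tiny synthetic data:

```python
import numpy as np

def learn_W(X, labels, var_keep=0.99):
    # X: columns are prototype vectors phi(.); labels: subject id per column.
    Xc = X - X.mean(axis=1, keepdims=True)
    # PCA step (W'_1): retain var_keep of the variance.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_keep)) + 1
    W1 = U[:, :k]
    Y = W1.T @ Xc
    # Scatter matrices, one class per subject.
    mu = Y.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((k, k)), np.zeros((k, k))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mc = Yc.mean(axis=1, keepdims=True)
        Sw += (Yc - mc) @ (Yc - mc).T
        Sb += Yc.shape[1] * ((mc - mu) @ (mc - mu).T)
    # LDA step (W'_2): generalized eigenproblem Sb w = lambda Sw w,
    # solved via the eigenvectors of Sw^{-1} Sb (ridge keeps Sw invertible).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(k), Sb))
    W2 = evecs[:, np.argsort(-evals.real)].real
    return W1 @ W2    # overall projection W = (W'_2^T W'_1^T)^T
```

On synthetic two-class data the leading projected dimension separates the class means, which is the behavior the discriminant step is meant to provide.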
5 RANDOM SUBSPACES
5.1 Motivation
The proposed heterogeneous prototype framework uses
training data to define the prototypes and to learn the linear
subspace projection matrix W . This requirement on training
data raises two (somewhat exclusive) issues in the prototype
representation framework. The first issue is that the number
of subjects in T (i.e., the number of prototypes) is generally
too small for an expressive prototype representation. While
Balcan et al. demonstrated that the number of prototypes
does not need to be large to approximately replicate the data
distribution [19], their applications primarily dealt with
binary classification and a small number of features. When
applying a prototype representation to face recognition, a
large number of classes (or subjects) and features are
present. The small sample size problem implies that the
number of prototypes needed to approximate the under-
lying data distribution should be large [36].
The second issue is also related to the small sample size
problem [36]. This common problem in face recognition
arises from too few training subjects to learn model
parameters that are not susceptible to generalization errors.
In the heterogeneous prototype framework this involves
learning a W matrix that generalizes well.
A number of solutions exist to tackle the small sample
size problem in face recognition. Most are designed to
handle deficiencies in the subspace W, such as dual-space
LDA [34] and direct LDA [37]. Regularization methods such
as R-LDA [38] also address degenerative properties of W.
However, these methods do not address the issue of too few
prototypes for an expressive representation.
Another approach to handle deficiencies in learning
parameters is the use of random subspaces [39]. The
random subspace method samples a subset of features
and performs training in this reduced feature space.
Multiple sets (or bags) of randomly sampled features are
generated, and for each bag the parameters are learned.
This approach is similar to the classical bagging classification
scheme [40], where the training instances are randomly
sampled into bags multiple times and training occurs on
each bag separately. Ensemble methods such as Ho's
random subspaces [39] and Breiman's bagging classifiers
have been demonstrated to increase the generalization of an
arbitrary classifier [41].
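The feature-sampling step of the random subspace method is simple to sketch (a generic illustration with names of our choosing, not the paper's exact sampling scheme):

```python
import numpy as np

def random_subspace_bags(n_features, n_bags, bag_size, seed=0):
    # Draw n_bags random subsets of feature indices, each of size
    # bag_size, sampled without replacement within a bag. One model
    # (e.g., the discriminant subspace of Section 4.2) is then trained
    # on each bag, and the ensemble's scores are fused at match time.
    rng = np.random.default_rng(seed)
    return [rng.choice(n_features, size=bag_size, replace=False)
            for _ in range(n_bags)]
```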
Wang and Tang demonstrated the effectiveness of
random sampling LDA (RS-LDA) for face recognition
[42]. Their approach combined random subspaces and
bagging by sampling both features and training instances.
For each random sample space, a linear subspace was
learned. Klare and Jain utilized this approach in the HFR
References

D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.

P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.

T. K. Ho, "The Random Subspace Method for Constructing Decision Forests," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, 1998.
