
Matching Forensic Sketches to Mug Shot Photos

01 Mar 2011-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 33, Iss: 3, pp 639-646
TL;DR: Compared to a leading commercial face recognition system, LFDA offers substantial improvements in matching forensic sketches to the corresponding face images and leads to state-of-the-art accuracies when matching viewed sketches.
Abstract: The problem of matching a forensic sketch to a gallery of mug shot images is addressed in this paper. Previous research in sketch matching only offered solutions to matching highly accurate sketches that were drawn while looking at the subject (viewed sketches). Forensic sketches differ from viewed sketches in that they are drawn by a police sketch artist using the description of the subject provided by an eyewitness. To identify forensic sketches, we present a framework called local feature-based discriminant analysis (LFDA). In LFDA, we individually represent both sketches and photos using SIFT feature descriptors and multiscale local binary patterns (MLBP). Multiple discriminant projections are then used on partitioned vectors of the feature-based representation for minimum distance matching. We apply this method to match a data set of 159 forensic sketches against a mug shot gallery containing 10,159 images. Compared to a leading commercial face recognition system, LFDA offers substantial improvements in matching forensic sketches to the corresponding face images. We were able to further improve the matching performance using race and gender information to reduce the target gallery size. Additional experiments demonstrate that the proposed framework leads to state-of-the-art accuracies when matching viewed sketches.

Summary (3 min read)

Introduction

  • Coronary artery disease (CAD) causes about 45% of all deaths in Europe.
  • Therefore, the main aim of exercise-based cardiac rehabilitation is to improve peak VO2.
  • The most common training programmes currently used in cardiac rehabilitation are continuous training (CT) and interval training (IT) or a mixture.
  • A decade ago, it was shown that IT, consisting of several bouts of exercise at a higher intensity (80–90% of the peak VO2) alternated by active recovery periods, may provide a safe alternative to CT in CAD patients.

Participants

  • The authors' study included 20 male CAD patients (mean age 62.4 ± 6.1 years), referred to the Cardiac Rehabilitation Unit of the University Hospitals of Leuven between July 2014 and November 2015.
  • The study was approved by the local ethics committee (Commissie Medische Ethiek KU Leuven).
  • After inclusion, two patients dropped out before any tests were performed: one patient because of an old knee injury and one due to severe arrhythmias at high training intensities.
  • There were no significant differences between the baseline characteristics of the patients with or without the dropouts (data not shown).

Measurements

  • Height (cm) and weight (kg) were measured, and body mass index (BMI) (kg/m2) was calculated by dividing weight (kg) by height squared (m2).
  • Before starting the cardiac rehabilitation, subjects performed a maximal graded exercise test on a cycle ergometer (Oxycon Pro, Jaeger, CareFusion, Germany) supervised by a trained exercise physiologist.
  • The peak HR was defined as the highest HR reached at the end of the test.
  • The peak VO2 was determined as the VO2 during the last full 30 s bout of the test.
  • A peak respiratory exchange ratio of at least 1.10 defined a maximal effort [28]. Individual peak VO2 results were compared with the predicted reference values of Wasserman et al. to determine the percentage of predicted peak VO2.

IT and CT familiarisation sessions

  • After inclusion, subjects were randomly assigned on a 1:1 base to either training group 1 (starting with IT) or group 2 (starting with CT).
  • They performed a total of six supervised IT (n = 3) and CT (n = 3) sessions on a cycle ergometer (Ergo-fit, Gymna, Brussels, Belgium), three times per week for two weeks.
  • The IT and CT sessions were alternated and aimed to familiarise the subjects with both training programmes.
  • A Garmin chest strap and wristwatch continuously monitored HR (Garmin, Garmin International, Kansas, USA), and training loads were adapted throughout the training to ensure that participants would remain within the prescribed HR zones.

IT and CT test sessions

  • The two-week run-in period was followed by four test sessions within two weeks: the IT and CT sessions according to the protocol of the SAINTEX-CAD study [25], which was based on Wisloff et al.
  • An online randomisation procedure was performed to determine the sequence of the four test sessions.
  • During the test sessions, HR was continuously monitored using a Polar chest strap (Polar, Polar Electro, Kempele, Finland).
  • As beta-blocking medication might influence HR, the authors also asked their patients to take their medication at the same time each morning of the test.
  • Breath-by-breath gas exchange measurements (VE, VO2, VCO2), averaged every 30 s, were provided by the Oxycon mobile device (Oxycon mobile, Jaeger, CareFusion, Germany) and were used to calculate the EE (indirect calorimetry) according to the American College of Sports Medicine (ACSM) guidelines: one litre of O2 uptake equals 5 kcal.
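The indirect-calorimetry conversion just described (1 litre of O2 uptake ≈ 5 kcal, with readings averaged over 30 s windows) amounts to a simple summation; the function below is an illustrative sketch, not the authors' analysis code, and its name and sampling assumptions are hypothetical.

```python
def energy_expenditure_kcal(vo2_samples_l_per_min, interval_s=30.0):
    """Estimate energy expenditure (kcal) by indirect calorimetry.

    vo2_samples_l_per_min: VO2 readings in litres O2 per minute,
    one per averaging window (30 s here, matching the breath-by-
    breath averaging described above).
    Applies the ACSM approximation: 1 litre of O2 uptake ~ 5 kcal.
    """
    litres_o2 = sum(v * (interval_s / 60.0) for v in vo2_samples_l_per_min)
    return 5.0 * litres_o2

# e.g. a steady 2.0 L/min over ten 30-s windows (5 min total)
# consumes 10 L of O2, i.e. about 50 kcal
```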

Statistical analysis

  • All values are expressed as mean ± standard deviation (SD), median and range, or as number and percentage.
  • All data were normally distributed according to the Shapiro-Wilk test for normality.
  • To examine the differences in EE, exercise intensity or lactate between the ITw, CTw, ITc and CTc sessions, a repeated measures analysis of variance was performed.
  • A p-value < 0.05 was considered statistically significant.

Results

  • A total of 18 patients completed the six training sessions and the four tests; baseline characteristics are presented in Table 1.
  • Patients did not change medication during the study period.
  • In Figure 4 and Table 2, the mean EE per training session is presented.
  • The patients performed the tests within the prescribed HR zones (taking into account the end-interval values for the IT sessions) as shown in Table 2 and in Figure 5.

Discussion

  • This is the first study to objectively measure EE during IT and CT.
  • Therefore in previous studies, training sessions were designed to be isocaloric based on a theoretical calculation of their respective EE.
  • The present study examines whether the IT and CT programmes described in the protocol of the SAINTEX-CAD study [25] (based on a study of Wisloff et al.: ITw and CTw) [22], and the actually achieved intensities in the SAINTEX-CAD study (ITc and CTc) [10], were truly isocaloric by objectively measuring the EE.
  • The authors' results showed: (a) a similar EE for the protocols of Wisloff et al. (ITw vs CTw); (b) a significantly higher EE for the CT of the SAINTEX-CAD study compared to the IT (ITc vs CTc); and (c) a significant increase in lactate after ITw, ITc and CTc, but not after CTw.

Medication

  • Nevertheless, most of the patients had difficulties maintaining this high HR zone and needed constant encouragement from the supervising exercise physiologist.
  • According to Skinner’s three-phase model, training modalities below the first ventilatory threshold (60–70% peak HR) do not exceed a 2 mmol/l lactate level.
  • When lactate levels exceed 4 mmol/l, passing the second ventilatory threshold, the exercise is considered anaerobic (>90% peak HR).

Training intensity

  • A significant difference was found between the lactate levels of the CT and IT sessions, with IT showing lactate values > 4 mmol/l.
  • This implies that the high intensity intervals may have been performed anaerobically for most of the individuals (ITw 14/18; ITc 13/18), but not in all, since this secondary threshold is highly individual.
  • The frequently used term ‘aerobic’ interval training may thus be inappropriate and misleading, but further research is needed.
  • The authors can, however, conclude that the participants remained in the aerobic zone for both CT sessions (CTw 2.45 mmol/l and CTc 3.41 mmol/l).
  • From the lactate levels of the CTw session, the authors can conclude that an intensity set at 70–75% of peak HR is simply insufficient as a training stimulus.

Limitations

  • The results of this study should be interpreted within the context of its limitations.
  • The first limitation of their study is the relatively small sample size.
  • The authors expect, however, these variations to be random across all subjects, which should, therefore, not significantly influence their final result.
  • As the authors previously reported that substantial changes in peak VO2 and peak HR already occur after six weeks of training [10], this run-in period could have influenced the high VO2 levels achieved at the prescribed HR.
  • Fifth, the authors did not measure lactate levels during the maximal exercise test, hence they were unable to compare these individual maximal lactate levels to the lactate levels found during the tests.

Conclusion

  • The authors found that CT according to the SAINTEX-CAD study expended significantly more energy than IT.
  • As previous studies showed IT and CT to either yield similar improvements in peak VO2, or larger improvements after IT compared to CT, the authors conclude that IT is more efficient in improving peak VO2, producing a larger gain during a shorter training duration and at a lower energy cost.
  • Since EE is not a goal on its own in cardiac rehabilitation, the main question remains which training modality is the most appropriate to increase peak VO2.
  • The authors suggest that IT and CT are equally effective, provided that CT sessions are performed at sufficiently high intensities.
  • The authors stress the importance of objectively measuring actual caloric expenditure by indirect calorimetry in pilot studies, rather than using a general formula for setting up isocaloric exercise training programmes.




Matching Forensic Sketches
to Mug Shot Photos
Brendan F. Klare, Student Member, IEEE,
Zhifeng Li, Member, IEEE, and
Anil K. Jain, Fellow, IEEE
Abstract—The problem of matching a forensic sketch to a gallery of mug shot
images is addressed in this paper. Previous research in sketch matching only
offered solutions to matching highly accurate sketches that were drawn while
looking at the subject (viewed sketches). Forensic sketches differ from viewed
sketches in that they are drawn by a police sketch artist using the description of the
subject provided by an eyewitness. To identify forensic sketches, we present a
framework called local feature-based discriminant analysis (LFDA). In LFDA, we
individually represent both sketches and photos using SIFT feature descriptors
and multiscale local binary patterns (MLBP). Multiple discriminant projections are
then used on partitioned vectors of the feature-based representation for minimum
distance matching. We apply this method to match a data set of 159 forensic
sketches against a mug shot gallery containing 10,159 images. Compared to a
leading commercial face recognition system, LFDA offers substantial
improvements in matching forensic sketches to the corresponding face images.
We were able to further improve the matching performance using race and gender
information to reduce the target gallery size. Additional experiments demonstrate
that the proposed framework leads to state-of-the-art accuracies when matching
viewed sketches.
Index Terms—Face recognition, forensic sketch, viewed sketch, local feature
discriminant analysis, feature selection, heterogeneous face recognition.
1 INTRODUCTION
PROGRESS in biometric technology has provided law enforcement
agencies additional tools to help determine the identity of
criminals. In addition to DNA and circumstantial evidence, if a
latent fingerprint is found at an investigative scene or a
surveillance camera captures an image of a suspect’s face, then
these cues may be used to help determine the culprit’s identity
using automated biometric identification. However, many crimes
occur where none of this information is present, but instead an
eyewitness account of the crime is available. In these circum-
stances, a forensic artist is often used to work with the witness in
order to draw a sketch that depicts the facial appearance of the
culprit according to the verbal description. Once the sketch image
of the transgressor is complete, it is then disseminated to law
enforcement officers and media outlets with the hopes of someone
knowing the suspect. These sketches are known as forensic
sketches and this paper describes a robust method for matching
forensic sketches to large mug shot (image) databases maintained
by law enforcement agencies.
Two different types of face sketches are discussed in this paper:
viewed sketches and forensic sketches (see Fig. 1). Viewed sketches are
sketches that are drawn while viewing a photograph of the person
or the person himself. Forensic sketches are drawn by interviewing a
witness to gain a description of the suspect. Published research on
sketch to photo matching to this point has primarily focused on
matching viewed sketches [1], [2], [3], [4], [5], despite the fact that
real-world scenarios only involve forensic sketches. Both forensic
sketches and viewed sketches pose challenges to face recognition
due to the fact that probe sketch images contain different textures
compared to the gallery photographs they are being matched
against. However, forensic sketches pose additional challenges due
to the inability of a witness to exactly remember the appearance of
a suspect and her subjective account of the description, which
often results in inaccurate and incomplete forensic sketches.
We highlight two key difficulties in matching forensic sketches:
1) matching across image modalities and 2) performing face
recognition despite possibly inaccurate depictions of the face. In
order to solve the first problem, we use local feature-based
discriminant analysis (LFDA) to perform minimum distance match-
ing between sketches and photos, which is described in Section 3
and summarized in Fig. 2. The second problem is considered in
Section 5, where analysis and improvements are offered for
matching forensic sketches against large mug shot galleries.
The contributions of the paper are summarized as follows:
1. We observe a substantial improvement in matching
viewed sketches over published algorithms using the
proposed local feature-based discriminant analysis.
2. We present the first large-scale published experiment on
matching real forensic sketches.
3. Using a mug shot gallery of 10,159 images, we perform
race and gender filtering to improve the matching results.
4. All experiments are validated by comparing the proposed
method against a leading commercial face recognition
engine.
The last point is significant since earlier studies on viewed sketches
used a PCA (eigenface) matcher as the baseline. It is now well
known that the performance of a PCA matcher can be easily
surpassed by other face matchers.
2 RELATED WORK
Most research on sketch matching has dealt with viewed sketches.
Much of the early work in matching viewed sketches was performed
by Tang et al. [6], [2], [3], [5], [7]. These studies share a common
approach in that a synthetic photograph is generated from a sketch
(or vice versa) and standard face recognition algorithms are then
used to match the synthetic photographs to gallery photographs.
The different synthesis methods used include an eigentransforma-
tion method (Tang and Wang [6], [2]), local linear embedding (Liu
et al. [3]), and belief propagation on a Markov random field (Wang
and Tang [5]). Other synthesis methods have been proposed as well
[4], [8], [9], [10], [11]. The impact of matching sketches drawn by
different artists was studied by Al Nizami et al. [12].
Klare and Jain [1] proposed a method of sketch matching that
uses the same feature-based approach that has been successful in
other heterogeneous face recognition scenarios (specifically,
matching near-infrared face images to visible light). In using SIFT
feature descriptors [13], the intrapersonal variations between the
sketch and photo modality were diminished while still maintain-
ing sufficient information for interclass discrimination. Such an
approach is similar to other methods proposed in the literature
[14], [15], [16] of matching near-infrared images (NIR) to visible
light images ( VIS), where local binary pattern [17] feature
descriptors are used to describe both NIR and VIS images.
In this paper, we extend our previous feature-based approach to
sketch matching [1]. This is achieved by using local binary patterns
(LBP) in addition to the SIFT feature descriptor, which is motivated
by LBP’s success in a similar heterogeneous matching application by
Liao et al. [14]. Additionally, we extend our feature-based matching
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 33, NO. 3, MARCH 2011 639
. B.F. Klare and Z. Li are with the Department of Computer Science and
Engineering, Michigan State University, 3208 Engineering Building, East
Lansing, MI 48823. E-mail: {klarebre, zfli}@msu.edu.
. A.K. Jain is with the Department of Computer Science and Engineering,
Michigan State University, 3208 Engineering Building, East Lansing, MI
48823, and the Department of Brain and Cognitive Engineering, Korea
University, Seoul, Korea. E-mail: jain@cse.msu.edu.
Manuscript received 18 Dec. 2009; revised 29 Apr. 2010; accepted 10 Aug.
2010; published online 29 Sept. 2010.
Recommended for acceptance by S. Sclaroff.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2009-12-0836.
Digital Object Identifier no. 10.1109/TPAMI.2010.180.
0162-8828/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society

to learn discriminant projections on “slices” of feature patches,
which is similar to the method proposed by Lei and Li [15].
3 FEATURE-BASED SKETCH MATCHING
Image feature descriptors describe an image or image region using a
feature vector that captures the distinct characteristics of the image
[18]. Image-based features have been shown to be successful in face
recognition, most notably with the use of local binary patterns [19].
3.1 Feature-Based Representation
We will now describe how to represent a face with image
descriptors. Because most image descriptors are not sufficiently
verbose to fully describe a face image, the descriptors are computed
Fig. 2. An overview of (a) training and (b) recognition using the LFDA framework. Each sketch and photo is represented by SIFT and MLBP feature descriptors
extracted from overlapping patches. After grouping “slices” of patches together into feature vectors Φ(k) (k = 1 … N), we learn a discriminant projection Ψ_k
for each slice. Recognition is performed after combining each projected vector slice into a single vector and measuring the normed distance between a probe
sketch and a gallery photo.
Fig. 1. The difference between viewed sketches and forensic sketches. (a) Viewed sketches and their corresponding photographs, (b) two pairs of good quality forensic
sketches and the corresponding photographs, and (c) two pairs of poor quality forensic sketches and the corresponding photographs.

over a set of uniformly distributed subregions of the face. The
feature vectors at sampled regions are then concatenated together
to describe the entire face. The feature sampling points are chosen
by setting two parameters: a region (or patch) size s and a
displacement size δ. The region size s defines the size of the square
window over which the image feature is computed. The displace-
ment size δ states the number of pixels the patch is displaced for
each sample; thus, (s − δ) is the number of overlapping pixels in
two adjacent patches. This is analogous to sliding a window of size
s × s across the face image in a raster scan fashion. For an H × W
image, the number of horizontal (N) and vertical (M) sampling
locations is given by N = (W − s)/δ + 1 and M = (H − s)/δ + 1. At
each of the M × N patches, we compute the d-dimensional image
feature vector φ. These image feature vectors are concatenated into
one single (M · N · d)-dimensional image vector Φ. Whereas
f(I) : I → φ denotes the extraction of a single feature descriptor
from an image, sampling multiple features using overlapping
patches is denoted as F(I) : I → Φ. Minimum distance sketch
matching can be performed directly using this feature-based
representation of subjects i and j by computing the normed vector
distance ‖F(I_i) − F(I_j)‖ [1].
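The sampling arithmetic above can be sketched in a few lines; the sizes used below are hypothetical and chosen so that the divisions are exact.

```python
def patch_grid(height, width, s, delta):
    """Number of vertical (M) and horizontal (N) sampling locations
    for patch size s and displacement delta, following
    N = (W - s)/delta + 1 and M = (H - s)/delta + 1."""
    n = (width - s) // delta + 1
    m = (height - s) // delta + 1
    return m, n

def patch_origins(height, width, s, delta):
    """Top-left corners of the overlapping s x s patches in raster
    scan order; adjacent patches overlap by (s - delta) pixels."""
    m, n = patch_grid(height, width, s, delta)
    return [(r * delta, c * delta) for r in range(m) for c in range(n)]

# e.g. a 64 x 64 image with s = 16, delta = 8 yields a 7 x 7 grid
```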
In our sketch matching framework, two feature descriptors are
used: SIFT and LBP. The SIFT feature descriptor quantizes both the
spatial locations and gradient orientations within an s × s-sized
image patch, and computes a histogram in which each bin
corresponds to a combination of a particular spatial location and
orientation. For each image pixel, the histogram bin corresponding
to its quantized orientation and location is incremented by the
product of 1) the magnitude of the image gradient at that pixel and
2) the value of a Gaussian function centered on the patch with a
standard deviation of s=2. Trilinear interpolation is used on the
quantized location of the pixel, which addresses image translation
noise. The final vector of histogram values is normalized to sum to
one. The reader is referred to [13] for a more detailed description of
how the SIFT feature descriptor is designed. It is important to
reiterate that because we are sampling SIFT feature descriptors
from a fixed grid and we do not use SIFT keypoint detection, the
SIFT feature descriptor is computed at predetermined locations.
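A much-simplified sketch of the descriptor just described follows: a gradient-orientation histogram over a 4 × 4 spatial grid with 8 orientation bins (giving the 128 dimensions cited later), Gaussian-weighted with σ = s/2 and normalized to sum to one. The trilinear interpolation of the real SIFT descriptor [13] is omitted here for brevity, so this is an illustration of the idea, not the exact descriptor.

```python
import math

def dense_sift_descriptor(patch, grid=4, bins=8):
    """Simplified SIFT-like descriptor for one s x s patch (a list
    of lists of intensities). Each histogram bin pairs a spatial
    cell with a quantized gradient orientation; every pixel votes
    with gradient magnitude times a Gaussian weight (sigma = s/2).
    Trilinear interpolation is omitted."""
    s = len(patch)
    sigma = s / 2.0
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, s - 1):
        for x in range(1, s - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            theta = math.atan2(dy, dx) % (2 * math.pi)
            o = min(int(theta / (2 * math.pi) * bins), bins - 1)
            cy, cx = y * grid // s, x * grid // s
            d2 = (x - s / 2.0) ** 2 + (y - s / 2.0) ** 2
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            hist[(cy * grid + cx) * bins + o] += mag * w
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```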
For the local binary pattern feature descriptor [17], we extended
the LBP to describe the face at multiple scales by combining the
LBP descriptors computed with radii r ∈ {1, 3, 5, 7}. We refer to
this as the multiscale local binary pattern (MLBP). MLBP is similar
to other variants of the LBP, such as MB-LBP [14], but we obtained
slightly improved accuracy using MLBP.
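The multiscale idea can be sketched as follows: compute an 8-neighbor circular LBP code per pixel at each radius in {1, 3, 5, 7} and concatenate the per-radius histograms. The paper's 236-dimensional MLBP implies uniform-pattern histograms (59 bins per radius); the sketch below uses plain 256-bin histograms and nearest-pixel circle sampling for simplicity, so its dimensionality differs from the paper's.

```python
import math

def lbp_code(img, y, x, radius, neighbors=8):
    """8-bit LBP code at (y, x): each neighbor on the circle of the
    given radius contributes one bit from a >= comparison with the
    center pixel (nearest-pixel sampling, no interpolation)."""
    c = img[y][x]
    code = 0
    for k in range(neighbors):
        a = 2 * math.pi * k / neighbors
        ny = y + int(round(radius * math.sin(a)))
        nx = x + int(round(radius * math.cos(a)))
        code = (code << 1) | (1 if img[ny][nx] >= c else 0)
    return code

def mlbp_histogram(img, radii=(1, 3, 5, 7)):
    """Multiscale LBP: concatenate one 256-bin LBP histogram per
    radius, computed over all pixels far enough from the border."""
    h, w = len(img), len(img[0])
    feat = []
    for r in radii:
        hist = [0] * 256
        for y in range(r, h - r):
            for x in range(r, w - r):
                hist[lbp_code(img, y, x, r)] += 1
        feat.extend(hist)
    return feat
```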
The choice of the MLBP and SIFT feature descriptors was based
on reported success in heterogeneous face recognition and through
a quantitative evaluation of their ability to discriminate between
subjects in sketches and photos [20]. Though variants of LBPs have
led to substantial success in previous heterogeneous face recogni-
tion scenarios, the use of SIFT feature descriptors for this
application is quite novel. However, recent work [1] clearly
demonstrates the success of SIFT feature descriptors for viewed
sketch recognition. SIFT feature descriptors have also been shown
to perform comparatively with LBP feature descriptors in a
standard face recognition scenario [21]. These feature descriptors
are well-suited for sketch recognition because they describe the
distribution of the direction of edges in the face; this is the
information that both sketches and photos contain. By densely
sampling these descriptors, sufficient discriminatory information
is retained to more accurately determine a subject’s identity over
previously used synthesis methods [1].
The feature-based representation requires each sketch and
photo image to be normalized by rotating the angle between the
two eyes to 0 degree, scaling the images to a 75 interocular pixel
distance, and cropping the image size to 200 by 250 pixels. The
experimental results reported in Sections 4 and 6 for each of the
two descriptors are based on the sum of score fusion of the match
scores generated from computing descriptors with patch sizes of
s = 16 and s = 32. This also holds for the global discriminant
described in Section 3.2; we fuse the matching scores computed
using two separate patch sizes of 16 and 32. When combining the
SIFT and MLBP features, the sum of score fusion is used as well.
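The sum-of-score fusion used throughout (patch sizes 16 with 32, and SIFT with MLBP) can be sketched as below. The min–max normalization before summing is an assumption for illustration; the paper does not state how scores are scaled before fusion.

```python
def minmax_normalize(scores):
    """Scale a list of match scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(v - lo) / (hi - lo) for v in scores]

def sum_of_score_fusion(*score_lists):
    """Fuse per-gallery-subject score lists from different matchers
    (e.g. SIFT and MLBP, or the two patch sizes) by summing the
    normalized scores element-wise."""
    normalized = [minmax_normalize(s) for s in score_lists]
    return [sum(vals) for vals in zip(*normalized)]
```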
3.2 Local Feature-Based Discriminant Analysis
With both sketches and photos characterized using SIFT and
MLBP image descriptors, we further refine this feature space using
discriminant analysis. This is done to reduce the large dimension-
ality of the feature vector Φ. A straightforward approach would be
to apply classical subspace analysis (such as LDA) directly on Φ,
and to extract discriminant features for classification. However,
there are several problems with this approach. First, the feature
dimensionality is too high for direct subspace analysis. In our
experiments, each image is divided into either 154 overlapping
patches (for s = 32) or 720 overlapping patches (for s = 16), with
each patch producing a 128-dimensional SIFT descriptor or a 236-
dimensional MLBP descriptor. The second problem is the
possibility of overfitting due to the small sample size (SSS) [22].
In order to handle the combination of a large feature size and
small sample size, an ensemble of linear discriminant classifiers
called LFDA is proposed. Other discriminant analysis methods
have been proposed to handle the SSS problem, such as random
sampling LDA [23], regularized LDA [24], and direct LDA [25].
However, we chose the proposed LFDA method because it is
designed to work with a feature descriptor representation (as
opposed to an image pixel representation), and it resulted in high
recognition accuracy.
In the LFDA framework, each image feature vector Φ is first
divided into “slices” of smaller dimensionality, where slices
correspond to the concatenation of feature descriptor vectors from
each column of image patches. Next, discriminant analysis is
performed separately on each slice by performing the following
three steps: PCA, within class whitening, and between class
discriminant analysis. Finally, PCA is applied to the new feature
vector to remove redundant information among the feature slices
to extract the final feature vector.
To train the LFDA, we use a training set consisting of pairs of
a corresponding sketch and photo of n subjects (which are the
n training classes). This results in a total of 2n training images
with two supports for each subject i: the image feature
representation of the sketch Φ_i^s = F(I_i^s) and of the photo
Φ_i^p = F(I_i^p). We combine these feature vectors as column
vectors in training matrices and refer to them as
X^s = [Φ_1^s Φ_2^s … Φ_n^s] for the sketches,
X^p = [Φ_1^p Φ_2^p … Φ_n^p] for the photos, and
X = [Φ_1^s … Φ_n^s Φ_1^p … Φ_n^p] for the photos and sketches
combined.
The first step in LFDA is to separate the image feature vector Φ
into multiple subvectors or slices. Given the M × N array of
patches consisting of SIFT or MLBP descriptors, we create one slice
for each of the N patch columns. With a d-dimensional feature
descriptor, each of the N slices is of dimensionality (M · d). We call
this a “slice” because it is similar to slicing an image into N pieces.
After separating the feature vectors into slices, the training
matrices now become X_k^s ∈ R^(Md×n), X_k^p ∈ R^(Md×n), and
X_k ∈ R^(Md×2n) (k = 1 … N), which are all mean-centered.
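The slicing step can be sketched with NumPy; the array shapes below are hypothetical, since the actual M, N and d depend on the patch parameters.

```python
import numpy as np

def make_slices(patch_features):
    """Split an (M, N, d) array of per-patch descriptors into the
    N column slices described above, each of dimension M * d."""
    m, n, d = patch_features.shape
    # slice k stacks the d-dim descriptors of column k's M patches
    return [patch_features[:, k, :].reshape(m * d) for k in range(n)]
```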
We next reduce the dimensionality of each training slice
matrix X_k using the PCA matrix W_k ∈ R^(Md×r) with r eigenvectors.
The purpose is to remove the noisy features, which are usually
associated with the trailing eigenvectors with the smallest
eigenvalues. In our experiments, we use the 100 eigenvectors with
the largest eigenvalues (which preserves about 90 percent of the
variance). The discriminant extraction proceeds by generating the
mean projected class vectors

Y_k = W_k^T (X_k^s + X_k^p) / 2,   (1)

which are used to center the sketch and photo training instances of
each class by

X̃_k^s = W_k^T X_k^s − Y_k,   X̃_k^p = W_k^T X_k^p − Y_k.   (2)
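Equations (1) and (2) can be sketched with NumPy; shapes are illustrative, with each column of the matrices holding one subject's slice vector.

```python
import numpy as np

def center_classes(Wk, Xs_k, Xp_k):
    """Project both modalities with the per-slice PCA basis Wk and
    subtract the per-class mean Y_k = Wk^T (Xs_k + Xp_k) / 2,
    following equations (1) and (2)."""
    Yk = Wk.T @ (Xs_k + Xp_k) / 2.0
    Xs_c = Wk.T @ Xs_k - Yk   # centered sketch instances
    Xp_c = Wk.T @ Xp_k - Yk   # centered photo instances
    return Yk, Xs_c, Xp_c
```

A quick sanity check of the algebra: the centered sketch and photo instances of each class sum to zero by construction.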
To reduce the intrapersonal variation between the sketch and the
photo, a whitening transform is performed. Whitening the within-
class scatter matrix reduces the large feature dimensions that
represent the principal intrapersonal variations, which in this case
correspond to intrapersonal differences between sketches and
photos. To do so, we recombine the training instances into
X̃_k = [X̃_k^s X̃_k^p]. PCA is performed on X̃_k such that the
computed PCA projection matrix Ṽ_k ∈ R^(100×100) retains all data
variance from X̃_k. Let Λ_k ∈ R^(100×100) be a diagonal matrix whose
entries are the eigenvalues of the corresponding PCA eigenvectors
Ṽ_k. The whitening transform matrix is V_k = (Λ_k^(−1/2) Ṽ_k^T)^T.
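The whitening step can be sketched with NumPy: eigendecompose the scatter of the recombined centered matrix, then scale each eigenvector by the inverse square root of its eigenvalue. The covariance estimate, the use of `eigh`, and the small-eigenvalue guard are implementation assumptions, not details from the paper.

```python
import numpy as np

def whitening_matrix(Xs_c, Xp_c, eps=1e-10):
    """Whitening transform V_k = (Lambda_k^(-1/2) Vtilde_k^T)^T
    computed from the centered, projected sketch and photo
    instances. eps guards against near-zero eigenvalues (an
    implementation detail, not from the paper)."""
    Xk = np.hstack([Xs_c, Xp_c])              # recombine [Xs_c  Xp_c]
    cov = Xk @ Xk.T / Xk.shape[1]             # within-class scatter
    eigvals, eigvecs = np.linalg.eigh(cov)    # columns = Vtilde_k
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    return (np.diag(scale) @ eigvecs.T).T     # V_k
```

By construction, V_k^T (scatter) V_k is the identity when the scatter matrix is full rank, which is the defining property of a whitening transform.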
The final step is to compute a projection matrix that maximizes
the interpersonal scatter by performing PCA on V_k^T Y_k (which is
the whitening transform of the mean class vectors). Using all but
one of the eigenvectors in the PCA projection matrix, the resultant
projection matrix is denoted as U_k ∈ R^(100×99). This results in the
final projection matrix for slice k:

Ψ_k = W_k V_k U_k.   (3)
With each local feature-based discriminant trained, we match
sketches to photos using nearest neighbor matching on the
concatenated slice vectors. We first separate the feature
representation Φ of an image into individual slices

Φ = [Φ(1)^T Φ(2)^T … Φ(N)^T]^T,   (4)

where Φ(i) ∈ R^(Md) is the ith slice feature vector. We then project
each slice using the LFDA projection matrices Ψ_k, yielding the new
vector representation Γ ∈ R^(99·N):

Γ = [(Ψ_1^T Φ(1))^T (Ψ_2^T Φ(2))^T … (Ψ_N^T Φ(N))^T]^T.   (5)

With the LFDA representation of the sketch Γ^s and photo Γ^p, the
normed distance ‖Γ^s − Γ^p‖ is used to select the gallery photo with
the minimum distance to the probe sketch.
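The per-slice projection and minimum-distance rule can be sketched as below, with dimensions shrunk from the paper's 100/99 for illustration; the inputs are assumed to be already extracted slice vectors and trained projections.

```python
import numpy as np

def project_slices(psis, slices):
    """Apply the per-slice LFDA projections Psi_k and concatenate
    the projected slices into one vector, per equation (5)."""
    return np.concatenate([psi.T @ s for psi, s in zip(psis, slices)])

def match(probe, gallery):
    """Return the index of the gallery vector with minimum normed
    distance to the probe. Inputs are already LFDA-projected."""
    dists = [np.linalg.norm(probe - g) for g in gallery]
    return int(np.argmin(dists))
```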
The proposed LFDA algorithm is a simple yet effective method.
From the results in Section 4, we can clearly see that LFDA is able
to significantly improve the recognition performance over the basic
feature-based sketch matching framework. Similar to other
variants of LDA that are designed to handle the small sample
size problem [25], [24], [23], LFDA has several advantages over
traditional linear discriminant analysis. First, LFDA is more
effective in handling large feature vectors. The idea of segregating
the feature vectors into slices allows us to work on more
manageable sized data with respect to the number of training
images. Second, because the subspace dimension is fixed by the
number of training subjects, when dealing with the smaller sized
slices, the LFDA algorithm is able to extract a larger number of
meaningful features. This is because the dimensionality of each
slice subspace is bounded by the same number of subjects as a
subspace on the entire feature representation would be.
4 VIEWED SKETCH MATCHING RESULTS
In order to compare our proposed LFDA framework to published
methods on sketch matching, we evaluated our method using
viewed sketches from the CUHK data set [5]. This data set
consists of 606 corresponding sketch/photo pairs that were drawn
from three face data sets: 1) 123 pairs from the AR face database
[27], 2) 295 pairs from the XM2VTS database [28], and 3) 188 pairs
from the CUHK student database [6]. Each of these sketch images
was drawn by an artist while looking at the corresponding
photograph of the subject. Two examples of these viewed sketches
are shown in Fig. 1a. For the methods presented in this paper, all
results shown are the recognition rates averaged over five separate
random splits of 306 training subjects and 300 test subjects.
The results of the viewed sketch matching experiment are
summarized in Table 1. The first column of the table lists the
baseline methods, which include the top two performing methods
in the literature [1], [5] (each used 306 training subjects and 300 test
subjects) and Cognitec’s FaceVACS commercial face recognition
engine [26]. FaceVACS has been shown [1] to perform at the same
level as earlier solutions specifically trained for viewed sketch
recognition [2]. In the second column, the matching accuracies
achieved by directly comparing SIFT and MLBP feature vectors
are listed. The method “SIFT + MLBP” indicates a sum of score
fusion [29] of the match scores from SIFT matching and MLBP
matching. While both the SIFT and MLBP methods offer similar
levels of performance, using LFDA (third column) the accuracy
increases to the point where, on average, fewer than two sketches
out of the 300 in the probe set are incorrectly identified.
While LFDA cut the error roughly in half, the use of standard LDA
actually increased it. In the same experiment shown in
Table 1, we applied LDA on the entire feature vector instead of
breaking it into slices and performing LDA on each slice vector as
is done in LFDA. The accuracy of LDA+SIFT was 95.47 percent,
LDA+MLBP was 91.53 percent, and (SIFT+MLBP)+LDA was
97.07 percent. In each case, LDA actually lowered the accuracy
from the non-LFDA case. The decrease in accuracy observed when
applying the standard LDA is due to the small sample size
problem and the curse of dimensionality [22]. Given our large
feature representation (for a 32-pixel patch size, the SIFT
representation contains 19,712 components and the MLBP
representation contains 36,344 components), the subspace projections
are too tightly fit to the training data. Because the LFDA method is
an ensemble method, it is better suited to avoid this overfitting
problem. Other LDA variants have been shown to handle the small
sample size problem as well, such as random sampling LDA [23]
and regularized LDA [24].
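The "SIFT + MLBP" sum-of-score fusion can be illustrated as below. The min-max normalization step is a common choice in the score-fusion literature cited as [29]; it is an assumption of this sketch rather than a detail given in the text:

```python
import numpy as np

def minmax(scores):
    """Scale a score vector to [0, 1]; a common normalization before fusion."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)

def sum_fusion(scores_a, scores_b):
    """Sum-of-score fusion of two matchers' similarity scores for one probe."""
    return minmax(scores_a) + minmax(scores_b)

# Toy gallery of three photos: neither matcher ranks entry 2 first on its
# own, but entry 2 scores well under both and wins after fusion.
sift = np.array([0.6, 0.1, 0.5])
mlbp = np.array([0.2, 0.9, 0.8])
print(int(np.argmax(sum_fusion(sift, mlbp))))  # → 2
```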
642 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 33, NO. 3, MARCH 2011
TABLE 1
Rank-1 Recognition Rates for Matching Viewed Sketches Using the CUHK Public Data Set
The standard deviation across the five random splits for each method in the middle and right columns is less than 1 percent.
1. The CUHK Face Sketch Database is available for download at: http://mmlab.ie.cuhk.edu.hk/facesketch.html.
5 MATCHING FORENSIC SKETCHES
The available methods for matching forensic sketches to photos are
limited. Uhl and Lobo [30] proposed a now antiquated method of
matching sketches drawn by forensic artists using photometric
standardization and facial features. Yuen and Man [31] matched
lab-generated forensic composites to photographs using point
distribution models.
5.1 Forensic Sketch Database
In our study, we used a data set consisting of 159 forensic sketches,
each with a corresponding photograph of the subject who was later
identified by the law enforcement agency. All of these sketches
were drawn by forensic sketch artists working with witnesses who
provided verbal descriptions after crimes were committed by an
unknown culprit. The corresponding photographs (mug shots) are
the result of the subject later being identified. The forensic sketch
data set used here comes from four different sources:
1. 73 images from the forensic sketch artist Lois Gibson [32],
2. 43 images from the forensic sketch artist Karen Taylor [33],
3. 39 forensic sketches provided by the Michigan State Police
Department, and
4. 4 forensic sketches provided by the Pinellas County
Sheriff’s Office.
In addition to these 159 corresponding forensic sketch and photo
pairs, we also made use of a data set of 10,000 mug shot images
provided by the Michigan State Police to populate the gallery. Thus,
the matching experiments closely replicate real-world scenarios
where a law enforcement agency would query a large gallery of
mug shot images with a forensic sketch. Examples of the forensic
sketches used in our experiments are shown in Figs. 1, 6, and 7.
Certain sketch images in our collection of forensic sketches are
of poor quality in that they do not capture all of the facial features
of the suspect. For most of these sketches, it is unlikely that they can
be successfully matched automatically to the corresponding photos
because they barely resemble the subject. For this reason, we
separated our forensic sketches into two categories: good quality
and poor quality. This separation was performed subjectively by
looking at the corresponding pairs (sketch and photo) and labeling
them as good if the sketch possessed a reasonable resemblance of
the subject in the photo and labeling them as poor if the sketch was
grossly inaccurate. We believe that this separation gives a more realistic
portrayal of how accurately forensic sketches can be matched.
Fig. 1 shows the difference between good quality and poor quality
sketches.
5.2 Large-Scale Forensic Sketch Matching
Matching forensic sketches to large mug shot galleries is different
in several respects from traditional face identification scenarios.
When presenting face recognition results in normal recognition
scenarios, we are generally concerned with exactly identifying the
subject in question in a fully automated manner. For example,
when preventing multiple passports from being issued to the
same person, human interaction should be limited to only
ambiguous cases. This is due to the large volume of requests
such a system must process. The same is true for matching
arrested criminals against existing mug shot databases to confirm
their identity. However, when matching forensic sketches, it is not
critical for the top retrieval result to be the correct subject as long
as it is in the top R retrieved results, say R = 50. This is because
the culprit being depicted in a forensic sketch typically has
committed a heinous crime (e.g., murder, rape, and armed
robbery) that will receive a large amount of attention from
investigators. Instead of accepting or dismissing only the top
retrieved photo, law enforcement officers will consider the top R
retrieval results as potential suspects. Generally, many of the
returned subjects can be immediately eliminated as suspects for
various reasons, such as if they are currently incarcerated or
deceased. The remaining candidates can each then be investigated
for their culpability of committing the crime. This scenario is also
true of crimes in which a photograph of a suspect is available.
Investigators will consider the top R retrieval results instead of
only the highest match. Based on the practice followed in
forensics, we would like R to be around 50; that is, we are
mainly concerned with whether or not the true subject is within
the top 50 retrieved images.
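The top-R retrieval criterion can be made concrete with a small helper. The similarity scores and descending sort order below are illustrative assumptions:

```python
import numpy as np

def rank_of_true_match(probe_scores, true_idx):
    """1-based rank of the true subject when the gallery is sorted by similarity."""
    order = np.argsort(-probe_scores)  # descending similarity
    return int(np.where(order == true_idx)[0][0]) + 1

def hit_at_r(probe_scores, true_idx, r=50):
    """True if the correct subject is among the top-R retrieved photos."""
    return rank_of_true_match(probe_scores, true_idx) <= r

# Toy example: the true subject is gallery entry 3, retrieved at rank 2,
# so it counts as a hit for any R >= 2.
scores = np.array([0.2, 0.9, 0.4, 0.8])
print(rank_of_true_match(scores, 3))  # → 2
```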
In order to improve the accuracy of matching forensic sketches,
we utilize ancillary or demographic information provided by the
witness, to be used as a soft biometric [34]. For example, suppose the
witness reports that the race of the culprit is Caucasian; then we can
eliminate all non-Caucasian members of the gallery, which not only
speeds up matching but also improves matching performance. The
same is true for gender: If the suspect is reported
to be a female, then we disregard any male subject in the gallery. To
use this approach, we manually labeled all of the 10,159 mug shot
images and all of the forensic sketch/photo pairs in our database
with race and gender. For gender, we considered one of three
possible categories: male, female, and (in rare cases) unknown. For
race, we considered one of three categories: Caucasian, African-
American, and “other.” The “other” category includes individuals who
are Hispanic, Asian, or of multiple races. Table 2 lists the percentage of
members from each race and gender category in the forensic
sketches and the mug shot gallery used in our experiments.
We lack additional ancillary information (e.g., age, height, scars,
marks, and tattoos) that could potentially be used to further
improve the matching accuracy.
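Filtering the gallery on such soft biometrics can be sketched as below. The field names and record layout are hypothetical, and retaining "other"/"unknown" labels under a filter is a conservative choice of this sketch (the paper simply removes all non-matching gallery members):

```python
def filter_gallery(gallery, race=None, gender=None):
    """Drop gallery records inconsistent with the witness description.

    Records labeled "other" (race) or "unknown" (gender) are always kept,
    so the soft biometric narrows the gallery without excluding the true
    subject whose label is ambiguous. Field names are illustrative.
    """
    keep = []
    for rec in gallery:
        if race is not None and rec["race"] not in (race, "other"):
            continue
        if gender is not None and rec["gender"] not in (gender, "unknown"):
            continue
        keep.append(rec)
    return keep

# Toy gallery of three mug shot records.
gallery = [
    {"id": 1, "race": "caucasian", "gender": "male"},
    {"id": 2, "race": "african-american", "gender": "male"},
    {"id": 3, "race": "other", "gender": "female"},
]
print([r["id"] for r in filter_gallery(gallery, race="caucasian", gender="male")])  # → [1]
```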
6 FORENSIC SKETCH MATCHING RESULTS
We now present forensic sketch recognition performance using the
159 forensic sketch images (probe set) and 10,159 mug shot images
(gallery). In these matching experiments, we use the local
feature-based discriminant analysis framework presented in
Section 3, with sum-of-score fusion of the MLBP and SIFT LFDA
match scores, as this combination was the highest performing
method for matching viewed sketches (Table 1).
The performance of matching sketches classified as good and
poor can be found in Fig. 3. There is a substantial difference in the
matching performance of good sketches and poor sketches. Despite
the fact that poor sketches are extremely difficult to match, the
CMC plots in Fig. 3 show that the proposed method performs
roughly as well on the poor sketches as a state-of-the-art
commercial matcher (FaceVACS) does on the good sketches.
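A CMC (Cumulative Match Characteristic) curve of the kind plotted in Fig. 3 records, for each probe, the rank at which its true mate is retrieved. A minimal sketch, assuming a similarity-score matrix over probes and gallery:

```python
import numpy as np

def cmc_curve(score_matrix, true_ids, max_rank=50):
    """Cumulative Match Characteristic curve.

    score_matrix: (n_probes, n_gallery) similarity scores, larger = better.
    true_ids: for each probe, the gallery index of the correct subject.
    Returns the fraction of probes whose true mate appears within each
    rank r = 1 .. max_rank.
    """
    ranks = []
    for scores, t in zip(score_matrix, true_ids):
        order = np.argsort(-scores)  # gallery sorted best-first
        ranks.append(int(np.where(order == t)[0][0]) + 1)
    ranks = np.asarray(ranks)
    return np.array([(ranks <= r).mean() for r in range(1, max_rank + 1)])
```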
Fig. 4 and Table 3 show the recognition performance when race
and gender information is used to filter the gallery. By utilizing
this ancillary information, we can significantly increase the
performance of forensic sketch recognition. We noticed a larger
performance gain from using race information than from gender
information. This is likely due to the more uniform distribution of
races in the gallery: most mug shot subjects are male, so filtering
on gender eliminates far fewer candidates than filtering on race.
TABLE 2
Demographics of the 159 Forensic Sketch Images and the 10,159 Mug Shot Gallery Images
Citations
Journal ArticleDOI
TL;DR: This work proposes a Multi-view Discriminant Analysis (MvDA) approach, which seeks for a single discriminant common space for multiple views in a non-pairwise manner by jointly learning multiple view-specific linear transforms.
Abstract: In many computer vision systems, the same object can be observed at varying viewpoints or even by different sensors, which brings in the challenging demand for recognizing objects from distinct even heterogeneous views. In this work we propose a Multi-view Discriminant Analysis (MvDA) approach, which seeks for a single discriminant common space for multiple views in a non-pairwise manner by jointly learning multiple view-specific linear transforms. Specifically, our MvDA is formulated to jointly solve the multiple linear transforms by optimizing a generalized Rayleigh quotient, i.e., maximizing the between-class variations and minimizing the within-class variations from both intra-view and inter-view in the common space. By reformulating this problem as a ratio trace problem, the multiple linear transforms are achieved analytically and simultaneously through generalized eigenvalue decomposition. Furthermore, inspired by the observation that different views share similar data structures, a constraint is introduced to enforce the view-consistency of the multiple linear transforms. The proposed method is evaluated on three tasks: face recognition across pose, photo versus. sketch face recognition, and visual light image versus near infrared image face recognition on Multi-PIE, CUFSF and HFB databases respectively. Extensive experiments show that our MvDA achieves significant improvements compared with the best known results.

610 citations


Cites methods from "Matching Forensic Sketches to Mug S..."

  • ...In [23], a local feature-based discriminant analysis method was proposed to match a forensic sketch and a mug shot photo, and also other effective features can be used such as [24]....


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper uses Partial Least Squares to linearly map images in different modalities to a common linear subspace in which they are highly correlated, and forms a generic intermediate subspace comparison framework for multi-modal recognition.
Abstract: This paper presents a novel way to perform multi-modal face recognition. We use Partial Least Squares (PLS) to linearly map images in different modalities to a common linear subspace in which they are highly correlated. PLS has been previously used effectively for feature selection in face recognition. We show both theoretically and experimentally that PLS can be used effectively across modalities. We also formulate a generic intermediate subspace comparison framework for multi-modal recognition. Surprisingly, we achieve high performance using only pixel intensities as features. We experimentally demonstrate the highest published recognition rates on the pose variations in the PIE data set, and also show that PLS can be used to compare sketches to photos, and to compare images taken at different resolutions.

382 citations

Journal ArticleDOI
TL;DR: A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images, and Random sampling is introduced into the H FR framework to better handle challenges arising from the small sample size problem.
Abstract: Heterogeneous face recognition (HFR) involves matching two face images from alternate imaging modalities, such as an infrared image to a photograph or a sketch to a photograph. Accurate HFR systems are of great value in various applications (e.g., forensics and surveillance), where the gallery databases are populated with photographs (e.g., mug shot or passport photographs) but the probe images are often limited to some alternate modality. A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images. The prototype subjects (i.e., the training set) have an image in each modality (probe and gallery), and the similarity of an image is measured against the prototype images from the corresponding modality. The accuracy of this nonlinear prototype representation is improved by projecting the features into a linear discriminant subspace. Random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem. The merits of the proposed approach, called prototype random subspace (P-RS), are demonstrated on four different heterogeneous scenarios: 1) near infrared (NIR) to photograph, 2) thermal to photograph, 3) viewed sketch to photograph, and 4) forensic sketch to photograph.

358 citations


Additional excerpts

  • ...In our implementation, s = 16....

  • ...In this paper, 1 = 2 and 2 = 4....

Journal ArticleDOI
TL;DR: This paper proposes a method to learn a discriminant face descriptor (DFD) in a data-driven way and applies it to the heterogeneous (cross-modality) face recognition problem and learns DFD in a coupled way to reduce the gap between features of heterogeneous face images to improve the performance of this challenging problem.
Abstract: Local feature descriptor is an important module for face recognition and those like Gabor and local binary patterns (LBP) have proven effective face descriptors. Traditionally, the form of such local descriptors is predefined in a handcrafted way. In this paper, we propose a method to learn a discriminant face descriptor (DFD) in a data-driven way. The idea is to learn the most discriminant local features that minimize the difference of the features between images of the same person and maximize that between images from different people. In particular, we propose to enhance the discriminative ability of face representation in three aspects. First, the discriminant image filters are learned. Second, the optimal neighborhood sampling strategy is soft determined. Third, the dominant patterns are statistically constructed. Discriminative learning is incorporated to extract effective and robust features. We further apply the proposed method to the heterogeneous (cross-modality) face recognition problem and learn DFD in a coupled way (coupled DFD or C-DFD) to reduce the gap between features of heterogeneous face images to improve the performance of this challenging problem. Extensive experiments on FERET, CAS-PEAL-R1, LFW, and HFB face databases validate the effectiveness of the proposed DFD learning on both homogeneous and heterogeneous face recognition problems. The DFD improves POEM and LQP by about 4.5 percent on LFW database and the C-DFD enhances the heterogeneous face recognition performance of LBP by over 25 percent.

342 citations

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches by reducing the modality gap at the feature extraction stage.
Abstract: Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to the randomized forest to further boost the performance. We create the largest face sketch database including sketches of 1, 194 people from the FERET database. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.

338 citations


Cites background or methods from "Matching Forensic Sketches to Mug S..."

  • ...[14] proposed local feature-based discriminant analysis (LFDA)....

  • ...The second family of approaches [17, 15, 14] focused on the classification stage and tried to design advanced classifiers to reduce the modality gap between features extracted from photos and sketches....

  • ...This application leads to a number of studies on this topic [26, 27, 28, 31, 9, 14, 6]....

  • ...We test our method with the same configuration of training and testing splits as [31, 14]....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations


"Matching Forensic Sketches to Mug S..." refers methods in this paper

  • ...Such an approach is similar to other methods proposed in the literature [14], [15], [16] of matching near-infrared images (NIR) to visible light images (VIS), where local binary pattern [17] feature descriptors are used to describe both NIR and VIS images....


Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features that is assessed in the face recognition problem under different challenges.
Abstract: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed

5,563 citations


"Matching Forensic Sketches to Mug S..." refers background in this paper

  • ...Image-based features have been shown to be successful in face recognition, most notably with the use of local binary patterns [19]....


Journal Article

2,952 citations


"Matching Forensic Sketches to Mug S..." refers background in this paper

  • ...[27], 2) 295 pairs from the XM2VTS database [28], and 3) 188 pairs...


Frequently Asked Questions (8)
Q1. What contributions have the authors mentioned in the paper "Matching forensic sketches to mug shot photos" ?

The problem of matching a forensic sketch to a gallery of mug shot images is addressed in this paper. Forensic sketches differ from viewed sketches in that they are drawn by a police sketch artist using the description of the subject provided by an eyewitness. To identify forensic sketches, the authors present a framework called local feature-based discriminant analysis ( LFDA ). In LFDA, the authors individually represent both sketches and photos using SIFT feature descriptors and multiscale local binary patterns ( MLBP ). The authors were able to further improve the matching performance using race and gender information to reduce the target gallery size. 

In order to handle the combination of a large feature size and small sample size, an ensemble of linear discriminant classifiers called LFDA is proposed. 

Other discriminant analysis methods have been proposed to handle the SSS problem, such as random sampling LDA [23], regularized LDA [24], and direct LDA [25]. 

Because most image descriptors are not sufficiently verbose to fully describe a face image, the descriptors are computed over a set of uniformly distributed subregions of the face. 

The authors believe that with a larger number of forensic sketches, they could more properly train their discriminant and further improve the matching performance. 

This is because the culprit being depicted in a forensic sketch typically has committed a heinous crime (e.g., murder, rape, and armed robbery) that will receive a large amount of attention from investigators. 

A straightforward approach would be to apply classical subspace analysis (such as LDA) directly on the concatenated feature vector and to extract discriminant features for classification. 

The authors highlight two key difficulties in matching forensic sketches: 1) matching across image modalities and 2) performing face recognition despite possibly inaccurate depictions of the face.