
Matching Forensic Sketches to Mug Shot Photos

01 Mar 2011-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 33, Iss: 3, pp 639-646
TL;DR: Compared to a leading commercial face recognition system, LFDA offers substantial improvements in matching forensic sketches to the corresponding face images and leads to state-of-the-art accuracies when matching viewed sketches.
Abstract: The problem of matching a forensic sketch to a gallery of mug shot images is addressed in this paper. Previous research in sketch matching only offered solutions to matching highly accurate sketches that were drawn while looking at the subject (viewed sketches). Forensic sketches differ from viewed sketches in that they are drawn by a police sketch artist using the description of the subject provided by an eyewitness. To identify forensic sketches, we present a framework called local feature-based discriminant analysis (LFDA). In LFDA, we individually represent both sketches and photos using SIFT feature descriptors and multiscale local binary patterns (MLBP). Multiple discriminant projections are then used on partitioned vectors of the feature-based representation for minimum distance matching. We apply this method to match a data set of 159 forensic sketches against a mug shot gallery containing 10,159 images. Compared to a leading commercial face recognition system, LFDA offers substantial improvements in matching forensic sketches to the corresponding face images. We were able to further improve the matching performance using race and gender information to reduce the target gallery size. Additional experiments demonstrate that the proposed framework leads to state-of-the-art accuracies when matching viewed sketches.

Summary (3 min read)

Introduction

  • Coronary artery disease (CAD) causes about 45% of all deaths in Europe.
  • Therefore, the main aim of exercise-based cardiac rehabilitation is to improve peak VO2.
  • The most common training programmes currently used in cardiac rehabilitation are continuous training (CT) and interval training (IT) or a mixture.
  • A decade ago, it was shown that IT, consisting of several bouts of exercise at a higher intensity (80–90% of the peak VO2) alternated by active recovery periods, may provide a safe alternative to CT in CAD patients.

Participants

  • The authors' study included 20 male CAD patients (mean age 62.4 ± 6.1 years), referred to the Cardiac Rehabilitation Unit of the University Hospitals of Leuven between July 2014 and November 2015.
  • The study was approved by the local ethics committee (Commissie Medische Ethiek KU Leuven).
  • After inclusion, two patients dropped out before any tests were performed: one patient because of an old knee injury and one due to severe arrhythmias at high training intensities.
  • There were no significant differences between the baseline characteristics of the patients with or without the dropouts (data not shown).

Measurements

  • Height (cm) and weight (kg) were measured, and body mass index (BMI) (kg/m2) was calculated by dividing weight (kg) by height squared (m2).
  • Before starting the cardiac rehabilitation, subjects performed a maximal graded exercise test on a cycle ergometer (Oxycon Pro, Jaeger, CareFusion, Germany) supervised by a trained exercise physiologist.
  • The peak HR was defined as the highest HR reached at the end of the test.
  • The peak VO2 was determined as the VO2 during the last full 30 s bout of the test.
  • A peak respiratory exchange ratio of at least 1.10 defined a maximal effort [28]. Individual peak VO2 results were compared with the predicted reference values of Wasserman et al. to determine the percentage of predicted peak VO2.

IT and CT familiarisation sessions

  • After inclusion, subjects were randomly assigned on a 1:1 base to either training group 1 (starting with IT) or group 2 (starting with CT).
  • They performed a total of six supervised IT (n = 3) and CT (n = 3) sessions on a cycle ergometer (Ergo-fit, Gymna, Brussels, Belgium), three times per week for two weeks.
  • The IT and CT sessions were alternated and aimed to familiarise the subjects with both training programmes.
  • A Garmin chest strap and wristwatch continuously monitored HR (Garmin, Garmin International, Kansas, USA), and training loads were adapted throughout the training to ensure that participants would remain within the prescribed HR zones.

IT and CT test sessions

  • The two-week run-in period was followed by four test sessions within two weeks: the IT and CT sessions according to the protocol of the SAINTEX-CAD study [25], which was based on Wisloff et al.
  • An online randomisation procedure was performed to determine the sequence of the four test sessions.
  • During the test sessions, HR was continuously monitored using a Polar chest strap (Polar, Polar Electro, Kempele, Finland).
  • As beta-blocking medication might influence HR, the authors also asked their patients to take their medication at the same time each morning of the test.
  • Breath-by-breath gas exchange measurements (VE, VO2, VCO2), averaged every 30 s, were provided by the Oxycon mobile device (Oxycon mobile, Jaeger, CareFusion, Germany) and were used to calculate the EE (indirect calorimetry) according to the American College of Sports Medicine (ACSM) guidelines: one litre of O2 uptake equals 5 kcal.
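The indirect-calorimetry conversion just described (1 litre of O2 uptake ≈ 5 kcal, with readings averaged over 30 s windows) amounts to a simple summation; the function below is an illustrative sketch, not the authors' analysis code, and its name and sampling assumptions are hypothetical.

```python
def energy_expenditure_kcal(vo2_samples_l_per_min, interval_s=30.0):
    """Estimate energy expenditure (kcal) by indirect calorimetry.

    vo2_samples_l_per_min: VO2 readings in litres O2 per minute,
    one per averaging window (30 s here, matching the breath-by-
    breath averaging described above).
    Applies the ACSM approximation: 1 litre of O2 uptake ~ 5 kcal.
    """
    litres_o2 = sum(v * (interval_s / 60.0) for v in vo2_samples_l_per_min)
    return 5.0 * litres_o2

# e.g. a steady 2.0 L/min over ten 30-s windows (5 min total)
# consumes 10 L of O2, i.e. about 50 kcal
```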

Statistical analysis

  • All values are expressed as mean ± standard deviation (SD), median and range, or as number and percentage.
  • All data were normally distributed according to the Shapiro-Wilk test for normality.
  • To examine the differences in EE, exercise intensity or lactate between the ITw, CTw, ITc and CTc sessions, a repeated measures analysis of variance was performed.
  • A p-value < 0.05 was considered statistically significant.

Results

  • A total of 18 patients completed the six training sessions and the four tests; baseline characteristics are presented in Table 1.
  • Patients did not change medication during the study period.
  • In Figure 4 and Table 2, the mean EE per training session is presented.
  • The patients performed the tests within the prescribed HR zones (taking into account the end-interval values for the IT sessions) as shown in Table 2 and in Figure 5.

Discussion

  • This is the first study to objectively measure EE during IT and CT.
  • Therefore in previous studies, training sessions were designed to be isocaloric based on a theoretical calculation of their respective EE.
  • The present study examines whether the IT and CT programmes described in the protocol of the SAINTEX-CAD study [25] (based on a study of Wisloff et al.: ITw and CTw) [22], and the actually achieved intensities in the SAINTEX-CAD study (ITc and CTc) [10], were truly isocaloric by objectively measuring the EE.
  • The authors' results showed: (a) a similar EE for the protocols of Wisloff et al. (ITw vs CTw); (b) a significantly higher EE for the CT of the SAINTEX-CAD study compared to the IT (ITc vs CTc); and (c) a significant increase in lactate after ITw, ITc and CTc, but not after CTw.

Medication

  • Nevertheless, most of the patients had difficulties maintaining this high HR zone and needed constant encouragement from the supervising exercise physiologist.
  • According to Skinner’s three-phase model, training modalities below the first ventilatory threshold (60–70% peak HR) do not exceed a 2 mmol/l lactate level.
  • When lactate levels exceed 4 mmol/l, passing the second ventilatory threshold, the exercise is considered anaerobic (>90% peak HR).

Training intensity

  • A significant difference was found between the lactate levels of the CT and IT sessions, with IT showing lactate values > 4 mmol/l.
  • This implies that the high intensity intervals may have been performed anaerobically for most of the individuals (ITw 14/18; ITc 13/18), but not in all, since this secondary threshold is highly individual.
  • The frequently used term ‘aerobic’ interval training may thus be inappropriate and misleading, but further research is needed.
  • The authors can, however, conclude that the participants remained in the aerobic zone for both CT sessions (CTw 2.45 mmol/l and CTc 3.41 mmol/l).
  • From the lactate levels of the CTw session, the authors can conclude that an intensity set at 70–75% of peak HR is simply insufficient as a training stimulus.

Limitations

  • The results of this study should be interpreted within the context of its limitations.
  • The first limitation of their study is the relatively small sample size.
  • The authors expect, however, these variations to be random across all subjects, which should, therefore, not significantly influence their final result.
  • As the authors previously reported that substantial changes in peak VO2 and peak HR already occur after six weeks of training [10], this run-in period could have influenced the high VO2 levels achieved at the prescribed HR.
  • Fifth, the authors did not measure lactate levels during the maximal exercise test, hence they were unable to compare these individual maximal lactate levels to the lactate levels found during the tests.

Conclusion

  • The authors found that CT according to the SAINTEX-CAD study expended significantly more energy than IT.
  • As previous studies showed IT and CT to either yield similar improvements in peak VO2, or larger improvements after IT compared to CT, the authors conclude that IT is more efficient in improving peak VO2, producing a larger gain during a shorter training duration and at a lower energy cost.
  • Since EE is not a goal on its own in cardiac rehabilitation, the main question remains which training modality is the most appropriate to increase peak VO2.
  • The authors suggest that IT and CT are equally effective, provided that CT sessions are performed at sufficiently high intensities.
  • The authors stress the importance of objectively measuring actual caloric expenditure by indirect calorimetry in pilot studies, rather than using a general formula for setting up isocaloric exercise training programmes.




Matching Forensic Sketches
to Mug Shot Photos
Brendan F. Klare, Student Member, IEEE,
Zhifeng Li, Member, IEEE, and
Anil K. Jain, Fellow, IEEE
Abstract—The problem of matching a forensic sketch to a gallery of mug shot
images is addressed in this paper. Previous research in sketch matching only
offered solutions to matching highly accurate sketches that were drawn while
looking at the subject (viewed sketches). Forensic sketches differ from viewed
sketches in that they are drawn by a police sketch artist using the description of the
subject provided by an eyewitness. To identify forensic sketches, we present a
framework called local feature-based discriminant analysis (LFDA). In LFDA, we
individually represent both sketches and photos using SIFT feature descriptors
and multiscale local binary patterns (MLBP). Multiple discriminant projections are
then used on partitioned vectors of the feature-based representation for minimum
distance matching. We apply this method to match a data set of 159 forensic
sketches against a mug shot gallery containing 10,159 images. Compared to a
leading commercial face recognition system, LFDA offers substantial
improvements in matching forensic sketches to the corresponding face images.
We were able to further improve the matching performance using race and gender
information to reduce the target gallery size. Additional experiments demonstrate
that the proposed framework leads to state-of-the-art accuracies when matching
viewed sketches.
Index Terms—Face recognition, forensic sketch, viewed sketch, local feature
discriminant analysis, feature selection, heterogeneous face recognition.
1 INTRODUCTION
PROGRESS in biometric technology has provided law enforcement
agencies additional tools to help determine the identity of
criminals. In addition to DNA and circumstantial evidence, if a
latent fingerprint is found at an investigative scene or a
surveillance camera captures an image of a suspect’s face, then
these cues may be used to help determine the culprit’s identity
using automated biometric identification. However, many crimes
occur where none of this information is present, but instead an
eyewitness account of the crime is available. In these circum-
stances, a forensic artist is often used to work with the witness in
order to draw a sketch that depicts the facial appearance of the
culprit according to the verbal description. Once the sketch image
of the transgressor is complete, it is then disseminated to law
enforcement officers and media outlets with the hopes of someone
knowing the suspect. These sketches are known as forensic
sketches and this paper describes a robust method for matching
forensic sketches to large mug shot (image) databases maintained
by law enforcement agencies.
Two different types of face sketches are discussed in this paper:
viewed sketches and forensic sketches (see Fig. 1). Viewed sketches are
sketches that are drawn while viewing a photograph of the person
or the person himself. Forensic sketches are drawn by interviewing a
witness to gain a description of the suspect. Published research on
sketch to photo matching to this point has primarily focused on
matching viewed sketches [1], [2], [3], [4], [5], despite the fact that
real-world scenarios only involve forensic sketches. Both forensic
sketches and viewed sketches pose challenges to face recognition
due to the fact that probe sketch images contain different textures
compared to the gallery photographs they are being matched
against. However, forensic sketches pose additional challenges due
to the inability of a witness to exactly remember the appearance of
a suspect and her subjective account of the description, which
often results in inaccurate and incomplete forensic sketches.
We highlight two key difficulties in matching forensic sketches:
1) matching across image modalities and 2) performing face
recognition despite possibly inaccurate depictions of the face. In
order to solve the first problem, we use local feature-based
discriminant analysis (LFDA) to perform minimum distance match-
ing between sketches and photos, which is described in Section 3
and summarized in Fig. 2. The second problem is considered in
Section 5, where analysis and improvements are offered for
matching forensic sketches against large mug shot galleries.
The contributions of the paper are summarized as follows:
1. We observe a substantial improvement in matching
viewed sketches over published algorithms using the
proposed local feature-based discriminant analysis.
2. We present the first large-scale published experiment on
matching real forensic sketches.
3. Using a mug shot gallery of 10,159 images, we perform
race and gender filtering to improve the matching results.
4. All experiments are validated by comparing the proposed
method against a leading commercial face recognition
engine.
The last point is significant since earlier studies on viewed sketches
used a PCA (eigenface) matcher as the baseline. It is now well
known that the performance of a PCA matcher can be easily
surpassed by other face matchers.
2 RELATED WORK
Most research on sketch matching has dealt with viewed sketches.
Much of the early work in matching viewed sketches was performed
by Tang et al. [6], [2], [3], [5], [7]. These studies share a common
approach in that a synthetic photograph is generated from a sketch
(or vice versa) and standard face recognition algorithms are then
used to match the synthetic photographs to gallery photographs.
The different synthesis methods used include an eigentransforma-
tion method (Tang and Wang [6], [2]), local linear embedding (Liu
et al. [3]), and belief propagation on a Markov random field (Wang
and Tang [5]). Other synthesis methods have been proposed as well
[4], [8], [9], [10], [11]. The impact of matching sketches drawn by
different artists was studied by Al Nizami et al. [12].
Klare and Jain [1] proposed a method of sketch matching that
uses the same feature-based approach that has been successful in
other heterogeneous face recognition scenarios (specifically,
matching near-infrared face images to visible light). In using SIFT
feature descriptors [13], the intrapersonal variations between the
sketch and photo modality were diminished while still maintain-
ing sufficient information for interclass discrimination. Such an
approach is similar to other methods proposed in the literature
[14], [15], [16] of matching near-infrared images (NIR) to visible
light images ( VIS), where local binary pattern [17] feature
descriptors are used to describe both NIR and VIS images.
In this paper, we extend our previous feature-based approach to
sketch matching [1]. This is achieved by using local binary patterns
(LBP) in addition to the SIFT feature descriptor, which is motivated
by LBP’s success in a similar heterogeneous matching application by
Liao et al. [14]. Additionally, we extend our feature-based matching
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 33, NO. 3, MARCH 2011 639
. B.F. Klare and Z. Li are with the Department of Computer Science and
Engineering, Michigan State University, 3208 Engineering Building, East
Lansing, MI 48823. E-mail: {klarebre, zfli}@msu.edu.
. A.K. Jain is with the Department of Computer Science and Engineering,
Michigan State University, 3208 Engineering Building, East Lansing, MI
48823, and the Department of Brain and Cognitive Engineering, Korea
University, Seoul, Korea. E-mail: jain@cse.msu.edu.
Manuscript received 18 Dec. 2009; revised 29 Apr. 2010; accepted 10 Aug.
2010; published online 29 Sept. 2010.
Recommended for acceptance by S. Sclaroff.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2009-12-0836.
Digital Object Identifier no. 10.1109/TPAMI.2010.180.
0162-8828/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society

to learn discriminant projections on “slices” of feature patches,
which is similar to the method proposed by Lei and Li [15].
3 FEATURE-BASED SKETCH MATCHING
Image feature descriptors describe an image or image region using a
feature vector that captures the distinct characteristics of the image
[18]. Image-based features have been shown to be successful in face
recognition, most notably with the use of local binary patterns [19].
3.1 Feature-Based Representation
We will now describe how to represent a face with image
descriptors. Because most image descriptors are not sufficiently
verbose to fully describe a face image, the descriptors are computed
Fig. 2. An overview of (a) training and (b) recognition using the LFDA framework. Each sketch and photo is represented by SIFT and MLBP feature descriptors
extracted from overlapping patches. After grouping “slices” of patches together into feature vectors Φ(k) (k = 1 … N), we learn a discriminant projection Ψ_k
for each slice. Recognition is performed after combining each projected vector slice into a single vector and measuring the normed distance between a probe
sketch and a gallery photo.
Fig. 1. The difference between viewed sketches and forensic sketches. (a) Viewed sketches and their corresponding photographs, (b) two pairs of good quality forensic
sketches and the corresponding photographs, and (c) two pairs of poor quality forensic sketches and the corresponding photographs.

over a set of uniformly distributed subregions of the face. The
feature vectors at sampled regions are then concatenated together
to describe the entire face. The feature sampling points are chosen
by setting two parameters: a region (or patch) size s and a
displacement size δ. The region size s defines the size of the square
window over which the image feature is computed. The displace-
ment size δ states the number of pixels the patch is displaced for
each sample; thus, (s − δ) is the number of overlapping pixels in
two adjacent patches. This is analogous to sliding a window of size
s × s across the face image in a raster scan fashion. For an H × W
image, the number of horizontal (N) and vertical (M) sampling
locations is given by N = (W − s)/δ + 1 and M = (H − s)/δ + 1. At
each of the M × N patches, we compute the d-dimensional image
feature vector φ. These image feature vectors are concatenated into
one single (M · N · d)-dimensional image vector Φ. Whereas
f(I) : I → φ denotes the extraction of a single feature descriptor
from an image, sampling multiple features using overlapping
patches is denoted as F(I) : I → Φ. Minimum distance sketch
matching can be performed directly using this feature-based
representation of subjects i and j by computing the normed vector
distance ‖F(I_i) − F(I_j)‖ [1].
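The sampling arithmetic above can be sketched in a few lines; the sizes used below are hypothetical and chosen so that the divisions are exact.

```python
def patch_grid(height, width, s, delta):
    """Number of vertical (M) and horizontal (N) sampling locations
    for patch size s and displacement delta, following
    N = (W - s)/delta + 1 and M = (H - s)/delta + 1."""
    n = (width - s) // delta + 1
    m = (height - s) // delta + 1
    return m, n

def patch_origins(height, width, s, delta):
    """Top-left corners of the overlapping s x s patches in raster
    scan order; adjacent patches overlap by (s - delta) pixels."""
    m, n = patch_grid(height, width, s, delta)
    return [(r * delta, c * delta) for r in range(m) for c in range(n)]

# e.g. a 64 x 64 image with s = 16, delta = 8 yields a 7 x 7 grid
```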
In our sketch matching framework, two feature descriptors are
used: SIFT and LBP. The SIFT feature descriptor quantizes both the
spatial locations and gradient orientations within an s × s-sized
image patch, and computes a histogram in which each bin
corresponds to a combination of a particular spatial location and
orientation. For each image pixel, the histogram bin corresponding
to its quantized orientation and location is incremented by the
product of 1) the magnitude of the image gradient at that pixel and
2) the value of a Gaussian function centered on the patch with a
standard deviation of s=2. Trilinear interpolation is used on the
quantized location of the pixel, which addresses image translation
noise. The final vector of histogram values is normalized to sum to
one. The reader is referred to [13] for a more detailed description of
how the SIFT feature descriptor is designed. It is important to
reiterate that because we are sampling SIFT feature descriptors
from a fixed grid and we do not use SIFT keypoint detection, the
SIFT feature descriptor is computed at predetermined locations.
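A much-simplified sketch of the descriptor just described follows: a gradient-orientation histogram over a 4 × 4 spatial grid with 8 orientation bins (giving the 128 dimensions cited later), Gaussian-weighted with σ = s/2 and normalized to sum to one. The trilinear interpolation of the real SIFT descriptor [13] is omitted here for brevity, so this is an illustration of the idea, not the exact descriptor.

```python
import math

def dense_sift_descriptor(patch, grid=4, bins=8):
    """Simplified SIFT-like descriptor for one s x s patch (a list
    of lists of intensities). Each histogram bin pairs a spatial
    cell with a quantized gradient orientation; every pixel votes
    with gradient magnitude times a Gaussian weight (sigma = s/2).
    Trilinear interpolation is omitted."""
    s = len(patch)
    sigma = s / 2.0
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, s - 1):
        for x in range(1, s - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            theta = math.atan2(dy, dx) % (2 * math.pi)
            o = min(int(theta / (2 * math.pi) * bins), bins - 1)
            cy, cx = y * grid // s, x * grid // s
            d2 = (x - s / 2.0) ** 2 + (y - s / 2.0) ** 2
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            hist[(cy * grid + cx) * bins + o] += mag * w
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```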
For the local binary pattern feature descriptor [17], we extended
the LBP to describe the face at multiple scales by combining the
LBP descriptors computed with radii r ∈ {1, 3, 5, 7}. We refer to
this as the multiscale local binary pattern (MLBP). MLBP is similar
to other variants of the LBP, such as MB-LBP [14], but we obtained
slightly improved accuracy using MLBP.
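The multiscale idea can be sketched as follows: compute an 8-neighbor circular LBP code per pixel at each radius in {1, 3, 5, 7} and concatenate the per-radius histograms. The paper's 236-dimensional MLBP implies uniform-pattern histograms (59 bins per radius); the sketch below uses plain 256-bin histograms and nearest-pixel circle sampling for simplicity, so its dimensionality differs from the paper's.

```python
import math

def lbp_code(img, y, x, radius, neighbors=8):
    """8-bit LBP code at (y, x): each neighbor on the circle of the
    given radius contributes one bit from a >= comparison with the
    center pixel (nearest-pixel sampling, no interpolation)."""
    c = img[y][x]
    code = 0
    for k in range(neighbors):
        a = 2 * math.pi * k / neighbors
        ny = y + int(round(radius * math.sin(a)))
        nx = x + int(round(radius * math.cos(a)))
        code = (code << 1) | (1 if img[ny][nx] >= c else 0)
    return code

def mlbp_histogram(img, radii=(1, 3, 5, 7)):
    """Multiscale LBP: concatenate one 256-bin LBP histogram per
    radius, computed over all pixels far enough from the border."""
    h, w = len(img), len(img[0])
    feat = []
    for r in radii:
        hist = [0] * 256
        for y in range(r, h - r):
            for x in range(r, w - r):
                hist[lbp_code(img, y, x, r)] += 1
        feat.extend(hist)
    return feat
```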
The choice of the MLBP and SIFT feature descriptors was based
on reported success in heterogeneous face recognition and through
a quantitative evaluation of their ability to discriminate between
subjects in sketches and photos [20]. Though variants of LBPs have
led to substantial success in previous heterogeneous face recogni-
tion scenarios, the use of SIFT feature descriptors for this
application is quite novel. However, recent work [1] clearly
demonstrates the success of SIFT feature descriptors for viewed
sketch recognition. SIFT feature descriptors have also been shown
to perform comparatively with LBP feature descriptors in a
standard face recognition scenario [21]. These feature descriptors
are well-suited for sketch recognition because they describe the
distribution of the direction of edges in the face; this is the
information that both sketches and photos contain. By densely
sampling these descriptors, sufficient discriminatory information
is retained to more accurately determine a subject’s identity over
previously used synthesis methods [1].
The feature-based representation requires each sketch and
photo image to be normalized by rotating the angle between the
two eyes to 0 degree, scaling the images to a 75 interocular pixel
distance, and cropping the image size to 200 by 250 pixels. The
experimental results reported in Sections 4 and 6 for each of the
two descriptors are based on the sum of score fusion of the match
scores generated from computing descriptors with patch sizes of
s = 16 and s = 32. This also holds for the global discriminant
described in Section 3.2; we fuse the matching scores computed
using two separate patch sizes of 16 and 32. When combining the
SIFT and MLBP features, the sum of score fusion is used as well.
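The sum-of-score fusion used throughout (patch sizes 16 with 32, and SIFT with MLBP) can be sketched as below. The min–max normalization before summing is an assumption for illustration; the paper does not state how scores are scaled before fusion.

```python
def minmax_normalize(scores):
    """Scale a list of match scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(v - lo) / (hi - lo) for v in scores]

def sum_of_score_fusion(*score_lists):
    """Fuse per-gallery-subject score lists from different matchers
    (e.g. SIFT and MLBP, or the two patch sizes) by summing the
    normalized scores element-wise."""
    normalized = [minmax_normalize(s) for s in score_lists]
    return [sum(vals) for vals in zip(*normalized)]
```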
3.2 Local Feature-Based Discriminant Analysis
With both sketches and photos characterized using SIFT and
MLBP image descriptors, we further refine this feature space using
discriminant analysis. This is done to reduce the large dimension-
ality of the feature vector Φ. A straightforward approach would be
to apply classical subspace analysis (such as LDA) directly on Φ,
and to extract discriminant features for classification. However,
there are several problems with this approach. First, the feature
dimensionality is too high for direct subspace analysis. In our
experiments, each image is divided into either 154 overlapping
patches (for s = 32) or 720 overlapping patches (for s = 16), with
each patch producing a 128-dimensional SIFT descriptor or a 236-
dimensional MLBP descriptor. The second problem is the
possibility of overfitting due to the small sample size (SSS) [22].
In order to handle the combination of a large feature size and
small sample size, an ensemble of linear discriminant classifiers
called LFDA is proposed. Other discriminant analysis methods
have been proposed to handle the SSS problem, such as random
sampling LDA [23], regularized LDA [24], and direct LDA [25].
However, we chose the proposed LFDA method because it is
designed to work with a feature descriptor representation (as
opposed to an image pixel representation), and it resulted in high
recognition accuracy.
In the LFDA framework, each image feature vector Φ is first
divided into “slices” of smaller dimensionality, where slices
correspond to the concatenation of feature descriptor vectors from
each column of image patches. Next, discriminant analysis is
performed separately on each slice by performing the following
three steps: PCA, within class whitening, and between class
discriminant analysis. Finally, PCA is applied to the new feature
vector to remove redundant information among the feature slices
to extract the final feature vector.
To train the LFDA, we use a training set consisting of pairs of
a corresponding sketch and photo of n subjects (which are the
n training classes). This results in a total of 2n training images
with two supports for each subject i: the image feature
representation of the sketch Φ_i^s = F(I_i^s) and of the photo
Φ_i^p = F(I_i^p). We combine these feature vectors as column
vectors in training matrices and refer to them as
X^s = [Φ_1^s Φ_2^s … Φ_n^s] for the sketches,
X^p = [Φ_1^p Φ_2^p … Φ_n^p] for the photos, and
X = [Φ_1^s … Φ_n^s Φ_1^p … Φ_n^p] for the photos and sketches
combined.
The first step in LFDA is to separate the image feature vector Φ
into multiple subvectors or slices. Given the M × N array of
patches consisting of SIFT or MLBP descriptors, we create one slice
for each of the N patch columns. With a d-dimensional feature
descriptor, each of the N slices is of dimensionality (M · d). We call
this a “slice” because it is similar to slicing an image into N pieces.
After separating the feature vectors into slices, the training
matrices now become X_k^s ∈ R^(Md×n), X_k^p ∈ R^(Md×n), and
X_k ∈ R^(Md×2n) (k = 1 … N), which are all mean-centered.
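The slicing step can be sketched with NumPy; the array shapes below are hypothetical, since the actual M, N and d depend on the patch parameters.

```python
import numpy as np

def make_slices(patch_features):
    """Split an (M, N, d) array of per-patch descriptors into the
    N column slices described above, each of dimension M * d."""
    m, n, d = patch_features.shape
    # slice k stacks the d-dim descriptors of column k's M patches
    return [patch_features[:, k, :].reshape(m * d) for k in range(n)]
```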
We next reduce the dimensionality of each training slice
matrix X_k using the PCA matrix W_k ∈ R^(Md×r) with r eigenvectors.
The purpose is to remove the noisy features, which are usually
associated with the trailing eigenvectors with the smallest
eigenvalues. In our experiments, we use the 100 eigenvectors with
the largest eigenvalues (which preserves about 90 percent of the
variance). The discriminant extraction proceeds by generating the
mean projected class vectors

Y_k = W_k^T (X_k^s + X_k^p) / 2,   (1)

which are used to center the sketch and photo training instances of
each class by

X̃_k^s = W_k^T X_k^s − Y_k,   X̃_k^p = W_k^T X_k^p − Y_k.   (2)
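Equations (1) and (2) can be sketched with NumPy; shapes are illustrative, with each column of the matrices holding one subject's slice vector.

```python
import numpy as np

def center_classes(Wk, Xs_k, Xp_k):
    """Project both modalities with the per-slice PCA basis Wk and
    subtract the per-class mean Y_k = Wk^T (Xs_k + Xp_k) / 2,
    following equations (1) and (2)."""
    Yk = Wk.T @ (Xs_k + Xp_k) / 2.0
    Xs_c = Wk.T @ Xs_k - Yk   # centered sketch instances
    Xp_c = Wk.T @ Xp_k - Yk   # centered photo instances
    return Yk, Xs_c, Xp_c
```

A quick sanity check of the algebra: the centered sketch and photo instances of each class sum to zero by construction.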
To reduce the intrapersonal variation between the sketch and the
photo, a whitening transform is performed. Whitening the within-
class scatter matrix reduces the large feature dimensions that
represent the principal intrapersonal variations, which in this case
correspond to intrapersonal differences between sketches and
photos. To do so, we recombine the training instances into
X̃_k = [X̃_k^s X̃_k^p]. PCA is performed on X̃_k such that the
computed PCA projection matrix Ṽ_k ∈ R^(100×100) retains all data
variance from X̃_k. Let Λ_k ∈ R^(100×100) be a diagonal matrix whose
entries are the eigenvalues of the corresponding PCA eigenvectors
Ṽ_k. The whitening transform matrix is V_k = (Λ_k^(−1/2) Ṽ_k^T)^T.
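The whitening step can be sketched with NumPy: eigendecompose the scatter of the recombined centered matrix, then scale each eigenvector by the inverse square root of its eigenvalue. The covariance estimate, the use of `eigh`, and the small-eigenvalue guard are implementation assumptions, not details from the paper.

```python
import numpy as np

def whitening_matrix(Xs_c, Xp_c, eps=1e-10):
    """Whitening transform V_k = (Lambda_k^(-1/2) Vtilde_k^T)^T
    computed from the centered, projected sketch and photo
    instances. eps guards against near-zero eigenvalues (an
    implementation detail, not from the paper)."""
    Xk = np.hstack([Xs_c, Xp_c])              # recombine [Xs_c  Xp_c]
    cov = Xk @ Xk.T / Xk.shape[1]             # within-class scatter
    eigvals, eigvecs = np.linalg.eigh(cov)    # columns = Vtilde_k
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    return (np.diag(scale) @ eigvecs.T).T     # V_k
```

By construction, V_k^T (scatter) V_k is the identity when the scatter matrix is full rank, which is the defining property of a whitening transform.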
The final step is to compute a projection matrix that maximizes
the interpersonal scatter by performing PCA on V_k^T Y_k (which is
the whitening transform of the mean class vectors). Using all but
one of the eigenvectors in the PCA projection matrix, the resultant
projection matrix is denoted as U_k ∈ R^(100×99). This results in the
final projection matrix for slice k:

Ψ_k = W_k V_k U_k.   (3)
With each local feature-based discriminant trained, we match
sketches to photos using nearest neighbor matching on the
concatenated slice vectors. We first separate the feature
representation Φ of an image into individual slices

Φ = [Φ(1)^T Φ(2)^T … Φ(N)^T]^T,   (4)

where Φ(i) ∈ R^(Md) is the ith slice feature vector. We then project
each slice using the LFDA projection matrices Ψ_k, yielding the new
vector representation Γ ∈ R^(99·N):

Γ = [(Ψ_1^T Φ(1))^T (Ψ_2^T Φ(2))^T … (Ψ_N^T Φ(N))^T]^T.   (5)

With the LFDA representation of the sketch Γ^s and photo Γ^p, the
normed distance ‖Γ^s − Γ^p‖ is used to select the gallery photo with
the minimum distance to the probe sketch.
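The per-slice projection and minimum-distance rule can be sketched as below, with dimensions shrunk from the paper's 100/99 for illustration; the inputs are assumed to be already extracted slice vectors and trained projections.

```python
import numpy as np

def project_slices(psis, slices):
    """Apply the per-slice LFDA projections Psi_k and concatenate
    the projected slices into one vector, per equation (5)."""
    return np.concatenate([psi.T @ s for psi, s in zip(psis, slices)])

def match(probe, gallery):
    """Return the index of the gallery vector with minimum normed
    distance to the probe. Inputs are already LFDA-projected."""
    dists = [np.linalg.norm(probe - g) for g in gallery]
    return int(np.argmin(dists))
```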
The proposed LFDA algorithm is a simple yet effective method.
From the results in Section 4, we can clearly see that LFDA is able
to significantly improve the recognition performance over the basic
feature-based sketch matching framework. Similar to other
variants of LDA that are designed to handle the small sample
size problem [25], [24], [23], LFDA has several advantages over
traditional linear discriminant analysis. First, LFDA is more
effective in handling large feature vectors. The idea of segregating
the feature vectors into slices allows us to work on more
manageable sized data with respect to the number of training
images. Second, because the subspace dimension is fixed by the
number of training subjects, when dealing with the smaller sized
slices, the LFDA algorithm is able to extract a larger number of
meaningful features. This is because the dimensionality of each
slice subspace is bounded by the same number of subjects as a
subspace on the entire feature representation would be.
4 VIEWED SKETCH MATCHING RESULTS
In order to compare our proposed LFDA framework to published
methods on sketch matching, we evaluated our method using
viewed sketches from the CUHK data set [5]. This data set
consists of 606 corresponding sketch/photo pairs that were drawn
from three face data sets: 1) 123 pairs from the AR face database
[27], 2) 295 pairs from the XM2VTS database [28], and 3) 188 pairs
from the CUHK student database [6]. Each of these sketch images
was drawn by an artist while looking at the corresponding
photograph of the subject. Two examples of these viewed sketches
are shown in Fig. 1a. For the methods presented in this paper, all
results shown are the recognition rates averaged over five separate
random splits of 306 training subjects and 300 test subjects.
The results of the viewed sketch matching experiment are
summarized in Table 1. The first column of the table lists the
baseline methods, which include the top two performing methods
in the literature [1], [5] (each used 306 training subjects and 300 test
subjects) and Cognitec’s FaceVACS commercial face recognition
engine [26]. FaceVACS has been shown [1] to perform at the same
level as earlier solutions specifically trained for viewed sketch
recognition [2]. In the second column, the matching accuracies
achieved by directly comparing SIFT and MLBP feature vectors
are listed. The method “SIFT + MLBP” indicates a sum of score
fusion [29] of the match scores from SIFT matching and MLBP
matching. While both the SIFT and MLBP methods offer similar
levels of performance, using LFDA (third column) the accuracy
increases to the point where, on average, fewer than two sketches
out of the 300 in the probe set are incorrectly identified.
While LFDA cut the error roughly in half, the use of standard LDA
actually increased it. In the same experiment shown in
Table 1, we applied LDA on the entire feature vector instead of
breaking it into slices and performing LDA on each slice vector as
is done in LFDA. The accuracy of LDA+SIFT was 95.47 percent,
LDA+MLBP was 91.53 percent, and (SIFT+MLBP)+LDA was
97.07 percent. In each case, LDA actually lowered the accuracy
from the non-LFDA case. The decrease in accuracy observed when
applying the standard LDA is due to the small sample size
problem and the curse of dimensionality [22]. Given our large
feature representation (for a 32-pixel patch size, the SIFT
representation contains 19,712 components and the MLBP
representation contains 36,344 components), the subspace projections
are too tightly fit to the training data. Because the LFDA method is
an ensemble method, it is better suited to avoid this overfitting
problem. Other LDA variants have been shown to handle the small
sample size problem as well, such as random sampling LDA [23]
and regularized LDA [24].
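The "SIFT + MLBP" sum-of-score fusion can be illustrated as below. The min-max normalization step is a common choice in the score-fusion literature cited as [29]; it is an assumption of this sketch rather than a detail given in the text:

```python
import numpy as np

def minmax(scores):
    """Scale a score vector to [0, 1]; a common normalization before fusion."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)

def sum_fusion(scores_a, scores_b):
    """Sum-of-score fusion of two matchers' similarity scores for one probe."""
    return minmax(scores_a) + minmax(scores_b)

# Toy gallery of three photos: neither matcher ranks entry 2 first on its
# own, but entry 2 scores well under both and wins after fusion.
sift = np.array([0.6, 0.1, 0.5])
mlbp = np.array([0.2, 0.9, 0.8])
print(int(np.argmax(sum_fusion(sift, mlbp))))  # → 2
```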
642 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 33, NO. 3, MARCH 2011
TABLE 1
Rank-1 Recognition Rates for Matching Viewed Sketches Using the CUHK Public Data Set
The standard deviation across the five random splits for each method in the middle and right columns is less than 1 percent.
1. The CUHK Face Sketch Database is available for download at: http://mmlab.ie.cuhk.edu.hk/facesketch.html.
5 MATCHING FORENSIC SKETCHES
The available methods for matching forensic sketches to photos are
limited. Uhl and Lobo [30] proposed a now antiquated method of
matching sketches drawn by forensic artists using photometric
standardization and facial features. Yuen and Man [31] matched
lab-generated forensic composites to photographs using point
distribution models.
5.1 Forensic Sketch Database
In our study, we used a data set consisting of 159 forensic sketches,
each with a corresponding photograph of the subject who was later
identified by the law enforcement agency. All of these sketches
were drawn by forensic sketch artists working with witnesses who
provided verbal descriptions after crimes were committed by an
unknown culprit. The corresponding photographs (mug shots) are
the result of the subject later being identified. The forensic sketch
data set used here comes from four different sources:
1. 73 images from the forensic sketch artist Lois Gibson [32],
2. 43 images from the forensic sketch artist Karen Taylor [33],
3. 39 forensic sketches provided by the Michigan State Police
Department, and
4. 4 forensic sketches provided by the Pinellas County
Sheriff’s Office.
In addition to these 159 corresponding forensic sketch and photo
pairs, we also made use of a data set of 10,000 mug shot images
provided by the Michigan State Police to populate the gallery. Thus,
the matching experiments closely replicate real-world scenarios
where a law enforcement agency would query a large gallery of
mug shot images with a forensic sketch. Examples of the forensic
sketches used in our experiments are shown in Figs. 1, 6, and 7.
Certain sketch images in our collection of forensic sketches are
of poor quality in that they do not capture all of the facial features
of the suspect. For most of these sketches, it is unlikely that they can
be successfully matched automatically to the corresponding photos
because they barely resemble the subject. For this reason, we
separated our forensic sketches into two categories: good quality
and poor quality. This separation was performed subjectively by
looking at the corresponding pairs (sketch and photo) and labeling
them as good if the sketch possessed a reasonable resemblance of
the subject in the photo and labeling them as poor if the sketch was
grossly inaccurate. We believe that this separation gives a more realistic
portrayal of how accurately forensic sketches can be matched.
Fig. 1 shows the difference between good quality and poor quality
sketches.
5.2 Large-Scale Forensic Sketch Matching
Matching forensic sketches to large mug shot galleries is different
in several respects from traditional face identification scenarios.
When presenting face recognition results in normal recognition
scenarios, we are generally concerned with exactly identifying the
subject in question in a fully automated manner. For example,
when preventing multiple passports from being issued to the
same person, human interaction should be limited to only
ambiguous cases. This is due to the large volume of requests
such a system must process. The same is true for matching
arrested criminals against existing mug shot databases to confirm
their identity. However, when matching forensic sketches, it is not
critical for the top retrieval result to be the correct subject as long
as it is in the top R retrieved results, say R = 50. This is because
the culprit being depicted in a forensic sketch typically has
committed a heinous crime (e.g., murder, rape, and armed
robbery) that will receive a large amount of attention from
investigators. Instead of accepting or dismissing only the top
retrieved photo, law enforcement officers will consider the top R
retrieval results as potential suspects. Generally, many of the
returned subjects can be immediately eliminated as suspects for
various reasons, such as if they are currently incarcerated or
deceased. The remaining candidates can each then be investigated
for their culpability of committing the crime. This scenario is also
true of crimes in which a photograph of a suspect is available.
Investigators will consider the top R retrieval results instead of
only the highest match. Based on the practice followed in
forensics, we would like R to be around 50; that is, we are
mainly concerned with whether or not the true subject is within
the top 50 retrieved images.
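The top-R retrieval criterion can be made concrete with a small helper. The similarity scores and descending sort order below are illustrative assumptions:

```python
import numpy as np

def rank_of_true_match(probe_scores, true_idx):
    """1-based rank of the true subject when the gallery is sorted by similarity."""
    order = np.argsort(-probe_scores)  # descending similarity
    return int(np.where(order == true_idx)[0][0]) + 1

def hit_at_r(probe_scores, true_idx, r=50):
    """True if the correct subject is among the top-R retrieved photos."""
    return rank_of_true_match(probe_scores, true_idx) <= r

# Toy example: the true subject is gallery entry 3, retrieved at rank 2,
# so it counts as a hit for any R >= 2.
scores = np.array([0.2, 0.9, 0.4, 0.8])
print(rank_of_true_match(scores, 3))  # → 2
```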
In order to improve the accuracy of matching forensic sketches,
we utilize ancillary or demographic information provided by the
witness, to be used as a soft biometric [34]. For example, suppose the
witness reports that the race of the culprit is Caucasian; then we can
eliminate all non-Caucasian members of the gallery, which not only
speeds up matching but also improves matching performance. The
same is true for gender: If the suspect is reported
to be a female, then we disregard any male subject in the gallery. To
use this approach, we manually labeled all of the 10,159 mug shot
images and all of the forensic sketch/photo pairs in our database
with race and gender. For gender, we considered one of three
possible categories: male, female, and (in rare cases) unknown. For
race, we considered one of three categories: Caucasian, African-
American, and “other.” The “other” category includes individuals who
are Hispanic, Asian, or of multiple races. Table 2 lists the percentage of
members from each race and gender category in the forensic
sketches and the mug shot gallery used in our experiments.
We lack additional ancillary information (e.g., age, height, scars,
marks, and tattoos) that could potentially be used to further
improve the matching accuracy.
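Filtering the gallery on such soft biometrics can be sketched as below. The field names and record layout are hypothetical, and retaining "other"/"unknown" labels under a filter is a conservative choice of this sketch (the paper simply removes all non-matching gallery members):

```python
def filter_gallery(gallery, race=None, gender=None):
    """Drop gallery records inconsistent with the witness description.

    Records labeled "other" (race) or "unknown" (gender) are always kept,
    so the soft biometric narrows the gallery without excluding the true
    subject whose label is ambiguous. Field names are illustrative.
    """
    keep = []
    for rec in gallery:
        if race is not None and rec["race"] not in (race, "other"):
            continue
        if gender is not None and rec["gender"] not in (gender, "unknown"):
            continue
        keep.append(rec)
    return keep

# Toy gallery of three mug shot records.
gallery = [
    {"id": 1, "race": "caucasian", "gender": "male"},
    {"id": 2, "race": "african-american", "gender": "male"},
    {"id": 3, "race": "other", "gender": "female"},
]
print([r["id"] for r in filter_gallery(gallery, race="caucasian", gender="male")])  # → [1]
```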
6 FORENSIC SKETCH MATCHING RESULTS
We now present forensic sketch recognition performance using the
159 forensic sketch images (probe set) and 10,159 mug shot images
(gallery). In these matching experiments, we use the local
feature-based discriminant analysis framework presented in
Section 3, with sum-of-score fusion of the MLBP and SIFT LFDA
match scores, as this combination was the highest performing
method for matching viewed sketches (Table 1).
The performance of matching sketches classified as good and
poor can be found in Fig. 3. There is a substantial difference in the
matching performance of good sketches and poor sketches. Despite
the fact that poor sketches are extremely difficult to match, the
CMC plots in Fig. 3 show that the proposed method performs
roughly as well on the poor sketches as a state-of-the-art
commercial matcher (FaceVACS) does on the good sketches.
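A CMC (Cumulative Match Characteristic) curve of the kind plotted in Fig. 3 records, for each probe, the rank at which its true mate is retrieved. A minimal sketch, assuming a similarity-score matrix over probes and gallery:

```python
import numpy as np

def cmc_curve(score_matrix, true_ids, max_rank=50):
    """Cumulative Match Characteristic curve.

    score_matrix: (n_probes, n_gallery) similarity scores, larger = better.
    true_ids: for each probe, the gallery index of the correct subject.
    Returns the fraction of probes whose true mate appears within each
    rank r = 1 .. max_rank.
    """
    ranks = []
    for scores, t in zip(score_matrix, true_ids):
        order = np.argsort(-scores)  # gallery sorted best-first
        ranks.append(int(np.where(order == t)[0][0]) + 1)
    ranks = np.asarray(ranks)
    return np.array([(ranks <= r).mean() for r in range(1, max_rank + 1)])
```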
Fig. 4 and Table 3 show the recognition performance when race
and gender information is used to filter the gallery. By utilizing
this ancillary information, we can significantly increase the
performance of forensic sketch recognition. We noticed a larger
performance gain from using race information than from gender
information. This is likely due to the more uniform distribution of
races in the gallery: most mug shot subjects are male, so filtering
on gender eliminates far fewer candidates than filtering on race.
TABLE 2
Demographics of the 159 Forensic Sketch Images and the 10,159 Mug Shot Gallery Images
Citations
Journal ArticleDOI
TL;DR: This work proposes a Multi-view Discriminant Analysis (MvDA) approach, which seeks for a single discriminant common space for multiple views in a non-pairwise manner by jointly learning multiple view-specific linear transforms.
Abstract: In many computer vision systems, the same object can be observed at varying viewpoints or even by different sensors, which brings in the challenging demand for recognizing objects from distinct even heterogeneous views. In this work we propose a Multi-view Discriminant Analysis (MvDA) approach, which seeks for a single discriminant common space for multiple views in a non-pairwise manner by jointly learning multiple view-specific linear transforms. Specifically, our MvDA is formulated to jointly solve the multiple linear transforms by optimizing a generalized Rayleigh quotient, i.e., maximizing the between-class variations and minimizing the within-class variations from both intra-view and inter-view in the common space. By reformulating this problem as a ratio trace problem, the multiple linear transforms are achieved analytically and simultaneously through generalized eigenvalue decomposition. Furthermore, inspired by the observation that different views share similar data structures, a constraint is introduced to enforce the view-consistency of the multiple linear transforms. The proposed method is evaluated on three tasks: face recognition across pose, photo versus. sketch face recognition, and visual light image versus near infrared image face recognition on Multi-PIE, CUFSF and HFB databases respectively. Extensive experiments show that our MvDA achieves significant improvements compared with the best known results.

610 citations


Cites methods from "Matching Forensic Sketches to Mug S..."

  • ...In [23], a local feature-based discriminant analysis method was proposed to match a forensic sketch and a mug shot photo, and also other effective features can be used such as [24]....


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper uses Partial Least Squares to linearly map images in different modalities to a common linear subspace in which they are highly correlated, and forms a generic intermediate subspace comparison framework for multi-modal recognition.
Abstract: This paper presents a novel way to perform multi-modal face recognition. We use Partial Least Squares (PLS) to linearly map images in different modalities to a common linear subspace in which they are highly correlated. PLS has been previously used effectively for feature selection in face recognition. We show both theoretically and experimentally that PLS can be used effectively across modalities. We also formulate a generic intermediate subspace comparison framework for multi-modal recognition. Surprisingly, we achieve high performance using only pixel intensities as features. We experimentally demonstrate the highest published recognition rates on the pose variations in the PIE data set, and also show that PLS can be used to compare sketches to photos, and to compare images taken at different resolutions.

382 citations

Journal ArticleDOI
TL;DR: A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images, and Random sampling is introduced into the H FR framework to better handle challenges arising from the small sample size problem.
Abstract: Heterogeneous face recognition (HFR) involves matching two face images from alternate imaging modalities, such as an infrared image to a photograph or a sketch to a photograph. Accurate HFR systems are of great value in various applications (e.g., forensics and surveillance), where the gallery databases are populated with photographs (e.g., mug shot or passport photographs) but the probe images are often limited to some alternate modality. A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images. The prototype subjects (i.e., the training set) have an image in each modality (probe and gallery), and the similarity of an image is measured against the prototype images from the corresponding modality. The accuracy of this nonlinear prototype representation is improved by projecting the features into a linear discriminant subspace. Random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem. The merits of the proposed approach, called prototype random subspace (P-RS), are demonstrated on four different heterogeneous scenarios: 1) near infrared (NIR) to photograph, 2) thermal to photograph, 3) viewed sketch to photograph, and 4) forensic sketch to photograph.

358 citations


Additional excerpts

  • ...In our implementation, s = 16....

  • ...In this paper, 1 = 2 and 2 = 4....

Journal ArticleDOI
TL;DR: This paper proposes a method to learn a discriminant face descriptor (DFD) in a data-driven way and applies it to the heterogeneous (cross-modality) face recognition problem and learns DFD in a coupled way to reduce the gap between features of heterogeneous face images to improve the performance of this challenging problem.
Abstract: Local feature descriptor is an important module for face recognition and those like Gabor and local binary patterns (LBP) have proven effective face descriptors. Traditionally, the form of such local descriptors is predefined in a handcrafted way. In this paper, we propose a method to learn a discriminant face descriptor (DFD) in a data-driven way. The idea is to learn the most discriminant local features that minimize the difference of the features between images of the same person and maximize that between images from different people. In particular, we propose to enhance the discriminative ability of face representation in three aspects. First, the discriminant image filters are learned. Second, the optimal neighborhood sampling strategy is soft determined. Third, the dominant patterns are statistically constructed. Discriminative learning is incorporated to extract effective and robust features. We further apply the proposed method to the heterogeneous (cross-modality) face recognition problem and learn DFD in a coupled way (coupled DFD or C-DFD) to reduce the gap between features of heterogeneous face images to improve the performance of this challenging problem. Extensive experiments on FERET, CAS-PEAL-R1, LFW, and HFB face databases validate the effectiveness of the proposed DFD learning on both homogeneous and heterogeneous face recognition problems. The DFD improves POEM and LQP by about 4.5 percent on LFW database and the C-DFD enhances the heterogeneous face recognition performance of LBP by over 25 percent.

342 citations

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches by reducing the modality gap at the feature extraction stage.
Abstract: Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to the randomized forest to further boost the performance. We create the largest face sketch database including sketches of 1, 194 people from the FERET database. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.

338 citations


Cites background or methods from "Matching Forensic Sketches to Mug S..."

  • ...[14] proposed local feature-based discriminant analysis (LFDA)....

  • ...The second family of approaches [17, 15, 14] focused on the classification stage and tried to design advanced classifiers to reduce the modality gap between features extracted from photos and sketches....

  • ...This application leads to a number of studies on this topic [26, 27, 28, 31, 9, 14, 6]....

  • ...We test our method with the same configuration of training and testing splits as [31, 14]....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations


"Matching Forensic Sketches to Mug S..." refers methods in this paper

  • ...Such an approach is similar to other methods proposed in the literature [14], [15], [16] of matching near-infrared images (NIR) to visible light images (VIS), where local binary pattern [17] feature descriptors are used to describe both NIR and VIS images....


Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features that is assessed in the face recognition problem under different challenges.
Abstract: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed

5,563 citations


"Matching Forensic Sketches to Mug S..." refers background in this paper

  • ...Image-based features have been shown to be successful in face recognition, most notably with the use of local binary patterns [19]....


Journal Article

2,952 citations


"Matching Forensic Sketches to Mug S..." refers background in this paper

  • ...[27], 2) 295 pairs from the XM2VTS database [28], and 3) 188 pairs...


Frequently Asked Questions (8)
Q1. What contributions have the authors mentioned in the paper "Matching forensic sketches to mug shot photos" ?

The problem of matching a forensic sketch to a gallery of mug shot images is addressed in this paper. Forensic sketches differ from viewed sketches in that they are drawn by a police sketch artist using the description of the subject provided by an eyewitness. To identify forensic sketches, the authors present a framework called local feature-based discriminant analysis ( LFDA ). In LFDA, the authors individually represent both sketches and photos using SIFT feature descriptors and multiscale local binary patterns ( MLBP ). The authors were able to further improve the matching performance using race and gender information to reduce the target gallery size. 

In order to handle the combination of a large feature size and small sample size, an ensemble of linear discriminant classifiers called LFDA is proposed. 

Other discriminant analysis methods have been proposed to handle the SSS problem, such as random sampling LDA [23], regularized LDA [24], and direct LDA [25]. 

Because most image descriptors are not sufficiently verbose to fully describe a face image, the descriptors are computed over a set of uniformly distributed subregions of the face. 

The authors believe that with a larger number of forensic sketches, they could more properly train their discriminant and further improve the matching performance. 

This is because the culprit being depicted in a forensic sketch typically has committed a heinous crime (e.g., murder, rape, and armed robbery) that will receive a large amount of attention from investigators. 

A straightforward approach would be to apply classical subspace analysis (such as LDA) directly on the concatenated feature vector and to extract discriminant features for classification. 

The authors highlight two key difficulties in matching forensic sketches: 1) matching across image modalities and 2) performing face recognition despite possibly inaccurate depictions of the face.