Coupled Information-Theoretic Encoding for Face Photo-Sketch Recognition
Wei Zhang^1    Xiaogang Wang^2,3    Xiaoou Tang^1,3
^1 Department of Information Engineering, The Chinese University of Hong Kong, zw009@ie.cuhk.edu.hk
^2 Department of Electronic Engineering, The Chinese University of Hong Kong, xgwang@ee.cuhk.edu.hk
^3 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China, xtang@ie.cuhk.edu.hk
Abstract
Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching, or on developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to a randomized forest to further boost the performance. We create the largest face sketch database, including sketches of 1,194 people from the FERET database. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.
1. Introduction
Face photo-sketch recognition aims to match a face sketch drawn by an artist to one of many face photos in a database. In law enforcement, it is desirable to automatically search photos in police mug-shot databases using a sketch drawing when the photo of a suspect is not available. This application has led to a number of studies on the topic [26, 27, 28, 31, 9, 14, 6]. Photo-sketch generation and recognition are also useful in the digital entertainment industry.
The major challenge of face photo-sketch recognition is matching images across different modalities. Sketches are a concise representation of human faces, often containing shape exaggeration and having different textures than photos, so directly applying face photo recognition algorithms is infeasible. Recently, great progress has been made in two directions. The first family of approaches [27, 18, 31] focused on the preprocessing stage and synthesized a pseudo-photo from the query sketch, or pseudo-sketches from the gallery photos, to transform inter-modality face recognition into intra-modality face recognition. However, face photo/sketch synthesis is actually a harder problem than recognition, and imperfect synthesis results significantly degrade recognition performance. The second family of approaches [17, 15, 14] focused on the classification stage and tried to design advanced classifiers to reduce the modality gap between features extracted from photos and sketches. If the inter-modality difference between the extracted features is large, the discriminative power of the classifiers is reduced.
Figure 1. A CITP tree with three levels, for illustration purposes. The local structures of photos and sketches are sampled and coupled-encoded via the CITP tree. Each leaf node of the CITP tree corresponds to a cell in the photo vector space and in the sketch vector space. The sampled vectors in the same cell are assigned the same code, so that different local structures have different codes and the same structures in different modalities have the same code.
In this paper, we propose a new approach that reduces the modality gap at the feature extraction stage. A new face descriptor is designed via coupled information-theoretic encoding, which quantizes the local structures of face photos and sketches into discrete codes. To effectively match photos and sketches, the extracted codes should be uniformly distributed across different subjects, which leads to high discriminative power, and the codes of the same subject's photo and sketch should be highly correlated, which leads to a small inter-modality gap. These requirements are well captured by the criterion of maximizing the mutual information between photos and sketches in the quantized feature spaces. The coupled encoding is achieved by the proposed randomized coupled information-theoretic projection forest, which is learned with the maximum mutual information (MMI) criterion.
Another contribution of this work is the release of the CUHK Face Sketch FERET Database (CUFSF, available at http://mmlab.ie.cuhk.edu.hk/cufsf/), a large scale face sketch database. It includes the sketches of 1,194 people from the FERET database [22]. Wang and Tang [31] published the CUFS database with sketches of 606 people, whose sketches had less shape distortion. The new database is not only larger in size but also more challenging, because its sketches have more shape exaggeration and are thus closer to practical applications. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.
1.1. Related work
To synthesize pseudo photos (sketches) from sketches (photos), Tang and Wang [27] proposed to apply the eigentransform globally. Another global approach, proposed by Gao et al. [9], was based on the embedded hidden Markov model and the selective ensemble strategy. Liu et al. [18] proposed patch-based face sketch reconstruction using local linear embedding based mapping; the sketch patches were synthesized independently, ignoring their spatial relationship. Wang and Tang [31] used a multiscale Markov random field (MRF) to model the dependency of neighboring sketch patches. Photos and sketches were matched once they were transformed into the same modality.
In order to reduce the inter-modality gap at the classification stage, Lin and Tang [17] mapped features from two modalities into a common discriminative space. Lei and Li [15] proposed coupled spectral regression (CSR), which is computationally efficient in learning projections that map data from two modalities into a common subspace. Klare et al. [14] proposed local feature-based discriminant analysis (LFDA), which uses multiple projections to extract a discriminative representation from partitioned vectors of SIFT and LBP features.
There is an extensive literature on descriptor-based face recognition [1, 32, 36], owing to its computational efficiency and relative robustness to illumination and pose variations. These methods are relevant to our coupled encoding. However, handcrafted features such as LBP [1] and SIFT [19] were not designed for inter-modality face recognition, so the features extracted from photos and sketches may have large inter-modality variations.
Although information-theoretic concepts have been explored in building decision trees and decision forests for vector quantization [2, 21, 23] in object recognition, these algorithms were applied in a single space and did not address the problem of inter-modality matching. Moreover, with the supervision of object labels, their tree construction processes were much more straightforward than ours.
2. Information-Theoretic Projection Tree
Vector quantization has been widely used to create discrete image representations, such as textons [20] and visual words [24], for object recognition and face recognition. Image pixels [5, 23], filter-bank responses [20] or invariant descriptors [24, 33] are computed either sparsely or densely on a training set, and clustered to produce a codebook by algorithms such as k-means, mean shift [12], random projection trees [5, 8, 33] and random forests [21, 23]. With the codebook, any image can then be turned into an encoded representation.

However, to the best of our knowledge, it is not yet clear how to apply vector quantization to cross-modality object matching. In this section, we present a new coupled information-theoretic projection (CITP) tree for coupled quantization across modalities. We further extend the CITP tree to the randomized CITP tree and forest. For clarity of exposition, we present the method in the photo-sketch recognition scenario.
2.1. Projection Tree
A projection tree [8] partitions a feature space R^D into cells. It is built in a recursive manner, splitting the data along one projection direction at a time. The succession of splits leads to a binary tree, whose leaves are individual cells in R^D. With a built projection tree, a code is assigned to each test sample x according to the cell (i.e. leaf node) it belongs to. The sample is simply propagated down the tree, starting from the root node and branching left or right until a leaf node is reached. Each node is associated with a learned binary function f(x) = sign(w^T x − τ). The node propagates x to its left child if f(x) = −1 and to its right child if f(x) = 1.
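To make the lookup concrete, the following is a minimal sketch of this propagation (the Node class and encode function are illustrative names, not part of the paper):

```python
import numpy as np

class Node:
    """A projection-tree node; leaf nodes carry a code index."""
    def __init__(self, w=None, tau=0.0, left=None, right=None, code=None):
        self.w, self.tau = w, tau            # projection vector and threshold
        self.left, self.right = left, right  # child nodes (None at a leaf)
        self.code = code                     # set only at leaf nodes

def encode(node, x):
    """Propagate sample x down the tree and return its leaf code."""
    while node.code is None:                 # descend until a leaf is reached
        f = np.sign(node.w @ x - node.tau)   # f(x) = sign(w^T x - tau)
        node = node.left if f < 0 else node.right
    return node.code
```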
2.2. Mutual Information Maximization
Since quantization needs to be done in both the photo space and the sketch space, we extend the projection tree to a coupled projection tree. In a coupled projection tree, vectors sampled from photos and sketches share the same tree structure, but are input to different binary functions f^p(x^p) and f^s(x^s) at each node. A vector x^p sampled from the neighborhood of a photo pixel is quantized with f^p, and a vector x^s sampled from the neighborhood of a sketch pixel is quantized with f^s. The sampled photo vectors and sketch vectors are thus mapped to the same codebook, but their coding functions represented by the tree are different, denoted by C^p and C^s, respectively.
To train a coupled projection tree, a set of vector pairs X = {(x^p_i, x^s_i), i = 1, ..., N} is prepared, where x^p_i, x^s_i ∈ R^D. In this paper, x^p_i and x^s_i are the normalized vectors of sampled gradients (i.e. the first-order derivatives I_u and I_v in the horizontal and vertical directions; see Section 3 for details) around the same location in a photo and a sketch of the same subject, respectively. Denote X^p = [x^p_1, ..., x^p_N] and X^s = [x^s_1, ..., x^s_N]. Since x^p_i and x^s_i are sampled from the same subject at the same location, they are expected to be quantized into the same code by the coupled projection tree. Meanwhile, in order to increase the discriminative power, the codes of X^p and X^s are expected to be uniformly distributed across different subjects. To achieve these goals, our coupled information-theoretic projection (CITP) trees are learned using the maximum mutual information (MMI) criterion (see Fig. 2).
Mutual information, a symmetric measure quantifying the statistical information shared between two random variables [7], provides a sound indication of the matching quality between coded photo vectors and coded sketch vectors. Formally, the objective function is

I(C^p(X^p); C^s(X^s)) = H(C^p(X^p)) − H(C^p(X^p) | C^s(X^s)).   (1)

(The mutual information is originally defined between the two random variables C^p(x^p_i) and C^s(x^s_i); we use the empirical mutual information estimated on the training set throughout this paper.)
To increase the discriminative power, the quantization should maximize the entropy H(C^p(X^p)), so that the samples are nearly uniformly distributed over the codebook. To reduce the inter-modality gap, the quantization should minimize the conditional entropy H(C^p(X^p) | C^s(X^s)).
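For later reference, Eqn. (1) can be rewritten with the standard entropy identity (this symmetric form is the one the node-splitting criterion in Eqn. (3) instantiates):

\[
I(C^p(X^p); C^s(X^s)) = H(C^p(X^p)) + H(C^s(X^s)) - H(C^p(X^p), C^s(X^s)),
\]

so maximizing the mutual information simultaneously pushes both marginal code distributions toward uniformity and concentrates the joint distribution on matched code pairs.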
2.3. Tree Construction with MMI
Similar to the random projection tree [8], the CITP tree is built top-down recursively. However, it differs in that the CITP tree is not a balanced binary tree, i.e. the leaf nodes may be at different levels. The tree building process therefore consists of searching for both the best tree structure and the optimal parameters at each node.
Tree structure searching. We adopt a greedy algorithm to build the tree structure. At each iteration, we search for the node whose splitting maximizes the mutual information between the codes of the sampled photo and sketch vectors. The mutual information in Eqn. (1) can be easily approximated in a nonparametric way: all the sampled photo and sketch vectors in the training set are quantized into codes with the current tree after splitting the candidate node, and the joint distribution of photo and sketch codes is computed to estimate the mutual information. A toy example is shown in Fig. 2.
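As an illustrative sketch of this nonparametric estimate (the function name, natural-log units, and array layout are our assumptions):

```python
import numpy as np

def empirical_mutual_information(photo_codes, sketch_codes, n_codes):
    """Estimate I(C^p; C^s) from paired code assignments.

    photo_codes, sketch_codes: int arrays of shape (N,), the codes of the
    i-th photo vector and its paired sketch vector under the current tree.
    """
    # Joint histogram of (photo code, sketch code) pairs.
    joint = np.zeros((n_codes, n_codes))
    np.add.at(joint, (photo_codes, sketch_codes), 1.0)
    joint /= joint.sum()

    pp = joint.sum(axis=1)   # marginal distribution of photo codes
    ps = joint.sum(axis=0)   # marginal distribution of sketch codes

    nz = joint > 0           # zero cells contribute nothing to the sum
    return np.sum(joint[nz] * np.log(joint[nz] / (pp[:, None] * ps[None, :])[nz]))
```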
Figure 2. An illustration of tree construction with MMI. In each step, all current leaf nodes are tested and the one giving the maximum mutual information is selected to split. For each leaf node, we tentatively split it and obtain a tree to encode photo vectors and sketch vectors. The selected leaf node should satisfy two requirements: (1) the codes are uniformly distributed; (2) the codes of photo vectors and corresponding sketch vectors are highly correlated. These requirements are well captured by the MMI criterion. In this example, splitting node A violates requirement (2) and splitting node C violates requirement (1); the corresponding mutual information of both is relatively small (I = 0.08 and I = 1.41, respectively). So node B, with the maximum mutual information (I = 1.51), is selected. (The figure visualizes the histograms and joint histograms of photo and sketch codes; in the joint histograms, the colors represent joint probability densities.)
Node parameter searching. It is critical to search for the optimal parameters of the binary functions f^p(x^p) and f^s(x^s), which determine how to split the node. Formally, we aim at finding projection vectors w_p, w_s and thresholds τ_p, τ_s for node k (the index k of the parameters is omitted for conciseness), such that

ỹ^p_i = w_p^T x^p_i − τ_p,   ŷ^p_i = sign(ỹ^p_i),
ỹ^s_i = w_s^T x^s_i − τ_s,   ŷ^s_i = sign(ỹ^s_i).   (2)
Then a binary value ŷ^p_i (or ŷ^s_i) is assigned to each vector x^p_i (or x^s_i), to split the training data into two subsets and propagate them to the two child nodes. The node propagates a training vector pair (x^p_i, x^s_i) to its children only if the binary values ŷ^p_i and ŷ^s_i are the same; otherwise, the vector pair is treated as an outlier and discarded.
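A minimal sketch of this coupled split (illustrative names; the left/right assignment follows the convention of Algorithm 1 below):

```python
import numpy as np

def split_pairs(Xp, Xs, wp, ws, tau_p, tau_s):
    """Split paired photo/sketch vectors at one CITP node.

    Xp, Xs: arrays of shape (N, D) holding paired photo and sketch vectors.
    Returns index arrays for the left and right children; pairs whose binary
    values disagree are treated as outliers and dropped.
    """
    yp = np.where(Xp @ wp <= tau_p, -1, 1)   # \hat{y}^p_i
    ys = np.where(Xs @ ws <= tau_s, -1, 1)   # \hat{y}^s_i
    agree = yp == ys                         # keep only consistently coded pairs
    left = np.where(agree & (yp == -1))[0]
    right = np.where(agree & (yp == 1))[0]
    return left, right
```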
Suppose that the input of a node k is a set of vector pairs X_k = {(x^p_{k_i}, x^s_{k_i}), 1 ≤ i ≤ N_k}. Denote X^p_k = [x^p_{k_1}, ..., x^p_{k_{N_k}}], X^s_k = [x^s_{k_1}, ..., x^s_{k_{N_k}}], Ỹ^p_k = [ỹ^p_{k_1}, ..., ỹ^p_{k_{N_k}}], Ỹ^s_k = [ỹ^s_{k_1}, ..., ỹ^s_{k_{N_k}}], Ŷ^p_k = [ŷ^p_{k_1}, ..., ŷ^p_{k_{N_k}}] and Ŷ^s_k = [ŷ^s_{k_1}, ..., ŷ^s_{k_{N_k}}]. The node is split according to the MMI criterion, i.e. by maximizing

I(Ŷ^p_k; Ŷ^s_k) = H(Ŷ^p_k) + H(Ŷ^s_k) − H(Ŷ^p_k, Ŷ^s_k).   (3)
Instead of solving the above maximization problem directly, an approximate objective I(Ỹ^p_k; Ỹ^s_k) is maximized first, through which w_p and w_s are estimated without considering τ_p and τ_s. Assume that ỹ^p_{k_i} and ỹ^s_{k_i} are jointly Gaussian distributed. The entropy of a jointly Gaussian random vector g is (1/2) ln[det(Σ_g)] + const [7], where Σ_g is the covariance matrix of g. Following this, the mutual information can be rewritten in a simple form:

I(Ỹ^p_k; Ỹ^s_k) = \frac{1}{2} \ln \frac{\det(Σ^p_k)\,\det(Σ^s_k)}{\det(Σ_k)} + const,   (4)
where Σ^p_k, Σ^s_k and Σ_k are the covariances of Ỹ^p_k, Ỹ^s_k and [(Ỹ^p_k)^T, (Ỹ^s_k)^T]^T, respectively. According to Eqn. (2),

Σ^p_k = w_p^T C^p_k w_p,   Σ^s_k = w_s^T C^s_k w_s,

Σ_k = \begin{pmatrix} w_p^T C^p_k w_p & w_p^T C^{p,s}_k w_s \\ (w_p^T C^{p,s}_k w_s)^T & w_s^T C^s_k w_s \end{pmatrix},   (5)

where C^p_k and C^s_k are the covariance matrices of X^p_k and X^s_k, respectively, and C^{p,s}_k is the covariance matrix between X^p_k and X^s_k.
Substituting Eqn. (5) into Eqn. (4), we find that maximizing (4) is equivalent to the Canonical Correlation Analysis (CCA) model

\max_{w_p, w_s} \frac{w_p^T C^{p,s}_k w_s}{\sqrt{(w_p^T C^p_k w_p)(w_s^T C^s_k w_s)}}.   (6)

So the optimal w_p and w_s are obtained by solving CCA (details are given later). We find that CCA offers a good trade-off between scalability and performance, since the input set is usually very large (about 2.5 million sample pairs in our experiments).
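To see the equivalence (a step the text leaves implicit, spelled out here), write ρ for the objective in (6). With Eqn. (5) the determinant of Σ_k factorizes:

\[
\det(Σ_k) = Σ^p_k\,Σ^s_k - (w_p^T C^{p,s}_k w_s)^2 = Σ^p_k\,Σ^s_k\,(1 - ρ^2),
\qquad
I(Ỹ^p_k; Ỹ^s_k) = -\tfrac{1}{2}\ln(1 - ρ^2) + const,
\]

which is monotonically increasing in ρ^2, so maximizing (4) is exactly maximizing the canonical correlation |ρ|.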
To estimate the thresholds τ_p and τ_s, we use brute-force search to maximize (3) over the region (τ_p, τ_s) ∈ [µ̂_p − σ̂_p, µ̂_p + σ̂_p] × [µ̂_s − σ̂_s, µ̂_s + σ̂_s], where µ̂_p = median_i(ỹ^p_i) and σ̂_p = median_i(|ỹ^p_i − µ̂_p|) are the median and the median absolute deviation of ỹ^p_i, respectively, and µ̂_s and σ̂_s are the median and the median absolute deviation of ỹ^s_i, respectively.
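A sketch of this brute-force search, reusing the empirical_mutual_information helper sketched earlier (the grid resolution is our assumption; the paper does not specify it):

```python
import numpy as np

def search_thresholds(yp, ys, n_grid=16):
    """Grid search for (tau_p, tau_s) maximizing the binary MI of Eqn. (3).

    yp, ys: projected values w_p^T x^p_i and w_s^T x^s_i at this node.
    """
    mu_p, mu_s = np.median(yp), np.median(ys)
    sig_p = np.median(np.abs(yp - mu_p))     # median absolute deviation
    sig_s = np.median(np.abs(ys - mu_s))

    best_mi, best_taus = -np.inf, (mu_p, mu_s)
    for tp in np.linspace(mu_p - sig_p, mu_p + sig_p, n_grid):
        for ts in np.linspace(mu_s - sig_s, mu_s + sig_s, n_grid):
            bp = (yp > tp).astype(int)       # binary codes \hat{y}^p
            bs = (ys > ts).astype(int)       # binary codes \hat{y}^s
            mi = empirical_mutual_information(bp, bs, 2)
            if mi > best_mi:
                best_mi, best_taus = mi, (tp, ts)
    return best_taus
```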
Canonical Correlation Analysis. CCA was introduced by Hotelling for correlating linear relationships between two sets of variates [10], and has been used in several computer vision applications [34, 13, 25]. However, it has not been explored as a component of a vector quantization algorithm. Blaschko and Lampert [4] proposed an algorithm for spectral clustering with paired data based on kernel CCA. However, that method is not appropriate for quantization: the kernel trick incurs high computational and memory cost due to the very large size of the training set, and the nearest centroid assignment may be unstable (there is no hard constraint requiring a pair of vectors to fall in the same cluster).
Algorithm 1: Building a CITP tree
1: Input: a set of vector pairs X = {(x^p_i, x^s_i), i = 1, ..., N}, where x^p_i, x^s_i ∈ R^D, and the expected number of codes (i.e. leaf nodes) n_L.
2: Create an empty set S, and add the root node to S.
3: repeat
4:   for each node k in S and its associated vector set X_k do
5:     Compute the possible node splitting:
       (i) generate projection vectors w_p, w_s and thresholds τ_p, τ_s from X_k;
       (ii) for its left child L and right child R,
            X_L ← {(x^p_i, x^s_i) | w_p^T x^p_i ≤ τ_p, w_s^T x^s_i ≤ τ_s},
            X_R ← {(x^p_i, x^s_i) | w_p^T x^p_i > τ_p, w_s^T x^s_i > τ_s},
            (X_L ⊂ X_k, X_R ⊂ X_k).
6:   end for
7:   Select the best node splitting, i.e. the one with the maximum mutual information in Eqn. (1).
8:   Split that node: remove it from S and add its child nodes to S.
9: until the number of leaf nodes is n_L.
10: Output: the CITP tree with projection vectors and thresholds at each node.
To solve CCA in (6), let

S_m = \begin{pmatrix} 0 & C^{p,s}_k \\ (C^{p,s}_k)^T & 0 \end{pmatrix},
S_n = \begin{pmatrix} C^p_k & C^{p,s}_k \\ (C^{p,s}_k)^T & C^s_k \end{pmatrix},

and then w = [w_p^T, w_s^T]^T can be solved as the eigenvector associated with the largest eigenvalue of the generalized eigenvalue problem S_m w = λ(S_n + εI)w, where ε is a small positive number for regularization.
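A sketch of this eigen-solution with SciPy (the matrix assembly follows the definitions above; the regularization value is illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def solve_coupled_cca(Cp, Cs, Cps, eps=1e-4):
    """Solve S_m w = lambda (S_n + eps I) w for the top eigenvector.

    Cp, Cs: (D, D) covariance matrices of the photo and sketch vectors;
    Cps: (D, D) cross-covariance between them. Returns (w_p, w_s).
    """
    D = Cp.shape[0]
    Sm = np.block([[np.zeros((D, D)), Cps],
                   [Cps.T, np.zeros((D, D))]])
    Sn = np.block([[Cp, Cps],
                   [Cps.T, Cs]])
    # eigh solves the symmetric-definite generalized problem A v = lambda B v;
    # eigenvalues are returned in ascending order, so take the last column.
    vals, vecs = eigh(Sm, Sn + eps * np.eye(2 * D))
    w = vecs[:, -1]
    return w[:D], w[D:]
```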
The whole algorithm for building a CITP tree is summarized in Algorithm 1.
2.4. Randomized CITP Forest
Randomization is an effective way to create an ensemble of trees and thereby boost the performance of tree-structured algorithms [21, 23, 33]. The randomized counterpart of the CITP tree includes the following two modifications to node splitting.
Randomization in sub-vector choice. At each node, we randomly sample α percent (empirically α = 80) of the element indices of the sampled vectors, i.e. we use a sub-vector of each sampled vector to learn the projections. To improve the strength of the generated trees, the random choice is repeated 10 times (a value chosen empirically) at each node, and the one with the maximum mutual information in Eqn. (3) is selected. The randomization at each node results in randomized trees that have different structures and utilize different information from the training data; the randomized trees are therefore more complementary.
Figure 3. The pipeline of extracting CITE descriptors. A photo and a sketch each go through gradient extraction, sampling and normalization (using a ring sampling pattern, e.g. r = 2), and coupled encoding via the CITP tree, yielding coded images; regional code histograms are concatenated into the final descriptors, which are compared with a PCA+LDA classifier.
Randomization in parameter selection. The eigenvectors associated with the first d largest eigenvalues in the CCA model are first selected. Then a set of n vectors is generated by randomly linearly combining the d selected eigenvectors (the eigenvectors are orthogonalized with Gram-Schmidt orthogonalization and normalized to unit L2-norm). According to the MMI criterion in Eqn. (3), the best one is selected from the set of n random vectors and used as the projection vectors w_p and w_s. In our experiments, we choose d = 3 and n = 20.
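A sketch of the candidate generation (the Gaussian combination weights and helper name are our assumptions; the paper only says "randomly linearly combining"):

```python
import numpy as np

def random_projection_candidates(eigvecs, n=20, rng=None):
    """Generate n random projections from the top-d CCA eigenvectors.

    eigvecs: (2D, d) array whose columns are the Gram-Schmidt-orthogonalized,
    L2-normalized eigenvectors of the generalized eigenvalue problem.
    """
    rng = rng or np.random.default_rng()
    d = eigvecs.shape[1]
    cands = []
    for _ in range(n):
        coeffs = rng.normal(size=d)          # random linear combination
        w = eigvecs @ coeffs
        cands.append(w / np.linalg.norm(w))  # keep unit norm
    return cands                             # score each by Eqn. (3), keep the best
```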
The creation of a random ensemble of diverse trees can significantly improve the performance over a single tree, which is verified by our experiments.
3. Coupled Encoding Based Descriptor
In this section, we introduce our coupled information-theoretic encoding (CITE) based descriptor. With a CITP tree, a photo or a sketch can be converted into an image of discrete codes. The CITE descriptor is a collection of region-based histograms of the "code" image. The pipeline of photo-sketch recognition using a single CITP tree is shown in Fig. 3. The details are given as follows.
Preprocessing. The same geometric rectification and photometric rectification are applied to all the photos and sketches. With an affine transform, the images are cropped to 80 × 64, so that the two eye centers and the mouth center of all the face images are at fixed positions. Then both the photo and sketch images are processed with a Difference-of-Gaussians (DoG) filter [11] to remove both high-frequency and low-frequency illumination variations. Empirical investigation shows that (σ_1, σ_2) = (1, 2) works best in our experiments.
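A minimal sketch of the DoG filtering under these settings (the exact implementation used in the paper is not specified):

```python
from scipy.ndimage import gaussian_filter

def dog_filter(image, sigma1=1.0, sigma2=2.0):
    """Difference-of-Gaussians band-pass filtering of a grayscale face image.

    Subtracting two Gaussian blurs suppresses high-frequency noise (via
    sigma1) and low-frequency illumination variation (via sigma2).
    """
    return gaussian_filter(image, sigma1) - gaussian_filter(image, sigma2)
```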
Sampling and normalization. At each pixel, its neighboring pixels are sampled in a certain pattern to form a vector. A sampling pattern is a combination of one or several rings and the pixel itself. On a ring with radius r, 8r pixels are sampled evenly. Fig. 3 shows the sampling pattern for r = 2. We denote a CITE descriptor with a sampling pattern of rings of radius r_1, ..., r_s as CITE_{r_1,...,r_s}.
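A sketch of the sampling geometry (off-grid ring points would need interpolation, e.g. bilinear, which is our assumption):

```python
import numpy as np

def ring_offsets(radii):
    """Sampling offsets for a CITE pattern: the center pixel plus, for each
    ring of radius r, 8r points spaced evenly around the circle."""
    offsets = [(0.0, 0.0)]                       # the pixel itself
    for r in radii:
        for k in range(8 * r):
            theta = 2 * np.pi * k / (8 * r)
            offsets.append((r * np.cos(theta), r * np.sin(theta)))
    return np.array(offsets)

# e.g. ring_offsets([2]) gives 1 + 16 sampling positions (the r = 2 pattern).
```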
We find that sampling the gradients I_u and I_v yields a better descriptor than sampling the intensities [5]. The gradient domain explicitly reflects relationships between neighboring pixels, and therefore has more discriminating power for uncovering key facial features than the intensity domain. In addition, the similarity between photos and sketches is easier to compare in the gradient domain than in the intensity domain [35]. After sampling, each sampled vector is normalized to unit L2-norm.
Coupled Information-Theoretic Encoding. In the encoding step, the sampled vectors are turned into discrete codes using the proposed CITP tree (Section 2). Each pixel then has a code, and the input image is converted into a "code" image. The vectors sampled from photos and sketches for training the CITP tree are paired according to facial landmarks detected by a state-of-the-art alignment algorithm [16]. (According to our observation, a general face alignment algorithm trained on commonly used face photo data sets is also effective for sketch alignment; we did not separately train a face alignment algorithm for sketches.) Specifically, a pixel in the sketch image finds its counterpart in the photo image using a simple warping based on the landmarks. Note that the pairing is performed after sampling, so that local structures are not deformed by the warping.
CITE Descriptor. The image is divided into 7 × 5 local regions of equal size, and a histogram of the codes is computed in each region. The local histograms are then concatenated to form a histogram representation of the image, i.e. the CITE descriptor.
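A sketch of this pooling step (the per-region L1 normalization is our assumption; the code image is assumed to hold integer codes):

```python
import numpy as np

def cite_descriptor(code_image, n_codes, grid=(7, 5)):
    """Concatenate per-region code histograms of a coded face image."""
    H, W = code_image.shape
    rows, cols = grid
    hists = []
    for i in range(rows):
        for j in range(cols):
            region = code_image[i * H // rows:(i + 1) * H // rows,
                                j * W // cols:(j + 1) * W // cols]
            hist = np.bincount(region.ravel(), minlength=n_codes)
            hists.append(hist / max(hist.sum(), 1))  # per-region normalization
    return np.concatenate(hists)
```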
References (partial)
[7] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley.
[10] H. Hotelling. Relations between two sets of variates. Biometrika, 1936.
[19] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[24] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. ICCV, 2003.
P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE TPAMI, 1997.