Face Alignment by Coarse-to-Fine Shape Searching
Shizhan Zhu^{1,2}   Cheng Li^{2}   Chen Change Loy^{1,3}   Xiaoou Tang^{1,3}
^1 Department of Information Engineering, The Chinese University of Hong Kong
^2 SenseTime Group
^3 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
zs014@ie.cuhk.edu.hk, chengli@sensetime.com, ccloy@ie.cuhk.edu.hk, xtang@ie.cuhk.edu.hk
Abstract

We present a novel face alignment framework based on coarse-to-fine shape searching. Unlike the conventional cascaded regression approaches that start with an initial shape and refine the shape in a cascaded manner, our approach begins with a coarse search over a shape space that contains diverse shapes, and employs the coarse solution to constrain the subsequent finer search of shapes. The unique stage-by-stage progressive and adaptive search i) prevents the final solution from being trapped in local optima due to poor initialisation, a common problem encountered by cascaded regression approaches; and ii) improves the robustness in coping with large pose variations. The framework demonstrates real-time performance and state-of-the-art results on various benchmarks including the challenging 300-W dataset.
1. Introduction

Face alignment aims at locating facial key points automatically. It is essential to many facial analysis tasks, e.g. face verification and recognition [11], expression recognition [2], or facial attribute analysis [16]. Among the many different approaches to face alignment, cascaded pose regression [8, 10, 29, 37] has emerged as one of the most popular and state-of-the-art methods. The algorithm typically starts from an initial shape, e.g. the mean shape of the training samples, and refines the shape through sequentially trained regressors.

In this study, we reconsider the face alignment problem from a different viewpoint by taking a coarse-to-fine shape searching approach (Fig. 1(a)). The algorithm begins with a coarse search in a shape space that encompasses a large number of candidate shapes. The coarse searching stage identifies a sub-region within the shape space for further searching in subsequent finer stages, and simultaneously discards unpromising shape space sub-regions. Subsequent finer stages progressively and adaptively shrink the plausible region and converge the space to a small region where the final shape is estimated. In practice, only three stages are required.

In comparison to the conventional cascaded regression approaches, the coarse-to-fine framework is attractive in two aspects:

1) Initialisation independent: A widely acknowledged shortcoming of the cascaded regression approach is its dependence on initialisation [32]. In particular, if the initialised shape is far from the target shape, it is unlikely that the discrepancy will be completely rectified by subsequent iterations in the cascade. As a consequence, the final solution may be trapped in local optima (Fig. 1(c)). Existing methods often circumvent this problem by adopting heuristic assumptions or strategies (see Sec. 2 for details), which mitigate the problem to a certain extent, but do not fully resolve the issue. The proposed coarse-to-fine framework relaxes the need for shape initialisation. It starts its first stage by exploring the whole shape space, without locking itself onto a specific single initialisation point. This frees the alignment process from being affected by poor initialisation, leading to more robust face alignment.

2) Robust to large pose variation: The early stages of the coarse-to-fine search are formulated to simultaneously accommodate and consider diverse pose variations, e.g. different degrees of head pose and face contour. The search then progressively focuses the processing on a dedicated shape sub-region to estimate the best shape. Experimental results show that this searching mechanism is more robust in coping with large pose variations in comparison to the cascaded regression approach.

Since searching through the shape space is challenging with respect to speed, we propose a hybrid feature setting to achieve practical speed. Owing to the unique error tolerance of the coarse-to-fine searching mechanism, our framework is capable of exploiting the advantages and characteristics of different features. For instance, we have the flexibility to employ a less accurate but computationally efficient feature, e.g. BRIEF [9], at the early stages, and use a more accurate but

[Figure 1 illustration: (a) sub-region searching over the shape space (regression, sampling, unpromising sub-regions discarded across Stages 1-3); (b) steps of the proposed coarse-to-fine search: given P_R^{(l-1)}, estimate the sub-region center x̄^{(l)} and then P_R^{(l)}, for l = 1, 2, 3 (error: 12.04); (c) steps of cascaded regression (baseline) starting from the mean shape (error: 23.05).]
Figure 1. (a) A diagram that illustrates the coarse-to-fine shape searching method for estimating the target shape. (b) to (c) Comparison of the steps between the proposed coarse-to-fine search and cascaded regression. Landmarks on the nose and mouth are trapped in local optima in cascaded regression due to poor initialisation, and later cascaded iterations seldom contribute much to rectifying the shape. The proposed method overcomes these problems through coarse-to-fine shape searching.
relatively slow feature, e.g. SIFT [23], at a later stage. Such a setting allows the proposed framework to achieve improved computational efficiency, whilst still maintaining a high accuracy rate without using accurate features in all stages. Our MATLAB implementation achieves 25 fps real-time performance on a single core of an i5-4590. It is worth pointing out that impressive alignment speed (more than 1000 fps even for 194 landmarks) has been achieved by Ren et al. [29] and Kazemi et al. [20]. Though it is beyond the scope of this work to explore learning-based shape-indexed features, we believe the proposed shape searching framework could benefit from such high-speed features.

Experimental results demonstrate that the coarse-to-fine shape searching framework is a compelling alternative to the popular cascaded regression approaches. Our method outperforms existing methods on various benchmark datasets including the challenging 300-W dataset [30]. Our code is available on the project page mmlab.ie.cuhk.edu.hk/projects/CFSS.html.
2. Related work

A number of methods have been proposed for face alignment, including the classic active appearance models [12, 22, 24] and constrained local models [13, 35, 31, 15].

Face alignment by cascaded regression: There are a few successful methods that adopt the concept of cascaded pose regression [17]. The supervised descent method (SDM) [37] is proposed to solve a nonlinear least squares optimisation problem; the non-linear SIFT [23] feature and linear regressors are applied. Feature learning based methods, e.g. Cao et al. [10] and Burgos-Artizzu et al. [8], regress selected discriminative pixel-difference features with random ferns [27]. Ren et al. [29] learn local binary features with a random forest [6], achieving very fast performance.

All the aforementioned methods assume the initial shape is provided in some form, typically a mean shape [37, 29]. The mean shape is used under the assumption that the test samples are distributed close to the mean pose of the training samples. This assumption does not always hold, especially for faces with large pose variations. Cao et al. [10] propose to run the algorithm several times using different initialisations and take as the final output the median of all predictions. Burgos-Artizzu et al. [8] improve the strategy with a smart restart method, but it requires cross-validation to determine a threshold and the number of runs. In general, these strategies mitigate the problem to some extent, but still do not fully eliminate the dependence on shape initialisation. Zhang et al. [38] propose to obtain the initialisation by predicting a rough estimate from the global image patch, still followed by sequentially trained auto-encoder regression networks. Our method instead solves the initialisation problem by optimising a shape sub-region. We will show in Sec. 4 that our proposed searching method is robust to large pose variations and outperforms previous methods.

Coarse-to-fine methods: The coarse-to-fine approach has been widely used to address various image processing and computer vision problems such as face detection [18], shape detection [1] and optical flow [7]. Some existing face alignment methods also adopt a coarse-to-fine approach, but with a significantly different notion from our shape searching framework. Sun et al. [33] first obtain a coarse estimate of the landmark locations and apply cascaded deep models to refine the position of the landmarks of each facial part. Zhang et al. [38] define coarse-to-fine as applying a cascade of auto-encoder networks on images of increasing resolution.
3. Coarse-to-fine shape searching

Conventional cascaded regression methods refine a shape via sequentially regressing local appearance patterns indexed by the current estimated shape. In particular,

x_{k+1} = x_k + r_k(φ(I; x_k)),   (1)

where the 2n-dimensional shape vector x_k represents the current estimate of the (x, y) coordinates of the n landmarks after the k-th iteration. The local appearance pattern indexed by the shape x on the face image I is denoted as φ(I; x), and r_k is the k-th learned regressor. For simplicity we always omit I in Eq. 1.
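The cascaded update of Eq. 1 can be sketched in a few lines; the feature map and the linear regressors below are illustrative stand-ins, not the learned components from the paper:

```python
import numpy as np

def cascaded_regression(x0, phi, regressors):
    """Apply the cascaded update of Eq. 1: x_{k+1} = x_k + r_k(phi(x_k)).

    x0         : (2n,) initial shape vector, e.g. the mean training shape
    phi        : callable mapping a shape to its (d,) appearance feature
    regressors : list of (W, b) pairs; each linear regressor maps the
                 feature to a (2n,) shape increment
    """
    x = x0.copy()
    for W, b in regressors:
        x = x + W @ phi(x) + b        # one additive refinement step
    return x

# Toy run: a hypothetical "feature" that is simply the residual to a
# target, and regressors that each recover half of that residual.
n2 = 4
target = np.ones(n2)
phi = lambda x: target - x
regs = [(0.5 * np.eye(n2), np.zeros(n2))] * 2
out = cascaded_regression(np.zeros(n2), phi, regs)   # approaches the target
```

Two such iterations recover three quarters of the initial discrepancy in this toy setup, illustrating why a poor starting point far from the target may never be fully rectified within a fixed cascade.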
The estimation by cascaded regression can be easily trapped in local optima given a poor shape initialisation, since the method refines a shape by optimising a single shape vector x (Fig. 1(c)). In our approach, we overcome this problem through a coarse-to-fine shape search within a shape space (Fig. 1(a) and (b)).
3.1. Overview of coarse-to-fine shape searching

Formally, we form a 2n-dimensional shape space. We denote the N candidate shapes in the space as S = {s_1, s_2, ..., s_N} (N ≫ 2n). The candidate shapes in S are obtained from the training set, pre-processed by Procrustes analysis [19]. S is fixed throughout the whole shape searching process.

Given a face image, face alignment is performed through l = 1, ..., L stages of shape searching, as depicted in Fig. 1(a). In each l-th stage, we aim to find a finer shape sub-region, represented by (x̄^{(l)}, P_R^{(l)}), where x̄^{(l)} denotes the center of the estimated shape sub-region, and P_R^{(l)} represents the probability distribution that defines the scope of the estimated sub-region around the center. As the searching progresses through stages, e.g. from Stage 1 to 2, the algorithm adaptively determines the values of x̄ and P_R, leading to a finer shape sub-region for the next searching stage, with a closer estimate of the target shape. The process continues until convergence, and the center of the last, finest sub-region is the final shape estimate.

In each stage, we first determine the sub-region center x̄ based on the given sub-region for this stage, and then estimate the finer sub-region used for further searching. A larger/coarser region is expected at earlier stages, whilst a smaller/finer region is expected at later stages. In the first searching stage, the given 'sub-region' P_R^{(l=0)} is set to be a uniform distribution over all candidate shapes, i.e. the searching region is the full set S. In the subsequent stages, the given sub-region is the estimated P_R^{(l-1)} from the preceding stage.

As an overview of the whole approach, we list the major training steps in Algorithm 1, and introduce the learning method in Sec. 3.2 and Sec. 3.3. The testing procedure is similar, excluding the learning steps. More precisely, the learning steps involve learning the regressors in each stage (Eq. 2 and Step 5 in Algorithm 1) and the parameters for estimating the probabilistic distribution (Eq. 8 and 10, Step 14 in Algorithm 1).

Algorithm 1 Training of coarse-to-fine shape searching
 1: procedure TRAINING(candidate shapes S, training set {I_i; x*_i}, i = 1, ..., N)
 2:   Set P_R^{(0)} to be the uniform distribution over S
 3:   for l = 1, 2, ..., L do
 4:     Sample candidate shapes x^{ij}_0 according to P_R^{(l-1)}
 5:     Learn K_l regressors {r_k}, k = 1, ..., K_l, with {x^{ij}_0, x*_i}, i = 1, ..., N, j = 1, ..., N_l
 6:     Get regressed shapes x^{ij}_{K_l} based on the K_l regressors
 7:     Set initial weights to be equal: w^i(0) = e/N_l
 8:     Construct G^i and edge weights according to Eq. 4
 9:     for t = 0, 1, ..., T-1 do
10:       Update w^i(t+1) according to Eq. 6
11:     end for
12:     Compute sub-region center x̄^{(l)}_i via Eq. 3
13:     if l < L then
14:       Learn the distribution with {x̄^{(l)}_i, x*_i}, i = 1, ..., N
15:       Set the probabilistic distribution P_R^{(l)} via Eq. 7
16:     end if
17:   end for
18: end procedure
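The search loop of Algorithm 1 at test time can be sketched as follows; the sampling, fusion, and distribution-update steps are simplified stand-ins (plain averaging and a generic Gaussian-style reweighting) for the learned components described in Secs. 3.2 and 3.3:

```python
import numpy as np

def coarse_to_fine_search(S, stages):
    """Test-time skeleton of the coarse-to-fine search (Algorithm 1
    without its learning steps). Candidates are fused by plain averaging
    (i.e. uniform weights w_ij in Eq. 3) and the distribution update is
    a hand-rolled reweighting standing in for the learned posterior of
    Eq. 7, shrinking the scope at each stage.

    S      : (N, 2n) array of candidate shapes
    stages : list of dicts, each giving the per-stage sample count 'N_l'
    """
    N = len(S)
    p = np.full(N, 1.0 / N)                    # P_R^(0): uniform over S
    center = S.mean(axis=0)
    for l, stage in enumerate(stages):
        idx = np.random.choice(N, size=stage['N_l'], p=p)
        candidates = S[idx]                    # sample from P_R^(l-1)
        center = candidates.mean(axis=0)       # stand-in for Steps 5-12
        d2 = ((S - center) ** 2).sum(axis=1)   # squared distance to center
        p = np.exp(-d2 / 0.5 ** (l + 1))       # tighter scope each stage
        p /= p.sum()
    return center
```

The key structural point is that each stage re-samples candidates from the whole fixed set S under the current distribution, rather than tracking a single shape vector through the cascade.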
3.2. Learn to estimate sub-region center x̄ given P_R

To learn to compute the sub-region center x̄^{(l)} for the l-th searching stage, three specific steps are conducted.

Step 1: In contrast to cascaded regression, which employs a single initial shape (typically the mean shape) for regression, we explore a larger area in the shape space, guided by the probabilistic distribution P_R^{(l-1)}. In particular, for each training sample, we randomly draw N_l initial shapes from S based on P_R^{(l-1)}. We denote the N_l initial shapes of the i-th training sample as x^{ij}_0, with i = 1, ..., N representing the index of the training sample, and j = 1, ..., N_l denoting the index of the randomly drawn shapes.

Step 2: This step aims to regress each initial shape x^{ij}_0 to a shape closer to the ground truth shape x*_i. Specifically, we learn K_l regressors in a sequential manner with iterations k = 0, ..., K_l - 1, i.e.

r_k = argmin_r Σ_{i=1}^{N} Σ_{j=1}^{N_l} || x*_i - x^{ij}_k - r(φ(x^{ij}_k)) ||_2^2 + Φ(r),
x^{ij}_{k+1} = x^{ij}_k + r_k(φ(x^{ij}_k)),   k = 0, ..., K_l - 1,   (2)
where Φ(r) denotes the ℓ2 regularisation term on each parameter of the model r. It is worth pointing out that K_l is smaller than the number of regression iterations typically needed in cascaded regression. This is because i) owing to the error tolerance of coarse-to-fine searching, the regressed shapes at early stages need not be accurate, and ii) at later stages the initial candidate shapes x^{ij}_0 tend to be similar to the target shape, so fewer iterations are needed for convergence.
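Since r is linear and Φ(r) is an ℓ2 penalty, each iteration of Eq. 2 reduces to a ridge regression with a closed-form solution. A sketch, with a hypothetical feature map φ passed in as a callable:

```python
import numpy as np

def learn_stage_regressors(X0, X_star, phi, K, lam=1.0):
    """Learn the K_l sequential linear regressors of Eq. 2 by ridge
    regression (closed form), treating all (i, j) candidates as rows.

    X0     : (m, 2n) initial candidate shapes (m = N * N_l rows)
    X_star : (m, 2n) ground-truth shape paired with each candidate
    phi    : callable mapping (m, 2n) shapes to (m, d) features
    K      : number of iterations K_l
    lam    : weight of the l2 regulariser Phi(r)
    """
    X = X0.copy()
    regressors = []
    for _ in range(K):
        F = phi(X)                    # shape-indexed features
        Y = X_star - X                # regression target: residual to x*_i
        # Ridge solution: W = (F^T F + lam I)^{-1} F^T Y
        W = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)
        X = X + F @ W                 # additive update, mirroring Eq. 2
        regressors.append(W)
    return regressors, X
```

Each regressor is fitted to the residuals left by its predecessors, so the candidate shapes move towards their ground truths over the K_l iterations.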
Step 3: After we learn the regressors and obtain the set of regressed shapes {x^{ij}_{K_l}}, j = 1, ..., N_l, we wish to learn a weight vector w^i = (w_{i1}, ..., w_{iN_l})^T to linearly combine all the regressed shapes for collectively estimating the sub-region center x̄^{(l)}_i for the i-th training sample:

x̄^{(l)}_i = Σ_{j=1}^{N_l} w_{ij} x^{ij}_{K_l}.   (3)
A straightforward way to obtain x̄^{(l)}_i is to average all the regressed shapes by fixing w_{ij} = 1/N_l. However, this simple method is found to be susceptible to even a small number of erroneous regressed shapes caused by local optima. In order to suppress their influence in computing the sub-region center, we adopt the dominant set approach [28] for estimating w^i. Intuitively, a high weight is assigned to regressed shapes that form a cohesive cluster, whilst a low weight is given to outliers. This amounts to finding a maximal clique in an undirected graph. Note that this step is purely unsupervised.

More precisely, we construct an undirected graph G^i = (V^i, E^i), where the vertices are the regressed shapes V^i = {x^{ij}_{K_l}}, j = 1, ..., N_l, and each edge in the edge set E^i is weighted by an affinity defined as

a_pq = sim(x^{ip}_{K_l}, x^{iq}_{K_l}) = { exp(-β || x^{ip}_{K_l} - x^{iq}_{K_l} ||_2^2)  if p ≠ q;  0  if p = q }.   (4)
Representing all the elements a_pq in a matrix forms an affinity matrix A. Note that we set the diagonal elements of A to zero to avoid self-loops. Following [28], we find the weight vector w^i by optimising the following problem:

max_{w^i} (w^i)^T A w^i   s.t. w^i ∈ Δ^{N_l},   (5)

where we denote the simplex Δ^n = {x ∈ R^n | x ≥ 0, e^T x = 1}, with e = (1, 1, ..., 1)^T. An efficient way to optimise Eq. 5 is the continuous optimisation technique known as replicator dynamics [28, 36]:

w^i(t+1) = w^i(t) ∘ (A w^i(t)) / (w^i(t)^T A w^i(t)),   (6)

where t = 0, 1, ..., T-1, and the symbol ∘ denotes element-wise multiplication. Intuitively, in each weighting iteration t, each vertex votes all its weight for the other vertices according to the affinity between the two vertices. After optimising Eq. 6 for T iterations, we obtain w^i(t = T) and plug the weight vector into Eq. 3 to estimate the sub-region center.
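Step 3's fusion (Eqs. 3-6) can be sketched end to end: build the affinity matrix of Eq. 4, run the replicator dynamics of Eq. 6 from uniform weights, and take the weighted combination of Eq. 3. The toy data at the end is purely illustrative:

```python
import numpy as np

def dominant_set_center(shapes, beta=1.0, T=10):
    """Fuse regressed shapes into a sub-region center (Eqs. 3-6).

    shapes : (N_l, 2n) regressed shapes x^{ij}_{K_l} for one sample
    Returns the weighted center of Eq. 3 and the weight vector from
    the dominant-set replicator dynamics of Eq. 6.
    """
    Nl = len(shapes)
    # Affinity matrix of Eq. 4, with zero diagonal to avoid self-loops
    d2 = ((shapes[:, None, :] - shapes[None, :, :]) ** 2).sum(-1)
    A = np.exp(-beta * d2)
    np.fill_diagonal(A, 0.0)
    # Replicator dynamics (Eq. 6), starting from uniform weights e/N_l
    w = np.full(Nl, 1.0 / Nl)
    for _ in range(T):
        w = w * (A @ w) / (w @ A @ w)
    return w @ shapes, w              # Eq. 3: weighted shape combination

# Toy check: nine shapes form a tight cluster near the origin and one
# outlier sits far away -- the outlier should receive negligible weight.
rng = np.random.default_rng(0)
pts = rng.normal(scale=0.01, size=(9, 4))
pts = np.vstack([pts, np.full((1, 4), 5.0)])
center, w = dominant_set_center(pts)
```

Because the numerator of each update sums to the denominator, the weights stay on the simplex throughout, and mass drains away from low-affinity outliers towards the cohesive cluster.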
3.3. Learn to estimate probabilistic distribution P_R given x̄

We then learn to estimate the probabilistic distribution P_R^{(l)} based on the estimated sub-region center x̄^{(l)}. We aim to determine the probabilistic distribution P_R^{(l)}(s | x̄^{(l)}) = P(s - x̄^{(l)} | φ(x̄^{(l)})), where s ∈ S and Σ_{s∈S} P_R^{(l)}(s | x̄^{(l)}) = 1. For clarity, we drop the superscripts (l) from x̄^{(l)} and P_R^{(l)}. We model the probabilistic distribution P_R^{(l)} as

P(s - x̄ | φ(x̄)) = P(s - x̄) P(φ(x̄) | s - x̄) / Σ_{y∈S} P(y - x̄) P(φ(x̄) | y - x̄).   (7)
The denominator is a normalising factor. Thus, when estimating the posterior probability of each shape s in S, we focus on the two factors P(s - x̄) and P(φ(x̄) | s - x̄).

The factor P(s - x̄), referred to as the shape adjustment prior, is modelled as

P(s - x̄) ∝ exp(-(1/2) (s - x̄)^T Σ^{-1} (s - x̄)).   (8)

The covariance matrix Σ is learned from the pairs {x̄_i, x*_i}, i = 1, ..., N, on the training data, where x* denotes the ground truth shape¹. In practice, Σ is restricted to be diagonal and we decorrelate the shape residual by principal component analysis. This shape adjustment prior aims to approximately delineate the searching scope near x̄, and typically the distribution is more concentrated at later searching stages.
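A minimal sketch of evaluating the prior of Eq. 8 over the candidate set, assuming the diagonal covariance entries have already been learned (the PCA decorrelation step used in the paper is omitted here):

```python
import numpy as np

def shape_adjustment_prior(S, x_bar, sigma2):
    """Evaluate the shape adjustment prior of Eq. 8 over the candidate
    set S, with the covariance restricted to a diagonal whose entries
    are supplied in sigma2.

    S      : (N, 2n) candidate shapes
    x_bar  : (2n,) estimated sub-region center
    sigma2 : (2n,) per-dimension variances learned from training pairs
    """
    r = S - x_bar                            # shape residuals s - x_bar
    logp = -0.5 * (r ** 2 / sigma2).sum(axis=1)
    p = np.exp(logp - logp.max())            # numerically stable exponent
    return p / p.sum()                       # normalised over S
```

Candidates near the current center receive high prior mass; shrinking the variances between stages concentrates the search, matching the coarse-to-fine behaviour described above.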
The other factor, P(φ(x̄) | s - x̄), is referred to as the feature similarity likelihood. Following [5], we divide this factor into different facial parts:

P(φ(x̄) | s - x̄) = Π_j P(φ(x̄^{(j)}) | s^{(j)} - x̄^{(j)}),   (9)

where j represents the facial part index. The probabilistic independence comes from our conditioning on the given

¹ We assume E(x* - x̄) = 0.

exemplar candidate shapes s and
¯
x, and throughout our ap-
proach, all intermediate estimated poses are strictly shapes.
Again by applying Baye’s rule, we can rewrite Eq. 9 into
P (φ(
¯
x)|s
¯
x) =
Q
j
P (φ(
¯
x
(j)
))
Q
j
P (s
(j)
)
Y
j
P (s
(j)
¯
x
(j)
|φ(
¯
x
(j)
))
Y
j
P (s
(j)
¯
x
(j)
|φ(
¯
x
(j)
)),
(10)
which could be learned via discriminative mapping for each
facial part. This feature similarity likelihood aims to guide
shapes moving towards more plausible shape region, by
separately considering local appearance from each facial
part.
By combining the two factors, we form the probabilis-
tic estimate for the shape space and could sample candidate
shapes for next stage. Such probabilistic sampling enables
us to estimate current shape error and refine current esti-
mate via local appearance, while at the same time the shape
constraints are still strictly encoded.
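Combining the two factors into the posterior of Eq. 7 and sampling next-stage candidates can be sketched as follows; the prior and likelihood are passed in as precomputed per-candidate scores, whereas in the paper the likelihood comes from the learned per-part models of Eq. 10:

```python
import numpy as np

def sample_next_stage(S, prior, likelihood, n_samples, rng=None):
    """Form the posterior of Eq. 7 over the fixed candidate set S and
    draw initial shapes for the next searching stage.

    S          : (N, 2n) candidate shapes
    prior      : (N,) shape adjustment prior scores (Eq. 8)
    likelihood : (N,) feature similarity likelihood scores (Eq. 10)
    """
    rng = rng or np.random.default_rng()
    post = prior * likelihood                 # numerator of Eq. 7
    post = post / post.sum()                  # normalising denominator
    idx = rng.choice(len(S), size=n_samples, p=post)
    return S[idx], post
```

Because sampling is restricted to the fixed candidate set S, every drawn initialisation is a valid shape, which is how the shape constraints stay strictly encoded.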
3.4. Shape searching with hybrid features

In the conventional cascaded regression framework, one often selects a particular feature for regression, e.g. SIFT in [37]. The selection of features involves a tradeoff between alignment accuracy and speed. It can be observed from Fig. 2 that different features (e.g. HoG [14], SIFT [23], LBP [26], SURF [4], BRIEF [9]) exhibit different characteristics in accuracy and speed. It is clear that if one adheres to the SIFT feature throughout the whole regression procedure, the best performance of our method can be obtained. However, the run-time efficiency is much lower than that of the BRIEF feature.

Our coarse-to-fine shape searching framework is capable of exploiting different types of features at different stages, taking advantage of their specific characteristics, i.e. speed and accuracy. Based on the feature characteristics observed in Fig. 2, we can operate the coarse-to-fine framework in two different feature settings by switching features between searching stages:

CFSS - The SIFT feature is used in all stages to obtain the best accuracy of our approach.

CFSS-Practical - Since our framework only seeks a coarse shape sub-region in the early stages, relatively weaker features with much faster speed (e.g. BRIEF) are a better choice for the early stages, and SIFT is only used in the last stage for refinement. In our 3-stage implementation, we use the BRIEF feature in the first two stages, and SIFT in the last stage.

In the experiments we will demonstrate that CFSS-Practical performs competitively to CFSS, despite using the less accurate BRIEF feature for the first two stages. The
[Figure 2 plots: (a) regression curves, averaged output error vs. averaged initialisation error, for SURF, HoG, LBP, BRIEF and SIFT; (b) time cost per frame (ms) for each feature.]
Figure 2. We evaluate each feature's accuracy and speed using a validation set extracted from the training set. (a) We simulate different initial conditions with different initialisation errors to evaluate the averaged output error of cascaded regression. We ensure that the result has converged for each initialisation condition. (b) Comparison of the speed of various features measured under the same quantity of regression tasks.
CFSS enjoys such feature-switching flexibility thanks to the error tolerance of the searching framework. In particular, CFSS allows for less accurate shape sub-regions in the earlier searching stages, since subsequent stages can rapidly converge to the desired shape space location for target shape estimation.
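The two settings amount to a per-stage feature schedule; a minimal sketch, where the extractor names 'brief' and 'sift' are placeholders for feature implementations provided elsewhere (e.g. via an image-processing library):

```python
# Hypothetical per-stage feature schedule for the two settings described
# above, for the 3-stage implementation.
FEATURE_SCHEDULE = {
    'CFSS':           ['sift', 'sift', 'sift'],    # accuracy-first
    'CFSS-Practical': ['brief', 'brief', 'sift'],  # fast early stages
}

def feature_for_stage(setting, l):
    """Return the feature name used at searching stage l (1-indexed)."""
    return FEATURE_SCHEDULE[setting][l - 1]
```

Swapping a schedule entry is all it takes to trade accuracy for speed at a given stage, which is the flexibility the error tolerance affords.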
3.5. Time complexity analysis

The most time-consuming module is feature extraction, which directly influences the time complexity. We assume the complexity of feature extraction is O(F). The complexity of CFSS is thus O(F(L - 1 + Σ_{l=1}^{L} N_l K_l)). By applying the hybrid feature setting, the complexity reduces to O(F N_L K_L), since only the last searching stage utilises the more accurate feature, and the time spent on the fast feature contributes only a small fraction of the whole processing time. As shown in Sec. 4.2, the efficiency of the searching approach is of the same order of magnitude as the cascaded regression method, but with much more accurate prediction.
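Plugging the default parameters of Sec. 3.6 (L = 3, N_l = 15, K_l = 3) into these expressions, and treating the big-O arguments as exact call counts purely for illustration:

```python
def feature_calls(L, N, K, hybrid=False):
    """Illustrative count of expensive feature-extraction calls.

    Full setting:   L - 1 + sum_l N_l * K_l   (cf. O(F(L-1 + sum N_l K_l)))
    Hybrid setting: N_L * K_L                 (cf. O(F N_L K_L))
    N, K : per-stage lists of N_l and K_l values.
    """
    if hybrid:
        return N[-1] * K[-1]                   # expensive feature at last stage only
    return (L - 1) + sum(n * k for n, k in zip(N, K))

full_calls = feature_calls(3, [15, 15, 15], [3, 3, 3])                  # 2 + 135 = 137
hybrid_calls = feature_calls(3, [15, 15, 15], [3, 3, 3], hybrid=True)   # 45
```

Under these (illustrative) counts, the hybrid setting pays for roughly a third of the expensive feature extractions of the full setting, consistent with the claim that the fast feature contributes only a small fraction of the total cost.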
3.6. Implementation details

In practice, we use L = 3 searching stages in CFSS. Increasing the number of stages only leads to marginal improvement. The number of regressors, K_l, and the number of initial shapes, N_l, are set without optimisation. In general, we found that setting K_l = 3 and N_l = 15 works well for CFSS; only marginal improvement is obtained with larger values of K_l and N_l. For CFSS-Practical, we gain further run-time efficiency by reducing the regression iterations K_l and decreasing N_l without sacrificing too much accuracy. We choose K_l in the range of 1 to 2, and N_l in the range of 5 to 10. We observe that the alignment accuracy is not sensitive to these parameters. We set T = 10 in Eq. 6. β (in
References
[6] L. Breiman. Random Forests. Machine Learning, 2001.
[14] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005.
[23] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004.
[26] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. TPAMI, 2002.