
Active Appearance Models Revisited

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 135-164
TL;DR: This work proposes an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm, shows how the effects of appearance variation during fitting can be precomputed (“projected out”), and shows how the algorithm can be extended to include a global shape normalising warp.
Abstract: Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities. Fitting an AAM to an image consists of minimising the error between the input image and the closest model instance; i.e. solving a nonlinear optimisation problem. We propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. We show how the effects of appearance variation during fitting can be precomputed (“projected out”) using this algorithm, and how it can be extended to include a global shape normalising warp, typically a 2D similarity transformation. We evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.

Summary

1 Introduction

  • Active Appearance Models (AAMs) [Cootes et al., 2001], first proposed in [Cootes et al., 1998], and the closely related concepts of Active Blobs [Sclaroff and Isidoro, 1998] and Morphable Models [Vetter and Poggio, 1997, Jones and Poggio, 1998, Blanz and Vetter, 1999], are non-linear, generative, and parametric models of a certain visual phenomenon.
  • The parameters could be passed to a classifier to yield a face recognition algorithm.
  • The usual approach [Lanitis et al., 1997, Cootes et al., 1998, Cootes et al., 2001, Cootes, 2001] is to iteratively solve for incremental additive updates to the parameters (the shape and appearance coefficients.)
  • The inverse compositional algorithm is only applicable to sets of warps that form a group.
  • The linear shape variation of AAMs is often augmented by combining it with a 2D similarity transformation to “normalize” the shape.

2 Linear Shape and Appearance Models: AAMs

  • Active Appearance Models are just one instance in a large class of closely related linear shape and appearance models (and their associated fitting algorithms.)
  • The authors also wanted to avoid introducing any new, and potentially confusing, terminology.
  • One thing that is particularly confusing is that the terminology often refers to the combination of a model and a fitting algorithm.
  • In particular, the authors use the term AAM to refer to the model, independent of the fitting algorithm.
  • In essence there are just two types of linear shape and appearance models: those which model shape and appearance independently, and those which parameterize shape and appearance with a single set of linear parameters.

2.1.1 Shape

  • As the name suggests, independent AAMs model shape and appearance separately.
  • The shape of an independent AAM is defined by a mesh and in particular the vertex locations of the mesh.
  • Usually the mesh is triangulated (although there are ways to avoid triangulating the mesh by using thin plate splines rather than piecewise affine warping [Cootes, 2001].)
  • Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the vectors are orthonormal.
  • In the remainder of the figure, the base mesh s_0 is overlayed with arrows corresponding to each of the first four shape vectors s_1, s_2, s_3, and s_4.
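
The linear shape model and its PCA construction can be sketched as follows. This is an illustrative reimplementation with our own names (`build_shape_model`, `shape_instance`), not the authors' code:

```python
import numpy as np

def build_shape_model(training_shapes, n):
    """PCA on training meshes [Cootes et al., 2001]: the base shape s0 is
    the mean shape; the shape vectors are the n eigenvectors with the
    largest eigenvalues (rows of Vt, automatically orthonormal)."""
    X = np.asarray(training_shapes, dtype=float)   # (N, 2v) mesh coordinates
    s0 = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - s0, full_matrices=False)
    return s0, Vt[:n]

def shape_instance(s0, S, p):
    """Equation (2): s = s0 + sum_i p[i] * s_i."""
    return s0 + S.T @ p
```

With p = 0 the model instance is simply the mean shape, and the orthonormality of the rows of Vt is what permits the "linear reparameterization" assumption mentioned above.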

2.1.2 Appearance

  • The appearance of an independent AAM is defined within the base mesh s_0.
  • That way only pixels that are relevant to the phenomenon are modeled, and background pixels can be ignored.
  • Let s_0 also denote the set of pixels x = (x, y)^T that lie inside the base mesh s_0, a convenient abuse of terminology.
  • Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the images A_i are orthonormal.
  • The base appearance A_0 is set to be the mean image and the images A_i to be the m eigenimages corresponding to the m largest eigenvalues.
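
In the same spirit, a minimal sketch of the masked appearance model; the mask layout and names are our own illustration:

```python
import numpy as np

def appearance_instance(A0, A, lam, mask):
    """Equation (3): A(x) = A0(x) + sum_i lam[i] * A_i(x), defined only
    on the pixels x inside the base mesh s0.

    A0   : (k,) mean appearance over the k masked pixels
    A    : (m, k) orthonormal eigenimages (appearance vectors)
    lam  : (m,) appearance parameters
    mask : boolean image, True for pixels inside the base mesh
    Background pixels are left at zero: they are simply not modeled."""
    img = np.zeros(mask.shape)
    img[mask] = A0 + A.T @ lam
    return img
```

Storing only the k pixels inside the base mesh is what lets background pixels be ignored entirely.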

2.1.3 Model Instantiation

  • Equations (2) and (3) describe the AAM shape and appearance variation.
  • They do not describe how to generate a model instance.
  • The AAM model instance with shape parameters p and appearance parameters λ is then created by warping the appearance A from the base mesh s_0 to the model shape s.
  • In particular, the pair of meshes s_0 and s define a piecewise affine warp from s_0 to s.
  • Thin plate splines could be used instead [Cootes, 2001].

2.2 Combined AAMs

  • Independent AAMs have separate shape and appearance parameters.
  • See the discussion at the end of this paper.
  • On the other hand, combined AAMs have a number of advantages.
  • First, the combined formulation is more general and is a strict superset of the independent formulation.
  • Since the authors will project out the appearance variation, as discussed in Section 4.2, the computational cost of their new algorithm is mainly just dependent on the number of shape parameters n and does not depend significantly on the number of appearance parameters m.

2.3.1 Fitting Goal

  • Suppose the authors are given an input image I(x) that they wish to fit an AAM to.
  • Suppose for now that the authors know the optimal shape and appearance parameters in the fit.
  • This means that the input image I(x) and the model instance M(W(x; p)) must be similar.
  • At the pixel W(x; p), the input image has the intensity I(W(x; p)).
  • The error image is defined in the coordinate frame of the AAM and can be computed as follows.
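
Under these definitions, the error image (one residual per base-mesh pixel) might be computed as below; this is a sketch that assumes the input image has already been backward-warped onto the base mesh:

```python
import numpy as np

def error_image(I_warped, A0, A, lam):
    """E(x) = A0(x) + sum_i lam[i] * A_i(x) - I(W(x; p)), computed in
    the coordinate frame of the AAM (the base mesh s0).

    I_warped : (k,) input image backward-warped onto the base mesh pixels
    A0       : (k,) base appearance
    A        : (m, k) appearance eigenimages
    lam      : (m,) appearance parameters"""
    return A0 + A.T @ lam - I_warped
```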

2.3.2 Inefficient Gradient Descent Algorithms

  • Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm.
  • Levenberg-Marquardt was used in [Sclaroff and Isidoro, 1998] and a stochastic gradient descent algorithm was used in [Jones and Poggio, 1998, Blanz and Vetter, 1999].
  • The advantage of these algorithms is that they use a principled, analytical algorithm, the convergence properties of which are well understood.
  • The disadvantage of these gradient descent algorithms is that they are very slow.
  • The partial derivatives, Hessian, and gradient direction all need to be recomputed in each iteration.

2.3.3 Efficient Ad-Hoc Fitting Algorithms

  • Because all previous gradient descent algorithms are so slow, a considerable amount of effort has been devoted in the past to developing other fitting algorithms that are more efficient [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998].
  • To improve the efficiency, previous AAM fitting algorithms such as [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998] have either explicitly or implicitly simply assumed that the linear relationship between the error image and the parameter updates is constant; i.e. does not depend on the model parameters.
  • In the counterexample, the same error image arises from a second perturbation of the mesh in the same direction, but this time by not as far.
  • Similarly, although in this counterexample the direction of Δp is correct and it is just the magnitude that is wrong, other counterexamples can be provided where the error images are the same, but the directions of Δp are different.
  • (The use of difference decomposition in [Sclaroff and Isidoro, 1998] makes the constant linear assumption in Equation (24) of that paper.)

3 Efficient Gradient Descent Image Alignment

  • As described above, existing AAM fitting algorithms fall into one of two categories.
  • Instead it is possible to update the entire warp by composing the current warp with the computed incremental warp W(x; Δp).
  • In particular, it is possible to update: W(x; p) ← W(x; p) ∘ W(x; Δp) (11). This compositional approach is different, yet provably equivalent, to the usual additive approach [Baker and Matthews, 2003].

3.1 Lucas-Kanade Image Alignment

  • The goal of image alignment is to find the location of a constant template image in an input image.
  • The goal of Lucas-Kanade is to find the locally “best” alignment by minimizing the sum of squares difference between a constant template image T(x) and an example image I(x) with respect to the warp parameters p: Σ_x [I(W(x; p)) − T(x)]² (12). Note the similarity with Equation (7).
  • As in Section 2 above, W(x; p) is a warp that maps the pixels x from the template (i.e. the base mesh) image to the input image and has parameters p.
  • Solving for p is a nonlinear optimization problem, even if W(x; p) is linear in p, because, in general, the pixel values I(x) are nonlinear in (and essentially unrelated to) the pixel coordinates x.
  • In [Baker and Matthews, 2003] the authors refer to this as the forwards-additive algorithm.
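
To make the forwards-additive (Lucas-Kanade) iteration concrete, here is a toy version for a pure-translation warp W(x; p) = x + p. The paper uses piecewise affine warps; the translation warp, nearest-neighbour sampling, and all names are our simplifications:

```python
import numpy as np

def lucas_kanade_translation(T, I, p, iters=20):
    """Forwards-additive Lucas-Kanade for a translation-only warp
    W(x; p) = x + p. Minimizes sum_x [I(W(x; p)) - T(x)]^2 over
    p = (tx, ty) by Gauss-Newton."""
    p = np.asarray(p, dtype=float)
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    for _ in range(iters):
        # Warp I backwards onto the template frame (nearest neighbour for brevity).
        yi = np.clip(np.round(ys + p[1]).astype(int), 0, I.shape[0] - 1)
        xi = np.clip(np.round(xs + p[0]).astype(int), 0, I.shape[1] - 1)
        Iw = I[yi, xi]
        err = (Iw - T).ravel()
        # For a translation warp the Jacobian dW/dp is the identity, so the
        # steepest descent images are just the warped-image gradients.
        gy, gx = np.gradient(Iw)
        SD = np.stack([gx.ravel(), gy.ravel()], axis=1)
        H = SD.T @ SD                       # Gauss-Newton Hessian, recomputed
        dp = np.linalg.solve(H, -SD.T @ err)
        p = p + dp                          # additive update
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```

Note that the gradients, steepest descent images, and Hessian are all recomputed every iteration, which is exactly the inefficiency the compositional algorithms below address.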

3.2 Forwards Compositional Image Alignment

  • In the Lucas-Kanade algorithm the warp parameters are computed by estimating an additive offset Δp from the current warp parameters p.
  • The compositional framework instead computes an incremental warp W(x; Δp) to be composed with the current warp W(x; p).
  • There are then two differences between Equation (19) and Equation (14).
  • The composition update step is computationally more costly than the update step for an additive algorithm, but this is offset by not having to compute the Jacobian in each iteration.
  • The key point in the forwards compositional algorithm, illustrated in Figure 6(a), is that the incremental warp is computed with respect to the template coordinate frame each time, so the Jacobian of the warp is always evaluated at p = 0 and is constant.

3.3 Inverse Compositional Image Alignment

  • The inverse compositional algorithm is a modification of the forwards compositional algorithm where the roles of the template and example image are reversed.
  • Rather than computing the incremental warp with respect to the image I(W(x; p)), it is computed with respect to the template T(x).
  • See Figure 7 for the details of the algorithm.
  • The only additional computation is Steps 8 and 9 which are very efficient.
  • These two steps essentially correct for the current estimates of the parameters and avoid the problem illustrated in Figure 4.
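
For contrast with the Lucas-Kanade sketch above, the same toy translation warp under the inverse compositional scheme; again our own simplification, not the paper's piecewise affine implementation. The point to notice is that the steepest descent images and the Hessian are computed once from the template and never re-derived:

```python
import numpy as np

def inverse_compositional_translation(T, I, p, iters=20):
    """Inverse compositional alignment for a translation-only warp (toy case).
    The incremental warp is computed on the template T, so the gradients,
    steepest descent images, and Hessian are all PRECOMPUTED."""
    p = np.asarray(p, dtype=float)
    gy, gx = np.gradient(T)
    SD = np.stack([gx.ravel(), gy.ravel()], axis=1)  # precomputed once
    H_inv = np.linalg.inv(SD.T @ SD)                 # precomputed once
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    for _ in range(iters):
        yi = np.clip(np.round(ys + p[1]).astype(int), 0, I.shape[0] - 1)
        xi = np.clip(np.round(xs + p[0]).astype(int), 0, I.shape[1] - 1)
        err = (I[yi, xi] - T).ravel()
        dp = H_inv @ (SD.T @ err)
        # Compose: W(x; p) <- W(x; p) o W(x; dp)^(-1). For translations
        # this reduces to p <- p - dp; for piecewise affine warps the
        # inversion and composition are only defined to first order.
        p = p - dp
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```

The per-iteration work is now just one warp, one image difference, and one small matrix-vector product.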

4 Applying the Inverse Compositional Algorithm to AAMs

  • The authors now show how the inverse compositional algorithm can be applied to independent AAMs.
  • The algorithm does not apply to combined AAMs.
  • See Section 6.2 for more discussion of why not.

4.1 Application Without Appearance Variation

  • The authors first describe how the algorithm applies without any appearance variation; i.e. when m = 0 and the model appearance is just A_0(x).
  • Comparing Equation (7) with Equation (12) the authors see that if there is no appearance variation, the inverse compositional algorithm applies as is.
  • Examining Figure 7 the authors find that most of the steps in the algorithm are standard vector, matrix, and image operations such as computing image gradients and image differences.
  • The only non-standard steps are: Step 1, warping with the piecewise affine warp W(x; p); Step 4, computing the Jacobian of the piecewise affine warp; and Step 9, inverting the incremental piecewise affine warp and composing it with the current estimate of the piecewise affine warp.
  • The authors now describe how each of these steps is performed.

4.1.1 Piecewise Affine Warping

  • The image I(W(x; p)) is computed by backwards warping the input image I with the warp W(x; p); i.e. for each pixel x in the base mesh s_0 the authors compute W(x; p) and sample (bilinearly interpolate) the image I at that location.
  • Suppose that the vertices of the triangle containing x are (x_i, y_i), (x_j, y_j), and (x_k, y_k).
  • These vertices can be computed from the shape parameters using Equation (2).
  • One way to implement the piecewise affine warp is illustrated in Figure 8.
  • This computation only needs to be performed once per triangle, not once per pixel.
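
The once-per-triangle computation can be sketched as solving a small linear system for each triangle's affine map; this is our own minimal version, not the paper's implementation:

```python
import numpy as np

def triangle_affine(tri_src, tri_dst):
    """Affine map taking the 3 vertices of tri_src to tri_dst, solved once
    per triangle and then applied to every pixel inside that triangle.
    Returns a 2x3 matrix M with [x', y']^T = M @ [x, y, 1]^T."""
    src = np.hstack([tri_src, np.ones((3, 1))])  # (3, 3): rows [x, y, 1]
    M = np.linalg.solve(src, tri_dst)            # (3, 2): src @ M = tri_dst
    return M.T                                   # (2, 3)

def warp_points(M, pts):
    """Apply the 2x3 affine matrix to an (N, 2) array of coordinates."""
    return pts @ M[:, :2].T + M[:, 2]
```

Iterating over triangles and bilinearly sampling the input image at the warped coordinates yields the backwards-warped image I(W(x; p)).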

4.1.2 Computing the Warp Jacobian

  • The destination of the pixel x under the piecewise affine warp W(x; p) depends on the AAM shape parameters p through the vertices of the mesh s.
  • From Equation (1) remember that these vertices are denoted (x_1, y_1, ..., x_v, y_v).
  • The first components of the Jacobian are ∂W/∂x_i and ∂W/∂y_i, the Jacobians of the warp with respect to the vertices of the mesh s.
  • By the chain rule, ∂W/∂p_i = Σ_j [ (∂W/∂x_j) s_i^{x_j} + (∂W/∂y_j) s_i^{y_j} ] (32), where s_i^{x_j} denotes the component of s_i that corresponds to x_j, and similarly for s_i^{y_j}; differentiating Equation (2) gives these components.
  • The resulting array of (base mesh shaped) images corresponds to the Jacobians for the four shape vectors in Figure 1.

4.1.3 Warp Inversion

  • It therefore follows that, to first order in Δp, W(x; Δp)^{-1} = W(x; −Δp) (35). Note that the two Jacobians in Equation (34) are not evaluated at exactly the same location, but since they are evaluated at points only Δp apart, they are equal to zeroth order in Δp.
  • Since the difference is multiplied by Δp, the authors can ignore the first and higher order terms.
  • Also note that the composition of two warps is not strictly defined and so the argument in Equation (34) is informal.
  • The essence of the argument is correct, however.
  • Once the authors have derived the first order approximation to the composition of two piecewise affine warps below, they can then use that definition of composition in the argument above.

4.1.4 Composing the Incremental Warp with the Current Warp Estimate

  • Given the current estimate of the parameters p, the authors can compute the current mesh vertex locations using Equation (2).
  • Given the incremental parameters Δp, the authors can use Equation (2) again to estimate the corresponding changes to the base mesh vertex locations: Δs_0 = Σ_i Δp_i s_i (36), where Δs_0 = (Δx_1^0, Δy_1^0, ..., Δx_v^0, Δy_v^0)^T are the changes to the base mesh vertex locations corresponding to W(x; Δp).
  • The situation is then as illustrated in Figure 11.
  • Now consider any of the mesh triangles that contains the vertex.
  • For this triangle there is an affine warp between the base mesh s_0 and the current mesh s.

4.2 Including Appearance Variation

  • The authors have now described all of the steps needed to apply the inverse compositional algorithm to an independent AAM assuming that there is no appearance variation.
  • More generally, the authors wish to use the same algorithm to minimize the expression in Equation (7).
  • The first of the two terms immediately simplifies.
  • Since the norm only considers the components of vectors in the orthogonal complement of the subspace spanned by the appearance images A_i, any component in that subspace can be dropped.
  • (The error image does not need to be projected into this subspace because Step 7 of the algorithm is really the dot product of the error image with the projected steepest descent images, which already lie in that subspace.)
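
The "project out" operation on the steepest descent images can be sketched as a single orthogonal projection, performed once at precomputation time; the matrix layout is our own choice:

```python
import numpy as np

def project_out(SD, A):
    """Remove the appearance subspace from the steepest descent images.

    SD : (k, n) steepest descent images, one column per shape parameter
    A  : (m, k) ORTHONORMAL appearance eigenimages (rows)
    Returns SD with its component in span{A_i} subtracted, so the
    subsequent least-squares fit takes place entirely in the orthogonal
    complement of the appearance subspace."""
    return SD - A.T @ (A @ SD)
```

Because this happens at precomputation time, the per-iteration cost no longer grows with the number of appearance parameters m.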

4.3 Including a Global Shape Transform

  • The most common way of constructing an AAM [Cootes et al., 2001] consists of first “normalizing” the mesh so that it is as close as possible to the base mesh [Cootes et al., 2001].
  • Typically, a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.
  • Because the training data is normalized in this way, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.
  • To avoid this problem, the linear shape variation is typically augmented with a global shape transformation in the following manner.

4.3.1 Adding a Global Shape Transform to an AAM

  • Finally, the last parameter encodes the rotation, again parameterized so that the parameter is zero when there is no rotation.
  • Other natural choices are affine warps and homographies [Bergen et al., 1992].
  • Note that the above is not the only way to parameterize the set of 2D similarity transformations.
  • The shape vector then “moves” the mouth vertices up and down in the image.
  • This shape variation is only possible if the linear AAM shape variation is followed by a global shape transformation, as defined in Equation (44).
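
One common parameterization of the global 2D similarity transform, chosen so that the identity corresponds to all-zero parameters, can be sketched as follows; the exact symbols are our assumption:

```python
import numpy as np

def similarity_warp(q, pts):
    """2D similarity transform N(x; q) with q = (a, b, tx, ty), where
    a = k*cos(theta) - 1 and b = k*sin(theta) (k = scale, theta = rotation),
    so that q = 0 gives the identity warp.

    pts : (N, 2) array of (x, y) coordinates."""
    a, b, tx, ty = q
    R = np.array([[1.0 + a, -b],
                  [b, 1.0 + a]])
    return pts @ R.T + np.array([tx, ty])
```

Parameterizing around the identity is what makes the incremental warps N(x; Δq) well behaved near Δq = 0.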

4.3.2 Fitting an AAM with a Global Shape Transform

  • The authors now briefly describe how the inverse compositional algorithm can be used to fit an AAM with a global shape transformation; i.e. how to apply the inverse compositional algorithm to the composed warp N(W(x; p); q) (46) rather than the warp W(x; p), since this is slightly simpler.
  • The authors just compute separate Jacobians for W(x; p) and N(x; q), for the p and q parameters respectively.
  • There are a variety of ways of performing the composition of the two warps.
  • These are then converted to changes in the destination mesh vertex locations by applying the affine warp of N(W(x; p); q) to each triangle and then averaging the results.
  • The equivalent of Equation (37) is then used to compute the new values of p and q.

4.4 Other Extensions to the Algorithm

  • The authors have described how the inverse compositional image alignment algorithm can be applied to AAMs.
  • The field of image alignment is well studied and over the years a number of extensions and heuristics have been developed to improve the performance of the algorithms.
  • The fitting algorithm can be applied hierarchically on a Gaussian image pyramid to reduce the likelihood of falling into a local minimum [Bergen et al., 1992].
  • See [Baker and Matthews, 2003] for the details of that algorithm.

5 Empirical Evaluation

  • The authors have proposed a new fitting algorithm for AAMs.
  • The performance can also be very dependent on minor details such as the definition of the gradient filter used to compute the image gradients.
  • In their evaluation the authors take the following philosophy.
  • The authors compare their AAM fitting algorithm with and without each of these changes.
  • Every other line of code in the implementation is always exactly the same.

5.1 Generating the Inputs

  • The authors begin by taking a video of a person moving their head (both rigidly and non-rigidly.)
  • The authors avoid this problem by overlaying the reconstructed AAM over the original movie.
  • By comparing the performance of the algorithms on these two movies the authors should be able to detect any bias in the ground-truth.
  • Empirically the authors found almost no difference between the performance of any of the algorithms on the corresponding real and synthetic sequences and conclude there is no bias.
  • The shape parameters are randomly generated from independent Gaussian distributions with variance equal to the eigenvalue of that mode in the PCA performed during AAM construction.

5.2 The Evaluation Metrics

  • Given the initial parameters, the AAM fitting algorithm should hopefully converge to the ground-truth parameters.
  • The first measure is the average rate of convergence.
  • The authors average these graphs (for approximately the same starting error) over all cases where all algorithms converge.
  • The authors say that an algorithm converges if the RMS mesh point error is less than 9 pixels after 20 iterations.
  • Also, comparing the appearance algorithms by their appearance estimates is not possible because the appearance is not computed until the “project out” algorithm has converged.
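
The RMS mesh point error used as the convergence criterion can be computed as below (a straightforward sketch; names are ours):

```python
import numpy as np

def rms_point_error(mesh, mesh_gt):
    """Root-mean-square distance between fitted and ground-truth mesh
    vertices. An algorithm is declared converged if this error falls
    below a fixed pixel threshold after a fixed number of iterations.

    mesh, mesh_gt : (v, 2) arrays of vertex (x, y) locations."""
    d = mesh - mesh_gt                     # per-vertex offsets
    return np.sqrt(np.mean(np.sum(d * d, axis=1)))
```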

5.3 Experiment 1: The Update Rule

  • In their first experiment the authors compare the inverse compositional update in Step 9 of the algorithm with the usual additive update p ← p + Δp.
  • The results without the global similarity transform are included in Figure 14 and the results with it are included in Figure 15.
  • Either the rate of convergence is faster or the frequency of convergence is higher.
  • These results also illustrate that the inverse compositional algorithm does not always outperform the additive algorithm; it performs better in many scenarios and comparably in others.

5.4 Experiment 2: Computation of the Steepest Descent Images

  • The inverse compositional algorithm uses the analytically derived steepest descent images ∇A_0 ∂W/∂p.
  • (All other aspects of the algorithm are exactly the same; specifically, the authors use the inverse compositional update in both cases.)
  • The results in Figure 16 show that the analytically derived steepest descent images perform significantly better.

5.5 Experiment 3: Appearance Variation

  • In their algorithm the authors “project out” the appearance variation.
  • There are potential benefits to iteratively updating the appearance parameters.
  • The steepest descent image for the i-th appearance parameter λ_i is A_i(x).
  • The authors plot results comparing the “project out appearance” approach with the “explicitly model appearance” approach in Figure 17.
  • The results in Figure 17 show that the two approaches perform almost identically.

5.6 Computational Efficiency

  • One concern with the inverse compositional algorithm is that the time taken to perform the inverse compositional update might be quite long.
  • The actual calculation is minimal because the number of vertices in the mesh is far less than the number of pixels.
  • The results were obtained on a dual 2.4GHz P4 machine and are for a model with 19,977 pixels, 3 shape parameters, 4 similarity parameters, and 9 appearance parameters.
  • Overall their Matlab implementation of the inverse compositional algorithm operates at approximately 5Hz.
  • The authors' “C” implementation operates at approximately 150Hz on the same machine.

6.1 Summary

  • In this paper the authors have proposed an algorithm for fitting AAMs that has the advantages of both types of algorithms.
  • Overall their algorithm outperforms previous approaches in terms of: (1) the speed of convergence (far fewer iterations are needed to converge to any given accuracy), (2) the frequency of convergence (the algorithm is more likely to converge from a large distance away), and (3) the computational cost (the algorithm is far faster because the appearance variation is projected out.)

6.2 Discussion

  • The inverse compositional AAM fitting algorithm can only be applied to independent AAMs.
  • It cannot be applied to combined AAMs which parameterize the shape and appearance variation with a single set of parameters and so introduce a coupling between shape and appearance.
  • Although it might seem that the single combined parameterization would yield a lower dimensional optimization, in practice it does not: the nonlinear optimization in their algorithm is only over the n shape parameters and so is actually lower dimensional than the equivalent combined AAM optimization, which would have more than n parameters.
  • Currently the authors do not see a way to extend their algorithm to combined AAMs, but of course they may be wrong.

Acknowledgments

  • The authors thank Tim Cootes for discussions on the incorporation of the global shape transformation in Section 4.3.
  • Elements of the AAM fitting algorithm appeared in [Baker and Matthews, 2001].
  • The authors thank the reviewers of [Baker and Matthews, 2001] for their feedback.
  • The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.
  • Additional support was provided through U.S. Department of Defense award N41756-03-C4024.


Active Appearance Models Revisited
Iain Matthews and Simon Baker
CMU-RI-TR-03-02
The Robotics Institute
Carnegie Mellon University
Abstract
Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and
Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape
and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities.
Fitting an AAM to an image consists of minimizing the error between the input image and the clos-
est model instance; i.e. solving a nonlinear optimization problem. We propose an efficient fitting
algorithm for AAMs based on the inverse compositional image alignment algorithm. We show how
the appearance variation can be “projected out” using this algorithm and how the algorithm can
be extended to include a “shape normalizing” warp, typically a 2D similarity transformation. We
evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.
Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, effi-
ciency, Gauss-Newton gradient descent, inverse compositional image alignment.


1 Introduction
Active Appearance Models (AAMs) [Cootes et al., 2001], first proposed in [Cootes et al., 1998],
and the closely related concepts of Active Blobs [Sclaroff and Isidoro, 1998] and Morphable Mod-
els [Vetter and Poggio, 1997, Jones and Poggio, 1998, Blanz and Vetter, 1999], are non-linear,
generative, and parametric models of a certain visual phenomenon. The most frequent application
of AAMs to date has been face modeling [Lanitis et al., 1997]. However, AAMs may be useful for
other phenomena too [Sclaroff and Isidoro, 1998, Jones and Poggio, 1998]. Typically an AAM is
first fit to an image of a face; i.e. the model parameters are found to maximize the “match” between
the model instance and the input image. The model parameters are then used in whatever the ap-
plication is. For example, the parameters could be passed to a classifier to yield a face recognition
algorithm. Many different classification tasks are possible. In [Lanitis et al., 1997], for example,
the same model was used for face recognition, pose estimation, and expression recognition. AAMs
are a general-purpose image coding scheme, like Principal Components Analysis, but non-linear.
Fitting an AAM to an image is a non-linear optimization problem. The usual approach [Lanitis
et al., 1997, Cootes et al., 1998, Cootes et al., 2001, Cootes, 2001] is to iteratively solve for
incremental additive updates to the parameters (the shape and appearance coefficients.) Given the
current estimates of the shape parameters, it is possible to warp the input image backwards onto
the model coordinate frame and then compute an error image between the current model instance
and the image that the AAM is being fit to. In most previous algorithms, it is simply assumed
that there is a constant linear relationship between this error image and the additive incremental
updates to the parameters. The constant coefficients in this linear relationship can then be found
either by linear regression [Lanitis et al., 1997] or by other numerical methods [Cootes, 2001].
Unfortunately the assumption that there is such a simple relationship between the error image
and the appropriate update to the model parameters is in general incorrect. See Section 2.3.3 for a
counterexample. The result is that existing AAM fitting algorithms perform poorly, both in terms
of the number of iterations required to converge, and in terms of the accuracy of the final fit. In
this paper we propose a new analytical AAM fitting algorithm that does not make this simplifying
assumption. Our algorithm is based on an extension to the inverse compositional image alignment
algorithm [Baker and Matthews, 2001, Baker and Matthews, 2003]. The inverse compositional
algorithm is only applicable to sets of warps that form a group. Unfortunately, the set of piecewise
affine warps used in AAMs does not form a group. Hence, to use the inverse compositional algo-
rithm, we derive first order approximations to the group operators of composition and inversion.
Using the inverse compositional algorithm also allows a different treatment of the appearance
variation. Using the approach proposed in [Hager and Belhumeur, 1998], we project out the ap-
pearance variation thereby eliminating a great deal of computation. This modification is closely

related to “shape AAMs” [Cootes and Kittipanya-ngam, 2002]. Another feature that we include is
shape normalization. The linear shape variation of AAMs is often augmented by combining it with
a 2D similarity transformation to “normalize” the shape. We show how the inverse compositional
algorithm can be used to simultaneously fit the combination of the two warps (the linear AAM
shape variation and a following global shape transformation, usually a 2D similarity transform.)
2 Linear Shape and Appearance Models: AAMs
Although they are perhaps the most well-known example, Active Appearance Models are just
one instance in a large class of closely related linear shape and appearance models (and their
associated fitting algorithms.) This class contains Active Appearance Models (AAMs) [Cootes et
al., 2001, Cootes et al., 1998, Lanitis et al., 1997], Shape AAMs [Cootes and Kittipanya-ngam,
2002], Active Blobs [Sclaroff and Isidoro, 1998], Morphable Models [Vetter and Poggio, 1997,
Jones and Poggio, 1998, Blanz and Vetter, 1999], and Direct Appearance Models [Hou et al., 2001],
as well as possibly others. Many of these models were proposed independently in 1997-1998
[Lanitis et al., 1997, Vetter and Poggio, 1997, Cootes et al., 1998, Sclaroff and Isidoro, 1998, Jones
and Poggio, 1998]. In this paper we use the term “Active Appearance Model” to refer generically to
the entire class of linear shape and appearance models. We chose the term AAM rather than Active
Blob or Morphable Model solely because it seems to have stuck better in the vision literature, not
because the term was introduced earlier or because AAMs have any particular technical advantage.
We also wanted to avoid introducing any new, and potentially confusing, terminology.
Unfortunately the previous literature is already very confusing. One thing that is particularly
confusing is that the terminology often refers to the combination of a model and a fitting algorithm.
For example, Active Appearance Models [Cootes et al., 2001] strictly only refers to a specific
model and an algorithm for fitting that model. Similarly, Direct Appearance Models [Hou et al.,
2001] refers to a different model-fitting algorithm pair. In order to simplify the terminology we
will make a clear distinction between models and algorithms, even though this sometimes means
we will have to abuse previous terminology. In particular, we use the term AAM to refer to the
model, independent of the fitting algorithm. We also use AAM to refer to a slightly larger class of
models than that described in [Cootes et al., 2001]. We hope that this simplifies the situation.
In essence there are just two types of linear shape and appearance models, those which model
shape and appearance independently, and those which parameterize shape and appearance with a
single set of linear parameters. We refer to the first set as independent linear shape and appearance
models and the second as combined shape and appearance models. Since the name AAM has
stuck, we will also refer to the first set as independent AAMs and the second as combined AAMs.

    
Figure 1: The linear shape model of an independent AAM. The model consists of a triangulated base mesh s_0 plus a linear combination of n shape vectors s_i. See Equation (2). Here we display the base mesh s_0 on the far left and to the right four shape vectors s_1, s_2, s_3, and s_4 overlayed on the base mesh.
2.1 Independent AAMs
2.1.1 Shape
As the name suggests, independent AAMs model shape and appearance separately. The shape
of an independent AAM is defined by a mesh and in particular the vertex locations of the mesh.
Usually the mesh is triangulated (although there are ways to avoid triangulating the mesh by using
thin plate splines rather than piecewise affine warping [Cootes, 2001].) Mathematically, we define
the shape
of an AAM as the coordinates of the
vertices that make up the mesh:
  !
(1)
See Figure 1 for an example mesh. AAMs allow linear shape variation. This means that the shape
s can be expressed as a base shape s_0 plus a linear combination of n shape vectors s_i:

    s = s_0 + \sum_{i=1}^{n} p_i s_i                                       (2)

In this expression the coefficients p_i are the shape parameters. Since we can easily perform a linear
reparameterization, wherever necessary we assume that the vectors s_i are orthonormal.
AAMs are normally computed from training data. The standard approach is to apply Principal
Component Analysis (PCA) to the training meshes [Cootes et al., 2001]. The base shape s_0 is the
mean shape and the vectors s_i are the n eigenvectors corresponding to the n largest eigenvalues.
An example independent AAM shape model is shown in Figure 1. On the left of the figure, we
plot the triangulated base mesh s_0. In the remainder of the figure, the base mesh s_0 is overlayed
with arrows corresponding to each of the first four shape vectors s_1, s_2, s_3, and s_4.
2.1.2 Appearance
The appearance of an independent AAM is defined within the base mesh s_0. That way only pixels
that are relevant to the phenomenon are modeled, and background pixels can be ignored. Let s_0
also denote the set of pixels x = (x, y)^T that lie inside the base mesh s_0, a convenient abuse of

Citations
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. 
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations


Cites background from "Active Appearance Models Revisited"

  • ...…(Lucas and Kanade 1981; Shi and Tomasi 1994; Rehg and Kanade 1994), often applied to tracking faces (Figure 1.9d) (Lanitis, Taylor, and Cootes 1997; Matthews and Baker 2004; Matthews, Xiao, and Baker 2007) and whole bodies (Sidenbladh, Black, and Fleet 2000; Hilton, Fua, and Ronfard 2006;…...


Proceedings ArticleDOI
13 Jun 2010
TL;DR: The Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data.
Abstract: In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) While AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed, 2) The lack of a common performance metric against which to evaluate new algorithms, and 3) Standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded and emotion labels have been revised and validated. In addition to this, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks will be made available July 2010.

3,439 citations


Cites methods from "Active Appearance Models Revisited"

  • ...Keyframes within each video sequence were manually labelled, while the remaining frames were automatically aligned using a gradient descent AAM fitting algorithm described in [18]....


Journal ArticleDOI
TL;DR: An overview of image alignment is presented in a consistent framework, concentrating on the inverse compositional algorithm and examining which extensions to the Lucas-Kanade algorithm can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

3,168 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: It is shown that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures, in real-world, cluttered images.
Abstract: We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixtures of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggests our system advances the state-of-the-art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).

2,340 citations


Cites background from "Active Appearance Models Revisited"

  • ...Facial landmark estimation dates back to the classic approaches of Active Appearance Models (AAMs) [9, 26] and elastic graph matching [25, 39]....


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A Supervised Descent Method (SDM) is proposed for minimizing a Non-linear Least Squares (NLS) function and achieves state-of-the-art performance in the problem of facial feature detection.
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian nor the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.

2,138 citations


Cites background from "Active Appearance Models Revisited"

  • ...A major difference between SDM and discriminative method to fit AAMs [11], is that [11] only uses one step regression, which as shown in our experiments leads to lower performance....


  • ...The ground truth is given by a person-specific AAMs [23]....


  • ...Constrained Local Models (CLM) [13] model this prior similarly as AAMs assuming all faces lie in a linear subspace expanded by PCA bases....


  • ...Cootes et al. [11] proposed to fit AAMs by learning a linear regression between the increment of motion parameters ∆p and the appearance differences ∆d....


References
Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.

12,944 citations

Journal ArticleDOI
Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

6,200 citations

Proceedings ArticleDOI
01 Jul 1999
TL;DR: A new technique for modeling textured 3D faces by transforming the shape and texture of the examples into a vector space representation, which regulates the naturalness of modeled faces avoiding faces with an “unlikely” appearance.
Abstract: In this paper, a new technique for modeling textured 3D faces is introduced. 3D faces can either be generated automatically from one or more photographs, or modeled directly through an intuitive user interface. Users are assisted in two key problems of computer aided face modeling. First, new face images or new 3D face models can be registered automatically by computing dense one-to-one correspondence to an internal face model. Second, the approach regulates the naturalness of modeled faces avoiding faces with an “unlikely” appearance. Starting from an example set of 3D face models, we derive a morphable face model by transforming the shape and texture of the examples into a vector space representation. New faces and expressions can be modeled by forming linear combinations of the prototypes. Shape and texture constraints derived from the statistics of our example faces are used to guide manual modeling or automated matching algorithms. We show 3D face reconstructions from single images and their applications for photo-realistic image manipulations. We also demonstrate face manipulations according to complex parameters such as gender, fullness of a face or its distinctiveness.

4,514 citations


"Active Appearance Models Revisited" refers background or methods in this paper

  • ...This class includes Active Appearance Models (AAMs) (Cootes et al., 1998a, 2001; Edwards, 1999; Edwards et al., 1998; Lanitis et al., 1997), Shape AAMs (Cootes et al., 1998b; Cootes and Kittipanyangam, 2002; Cootes et al., 2001), Direct Appearance Models (Hou et al., 2001), Active Blobs (Sclaroff and Isidoro, 2003), and Morphable Models (Blanz and Vetter, 1999; Jones and Poggio, 1998; Vetter and Poggio, 1997) as well as possibly others....


  • ...Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, efficiency, GaussNewton gradient descent, inverse compositional image alignment...


  • ...…Shape AAMs (Cootes et al., 1998b; Cootes and Kittipanyangam, 2002; Cootes et al., 2001), Direct Appearance Models (Hou et al., 2001), Active Blobs (Sclaroff and Isidoro, 2003), and Morphable Models (Blanz and Vetter, 1999; Jones and Poggio, 1998; Vetter and Poggio, 1997) as well as possibly others....


  • ...For example, Levenberg-Marquardt was used in Sclaroff and Isidoro (1998) and a stochastic gradient descent algorithm was used in Blanz and Vetter (1999) and Jones and Poggio (1998)....


  • ...Another thing that makes empirical evaluation hard is the wide variety of AAM fitting algorithms (Blanz and Vetter, 1999; Cootes et al., 1998a, 2001; Jones and Poggio, 1998; Sclaroff and Isidoro, 1998) and the lack of a standard test set....


Journal ArticleDOI
TL;DR: An overview of image alignment is presented in a consistent framework, concentrating on the inverse compositional algorithm and examining which extensions to the Lucas-Kanade algorithm can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

3,168 citations

Journal ArticleDOI
TL;DR: This work presents tools for hierarchical clustering of imaged objects according to the shapes of their boundaries, learning of probability models for clusters of shapes, and testing of newly observed shapes under competing probability models.
Abstract: Using a differential-geometric treatment of planar shapes, we present tools for: 1) hierarchical clustering of imaged objects according to the shapes of their boundaries, 2) learning of probability models for clusters of shapes, and 3) testing of newly observed shapes under competing probability models. Clustering at any level of hierarchy is performed using a minimum variance type criterion and a Markov process. Statistical means of clusters provide shapes to be clustered at the next higher level, thus building a hierarchy of shapes. Using finite-dimensional approximations of spaces tangent to the shape space at sample means, we (implicitly) impose probability models on the shape space, and results are illustrated via random sampling and classification (hypothesis testing). Together, hierarchical clustering and hypothesis testing provide an efficient framework for shape retrieval. Examples are presented using shapes and images from ETH, Surrey, and AMCOM databases.

2,858 citations

Frequently Asked Questions (10)
Q1. What are the contributions in "Active appearance models revisited" ?

The authors propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. The authors show how the appearance variation can be “ projected out ” using this algorithm and how the algorithm can be extended to include a “ shape normalizing ” warp, typically a 2D similarity transformation. The authors evaluate their algorithm to determine which of its novel aspects improve AAM fitting performance. 

In future work the authors hope to look into these questions and determine which of these extensions can be used with their inverse compositional AAM fitting algorithm and which can not. 

Given the current estimates of the shape parameters, it is possible to warp the input image backwards onto the model coordinate frame and then compute an error image between the current model instance and the image that the AAM is being fit to. 
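A minimal sketch of that error-image computation (hypothetical names; a nearest-neighbour lookup stands in for the bilinear interpolation and piecewise affine warp an actual AAM implementation would use):

```python
import numpy as np

def error_image(A, I, warp, mask):
    """Compute E(x) = I(W(x; p)) - A(x) over the pixels inside the base mesh.

    A    : (H, W) current model instance in the model coordinate frame
    I    : input image the AAM is being fit to
    warp : function mapping model-frame (x, y) to image coordinates
    mask : (H, W) boolean array, True for pixels inside the base mesh
    """
    E = np.zeros_like(A)
    H, W = A.shape
    for y in range(H):
        for x in range(W):
            if mask[y, x]:
                xi, yi = warp(x, y)
                # Nearest-neighbour sampling of the backwards-warped image.
                E[y, x] = I[int(round(yi)), int(round(xi))] - A[y, x]
    return E

# Sanity check: with an identity warp and I identical to A, the error is zero.
I = np.arange(16.0).reshape(4, 4)
E = error_image(I.copy(), I, lambda x, y: (x, y), np.ones((4, 4), bool))
# E is all zeros
```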

Typically a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.

The authors do not measure the accuracy of the appearance parameters because once the shape parameters have been estimated, estimating the appearance parameters is a simple linear operation. 

Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm. 

Because the training data is normalized in this way, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.

If the authors raster scan the mesh they can avoid looking up the triangle identity of most pixels by creating a lookup table that codes where the triangle identity changes.
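One way to realize such a table (a sketch, not the authors' implementation; a dense per-pixel variant of the run-length table described above) is to precompute, once per model, an image storing the triangle identity of every pixel inside the base mesh, so each subsequent warp reads the identity in O(1) per pixel:

```python
import numpy as np

def triangle_lookup(vertices, triangles, height, width):
    """Precompute a (height, width) table of triangle identities.

    vertices  : (v, 2) base-mesh vertex coordinates (x, y)
    triangles : (T, 3) vertex indices of each mesh triangle
    Pixels outside every triangle are marked -1.
    """
    table = np.full((height, width), -1, dtype=int)
    for t, (i, j, k) in enumerate(triangles):
        a, b, c = vertices[i], vertices[j], vertices[k]
        denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
        for y in range(height):
            for x in range(width):
                # Barycentric coordinates of (x, y) in triangle (a, b, c).
                l1 = ((b[1] - c[1]) * (x - c[0]) + (c[0] - b[0]) * (y - c[1])) / denom
                l2 = ((c[1] - a[1]) * (x - c[0]) + (a[0] - c[0]) * (y - c[1])) / denom
                if l1 >= 0 and l2 >= 0 and (1 - l1 - l2) >= 0:
                    table[y, x] = t
    return table

# Toy example: a 5x5 square mesh split into two triangles by its diagonal.
verts = np.array([[0, 0], [4, 0], [4, 4], [0, 4]], dtype=float)
tris = np.array([[0, 1, 2], [0, 2, 3]])
table = triangle_lookup(verts, tris, 5, 5)
# Pixel (x=3, y=1) lies in triangle 0; pixel (x=1, y=3) lies in triangle 1.
```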

The original AAM formulation [Cootes et al., 1998] estimated the update functions by systematically perturbing the model parameters and recording the corresponding error images.

Implementing this forwards warping to generate the model instance without holes (see Figure 3) is actually somewhat tricky (and is best performed by backwards warping with the inverse warp from s to s_0).