
Active Appearance Models Revisited

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 135-164
TL;DR: This work proposes an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm, shows how the effects of appearance variation during fitting can be precomputed (“projected out”), and shows how the algorithm can be extended to include a global shape normalising warp.
Abstract: Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities. Fitting an AAM to an image consists of minimising the error between the input image and the closest model instance; i.e. solving a nonlinear optimisation problem. We propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. We show how the effects of appearance variation during fitting can be precomputed (“projected out”) using this algorithm, and how it can be extended to include a global shape normalising warp, typically a 2D similarity transformation. We evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.

Summary

1 Introduction

  • Active Appearance Models (AAMs) [Cootes et al., 2001], first proposed in [Cootes et al., 1998], and the closely related concepts of Active Blobs [Sclaroff and Isidoro, 1998] and Morphable Models [Vetter and Poggio, 1997, Jones and Poggio, 1998, Blanz and Vetter, 1999], are non-linear, generative, and parametric models of a certain visual phenomenon.
  • The parameters could be passed to a classifier to yield a face recognition algorithm.
  • The usual approach [Lanitis et al., 1997, Cootes et al., 1998, Cootes et al., 2001, Cootes, 2001] is to iteratively solve for incremental additive updates to the parameters (the shape and appearance coefficients.)
  • The inverse compositional algorithm is only applicable to sets of warps that form a group.
  • The linear shape variation of AAMs is often augmented by combining it with a 2D similarity transformation to “normalize” the shape.

2 Linear Shape and Appearance Models: AAMs

  • Active Appearance Models are just one instance in a large class of closely related linear shape and appearance models (and their associated fitting algorithms.)
  • The authors also wanted to avoid introducing any new, and potentially confusing, terminology.
  • One thing that is particularly confusing is that the terminology often refers to the combination of a model and a fitting algorithm.
  • In particular, the authors use the term AAM to refer to the model, independent of the fitting algorithm.
  • In essence there are just two types of linear shape and appearance models: those which model shape and appearance independently, and those which parameterize shape and appearance with a single set of linear parameters.

2.1.1 Shape

  • As the name suggests, independent AAMs model shape and appearance separately.
  • The shape of an independent AAM is defined by a mesh and in particular the vertex locations of the mesh.
  • Usually the mesh is triangulated (although there are ways to avoid triangulating the mesh by using thin plate splines rather than piecewise affine warping [Cootes, 2001].)
  • Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the vectors are orthonormal.
  • In the remainder of the figure, the base mesh s_0 is overlayed with arrows corresponding to each of the first four shape vectors s_1, s_2, s_3, and s_4.
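
The linear shape model and its PCA construction can be sketched as follows. This is an illustrative reimplementation with our own names (`build_shape_model`, `shape_instance`), not the authors' code:

```python
import numpy as np

def build_shape_model(training_shapes, n):
    """PCA on training meshes [Cootes et al., 2001]: the base shape s0 is
    the mean shape; the shape vectors are the n eigenvectors with the
    largest eigenvalues (rows of Vt, automatically orthonormal)."""
    X = np.asarray(training_shapes, dtype=float)   # (N, 2v) mesh coordinates
    s0 = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - s0, full_matrices=False)
    return s0, Vt[:n]

def shape_instance(s0, S, p):
    """Equation (2): s = s0 + sum_i p[i] * s_i."""
    return s0 + S.T @ p
```

With p = 0 the model instance is simply the mean shape, and the orthonormality of the rows of Vt is what permits the "linear reparameterization" assumption mentioned above.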

2.1.2 Appearance

  • The appearance of an independent AAM is defined within the base mesh s_0.
  • That way only pixels that are relevant to the phenomenon are modeled, and background pixels can be ignored.
  • Let s_0 also denote the set of pixels x = (x, y)^T that lie inside the base mesh s_0, a convenient abuse of terminology.
  • Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the images A_i are orthonormal.
  • The base appearance A_0 is set to be the mean image and the images A_i to be the m eigenimages corresponding to the m largest eigenvalues.
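
In the same spirit, a minimal sketch of the masked appearance model; the mask layout and names are our own illustration:

```python
import numpy as np

def appearance_instance(A0, A, lam, mask):
    """Equation (3): A(x) = A0(x) + sum_i lam[i] * A_i(x), defined only
    on the pixels x inside the base mesh s0.

    A0   : (k,) mean appearance over the k masked pixels
    A    : (m, k) orthonormal eigenimages (appearance vectors)
    lam  : (m,) appearance parameters
    mask : boolean image, True for pixels inside the base mesh
    Background pixels are left at zero: they are simply not modeled."""
    img = np.zeros(mask.shape)
    img[mask] = A0 + A.T @ lam
    return img
```

Storing only the k pixels inside the base mesh is what lets background pixels be ignored entirely.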

2.1.3 Model Instantiation

  • Equations (2) and (3) describe the AAM shape and appearance variation.
  • They do not describe how to generate a model instance.
  • The AAM model instance with shape parameters p and appearance parameters λ is then created by warping the appearance A from the base mesh s_0 to the model shape s.
  • In particular, the pair of meshes s_0 and s define a piecewise affine warp from s_0 to s.
  • Thin plate splines could be used instead [Cootes, 2001].

2.2 Combined AAMs

  • Independent AAMs have separate shape and appearance parameters.
  • See the discussion at the end of this paper.
  • On the other hand, combined AAMs have a number of advantages.
  • First, the combined formulation is more general and is a strict superset of the independent formulation.
  • Since the authors will project out the appearance variation, as discussed in Section 4.2, the computational cost of their new algorithm is mainly just dependent on the number of shape parameters n and does not depend significantly on the number of appearance parameters m.

2.3.1 Fitting Goal

  • Suppose the authors are given an input image I(x) that they wish to fit an AAM to.
  • Suppose for now that the authors know the optimal shape and appearance parameters in the fit.
  • This means that the input image I(x) and the model instance M(W(x; p)) must be similar.
  • At the pixel W(x; p), the input image has the intensity I(W(x; p)).
  • The error image is defined in the coordinate frame of the AAM and can be computed as follows.
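
Under these definitions, the error image (one residual per base-mesh pixel) might be computed as below; this is a sketch that assumes the input image has already been backward-warped onto the base mesh:

```python
import numpy as np

def error_image(I_warped, A0, A, lam):
    """E(x) = A0(x) + sum_i lam[i] * A_i(x) - I(W(x; p)), computed in
    the coordinate frame of the AAM (the base mesh s0).

    I_warped : (k,) input image backward-warped onto the base mesh pixels
    A0       : (k,) base appearance
    A        : (m, k) appearance eigenimages
    lam      : (m,) appearance parameters"""
    return A0 + A.T @ lam - I_warped
```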

2.3.2 Inefficient Gradient Descent Algorithms

  • Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm.
  • Levenberg-Marquardt was used in [Sclaroff and Isidoro, 1998] and a stochastic gradient descent algorithm was used in [Jones and Poggio, 1998, Blanz and Vetter, 1999].
  • The advantage of these algorithms is that they use a principled, analytical algorithm, the convergence properties of which are well understood.
  • The disadvantage of these gradient descent algorithms is that they are very slow.
  • The partial derivatives, Hessian, and gradient direction all need to be recomputed in each iteration.

2.3.3 Efficient Ad-Hoc Fitting Algorithms

  • Because all previous gradient descent algorithms are so slow, a considerable amount of effort has been devoted in the past to developing other fitting algorithms that are more efficient [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998].
  • To improve the efficiency, previous AAM fitting algorithms such as [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998] have either explicitly or implicitly simply assumed that the linear relationship between the error image and the parameter updates is constant; i.e. does not depend on the model parameters.
  • In the counterexample, the same error image arises from a second perturbation of the mesh in the same direction, but this time by not as far.
  • Similarly, although in this counterexample the direction of Δp is correct and it is just the magnitude that is wrong, other counterexamples can be provided where the error images are the same, but the directions of Δp are different.
  • (The use of difference decomposition in [Sclaroff and Isidoro, 1998] makes the constant linear assumption in Equation (24) of that paper.)

3 Efficient Gradient Descent Image Alignment

  • As described above, existing AAM fitting algorithms fall into one of two categories.
  • Instead it is possible to update the entire warp by composing the current warp with the computed incremental warp W(x; Δp).
  • In particular, it is possible to update: W(x; p) ← W(x; p) ∘ W(x; Δp) (11). This compositional approach is different, yet provably equivalent, to the usual additive approach [Baker and Matthews, 2003].

3.1 Lucas-Kanade Image Alignment

  • The goal of image alignment is to find the location of a constant template image in an input image.
  • The goal of Lucas-Kanade is to find the locally “best” alignment by minimizing the sum of squares difference between a constant template image T(x) and an example image I(x) with respect to the warp parameters p: Σ_x [I(W(x; p)) − T(x)]² (12). Note the similarity with Equation (7).
  • As in Section 2 above, W(x; p) is a warp that maps the pixels x from the template (i.e. the base mesh) image to the input image and has parameters p.
  • Solving for p is a nonlinear optimization problem, even if W(x; p) is linear in p, because, in general, the pixel values I(x) are nonlinear in (and essentially unrelated to) the pixel coordinates x.
  • In [Baker and Matthews, 2003] the authors refer to this as the forwards-additive algorithm.
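
To make the forwards-additive (Lucas-Kanade) iteration concrete, here is a toy version for a pure-translation warp W(x; p) = x + p. The paper uses piecewise affine warps; the translation warp, nearest-neighbour sampling, and all names are our simplifications:

```python
import numpy as np

def lucas_kanade_translation(T, I, p, iters=20):
    """Forwards-additive Lucas-Kanade for a translation-only warp
    W(x; p) = x + p. Minimizes sum_x [I(W(x; p)) - T(x)]^2 over
    p = (tx, ty) by Gauss-Newton."""
    p = np.asarray(p, dtype=float)
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    for _ in range(iters):
        # Warp I backwards onto the template frame (nearest neighbour for brevity).
        yi = np.clip(np.round(ys + p[1]).astype(int), 0, I.shape[0] - 1)
        xi = np.clip(np.round(xs + p[0]).astype(int), 0, I.shape[1] - 1)
        Iw = I[yi, xi]
        err = (Iw - T).ravel()
        # For a translation warp the Jacobian dW/dp is the identity, so the
        # steepest descent images are just the warped-image gradients.
        gy, gx = np.gradient(Iw)
        SD = np.stack([gx.ravel(), gy.ravel()], axis=1)
        H = SD.T @ SD                       # Gauss-Newton Hessian, recomputed
        dp = np.linalg.solve(H, -SD.T @ err)
        p = p + dp                          # additive update
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```

Note that the gradients, steepest descent images, and Hessian are all recomputed every iteration, which is exactly the inefficiency the compositional algorithms below address.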

3.2 Forwards Compositional Image Alignment

  • In the Lucas-Kanade algorithm the warp parameters are computed by estimating an additive offset Δp from the current warp parameters p.
  • The compositional framework instead computes an incremental warp W(x; Δp) to be composed with the current warp W(x; p).
  • There are then two differences between Equation (19) and Equation (14).
  • The composition update step is computationally more costly than the update step for an additive algorithm, but this is offset by not having to compute the Jacobian in each iteration.
  • The key point in the forwards compositional algorithm, illustrated in Figure 6(a), is that the incremental warp is computed with respect to the template coordinate frame each time, so the Jacobian of the warp is always evaluated at p = 0 and is constant.

3.3 Inverse Compositional Image Alignment

  • The inverse compositional algorithm is a modification of the forwards compositional algorithm where the roles of the template and example image are reversed.
  • Rather than computing the incremental warp with respect to the image I(W(x; p)), it is computed with respect to the template T(x).
  • See Figure 7 for the details of the algorithm.
  • The only additional computation is Steps 8 and 9 which are very efficient.
  • These two steps essentially correct for the current estimates of the parameters and avoid the problem illustrated in Figure 4.
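
For contrast with the Lucas-Kanade sketch above, the same toy translation warp under the inverse compositional scheme; again our own simplification, not the paper's piecewise affine implementation. The point to notice is that the steepest descent images and the Hessian are computed once from the template and never re-derived:

```python
import numpy as np

def inverse_compositional_translation(T, I, p, iters=20):
    """Inverse compositional alignment for a translation-only warp (toy case).
    The incremental warp is computed on the template T, so the gradients,
    steepest descent images, and Hessian are all PRECOMPUTED."""
    p = np.asarray(p, dtype=float)
    gy, gx = np.gradient(T)
    SD = np.stack([gx.ravel(), gy.ravel()], axis=1)  # precomputed once
    H_inv = np.linalg.inv(SD.T @ SD)                 # precomputed once
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]]
    for _ in range(iters):
        yi = np.clip(np.round(ys + p[1]).astype(int), 0, I.shape[0] - 1)
        xi = np.clip(np.round(xs + p[0]).astype(int), 0, I.shape[1] - 1)
        err = (I[yi, xi] - T).ravel()
        dp = H_inv @ (SD.T @ err)
        # Compose: W(x; p) <- W(x; p) o W(x; dp)^(-1). For translations
        # this reduces to p <- p - dp; for piecewise affine warps the
        # inversion and composition are only defined to first order.
        p = p - dp
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```

The per-iteration work is now just one warp, one image difference, and one small matrix-vector product.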

4 Applying the Inverse Compositional Algorithm to AAMs

  • The authors now show how the inverse compositional algorithm can be applied to independent AAMs.
  • The algorithm does not apply to combined AAMs.
  • See Section 6.2 for more discussion of why not.

4.1 Application Without Appearance Variation

  • The authors first describe how the algorithm applies without any appearance variation; i.e. when m = 0 and the model appearance is just A_0(x).
  • Comparing Equation (7) with Equation (12) the authors see that if there is no appearance variation, the inverse compositional algorithm applies as is.
  • Examining Figure 7 the authors find that most of the steps in the algorithm are standard vector, matrix, and image operations such as computing image gradients and image differences.
  • The only non-standard steps are: Step 1, warping with the piecewise affine warp W(x; p); Step 4, computing the Jacobian of the piecewise affine warp; and Step 9, inverting the incremental piecewise affine warp and composing it with the current estimate of the piecewise affine warp.
  • The authors now describe how each of these steps is performed.

4.1.1 Piecewise Affine Warping

  • The image I(W(x; p)) is computed by backwards warping the input image I with the warp W(x; p); i.e. for each pixel x in the base mesh s_0 the authors compute W(x; p) and sample (bilinearly interpolate) the image I at that location.
  • Suppose that the vertices of the triangle containing x are (x_i, y_i), (x_j, y_j), and (x_k, y_k).
  • These vertices can be computed from the shape parameters using Equation (2).
  • One way to implement the piecewise affine warp is illustrated in Figure 8.
  • This computation only needs to be performed once per triangle, not once per pixel.
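
The once-per-triangle computation can be sketched as solving a small linear system for each triangle's affine map; this is our own minimal version, not the paper's implementation:

```python
import numpy as np

def triangle_affine(tri_src, tri_dst):
    """Affine map taking the 3 vertices of tri_src to tri_dst, solved once
    per triangle and then applied to every pixel inside that triangle.
    Returns a 2x3 matrix M with [x', y']^T = M @ [x, y, 1]^T."""
    src = np.hstack([tri_src, np.ones((3, 1))])  # (3, 3): rows [x, y, 1]
    M = np.linalg.solve(src, tri_dst)            # (3, 2): src @ M = tri_dst
    return M.T                                   # (2, 3)

def warp_points(M, pts):
    """Apply the 2x3 affine matrix to an (N, 2) array of coordinates."""
    return pts @ M[:, :2].T + M[:, 2]
```

Iterating over triangles and bilinearly sampling the input image at the warped coordinates yields the backwards-warped image I(W(x; p)).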

4.1.2 Computing the Warp Jacobian

  • The destination of the pixel x under the piecewise affine warp W(x; p) depends on the AAM shape parameters p through the vertices of the mesh s.
  • From Equation (1) remember that these vertices are denoted (x_1, y_1, ..., x_v, y_v).
  • The first components of the Jacobian are ∂W/∂x_i and ∂W/∂y_i, the Jacobians of the warp with respect to the vertices of the mesh s.
  • By the chain rule, ∂W/∂p_i = Σ_j [ (∂W/∂x_j) s_i^{x_j} + (∂W/∂y_j) s_i^{y_j} ] (32), where s_i^{x_j} denotes the component of s_i that corresponds to x_j, and similarly for s_i^{y_j}; differentiating Equation (2) gives these components.
  • The resulting array of (base mesh shaped) images corresponds to the Jacobians for the four shape vectors in Figure 1.

4.1.3 Warp Inversion

  • It therefore follows that, to first order in Δp, W(x; Δp)^{-1} = W(x; −Δp) (35). Note that the two Jacobians in Equation (34) are not evaluated at exactly the same location, but since they are evaluated at points only Δp apart, they are equal to zeroth order in Δp.
  • Since the difference is multiplied by Δp, the authors can ignore the first and higher order terms.
  • Also note that the composition of two warps is not strictly defined and so the argument in Equation (34) is informal.
  • The essence of the argument is correct, however.
  • Once the authors have derived the first order approximation to the composition of two piecewise affine warps below, they can then use that definition of composition in the argument above.

4.1.4 Composing the Incremental Warp with the Current Warp Estimate

  • Given the current estimate of the parameters p, the authors can compute the current mesh vertex locations using Equation (2).
  • Given the incremental parameters Δp, the authors can use Equation (2) again to estimate the corresponding changes to the base mesh vertex locations: Δs_0 = Σ_i Δp_i s_i (36), where Δs_0 = (Δx_1^0, Δy_1^0, ..., Δx_v^0, Δy_v^0)^T are the changes to the base mesh vertex locations corresponding to W(x; Δp).
  • The situation is then as illustrated in Figure 11.
  • Now consider any of the mesh triangles that contains the vertex.
  • For this triangle there is an affine warp between the base mesh s_0 and the current mesh s.

4.2 Including Appearance Variation

  • The authors have now described all of the steps needed to apply the inverse compositional algorithm to an independent AAM assuming that there is no appearance variation.
  • More generally, the authors wish to use the same algorithm to minimize the expression in Equation (7).
  • The first of the two terms immediately simplifies.
  • Since the norm only considers the components of vectors in the orthogonal complement of the subspace spanned by the appearance images A_i, any component in that subspace can be dropped.
  • (The error image does not need to be projected into this subspace because Step 7 of the algorithm is really the dot product of the error image with the projected steepest descent images, which already lie in that subspace.)
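
The "project out" operation on the steepest descent images can be sketched as a single orthogonal projection, performed once at precomputation time; the matrix layout is our own choice:

```python
import numpy as np

def project_out(SD, A):
    """Remove the appearance subspace from the steepest descent images.

    SD : (k, n) steepest descent images, one column per shape parameter
    A  : (m, k) ORTHONORMAL appearance eigenimages (rows)
    Returns SD with its component in span{A_i} subtracted, so the
    subsequent least-squares fit takes place entirely in the orthogonal
    complement of the appearance subspace."""
    return SD - A.T @ (A @ SD)
```

Because this happens at precomputation time, the per-iteration cost no longer grows with the number of appearance parameters m.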

4.3 Including a Global Shape Transform

  • The most common way of constructing an AAM [Cootes et al., 2001] consists of first “normalizing” the mesh so that it is as close as possible to the base mesh [Cootes et al., 2001].
  • Typically, a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.
  • Because the training data is normalized in this way, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.
  • To avoid this problem, the linear shape variation is typically augmented with a global shape transformation in the following manner.

4.3.1 Adding a Global Shape Transform to an AAM

  • Finally, the last parameter encodes the rotation, again parameterized so that the parameter is zero when there is no rotation.
  • Other natural choices are affine warps and homographies [Bergen et al., 1992].
  • Note that the above is not the only way to parameterize the set of 2D similarity transformations.
  • The shape vector then “moves” the mouth vertices up and down in the image.
  • This shape variation is only possible if the linear AAM shape variation is followed by a global shape transformation, as defined in Equation (44).
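
One common parameterization of the global 2D similarity transform, chosen so that the identity corresponds to all-zero parameters, can be sketched as follows; the exact symbols are our assumption:

```python
import numpy as np

def similarity_warp(q, pts):
    """2D similarity transform N(x; q) with q = (a, b, tx, ty), where
    a = k*cos(theta) - 1 and b = k*sin(theta) (k = scale, theta = rotation),
    so that q = 0 gives the identity warp.

    pts : (N, 2) array of (x, y) coordinates."""
    a, b, tx, ty = q
    R = np.array([[1.0 + a, -b],
                  [b, 1.0 + a]])
    return pts @ R.T + np.array([tx, ty])
```

Parameterizing around the identity is what makes the incremental warps N(x; Δq) well behaved near Δq = 0.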

4.3.2 Fitting an AAM with a Global Shape Transform

  • The authors now briefly describe how the inverse compositional algorithm can be used to fit an AAM with a global shape transformation; i.e. how to apply the inverse compositional algorithm to the composed warp N(W(x; p); q) (46) rather than the warp W(x; p), since this is slightly simpler.
  • The authors just compute separate Jacobians for W(x; p) and N(x; q), for the p and q parameters respectively.
  • There are a variety of ways of performing the composition of the two warps.
  • These are then converted to changes in the destination mesh vertex locations by applying the affine warp of N(W(x; p); q) to each triangle and then averaging the results.
  • The equivalent of Equation (37) is then used to compute the new values of p and q.

4.4 Other Extensions to the Algorithm

  • The authors have described how the inverse compositional image alignment algorithm can be applied to AAMs.
  • The field of image alignment is well studied and over the years a number of extensions and heuristics have been developed to improve the performance of the algorithms.
  • The fitting algorithm can be applied hierarchically on a Gaussian image pyramid to reduce the likelihood of falling into a local minimum [Bergen et al., 1992].
  • See [Baker and Matthews, 2003] for the details of that algorithm.

5 Empirical Evaluation

  • The authors have proposed a new fitting algorithm for AAMs.
  • The performance can also be very dependent on minor details such as the definition of the gradient filter used to compute the image gradients.
  • In their evaluation the authors take the following philosophy.
  • The authors compare their AAM fitting algorithm with and without each of these changes.
  • Every other line of code in the implementation is always exactly the same.

5.1 Generating the Inputs

  • The authors begin by taking a video of a person moving their head (both rigidly and non-rigidly.)
  • The authors avoid this problem by overlaying the reconstructed AAM over the original movie.
  • By comparing the performance of the algorithms on these two movies the authors should be able to detect any bias in the ground-truth.
  • Empirically the authors found almost no difference between the performance of any of the algorithms on the corresponding real and synthetic sequences and conclude there is no bias.
  • The shape parameters are randomly generated from independent Gaussian distributions with variance equal to the eigenvalue of that mode in the PCA performed during AAM construction.

5.2 The Evaluation Metrics

  • Given the initial parameters, the AAM fitting algorithm should hopefully converge to the ground-truth parameters.
  • The first measure is the average rate of convergence.
  • The authors average these graphs (for approximately the same starting error) over all cases where all algorithms converge.
  • The authors say that an algorithm converges if the RMS mesh point error is less than 9 pixels after 20 iterations.
  • Also, comparing the appearance algorithms by their appearance estimates is not possible because the appearance is not computed until the “project out” algorithm has converged.
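
The RMS mesh point error used as the convergence criterion can be computed as below (a straightforward sketch; names are ours):

```python
import numpy as np

def rms_point_error(mesh, mesh_gt):
    """Root-mean-square distance between fitted and ground-truth mesh
    vertices. An algorithm is declared converged if this error falls
    below a fixed pixel threshold after a fixed number of iterations.

    mesh, mesh_gt : (v, 2) arrays of vertex (x, y) locations."""
    d = mesh - mesh_gt                     # per-vertex offsets
    return np.sqrt(np.mean(np.sum(d * d, axis=1)))
```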

5.3 Experiment 1: The Update Rule

  • In their first experiment the authors compare the inverse compositional update in Step 9 of the algorithm with the usual additive update p ← p + Δp.
  • The results without the global similarity transform are included in Figure 14 and the results with it are included in Figure 15.
  • Either the rate of convergence is faster or the frequency of convergence is higher.
  • These results also illustrate that the inverse compositional algorithm does not always outperform the additive algorithm; it performs better in many scenarios and comparably in others.

5.4 Experiment 2: Computation of the Steepest Descent Images

  • The inverse compositional algorithm uses the analytically derived steepest descent images ∇A_0 ∂W/∂p.
  • (All other aspects of the algorithm are exactly the same; specifically, the authors use the inverse compositional update in both cases.)
  • The results in Figure 16 show that the analytically derived steepest descent images perform significantly better.

5.5 Experiment 3: Appearance Variation

  • In their algorithm the authors “project out” the appearance variation.
  • There are potential benefits to iteratively updating the appearance parameters.
  • The steepest descent image for the i-th appearance parameter λ_i is A_i(x).
  • The authors plot results comparing the “project out appearance” approach with the “explicitly model appearance” approach in Figure 17.
  • The results in Figure 17 show that the two approaches perform almost identically.

5.6 Computational Efficiency

  • One concern with the inverse compositional algorithm is that the time taken to perform the inverse compositional update might be quite long.
  • The actual calculation is minimal because the number of vertices in the mesh is far less than the number of pixels.
  • The results were obtained on a dual 2.4GHz P4 machine and are for a model with 19,977 pixels, 3 shape parameters, 4 similarity parameters, and 9 appearance parameters.
  • Overall their Matlab implementation of the inverse compositional algorithm operates at approximately 5Hz.
  • The authors' “C” implementation operates at approximately 150Hz on the same machine.

6.1 Summary

  • In this paper the authors have proposed an algorithm for fitting AAMs that has the advantages of both types of algorithms.
  • Overall their algorithm outperforms previous approaches in terms of: (1) the speed of convergence (far fewer iterations are needed to converge to any given accuracy), (2) the frequency of convergence (the algorithm is more likely to converge from a large distance away), and (3) the computational cost (the algorithm is far faster because the appearance variation is projected out.)

6.2 Discussion

  • The inverse compositional AAM fitting algorithm can only be applied to independent AAMs.
  • It cannot be applied to combined AAMs which parameterize the shape and appearance variation with a single set of parameters and so introduce a coupling between shape and appearance.
  • Although it might seem that the single combined parameterization would yield a lower dimensional optimization, in practice it does not: the nonlinear optimization in their algorithm is only over the n shape parameters and so is actually lower dimensional than the equivalent combined AAM optimization, which would have more than n parameters.
  • Currently the authors do not see a way to extend their algorithm to combined AAMs, but of course they may be wrong.

Acknowledgments

  • The authors thank Tim Cootes for discussions on the incorporation of the global shape transformation in Section 4.3.
  • Elements of the AAM fitting algorithm appeared in [Baker and Matthews, 2001].
  • The authors thank the reviewers of [Baker and Matthews, 2001] for their feedback.
  • The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.
  • Additional support was provided through U.S. Department of Defense award N41756-03-C4024.


Active Appearance Models Revisited
Iain Matthews and Simon Baker
CMU-RI-TR-03-02
The Robotics Institute
Carnegie Mellon University
Abstract
Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and
Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape
and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities.
Fitting an AAM to an image consists of minimizing the error between the input image and the clos-
est model instance; i.e. solving a nonlinear optimization problem. We propose an efficient fitting
algorithm for AAMs based on the inverse compositional image alignment algorithm. We show how
the appearance variation can be “projected out” using this algorithm and how the algorithm can
be extended to include a “shape normalizing” warp, typically a 2D similarity transformation. We
evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.
Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, effi-
ciency, Gauss-Newton gradient descent, inverse compositional image alignment.


1 Introduction
Active Appearance Models (AAMs) [Cootes et al., 2001], first proposed in [Cootes et al., 1998],
and the closely related concepts of Active Blobs [Sclaroff and Isidoro, 1998] and Morphable Mod-
els [Vetter and Poggio, 1997, Jones and Poggio, 1998, Blanz and Vetter, 1999], are non-linear,
generative, and parametric models of a certain visual phenomenon. The most frequent application
of AAMs to date has been face modeling [Lanitis et al., 1997]. However, AAMs may be useful for
other phenomena too [Sclaroff and Isidoro, 1998, Jones and Poggio, 1998]. Typically an AAM is
first fit to an image of a face; i.e. the model parameters are found to maximize the “match” between
the model instance and the input image. The model parameters are then used in whatever the ap-
plication is. For example, the parameters could be passed to a classifier to yield a face recognition
algorithm. Many different classification tasks are possible. In [Lanitis et al., 1997], for example,
the same model was used for face recognition, pose estimation, and expression recognition. AAMs
are a general-purpose image coding scheme, like Principal Components Analysis, but non-linear.
Fitting an AAM to an image is a non-linear optimization problem. The usual approach [Lanitis
et al., 1997, Cootes et al., 1998, Cootes et al., 2001, Cootes, 2001] is to iteratively solve for
incremental additive updates to the parameters (the shape and appearance coefficients.) Given the
current estimates of the shape parameters, it is possible to warp the input image backwards onto
the model coordinate frame and then compute an error image between the current model instance
and the image that the AAM is being fit to. In most previous algorithms, it is simply assumed
that there is a constant linear relationship between this error image and the additive incremental
updates to the parameters. The constant coefficients in this linear relationship can then be found
either by linear regression [Lanitis et al., 1997] or by other numerical methods [Cootes, 2001].
Unfortunately the assumption that there is such a simple relationship between the error image
and the appropriate update to the model parameters is in general incorrect. See Section 2.3.3 for a
counterexample. The result is that existing AAM fitting algorithms perform poorly, both in terms
of the number of iterations required to converge, and in terms of the accuracy of the final fit. In
this paper we propose a new analytical AAM fitting algorithm that does not make this simplifying
assumption. Our algorithm is based on an extension to the inverse compositional image alignment
algorithm [Baker and Matthews, 2001, Baker and Matthews, 2003]. The inverse compositional
algorithm is only applicable to sets of warps that form a group. Unfortunately, the set of piecewise
affine warps used in AAMs does not form a group. Hence, to use the inverse compositional algo-
rithm, we derive first order approximations to the group operators of composition and inversion.
Using the inverse compositional algorithm also allows a different treatment of the appearance
variation. Using the approach proposed in [Hager and Belhumeur, 1998], we project out the ap-
pearance variation thereby eliminating a great deal of computation. This modification is closely

related to “shape AAMs” [Cootes and Kittipanya-ngam, 2002]. Another feature that we include is
shape normalization. The linear shape variation of AAMs is often augmented by combining it with
a 2D similarity transformation to “normalize” the shape. We show how the inverse compositional
algorithm can be used to simultaneously fit the combination of the two warps (the linear AAM
shape variation and a following global shape transformation, usually a 2D similarity transform.)
2 Linear Shape and Appearance Models: AAMs
Although they are perhaps the most well-known example, Active Appearance Models are just
one instance in a large class of closely related linear shape and appearance models (and their
associated fitting algorithms.) This class contains Active Appearance Models (AAMs) [Cootes et
al., 2001, Cootes et al., 1998, Lanitis et al., 1997], Shape AAMs [Cootes and Kittipanya-ngam,
2002], Active Blobs [Sclaroff and Isidoro, 1998], Morphable Models [Vetter and Poggio, 1997,
Jones and Poggio, 1998, Blanz and Vetter, 1999], and Direct Appearance Models [Hou et al., 2001],
as well as possibly others. Many of these models were proposed independently in 1997-1998
[Lanitis et al., 1997, Vetter and Poggio, 1997, Cootes et al., 1998, Sclaroff and Isidoro, 1998, Jones
and Poggio, 1998]. In this paper we use the term “Active Appearance Model” to refer generically to
the entire class of linear shape and appearance models. We chose the term AAM rather than Active
Blob or Morphable Model solely because it seems to have stuck better in the vision literature, not
because the term was introduced earlier or because AAMs have any particular technical advantage.
We also wanted to avoid introducing any new, and potentially confusing, terminology.
Unfortunately the previous literature is already very confusing. One thing that is particularly
confusing is that the terminology often refers to the combination of a model and a fitting algorithm.
For example, Active Appearance Models [Cootes et al., 2001] strictly only refers to a specific
model and an algorithm for fitting that model. Similarly, Direct Appearance Models [Hou et al.,
2001] refers to a different model-fitting algorithm pair. In order to simplify the terminology we
will make a clear distinction between models and algorithms, even though this sometimes means
we will have to abuse previous terminology. In particular, we use the term AAM to refer to the
model, independent of the fitting algorithm. We also use AAM to refer to a slightly larger class of
models than that described in [Cootes et al., 2001]. We hope that this simplifies the situation.
In essence there are just two types of linear shape and appearance models, those which model
shape and appearance independently, and those which parameterize shape and appearance with a
single set of linear parameters. We refer to the first set as independent linear shape and appearance
models and the second as combined shape and appearance models. Since the name AAM has
stuck, we will also refer to the first set as independent AAMs and the second as combined AAMs.

    
Figure 1: The linear shape model of an independent AAM. The model consists of a triangulated base mesh s_0 plus a linear combination of n shape vectors s_i. See Equation (2). Here we display the base mesh s_0 on the far left and to the right four shape vectors s_1, s_2, s_3, and s_4 overlayed on the base mesh.
2.1 Independent AAMs
2.1.1 Shape
As the name suggests, independent AAMs model shape and appearance separately. The shape
of an independent AAM is defined by a mesh and in particular the vertex locations of the mesh.
Usually the mesh is triangulated (although there are ways to avoid triangulating the mesh by using
thin plate splines rather than piecewise affine warping [Cootes, 2001].) Mathematically, we define
the shape
of an AAM as the coordinates of the
vertices that make up the mesh:
  !
(1)
See Figure 1 for an example mesh. AAMs allow linear shape variation. This means that the shape
s can be expressed as a base shape s_0 plus a linear combination of n shape vectors s_i:

    s = s_0 + \sum_{i=1}^{n} p_i s_i                                       (2)

In this expression the coefficients p_i are the shape parameters. Since we can easily perform a linear
reparameterization, wherever necessary we assume that the vectors s_i are orthonormal.
AAMs are normally computed from training data. The standard approach is to apply Principal
Component Analysis (PCA) to the training meshes [Cootes et al., 2001]. The base shape s_0 is the
mean shape and the vectors s_i are the n eigenvectors corresponding to the n largest eigenvalues.
An example independent AAM shape model is shown in Figure 1. On the left of the figure, we
plot the triangulated base mesh s_0. In the remainder of the figure, the base mesh s_0 is overlayed
with arrows corresponding to each of the first four shape vectors s_1, s_2, s_3, and s_4.
2.1.2 Appearance
The appearance of an independent AAM is defined within the base mesh s_0. That way only pixels
that are relevant to the phenomenon are modeled, and background pixels can be ignored. Let s_0
also denote the set of pixels x = (x, y)^T that lie inside the base mesh s_0, a convenient abuse of

Citations
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. 
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations


Cites background from "Active Appearance Models Revisited"

  • ...…(Lucas and Kanade 1981; Shi and Tomasi 1994; Rehg and Kanade 1994), often applied to tracking faces (Figure 1.9d) (Lanitis, Taylor, and Cootes 1997; Matthews and Baker 2004; Matthews, Xiao, and Baker 2007) and whole bodies (Sidenbladh, Black, and Fleet 2000; Hilton, Fua, and Ronfard 2006;…...


Proceedings ArticleDOI
13 Jun 2010
TL;DR: The Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data.
Abstract: In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) While AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed, 2) The lack of a common performance metric against which to evaluate new algorithms, and 3) Standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded and emotion labels have been revised and validated. In addition to this, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks will be made available July 2010.

3,439 citations


Cites methods from "Active Appearance Models Revisited"

  • ...Keyframes within each video sequence were manually labelled, while the remaining frames were automatically aligned using a gradient descent AAM fitting algorithm described in [18]....


Journal ArticleDOI
TL;DR: An overview of image alignment is presented in a consistent framework, concentrating on the inverse compositional algorithm and examining which extensions to the Lucas-Kanade algorithm can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

3,168 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: It is shown that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures, in real-world, cluttered images.
Abstract: We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixtures of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggests our system advances the state-of-the-art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).

2,340 citations


Cites background from "Active Appearance Models Revisited"

  • ...Facial landmark estimation dates back to the classic approaches of Active Appearance Models (AAMs) [9, 26] and elastic graph matching [25, 39]....


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A Supervised Descent Method (SDM) is proposed for minimizing a Non-linear Least Squares (NLS) function and achieves state-of-the-art performance in the problem of facial feature detection.
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian nor the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.

2,138 citations


Cites background from "Active Appearance Models Revisited"

  • ...A major difference between SDM and discriminative method to fit AAMs [11], is that [11] only uses one step regression, which as shown in our experiments leads to lower performance....


  • ...The ground truth is given by a person-specific AAMs [23]....


  • ...Constrained Local Models (CLM) [13] model this prior similarly as AAMs assuming all faces lie in a linear subspace expanded by PCA bases....


  • ...Cootes et al. [11] proposed to fit AAMs by learning a linear regression between the increment of motion parameters ∆p and the appearance differences ∆d....


References
Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.

12,944 citations

Journal ArticleDOI
Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

6,200 citations

Proceedings ArticleDOI
01 Jul 1999
TL;DR: A new technique for modeling textured 3D faces by transforming the shape and texture of the examples into a vector space representation, which regulates the naturalness of modeled faces avoiding faces with an “unlikely” appearance.
Abstract: In this paper, a new technique for modeling textured 3D faces is introduced. 3D faces can either be generated automatically from one or more photographs, or modeled directly through an intuitive user interface. Users are assisted in two key problems of computer aided face modeling. First, new face images or new 3D face models can be registered automatically by computing dense one-to-one correspondence to an internal face model. Second, the approach regulates the naturalness of modeled faces avoiding faces with an “unlikely” appearance. Starting from an example set of 3D face models, we derive a morphable face model by transforming the shape and texture of the examples into a vector space representation. New faces and expressions can be modeled by forming linear combinations of the prototypes. Shape and texture constraints derived from the statistics of our example faces are used to guide manual modeling or automated matching algorithms. We show 3D face reconstructions from single images and their applications for photo-realistic image manipulations. We also demonstrate face manipulations according to complex parameters such as gender, fullness of a face or its distinctiveness.

4,514 citations


"Active Appearance Models Revisited" refers background or methods in this paper

  • ...This class includes Active Appearance Models (AAMs) (Cootes et al., 1998a, 2001; Edwards, 1999; Edwards et al., 1998; Lanitis et al., 1997), Shape AAMs (Cootes et al., 1998b; Cootes and Kittipanyangam, 2002; Cootes et al., 2001), Direct Appearance Models (Hou et al., 2001), Active Blobs (Sclaroff and Isidoro, 2003), and Morphable Models (Blanz and Vetter, 1999; Jones and Poggio, 1998; Vetter and Poggio, 1997) as well as possibly others....


  • ...Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, efficiency, GaussNewton gradient descent, inverse compositional image alignment...


  • ...…Shape AAMs (Cootes et al., 1998b; Cootes and Kittipanyangam, 2002; Cootes et al., 2001), Direct Appearance Models (Hou et al., 2001), Active Blobs (Sclaroff and Isidoro, 2003), and Morphable Models (Blanz and Vetter, 1999; Jones and Poggio, 1998; Vetter and Poggio, 1997) as well as possibly others....


  • ...For example, Levenberg-Marquardt was used in Sclaroff and Isidoro (1998) and a stochastic gradient descent algorithm was used in Blanz and Vetter (1999) and Jones and Poggio (1998)....


  • ...Another thing that makes empirical evaluation hard is the wide variety of AAM fitting algorithms (Blanz and Vetter, 1999; Cootes et al., 1998a, 2001; Jones and Poggio, 1998; Sclaroff and Isidoro, 1998) and the lack of a standard test set....


Journal ArticleDOI
TL;DR: An overview of image alignment is presented in a consistent framework, concentrating on the inverse compositional algorithm and examining which extensions to the Lucas-Kanade algorithm can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

3,168 citations

Journal ArticleDOI
TL;DR: This work presents tools for hierarchical clustering of imaged objects according to the shapes of their boundaries, learning of probability models for clusters of shapes, and testing of newly observed shapes under competing probability models.
Abstract: Using a differential-geometric treatment of planar shapes, we present tools for: 1) hierarchical clustering of imaged objects according to the shapes of their boundaries, 2) learning of probability models for clusters of shapes, and 3) testing of newly observed shapes under competing probability models. Clustering at any level of hierarchy is performed using a minimum variance type criterion and a Markov process. Statistical means of clusters provide shapes to be clustered at the next higher level, thus building a hierarchy of shapes. Using finite-dimensional approximations of spaces tangent to the shape space at sample means, we (implicitly) impose probability models on the shape space, and results are illustrated via random sampling and classification (hypothesis testing). Together, hierarchical clustering and hypothesis testing provide an efficient framework for shape retrieval. Examples are presented using shapes and images from ETH, Surrey, and AMCOM databases.

2,858 citations

Frequently Asked Questions (10)
Q1. What are the contributions in "Active appearance models revisited" ?

The authors propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. The authors show how the appearance variation can be “ projected out ” using this algorithm and how the algorithm can be extended to include a “ shape normalizing ” warp, typically a 2D similarity transformation. The authors evaluate their algorithm to determine which of its novel aspects improve AAM fitting performance. 

In future work the authors hope to look into these questions and determine which of these extensions can be used with their inverse compositional AAM fitting algorithm and which can not. 

Given the current estimates of the shape parameters, it is possible to warp the input image backwards onto the model coordinate frame and then compute an error image between the current model instance and the image that the AAM is being fit to. 
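A minimal sketch of that error-image computation (hypothetical names; a nearest-neighbour lookup stands in for the bilinear interpolation and piecewise affine warp an actual AAM implementation would use):

```python
import numpy as np

def error_image(A, I, warp, mask):
    """Compute E(x) = I(W(x; p)) - A(x) over the pixels inside the base mesh.

    A    : (H, W) current model instance in the model coordinate frame
    I    : input image the AAM is being fit to
    warp : function mapping model-frame (x, y) to image coordinates
    mask : (H, W) boolean array, True for pixels inside the base mesh
    """
    E = np.zeros_like(A)
    H, W = A.shape
    for y in range(H):
        for x in range(W):
            if mask[y, x]:
                xi, yi = warp(x, y)
                # Nearest-neighbour sampling of the backwards-warped image.
                E[y, x] = I[int(round(yi)), int(round(xi))] - A[y, x]
    return E

# Sanity check: with an identity warp and I identical to A, the error is zero.
I = np.arange(16.0).reshape(4, 4)
E = error_image(I.copy(), I, lambda x, y: (x, y), np.ones((4, 4), bool))
# E is all zeros
```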

Typically a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.

The authors do not measure the accuracy of the appearance parameters because once the shape parameters have been estimated, estimating the appearance parameters is a simple linear operation. 

Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm. 

Because the training data is normalized in this way, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.

If the authors raster scan the mesh they can avoid looking up the triangle identity of most pixels by creating a lookup table that codes where the triangle identity changes.
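One way to realize such a table (a sketch, not the authors' implementation; a dense per-pixel variant of the run-length table described above) is to precompute, once per model, an image storing the triangle identity of every pixel inside the base mesh, so each subsequent warp reads the identity in O(1) per pixel:

```python
import numpy as np

def triangle_lookup(vertices, triangles, height, width):
    """Precompute a (height, width) table of triangle identities.

    vertices  : (v, 2) base-mesh vertex coordinates (x, y)
    triangles : (T, 3) vertex indices of each mesh triangle
    Pixels outside every triangle are marked -1.
    """
    table = np.full((height, width), -1, dtype=int)
    for t, (i, j, k) in enumerate(triangles):
        a, b, c = vertices[i], vertices[j], vertices[k]
        denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
        for y in range(height):
            for x in range(width):
                # Barycentric coordinates of (x, y) in triangle (a, b, c).
                l1 = ((b[1] - c[1]) * (x - c[0]) + (c[0] - b[0]) * (y - c[1])) / denom
                l2 = ((c[1] - a[1]) * (x - c[0]) + (a[0] - c[0]) * (y - c[1])) / denom
                if l1 >= 0 and l2 >= 0 and (1 - l1 - l2) >= 0:
                    table[y, x] = t
    return table

# Toy example: a 5x5 square mesh split into two triangles by its diagonal.
verts = np.array([[0, 0], [4, 0], [4, 4], [0, 4]], dtype=float)
tris = np.array([[0, 1, 2], [0, 2, 3]])
table = triangle_lookup(verts, tris, 5, 5)
# Pixel (x=3, y=1) lies in triangle 0; pixel (x=1, y=3) lies in triangle 1.
```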

The original AAM formulation [Cootes et al., 1998] estimated the update functions by systematically perturbing the model parameters and recording the corresponding error images.

Implementing this forwards warping to generate the model instance without holes (see Figure 3) is actually somewhat tricky (and is best performed by backwards warping with the inverse warp from s to s_0).