Active Appearance Models Revisited
Summary
1 Introduction
- Active Appearance Models (AAMs) [Cootes et al., 2001], first proposed in [Cootes et al., 1998], and the closely related concepts of Active Blobs [Sclaroff and Isidoro, 1998] and Morphable Models [Vetter and Poggio, 1997, Jones and Poggio, 1998, Blanz and Vetter, 1999], are non-linear, generative, and parametric models of a certain visual phenomenon.
- Fitting an AAM to an image yields shape and appearance parameters that could, for example, be passed to a classifier to yield a face recognition algorithm.
- The usual approach to fitting an AAM [Lanitis et al., 1997, Cootes et al., 1998, Cootes et al., 2001, Cootes, 2001] is to iteratively solve for incremental additive updates to the parameters (the shape and appearance coefficients).
- The inverse compositional algorithm is only applicable to sets of warps that form a group.
- The linear shape variation of AAMs is often augmented by combining it with a 2D similarity transformation to “normalize” the shape.
2 Linear Shape and Appearance Models: AAMs
- Active Appearance Models are just one instance in a large class of closely related linear shape and appearance models (and their associated fitting algorithms).
- The authors also wanted to avoid introducing any new, and potentially confusing, terminology.
- One thing that is particularly confusing is that the terminology often refers to the combination of a model and a fitting algorithm.
- In particular, the authors use the term AAM to refer to the model, independent of the fitting algorithm.
- In essence there are just two types of linear shape and appearance models: those which model shape and appearance independently, and those which parameterize shape and appearance with a single set of linear parameters.
2.1.1 Shape
- As the name suggests, independent AAMs model shape and appearance separately.
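- The shape of an independent AAM is defined by a mesh and in particular the vertex locations of the mesh, $s = (x_1, y_1, x_2, y_2, \ldots, x_v, y_v)^T$ (Equation (1)).
- Usually the mesh is triangulated (although there are ways to avoid triangulating the mesh by using thin plate splines rather than piecewise affine warping [Cootes, 2001]).
- AAMs allow linear shape variation, $s = s_0 + \sum_{i=1}^{n} p_i s_i$ (Equation (2)), where $s_0$ is the base mesh, the $s_i$ are the shape vectors, and the coefficients $p_i$ are the shape parameters. Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the shape vectors $s_i$ are orthonormal.
- In the remainder of Figure 1, the base mesh $s_0$ is overlaid with arrows corresponding to each of the first four shape vectors $s_1$, $s_2$, $s_3$, and $s_4$.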
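The linear shape model and its orthonormal reparameterization can be sketched in a few lines. This is a minimal illustration, not the authors' code: meshes are assumed to be flat lists $(x_1, y_1, \ldots, x_v, y_v)$, Gram-Schmidt is assumed as the linear reparameterization that makes the shape vectors orthonormal, and the helper names (`orthonormalize`, `shape_instance`, `shape_parameters`) are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: an orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # drop vectors already (almost) in the span of the basis
            basis.append([wi / norm for wi in w])
    return basis


def shape_instance(s0: Vector, shape_vectors: List[Vector], p: Vector) -> Vector:
    """s = s0 + sum_i p_i s_i, with meshes stored as (x1, y1, ..., xv, yv)."""
    s = list(s0)
    for p_i, s_i in zip(p, shape_vectors):
        s = [sk + p_i * sik for sk, sik in zip(s, s_i)]
    return s


def shape_parameters(s0: Vector, shape_vectors: List[Vector], s: Vector) -> Vector:
    """Inverse of shape_instance for orthonormal s_i: p_i = s_i . (s - s0)."""
    diff = [a - b for a, b in zip(s, s0)]
    return [dot(s_i, diff) for s_i in shape_vectors]


# Usage: a one-triangle base mesh and two raw (non-orthonormal) modes of variation.
s0 = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
modes = orthonormalize([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 2.0, 0.0, 1.0, 1.0]])
s = shape_instance(s0, modes, [0.5, -0.25])
p = shape_parameters(s0, modes, s)
```

Because the basis is orthonormal, recovering $p$ from a mesh reduces to dot products, which is one practical reason the orthonormality assumption is convenient.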
2.1.2 Appearance
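- The appearance of an independent AAM is defined within the base mesh $s_0$.
- That way only pixels that are relevant to the phenomenon are modeled, and background pixels can be ignored.
- Let $s_0$ also denote the set of pixels $\mathbf{x} = (x, y)^T$ that lie inside the base mesh $s_0$, a convenient abuse of terminology.
- The appearance of an AAM is then an image $A(\mathbf{x})$ defined over the pixels $\mathbf{x} \in s_0$ that varies linearly, $A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x})$ (Equation (3)), where the $\lambda_i$ are the appearance parameters. Since the authors can easily perform a linear reparameterization, wherever necessary they assume that the images $A_i$ are orthonormal.
- The base appearance $A_0$ is set to be the mean image and the images $A_i$ to be the $m$ eigenimages corresponding to the $m$ largest eigenvalues.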
2.1.3 Model Instantiation
- Equations (2) and (3) describe the AAM shape and appearance variation.
- They do not describe how to generate a model instance.
- Given shape parameters $p$ and appearance parameters $\lambda$, the AAM model instance is created by warping the appearance $A$ from the base mesh $s_0$ to the model shape $s$.
- In particular, the pair of meshes $s_0$ and $s$ define a piecewise affine warp from $s_0$ to $s$, denoted $W(\mathbf{x};p)$.
- Thin plate splines could be used instead [Cootes, 2001].
2.2 Combined AAMs
- Independent AAMs have separate shape parameters $p$ and appearance parameters $\lambda$, whereas combined AAMs use a single set of parameters to parameterize both shape and appearance.
- See the discussion at the end of this paper (Section 6.2).
- Combined AAMs have a number of advantages.
- First, the combined formulation is more general and is a strict superset of the independent formulation.
- Independent AAMs have an advantage of their own: since the authors project out the appearance variation, as discussed in Section 4.2, the computational cost of their new algorithm depends mainly on the number of shape parameters $n$ and does not depend significantly on the number of appearance parameters $m$.
2.3.1 Fitting Goal
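- Suppose the authors are given an input image $I(\mathbf{x})$ that they wish to fit an AAM to.
- Suppose for now that the authors know the optimal shape parameters $p$ and appearance parameters $\lambda$ in the fit.
- This means that the input image, warped back onto the base mesh as $I(W(\mathbf{x};p))$, and the model appearance $A(\mathbf{x})$ must be similar.
- At the pixel $W(\mathbf{x};p)$, the input image has the intensity $I(W(\mathbf{x};p))$.
- The error image is defined in the coordinate frame of the AAM and can be computed as $E(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}) - I(W(\mathbf{x};p))$ for each pixel $\mathbf{x} \in s_0$. Fitting the AAM then amounts to minimizing $\sum_{\mathbf{x} \in s_0} E(\mathbf{x})^2$ with respect to $p$ and $\lambda$ (Equation (7)).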
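A minimal sketch of the error image and the fitting cost, assuming the input image has already been backwards-warped onto the base mesh (so `warped_input` holds $I(W(\mathbf{x};p))$ for each pixel $\mathbf{x} \in s_0$) and that all images are flattened into lists with the same pixel ordering. It evaluates $E(\mathbf{x}) = A_0(\mathbf{x}) + \sum_i \lambda_i A_i(\mathbf{x}) - I(W(\mathbf{x};p))$ and the sum of its squares; the function names are hypothetical.

```python
from typing import List, Sequence

Image = List[float]  # an image flattened over the base-mesh pixels x in s0


def appearance(A0: Image, A: Sequence[Image], lam: Sequence[float]) -> Image:
    """A(x) = A0(x) + sum_i lambda_i A_i(x)."""
    out = list(A0)
    for l_i, A_i in zip(lam, A):
        out = [o + l_i * a for o, a in zip(out, A_i)]
    return out


def error_image(A0: Image, A: Sequence[Image], lam: Sequence[float], warped_input: Image) -> Image:
    """E(x) = A0(x) + sum_i lambda_i A_i(x) - I(W(x;p)), pixel by pixel."""
    return [m - i for m, i in zip(appearance(A0, A, lam), warped_input)]


def fitting_cost(A0: Image, A: Sequence[Image], lam: Sequence[float], warped_input: Image) -> float:
    """Sum of squared errors over the base mesh: the quantity an AAM fit minimizes."""
    return sum(e * e for e in error_image(A0, A, lam, warped_input))
```

Note that the cost depends on $p$ only through `warped_input`, which is why the warp and its derivatives dominate the fitting algorithms discussed next.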
2.3.2 Inefficient Gradient Descent Algorithms
- Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm.
- Levenberg-Marquardt was used in [Sclaroff and Isidoro, 1998] and a stochastic gradient descent algorithm was used in [Jones and Poggio, 1998, Blanz and Vetter, 1999].
- The advantage of these algorithms is that they use a principled, analytical algorithm, the convergence properties of which are well understood.
- The disadvantage of these gradient descent algorithms is that they are very slow.
- The partial derivatives, Hessian, and gradient direction all need to be recomputed in each iteration.
2.3.3 Efficient Ad-Hoc Fitting Algorithms
- Because all previous gradient descent algorithms are so slow, a considerable amount of effort has been devoted in the past to developing other fitting algorithms that are more efficient [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998].
- To improve the efficiency, previous AAM fitting algorithms such as [Cootes et al., 2001, Cootes et al., 1998, Sclaroff and Isidoro, 1998] have, either explicitly or implicitly, simply assumed that the update functions mapping the error image $E(\mathbf{x})$ to the parameter updates $\Delta p$ and $\Delta\lambda$ do not depend on the current model parameters.
- Counterexamples show that this assumption is incorrect: the same error image can arise when the correct update is in the same direction, but this time not as far.
- Similarly, although in this counterexample the direction of $\Delta p$ is correct and it is just the magnitude that is wrong, other counterexamples can be provided where the error images are the same but the directions of the correct $\Delta p$ are different.
- The use of difference decomposition in [Sclaroff and Isidoro, 1998] makes the same constant linear assumption (Equation (24) of that paper).
3 Efficient Gradient Descent Image Alignment
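- As described above, existing AAM fitting algorithms fall into one of two categories: principled but slow gradient descent algorithms (Section 2.3.2) and efficient but ad-hoc algorithms (Section 2.3.3).
- Instead of updating the warp parameters additively, $p \leftarrow p + \Delta p$, it is possible to update the entire warp by composing the current warp with the computed incremental warp with parameters $\Delta p$.
- In particular, it is possible to update $W(\mathbf{x};p) \leftarrow W(\mathbf{x};p) \circ W(\mathbf{x};\Delta p)$ (11). This compositional approach is different from, yet provably equivalent to, the usual additive approach [Baker and Matthews, 2003].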
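To make the difference between additive and compositional updates concrete, here is a sketch using 2D affine warps, which (unlike piecewise affine warps) form a group, so composition and inversion are exact 3x3 matrix operations. The six-parameter form $W(\mathbf{x};p) = ((1+p_1)x + p_3 y + p_5,\; p_2 x + (1+p_4) y + p_6)$ is an assumed parameterization, chosen so that $p = 0$ is the identity; the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def affine_matrix(p: Sequence[float]) -> Matrix:
    """Homogeneous matrix of W(x;p) = ((1+p1)x + p3 y + p5, p2 x + (1+p4) y + p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return [[1.0 + p1, p3, p5], [p2, 1.0 + p4, p6], [0.0, 0.0, 1.0]]


def params(M: Matrix) -> List[float]:
    """Read (p1, ..., p6) back out of a homogeneous affine matrix."""
    return [M[0][0] - 1.0, M[1][0], M[0][1], M[1][1] - 1.0, M[0][2], M[1][2]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def invert_affine(M: Matrix) -> Matrix:
    """Exact inverse: invert the 2x2 linear part, then map the translation through it."""
    (a, b, tx), (c, d, ty) = M[0], M[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)], [ic, id_, -(ic * tx + id_ * ty)], [0.0, 0.0, 1.0]]


def compose(p: Sequence[float], dp: Sequence[float]) -> List[float]:
    """W(x;p) o W(x;dp), i.e. x -> W(W(x;dp);p): the increment is applied first."""
    return params(matmul(affine_matrix(p), affine_matrix(dp)))


def compose_inverse(p: Sequence[float], dp: Sequence[float]) -> List[float]:
    """W(x;p) o W(x;dp)^-1, the update used by the inverse compositional algorithm."""
    return params(matmul(affine_matrix(p), invert_affine(affine_matrix(dp))))
```

Composing with the identity leaves $p$ unchanged, and composing with $W(\mathbf{x};\Delta p)$ and then its inverse recovers $p$ exactly; in general, though, `compose(p, dp)` is not the additive update `p + dp`.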
3.1 Lucas-Kanade Image Alignment
- The goal of image alignment is to find the location of a constant template image in an input image.
- The goal of Lucas-Kanade is to find the locally “best” alignment by minimizing the sum of squared differences between a constant template image, $T(\mathbf{x})$ say, and an example image $I(\mathbf{x})$ with respect to the warp parameters $p$: $\sum_{\mathbf{x}} [T(\mathbf{x}) - I(W(\mathbf{x};p))]^2$ (12). Note the similarity with Equation (7).
- As in Section 2 above, $W(\mathbf{x};p)$ is a warp that maps the pixels $\mathbf{x}$ from the template (i.e. the base mesh) image to the input image and has parameters $p$.
- Solving for $p$ is a nonlinear optimization problem, even if $W(\mathbf{x};p)$ is linear in $p$, because in general the pixel values $I(\mathbf{x})$ are nonlinear in (and essentially unrelated to) the pixel coordinates $\mathbf{x}$.
- In [Baker and Matthews, 2003] the authors refer to this as the forwards-additive algorithm.
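The forwards-additive algorithm can be illustrated with a 1-D toy problem. This sketch assumes a translation-only warp $W(x;p) = x + p$ (so $\partial W/\partial p = 1$), a smooth synthetic signal standing in for the input image, and a central-difference derivative in place of an image gradient filter; none of these come from the paper. The gradient of $I$ at $W(x;p)$ and the Hessian are recomputed in every iteration, which is what makes this kind of algorithm slow in general.

```python
import math
from typing import Callable, List


def lucas_kanade_translation(T: List[float], xs: List[float], I: Callable[[float], float],
                             p: float = 0.0, iters: int = 50, h: float = 1e-4) -> float:
    """Forwards-additive Gauss-Newton for W(x;p) = x + p, minimizing sum_x [T(x) - I(W(x;p))]^2."""
    for _ in range(iters):
        b = 0.0     # sum of steepest descent image times error image
        hess = 0.0  # Gauss-Newton Hessian, recomputed every iteration
        for x, t in zip(xs, T):
            w = x + p                                   # W(x;p)
            grad = (I(w + h) - I(w - h)) / (2.0 * h)    # gradient of I evaluated at W(x;p)
            sd = grad * 1.0                             # times dW/dp, which is 1 for a translation
            b += sd * (t - I(w))
            hess += sd * sd
        dp = b / hess
        p += dp                                         # additive update p <- p + dp
        if abs(dp) < 1e-12:
            break
    return p


def I(x: float) -> float:
    return math.exp(-((x - 2.0) ** 2) / 4.0)           # smooth synthetic 1-D "input image"


xs = [0.25 * i for i in range(-20, 41)]                 # template pixel coordinates
T = [I(x + 0.7) for x in xs]                             # template = input shifted by 0.7
p_est = lucas_kanade_translation(T, xs, I)
```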
3.2 Forwards Compositional Image Alignment
- In the Lucas-Kanade algorithm the warp parameters are computed by estimating a $\Delta p$ offset from the current warp parameters $p$.
- The compositional framework computes an incremental warp $W(\mathbf{x};\Delta p)$ to be composed with the current warp $W(\mathbf{x};p)$.
- There are then two differences between Equation (19) and Equation (14): the image gradient is computed on the warped input image $I(W(\mathbf{x};p))$ rather than on $I$, and the Jacobian $\partial W/\partial p$ is evaluated at $(\mathbf{x};0)$, so it is constant and can be precomputed.
- The composition update step is computationally more costly than the update step for an additive algorithm, but this is offset by not having to compute the Jacobian in each iteration.
- The key point in the forwards compositional algorithm, illustrated in Figure 6(a), is that the update is computed with respect to $p = 0$ each time.
3.3 Inverse Compositional Image Alignment
- The inverse compositional algorithm is a modification of the forwards compositional algorithm where the roles of the template and example image are reversed.
- Rather than computing the incremental warp with respect to $I(W(\mathbf{x};p))$, it is computed with respect to the template $T(\mathbf{x})$.
- See Figure 7 for the details of the algorithm.
- The only additional computations are Steps 8 and 9, which are very efficient.
- These two steps essentially correct for the current estimates of the parameters and avoid the problem illustrated in Figure 4.
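For comparison, here is an inverse compositional version of a 1-D toy problem, under the same assumptions as a translation-only Lucas-Kanade illustration (warp $W(x;p) = x + p$, a synthetic signal, finite-difference gradients; all names hypothetical). The template gradient, steepest descent images, and Hessian are computed once, before the loop; each iteration only warps the input, forms the error image, and takes a dot product. For a translation, $W(x;p) \circ W(x;\Delta p)^{-1} = x + p - \Delta p$, so the update is $p \leftarrow p - \Delta p$.

```python
import math
from typing import Callable, List


def inverse_compositional_translation(T: List[float], xs: List[float], I: Callable[[float], float],
                                      p: float = 0.0, iters: int = 50) -> float:
    """Inverse compositional alignment for W(x;p) = x + p; xs must be a uniform grid."""
    n, dx = len(T), xs[1] - xs[0]
    # Precomputed once: template gradient (finite differences) = steepest descent images.
    sd = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        sd.append((T[hi] - T[lo]) / ((hi - lo) * dx))   # grad T * dW/dp, with dW/dp = 1 at p = 0
    hess = sum(g * g for g in sd)                        # constant Hessian
    for _ in range(iters):
        # Per iteration: warp the input and dot the error image with the fixed steepest descent images.
        err = [I(x + p) - t for x, t in zip(xs, T)]
        dp = sum(g * e for g, e in zip(sd, err)) / hess
        p -= dp                                          # W(x;p) o W(x;dp)^-1 for a translation
        if abs(dp) < 1e-12:
            break
    return p


def I(x: float) -> float:
    return math.exp(-((x - 2.0) ** 2) / 4.0)            # smooth synthetic 1-D "input image"


xs = [0.25 * i for i in range(-20, 41)]
T = [I(x + 0.7) for x in xs]                              # template = input shifted by 0.7
p_est = inverse_compositional_translation(T, xs, I)
```

The design point is that nothing inside the loop depends on derivatives of the input image, which is where the efficiency of the inverse compositional approach comes from.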
4 Applying the Inverse Compositional Algorithm to AAMs
- The authors now show how the inverse compositional algorithm can be applied to independent AAMs.
- The algorithm does not apply to combined AAMs.
- See Section 6.2 for more discussion of why not.
4.1 Application Without Appearance Variation
- The authors first describe how the algorithm applies without any appearance variation; i.e. when $m = 0$.
- Comparing Equation (7) with Equation (12) the authors see that if there is no appearance variation, the inverse compositional algorithm applies as is.
- Examining Figure 7 the authors find that most of the steps in the algorithm are standard vector, matrix, and image operations such as computing image gradients and image differences.
- The only non-standard steps are: Step 1, warping with the piecewise affine warp $W(\mathbf{x};p)$; Step 4, computing the Jacobian of the piecewise affine warp; and Step 9, inverting the incremental piecewise affine warp and composing it with the current estimate of the piecewise affine warp.
- The authors now describe how each of these steps is performed.
4.1.1 Piecewise Affine Warping
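- The image $I(W(\mathbf{x};p))$ is computed by backwards warping the input image $I$ with the warp $W(\mathbf{x};p)$; i.e. for each pixel $\mathbf{x}$ in the base mesh $s_0$ the authors compute $W(\mathbf{x};p)$ and sample (bilinearly interpolate) the image $I$ at that location.
- Every pixel $\mathbf{x}$ in the base mesh $s_0$ lies in a triangle; suppose that the vertices of that triangle are $(x_i^0, y_i^0)^T$, $(x_j^0, y_j^0)^T$, and $(x_k^0, y_k^0)^T$, and that the corresponding vertices of the current mesh $s$ are $(x_i, y_i)^T$, $(x_j, y_j)^T$, and $(x_k, y_k)^T$.
- The vertices of $s$ can be computed from the shape parameters $p$ using Equation (2).
- One way to implement the piecewise affine warp is illustrated in Figure 8.
- The six parameters of each triangle's affine warp only need to be computed once per triangle, not once per pixel.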
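A minimal sketch of backwards warping with a piecewise affine warp. It assumes triangles are given as vertex-coordinate triples in the base mesh and the current mesh, images are nested lists indexed `image[row][col]`, and out-of-range sample points are clamped to the border (one of several reasonable policies). Each pixel is written as $\mathbf{x} = \mathbf{x}_i^0 + \alpha(\mathbf{x}_j^0 - \mathbf{x}_i^0) + \beta(\mathbf{x}_k^0 - \mathbf{x}_i^0)$ and mapped to $\mathbf{x}_i + \alpha(\mathbf{x}_j - \mathbf{x}_i) + \beta(\mathbf{x}_k - \mathbf{x}_i)$; that map is stored as six affine parameters per triangle, and the input image is then sampled bilinearly. The triangle search is a plain linear scan, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]
Affine = Tuple[float, float, float, float, float, float]


def barycentric(pt: Point, tri: Triangle) -> Tuple[float, float]:
    """(alpha, beta) such that pt = v_i + alpha (v_j - v_i) + beta (v_k - v_i)."""
    (xi, yi), (xj, yj), (xk, yk) = tri
    ux, uy, vx, vy = xj - xi, yj - yi, xk - xi, yk - yi
    det = ux * vy - vx * uy
    dx, dy = pt[0] - xi, pt[1] - yi
    return (dx * vy - vx * dy) / det, (ux * dy - uy * dx) / det


def triangle_affine(src: Triangle, dst: Triangle) -> Affine:
    """(a1..a6) with W(x, y) = (a1 + a2 x + a3 y, a4 + a5 x + a6 y) mapping src onto dst."""
    def warp(pt: Point) -> Point:
        alpha, beta = barycentric(pt, src)
        (xi, yi), (xj, yj), (xk, yk) = dst
        return (xi + alpha * (xj - xi) + beta * (xk - xi), yi + alpha * (yj - yi) + beta * (yk - yi))
    # The map is affine, so its images of the origin and unit vectors give its coefficients.
    o, ex, ey = warp((0.0, 0.0)), warp((1.0, 0.0)), warp((0.0, 1.0))
    return (o[0], ex[0] - o[0], ey[0] - o[0], o[1], ex[1] - o[1], ey[1] - o[1])


def apply_affine(a: Affine, pt: Point) -> Point:
    return (a[0] + a[1] * pt[0] + a[2] * pt[1], a[3] + a[4] * pt[0] + a[5] * pt[1])


def find_triangle(pt: Point, tris: Sequence[Triangle], eps: float = 1e-9) -> Optional[int]:
    """Index of a base-mesh triangle containing pt (a raster-scan lookup table would avoid most searches)."""
    for t, tri in enumerate(tris):
        alpha, beta = barycentric(pt, tri)
        if alpha >= -eps and beta >= -eps and alpha + beta <= 1 + eps:
            return t
    return None


def bilinear(image: List[List[float]], x: float, y: float) -> float:
    """Sample image[row][col] at (x, y) = (col, row); coordinates are clamped to the border."""
    rows, cols = len(image), len(image[0])
    x, y = min(max(x, 0.0), cols - 1.0), min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def backwards_warp(image, base_pixels, base_tris, cur_tris) -> List[Optional[float]]:
    """I(W(x;p)) for each base-mesh pixel x (None for pixels outside the base mesh)."""
    affines = [triangle_affine(s, d) for s, d in zip(base_tris, cur_tris)]  # once per triangle
    out: List[Optional[float]] = []
    for pt in base_pixels:
        t = find_triangle(pt, base_tris)
        out.append(None if t is None else bilinear(image, *apply_affine(affines[t], pt)))
    return out
```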
4.1.2 Computing the Warp Jacobian
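- The destination of the pixel $\mathbf{x}$ under the piecewise affine warp $W(\mathbf{x};p)$ depends on the AAM shape parameters $p$ through the vertices of the mesh $s$.
- From Equation (1) remember that these vertices are denoted $s = (x_1, y_1, x_2, y_2, \ldots, x_v, y_v)^T$, so by the chain rule $\frac{\partial W}{\partial p} = \sum_{i=1}^{v} \left[ \frac{\partial W}{\partial x_i} \frac{\partial x_i}{\partial p} + \frac{\partial W}{\partial y_i} \frac{\partial y_i}{\partial p} \right]$.
- The first components of the Jacobian are $\frac{\partial W}{\partial x_i}$ and $\frac{\partial W}{\partial y_i}$, the Jacobians of the warp with respect to the vertices of the mesh $s$.
- Differentiating Equation (2) gives $\frac{\partial x_i}{\partial p} = (s_1^{x_i}, s_2^{x_i}, \ldots, s_n^{x_i})$ and $\frac{\partial y_i}{\partial p} = (s_1^{y_i}, s_2^{y_i}, \ldots, s_n^{y_i})$ (32), where $s_j^{x_i}$ denotes the component of $s_j$ that corresponds to $x_i$, and similarly for $y_i$.
- The array of (base-mesh-shaped) images $\frac{\partial W}{\partial p}$ corresponds to the Jacobian for the four shape vectors in Figure 1.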
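The chain rule above can be sketched for a single base-mesh pixel. Writing the warped pixel in barycentric form, $W = (1-\alpha-\beta)\mathbf{x}_i + \alpha\mathbf{x}_j + \beta\mathbf{x}_k$, gives $\partial W_x/\partial x_i = \partial W_y/\partial y_i = 1 - \alpha - \beta$ (and $\alpha$, $\beta$ for the other two vertices), while the derivatives with respect to all other vertices are zero; these weights are combined with the shape vector components as in Equation (32). Because $\alpha$ and $\beta$ depend only on the pixel's position in the base mesh, this Jacobian can be precomputed. This is a minimal sketch assuming flat shape vectors $(x_1, y_1, \ldots)$ and a known containing triangle; the names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def barycentric(pt: Point, a: Point, b: Point, c: Point) -> Tuple[float, float]:
    """(alpha, beta) with pt = a + alpha (b - a) + beta (c - a)."""
    ux, uy, vx, vy = b[0] - a[0], b[1] - a[1], c[0] - a[0], c[1] - a[1]
    det = ux * vy - vx * uy
    dx, dy = pt[0] - a[0], pt[1] - a[1]
    return (dx * vy - vx * dy) / det, (ux * dy - uy * dx) / det


def warp_jacobian(pt: Point, tri: Tuple[int, int, int], base_vertices: Sequence[Point],
                  shape_vectors: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """dW/dp at base-mesh pixel pt: one (dW_x/dp_j, dW_y/dp_j) pair per shape vector s_j."""
    i, j, k = tri
    alpha, beta = barycentric(pt, base_vertices[i], base_vertices[j], base_vertices[k])
    # dW_x/dx_v = dW_y/dy_v for the triangle's three vertices; zero for every other vertex.
    weights: Dict[int, float] = {i: 1.0 - alpha - beta, j: alpha, k: beta}
    jac = []
    for s in shape_vectors:
        # Chain rule: sum over vertices of dW/dx_v * s_j^{x_v} (and likewise for y).
        dwx = sum(w * s[2 * v] for v, w in weights.items())
        dwy = sum(w * s[2 * v + 1] for v, w in weights.items())
        jac.append((dwx, dwy))
    return jac
```

For example, a shape vector that translates every vertex by one unit in $x$ gives $\partial W/\partial p = (1, 0)$ at every pixel, as expected.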
4.1.3 Warp Inversion
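- Since $W(\mathbf{x};0)$ is the identity, to first order in $\Delta p$: $W(W(\mathbf{x};\Delta p);-\Delta p) = \mathbf{x} + \frac{\partial W}{\partial p}\Delta p - \frac{\partial W}{\partial p}\Delta p + O(\Delta p^2) = \mathbf{x} + O(\Delta p^2)$ (34). It therefore follows that to first order in $\Delta p$: $W(\mathbf{x};\Delta p)^{-1} = W(\mathbf{x};-\Delta p)$ (35). Note that the two Jacobians in Equation (34) are not evaluated at exactly the same location, but since they are evaluated at points $O(\Delta p)$ apart, they are equal to zeroth order in $\Delta p$.
- Since the difference between the two Jacobians is multiplied by $\Delta p$, the authors can ignore the first and higher order terms.
- Also note that the composition of two piecewise affine warps is not strictly defined and so the argument in Equation (34) is informal.
- The essence of the argument is correct, however.
- Once the authors have derived the first order approximation to the composition of two piecewise affine warps below, they can then use that definition of composition in the argument above.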
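The first-order claim in Equation (35) can be checked numerically on any smooth warp with $W(x;0) = x$. This sketch uses a hypothetical 1-D warp $W(x;p) = x + p\sin x$ (not a piecewise affine warp) and measures how far $W(x;-\Delta p)$ is from the exact inverse of $W(x;\Delta p)$. Shrinking $\Delta p$ by a factor of 10 should shrink the error by roughly a factor of 100, consistent with an $O(\Delta p^2)$ residual.

```python
import math


def warp(x: float, p: float) -> float:
    """Hypothetical smooth 1-D warp with W(x; 0) = x."""
    return x + p * math.sin(x)


def inversion_error(x: float, dp: float) -> float:
    """|W(W(x; dp); -dp) - x|: how far W(x; -dp) is from the exact inverse of W(x; dp)."""
    return abs(warp(warp(x, dp), -dp) - x)


for dp in (1e-1, 1e-2, 1e-3):
    print(dp, inversion_error(1.0, dp))
```

Expanding by hand, the residual for this warp is $\Delta p^2 \sin x \cos x + O(\Delta p^3)$, so the quadratic scaling is exactly what the first-order argument predicts.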
4.1.4 Composing the Incremental Warp with the Current Warp Estimate
- Given the current estimate of the parameters $p$, the authors can compute the current mesh vertex locations $s$ using Equation (2).
- To first order, the parameters of $W(\mathbf{x};\Delta p)^{-1}$ are $-\Delta p$. Given these parameters, the authors can use Equation (2) again to estimate the corresponding changes to the base mesh vertex locations: $\Delta s_0 = -\sum_{i=1}^{n} \Delta p_i s_i$ (36), where $\Delta s_0 = (\Delta x_1^0, \Delta y_1^0, \ldots, \Delta x_v^0, \Delta y_v^0)^T$ are the changes to the base mesh vertex locations corresponding to $W(\mathbf{x};\Delta p)^{-1}$.
- The situation is then as illustrated in Figure 11.
- Now consider any of the mesh triangles that contains the $i$th vertex.
- For this triangle there is an affine warp between the base mesh $s_0$ and the current mesh $s$.
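A sketch of this composition step, under stated assumptions: meshes are flat lists $(x_1, y_1, \ldots, x_v, y_v)$, triangles are triples of vertex indices, the shape vectors are orthonormal, each displaced base vertex $\mathbf{x}_i^0 + \Delta\mathbf{x}_i^0$ is mapped through the affine warp of every triangle that contains vertex $i$ and the results are averaged, and the new parameters are then recovered by projection, $p_i = s_i \cdot (s^\dagger - s_0)$. The averaging follows the description given for the global-shape-transform case in Section 4.3.2; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def map_through_triangle(pt: Point, src: Sequence[Point], dst: Sequence[Point]) -> Point:
    """Apply the affine warp that takes triangle src onto triangle dst to the point pt."""
    (ax, ay), (bx, by), (cx, cy) = src
    ux, uy, vx, vy = bx - ax, by - ay, cx - ax, cy - ay
    det = ux * vy - vx * uy
    dx, dy = pt[0] - ax, pt[1] - ay
    alpha, beta = (dx * vy - vx * dy) / det, (ux * dy - uy * dx) / det
    (ax2, ay2), (bx2, by2), (cx2, cy2) = dst
    return (ax2 + alpha * (bx2 - ax2) + beta * (cx2 - ax2),
            ay2 + alpha * (by2 - ay2) + beta * (cy2 - ay2))


def vertex(mesh: Sequence[float], v: int) -> Point:
    return (mesh[2 * v], mesh[2 * v + 1])


def compose_inverse_increment(s0: Sequence[float], s: Sequence[float],
                              triangles: Sequence[Tuple[int, int, int]],
                              shape_vectors: Sequence[Sequence[float]],
                              dp: Sequence[float]) -> List[float]:
    """First-order W(x;p) o W(x;dp)^-1 for a piecewise affine warp; returns the new p."""
    # Base-mesh vertex changes for W(x;dp)^-1 ~ W(x;-dp): ds0 = -sum_i dp_i s_i.
    ds0 = [0.0] * len(s0)
    for dp_i, s_i in zip(dp, shape_vectors):
        ds0 = [d - dp_i * c for d, c in zip(ds0, s_i)]
    s_new: List[float] = []
    for v in range(len(s0) // 2):
        moved = (s0[2 * v] + ds0[2 * v], s0[2 * v + 1] + ds0[2 * v + 1])
        # Map the displaced base vertex through every triangle that uses vertex v, then average.
        images = [map_through_triangle(moved, [vertex(s0, t) for t in tri], [vertex(s, t) for t in tri])
                  for tri in triangles if v in tri]
        s_new += [sum(x for x, _ in images) / len(images), sum(y for _, y in images) / len(images)]
    # Project onto the orthonormal shape vectors: p_i = s_i . (s_new - s0).
    diff = [a - b for a, b in zip(s_new, s0)]
    return [sum(c * d for c, d in zip(s_i, diff)) for s_i in shape_vectors]
```

For a pure translation mode the result is exact ($p - \Delta p$), since translations compose additively; for general modes it is the first-order approximation described in the text.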
4.2 Including Appearance Variation
- The authors have now described all of the steps needed to apply the inverse compositional algorithm to an independent AAM assuming that there is no appearance variation.
- More generally, the authors wish to use the same algorithm to minimize the expression in Equation (7).
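- Writing the norm as the sum of its components in the linear subspace $\mathrm{span}(A_i)$ spanned by the appearance images and in its orthogonal complement $\mathrm{span}(A_i)^\perp$, Equation (7) becomes $\|A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}) - I(W(\mathbf{x};p))\|^2_{\mathrm{span}(A_i)^\perp} + \|A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}) - I(W(\mathbf{x};p))\|^2_{\mathrm{span}(A_i)}$. The first of the two terms immediately simplifies.
- Since the norm only considers the components of vectors in the orthogonal complement of $\mathrm{span}(A_i)$, any component in $\mathrm{span}(A_i)$ itself can be dropped, leaving $\|A_0(\mathbf{x}) - I(W(\mathbf{x};p))\|^2_{\mathrm{span}(A_i)^\perp}$, which depends only on $p$; the second term can always be made zero by the choice of $\lambda$.
- (The error image does not need to be projected into this subspace because Step 7 of the algorithm is really the dot product of the error image with the projected steepest descent images; as long as one of the two vectors in a dot product is projected into a linear subspace, the result is the same as if both were.)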
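A minimal sketch of the "project out" idea, assuming orthonormal appearance images $A_i$ flattened into lists over the base-mesh pixels. `project_out` removes the $\mathrm{span}(A_i)$ component of a vector, as would be done once for the steepest descent images; the test checks that the dot product with a projected vector is unchanged whether or not the error image is also projected. `appearance_params` recovers $\lambda_i = A_i \cdot (I(W(\mathbf{x};p)) - A_0)$ in a single linear step once $p$ is known, which is the choice of $\lambda$ that zeroes the $\mathrm{span}(A_i)$ term. The names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_out(image: Sequence[float], appearance_images: Sequence[Sequence[float]]) -> List[float]:
    """Remove the component of `image` lying in span(A_i); the A_i are assumed orthonormal."""
    out = list(image)
    for A_i in appearance_images:
        c = dot(image, A_i)               # coefficient of image along A_i
        out = [o - c * a for o, a in zip(out, A_i)]
    return out


def appearance_params(A0: Sequence[float], appearance_images: Sequence[Sequence[float]],
                      warped_input: Sequence[float]) -> List[float]:
    """After the shape fit: lambda_i = A_i . (I(W(x;p)) - A0), a single linear step."""
    diff = [i - a for i, a in zip(warped_input, A0)]
    return [dot(A_i, diff) for A_i in appearance_images]
```

Because the projection is applied only to the fixed steepest descent images, the per-iteration cost does not grow with the number of appearance images.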
4.3 Including a Global Shape Transform
- The most common way of constructing an AAM [Cootes et al., 2001] consists of first “normalizing” each training mesh so that it is as close as possible to the base mesh.
- Typically, a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.
- Because the training data is normalized in this way, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.
- To avoid this problem, the linear shape variation is typically augmented with a global shape transformation in the following manner.
4.3.1 Adding a Global Shape Transform to an AAM
- Finally, $\theta$ is the rotation, again set so that $\theta = 0$ when there is no rotation.
- Other natural choices are affine warps and homographies [Bergen et al., 1992].
- Note that the above is not the only way to parameterize the set of 2D similarity transformations.
- The shape vector then “moves” the mouth vertices up and down in the image.
- This shape variation is only possible if the linear AAM shape variation is followed by a global shape transformation, as defined in Equation (44).
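One alternative parameterization of the 2D similarity transformation, assumed here rather than taken from the text above, writes $N(\mathbf{x};q) = \begin{pmatrix} 1+a & -b \\ b & 1+a \end{pmatrix}\mathbf{x} + (t_x, t_y)^T$ with $a = k\cos\theta - 1$ and $b = k\sin\theta$ for scale $k$ and rotation $\theta$. It is linear in $q = (a, b, t_x, t_y)$ and has $q = 0$ as the identity, which makes it convenient in the same way as the linear shape model. The helper names are hypothetical.

```python
import math
from typing import Tuple

Params = Tuple[float, float, float, float]


def similarity_params(k: float, theta: float, tx: float, ty: float) -> Params:
    """q = (a, b, tx, ty) with a = k cos(theta) - 1 and b = k sin(theta); q = 0 is the identity."""
    return (k * math.cos(theta) - 1.0, k * math.sin(theta), tx, ty)


def apply_similarity(q: Params, x: float, y: float) -> Tuple[float, float]:
    """N(x; q) = [[1 + a, -b], [b, 1 + a]] (x, y) + (tx, ty)."""
    a, b, tx, ty = q
    return ((1.0 + a) * x - b * y + tx, b * x + (1.0 + a) * y + ty)
```

For example, scale 2, a 90 degree rotation, and a translation of $(1, 0)$ map the point $(1, 0)$ to approximately $(1, 2)$.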
4.3.2 Fitting an AAM with a Global Shape Transform
- The authors now briefly describe how the inverse compositional algorithm can be used to fit an AAM with a global shape transformation; i.e. how to apply the inverse compositional algorithm to the warp $N \circ W(\mathbf{x};q,p) = N(W(\mathbf{x};p);q)$ (46) rather than the warp $W(\mathbf{x};p)$.
- The authors just compute separate Jacobians for $N(\mathbf{x};q)$ and $W(\mathbf{x};p)$ for the $q$ and $p$ parameters respectively.
- There are a variety of ways of performing the composition of the two warps.
- The changes to the base mesh vertex locations implied by the incremental parameters are then converted to changes in the destination mesh vertex locations by applying the affine warp of $N \circ W$ to each triangle and then averaging the results.
- The equivalent of Equation (37) is then used to compute the new parameter values.
4.4 Other Extensions to the Algorithm
- The authors have described how the inverse compositional image alignment algorithm can be applied to AAMs.
- The field of image alignment is well studied and over the years a number of extensions and heuristics have been developed to improve the performance of the algorithms.
- The fitting algorithm can be applied hierarchically on a Gaussian image pyramid to reduce the likelihood of falling into a local minimum [Bergen et al., 1992].
- See [Baker and Matthews, 2003] for the details of that algorithm.
5 Empirical Evaluation
- The authors have proposed a new fitting algorithm for AAMs.
- The performance can also be very dependent on minor details such as the definition of the gradient filter used to compute $\nabla A_0$.
- In their evaluation the authors take the following philosophy: they compare their AAM fitting algorithm with and without each of the individual changes studied in the experiments below.
- Every other line of code in the implementation is always exactly the same.
5.1 Generating the Inputs
- The authors begin by taking a video of a person moving their head (both rigidly and non-rigidly).
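- Ground truth estimated from real video may be biased; the authors avoid this problem by overlaying the reconstructed AAM over the original movie, creating a synthetic movie whose ground truth is known.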
- By comparing the performance of the algorithms on these two movies the authors should be able to detect any bias in the ground-truth.
- Empirically the authors found almost no difference between the performance of any of the algorithms on the corresponding real and synthetic sequences and conclude there is no bias.
- The shape parameters are randomly generated from independent Gaussian distributions with variance equal to the eigenvalue of that mode in the PCA performed during AAM construction.
5.2 The Evaluation Metrics
- Given the initial parameters, the AAM fitting algorithm should hopefully converge to the ground-truth parameters.
- The first measure is the average rate of convergence.
- The authors plot the RMS mesh point error against the iteration number and average these graphs (for approximately the same starting error) over all cases where all algorithms converge.
- The authors say that an algorithm converges if the RMS mesh point error is less than 9 pixels after 20 iterations.
- Also, comparing the appearance algorithms by their appearance estimates is not possible because the appearance is not computed until the “project out” algorithm has converged.
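The two evaluation measures can be sketched as follows. Meshes are assumed to be flat lists $(x_1, y_1, \ldots)$, `errors[t]` is assumed to hold the RMS mesh point error after $t$ iterations (with `errors[0]` the initial error), and the convergence threshold and iteration count are passed in as parameters, to be set to the values given above. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms_mesh_error(s_est: Sequence[float], s_true: Sequence[float]) -> float:
    """RMS point-to-point distance between two meshes stored as (x1, y1, ..., xv, yv)."""
    nv = len(s_true) // 2
    total = sum((s_est[2 * v] - s_true[2 * v]) ** 2 + (s_est[2 * v + 1] - s_true[2 * v + 1]) ** 2
                for v in range(nv))
    return math.sqrt(total / nv)


def converged(errors: Sequence[float], threshold: float, iterations: int = 20) -> bool:
    """True if the error after `iterations` iterations (or the last recorded one) is below threshold."""
    t = min(iterations, len(errors) - 1)
    return errors[t] < threshold


def frequency_of_convergence(runs: List[Sequence[float]], threshold: float, iterations: int = 20) -> float:
    """Fraction of runs that converge."""
    return sum(converged(r, threshold, iterations) for r in runs) / len(runs)


def average_curve(runs: List[Sequence[float]]) -> List[float]:
    """Average rate of convergence: mean RMS error at each iteration over equal-length runs."""
    return [sum(vals) / len(vals) for vals in zip(*runs)]
```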
5.3 Experiment 1: The Update Rule
- In their first experiment the authors compare the inverse compositional update in Step 9 of the algorithm with the usual additive update $p \leftarrow p + \Delta p$.
- The results without the global similarity transform are included in Figure 14 and the results with it are included in Figure 15.
- In most cases the inverse compositional update performs better: either the rate of convergence is faster or the frequency of convergence is higher.
- These results also illustrate that the inverse compositional algorithm does not always outperform the additive algorithm.
- It performs better in many scenarios and about as well in others.
5.4 Experiment 2: Computation of the Steepest Descent Images
- The inverse compositional algorithm uses the analytically derived steepest descent images $\nabla A_0 \frac{\partial W}{\partial p}$.
- In this experiment the authors compare them with numerically estimated steepest descent images. (All other aspects of the algorithm are exactly the same; specifically, the authors use the inverse compositional update in both cases.)
- The results in Figure 16 show that the analytically derived steepest descent images perform significantly better.
5.5 Experiment 3: Appearance Variation
- In their algorithm the authors “project out” the appearance variation.
- There are potential benefits to iteratively updating the appearance parameters instead.
- The steepest descent image for the appearance parameter $\lambda_i$ is $A_i(\mathbf{x})$.
- The authors plot results comparing the “project out appearance” approach with the “explicitly model appearance” approach in Figure 17.
- The results in Figure 17 show that the two approaches perform almost identically.
5.6 Computational Efficiency
- One concern with the inverse compositional algorithm is that the time taken to perform the inverse compositional update might be quite long.
- The actual calculation is minimal because the number of vertices in the mesh is far less than the number of pixels.
- The results were obtained on a dual 2.4GHz P4 machine and are for a model with 19,977 pixels, 3 shape parameters, 4 similarity parameters, and 9 appearance parameters.
- Overall their Matlab implementation of the inverse compositional algorithm operates at approximately 5Hz.
- The authors' C implementation operates at approximately 150Hz on the same machine.
6.1 Summary
- In this paper the authors have proposed an algorithm for fitting AAMs that has the advantages of both types of algorithm: it is an analytical gradient descent algorithm, yet it is efficient.
- Overall their algorithm outperforms previous approaches in terms of: (1) the speed of convergence (far fewer iterations are needed to converge to any given accuracy), (2) the frequency of convergence (the algorithm is more likely to converge from a large distance away), and (3) the computational cost (the algorithm is far faster because the appearance variation is projected out).
6.2 Discussion
- The inverse compositional AAM fitting algorithm can only be applied to independent AAMs.
- It cannot be applied to combined AAMs which parameterize the shape and appearance variation with a single set of parameters and so introduce a coupling between shape and appearance.
- This might appear to be a serious limitation, but in practice it is not.
- The nonlinear optimization in their algorithm is only over the $n$ shape parameters and so is actually lower dimensional than the equivalent combined AAM optimization, which would have more than $n$ parameters.
- Currently the authors do not see a way to extend their algorithm to combined AAMs, but of course they may be wrong.
Acknowledgments
- The authors thank Tim Cootes for discussions on the incorporation of the global shape transformation in Section 4.3.
- Elements of the AAM fitting algorithm appeared in [Baker and Matthews, 2001].
- The authors thank the reviewers of [Baker and Matthews, 2001] for their feedback.
- The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.
- Additional support was provided through U.S. Department of Defense award N41756-03-C4024.
Frequently Asked Questions
Q2. What are the future works in "Active appearance models revisited" ?
In future work the authors hope to determine which of the image alignment extensions and heuristics mentioned in Section 4.4 can be used with their inverse compositional AAM fitting algorithm and which cannot.
Q3. What is the common way to fit an AAM to an image?
Given the current estimates of the shape parameters, it is possible to warp the input image backwards onto the model coordinate frame and then compute an error image between the current model instance and the image that the AAM is being fit to.
Q4. Which global shape transformation is commonly used to normalize the training meshes?
Typically, a 2D similarity transformation (translation, rotation, and scale) is used, although an affine warp, a homography, or any other global warp could be used instead.
Q5. Why do the authors not measure the accuracy of the appearance parameters?
The authors do not measure the accuracy of the appearance parameters because once the shape parameters have been estimated, estimating the appearance parameters is a simple linear operation.
Q6. What is the natural way of minimizing the expression in Equation (7)?
Perhaps the most natural way of minimizing the expression in Equation (7) is to use a standard gradient descent optimization algorithm.
Q7. Why does the linear shape variation of the AAM not model translation, rotation, and scale?
Because the training data is normalized with a global shape transformation, the linear shape variation in the AAM does not model the translation, rotation, and scaling in the original training data.
Q8. How do the authors avoid looking up the triangle identity?
If the authors raster scan the mesh, they can avoid looking up the affine warp parameters of the triangle containing each pixel most of the time by creating a lookup table for the triangle identity that codes when the triangle identity changes.
Q9. How did the original AAM formulation estimate the update functions?
The original AAM formulation [Cootes et al., 1998] estimated the update functions, which map the error image to the parameter updates, by systematically perturbing the model parameters by $\Delta p$ and $\Delta\lambda$ and recording the corresponding error image $E(\mathbf{x})$.
Q10. How can the model instance be generated without holes?
Implementing this forwards warping to generate the model instance without holes (see Figure 3) is actually somewhat tricky, and is best performed by backwards warping with the inverse warp from $s$ to $s_0$.