Lucas-Kanade 20 Years On: A Unifying Framework
Summary
1. Introduction
- Image alignment consists of moving, and possibly deforming, a template to minimize the difference between the template and an image.
- The authors propose a unifying framework for image alignment, describing the various algorithms and their extensions in a consistent manner.
- The authors prove the first order equivalence of the various alternatives, derive the efficiency of the resulting algorithms, describe the set of warps that each alternative can be applied to, and finally empirically compare the algorithms.
2.1. Goal of the Lucas-Kanade Algorithm
- The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I warped back onto the coordinate frame of the template: ∑ₓ [I(W(x; p)) − T(x)]².
- Warping I back to compute I(W(x; p)) requires interpolating the image I at the sub-pixel locations W(x; p).
- The minimization of the expression in Eq. (3) is performed with respect to p and the sum is performed over all of the pixels x in the template image T (x).
- In fact, the pixel values I(x) are essentially unrelated to the pixel coordinates x.
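The objective above can be sketched in a few lines. This is an illustrative implementation (not the authors' code), assuming the usual affine parameterization of Eq. (1), W(x; p) = ((1+p1)x + p3·y + p5, p2·x + (1+p4)y + p6), and bilinear interpolation for the sub-pixel samples; the helper names are my own:

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at sub-pixel locations (xs, ys) by bilinear interpolation."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x1, y1 = np.clip(x0 + 1, 0, w - 1), np.clip(y0 + 1, 0, h - 1)
    ax, ay = xs - np.floor(xs), ys - np.floor(ys)
    return ((1 - ay) * (1 - ax) * img[y0, x0] + (1 - ay) * ax * img[y0, x1]
            + ay * (1 - ax) * img[y1, x0] + ay * ax * img[y1, x1])

def ssd(I, T, p):
    """Sum of squared error of Eq. (3) for the affine warp
    W(x; p) = ((1+p1)x + p3*y + p5, p2*x + (1+p4)y + p6)."""
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]].astype(float)
    wx = (1 + p[0]) * xs + p[2] * ys + p[4]
    wy = p[1] * xs + (1 + p[3]) * ys + p[5]
    return np.sum((bilinear(I, wx, wy) - T) ** 2)
```

With p = 0 the warp is the identity, so ssd(T, T, 0) is exactly zero; the sum runs over every pixel x in the template, as the paper specifies.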
2.2. Derivation of the Lucas-Kanade Algorithm
- The Lucas-Kanade algorithm (which is a Gauss-Newton gradient descent non-linear optimization algorithm) is then derived as follows.
- The term ∂W/∂p is the Jacobian of the warp.
- This convention has the advantage that the chain rule results in a matrix multiplication, as in the expression in Eq. (6).
- Minimizing the expression in Eq. (6) is a least squares problem and has a closed-form solution, which can be derived as follows.
- In general, however, all 9 steps of the algorithm must be repeated in every iteration because the estimates of the parameters p vary from iteration to iteration.
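The nine steps collapse to very little code for the simplest warp, a pure translation W(x; p) = x + p, whose Jacobian is the identity. A minimal Gauss-Newton sketch (not the authors' implementation; np.gradient and the bilinear helper stand in for the paper's gradient and warping steps):

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinear interpolation of img at sub-pixel locations (xs, ys)."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x1, y1 = np.clip(x0 + 1, 0, w - 1), np.clip(y0 + 1, 0, h - 1)
    ax, ay = xs - np.floor(xs), ys - np.floor(ys)
    return ((1 - ay) * (1 - ax) * img[y0, x0] + (1 - ay) * ax * img[y0, x1]
            + ay * (1 - ax) * img[y1, x0] + ay * ax * img[y1, x1])

def lucas_kanade_translation(I, T, n_iters=100):
    """Forwards additive Lucas-Kanade (Gauss-Newton) for W(x; p) = x + p."""
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]].astype(float)
    p = np.zeros(2)
    for _ in range(n_iters):
        Iw = bilinear(I, xs + p[0], ys + p[1])        # Step 1: warp I
        err = T - Iw                                   # Step 2: error image
        gy, gx = np.gradient(Iw)                       # Steps 3-4: gradient of warped I
        # Steps 5-6: the Jacobian is the identity, so the steepest descent
        # images are just (gx, gy) and the Hessian is 2x2.
        H = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                      [np.sum(gx * gy), np.sum(gy * gy)]])
        b = np.array([np.sum(gx * err), np.sum(gy * err)])  # Step 7
        dp = np.linalg.solve(H, b)                     # Step 8: dp = H^-1 b
        p += dp                                        # Step 9: p <- p + dp
        if np.linalg.norm(dp) < 1e-8:
            break
    return p
```

Note that the gradient, Hessian, and solve all sit inside the loop: exactly the repetition of all the steps in every iteration that the bullet above describes.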
2.3. Requirements on the Set of Warps
- The only requirement on the warps W(x; p) is that they are differentiable with respect to the warp parameters p.
- This condition is required to compute the Jacobian ∂W/∂p.
- Normally the warps are also differentiable with respect to x, but even this condition is not strictly required.
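For the affine warp of Eq. (1), for example, the Jacobian is the 2×6 matrix ∂W/∂p = [[x, 0, y, 0, 1, 0], [0, x, 0, y, 0, 1]]. A small illustrative check of this against central finite differences (my own code, not the paper's):

```python
import numpy as np

def affine_warp(x, p):
    """W(x; p) for the affine warp of Eq. (1): the warped 2D point."""
    X, Y = x
    return np.array([(1 + p[0]) * X + p[2] * Y + p[4],
                     p[1] * X + (1 + p[3]) * Y + p[5]])

def affine_jacobian(x):
    """dW/dp for the affine warp: [[x, 0, y, 0, 1, 0], [0, x, 0, y, 0, 1]]."""
    X, Y = x
    return np.array([[X, 0.0, Y, 0.0, 1.0, 0.0],
                     [0.0, X, 0.0, Y, 0.0, 1.0]])
```

Because the warp is linear in p, every column of the analytic Jacobian matches the finite-difference derivative exactly (up to rounding), and it does not depend on p at all.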
2.4. Computational Cost of the Lucas-Kanade Algorithm
- Step 1 of the Lucas-Kanade algorithm usually takes time O(nN).
- The computational cost of computing W(x; p) depends on W but for most warps the cost is O(n) per pixel.
- Step 8 takes time O(n³) to invert the Hessian matrix and time O(n²) to multiply the result by the steepest descent parameter updates computed in Step 7.
- The total computational cost of each iteration is therefore O(n²N + n³), the most expensive step being Step 6.
- See Table 1 for a summary of these computational costs.
3. The Quantity Approximated and the Warp Update Rule
- In each iteration Lucas-Kanade approximately minimizes ∑ₓ [I(W(x; p + Δp)) − T(x)]² with respect to Δp and then updates the estimates of the parameters in Step 9: p ← p + Δp. Perhaps somewhat surprisingly, iterating these two steps is not the only way to minimize the expression in Eq. (3).
- In this section the authors outline 3 alternative approaches that are all provably equivalent to the Lucas-Kanade algorithm.
- The authors then show empirically that they are equivalent.
3.1. Compositional Image Alignment
- The first alternative to the Lucas-Kanade algorithm is the compositional algorithm.
- The authors refer to the Lucas-Kanade algorithm in Eqs. (4) and (5) as the additive approach to contrast it with the compositional approach in Eqs. (12) and (13).
- The compositional and additive approaches are proved to be equivalent to first order in Δp in Section 3.1.5.
3.1.2. Derivation of the Compositional Algorithm.
- In order to proceed the authors make one assumption.
- It is also generally simpler analytically (Shum and Szeliski, 2000).
- The authors therefore have two requirements on the set of warps: (1) the set of warps must contain the identity warp and (2) the set of warps must be closed under composition.
- The computational cost of the compositional algorithm is almost exactly the same as that of the Lucas-Kanade algorithm.
- (Note that in the second of these expressions ∂W/∂p is evaluated at (x; 0), rather than at (x; p) as in the first expression.)
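For the affine warp, one convenient way to implement the compositional update W(x; p) ← W(x; p) ∘ W(x; Δp) of Eq. (13) is through 3×3 homogeneous matrices. This bookkeeping is my own sketch (the paper writes the composition out explicitly for Eq. (16)), but the two are algebraically equivalent:

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix of the affine warp W(x; p) of Eq. (1)."""
    return np.array([[1 + p[0], p[2], p[4]],
                     [p[1], 1 + p[3], p[5]],
                     [0.0, 0.0, 1.0]])

def compose(p, dp):
    """Compositional update: parameters of W(x; p) o W(x; dp)."""
    M = affine_matrix(p) @ affine_matrix(dp)
    # Read the six parameters back off the composed matrix.
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1],
                     M[1, 1] - 1, M[0, 2], M[1, 2]])
```

The two requirements above are visible here: dp = 0 gives the identity matrix (the identity warp), and the product of two such matrices is again of the same form (closure under composition).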
3.2. Inverse Compositional Image Alignment
- As a number of authors have pointed out, there is a huge computational cost in re-evaluating the Hessian in every iteration of the Lucas-Kanade algorithm (Hager and Belhumeur, 1998; Dellaert and Collins, 1999; Shum and Szeliski, 2000).
- If the Hessian were constant it could be precomputed and then re-used.
- Each iteration of the algorithm (see Fig. 1) would then just consist of an image warp (Step 1), an image difference (Step 2), a collection of image “dot-products” (Step 7), multiplication of the result by the Hessian (Step 8), and the update to the parameters (Step 9).
- All of these operations can be performed at (close to) frame-rate (Dellaert and Collins, 1999).
- Although various approximate solutions have been proposed (e.g. Hager and Belhumeur, 1998; Shum and Szeliski, 2000), these approximations are inelegant and it is often hard to say how good the approximations are.
3.2.1. Goal of the Inverse Compositional Algorithm.
- The key to efficiency is switching the role of the image and the template, as in Hager and Belhumeur (1998), where a change of variables is made to switch or invert the roles of the template and the image.
- The only difference from the update in the forwards compositional algorithm in Eq. (13) is that the incremental warp W(x; Δp) is inverted before it is composed with the current estimate.
- Fortunately, most warps used in computer vision, including homographies and 3D rotations (Shum and Szeliski, 2000), do form groups.
- The most time consuming step, the computation of the Hessian in Step 6, can be performed once as a pre-computation.
- The authors now show that the inverse compositional algorithm is equivalent to the forwards compositional algorithm introduced in Section 3.1.
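The efficiency gain is easiest to see for a pure translation, where inverting and composing the incremental warp reduces to p ← p − Δp. In this sketch (my own code, not the paper's) the gradient, steepest descent images, and Hessian come from the template and are computed once, outside the loop:

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinear interpolation of img at sub-pixel locations (xs, ys)."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x1, y1 = np.clip(x0 + 1, 0, w - 1), np.clip(y0 + 1, 0, h - 1)
    ax, ay = xs - np.floor(xs), ys - np.floor(ys)
    return ((1 - ay) * (1 - ax) * img[y0, x0] + (1 - ay) * ax * img[y0, x1]
            + ay * (1 - ax) * img[y1, x0] + ay * ax * img[y1, x1])

def inverse_compositional_translation(I, T, n_iters=100):
    """Inverse compositional LK for W(x; p) = x + p."""
    ys, xs = np.mgrid[0:T.shape[0], 0:T.shape[1]].astype(float)
    gy, gx = np.gradient(T)                        # template gradient: precomputed
    H = np.array([[np.sum(gx * gx), np.sum(gx * gy)],   # constant Hessian:
                  [np.sum(gx * gy), np.sum(gy * gy)]])  # precomputed once
    Hinv = np.linalg.inv(H)
    p = np.zeros(2)
    for _ in range(n_iters):
        Iw = bilinear(I, xs + p[0], ys + p[1])     # warp I
        err = Iw - T                               # error (note the sign: I(W) - T)
        b = np.array([np.sum(gx * err), np.sum(gy * err)])  # dot products
        dp = Hinv @ b                              # multiply by precomputed H^-1
        p -= dp                                    # compose with inverted increment
        if np.linalg.norm(dp) < 1e-8:
            break
    return p
```

Each iteration is now just a warp, a difference, a set of dot products, and a small matrix multiply, matching the list of per-iteration operations above.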
3.3. Inverse Additive Image Alignment
- A natural question which arises at this point is whether the same trick of changing variables to convert Eq. (37) into Eq. (38) can be applied in the additive formulation.
- The simplification to the Jacobian in Eq. (39) therefore cannot be made.
- The term ∂W⁻¹/∂y has to be included in an inverse additive algorithm in some form or other.
3.3.1. Goal of the Inverse Additive Algorithm.
- An image alignment algorithm that addresses this difficulty is the Hager-Belhumeur algorithm (Hager and Belhumeur, 1998).
- The Hager-Belhumeur algorithm does fit into their framework as an inverse additive algorithm.
- The template and the image are then switched by deriving the relationship between ∇I and ∇T .
3.3.2. Derivation of the Inverse Additive Algorithm.
- It is obviously possible to write down the solution to Eq. (45) in terms of the Hessian, just like in Section 2.2.
- So, in the naive approach, the Hessian will have to be re-computed in each iteration and the resulting algorithm will be just as inefficient as the original Lucas-Kanade algorithm.
- The product of the two Jacobians must be expressible in the form of Eq. (46) for the Hager-Belhumeur algorithm to be applicable.
- In comparison the inverse compositional algorithm can be applied to any set of warps which form a group, a very weak requirement.
- The computational cost of the Hager-Belhumeur algorithm is similar to that of the inverse compositional algorithm.
3.4. Empirical Validation
- The authors have proved mathematically that all four image alignment algorithms take the same steps to first order in Δp, at least on sets of warps where they can all be used.
- The authors then randomly perturbed these points with additive white Gaussian noise of a certain variance and fit for the affine warp parameters p that these 3 perturbed points define.
- As can be seen, the 4 algorithms (3 for the homography) all converge at almost exactly the same rate, again validating the equivalence of the four algorithms.
- The authors computed the percentage of times that each algorithm converged for various different variances of the noise added to the canonical point locations.
- When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.
3.5. Summary
- The authors have outlined three approaches to image alignment beyond the original forwards additive Lucas-Kanade algorithm.
- In Section 3.4 the authors validated this equivalence empirically.
- There is little difference between the two algorithms.
- Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm.
- The inverse compositional algorithm is equally efficient, conceptually more elegant, and more generally applicable than the inverse additive algorithm.
4. The Gradient Descent Approximation
- Most non-linear optimization and parameter estimation algorithms operate by iterating 2 steps.
- The first step approximately minimizes the optimality criterion, usually by making some sort of linear or quadratic approximation around the current estimate of the parameters.
- The inverse compositional algorithm, for example, approximately minimizes the expression in Eq. (31) and updates the parameters using Eq. (32).
- In Sections 2 and 3 above the authors outlined four equivalent pairs of quantity approximated and warp update rule.
- The approximation that the authors made in each case is known as the Gauss-Newton approximation.
4.3. Steepest Descent
- The simplest possibility is to approximate the Hessian as proportional to the identity matrix.
- This approach to determining c has the obvious problem that it requires the Hessian ∑ₓ ∂²G/∂p².
- The computational cost of the Gauss-Newton steepest descent algorithm is almost exactly the same as the original inverse compositional algorithm.
- See Table 8 for a summary and Baker and Matthews (2002) for the details.
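One way of setting the scalar c (among the several the section discusses) is the exact line-search step for the local quadratic model, c = (bᵀb)/(bᵀHb), which indeed requires the Hessian, as noted above. A minimal sketch with my own function name:

```python
import numpy as np

def steepest_descent_step(b, H):
    """Move along the gradient direction b with the step size that exactly
    minimizes the local quadratic model with Hessian H:
    c = (b^T b) / (b^T H b)."""
    c = (b @ b) / (b @ H @ b)
    return c * b
```

When H is the identity the step is just b itself, which is the "Hessian proportional to the identity" approximation described at the start of the section.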
4.4. The Diagonal Approximation to the Hessian
- The steepest descent algorithm can be regarded as approximating the Hessian with the identity matrix.
- This approximation is commonly used in optimization problems with a large number of parameters.
- Examples of this diagonal approximation in vision include stereo (Szeliski and Golland, 1998) and super-resolution (Baker and Kanade, 2000).
- Overall, the pre-computation only takes time O(nN) and the cost per iteration is only O(nN + n²).
- The diagonal approximation to the Hessian makes the Newton inverse compositional algorithm far more efficient.
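The saving is easy to see in code: keeping only the diagonal replaces an O(n³) linear solve with an O(n) elementwise division. A sketch (illustrative, not the paper's listing):

```python
import numpy as np

def solve_full(H, b):
    """Full update: solve H dp = b, an O(n^3) operation."""
    return np.linalg.solve(H, b)

def solve_diag(H, b):
    """Diagonal approximation: keep only diag(H), an O(n) division."""
    return b / np.diag(H)
```

The two agree exactly whenever H is already diagonal; the approximation consists of ignoring the off-diagonal couplings between parameters.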
4.5. The Levenberg-Marquardt Algorithm
- Of the various approximations, generally the steepest descent and diagonal approximations work better further away from the local minima, and the Newton and Gauss-Newton approximations work better close to the local minima where the quadratic approximation is good (Gill et al., 1986; Press et al., 1992).
- For large δ ≫ 1, the Hessian is approximately the Gauss-Newton diagonal approximation to the Hessian, but with a reduced step size of 1/δ.
- If the error has increased, the provisional update to the parameters is reversed and δ increased, δ → δ × 10, say.
- The re-ordering doesn’t affect the computational cost of the algorithm; it only marginally increases the pre-computation time, although not asymptotically.
- Overall the Levenberg-Marquardt algorithm is just as efficient as the Gauss-Newton inverse compositional algorithm.
4.6. Empirical Validation
- The authors have described six variants of the inverse compositional image alignment algorithm: Gauss-Newton, Newton, Gauss-Newton steepest descent, diagonal Hessian (Gauss-Newton and Newton), and LevenbergMarquardt.
- In Fig. 15(a) the authors plot the average frequency of convergence (computed over 5000 samples) with no intensity noise.
- The steepest descent and diagonal Hessian approximations perform very poorly.
- For affine warps the parameterization in Eq. (1) is not the only way.
- As can be seen, all of the algorithms are roughly equally fast except the Newton algorithm which is much slower.
4.7. Summary
- In this section the authors investigated the choice of the gradient descent approximation.
- The authors have exhibited five alternatives: (1) Newton, (2) steepest descent, (3) diagonal approximation to the Gauss-Newton Hessian, (4) diagonal approximation to the Newton Hessian, and (5) Levenberg-Marquardt.
- These three algorithms are also very sensitive to the estimation of the step size and the parameterization of the warps.
- The most likely reason is the noise introduced in computing the second derivatives of the template.
- Except for the Newton algorithm all of the alternatives are equally as efficient as the Gauss-Newton algorithm when combined with the inverse compositional algorithm.
4.8. Other Algorithms
- These are not the only choices.the authors.
- The focus of Section 4 has been approximating the Hessian: the Gauss-Newton approximation, the steepest descent approximation, and the diagonal approximations.
- In Shum and Szeliski (2000) an algorithm is proposed to estimate the Gauss-Newton Hessian for the forwards compositional algorithm, but in an efficient manner.
- One reason that computing the Hessian matrix is so time consuming is that it is a sum over the entire template.
- Since the coefficients are constant they can be pre-computed.
5. Discussion
- The authors have described a unifying framework for image alignment consisting of two halves.
- The results of the first half are summarized in Table 6.
- The algorithms differ in both their computational complexity and their empirical performance.
- Overall the choice of which algorithm to use depends on two main things: (1) whether there is likely to me more noise in the template or in the input image and (2) whether the algorithm needs to be efficient or not.
- The diagonal Hessian and steepest descent forwards algorithms are another option, but given their poor convergence properties it is probably better to use the inverse compositional algorithm even if the template is noisy.
6. Matlab Code, Test Images, and Scripts
- Matlab implementations of all of the algorithms described in this paper are available on the World Wide Web at: http://www.ri.cmu.edu/projects/ project 515.html.
- The authors have also included all of the test images and the scripts used to generate the experimental results in this paper.
Acknowledgments
- The authors would like to thank Bob Collins, Matthew Deans, Frank Dellaert, Daniel Huber, Takeo Kanade, Jianbo Shi, Sundar Vedula, and Jing Xiao for discussions on image alignment, and Sami Romdhani for pointing out a couple of algebraic errors in a preliminary draft of this paper.
- The authors would also like to thank the anonymous IJCV reviewers and the CVPR reviewers of Baker and Matthews (2001) for their feedback.
- The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.
Did you find this useful? Give us your feedback
Citations
4,146 citations
3,828 citations
3,772 citations
3,170 citations
3,137 citations
References
12,944 citations
11,285 citations
3,905 citations
1,775 citations
1,501 citations
Related Papers (5)
Frequently Asked Questions (13)
Q2. What have the authors stated for future works in "Lucas-kanade 20 years on: a unifying framework" ?
In future papers in this series the authors will extend their framework to cover these choices and, in particular, investigate whether the inverse compositional algorithm is compatible with these extensions of the Lucas-Kanade algorithm.
Q3. What is the goal of the Lucas-Kanade algorithm?
The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image The authorwarped back onto the coordinate frame of the template:∑ x [I (W(x; p)) − T (x)]2 .
Q4. What is the advantage of the forwards compositional algorithm?
The forwards compositional algorithm has the slight advantage that the Jacobian is constant, and is in general simpler so is less likely to be computed erroneously (Shum and Szeliski, 2000).
Q5. What is the effect of additive noise on the inverse algorithms?
Since the forwards algorithms compute the gradient of The authorand the inverse algorithms compute the gradient of T , it is not surprising that when noise is added to The authorthe inverse algorithms converge better (faster and more frequently), and conversely when noise is added to T the forwards algorithms converge better.
Q6. What is the way to apply the inverse additive algorithm to a warp?
The inverse additive algorithm can be applied to very few warps, mostly simple 2D linear warps such as translations and affine warps.
Q7. What is the cost of the Hager-Belhumeur algorithm?
In most of the steps, the cost is a function of k rather than n, but most of the time k = n in the Hager-Belhumeur algorithm anyway.
Q8. What is the cost of composing the two warps in step 9?
Most notably, the cost of composing the two warps in Step 9 depends on W but for most warps the cost is O(n2) or less, including for the affine warps in Eq. (16).3.1.5. Equivalence of the Additive and Compositional Algorithms.
Q9. What is the partial derivative of the expression in Eq. (6) with respect to p?
The partial derivative of the expression in Eq. (6) with respect to p is:2 ∑x[ ∇I ∂W∂p]T[ The author(W(x; p)) + ∇I ∂W∂p p − T (x) ] (9)where the authors refer to ∇I ∂W ∂p as the steepest descent images.
Q10. What is the error measure for the Hager-Belhumeural algorithm?
Since the 6 parameters in the affine warp have different units, the authors use the following error measure rather than the errors in the parameters.
Q11. What is the first step in the Hager-Belhumeur algorithm?
The initial goal of the Hager-Belhumeur algorithm is the same as the Lucas-Kanade algorithm; i.e. to minimize ∑ x [ The author(W(x; p + p)) − T (x) ]2 with respect to p and then update the parameters p ← p + p. Rather than changing variables like in Section 3.2.5, the roles of the template and the image are switched as follows.
Q12. What is the reason to use the inverse additive algorithm?
Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm.
Q13. What is the difference between the forwards and the inverse algorithms?
When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.