
Lucas-Kanade 20 Years On: A Unifying Framework

01 Feb 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 56, Iss: 3, pp 221-255
TL;DR: This paper presents a unifying framework for image alignment, concentrating on the efficient inverse compositional algorithm and examining which of the many extensions to the original Lucas-Kanade formulation can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

Summary (6 min read)

1. Introduction

  • Image alignment consists of moving, and possibly deforming, a template to minimize the difference between the template and an image.
  • The authors propose a unifying framework for image alignment, describing the various algorithms and their extensions in a consistent manner.
  • The authors prove the first order equivalence of the various alternatives, derive the efficiency of the resulting algorithms, describe the set of warps that each alternative can be applied to, and finally empirically compare the algorithms.

2.1. Goal of the Lucas-Kanade Algorithm

  • The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I warped back onto the coordinate frame of the template: ∑_x [I(W(x; p)) − T(x)]² (Eq. (3)).
  • Warping I back to compute I(W(x; p)) requires interpolating the image I at the sub-pixel locations W(x; p).
  • The minimization of the expression in Eq. (3) is performed with respect to p, and the sum is performed over all of the pixels x in the template image T(x).
  • In fact, the pixel values I(x) are essentially unrelated to the pixel coordinates x.
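The goal in Eq. (3) can be made concrete with a short numpy sketch for the simplest warp, the translation of Eq. (1). This is illustrative code, not from the paper; the helper names are hypothetical and bilinear interpolation is just one reasonable choice for the sub-pixel sampling:

```python
import numpy as np

def warp_translation(I, p, tmpl_shape):
    """Sample I(W(x; p)) for the translation warp W(x; p) = (x + p1, y + p2),
    interpolating I bilinearly at the sub-pixel locations W(x; p)."""
    h, w = tmpl_shape
    ys, xs = np.mgrid[0:h, 0:w]
    wx, wy = xs + p[0], ys + p[1]
    x0 = np.clip(np.floor(wx).astype(int), 0, I.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, I.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * I[y0, x0] + ax * I[y0, x0 + 1])
            + ay * ((1 - ax) * I[y0 + 1, x0] + ax * I[y0 + 1, x0 + 1]))

def ssd_error(I, T, p):
    """The sum of squared error of Eq. (3): sum_x [I(W(x; p)) - T(x)]^2."""
    return np.sum((warp_translation(I, p, T.shape) - T) ** 2)
```

For a template extracted from I at an integer offset, the error is exactly zero at the true translation and positive elsewhere.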

2.2. Derivation of the Lucas-Kanade Algorithm

  • The Lucas-Kanade algorithm (which is a Gauss-Newton gradient descent non-linear optimization algorithm) is then derived as follows.
  • The term ∂W/∂p is the Jacobian of the warp.
  • This convention has the advantage that the chain rule results in a matrix multiplication, as in the expression in Eq. (6).
  • Minimizing the expression in Eq. (6) is a least squares problem and has a closed form solution which can be derived as follows.
  • In general, however, all 9 steps of the algorithm must be repeated in every iteration because the estimates of the parameters p vary from iteration to iteration.
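The iteration just described can be sketched in numpy for the translation warp, where the Jacobian ∂W/∂p is the 2 × 2 identity so the steepest descent images are simply the warped image gradients. This is a hypothetical sketch (not the authors' implementation), with the step numbers of the algorithm marked in comments:

```python
import numpy as np

def bilinear(img, wx, wy):
    """Bilinear interpolation of img at the sub-pixel locations (wx, wy)."""
    x0 = np.clip(np.floor(wx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, img.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def lk_translation(I, T, p, n_iters=20):
    """Gauss-Newton Lucas-Kanade for the translation warp W(x;p) = (x+p1, y+p2)."""
    Iy, Ix = np.gradient(I.astype(float))   # gradient of I in its own frame
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.asarray(p, dtype=float)
    for _ in range(n_iters):
        wx, wy = xs + p[0], ys + p[1]
        error = (T - bilinear(I, wx, wy)).ravel()        # Steps 1-2: warp, difference
        sd = np.stack([bilinear(Ix, wx, wy).ravel(),     # Steps 3-5: steepest descent
                       bilinear(Iy, wx, wy).ravel()],    # images (Jacobian = identity)
                      axis=1)
        H = sd.T @ sd                                    # Step 6: Gauss-Newton Hessian
        dp = np.linalg.solve(H, sd.T @ error)            # Steps 7-8: solve for dp
        p = p + dp                                       # Step 9: additive update
        if np.linalg.norm(dp) < 1e-6:                    # convergence test ||dp|| <= eps
            break
    return p
```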

2.3. Requirements on the Set of Warps

  • The only requirement on the warps W(x; p) is that they are differentiable with respect to the warp parameters p.
  • This condition is required to compute the Jacobian ∂W/∂p.
  • Normally the warps are also differentiable with respect to x, but even this condition is not strictly required.

2.4. Computational Cost of the Lucas-Kanade Algorithm

  • Step 1 of the Lucas-Kanade algorithm usually takes time O(nN).
  • The computational cost of computing W(x; p) depends on W, but for most warps the cost is O(n) per pixel.
  • Step 8 takes time O(n³) to invert the Hessian matrix and time O(n²) to multiply the result by the steepest descent parameter updates computed in Step 7.
  • The total computational cost of each iteration is therefore O(n²N + n³), the most expensive step being Step 6.
  • See Table 1 for a summary of these computational costs.

3. The Quantity Approximated and the Warp Update Rule

  • In each iteration Lucas-Kanade approximately minimizes ∑_x [I(W(x; p + Δp)) − T(x)]² with respect to Δp and then updates the estimates of the parameters in Step 9: p ← p + Δp. Perhaps somewhat surprisingly, iterating these two steps is not the only way to minimize the expression in Eq. (3).
  • In this section the authors outline 3 alternative approaches that are all provably equivalent to the Lucas-Kanade algorithm.
  • The authors then show empirically that they are equivalent.

3.1. Compositional Image Alignment

  • The first alternative to the Lucas-Kanade algorithm is the compositional algorithm.
  • The authors refer to the Lucas-Kanade algorithm in Eqs. (4) and (5) as the additive approach to contrast it with the compositional approach in Eqs. (12) and (13).
  • The compositional and additive approaches are proved to be equivalent to first order in Δp in Section 3.1.5.

3.1.2. Derivation of the Compositional Algorithm.

  • In order to proceed the authors make one assumption.
  • It is also generally simpler analytically (Shum and Szeliski, 2000).
  • The authors therefore have two requirements on the set of warps: (1) the set of warps must contain the identity warp and (2) the set of warps must be closed under composition.
  • The computational cost of the compositional algorithm is almost exactly the same as that of the Lucas-Kanade algorithm.
  • (Note that in the second of these expressions ∂W/∂p is evaluated at (x; 0), rather than at (x; p) as in the first expression.)
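The two requirements can be checked mechanically for affine warps by writing each warp of Eq. (2) as a 3 × 3 matrix in homogeneous coordinates, so that composition becomes matrix multiplication. A numpy sketch with illustrative names:

```python
import numpy as np

def affine_matrix(p):
    """The affine warp of Eq. (2) as a 3x3 matrix acting on (x, y, 1)^T."""
    return np.array([[1 + p[0], p[2], p[4]],
                     [p[1], 1 + p[3], p[5]],
                     [0.0, 0.0, 1.0]])

def affine_params(M):
    """Recover the parameters p = (p1,...,p6) from a 3x3 affine matrix."""
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1], M[1, 1] - 1, M[0, 2], M[1, 2]])

def compose(p, dp):
    """W(x; p) o W(x; dp): apply the incremental warp first, then the current one."""
    return affine_params(affine_matrix(p) @ affine_matrix(dp))
```

With this representation the identity warp is p = 0 (the identity matrix), and the set is closed under composition, since the product of two such matrices has the same form.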

3.2. Inverse Compositional Image Alignment

  • As a number of authors have pointed out, there is a huge computational cost in re-evaluating the Hessian in every iteration of the Lucas-Kanade algorithm (Hager and Belhumeur, 1998; Dellaert and Collins, 1999; Shum and Szeliski, 2000).
  • If the Hessian were constant it could be precomputed and then re-used.
  • Each iteration of the algorithm (see Fig. 1) would then just consist of an image warp (Step 1), an image difference (Step 2), a collection of image “dot-products” (Step 7), multiplication of the result by the Hessian (Step 8), and the update to the parameters (Step 9).
  • All of these operations can be performed at (close to) frame-rate (Dellaert and Collins, 1999).
  • Although various approximate solutions have been proposed (e.g. Shum and Szeliski, 2000), these approximations are inelegant and it is often hard to say how good the approximations are.

3.2.1. Goal of the Inverse Compositional Algorithm.

  • The key to efficiency is switching the role of the image and the template, as in Hager and Belhumeur (1998), where a change of variables is made to switch or invert the roles of the template and the image.
  • The only difference from the update in the forwards compositional algorithm in Eq. (13) is that the incremental warp W(x; p) is inverted before it is composed with the current estimate.
  • Fortunately, most warps used in computer vision, including homographies and 3D rotations (Shum and Szeliski, 2000), do form groups.
  • The most time consuming step, the computation of the Hessian in Step 6, can be performed once as a pre-computation.
  • The authors now show that the inverse compositional algorithm is equivalent to the forwards compositional algorithm introduced in Section 3.1.
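The pre-computation structure can be sketched for the translation warp, where inverting the incremental warp and composing it with the current estimate reduces to p ← p − Δp. A hypothetical numpy sketch, not the paper's implementation:

```python
import numpy as np

def bilinear(img, wx, wy):
    """Bilinear interpolation of img at the sub-pixel locations (wx, wy)."""
    x0 = np.clip(np.floor(wx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, img.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def ic_translation(I, T, p, n_iters=20):
    """Inverse compositional alignment for the translation warp (a sketch)."""
    # Pre-computation (done once): for this warp the Jacobian at p = 0 is the
    # identity, so the steepest descent images are the template gradients, and
    # the Gauss-Newton Hessian can be inverted up front.
    Ty, Tx = np.gradient(T.astype(float))
    sd = np.stack([Tx.ravel(), Ty.ravel()], axis=1)
    H_inv = np.linalg.inv(sd.T @ sd)
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.asarray(p, dtype=float)
    for _ in range(n_iters):
        # Per-iteration work: a warp, a difference, dot products, and an update.
        error = (bilinear(I, xs + p[0], ys + p[1]) - T).ravel()
        dp = H_inv @ (sd.T @ error)
        # Invert the incremental warp and compose; for translations this is
        # simply p <- p - dp.
        p = p - dp
        if np.linalg.norm(dp) < 1e-6:
            break
    return p
```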

3.3. Inverse Additive Image Alignment

  • A natural question which arises at this point is whether the same trick of changing variables to convert Eq. (37) into Eq. (38) can be applied in the additive formulation.
  • The simplification to the Jacobian in Eq. (39) therefore cannot be made.
  • The term ∂W⁻¹/∂y has to be included in an inverse additive algorithm in some form or other.

3.3.1. Goal of the Inverse Additive Algorithm.

  • An image alignment algorithm that addresses this difficulty is the Hager-Belhumeur algorithm (Hager and Belhumeur, 1998).
  • The Hager-Belhumeur algorithm does fit into the authors' framework as an inverse additive algorithm.
  • The template and the image are then switched by deriving the relationship between ∇I and ∇T .

3.3.2. Derivation of the Inverse Additive Algorithm.

  • It is obviously possible to write down the solution to Eq. (45) in terms of the Hessian, just like in Section 2.2.
  • So, in the naive approach, the Hessian will have to be re-computed in each iteration and the resulting algorithm will be just as inefficient as the original Lucas-Kanade algorithm.
  • The product of the two Jacobians must be expressible in the form of Eq. (46) for the Hager-Belhumeur algorithm to be used.
  • In comparison the inverse compositional algorithm can be applied to any set of warps which form a group, a very weak requirement.
  • The computational cost of the Hager-Belhumeur algorithm is similar to that of the inverse compositional algorithm.

3.4. Empirical Validation

  • The authors have proved mathematically that all four image alignment algorithms take the same steps to first order in Δp, at least on sets of warps where they can all be used.
  • The authors then randomly perturbed these points with additive white Gaussian noise of a certain variance and solved for the affine warp parameters p that the 3 perturbed points define.
  • As can be seen, the 4 algorithms (3 for the homography) all converge at almost exactly the same rate, again validating the equivalence of the four algorithms.
  • The authors computed the percentage of times that each algorithm converged for various different variances of the noise added to the canonical point locations.
  • When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.
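The experimental setup hinges on recovering an affine warp from 3 point correspondences; since the affine warp of Eq. (2) is linear in p, each correspondence contributes two linear equations. A numpy sketch of this fitting step (illustrative, using the paper's parameterization):

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve for the affine parameters p = (p1,...,p6) of Eq. (2) that map the
    3 source points to the 3 (e.g. noise-perturbed) destination points.
    W_x = (1+p1)x + p3*y + p5 and W_y = p2*x + (1+p4)y + p6, so each
    correspondence (x, y) -> (u, v) gives two linear equations in p."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, 0, y, 0, 1, 0]); b.append(u - x)
        A.append([0, x, 0, y, 0, 1]); b.append(v - y)
    return np.linalg.solve(np.array(A, float), np.array(b, float))
```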

3.5. Summary

  • The authors have outlined three approaches to image alignment beyond the original forwards additive Lucas-Kanade algorithm.
  • In Section 3.4 the authors validated this equivalence empirically.
  • There is little difference between the two algorithms.
  • Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm.
  • The inverse compositional algorithm is equally efficient, conceptually more elegant, and more generally applicable than the inverse additive algorithm.

4. The Gradient Descent Approximation

  • Most non-linear optimization and parameter estimation algorithms operate by iterating 2 steps.
  • The first step approximately minimizes the optimality criterion, usually by making some sort of linear or quadratic approximation around the current estimate of the parameters.
  • The inverse compositional algorithm, for example, approximately minimizes the expression in Eq. (31) and updates the parameters using Eq. (32).
  • In Sections 2 and 3 above the authors outlined four equivalent pairs of quantity approximated and parameter update rule.
  • The approximation that the authors made in each case is known as the Gauss-Newton approximation.

4.3. Steepest Descent

  • The simplest possibility is to approximate the Hessian as proportional to the identity matrix.
  • This approach to determining c has the obvious problem that it requires the Hessian ∑_x ∂²G/∂p².
  • The computational cost of the Gauss-Newton steepest descent algorithm is almost exactly the same as the original inverse compositional algorithm.
  • See Table 8 for a summary and Baker and Matthews (2002) for the details.

4.4. The Diagonal Approximation to the Hessian

  • The steepest descent algorithm can be regarded as approximating the Hessian with the identity matrix.
  • This approximation is commonly used in optimization problems with a large number of parameters.
  • Examples of this diagonal approximation in vision include stereo (Szeliski and Golland, 1998) and super-resolution (Baker and Kanade, 2000).
  • Overall, the pre-computation only takes time O(nN) and the cost per iteration is only O(nN + n²).
  • The diagonal approximation to the Hessian makes the Newton inverse compositional algorithm far more efficient.
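The saving can be seen by comparing the full Gauss-Newton update with the diagonal one, where only the n diagonal entries of the Hessian are formed and "inverting" the Hessian is an elementwise division. A hypothetical numpy sketch:

```python
import numpy as np

def gn_update(sd, error):
    """Full Gauss-Newton update: O(n^2 N) to build H and O(n^3) to solve it.
    sd holds the steepest descent images as columns (one per parameter)."""
    H = sd.T @ sd
    return np.linalg.solve(H, sd.T @ error)

def diag_update(sd, error):
    """Diagonal approximation: only the n diagonal entries of the Hessian are
    formed (O(nN)), and the 'inversion' is an elementwise division."""
    d = np.sum(sd * sd, axis=0)          # diag(H)
    return (sd.T @ error) / d
```

When the steepest descent images happen to be mutually orthogonal the two updates coincide; in general the diagonal update is only an approximation.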

4.5. The Levenberg-Marquardt Algorithm

  • Of the various approximations, generally the steepest descent and diagonal approximations work better further away from the local minima, and the Newton and Gauss-Newton approximations work better close to the local minima where the quadratic approximation is good (Gill et al., 1986; Press et al., 1992).
  • For large δ ≫ 1, the Hessian is approximately the Gauss-Newton diagonal approximation to the Hessian, but with a reduced step size of 1/δ.
  • If the error has increased, the provisional update to the parameters is reversed and δ increased, δ → δ × 10, say.
  • The re-ordering doesn't affect the computational cost of the algorithm; it only marginally increases the pre-computation time, although not asymptotically.
  • Overall the Levenberg-Marquardt algorithm is just as efficient as the Gauss-Newton inverse compositional algorithm.
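The damping and accept/reject logic can be sketched as follows (numpy, with illustrative function names; the factor of 10 follows the text):

```python
import numpy as np

def lm_update(sd, error, delta):
    """One Levenberg-Marquardt update: the Gauss-Newton Hessian with its
    diagonal entries inflated by a factor of (1 + delta)."""
    H = sd.T @ sd
    H_lm = H + delta * np.diag(np.diag(H))
    return np.linalg.solve(H_lm, sd.T @ error)

def lm_step(params, delta, error_fn, sd, error):
    """Provisionally update the parameters; keep the update and decrease delta
    if the error dropped, otherwise revert the update and increase delta."""
    trial = params + lm_update(sd, error, delta)
    if error_fn(trial) < error_fn(params):
        return trial, delta / 10.0   # good step: behave more like Gauss-Newton
    return params, delta * 10.0      # bad step: smaller, steepest-descent-like
```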

4.6. Empirical Validation

  • The authors have described six variants of the inverse compositional image alignment algorithm: Gauss-Newton, Newton, Gauss-Newton steepest descent, diagonal Hessian (Gauss-Newton and Newton), and Levenberg-Marquardt.
  • In Fig. 15(a) the authors plot the average frequency of convergence (computed over 5000 samples) with no intensity noise.
  • The steepest descent and diagonal Hessian approximations perform very poorly.
  • For affine warps, the parameterization in Eq. (2) is not the only possibility.
  • As can be seen, all of the algorithms are roughly equally fast except the Newton algorithm which is much slower.

4.7. Summary

  • In this section the authors investigated the choice of the gradient descent approximation.
  • The authors have exhibited five alternatives: (1) Newton, (2) steepest descent, (3) diagonal approximation to the Gauss-Newton Hessian, (4) diagonal approximation to the Newton Hessian, and (5) Levenberg-Marquardt.
  • These three algorithms (steepest descent and the two diagonal approximations) are also very sensitive to the estimation of the step size and the parameterization of the warps.
  • The most likely reason the Newton algorithm performs worse is the noise introduced in computing the second derivatives of the template.
  • Except for the Newton algorithm all of the alternatives are equally as efficient as the Gauss-Newton algorithm when combined with the inverse compositional algorithm.

4.8. Other Algorithms

  • These are not the only choices.
  • The focus of Section 4 has been approximating the Hessian: the Gauss-Newton approximation, the steepest descent approximation, and the diagonal approximations.
  • In Shum and Szeliski (2000) an algorithm is proposed to estimate the Gauss-Newton Hessian for the forwards compositional algorithm, but in an efficient manner.
  • One reason that computing the Hessian matrix is so time consuming is that it is a sum over the entire template.
  • Since the coefficients are constant they can be pre-computed.

5. Discussion

  • The authors have described a unifying framework for image alignment consisting of two halves.
  • The results of the first half are summarized in Table 6.
  • The algorithms differ in both their computational complexity and their empirical performance.
  • Overall the choice of which algorithm to use depends on two main things: (1) whether there is likely to be more noise in the template or in the input image and (2) whether the algorithm needs to be efficient or not.
  • The diagonal Hessian and steepest descent forwards algorithms are another option, but given their poor convergence properties it is probably better to use the inverse compositional algorithm even if the template is noisy.

6. Matlab Code, Test Images, and Scripts

  • Matlab implementations of all of the algorithms described in this paper are available on the World Wide Web at: http://www.ri.cmu.edu/projects/project_515.html.
  • The authors have also included all of the test images and the scripts used to generate the experimental results in this paper.

Acknowledgments

  • The authors would like to thank Bob Collins, Matthew Deans, Frank Dellaert, Daniel Huber, Takeo Kanade, Jianbo Shi, Sundar Vedula, and Jing Xiao for discussions on image alignment, and Sami Romdhani for pointing out a couple of algebraic errors in a preliminary draft of this paper.
  • The authors would also like to thank the anonymous IJCV reviewers and the CVPR reviewers of Baker and Matthews (2001) for their feedback.
  • The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.


International Journal of Computer Vision 56(3), 221–255, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Lucas-Kanade 20 Years On: A Unifying Framework
SIMON BAKER AND IAIN MATTHEWS
The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
simonb@cs.cmu.edu
iainm@cs.cmu.edu
Received July 10, 2002; Revised February 6, 2003; Accepted February 7, 2003
Abstract. Since the Lucas-Kanade algorithm was proposed in 1981 image alignment has become one of the most
widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion,
mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions
have been made to the original formulation. We present an overview of image alignment, describing most of the
algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm,
an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used
with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper,
Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent
approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance
variation, and how to impose priors on the parameters.
Keywords: image alignment, Lucas-Kanade, a unifying framework, additive vs. compositional algorithms, forwards vs. inverse algorithms, the inverse compositional algorithm, efficiency, steepest descent, Gauss-Newton, Newton, Levenberg-Marquardt
1. Introduction
Image alignment consists of moving, and possibly de-
forming, a template to minimize the difference between
the template and an image. Since the first use of im-
age alignment in the Lucas-Kanade optical flow al-
gorithm (Lucas and Kanade, 1981), image alignment
has become one of the most widely used techniques
in computer vision. Besides optical flow, some of its
other applications include tracking (Black and Jepson,
1998; Hager and Belhumeur, 1998), parametric and
layered motion estimation (Bergen et al., 1992), mo-
saic construction (Shum and Szeliski, 2000), medical
image registration (Christensen and Johnson, 2001),
and face coding (Baker and Matthews, 2001; Cootes
et al., 1998).
The usual approach to image alignment is gradi-
ent descent. A variety of other numerical algorithms
such as difference decomposition (Gleicher, 1997) and
linear regression (Cootes et al., 1998) have also been
proposed, but gradient descent is the de facto standard.
Gradient descent can be performed in a variety of
different ways, however. One difference between the var-
ious approaches is whether they estimate an additive
increment to the parameters (the additive approach
(Lucas and Kanade, 1981)), or whether they estimate
an incremental warp that is then composed with the
current estimate of the warp (the compositional ap-
proach (Shum and Szeliski, 2000)). Another difference
is whether the algorithm performs a Gauss-Newton, a
Newton, a steepest-descent, or a Levenberg-Marquardt
approximation in each gradient descent step.
We propose a unifying framework for image align-
ment, describing the various algorithms and their ex-
tensions in a consistent manner. Throughout the frame-
work we concentrate on the inverse compositional

algorithm, an efficient algorithm that we recently pro-
posed (Baker and Matthews, 2001). We examine which
of the extensions to Lucas-Kanade can be applied to
the inverse compositional algorithm without any sig-
nificant loss of efficiency, and which extensions require
additional computation. Wherever possible we provide
empirical results to illustrate the various algorithms and
their extensions.
In this paper, Part 1 in a series of papers, we be-
gin in Section 2 by reviewing the Lucas-Kanade algo-
rithm. We proceed in Section 3 to analyze the quan-
tity that is approximated by the various image align-
ment algorithms and the warp update rule that is used.
We categorize algorithms as either additive or compo-
sitional, and as either forwards or inverse. We prove
the first order equivalence of the various alternatives,
derive the efficiency of the resulting algorithms, de-
scribe the set of warps that each alternative can be
applied to, and finally empirically compare the algo-
rithms. In Section 4 we describe the various gradient de-
scent approximations that can be used in each iteration,
Gauss-Newton, Newton, diagonal Hessian, Levenberg-
Marquardt, and steepest-descent (Press et al., 1992).
We compare these alternatives both in terms of speed
and in terms of empirical performance. We conclude
in Section 5 with a discussion. In future papers in this
series (which will be made available on our website
http://www.ri.cmu.edu/projects/project_515.html), we
will cover the choice of the error function, how to al-
low linear appearance variation, and how to add priors
on the parameters.
2. Background: Lucas-Kanade
The original image alignment algorithm was the Lucas-
Kanade algorithm (Lucas and Kanade, 1981). The goal
of Lucas-Kanade is to align a template image T(x) to an
input image I(x), where x = (x, y)^T is a column vector
containing the pixel coordinates. If the Lucas-Kanade
algorithm is being used to compute optical flow or to
track an image patch from time t = 1 to time t = 2,
the template T(x) is an extracted sub-region (a 5 × 5
window, maybe) of the image at t = 1 and I(x) is the
image at t = 2.
Let W(x; p) denote the parameterized set of allowed
warps, where p = (p1, ..., pn)^T is a vector of parame-
ters. The warp W(x; p) takes the pixel x in the coordi-
nate frame of the template T and maps it to the sub-pixel
location W(x; p) in the coordinate frame of the image
I. If we are computing optical flow, for example, the
warps W(x; p) might be the translations:
W(x; p) = ( x + p1 )
          ( y + p2 )                                        (1)

where the vector of parameters p = (p1, p2)^T is then
the optical flow. If we are tracking a larger image patch
moving in 3D we may instead consider the set of affine
warps:

W(x; p) = ( (1 + p1) · x + p3 · y + p5 )
          ( p2 · x + (1 + p4) · y + p6 )

        = ( 1 + p1   p3       p5 ) ( x )
          ( p2       1 + p4   p6 ) ( y )
                                   ( 1 )                    (2)

where there are 6 parameters p = (p1, p2, p3, p4, p5, p6)^T
as, for example, was done in Bergen et al. (1992).
(There are other ways to parameterize affine warps.
Later in this framework we will investigate what is
the best way.) In general, the number of parameters n
may be arbitrarily large and W(x; p) can be arbitrar-
ily complex. One example of a complex warp is the
set of piecewise affine warps used in Active Appear-
ance Models (Cootes et al., 1998; Baker and Matthews,
2001) and Active Blobs (Sclaroff and Isidoro, 1998).
2.1. Goal of the Lucas-Kanade Algorithm
The goal of the Lucas-Kanade algorithm is to mini-
mize the sum of squared error between two images,
the template T and the image I warped back onto the
coordinate frame of the template:
∑_x [ I(W(x; p)) − T(x) ]².                                 (3)
Warping I back to compute I (W(x; p)) requires inter-
polating the image I at the sub-pixel locations W(x; p).
The minimization of the expression in Eq. (3) is per-
formed with respect to p and the sum is performed
over all of the pixels x in the template image T (x).
Minimizing the expression in Eq. (3) is a non-linear
optimization task even if W(x; p) is linear in p because
the pixel values I(x) are, in general, non-linear in x.
In fact, the pixel values I(x) are essentially unrelated
to the pixel coordinates x. To optimize the expression
in Eq. (3), the Lucas-Kanade algorithm assumes that
a current estimate of p is known and then iteratively

solves for increments to the parameters Δp; i.e. the
following expression is (approximately) minimized:

∑_x [ I(W(x; p + Δp)) − T(x) ]²                             (4)

with respect to Δp, and then the parameters are updated:

p ← p + Δp.                                                 (5)

These two steps are iterated until the estimates of the
parameters p converge. Typically the test for conver-
gence is whether some norm of the vector Δp is below
a threshold ε; i.e. ‖Δp‖ ≤ ε.
2.2. Derivation of the Lucas-Kanade Algorithm
The Lucas-Kanade algorithm (which is a Gauss-
Newton gradient descent non-linear optimization al-
gorithm) is then derived as follows. The non-linear ex-
pression in Eq. (4) is linearized by performing a first
order Taylor expansion on I(W(x; p + Δp)) to give:

∑_x [ I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x) ]².                 (6)
In this expression, ∇I = (∂I/∂x, ∂I/∂y) is the gradient of
image I evaluated at W(x; p); i.e. ∇I is computed
in the coordinate frame of I and then warped back onto
the coordinate frame of T using the current estimate of
the warp W(x; p). The term ∂W/∂p is the Jacobian of the
warp. If W(x; p) = (Wx(x; p), Wy(x; p))^T then:
∂W/∂p = ( ∂Wx/∂p1   ∂Wx/∂p2   ...   ∂Wx/∂pn )
        ( ∂Wy/∂p1   ∂Wy/∂p2   ...   ∂Wy/∂pn ).              (7)
We follow the notational convention that the partial
derivatives with respect to a column vector are laid out
as a row vector. This convention has the advantage that
the chain rule results in a matrix multiplication, as in
the expression in Eq. (6). For example, the affine warp
in Eq. (2) has the Jacobian:
∂W/∂p = ( x  0  y  0  1  0 )
        ( 0  x  0  y  0  1 ).                               (8)
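Since the affine warp is linear in the parameters, the Jacobian in Eq. (8) can be verified by finite differences. A numpy sketch with hypothetical helper names:

```python
import numpy as np

def affine_warp(x, y, p):
    """The affine warp of Eq. (2) applied to the pixel (x, y)."""
    return np.array([(1 + p[0]) * x + p[2] * y + p[4],
                     p[1] * x + (1 + p[3]) * y + p[5]])

def affine_jacobian(x, y):
    """The 2x6 Jacobian dW/dp of the affine warp, evaluated at pixel (x, y);
    this is exactly Eq. (8)."""
    return np.array([[x, 0, y, 0, 1, 0],
                     [0, x, 0, y, 0, 1]], dtype=float)
```

Because the warp is linear in p, a forward difference in each parameter recovers the corresponding Jacobian column exactly (up to floating point rounding).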
Minimizing the expression in Eq. (6) is a least squares
problem and has a closed form solution which can be
derived as follows.

Figure 1. The Lucas-Kanade algorithm (Lucas and Kanade, 1981)
consists of iteratively applying Eqs. (10) and (5) until the estimates
of the parameters p converge. Typically the test for convergence
is whether some norm of the vector Δp is below a user specified
threshold ε. Because the gradient ∇I must be evaluated at W(x; p)
and the Jacobian ∂W/∂p must be evaluated at p, all 9 steps must be
repeated in every iteration of the algorithm.

The partial derivative of the expression in Eq. (6) with
respect to Δp is:
2 ∑_x [ ∇I (∂W/∂p) ]^T [ I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x) ]    (9)
where we refer to ∇I (∂W/∂p) as the steepest descent im-
ages. (See Section 4.3 for why.) Setting this expression
to equal zero and solving gives the closed form solution
for the minimum of the expression in Eq. (6) as:

Δp = H⁻¹ ∑_x [ ∇I (∂W/∂p) ]^T [ T(x) − I(W(x; p)) ]             (10)

where H is the n × n (Gauss-Newton approximation
to the) Hessian matrix:

H = ∑_x [ ∇I (∂W/∂p) ]^T [ ∇I (∂W/∂p) ].                        (11)
For reasons that will become clear later we refer to
∑_x [ ∇I (∂W/∂p) ]^T [ T(x) − I(W(x; p)) ] as the steepest de-
scent parameter updates. Equation (10) then expresses
the fact that the parameter updates Δp are the steepest
descent parameter updates multiplied by the inverse
of the Hessian matrix. The Lucas-Kanade algorithm
(Lucas and Kanade, 1981) then consists of iteratively
applying Eqs. (10) and (5). See Figs. 1 and 2 for a sum-
mary. Because the gradient ∇I must be evaluated at

Figure 2. A schematic overview of the Lucas-Kanade algorithm (Lucas and Kanade, 1981). The image I is warped with the current estimate
of the warp in Step 1 and the result subtracted from the template in Step 2 to yield the error image. The gradient of I is warped in Step 3, the
Jacobian is computed in Step 4, and the two combined in Step 5 to give the steepest descent images. In Step 6 the Hessian is computed from
the steepest descent images. In Step 7 the steepest descent parameter updates are computed by dot producting the error image with the steepest
descent images. In Step 8 the Hessian is inverted and multiplied by the steepest descent parameter updates to get the final parameter updates Δp
which are then added to the parameters p in Step 9.
W(x; p) and the Jacobian ∂W/∂p at p, they both in gen-
eral depend on p. For some simple warps such as the
translations in Eq. (1) and the affine warps in Eq. (2)
the Jacobian can sometimes be constant. See for ex-
ample Eq. (8). In general, however, all 9 steps of the
algorithm must be repeated in every iteration because
the estimates of the parameters p vary from iteration to
iteration.
2.3. Requirements on the Set of Warps
The only requirement on the warps W(x; p) is that they
are differentiable with respect to the warp parameters p.
This condition is required to compute the Jacobian ∂W/∂p.
Normally the warps are also (piecewise) differentiable
with respect to x, but even this condition is not strictly
required.

Table 1. The computation cost of one iteration of the Lucas-Kanade algorithm. If n is the number of warp
parameters and N is the number of pixels in the template T, the cost of each iteration is O(n²N + n³). The
most expensive step by far is Step 6, the computation of the Hessian, which alone takes time O(n²N).

Step 1   Step 2   Step 3   Step 4   Step 5   Step 6    Step 7   Step 8   Step 9   Total
O(nN)    O(N)     O(nN)    O(nN)    O(nN)    O(n²N)    O(nN)    O(n³)    O(n)     O(n²N + n³)
2.4. Computational Cost of the
Lucas-Kanade Algorithm
Assume that the number of warp parameters is n and the
number of pixels in T is N. Step 1 of the Lucas-Kanade
algorithm usually takes time O(nN). For each pixel x
in T we compute W(x; p) and then sample I at that
location. The computational cost of computing W(x; p)
depends on W but for most warps the cost is O(n) per
pixel. Step 2 takes time O(N). Step 3 takes the same
time as Step 1, usually O(nN). Computing the Jacobian
in Step 4 also depends on W but for most warps the cost
is O(n) per pixel. The total cost of Step 4 is therefore
O(nN). Step 5 takes time O(nN), Step 6 takes time
O(n²N), and Step 7 takes time O(nN). Step 8 takes
time O(n³) to invert the Hessian matrix and time O(n²)
to multiply the result by the steepest descent parameter
updates computed in Step 7. Step 9 just takes time
O(n) to increment the parameters by the updates. The
total computational cost of each iteration is therefore
O(n²N + n³), the most expensive step being Step 6. See
Table 1 for a summary of these computational costs.
3. The Quantity Approximated and the Warp
Update Rule
In each iteration Lucas-Kanade approximately mini-
mizes ∑_x [ I(W(x; p + Δp)) − T(x) ]² with respect to
Δp and then updates the estimates of the parameters in
Step 9: p ← p + Δp. Perhaps somewhat surprisingly,
iterating these two steps is not the only way to minimize
the expression in Eq. (3). In this section we outline 3
alternative approaches that are all provably equivalent
to the Lucas-Kanade algorithm. We then show empiri-
cally that they are equivalent.
3.1. Compositional Image Alignment
The first alternative to the Lucas-Kanade algorithm is
the compositional algorithm.
3.1.1. Goal of the Compositional Algorithm. The
compositional algorithm, used most notably by Shum
and Szeliski (2000), approximately minimizes:

∑_x [ I(W(W(x; Δp); p)) − T(x) ]²                           (12)

with respect to Δp in each iteration and then updates
the estimate of the warp as:

W(x; p) ← W(x; p) ◦ W(x; Δp),                               (13)

i.e. the compositional approach iteratively solves for
an incremental warp W(x; Δp) rather than an additive
update to the parameters Δp. In this context, we refer to
the Lucas-Kanade algorithm in Eqs. (4) and (5) as the
additive approach to contrast it with the compositional
approach in Eqs. (12) and (13). The compositional and
additive approaches are proved to be equivalent to first
order in Δp in Section 3.1.5. The expression:

W(x; p) ◦ W(x; Δp) ≡ W(W(x; Δp); p)                         (14)
is the composition of 2 warps. For example, if W(x; p)
is the affine warp of Eq. (2) then:
W(x; p) ∘ W(x; Δp)
= ( (1 + p1) · ((1 + Δp1) · x + Δp3 · y + Δp5) + p3 · (Δp2 · x + (1 + Δp4) · y + Δp6) + p5 ,
    p2 · ((1 + Δp1) · x + Δp3 · y + Δp5) + (1 + p4) · (Δp2 · x + (1 + Δp4) · y + Δp6) + p6 ),
(15)
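In matrix form this composition is simply a product of homogeneous matrices. The sketch below (the helper names are illustrative, not from the paper) builds the 3×3 homogeneous matrix of the affine warp of Eq. (2); multiplying two such matrices and reading the parameters back off reproduces the expansion in Eq. (15).

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix of the affine warp of Eq. (2):
    W(x; p) = ((1+p1)x + p3*y + p5,  p2*x + (1+p4)*y + p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p3,     p5],
                     [p2,     1 + p4, p6],
                     [0.0,    0.0,    1.0]])

def compose_affine(p, dp):
    """Parameters of the composed warp W(x; p) o W(x; dp),
    i.e. the warp x -> W(W(x; dp); p) of Eq. (14)."""
    M = affine_matrix(p) @ affine_matrix(dp)
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1],
                     M[1, 1] - 1, M[0, 2], M[1, 2]])
```

Applying W(x; p) to the result of W(x; Δp) at any point then agrees with applying the composed warp directly, which is exactly the update rule of Eq. (13) for affine warps.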

References

Lucas, B.D. and Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.

Cootes, T.F., Edwards, G.J., and Taylor, C.J. Active appearance models. In Proceedings of the European Conference on Computer Vision, 1998.

Matthews, I. and Baker, S. Active appearance models revisited. International Journal of Computer Vision, 2004.

Bergen, J.R., Anandan, P., Hanna, K.J., and Hingorani, R. Hierarchical model-based motion estimation. In Proceedings of the European Conference on Computer Vision, 1992.
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Lucas-kanade 20 years on: a unifying framework" ?

The authors present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. The authors concentrate on the inverse compositional algorithm, an efficient algorithm that they recently proposed. The authors examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, the authors cover the quantity approximated, the warp update rule, and the gradient descent approximation.

In future papers in this series the authors will extend their framework to cover these choices and, in particular, investigate whether the inverse compositional algorithm is compatible with these extensions of the Lucas-Kanade algorithm. 

The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I warped back onto the coordinate frame of the template: ∑_x [I(W(x; p)) − T(x)]².

The forwards compositional algorithm has the slight advantage that the Jacobian is constant, and is in general simpler so is less likely to be computed erroneously (Shum and Szeliski, 2000). 

Since the forwards algorithms compute the gradient of I and the inverse algorithms compute the gradient of T, it is not surprising that when noise is added to I the inverse algorithms converge better (faster and more frequently), and conversely when noise is added to T the forwards algorithms converge better.

The inverse additive algorithm can be applied to very few warps, mostly simple 2D linear warps such as translations and affine warps. 

In most of the steps, the cost is a function of k rather than n, but most of the time k = n in the Hager-Belhumeur algorithm anyway. 

Most notably, the cost of composing the two warps in Step 9 depends on W, but for most warps the cost is O(n²) or less, including for the affine warps in Eq. (16).

The partial derivative of the expression in Eq. (6) with respect to Δp is: 2 ∑_x [∇I (∂W/∂p)]ᵀ [I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x)] (9), where the authors refer to ∇I (∂W/∂p) as the steepest descent images.

Since the 6 parameters in the affine warp have different units, the authors use the following error measure rather than the errors in the parameters. 

The initial goal of the Hager-Belhumeur algorithm is the same as that of the Lucas-Kanade algorithm, i.e. to minimize ∑_x [I(W(x; p + Δp)) − T(x)]² with respect to Δp and then update the parameters p ← p + Δp. Rather than changing variables as in Section 3.2.5, the roles of the template and the image are switched as follows.

Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm. 

When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.