
Lucas-Kanade 20 Years On: A Unifying Framework

01 Feb 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 56, Iss: 3, pp 221-255
TL;DR: This paper presents a unifying framework for image alignment, concentrating on the efficient inverse compositional algorithm and examining which of the many extensions to the original Lucas-Kanade formulation can be used with it without any significant loss of efficiency.
Abstract: Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters.

Summary (6 min read)

1. Introduction

  • Image alignment consists of moving, and possibly deforming, a template to minimize the difference between the template and an image.
  • The authors propose a unifying framework for image alignment, describing the various algorithms and their extensions in a consistent manner.
  • The authors prove the first order equivalence of the various alternatives, derive the efficiency of the resulting algorithms, describe the set of warps that each alternative can be applied to, and finally empirically compare the algorithms.

2.1. Goal of the Lucas-Kanade Algorithm

  • The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I warped back onto the coordinate frame of the template: ∑_x [I(W(x; p)) − T(x)]² (Eq. (3)).
  • Warping I back to compute I(W(x; p)) requires interpolating the image I at the sub-pixel locations W(x; p).
  • The minimization of the expression in Eq. (3) is performed with respect to p, and the sum is performed over all of the pixels x in the template image T(x).
  • In fact, the pixel values I(x) are essentially unrelated to the pixel coordinates x.
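The goal in Eq. (3) can be made concrete with a short numpy sketch for the simplest warp, the translation of Eq. (1). This is illustrative code, not from the paper; the helper names are hypothetical and bilinear interpolation is just one reasonable choice for the sub-pixel sampling:

```python
import numpy as np

def warp_translation(I, p, tmpl_shape):
    """Sample I(W(x; p)) for the translation warp W(x; p) = (x + p1, y + p2),
    interpolating I bilinearly at the sub-pixel locations W(x; p)."""
    h, w = tmpl_shape
    ys, xs = np.mgrid[0:h, 0:w]
    wx, wy = xs + p[0], ys + p[1]
    x0 = np.clip(np.floor(wx).astype(int), 0, I.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, I.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * I[y0, x0] + ax * I[y0, x0 + 1])
            + ay * ((1 - ax) * I[y0 + 1, x0] + ax * I[y0 + 1, x0 + 1]))

def ssd_error(I, T, p):
    """The sum of squared error of Eq. (3): sum_x [I(W(x; p)) - T(x)]^2."""
    return np.sum((warp_translation(I, p, T.shape) - T) ** 2)
```

For a template extracted from I at an integer offset, the error is exactly zero at the true translation and positive elsewhere.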

2.2. Derivation of the Lucas-Kanade Algorithm

  • The Lucas-Kanade algorithm (which is a Gauss-Newton gradient descent non-linear optimization algorithm) is then derived as follows.
  • The term ∂W/∂p is the Jacobian of the warp.
  • This convention has the advantage that the chain rule results in a matrix multiplication, as in the expression in Eq. (6).
  • Minimizing the expression in Eq. (6) is a least squares problem and has a closed form solution which can be derived as follows.
  • In general, however, all 9 steps of the algorithm must be repeated in every iteration because the estimates of the parameters p vary from iteration to iteration.
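The iteration just described can be sketched in numpy for the translation warp, where the Jacobian ∂W/∂p is the 2 × 2 identity so the steepest descent images are simply the warped image gradients. This is a hypothetical sketch (not the authors' implementation), with the step numbers of the algorithm marked in comments:

```python
import numpy as np

def bilinear(img, wx, wy):
    """Bilinear interpolation of img at the sub-pixel locations (wx, wy)."""
    x0 = np.clip(np.floor(wx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, img.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def lk_translation(I, T, p, n_iters=20):
    """Gauss-Newton Lucas-Kanade for the translation warp W(x;p) = (x+p1, y+p2)."""
    Iy, Ix = np.gradient(I.astype(float))   # gradient of I in its own frame
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.asarray(p, dtype=float)
    for _ in range(n_iters):
        wx, wy = xs + p[0], ys + p[1]
        error = (T - bilinear(I, wx, wy)).ravel()        # Steps 1-2: warp, difference
        sd = np.stack([bilinear(Ix, wx, wy).ravel(),     # Steps 3-5: steepest descent
                       bilinear(Iy, wx, wy).ravel()],    # images (Jacobian = identity)
                      axis=1)
        H = sd.T @ sd                                    # Step 6: Gauss-Newton Hessian
        dp = np.linalg.solve(H, sd.T @ error)            # Steps 7-8: solve for dp
        p = p + dp                                       # Step 9: additive update
        if np.linalg.norm(dp) < 1e-6:                    # convergence test ||dp|| <= eps
            break
    return p
```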

2.3. Requirements on the Set of Warps

  • The only requirement on the warps W(x; p) is that they are differentiable with respect to the warp parameters p.
  • This condition is required to compute the Jacobian ∂W/∂p.
  • Normally the warps are also differentiable with respect to x, but even this condition is not strictly required.

2.4. Computational Cost of the Lucas-Kanade Algorithm

  • Step 1 of the Lucas-Kanade algorithm usually takes time O(nN).
  • The computational cost of computing W(x; p) depends on W, but for most warps the cost is O(n) per pixel.
  • Step 8 takes time O(n³) to invert the Hessian matrix and time O(n²) to multiply the result by the steepest descent parameter updates computed in Step 7.
  • The total computational cost of each iteration is therefore O(n²N + n³), the most expensive step being Step 6.
  • See Table 1 for a summary of these computational costs.

3. The Quantity Approximated and the Warp Update Rule

  • In each iteration Lucas-Kanade approximately minimizes ∑_x [I(W(x; p + Δp)) − T(x)]² with respect to Δp and then updates the estimates of the parameters in Step 9: p ← p + Δp. Perhaps somewhat surprisingly, iterating these two steps is not the only way to minimize the expression in Eq. (3).
  • In this section the authors outline 3 alternative approaches that are all provably equivalent to the Lucas-Kanade algorithm.
  • The authors then show empirically that they are equivalent.

3.1. Compositional Image Alignment

  • The first alternative to the Lucas-Kanade algorithm is the compositional algorithm.
  • The authors refer to the Lucas-Kanade algorithm in Eqs. (4) and (5) as the additive approach to contrast it with the compositional approach in Eqs. (12) and (13).
  • The compositional and additive approaches are proved to be equivalent to first order in Δp in Section 3.1.5.

3.1.2. Derivation of the Compositional Algorithm.

  • In order to proceed the authors make one assumption.
  • It is also generally simpler analytically (Shum and Szeliski, 2000).
  • The authors therefore have two requirements on the set of warps: (1) the set of warps must contain the identity warp and (2) the set of warps must be closed under composition.
  • The computational cost of the compositional algorithm is almost exactly the same as that of the Lucas-Kanade algorithm.
  • (Note that in the second of these expressions ∂W/∂p is evaluated at (x; 0), rather than at (x; p) as in the first expression.)
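The two requirements can be checked mechanically for affine warps by writing each warp of Eq. (2) as a 3 × 3 matrix in homogeneous coordinates, so that composition becomes matrix multiplication. A numpy sketch with illustrative names:

```python
import numpy as np

def affine_matrix(p):
    """The affine warp of Eq. (2) as a 3x3 matrix acting on (x, y, 1)^T."""
    return np.array([[1 + p[0], p[2], p[4]],
                     [p[1], 1 + p[3], p[5]],
                     [0.0, 0.0, 1.0]])

def affine_params(M):
    """Recover the parameters p = (p1,...,p6) from a 3x3 affine matrix."""
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1], M[1, 1] - 1, M[0, 2], M[1, 2]])

def compose(p, dp):
    """W(x; p) o W(x; dp): apply the incremental warp first, then the current one."""
    return affine_params(affine_matrix(p) @ affine_matrix(dp))
```

With this representation the identity warp is p = 0 (the identity matrix), and the set is closed under composition, since the product of two such matrices has the same form.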

3.2. Inverse Compositional Image Alignment

  • As a number of authors have pointed out, there is a huge computational cost in re-evaluating the Hessian in every iteration of the Lucas-Kanade algorithm (Hager and Belhumeur, 1998; Dellaert and Collins, 1999; Shum and Szeliski, 2000).
  • If the Hessian were constant it could be precomputed and then re-used.
  • Each iteration of the algorithm (see Fig. 1) would then just consist of an image warp (Step 1), an image difference (Step 2), a collection of image “dot-products” (Step 7), multiplication of the result by the Hessian (Step 8), and the update to the parameters (Step 9).
  • All of these operations can be performed at (close to) frame-rate (Dellaert and Collins, 1999).
  • Although various approximate solutions have been proposed (e.g. Shum and Szeliski, 2000), these approximations are inelegant and it is often hard to say how good the approximations are.

3.2.1. Goal of the Inverse Compositional Algorithm.

  • The key to efficiency is switching the role of the image and the template, as in Hager and Belhumeur (1998), where a change of variables is made to switch or invert the roles of the template and the image.
  • The only difference from the update in the forwards compositional algorithm in Eq. (13) is that the incremental warp W(x; p) is inverted before it is composed with the current estimate.
  • Fortunately, most warps used in computer vision, including homographies and 3D rotations (Shum and Szeliski, 2000), do form groups.
  • The most time consuming step, the computation of the Hessian in Step 6, can be performed once as a pre-computation.
  • The authors now show that the inverse compositional algorithm is equivalent to the forwards compositional algorithm introduced in Section 3.1.
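The pre-computation structure can be sketched for the translation warp, where inverting the incremental warp and composing it with the current estimate reduces to p ← p − Δp. A hypothetical numpy sketch, not the paper's implementation:

```python
import numpy as np

def bilinear(img, wx, wy):
    """Bilinear interpolation of img at the sub-pixel locations (wx, wy)."""
    x0 = np.clip(np.floor(wx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(wy).astype(int), 0, img.shape[0] - 2)
    ax, ay = wx - x0, wy - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def ic_translation(I, T, p, n_iters=20):
    """Inverse compositional alignment for the translation warp (a sketch)."""
    # Pre-computation (done once): for this warp the Jacobian at p = 0 is the
    # identity, so the steepest descent images are the template gradients, and
    # the Gauss-Newton Hessian can be inverted up front.
    Ty, Tx = np.gradient(T.astype(float))
    sd = np.stack([Tx.ravel(), Ty.ravel()], axis=1)
    H_inv = np.linalg.inv(sd.T @ sd)
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.asarray(p, dtype=float)
    for _ in range(n_iters):
        # Per-iteration work: a warp, a difference, dot products, and an update.
        error = (bilinear(I, xs + p[0], ys + p[1]) - T).ravel()
        dp = H_inv @ (sd.T @ error)
        # Invert the incremental warp and compose; for translations this is
        # simply p <- p - dp.
        p = p - dp
        if np.linalg.norm(dp) < 1e-6:
            break
    return p
```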

3.3. Inverse Additive Image Alignment

  • A natural question which arises at this point is whether the same trick of changing variables to convert Eq. (37) into Eq. (38) can be applied in the additive formulation.
  • The simplification to the Jacobian in Eq. (39) therefore cannot be made.
  • The term ∂W⁻¹/∂y has to be included in an inverse additive algorithm in some form or other.

3.3.1. Goal of the Inverse Additive Algorithm.

  • An image alignment algorithm that addresses this difficulty is the Hager-Belhumeur algorithm (Hager and Belhumeur, 1998).
  • The Hager-Belhumeur algorithm does fit into the authors' framework as an inverse additive algorithm.
  • The template and the image are then switched by deriving the relationship between ∇I and ∇T .

3.3.2. Derivation of the Inverse Additive Algorithm.

  • It is obviously possible to write down the solution to Eq. (45) in terms of the Hessian, just like in Section 2.2.
  • So, in the naive approach, the Hessian will have to be re-computed in each iteration and the resulting algorithm will be just as inefficient as the original Lucas-Kanade algorithm.
  • The product of the two Jacobians must be expressible in the form of Eq. (46) for the Hager-Belhumeur algorithm to be used.
  • In comparison the inverse compositional algorithm can be applied to any set of warps which form a group, a very weak requirement.
  • The computational cost of the Hager-Belhumeur algorithm is similar to that of the inverse compositional algorithm.

3.4. Empirical Validation

  • The authors have proved mathematically that all four image alignment algorithms take the same steps to first order in Δp, at least on sets of warps where they can all be used.
  • The authors then randomly perturbed these points with additive white Gaussian noise of a certain variance and solved for the affine warp parameters p that the 3 perturbed points define.
  • As can be seen, the 4 algorithms (3 for the homography) all converge at almost exactly the same rate, again validating the equivalence of the four algorithms.
  • The authors computed the percentage of times that each algorithm converged for various different variances of the noise added to the canonical point locations.
  • When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.
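The experimental setup hinges on recovering an affine warp from 3 point correspondences; since the affine warp of Eq. (2) is linear in p, each correspondence contributes two linear equations. A numpy sketch of this fitting step (illustrative, using the paper's parameterization):

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve for the affine parameters p = (p1,...,p6) of Eq. (2) that map the
    3 source points to the 3 (e.g. noise-perturbed) destination points.
    W_x = (1+p1)x + p3*y + p5 and W_y = p2*x + (1+p4)y + p6, so each
    correspondence (x, y) -> (u, v) gives two linear equations in p."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, 0, y, 0, 1, 0]); b.append(u - x)
        A.append([0, x, 0, y, 0, 1]); b.append(v - y)
    return np.linalg.solve(np.array(A, float), np.array(b, float))
```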

3.5. Summary

  • The authors have outlined three approaches to image alignment beyond the original forwards additive Lucas-Kanade algorithm.
  • In Section 3.4 the authors validated this equivalence empirically.
  • There is little difference between the two algorithms.
  • Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm.
  • The inverse compositional algorithm is equally efficient, conceptually more elegant, and more generally applicable than the inverse additive algorithm.

4. The Gradient Descent Approximation

  • Most non-linear optimization and parameter estimation algorithms operate by iterating 2 steps.
  • The first step approximately minimizes the optimality criterion, usually by making some sort of linear or quadratic approximation around the current estimate of the parameters.
  • The inverse compositional algorithm, for example, approximately minimizes the expression in Eq. (31) and updates the parameters using Eq. (32).
  • In Sections 2 and 3 above the authors outlined four equivalent pairs of quantity approximated and parameter update rule.
  • The approximation that the authors made in each case is known as the Gauss-Newton approximation.

4.3. Steepest Descent

  • The simplest possibility is to approximate the Hessian as proportional to the identity matrix.
  • This approach to determining c has the obvious problem that it requires the Hessian ∑_x ∂²G/∂p².
  • The computational cost of the Gauss-Newton steepest descent algorithm is almost exactly the same as the original inverse compositional algorithm.
  • See Table 8 for a summary and Baker and Matthews (2002) for the details.

4.4. The Diagonal Approximation to the Hessian

  • The steepest descent algorithm can be regarded as approximating the Hessian with the identity matrix.
  • This approximation is commonly used in optimization problems with a large number of parameters.
  • Examples of this diagonal approximation in vision include stereo (Szeliski and Golland, 1998) and super-resolution (Baker and Kanade, 2000).
  • Overall, the pre-computation only takes time O(nN) and the cost per iteration is only O(nN + n²).
  • The diagonal approximation to the Hessian makes the Newton inverse compositional algorithm far more efficient.
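The saving can be seen by comparing the full Gauss-Newton update with the diagonal one, where only the n diagonal entries of the Hessian are formed and "inverting" the Hessian is an elementwise division. A hypothetical numpy sketch:

```python
import numpy as np

def gn_update(sd, error):
    """Full Gauss-Newton update: O(n^2 N) to build H and O(n^3) to solve it.
    sd holds the steepest descent images as columns (one per parameter)."""
    H = sd.T @ sd
    return np.linalg.solve(H, sd.T @ error)

def diag_update(sd, error):
    """Diagonal approximation: only the n diagonal entries of the Hessian are
    formed (O(nN)), and the 'inversion' is an elementwise division."""
    d = np.sum(sd * sd, axis=0)          # diag(H)
    return (sd.T @ error) / d
```

When the steepest descent images happen to be mutually orthogonal the two updates coincide; in general the diagonal update is only an approximation.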

4.5. The Levenberg-Marquardt Algorithm

  • Of the various approximations, generally the steepest descent and diagonal approximations work better further away from the local minima, and the Newton and Gauss-Newton approximations work better close to the local minima where the quadratic approximation is good (Gill et al., 1986; Press et al., 1992).
  • For large δ ≫ 1, the Hessian is approximately the Gauss-Newton diagonal approximation to the Hessian, but with a reduced step size of 1/δ.
  • If the error has increased, the provisional update to the parameters is reversed and δ increased, δ → δ × 10, say.
  • The re-ordering doesn't affect the computational cost of the algorithm; it only marginally increases the pre-computation time, although not asymptotically.
  • Overall the Levenberg-Marquardt algorithm is just as efficient as the Gauss-Newton inverse compositional algorithm.
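The damping and accept/reject logic can be sketched as follows (numpy, with illustrative function names; the factor of 10 follows the text):

```python
import numpy as np

def lm_update(sd, error, delta):
    """One Levenberg-Marquardt update: the Gauss-Newton Hessian with its
    diagonal entries inflated by a factor of (1 + delta)."""
    H = sd.T @ sd
    H_lm = H + delta * np.diag(np.diag(H))
    return np.linalg.solve(H_lm, sd.T @ error)

def lm_step(params, delta, error_fn, sd, error):
    """Provisionally update the parameters; keep the update and decrease delta
    if the error dropped, otherwise revert the update and increase delta."""
    trial = params + lm_update(sd, error, delta)
    if error_fn(trial) < error_fn(params):
        return trial, delta / 10.0   # good step: behave more like Gauss-Newton
    return params, delta * 10.0      # bad step: smaller, steepest-descent-like
```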

4.6. Empirical Validation

  • The authors have described six variants of the inverse compositional image alignment algorithm: Gauss-Newton, Newton, Gauss-Newton steepest descent, diagonal Hessian (Gauss-Newton and Newton), and Levenberg-Marquardt.
  • In Fig. 15(a) the authors plot the average frequency of convergence (computed over 5000 samples) with no intensity noise.
  • The steepest descent and diagonal Hessian approximations perform very poorly.
  • For affine warps, the parameterization in Eq. (2) is not the only possibility.
  • As can be seen, all of the algorithms are roughly equally fast except the Newton algorithm which is much slower.

4.7. Summary

  • In this section the authors investigated the choice of the gradient descent approximation.
  • The authors have exhibited five alternatives: (1) Newton, (2) steepest descent, (3) diagonal approximation to the Gauss-Newton Hessian, (4) diagonal approximation to the Newton Hessian, and (5) Levenberg-Marquardt.
  • These three algorithms (steepest descent and the two diagonal approximations) are also very sensitive to the estimation of the step size and the parameterization of the warps.
  • The most likely reason the Newton algorithm performs worse is the noise introduced in computing the second derivatives of the template.
  • Except for the Newton algorithm all of the alternatives are equally as efficient as the Gauss-Newton algorithm when combined with the inverse compositional algorithm.

4.8. Other Algorithms

  • These are not the only choices.
  • The focus of Section 4 has been approximating the Hessian: the Gauss-Newton approximation, the steepest descent approximation, and the diagonal approximations.
  • In Shum and Szeliski (2000) an algorithm is proposed to estimate the Gauss-Newton Hessian for the forwards compositional algorithm, but in an efficient manner.
  • One reason that computing the Hessian matrix is so time consuming is that it is a sum over the entire template.
  • Since the coefficients are constant they can be pre-computed.

5. Discussion

  • The authors have described a unifying framework for image alignment consisting of two halves.
  • The results of the first half are summarized in Table 6.
  • The algorithms differ in both their computational complexity and their empirical performance.
  • Overall the choice of which algorithm to use depends on two main things: (1) whether there is likely to be more noise in the template or in the input image and (2) whether the algorithm needs to be efficient or not.
  • The diagonal Hessian and steepest descent forwards algorithms are another option, but given their poor convergence properties it is probably better to use the inverse compositional algorithm even if the template is noisy.

6. Matlab Code, Test Images, and Scripts

  • Matlab implementations of all of the algorithms described in this paper are available on the World Wide Web at: http://www.ri.cmu.edu/projects/project_515.html.
  • The authors have also included all of the test images and the scripts used to generate the experimental results in this paper.

Acknowledgments

  • The authors would like to thank Bob Collins, Matthew Deans, Frank Dellaert, Daniel Huber, Takeo Kanade, Jianbo Shi, Sundar Vedula, and Jing Xiao for discussions on image alignment, and Sami Romdhani for pointing out a couple of algebraic errors in a preliminary draft of this paper.
  • The authors would also like to thank the anonymous IJCV reviewers and the CVPR reviewers of Baker and Matthews (2001) for their feedback.
  • The research described in this paper was conducted under U.S. Office of Naval Research contract N00014-00-1-0915.


International Journal of Computer Vision 56(3), 221–255, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Lucas-Kanade 20 Years On: A Unifying Framework
SIMON BAKER AND IAIN MATTHEWS
The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
simonb@cs.cmu.edu
iainm@cs.cmu.edu
Received July 10, 2002; Revised February 6, 2003; Accepted February 7, 2003
Abstract. Since the Lucas-Kanade algorithm was proposed in 1981 image alignment has become one of the most
widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion,
mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions
have been made to the original formulation. We present an overview of image alignment, describing most of the
algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm,
an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used
with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper,
Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent
approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance
variation, and how to impose priors on the parameters.
Keywords: image alignment, Lucas-Kanade, a unifying framework, additive vs. compositional algorithms, forwards vs. inverse algorithms, the inverse compositional algorithm, efficiency, steepest descent, Gauss-Newton, Newton, Levenberg-Marquardt
1. Introduction
Image alignment consists of moving, and possibly de-
forming, a template to minimize the difference between
the template and an image. Since the first use of im-
age alignment in the Lucas-Kanade optical flow al-
gorithm (Lucas and Kanade, 1981), image alignment
has become one of the most widely used techniques
in computer vision. Besides optical flow, some of its
other applications include tracking (Black and Jepson,
1998; Hager and Belhumeur, 1998), parametric and
layered motion estimation (Bergen et al., 1992), mo-
saic construction (Shum and Szeliski, 2000), medical
image registration (Christensen and Johnson, 2001),
and face coding (Baker and Matthews, 2001; Cootes
et al., 1998).
The usual approach to image alignment is gradi-
ent descent. A variety of other numerical algorithms
such as difference decomposition (Gleicher, 1997) and
linear regression (Cootes et al., 1998) have also been
proposed, but gradient descent is the de facto standard.
Gradient descent can be performed in a variety of
different ways, however. One difference between the var-
ious approaches is whether they estimate an additive
increment to the parameters (the additive approach
(Lucas and Kanade, 1981)), or whether they estimate
an incremental warp that is then composed with the
current estimate of the warp (the compositional ap-
proach (Shum and Szeliski, 2000)). Another difference
is whether the algorithm performs a Gauss-Newton, a
Newton, a steepest-descent, or a Levenberg-Marquardt
approximation in each gradient descent step.
We propose a unifying framework for image align-
ment, describing the various algorithms and their ex-
tensions in a consistent manner. Throughout the frame-
work we concentrate on the inverse compositional

algorithm, an efficient algorithm that we recently pro-
posed (Baker and Matthews, 2001). We examine which
of the extensions to Lucas-Kanade can be applied to
the inverse compositional algorithm without any sig-
nificant loss of efficiency, and which extensions require
additional computation. Wherever possible we provide
empirical results to illustrate the various algorithms and
their extensions.
In this paper, Part 1 in a series of papers, we be-
gin in Section 2 by reviewing the Lucas-Kanade algo-
rithm. We proceed in Section 3 to analyze the quan-
tity that is approximated by the various image align-
ment algorithms and the warp update rule that is used.
We categorize algorithms as either additive or compo-
sitional, and as either forwards or inverse. We prove
the first order equivalence of the various alternatives,
derive the efficiency of the resulting algorithms, de-
scribe the set of warps that each alternative can be
applied to, and finally empirically compare the algo-
rithms. In Section 4 we describe the various gradient de-
scent approximations that can be used in each iteration,
Gauss-Newton, Newton, diagonal Hessian, Levenberg-
Marquardt, and steepest-descent (Press et al., 1992).
We compare these alternatives both in terms of speed
and in terms of empirical performance. We conclude
in Section 5 with a discussion. In future papers in this
series (which will be made available on our website
http://www.ri.cmu.edu/projects/project_515.html), we
will cover the choice of the error function, how to al-
low linear appearance variation, and how to add priors
on the parameters.
2. Background: Lucas-Kanade
The original image alignment algorithm was the Lucas-
Kanade algorithm (Lucas and Kanade, 1981). The goal
of Lucas-Kanade is to align a template image T(x) to an
input image I(x), where x = (x, y)^T is a column vector
containing the pixel coordinates. If the Lucas-Kanade
algorithm is being used to compute optical flow or to
track an image patch from time t = 1 to time t = 2,
the template T(x) is an extracted sub-region (a 5 × 5
window, maybe) of the image at t = 1 and I(x) is the
image at t = 2.
Let W(x; p) denote the parameterized set of allowed
warps, where p = (p1, ..., pn)^T is a vector of parame-
ters. The warp W(x; p) takes the pixel x in the coordi-
nate frame of the template T and maps it to the sub-pixel
location W(x; p) in the coordinate frame of the image
I. If we are computing optical flow, for example, the
warps W(x; p) might be the translations:
W(x; p) = ( x + p1 )
          ( y + p2 )                                        (1)

where the vector of parameters p = (p1, p2)^T is then
the optical flow. If we are tracking a larger image patch
moving in 3D we may instead consider the set of affine
warps:

W(x; p) = ( (1 + p1) · x + p3 · y + p5 )
          ( p2 · x + (1 + p4) · y + p6 )

        = ( 1 + p1   p3       p5 ) ( x )
          ( p2       1 + p4   p6 ) ( y )
                                   ( 1 )                    (2)

where there are 6 parameters p = (p1, p2, p3, p4, p5, p6)^T
as, for example, was done in Bergen et al. (1992).
(There are other ways to parameterize affine warps.
Later in this framework we will investigate what is
the best way.) In general, the number of parameters n
may be arbitrarily large and W(x; p) can be arbitrar-
ily complex. One example of a complex warp is the
set of piecewise affine warps used in Active Appear-
ance Models (Cootes et al., 1998; Baker and Matthews,
2001) and Active Blobs (Sclaroff and Isidoro, 1998).
2.1. Goal of the Lucas-Kanade Algorithm
The goal of the Lucas-Kanade algorithm is to mini-
mize the sum of squared error between two images,
the template T and the image I warped back onto the
coordinate frame of the template:
∑_x [ I(W(x; p)) − T(x) ]².                                 (3)
Warping I back to compute I (W(x; p)) requires inter-
polating the image I at the sub-pixel locations W(x; p).
The minimization of the expression in Eq. (3) is per-
formed with respect to p and the sum is performed
over all of the pixels x in the template image T (x).
Minimizing the expression in Eq. (3) is a non-linear
optimization task even if W(x; p) is linear in p because
the pixel values I(x) are, in general, non-linear in x.
In fact, the pixel values I(x) are essentially unrelated
to the pixel coordinates x. To optimize the expression
in Eq. (3), the Lucas-Kanade algorithm assumes that
a current estimate of p is known and then iteratively

solves for increments to the parameters Δp; i.e. the
following expression is (approximately) minimized:

∑_x [ I(W(x; p + Δp)) − T(x) ]²                             (4)

with respect to Δp, and then the parameters are updated:

p ← p + Δp.                                                 (5)

These two steps are iterated until the estimates of the
parameters p converge. Typically the test for conver-
gence is whether some norm of the vector Δp is below
a threshold ε; i.e. ‖Δp‖ ≤ ε.
2.2. Derivation of the Lucas-Kanade Algorithm
The Lucas-Kanade algorithm (which is a Gauss-
Newton gradient descent non-linear optimization al-
gorithm) is then derived as follows. The non-linear ex-
pression in Eq. (4) is linearized by performing a first
order Taylor expansion on I(W(x; p + Δp)) to give:

∑_x [ I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x) ]².                 (6)
In this expression, ∇I = (∂I/∂x, ∂I/∂y) is the gradient of
image I evaluated at W(x; p); i.e. ∇I is computed
in the coordinate frame of I and then warped back onto
the coordinate frame of T using the current estimate of
the warp W(x; p). The term ∂W/∂p is the Jacobian of the
warp. If W(x; p) = (Wx(x; p), Wy(x; p))^T then:
∂W/∂p = ( ∂Wx/∂p1   ∂Wx/∂p2   ...   ∂Wx/∂pn )
        ( ∂Wy/∂p1   ∂Wy/∂p2   ...   ∂Wy/∂pn ).              (7)
We follow the notational convention that the partial
derivatives with respect to a column vector are laid out
as a row vector. This convention has the advantage that
the chain rule results in a matrix multiplication, as in
the expression in Eq. (6). For example, the affine warp
in Eq. (2) has the Jacobian:
∂W/∂p = ( x  0  y  0  1  0 )
        ( 0  x  0  y  0  1 ).                               (8)
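Since the affine warp is linear in the parameters, the Jacobian in Eq. (8) can be verified by finite differences. A numpy sketch with hypothetical helper names:

```python
import numpy as np

def affine_warp(x, y, p):
    """The affine warp of Eq. (2) applied to the pixel (x, y)."""
    return np.array([(1 + p[0]) * x + p[2] * y + p[4],
                     p[1] * x + (1 + p[3]) * y + p[5]])

def affine_jacobian(x, y):
    """The 2x6 Jacobian dW/dp of the affine warp, evaluated at pixel (x, y);
    this is exactly Eq. (8)."""
    return np.array([[x, 0, y, 0, 1, 0],
                     [0, x, 0, y, 0, 1]], dtype=float)
```

Because the warp is linear in p, a forward difference in each parameter recovers the corresponding Jacobian column exactly (up to floating point rounding).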
Minimizing the expression in Eq. (6) is a least squares
problem and has a closed form solution which can be
derived as follows.

Figure 1. The Lucas-Kanade algorithm (Lucas and Kanade, 1981)
consists of iteratively applying Eqs. (10) and (5) until the estimates
of the parameters p converge. Typically the test for convergence
is whether some norm of the vector Δp is below a user specified
threshold ε. Because the gradient ∇I must be evaluated at W(x; p)
and the Jacobian ∂W/∂p must be evaluated at p, all 9 steps must be
repeated in every iteration of the algorithm.

The partial derivative of the expression in Eq. (6) with
respect to Δp is:
2 ∑_x [ ∇I (∂W/∂p) ]^T [ I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x) ]    (9)
where we refer to ∇I (∂W/∂p) as the steepest descent im-
ages. (See Section 4.3 for why.) Setting this expression
to equal zero and solving gives the closed form solution
for the minimum of the expression in Eq. (6) as:

Δp = H⁻¹ ∑_x [ ∇I (∂W/∂p) ]^T [ T(x) − I(W(x; p)) ]             (10)

where H is the n × n (Gauss-Newton approximation
to the) Hessian matrix:

H = ∑_x [ ∇I (∂W/∂p) ]^T [ ∇I (∂W/∂p) ].                        (11)
For reasons that will become clear later we refer to
∑_x [ ∇I (∂W/∂p) ]^T [ T(x) − I(W(x; p)) ] as the steepest de-
scent parameter updates. Equation (10) then expresses
the fact that the parameter updates Δp are the steepest
descent parameter updates multiplied by the inverse
of the Hessian matrix. The Lucas-Kanade algorithm
(Lucas and Kanade, 1981) then consists of iteratively
applying Eqs. (10) and (5). See Figs. 1 and 2 for a sum-
mary. Because the gradient ∇I must be evaluated at

Figure 2. A schematic overview of the Lucas-Kanade algorithm (Lucas and Kanade, 1981). The image I is warped with the current estimate
of the warp in Step 1 and the result subtracted from the template in Step 2 to yield the error image. The gradient of I is warped in Step 3, the
Jacobian is computed in Step 4, and the two combined in Step 5 to give the steepest descent images. In Step 6 the Hessian is computed from
the steepest descent images. In Step 7 the steepest descent parameter updates are computed by dot producting the error image with the steepest
descent images. In Step 8 the Hessian is inverted and multiplied by the steepest descent parameter updates to get the final parameter updates Δp
which are then added to the parameters p in Step 9.
W(x; p) and the Jacobian ∂W/∂p at p, they both in gen-
eral depend on p. For some simple warps such as the
translations in Eq. (1) and the affine warps in Eq. (2)
the Jacobian can sometimes be constant. See for ex-
ample Eq. (8). In general, however, all 9 steps of the
algorithm must be repeated in every iteration because
the estimates of the parameters p vary from iteration to
iteration.
2.3. Requirements on the Set of Warps
The only requirement on the warps W(x; p) is that they
are differentiable with respect to the warp parameters p.
This condition is required to compute the Jacobian ∂W/∂p.
Normally the warps are also (piecewise) differentiable
with respect to x, but even this condition is not strictly
required.

Table 1. The computation cost of one iteration of the Lucas-Kanade algorithm. If n is the number of warp
parameters and N is the number of pixels in the template T, the cost of each iteration is O(n²N + n³). The
most expensive step by far is Step 6, the computation of the Hessian, which alone takes time O(n²N).

Step 1   Step 2   Step 3   Step 4   Step 5   Step 6    Step 7   Step 8   Step 9   Total
O(nN)    O(N)     O(nN)    O(nN)    O(nN)    O(n²N)    O(nN)    O(n³)    O(n)     O(n²N + n³)
2.4. Computational Cost of the
Lucas-Kanade Algorithm
Assume that the number of warp parameters is n and the
number of pixels in T is N. Step 1 of the Lucas-Kanade
algorithm usually takes time O(nN). For each pixel x
in T we compute W(x; p) and then sample I at that
location. The computational cost of computing W(x; p)
depends on W but for most warps the cost is O(n) per
pixel. Step 2 takes time O(N). Step 3 takes the same
time as Step 1, usually O(nN). Computing the Jacobian
in Step 4 also depends on W but for most warps the cost
is O(n) per pixel. The total cost of Step 4 is therefore
O(nN). Step 5 takes time O(nN), Step 6 takes time
O(n²N), and Step 7 takes time O(nN). Step 8 takes
time O(n³) to invert the Hessian matrix and time O(n²)
to multiply the result by the steepest descent parameter
updates computed in Step 7. Step 9 just takes time
O(n) to increment the parameters by the updates. The
total computational cost of each iteration is therefore
O(n²N + n³), the most expensive step being Step 6. See
Table 1 for a summary of these computational costs.
3. The Quantity Approximated and the Warp
Update Rule
In each iteration Lucas-Kanade approximately mini-
mizes ∑_x [ I(W(x; p + Δp)) − T(x) ]² with respect to
Δp and then updates the estimates of the parameters in
Step 9: p ← p + Δp. Perhaps somewhat surprisingly,
iterating these two steps is not the only way to minimize
the expression in Eq. (3). In this section we outline 3
alternative approaches that are all provably equivalent
to the Lucas-Kanade algorithm. We then show empiri-
cally that they are equivalent.
3.1. Compositional Image Alignment
The first alternative to the Lucas-Kanade algorithm is
the compositional algorithm.
3.1.1. Goal of the Compositional Algorithm. The
compositional algorithm, used most notably by Shum
and Szeliski (2000), approximately minimizes:

∑_x [ I(W(W(x; Δp); p)) − T(x) ]²                           (12)

with respect to Δp in each iteration and then updates
the estimate of the warp as:

W(x; p) ← W(x; p) ◦ W(x; Δp),                               (13)

i.e. the compositional approach iteratively solves for
an incremental warp W(x; Δp) rather than an additive
update to the parameters Δp. In this context, we refer to
the Lucas-Kanade algorithm in Eqs. (4) and (5) as the
additive approach to contrast it with the compositional
approach in Eqs. (12) and (13). The compositional and
additive approaches are proved to be equivalent to first
order in Δp in Section 3.1.5. The expression:

W(x; p) ◦ W(x; Δp) ≡ W(W(x; Δp); p)                         (14)
is the composition of 2 warps. For example, if W(x; p)
is the affine warp of Eq. (2) then:
W(x; p) ∘ W(x; Δp)
= ( (1 + p1) · ((1 + Δp1) · x + Δp3 · y + Δp5) + p3 · (Δp2 · x + (1 + Δp4) · y + Δp6) + p5 ,
    p2 · ((1 + Δp1) · x + Δp3 · y + Δp5) + (1 + p4) · (Δp2 · x + (1 + Δp4) · y + Δp6) + p6 ),
(15)
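In matrix form this composition is simply a product of homogeneous matrices. The sketch below (the helper names are illustrative, not from the paper) builds the 3×3 homogeneous matrix of the affine warp of Eq. (2); multiplying two such matrices and reading the parameters back off reproduces the expansion in Eq. (15).

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix of the affine warp of Eq. (2):
    W(x; p) = ((1+p1)x + p3*y + p5,  p2*x + (1+p4)*y + p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p3,     p5],
                     [p2,     1 + p4, p6],
                     [0.0,    0.0,    1.0]])

def compose_affine(p, dp):
    """Parameters of the composed warp W(x; p) o W(x; dp),
    i.e. the warp x -> W(W(x; dp); p) of Eq. (14)."""
    M = affine_matrix(p) @ affine_matrix(dp)
    return np.array([M[0, 0] - 1, M[1, 0], M[0, 1],
                     M[1, 1] - 1, M[0, 2], M[1, 2]])
```

Applying W(x; p) to the result of W(x; Δp) at any point then agrees with applying the composed warp directly, which is exactly the update rule of Eq. (13) for affine warps.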

References

Lucas, B.D. and Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.

Cootes, T.F., Edwards, G.J., and Taylor, C.J. Active appearance models. In Proceedings of the European Conference on Computer Vision, 1998.

Matthews, I. and Baker, S. Active appearance models revisited. International Journal of Computer Vision, 2004.

Bergen, J.R., Anandan, P., Hanna, K.J., and Hingorani, R. Hierarchical model-based motion estimation. In Proceedings of the European Conference on Computer Vision, 1992.
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Lucas-kanade 20 years on: a unifying framework" ?

The authors present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. The authors concentrate on the inverse compositional algorithm, an efficient algorithm that they recently proposed. The authors examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, the authors cover the quantity approximated, the warp update rule, and the gradient descent approximation.

In future papers in this series the authors will extend their framework to cover these choices and, in particular, investigate whether the inverse compositional algorithm is compatible with these extensions of the Lucas-Kanade algorithm. 

The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I warped back onto the coordinate frame of the template: ∑_x [I(W(x; p)) − T(x)]².

The forwards compositional algorithm has the slight advantage that the Jacobian is constant, and is in general simpler so is less likely to be computed erroneously (Shum and Szeliski, 2000). 

Since the forwards algorithms compute the gradient of I and the inverse algorithms compute the gradient of T, it is not surprising that when noise is added to I the inverse algorithms converge better (faster and more frequently), and conversely when noise is added to T the forwards algorithms converge better.

The inverse additive algorithm can be applied to very few warps, mostly simple 2D linear warps such as translations and affine warps. 

In most of the steps, the cost is a function of k rather than n, but most of the time k = n in the Hager-Belhumeur algorithm anyway. 

Most notably, the cost of composing the two warps in Step 9 depends on W, but for most warps the cost is O(n²) or less, including for the affine warps in Eq. (16).

The partial derivative of the expression in Eq. (6) with respect to Δp is: 2 ∑_x [∇I (∂W/∂p)]ᵀ [I(W(x; p)) + ∇I (∂W/∂p) Δp − T(x)] (9), where the authors refer to ∇I (∂W/∂p) as the steepest descent images.

Since the 6 parameters in the affine warp have different units, the authors use the following error measure rather than the errors in the parameters. 

The initial goal of the Hager-Belhumeur algorithm is the same as that of the Lucas-Kanade algorithm, i.e. to minimize ∑_x [I(W(x; p + Δp)) − T(x)]² with respect to Δp and then update the parameters p ← p + Δp. Rather than changing variables as in Section 3.2.5, the roles of the template and the image are switched as follows.

Since on warps like affine warps the algorithms are almost exactly the same, there is no reason to use the inverse additive algorithm. 

When equal noise is added to both images, the forwards algorithms perform marginally better than the inverse algorithms because the inverse algorithms are only first-order approximations to the forwards algorithms.