
Using Generative Models for Handwritten Digit Recognition

Michael Revow, Christopher K.I. Williams, and Geoffrey E. Hinton
Abstract-We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. 1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. 2) During the process of explaining the image, generative models can perform recognition driven segmentation. 3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. 4) Unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
Index Terms-Deformable model, elastic net, optical character recognition, generative model, probabilistic model, mixture model
1 INTRODUCTION

The conventional statistical approach to performing classification is to use a discriminant classifier that constructs boundaries which discriminate between objects of different categories. An alternative approach is to use generative models. This paper explores the use of generative models for recognizing handwritten digits. In the simplest version there is one model for each digit. Given an image of an unidentified digit the idea is to search for the model that is most likely to have generated that image. This approach has the attractive property that, in addition to providing a label, it can also say something about the particular way in which the digit is instantiated. So, in some sense, it explains the image rather than just labeling it. This is important when the recognizer forms part of a larger computer vision system since there may be interest in more than just the labels. For example, given a roughly segmented image of a single digit we may want to know which parts of the image represent the digit and which parts are caused by noise or by some incorrectly segmented neighboring digit. We may also want to know the pose of the digit (i.e., its position, size, orientation, shear, and elongation) so that we can check for consistency with its neighbors.
• M. Revow and G.E. Hinton are with the Department of Computer Science, University of Toronto, 6 Kings College Road, Toronto, Ont., M5S 3H5 Canada. E-mail: {revow; hinton}@cs.toronto.edu.
• C.K.I. Williams is with the Department of Computer Science and Applied Mathematics, Aston University, Birmingham B4 7ET, UK. E-mail: c.k.i.williams@aston.ac.uk.

Manuscript received Aug. 31, 1994; revised Apr. 22, 1996. Recommended for acceptance by R. Kasturi. For information on obtaining reprints of this article, please send e-mail to: transpami@computer.org, and reference IEEECS Log Number P96047.
We chose unconstrained handwritten digit recognition because it is a task of great practical importance for which there are standard databases that allow different approaches to be compared. It also has the attractive property that there are only ten different classes so it is feasible to explore all ten different ways of generating each unidentified digit image. Although handwritten digit recognition is an easier task than general three-dimensional object recognition, it retains, albeit in reduced form, many of the problems associated with general computer vision such as variability in shape and pose, overlapping objects and both structured and unstructured noise.
The paper is organized as follows: Following a brief review of some past approaches to optical character recognition, we discuss elastic models which have been used for at least two decades to deal with signal and image variability. In Section 3, we introduce our basic elastic model for handwritten digits. We use the probabilistic interpretation of elastic models introduced in an analysis of the elastic net algorithm [1]. In Section 4, we show how the underlying parameters of the models may be learned. Section 5 discusses refinements of the basic ideas. Section 6 describes the performance of our system on a realistic database of handwritten digits. The final two sections discuss some implications of the approach and present conclusions.
2 REVIEW OF PAST WORK

We will not attempt to review here the voluminous work on optical character recognition that has spanned more than three decades (useful reviews can be found in [2], [3]). However, it is helpful to summarize the trends. Most researchers have adopted the classical pattern recognition approach in which image pre-processing is followed by
feature extraction and classification. There have been many variations, but these may be roughly described using two dimensions: statistical/structural and global/local.¹ As an example of a global, statistical approach, [4] extracts eight central and two raw moments as features. On the other hand the recognizer used by Lam and Suen [5] uses local features. They extract local geometric primitives consisting of line segments and convex polygons and use these as input to a structural classifier. Others extract topological features which depend on the global properties of the data. For example, Shridhar and Badreldin [6] use features derived from the character profiles in the image. They then feed these features into a tree classifier. More recently there have been a number [7], [8], [9] of successful attempts to automatically learn appropriate local features using feed forward neural networks. Some researchers [10], [3], [11] have boosted performance using combinations of classifiers.
Significant progress has been made in OCR. On a standard database of lightly constrained pre-segmented handwritten digits the very best systems achieve error rates of about 1.5% with no rejections [12]. But more work is required to match human performance, especially on unsegmented strings of digits. We hypothesize that in order to achieve human performance without astronomically large training sets, recognizers must embed some form of prior knowledge about the objects they expect to find in images. This is common in structural systems but rarer in statistical systems. There have been some statistical systems that allow for typical digit transformations [13], but discriminant classifiers generally do not address the issue of explicitly "explaining the data". This leads to a number of weaknesses that may limit the achievable performance:
1) Conventionally, a recognizer does not help to guide segmentation by dividing the image into significant and irrelevant parts. So a system typically [14] tries many candidate segmentations and all the recognizer can indicate is whether a particular segmentation leads to confident recognition. In general, this type of hypothesize-and-test search procedure is much less efficient than a procedure that can use information from the recognition to refine the segmentation hypothesis.
2) Statistical recognizers can occasionally confidently classify images that do not look anything like a character [15]. This can be ameliorated by training the system to reject junk images [16], but it is hard to get a good sample of rare types of junk.
3) Systems that do not incorporate any prior knowledge about the shapes of characters must learn all their knowledge from the training examples. We already know that digits are composed of one-dimensional strokes and so it seems wasteful to use up training data to learn this.
4) A recognizer that "understands" an image should be able to not only label it with the correct class, but should also be able to return the instantiation parameters such as the position, size, orientation, shear and elongation. For handwritten digits we may also want information on the writing style since this is occasionally crucial in disambiguating other digits in the same string.

1. These are broad terms applied to the object recognizer as a complete entity. Obviously, feature selection and classifier design may be independently described.
Motivated by the success of model-based shape recognition in overcoming some of these shortcomings [17], we have investigated the use of deformable elastic models for handwritten digit recognition [18]. Models of this general type have been used in computer vision since the early 1970s. Ullmann [19] discusses the idea of finding a distortion mapping from a test image to a stored template such that there is correspondence between like features rather than exact matches. Widrow [20] also suggests the idea of using rubber templates to achieve fuzzy matches to a variety of natural objects and waveforms.

Burr presents an iterative framework for computing elastic matches in dot and grey-scale images [21] and line drawings [22]. Using a coarse-to-fine matching strategy he shows how an image can be progressively deformed under the influence of misalignment force fields to fit another image. In a later version [23], global size and rotation adjustments were included. The method has been adapted to match tomographic [24] and thermographic images [25]. One weakness with the approach is that it does not allow the amount of deformation to be traded off against the fidelity of the data match. It also has no principled way of handling noise or missing data.
Bajcsy and co-workers [26], [27] integrate the notion of a trade-off between data fit and deformation in their multiresolution elastic matching scheme for registering an image with respect to a template. They consider a test image to be drawn on an elastic membrane. The membrane is subjected to external forces which are proportional to the gradient of the similarity measure. The system iterates until an equilibrium exists between the forces trying to increase the similarity measure (a measure of cross-correlation between the two images) and the restraining forces arising from the elastic properties of the membrane. The multiresolution approach is attractive as it initially concentrates on achieving large-scale registration between the images with fine-scale matching coming later in the process.
Early work by Fischler and Elschlager [28] described a model with local (data fit) and global (model deformation) energy terms. Their model is composed of (rigid) features whose spatial arrangement is constrained by springs and hence the deformation is related to the energy required to stretch or compress these springs. Their matching procedure works on a coarse scale, but it is scale dependent and degrades in the presence of noise [29]. The facial feature model example they used has been extended by Yuille [29], who constructs more detailed descriptions of the feature shapes and global matching criteria in terms of peak, valley and edge intensities. In addition, the original dynamic programming search was replaced with a gradient method. From an image explanation point of view this type of matching scheme is deficient as it does not account for the entire image. Instead of ensuring that every part of the image is explained by the model (or explicitly attributed to some additional noise process) the matching process tries to
ensure that every part of the model is supported by some part of the image and a match may be good even though it leaves large parts of the image unaccounted for. "Snakes" [30] use different shape constraints, but also attempt to match each part of the model to some part of the image rather than vice versa. Point distribution models [31] recognize the importance of doing both types of matching, i.e., the model must be supported by the data and the model should explain the data.

The digit models we propose have a sound generative probabilistic basis and explicitly incorporate much prior knowledge of handwritten digits, for example, that they are made up of strokes and that they are globally invariant to affine transformations, unlike other implementations [13] which attempt to achieve only local invariance.
3 MATCHING ELASTIC SPLINE MODELS TO IMAGES

3.1 Overview
Each of the ten digits has its own elastic model.² A digit-image is recognized by choosing the elastic model which best matches the image. During the matching process, the model is deformed in an attempt to ensure that every piece of ink in the image is close to some part of the model. The fidelity of the final match depends on the amount of deformation of the model, the amount of ink that is attributed to noise, and the distance of the remaining ink from the deformed model.
Unlike the approach taken in many OCR systems, we do not pre-process images in order to remove the effects of translation, scale, rotations, shear, etc. Instead we handle arbitrary global affine transformations of the image by defining the model in an "object-based frame" which is mapped through an affine transformation into the "image-frame." The affine transformation is refined during the matching process so that knowledge about the shape of the digit can influence the choice of affine transformation. This is not possible if normalization precedes recognition. Affine transformations are not penalized during the matching process, so deformations are only used to handle true variations in shape that cannot be accommodated by global affine transformations.
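To make the two frames concrete, here is a minimal Python sketch of mapping object-frame points into the image frame; the particular packing of the six degrees of freedom into `(a, b, c, d, tx, ty)` is a hypothetical choice for illustration, not a parameterization taken from the paper:

```python
import numpy as np

def affine_to_image_frame(points, affine):
    """Map 2D object-frame points into the image frame.

    points: (n, 2) array of object-frame coordinates.
    affine: (a, b, c, d, tx, ty), the six degrees of freedom of
            [x']   [a b][x]   [tx]
            [y'] = [c d][y] + [ty]
    """
    a, b, c, d, tx, ty = affine
    A = np.array([[a, b], [c, d]])
    t = np.array([tx, ty])
    return points @ A.T + t
```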
Similarly, we do not assume that the image has been perfectly segmented. The matching process decides which pieces of the model correspond to which pieces of the image and it can explicitly reject some parts of the image as noise. Thus knowledge about the shape can be used to refine the segmentation.
What we have just described is an instantiation of the general framework of generative or latent variable models [32]. The key idea is that the manifest variables are attributable to a smaller number of underlying hidden or latent variables. In our case, the manifest variables are the pixels and the hidden variables are the positions of the elastic model's control points in the image-frame. Section 3.2 describes the elastic model. Section 3.3 gives the underlying probabilistic interpretation of how the model generates an image from the hidden variables (required for computing the log-likelihood of the image given a model instantiation). An algorithm for the more difficult problem of inferring the hidden variables from the manifest variables is presented in Section 3.4.

2. C-code implementing the model is available from http://www.cs.toronto.edu/~revow.
3.2 Elastic Spline Models

We model each digit with a uniform, cubic B-spline [33]. Each model has at most 8 control points.³ Let X = {x₁, x₂, ..., xₙ} = {x₁, x₂, ..., x₂ₙ₋₁, x₂ₙ} denote an instantiation of the model in terms of its n control points. The ith control point is located at (x₂ᵢ₋₁, x₂ᵢ). Similarly, H = {h₁, ..., hₙ} indicates the home or undeformed control point locations and Γ the affine transformation with its six degrees of freedom. The location of any point,⁴ s(b), on the spline can be written as a linear function [33] of the control point locations:⁵

$$\mathbf{s}(b) = \sum_{i=1}^{n} y_i(b)\,\mathbf{x}_i \qquad (1)$$

Because of the local control feature of B-splines some of the coefficients, yᵢ(b), will be zero. For future convenience, we also write (1) as:

$$\mathbf{s}(b) = \mathbf{Y}(b)\,X \qquad (2)$$
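As a concrete illustration of (1), the following Python sketch evaluates a point on a uniform cubic B-spline, assuming the doubled-endpoint convention of footnote 5; the four nonzero basis values are the yᵢ(b):

```python
import numpy as np

def cubic_bspline_point(control_points, b):
    """Evaluate a uniform cubic B-spline at parameter b.

    control_points: (n, 2) array, endpoints already doubled.
    b ranges over [0, n - 3], one unit per spline segment.
    Returns s(b), a linear combination of four control points as in (1).
    """
    n = len(control_points)
    seg = min(int(b), n - 4)   # index of the active spline segment
    t = b - seg                # local parameter in [0, 1]
    # Uniform cubic B-spline basis functions (the nonzero y_i(b));
    # coefficients for all other control points are zero (local control).
    basis = np.array([(1 - t) ** 3,
                      3 * t ** 3 - 6 * t ** 2 + 4,
                      -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                      t ** 3]) / 6.0
    return basis @ control_points[seg:seg + 4]
```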
To generate an ideal example of a digit we put the control points at their home locations. To deform the digit we move the control points away from their home locations. Assuming a Gaussian distribution for these deformations, the probability of the n control points lying within a small hypervolume δV is approximately:

$$P(X)\,\delta V \approx \frac{\delta V}{(2\pi)^{n}\,|C|^{1/2}}\exp\!\left(-\tfrac{1}{2}(X - H)^{T} C^{-1} (X - H)\right)$$

where C is the covariance matrix of the distribution. Thus, a single deformable model defines an entire probability distribution across shape instances.

Following [1] and [34] we define the deformation energy, E_def, to be the negative log probability of the deformation:

$$E_{def}(X) = \tfrac{1}{2}(X - H)^{T} C^{-1} (X - H) + \tfrac{1}{2}\log|C| + \text{const} \qquad (3)$$
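A direct transcription of the X-dependent part of (3) (a sketch; the log-determinant and constant are omitted since they do not vary with X during fitting):

```python
import numpy as np

def deformation_energy(X, H, C_inv):
    """Quadratic term of E_def in (3): half the squared Mahalanobis
    distance of the deformed control points X from their homes H.

    X, H: flattened (2n,) coordinate vectors.
    C_inv: (2n, 2n) inverse covariance of the deformation distribution.
    """
    d = X - H
    return 0.5 * d @ C_inv @ d
```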
Splines are a convenient method for modeling handwritten digits as it is easy to incorporate topological variations. For example, small changes in the relative locations of the control points can turn the loop of a 2 into a cusp or an open bend (Fig. 1). This advantage of spline models is pointed out in [35] where a different kind of spline is used to fit on-line character data by directly locating candidate control points on strokes in the image. It is lost (as pointed out in [36]) when models based more directly on Durbin and Willshaw's elastic net are employed [37].

3. The model of a one needs only three control points while the seven-model uses five.
4. The spline is a one-dimensional continuous curve parameterized by b. In the development, we consider a discrete version.
5. We treat the first and last control points as if they are doubled.
[Fig. 1. Illustrates the flexibility of spline models to capture topological variations. The large loop of the 2 can smoothly decrease in size and eventually become a cusp. The control points are labelled 1 through 8.]

3.3 Generative Models

Although we use our digit models for recognition, it is helpful to consider how we would use them for generating images. The generative model is an elaboration of the probabilistic interpretation of the elastic net given in [1]. To generate a noisy image of a particular digit class, run the following procedure:

1) Pick a deformation of the model (i.e., move the control points away from their home locations) to give a particular realization X. This defines the spline in object-based coordinates. The log probability of picking a deformation is proportional to the quadratic term in (3). It is important that the deformation is measured in object-based coordinates.
2) Pick an affine transformation⁶ from the model's intrinsic reference frame to the image frame (i.e., pick a size, position, orientation, slant and elongation for the digit).
3) Map the spline into image coordinates and place beads uniformly along its length. Each bead is a circular Gaussian ink generator. The number of beads and their variance can easily be changed without changing the spline itself. Typically the variance is chosen so that the bead centers are two standard deviations apart.
4) Repeat many times: Either (with probability π_n) add a randomly positioned noise pixel to the image, or pick a bead at random and generate an inked pixel from the Gaussian distribution defined by the bead.

6. Using our prior knowledge that ones tend to be stroke-like, we used a similarity transformation for the one-model.

This is not a good generative model of the way in which handwritten digits are actually produced. If, for example, the beads have large variances, the inked pixels in the image will have the correct overall shape but will be disconnected and much too scattered. However, the generative model is useful for recognizing digits as explained in the following sections.
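A minimal Python sketch of the sampling loop in step 4; the argument names (bead centers, image extent, pixel count) are illustrative placeholders, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inked_pixels(bead_centers, bead_sigma, noise_mix,
                        n_pixels, height, width):
    """Generate inked pixel locations from the bead/noise mixture.

    bead_centers: (B, 2) Gaussian ink-generator means in the image frame.
    bead_sigma: common standard deviation of the circular beads.
    noise_mix: probability pi_n of emitting a uniform noise pixel.
    """
    pixels = []
    for _ in range(n_pixels):
        if rng.random() < noise_mix:
            # Uniformly positioned noise pixel.
            pixels.append(rng.uniform([0.0, 0.0], [height, width]))
        else:
            # Pick a bead at random, then ink a pixel from its Gaussian.
            b = rng.integers(len(bead_centers))
            pixels.append(rng.normal(bead_centers[b], bead_sigma))
    return np.array(pixels)
```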
3.4 Fitting a Model to an Image

In this section, a Bayesian interpretation of the fitting process is adopted and we demonstrate how using a maximum posterior framework (see, for example [38]) yields a practical algorithm. First we need to refine the notation: a superscript o or ′ is used to indicate whether a quantity is in the object or image frame respectively, so Xᵒ represents the control point locations in the object frame. We can parameterize a model instantiation in the image frame by α = (Xᵒ, Γ).

To classify an image, I, specified as a vector of the locations of its N_I inked pixels {z₁, ..., z_{N_I}}, each of the m models is fitted to the data and the model that best "explains" the image is chosen. Using a uniform prior over all digits, the posterior probability, P(m | I), for each model is proportional to the evidence, P(I | m):
$$P(I \mid m) = \int P(I \mid \alpha, m)\,P(\alpha \mid m)\,d\alpha \qquad (4)$$
Performing the integration over instantiation parameter space is infeasible, so instead we compute the most probable parameter values (α*).⁷ The evidence is approximated by the height of the posterior peak (P(I | α*, m) P(α* | m)), multiplied by the volume of the parameter space under the peak. The negative logarithm of the evidence is then:

$$-\log P(I \mid m) \approx -\log P(I \mid \alpha^*, m) - \log P(\alpha^* \mid m) - K \qquad (5)$$

where K is the logarithm of the volume term. When the posterior is well modelled by a Gaussian, then $K = \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\mathcal{H}|$, with $\mathcal{H} = -\nabla\nabla \log P(\alpha \mid I, m)$ the Hessian evaluated at α*. In the sequel we treat K as a constant, but we allow it to be a different constant for each model (see Section 6).

7. This is reasonable for this problem because there will usually only be one setting of the control points and affine transformation that will provide a good fit.
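The Gaussian value of K follows from the standard Laplace approximation; as a one-line check (using the definitions above, with d the dimensionality of α):

$$\int P(I \mid \alpha, m)\,P(\alpha \mid m)\,d\alpha \;\approx\; P(I \mid \alpha^*, m)\,P(\alpha^* \mid m)\,(2\pi)^{d/2}\,|\mathcal{H}|^{-1/2},$$

so K, the log of the volume factor, is $\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\mathcal{H}|$.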
The second term is just E_def (3). The first term is the log-likelihood of the image given a particular instantiated model. We refer to this as the data fit (E_fit). This leads to a convenient objective function consisting of just two energy terms:

$$E_{tot} = E_{fit} + E_{def} \qquad (6)$$
If each inked pixel z_k in the image is generated independently from a distribution defined by B circularly symmetric Gaussian beads, each with a mean s(b) and variance σ_b², and a uniform noise field, then the data fit term is the sum of log probabilities of each inked pixel:
$$E_{fit} = -\sum_{k=1}^{N_I} \log P_k \qquad (7)$$

where P_k is the probability of inking pixel k:

$$P_k = \frac{\pi_n}{N} + \frac{1-\pi_n}{B}\sum_{b=1}^{B} \frac{1}{2\pi\sigma_b^2}\exp\!\left(-\frac{\lVert z_k - \mathbf{s}(b)\rVert^2}{2\sigma_b^2}\right) \qquad (8)$$

with N the number of pixels which the uniform noise field is distributed over (normally the whole image) and π_n the mixing proportion of the uniform noise field. Using (7) to compute E_fit has the undesirable property that it depends on the number of inked pixels in the image. For example, a simple resizing of the image will change E_fit, whereas E_def, being defined in the object-based frame, is invariant to scale changes. In order to mitigate this, we allow each pixel to have its own weight W_k. Thus, we compute E_fit using:

$$E_{fit} = -\sum_{k=1}^{N_I} W_k \log P_k \qquad (9)$$

Normally we set W_k = A/N_I where A is a constant. This has the desired effect of ensuring that all images have the same total weight of ink and therefore about the same tradeoff between E_fit and E_def regardless of the number of inked pixels. However, it is also possible that a bottom-up processor could assign different weights to pixels based on other knowledge.
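A vectorized Python sketch of (8)-(9), assuming for brevity a common bead variance (the paper allows per-bead variances):

```python
import numpy as np

def data_fit_energy(pixels, beads, sigma, noise_mix,
                    n_field_pixels, weights):
    """E_fit of (9): weighted negative log probability of the inked
    pixels under the bead-mixture-plus-uniform-noise model of (8).

    pixels: (N_I, 2) inked pixel locations z_k.
    beads: (B, 2) bead centers s(b) in the image frame.
    weights: (N_I,) per-pixel weights W_k, e.g. A / N_I.
    """
    B = len(beads)
    # Squared distances ||z_k - s(b)||^2, shape (N_I, B).
    d2 = ((pixels[:, None, :] - beads[None, :, :]) ** 2).sum(-1)
    gauss = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    P_k = noise_mix / n_field_pixels + (1 - noise_mix) * gauss.sum(1) / B
    return -(weights * np.log(P_k)).sum()
```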
We assume that the deformations and affine are independent (for example, the size of a digit or its location in the image is unlikely to be correlated with its style), so the second term in (5) can be factorized into the sum of E_def and a term involving the affine parameters. During the fitting procedure, we treat the affine parameters as if they have a uniform prior. However, in Section 6, we show how solutions with unusual affines may be penalized after the fitting procedure is complete.
The objective in fitting a model to an image is to find the α which minimizes E_tot. We start with zero deformations and an initial guess for the affine parameters which ensures that the control points are mapped within an upright rectangular box around the inked pixels in the image. A small number of beads with equal, large variance are placed along the spline. These large variance beads form a broad, smooth ridge of high ink-probability along the spline. Because of the high variance, the beads are attracted to inked pixels even if they are fairly far away, so the spline is quickly pulled towards the data. During the fitting process the variance of the beads will generally decrease and the number of beads increase as the model gets closer to the data and begins to explain its finer structure. The fitting technique resembles the elastic net algorithm of Durbin and Willshaw [39] except that our elastic energy function is much more complex and we are also fitting an affine transformation.
In early experiments, we used a conjugate gradient method to optimize E_tot. Unfortunately this method is slow because each conjugate gradient step may require a few evaluations of E_tot, each of which is of the order of B × N_I operations. Our preferred method is based upon the Expectation Maximization (EM) algorithm [40]. This involves the repeated application of a two step procedure which will not increase E_tot as α is adjusted at each application.
During the expectation (E) step, the beads are frozen at their current locations and the responsibility that each bead has for each inked pixel is computed. This is just the probability of generating the pixel under the Gaussian distribution for the bead, normalized by the total probability of generating the pixel:

$$r_{kb} = \frac{1-\pi_n}{B}\,\frac{1}{2\pi\sigma_b^2}\exp\!\left(-\frac{\lVert z_k - \mathbf{s}(b)\rVert^2}{2\sigma_b^2}\right) \Big/ P_k \qquad (10)$$

Because the negative log likelihood (energy) under a Gaussian distribution is quadratic in the distance from the mean, it is sometimes convenient to think of minimizing E_tot as analogous to finding the minimum energy configuration of a system of springs. For a fixed bead variance, consider a system in which each fixed pixel is attached to mobile beads by springs whose stiffness is proportional to the responsibility of the bead for the pixel. As we show shortly the EM method finds the minimum energy configuration of the system of springs.
In the second (M) step, the responsibilities are fixed, and new values of α computed to minimize E_tot. In the conventional application of EM, the beads would be unconstrained, and hence a bead would move to the center of gravity of the data (pixels), weighted by the responsibilities that the bead has for each pixel:

$$\mathbf{s}'(b) = \frac{\sum_k r_{kb}\,W_k\,z_k}{\sum_k r_{kb}\,W_k} \qquad (11)$$
However in our system the beads are constrained to lie on the spline defined by the control points; the free variables are really the control point locations and the affine parameters. Directly minimizing E_tot results in a set of non-linear equations. We circumvent the expensive step of solving a set of non-linear equations using a two stage procedure. In the first stage, the affine transformation Γ is held constant.⁸ This means we can do the minimization exclusively in the image frame. To do this we first need to define the deformation energy there. This involves mapping the control point covariance matrix, C, through the affine (see below). Setting ∂E′_tot/∂x′ = 0 and with the help of (6), (1), (3), (10), and (11) we update the control point locations by solving the set of linear equations:

$$\mathbf{B}\,X' = \mathbf{d} \qquad (12)$$

with $R_b = \sum_k W_k r_{kb}$. In (12), we have used the shorthand y_m to denote yₘˣ (yₘʸ) and s′ the x (y) component of s′ for m odd (even). In the spring system analogy, this stage corresponds to finding the minimum energy equilibrium point where the forces pulling the beads towards the nearby pixels are balanced by the forces pulling the beads towards their home locations.⁹
8. This is an example of Expectation/Conditional Maximization [41].
9. More precisely, the pixel forces on the beads can be transferred onto the control points and at equilibrium there is a balance between these forces and those pulling the control points towards their home locations.
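Putting the pieces together, here is a schematic Python outer loop for the fit; `model.initial_guess`, `model.bead_layout`, and `model.m_step` are hypothetical placeholders standing in for the initialization, the coarse-to-fine bead/variance schedule, and the two-stage M step described above, not an API from the paper:

```python
import numpy as np

def fit_model(pixels, model, n_iters=30):
    """Schematic EM fit: alternate the E step of (10) with a
    two-stage M step, annealing the bead layout along the way."""
    alpha = model.initial_guess(pixels)   # zero deformation + box affine
    for it in range(n_iters):
        # Anneal: more beads and smaller variance as the fit improves.
        beads, sigma = model.bead_layout(alpha, it)
        r = responsibilities(pixels, beads, sigma,
                             model.noise_mix, model.n_field_pixels)
        # Solve (12) for the control points, then re-fit the affine.
        alpha = model.m_step(alpha, pixels, r)
    return alpha
```

To classify a digit image, each of the ten models would be fitted this way and scored as in (5)-(6), with the per-model constant K folded in (Section 6).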