
Using Generative Models for Handwritten Digit Recognition

Michael Revow, Christopher K.I. Williams, and Geoffrey E. Hinton
Abstract-We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. 1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. 2) During the process of explaining the image, generative models can perform recognition driven segmentation. 3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. 4) Unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
Index Terms-Deformable model, elastic net, optical character recognition, generative model, probabilistic model, mixture model
1 INTRODUCTION

The conventional statistical approach to performing classification is to use a discriminant classifier that constructs boundaries which discriminate between objects of different categories. An alternative approach is to use generative models. This paper explores the use of generative models for recognizing handwritten digits. In the simplest version there is one model for each digit. Given an image of an unidentified digit the idea is to search for the model that is most likely to have generated that image. This approach has the attractive property that, in addition to providing a label, it can also say something about the particular way in which the digit is instantiated. So, in some sense, it explains the image rather than just labeling it. This is important when the recognizer forms part of a larger computer vision system since there may be interest in more than just the labels. For example, given a roughly segmented image of a single digit we may want to know which parts of the image represent the digit and which parts are caused by noise or by some incorrectly segmented neighboring digit. We may also want to know the pose of the digit (i.e., its position, size, orientation, shear, and elongation) so that we can check for consistency with its neighbors.
• M. Revow and G.E. Hinton are with the Department of Computer Science, University of Toronto, 6 Kings College Road, Toronto, Ont., M5S 3H5 Canada. E-mail: {revow; hinton}@cs.toronto.edu.
• C.K.I. Williams is with the Department of Computer Science and Applied Mathematics, Aston University, Birmingham B4 7ET, UK. E-mail: c.k.i.williams@aston.ac.uk.

Manuscript received Aug. 31, 1994; revised Apr. 22, 1996. Recommended for acceptance by R. Kasturi. For information on obtaining reprints of this article, please send e-mail to: transpami@computer.org, and reference IEEECS Log Number P96047.
We chose unconstrained handwritten digit recognition because it is a task of great practical importance for which there are standard databases that allow different approaches to be compared. It also has the attractive property that there are only ten different classes so it is feasible to explore all ten different ways of generating each unidentified digit image. Although handwritten digit recognition is an easier task than general three-dimensional object recognition, it retains, albeit in reduced form, many of the problems associated with general computer vision such as variability in shape and pose, overlapping objects and both structured and unstructured noise.
The paper is organized as follows: Following a brief review of some past approaches to optical character recognition, we discuss elastic models which have been used for at least two decades to deal with signal and image variability. In Section 3, we introduce our basic elastic model for handwritten digits. We use the probabilistic interpretation of elastic models introduced in an analysis of the elastic net algorithm [1]. In Section 4, we show how the underlying parameters of the models may be learned. Section 5 discusses refinements of the basic ideas. Section 6 describes the performance of our system on a realistic database of handwritten digits. The final two sections discuss some implications of the approach and present conclusions.
2 REVIEW OF PAST WORK

We will not attempt to review here the voluminous work on optical character recognition that has spanned more than three decades (useful reviews can be found in [2], [3]). However, it is helpful to summarize the trends. Most researchers have adopted the classical pattern recognition approach in which image pre-processing is followed by
feature extraction and classification. There have been many variations, but these may be roughly described using two dimensions: statistical/structural and global/local.¹ As an example of a global, statistical approach, [4] extracts eight central and two raw moments as features. On the other hand the recognizer used by Lam and Suen [5] uses local features. They extract local geometric primitives consisting of line segments and convex polygons and use these as input to a structural classifier. Others extract topological features which depend on the global properties of the data. For example, Shridhar and Badreldin [6] use features derived from the character profiles in the image. They then feed these features into a tree classifier. More recently there have been a number [7], [8], [9] of successful attempts to automatically learn appropriate local features using feed forward neural networks. Some researchers [10], [3], [11] have boosted performance using combinations of classifiers.
Significant progress has been made in OCR. On a standard database of lightly constrained pre-segmented handwritten digits the very best systems achieve error rates of about 1.5% with no rejections [12]. But more work is required to match human performance, especially on unsegmented strings of digits. We hypothesize that in order to achieve human performance without astronomically large training sets, recognizers must embed some form of prior knowledge about the objects they expect to find in images. This is common in structural systems but rarer in statistical systems. There have been some statistical systems that allow for typical digit transformations [13], but discriminant classifiers generally do not address the issue of explicitly "explaining the data". This leads to a number of weaknesses that may limit the achievable performance:
1) Conventionally, a recognizer does not help to guide segmentation by dividing the image into significant and irrelevant parts. So a system typically [14] tries many candidate segmentations and all the recognizer can indicate is whether a particular segmentation leads to confident recognition. In general, this type of hypothesize-and-test search procedure is much less efficient than a procedure that can use information from the recognition to refine the segmentation hypothesis.
2) Statistical recognizers can occasionally confidently classify images that do not look anything like a character [15]. This can be ameliorated by training the system to reject junk images [16], but it is hard to get a good sample of rare types of junk.
3) Systems that do not incorporate any prior knowledge about the shapes of characters must learn all their knowledge from the training examples. We already know that digits are composed of one-dimensional strokes and so it seems wasteful to use up training data to learn this.
4) A recognizer that "understands" an image should be able to not only label it with the correct class, but should also be able to return the instantiation parameters such as the position, size, orientation, shear and elongation. For handwritten digits we may also want information on the writing style since this is occasionally crucial in disambiguating other digits in the same string.

1. These are broad terms applied to the object recognizer as a complete entity. Obviously, feature selection and classifier design may be independently described.
Motivated by the success of model-based shape recognition in overcoming some of these shortcomings [17], we have investigated the use of deformable elastic models for handwritten digit recognition [18]. Models of this general type have been used in computer vision since the early 1970s. Ullmann [19] discusses the idea of finding a distortion mapping from a test image to a stored template such that there is correspondence between like features rather than exact matches. Widrow [20] also suggests the idea of using rubber templates to achieve fuzzy matches to a variety of natural objects and waveforms.

Burr presents an iterative framework for computing elastic matches in dot and grey-scale images [21] and line drawings [22]. Using a coarse-to-fine matching strategy he shows how an image can be progressively deformed under the influence of misalignment force fields to fit another image. In a later version [23], global size and rotation adjustments were included. The method has been adapted to match tomographic [24] and thermographic images [25]. One weakness with the approach is that it does not allow the amount of deformation to be traded off against the fidelity of the data match. It also has no principled way of handling noise or missing data.
Bajcsy and co-workers [26], [27] integrate the notion of a trade-off between data fit and deformation in their multiresolution elastic matching scheme for registering an image with respect to a template. They consider a test image to be drawn on an elastic membrane. The membrane is subjected to external forces which are proportional to the gradient of the similarity measure. The system iterates until an equilibrium exists between the forces trying to increase the similarity measure (a measure of cross-correlation between the two images) and the restraining forces arising from the elastic properties of the membrane. The multiresolution approach is attractive as it initially concentrates on achieving large-scale registration between the images with fine-scale matching coming later in the process.
Early work by Fischler and Elschlager [28] described a model with local (data fit) and global (model deformation) energy terms. Their model is composed of (rigid) features whose spatial arrangement is constrained by springs and hence the deformation is related to the energy required to stretch or compress these springs. Their matching procedure works on a coarse scale, but it is scale dependent and degrades in the presence of noise [29]. The facial feature model example they used has been extended by Yuille [29], who constructs more detailed descriptions of the feature shapes and global matching criteria in terms of peak, valley and edge intensities. In addition, the original dynamic programming search was replaced with a gradient method. From an image explanation point of view this type of matching scheme is deficient as it does not account for the entire image. Instead of ensuring that every part of the image is explained by the model (or explicitly attributed to some additional noise process) the matching process tries to
ensure that every part of the model is supported by some part of the image and a match may be good even though it leaves large parts of the image unaccounted for. "Snakes" [30] use different shape constraints, but also attempt to match each part of the model to some part of the image rather than vice versa. Point distribution models [31] recognize the importance of doing both types of matching, i.e., the model must be supported by the data and the model should explain the data.

The digit models we propose have a sound generative probabilistic basis and explicitly incorporate much prior knowledge of handwritten digits, for example, that they are made up of strokes and that they are globally invariant to affine transformations, unlike other implementations [13] which attempt to achieve only local invariance.
3 MATCHING ELASTIC SPLINE MODELS TO IMAGES

3.1 Overview
Each of the ten digits has its own elastic model.² A digit-image is recognized by choosing the elastic model which best matches the image. During the matching process, the model is deformed in an attempt to ensure that every piece of ink in the image is close to some part of the model. The fidelity of the final match depends on the amount of deformation of the model, the amount of ink that is attributed to noise, and the distance of the remaining ink from the deformed model.
Unlike the approach taken in many OCR systems, we do not pre-process images in order to remove the effects of translation, scale, rotations, shear, etc. Instead we handle arbitrary global affine transformations of the image by defining the model in an "object-based frame" which is mapped through an affine transformation into the "image-frame." The affine transformation is refined during the matching process so that knowledge about the shape of the digit can influence the choice of affine transformation. This is not possible if normalization precedes recognition. Affine transformations are not penalized during the matching process, so deformations are only used to handle true variations in shape that cannot be accommodated by global affine transformations.
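To make the two frames concrete, here is a minimal Python sketch of mapping object-frame points into the image frame; the particular packing of the six degrees of freedom into `(a, b, c, d, tx, ty)` is a hypothetical choice for illustration, not a parameterization taken from the paper:

```python
import numpy as np

def affine_to_image_frame(points, affine):
    """Map 2D object-frame points into the image frame.

    points: (n, 2) array of object-frame coordinates.
    affine: (a, b, c, d, tx, ty), the six degrees of freedom of
            [x']   [a b][x]   [tx]
            [y'] = [c d][y] + [ty]
    """
    a, b, c, d, tx, ty = affine
    A = np.array([[a, b], [c, d]])
    t = np.array([tx, ty])
    return points @ A.T + t
```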
Similarly, we do not assume that the image has been perfectly segmented. The matching process decides which pieces of the model correspond to which pieces of the image and it can explicitly reject some parts of the image as noise. Thus knowledge about the shape can be used to refine the segmentation.
What we have just described is an instantiation of the general framework of generative or latent variable models [32]. The key idea is that the manifest variables are attributable to a smaller number of underlying hidden or latent variables. In our case, the manifest variables are the pixels and the hidden variables are the positions of the elastic model's control points in the image-frame. Section 3.2 describes the elastic model. Section 3.3 gives the underlying probabilistic interpretation of how the model generates an image from the hidden variables (required for computing the log-likelihood of the image given a model instantiation). An algorithm for the more difficult problem of inferring the hidden variables from the manifest variables is presented in Section 3.4.

2. C-code implementing the model is available from http://www.cs.toronto.edu/~revow.
3.2 Elastic Spline Models

We model each digit with a uniform, cubic B-spline [33]. Each model has at most 8 control points.³ Let X = {x₁, x₂, ..., xₙ} = {x₁, x₂, ..., x₂ₙ₋₁, x₂ₙ} denote an instantiation of the model in terms of its n control points. The ith control point is located at (x₂ᵢ₋₁, x₂ᵢ). Similarly, H = {h₁, ..., hₙ} indicates the home or undeformed control point locations and Γ the affine transformation with its six degrees of freedom. The location of any point,⁴ s(b), on the spline can be written as a linear function [33] of the control point locations:⁵

$$\mathbf{s}(b) = \sum_{i=1}^{n} y_i(b)\,\mathbf{x}_i \qquad (1)$$

Because of the local control feature of B-splines some of the coefficients, yᵢ(b), will be zero. For future convenience, we also write (1) as:

$$\mathbf{s}(b) = \mathbf{Y}(b)\,X \qquad (2)$$
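As a concrete illustration of (1), the following Python sketch evaluates a point on a uniform cubic B-spline, assuming the doubled-endpoint convention of footnote 5; the four nonzero basis values are the yᵢ(b):

```python
import numpy as np

def cubic_bspline_point(control_points, b):
    """Evaluate a uniform cubic B-spline at parameter b.

    control_points: (n, 2) array, endpoints already doubled.
    b ranges over [0, n - 3], one unit per spline segment.
    Returns s(b), a linear combination of four control points as in (1).
    """
    n = len(control_points)
    seg = min(int(b), n - 4)   # index of the active spline segment
    t = b - seg                # local parameter in [0, 1]
    # Uniform cubic B-spline basis functions (the nonzero y_i(b));
    # coefficients for all other control points are zero (local control).
    basis = np.array([(1 - t) ** 3,
                      3 * t ** 3 - 6 * t ** 2 + 4,
                      -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                      t ** 3]) / 6.0
    return basis @ control_points[seg:seg + 4]
```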
To generate an ideal example of a digit we put the control points at their home locations. To deform the digit we move the control points away from their home locations. Assuming a Gaussian distribution for these deformations, the probability of the n control points lying within a small hypervolume δV is approximately:

$$P(X)\,\delta V \approx \frac{\delta V}{(2\pi)^{n}\,|C|^{1/2}}\exp\!\left(-\tfrac{1}{2}(X - H)^{T} C^{-1} (X - H)\right)$$

where C is the covariance matrix of the distribution. Thus, a single deformable model defines an entire probability distribution across shape instances.

Following [1] and [34] we define the deformation energy, E_def, to be the negative log probability of the deformation:

$$E_{def}(X) = \tfrac{1}{2}(X - H)^{T} C^{-1} (X - H) + \tfrac{1}{2}\log|C| + \text{const} \qquad (3)$$
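A direct transcription of the X-dependent part of (3) (a sketch; the log-determinant and constant are omitted since they do not vary with X during fitting):

```python
import numpy as np

def deformation_energy(X, H, C_inv):
    """Quadratic term of E_def in (3): half the squared Mahalanobis
    distance of the deformed control points X from their homes H.

    X, H: flattened (2n,) coordinate vectors.
    C_inv: (2n, 2n) inverse covariance of the deformation distribution.
    """
    d = X - H
    return 0.5 * d @ C_inv @ d
```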
Splines are a convenient method for modeling handwritten digits as it is easy to incorporate topological variations. For example, small changes in the relative locations of the control points can turn the loop of a 2 into a cusp or an open bend (Fig. 1). This advantage of spline models is pointed out in [35] where a different kind of spline is used to fit on-line character data by directly locating candidate control points on strokes in the image. It is lost (as pointed out in [36]) when models based more directly on Durbin and Willshaw's elastic net are employed [37].

3. The model of a one needs only three control points while the seven-model uses five.
4. The spline is a one-dimensional continuous curve parameterized by b. In the development, we consider a discrete version.
5. We treat the first and last control points as if they are doubled.
[Fig. 1. Illustrates the flexibility of spline models to capture topological variations. The large loop of the 2 can smoothly decrease in size and eventually become a cusp. The control points are labelled 1 through 8.]

3.3 Generative Models

Although we use our digit models for recognition, it is helpful to consider how we would use them for generating images. The generative model is an elaboration of the probabilistic interpretation of the elastic net given in [1]. To generate a noisy image of a particular digit class, run the following procedure:

1) Pick a deformation of the model (i.e., move the control points away from their home locations) to give a particular realization X. This defines the spline in object-based coordinates. The log probability of picking a deformation is proportional to the quadratic term in (3). It is important that the deformation is measured in object-based coordinates.
2) Pick an affine transformation⁶ from the model's intrinsic reference frame to the image frame (i.e., pick a size, position, orientation, slant and elongation for the digit).
3) Map the spline into image coordinates and place beads uniformly along its length. Each bead is a circular Gaussian ink generator. The number of beads and their variance can easily be changed without changing the spline itself. Typically the variance is chosen so that the bead centers are two standard deviations apart.
4) Repeat many times: Either (with probability π_n) add a randomly positioned noise pixel to the image, or pick a bead at random and generate an inked pixel from the Gaussian distribution defined by the bead.

6. Using our prior knowledge that ones tend to be stroke-like, we used a similarity transformation for the one-model.

This is not a good generative model of the way in which handwritten digits are actually produced. If, for example, the beads have large variances, the inked pixels in the image will have the correct overall shape but will be disconnected and much too scattered. However, the generative model is useful for recognizing digits as explained in the following sections.
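A minimal Python sketch of the sampling loop in step 4; the argument names (bead centers, image extent, pixel count) are illustrative placeholders, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inked_pixels(bead_centers, bead_sigma, noise_mix,
                        n_pixels, height, width):
    """Generate inked pixel locations from the bead/noise mixture.

    bead_centers: (B, 2) Gaussian ink-generator means in the image frame.
    bead_sigma: common standard deviation of the circular beads.
    noise_mix: probability pi_n of emitting a uniform noise pixel.
    """
    pixels = []
    for _ in range(n_pixels):
        if rng.random() < noise_mix:
            # Uniformly positioned noise pixel.
            pixels.append(rng.uniform([0.0, 0.0], [height, width]))
        else:
            # Pick a bead at random, then ink a pixel from its Gaussian.
            b = rng.integers(len(bead_centers))
            pixels.append(rng.normal(bead_centers[b], bead_sigma))
    return np.array(pixels)
```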
3.4 Fitting a Model to an Image

In this section, a Bayesian interpretation of the fitting process is adopted and we demonstrate how using a maximum posterior framework (see, for example [38]) yields a practical algorithm. First we need to refine the notation: a superscript o or ′ is used to indicate whether a quantity is in the object or image frame respectively, so Xᵒ represents the control point locations in the object frame. We can parameterize a model instantiation in the image frame by α = (Xᵒ, Γ).

To classify an image, I, specified as a vector of the locations of its N_I inked pixels {z₁, ..., z_{N_I}}, each of the m models is fitted to the data and the model that best "explains" the image is chosen. Using a uniform prior over all digits, the posterior probability, P(m | I), for each model is proportional to the evidence, P(I | m):
$$P(I \mid m) = \int P(I \mid \alpha, m)\,P(\alpha \mid m)\,d\alpha \qquad (4)$$
Performing the integration over instantiation parameter space is infeasible, so instead we compute the most probable parameter values (α*).⁷ The evidence is approximated by the height of the posterior peak (P(I | α*, m) P(α* | m)), multiplied by the volume of the parameter space under the peak. The negative logarithm of the evidence is then:

$$-\log P(I \mid m) \approx -\log P(I \mid \alpha^*, m) - \log P(\alpha^* \mid m) - K \qquad (5)$$

where K is the logarithm of the volume term. When the posterior is well modelled by a Gaussian, then $K = \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\mathcal{H}|$, with $\mathcal{H} = -\nabla\nabla \log P(\alpha \mid I, m)$ the Hessian evaluated at α*. In the sequel we treat K as a constant, but we allow it to be a different constant for each model (see Section 6).

7. This is reasonable for this problem because there will usually only be one setting of the control points and affine transformation that will provide a good fit.
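The Gaussian value of K follows from the standard Laplace approximation; as a one-line check (using the definitions above, with d the dimensionality of α):

$$\int P(I \mid \alpha, m)\,P(\alpha \mid m)\,d\alpha \;\approx\; P(I \mid \alpha^*, m)\,P(\alpha^* \mid m)\,(2\pi)^{d/2}\,|\mathcal{H}|^{-1/2},$$

so K, the log of the volume factor, is $\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\mathcal{H}|$.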
The second term is just E_def (3). The first term is the log-likelihood of the image given a particular instantiated model. We refer to this as the data fit (E_fit). This leads to a convenient objective function consisting of just two energy terms:

$$E_{tot} = E_{fit} + E_{def} \qquad (6)$$
If each inked pixel z_k in the image is generated independently from a distribution defined by B circularly symmetric Gaussian beads, each with a mean s(b) and variance σ_b², and a uniform noise field, then the data fit term is the sum of log probabilities of each inked pixel:
$$E_{fit} = -\sum_{k=1}^{N_I} \log P_k \qquad (7)$$

where P_k is the probability of inking pixel k:

$$P_k = \frac{\pi_n}{N} + \frac{1-\pi_n}{B}\sum_{b=1}^{B} \frac{1}{2\pi\sigma_b^2}\exp\!\left(-\frac{\lVert z_k - \mathbf{s}(b)\rVert^2}{2\sigma_b^2}\right) \qquad (8)$$

with N the number of pixels which the uniform noise field is distributed over (normally the whole image) and π_n the mixing proportion of the uniform noise field. Using (7) to compute E_fit has the undesirable property that it depends on the number of inked pixels in the image. For example, a simple resizing of the image will change E_fit, whereas E_def, being defined in the object-based frame, is invariant to scale changes. In order to mitigate this, we allow each pixel to have its own weight W_k. Thus, we compute E_fit using:

$$E_{fit} = -\sum_{k=1}^{N_I} W_k \log P_k \qquad (9)$$

Normally we set W_k = A/N_I where A is a constant. This has the desired effect of ensuring that all images have the same total weight of ink and therefore about the same tradeoff between E_fit and E_def regardless of the number of inked pixels. However, it is also possible that a bottom-up processor could assign different weights to pixels based on other knowledge.
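A vectorized Python sketch of (8)-(9), assuming for brevity a common bead variance (the paper allows per-bead variances):

```python
import numpy as np

def data_fit_energy(pixels, beads, sigma, noise_mix,
                    n_field_pixels, weights):
    """E_fit of (9): weighted negative log probability of the inked
    pixels under the bead-mixture-plus-uniform-noise model of (8).

    pixels: (N_I, 2) inked pixel locations z_k.
    beads: (B, 2) bead centers s(b) in the image frame.
    weights: (N_I,) per-pixel weights W_k, e.g. A / N_I.
    """
    B = len(beads)
    # Squared distances ||z_k - s(b)||^2, shape (N_I, B).
    d2 = ((pixels[:, None, :] - beads[None, :, :]) ** 2).sum(-1)
    gauss = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    P_k = noise_mix / n_field_pixels + (1 - noise_mix) * gauss.sum(1) / B
    return -(weights * np.log(P_k)).sum()
```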
We assume that the deformations and affine are independent (for example, the size of a digit or its location in the image is unlikely to be correlated with its style), so the second term in (5) can be factorized into the sum of E_def and a term involving the affine parameters. During the fitting procedure, we treat the affine parameters as if they have a uniform prior. However, in Section 6, we show how solutions with unusual affines may be penalized after the fitting procedure is complete.
The objective in fitting a model to an image is to find the α which minimizes E_tot. We start with zero deformations and an initial guess for the affine parameters which ensures that the control points are mapped within an upright rectangular box around the inked pixels in the image. A small number of beads with equal, large variance are placed along the spline. These large variance beads form a broad, smooth ridge of high ink-probability along the spline. Because of the high variance, the beads are attracted to inked pixels even if they are fairly far away, so the spline is quickly pulled towards the data. During the fitting process the variance of the beads will generally decrease and the number of beads increase as the model gets closer to the data and begins to explain its finer structure. The fitting technique resembles the elastic net algorithm of Durbin and Willshaw [39] except that our elastic energy function is much more complex and we are also fitting an affine transformation.
In early experiments, we used a conjugate gradient method to optimize E_tot. Unfortunately this method is slow because each conjugate gradient step may require a few evaluations of E_tot, each of which is of the order of B × N_I operations. Our preferred method is based upon the Expectation Maximization (EM) algorithm [40]. This involves the repeated application of a two step procedure which will not increase E_tot as α is adjusted at each application.
During the expectation (E) step, the beads are frozen at their current locations and the responsibility that each bead has for each inked pixel is computed. This is just the probability of generating the pixel under the Gaussian distribution for the bead, normalized by the total probability of generating the pixel:

$$r_{kb} = \frac{1-\pi_n}{B}\,\frac{1}{2\pi\sigma_b^2}\exp\!\left(-\frac{\lVert z_k - \mathbf{s}(b)\rVert^2}{2\sigma_b^2}\right) \Big/ P_k \qquad (10)$$

Because the negative log likelihood (energy) under a Gaussian distribution is quadratic in the distance from the mean, it is sometimes convenient to think of minimizing E_tot as analogous to finding the minimum energy configuration of a system of springs. For a fixed bead variance, consider a system in which each fixed pixel is attached to mobile beads by springs whose stiffness is proportional to the responsibility of the bead for the pixel. As we show shortly the EM method finds the minimum energy configuration of the system of springs.
In the second (M) step, the responsibilities are fixed, and new values of α computed to minimize E_tot. In the conventional application of EM, the beads would be unconstrained, and hence a bead would move to the center of gravity of the data (pixels), weighted by the responsibilities that the bead has for each pixel:

$$\mathbf{s}'(b) = \frac{\sum_k r_{kb}\,W_k\,z_k}{\sum_k r_{kb}\,W_k} \qquad (11)$$
However in our system the beads are constrained to lie on the spline defined by the control points; the free variables are really the control point locations and the affine parameters. Directly minimizing E_tot results in a set of non-linear equations. We circumvent the expensive step of solving a set of non-linear equations using a two stage procedure. In the first stage, the affine transformation Γ is held constant.⁸ This means we can do the minimization exclusively in the image frame. To do this we first need to define the deformation energy there. This involves mapping the control point covariance matrix, C, through the affine (see below). Setting ∂E′_tot/∂x′ = 0 and with the help of (6), (1), (3), (10), and (11) we update the control point locations by solving the set of linear equations:

$$\mathbf{B}\,X' = \mathbf{d} \qquad (12)$$

with $R_b = \sum_k W_k r_{kb}$. In (12), we have used the shorthand y_m to denote yₘˣ (yₘʸ) and s′ the x (y) component of s′ for m odd (even). In the spring system analogy, this stage corresponds to finding the minimum energy equilibrium point where the forces pulling the beads towards the nearby pixels are balanced by the forces pulling the beads towards their home locations.⁹
8. This is an example of Expectation/Conditional Maximization [41].
9. More precisely, the pixel forces on the beads can be transferred onto the control points and at equilibrium there is a balance between these forces and those pulling the control points towards their home locations.
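Putting the pieces together, here is a schematic Python outer loop for the fit; `model.initial_guess`, `model.bead_layout`, and `model.m_step` are hypothetical placeholders standing in for the initialization, the coarse-to-fine bead/variance schedule, and the two-stage M step described above, not an API from the paper:

```python
import numpy as np

def fit_model(pixels, model, n_iters=30):
    """Schematic EM fit: alternate the E step of (10) with a
    two-stage M step, annealing the bead layout along the way."""
    alpha = model.initial_guess(pixels)   # zero deformation + box affine
    for it in range(n_iters):
        # Anneal: more beads and smaller variance as the fit improves.
        beads, sigma = model.bead_layout(alpha, it)
        r = responsibilities(pixels, beads, sigma,
                             model.noise_mix, model.n_field_pixels)
        # Solve (12) for the control points, then re-fit the affine.
        alpha = model.m_step(alpha, pixels, r)
    return alpha
```

To classify a digit image, each of the ten models would be fitted this way and scored as in (5)-(6), with the per-model constant K folded in (Section 6).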