Short Papers
An HMM-Based Approach for Off-Line
Unconstrained Handwritten Word Modeling
and Recognition
A. El-Yacoubi, M. Gilloux, R. Sabourin, Member, IEEE, and C.Y. Suen, Fellow, IEEE
Abstract: This paper describes a hidden Markov model-based approach
designed to recognize off-line unconstrained handwritten words for large
vocabularies. After preprocessing, a word image is segmented into letters or
pseudoletters and represented by two feature sequences of equal length, each
consisting of an alternating sequence of shape-symbols and segmentation-
symbols, which are both explicitly modeled. The word model is made up of the
concatenation of appropriate letter models consisting of elementary HMMs and an
HMM-based interpolation technique is used to optimally combine the two feature
sets. Two rejection mechanisms are considered depending on whether or not the
word image is guaranteed to belong to the lexicon. Experiments carried out on
real-life data show that the proposed approach can be successfully used for
handwritten word recognition.
Index Terms: Handwriting modeling, preprocessing, segmentation, feature extraction, hidden Markov models, word recognition, rejection.
1 INTRODUCTION
HANDWRITING is one of the easiest and most natural ways of
communication between humans and computers. However, early
investigations in automatic handwriting recognition were limited
by the memory and power of the computers available at that time
which did not permit the design of real-time systems. Thanks to
the recent progress in electronics and to the latest generation of
computers, these problems have been overcome; therefore, since
the beginning of the 1980s, there has been a dramatic increase in research in this field. According to the way handwriting data are
generated, two classes are distinguished: If the data provided to
the system correspond to the pixels of a static image obtained with
a scanner or a CCD camera after the writing is completed, then we
are in the off-line recognition case. If the data correspond to the
sequence of pixels (defined by their coordinates) drawn by the user
on a digitized tablet and transmitted to the system during the
writing, then we are in the on-line recognition case. Off-line and
on-line systems are also distinguished by the applications they are
devoted to. The former are dedicated to bank check processing,
mail sorting, commercial forms reading, etc., while the latter are mainly dedicated to the pen computing industry and to security domains such as signature verification and author authentication. Off-line
handwriting recognition is a more difficult task, because the
temporal information, such as the number and the order of the
strokes and the pressure, is not available as in the on-line case. On
the other hand, off-line systems can achieve huge economic
benefits even with low recognition rates, while on-line systems
must achieve high recognition rates to be used in a commercial
system. In the remainder of this paper, we shall talk about off-line
handwriting recognition.
Despite the impressive progress achieved in handwriting
recognition, the results are still far from human performance. This
is a reason why researchers have limited their studies to particular
problems and applications. In this context, isolated character
recognition can be seen as a less complicated task where
satisfactory solutions are already available. In word recognition
tasks, the application specifies the lexicon of possible words. For
small lexicons, as in bank check processing, most approaches are
global, where a word is considered as an indivisible entity [1], [2],
[3]. For large lexicons, as in postal applications [4], [5], [6], the
segmentation of words into basic units such as letters is required.
Owing to the difficulty of this operation, most successful
approaches are segmentation-recognition methods in which words
are first loosely segmented into letters or pieces of letters, and a
dynamic programming technique is used in recognition to choose the
definitive segmentation [7], [8]. Although these methods are less
robust when the segmentation process fails to split a pair of letters
(or more), they have many advantages over global ones. Indeed,
for a given learning database, it is more reliable to train a small set
of letters than whole words. Furthermore, unlike analytic
approaches, global approaches are possible only for lexicon-driven
problems and do not satisfy the portability criterion since, for each
new application, the set of the lexicon words must be trained.
During the last decade, hidden Markov models (HMMs), which
can be thought of as a generalization of dynamic programming
techniques [9], have become the predominant approach to
automatic speech recognition [9], [10], [11]. These stochastic
models have been shown to be well-adapted to summarize
variability phenomena involved in time-varying signals. The
success of HMMs in speech recognition has recently led many
researchers to apply them to handwriting recognition by repre-
senting each word image as a sequence of observations. According
to the way this representation is carried out, two approaches can be
distinguished: implicit segmentation [4], [12], which leads to a
speech-like representation of the handwritten word image, and
explicit segmentation [5], [6], which requires a segmentation
algorithm to split words into letters or pseudoletters.
In this paper, we propose an explicit segmentation-based HMM
approach to recognize unconstrained handwritten words (upper-
case, cursive and mixed). This system uses three sets of features: The
first two are related to the shape of the segmented units, while the
features of the third set describe segmentation points between
these units. The first set is based on global features, such as loops,
ascenders, and descenders, and the second set is based on features
obtained by the analysis of the bidimensional contour transition
histogram of each segment. Finally, segmentation features corre-
spond to either spaces, possibly occurring between letters or
words, or the vertical position of segmentation points that split
connected letters. Given that the two sets of shape-features are
separately extracted from the image, we represent each word by
two feature sequences of equal length, each consisting of an
alternating sequence of shape-symbols and segmentation-symbols.
A. El-Yacoubi, R. Sabourin, and C.Y. Suen are with the Centre for Pattern Recognition and Machine Intelligence, Department of Computer Science, Concordia University, 1455 de Maisonneuve Boulevard West, Suite GM-606, Montréal, Canada H3G 1M8. A. El-Yacoubi is also with the Departamento de Informatica, Pontificia Universidade Catolica do Parana, Av. Imaculada Conceicao, 1155-Prado Velho, 80.215-901 Curitiba-PR, Brazil. R. Sabourin is also with École de Technologie Supérieure, Laboratoire d'Imagerie, de Vision et d'Intelligence Artificielle (LIVIA), 1100 Notre-Dame Ouest, Montréal, Canada H3C 1K3. E-mail: yacoubi@ppgia.pucpr.br.
M. Gilloux is with Service de Recherche Technique de La Poste, Département Reconnaissance, Modélisation et Optimisation (RMO), 10, rue de l'île Mâbon, 44063 Nantes Cedex 02, France.
Manuscript received 26 May 1998; revised 10 Mar. 1999.
Recommended for acceptance by J. Hull.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107682.

In the problem we are dealing with, we consider a vocabulary which is large but dynamically limited. For example, in city name
recognition, the contextual knowledge brought by the postal code
identity can be used to reduce the lexicon of possible city names to
a small size. Since the entire vocabulary of words is large, it is more
realistic to model basic units, such as letters, rather than whole
words. Indeed, this modeling needs only a reasonable number of
models to train (and to store). Then, each word (or word sequence)
model can be dynamically built by concatenating letter models.
This modeling is also more appropriate for available learning
databases, which often do not contain all the possible words to be
recognized. Our system also contains a mechanism to reject
unreliable decisions.
This paper is organized as follows. Section 2 describes the
fundamentals of hidden Markov models. Section 3 details the steps
of preprocessing, segmentation, and feature extraction. Section 4
deals with the application of HMMs to handwritten word
recognition in a dynamic vocabulary. Section 5 presents the
experiments performed to validate the approach. Section 6
concerns the rejection mechanism considered by our system.
Finally, Section 7 gives some concluding remarks and perspectives.
2 HIDDEN MARKOV MODELS
Hidden Markov models have been applied in several areas during
the last 15 years, including speech recognition [9], [10], [11],
language modeling [13], handwriting recognition [4], [5], [6], on-
line signature verification [14], etc. A hidden Markov model is a
doubly stochastic process, with an underlying stochastic process
that is not observable (hence the word hidden), but can be
observed through another stochastic process that produces the
sequence of observations [11]. The hidden process consists of a set
of states connected to each other by transitions with probabilities,
while the observed process consists of a set of outputs or
observations, each of which may be emitted by each state
according to some probability density function (pdf). Depending
on the nature of this pdf, several HMM classes can be
distinguished. If the observations are naturally discrete or
quantized using vector quantization [15], and drawn from an
alphabet or a codebook, the HMM is said to be discrete [10], [11]. If
these observations are continuous, we are dealing with a
continuous HMM [11], [16], with a continuous pdf usually
approximated by a mixture of normal distributions. Another
family of HMMs, a compromise between discrete and continuous
HMMs, are semi-continuous HMMs [17] that mutually optimize
the vector quantized codebook and HMM parameters under a
unified probabilistic framework. Although HMMs have some
limitations such as the assumption of conditional independence of
observations given the state sequence, these limitations are behind
the well-defined theoretical foundations of HMMs and the
existence of powerful algorithms for decoding and training.
Particularly, a procedure called the Baum-Welch algorithm [11]
can iteratively and automatically adjust HMM parameters given a
training set of observation sequences. This algorithm, which is an
implementation of the EM (expectation-maximization) algorithm
[18] in the HMM case, guarantees that the model converges to a
local maximum of the probability of observation of the training set
according to the maximum likelihood estimation (MLE) criterion.
The local maximum depends on the initial HMM parameters.
In some applications, it is useful to allow transitions with no
output in order to model for instance a missing event in a given
stochastic process, e.g., the absence of an expected character in a
word due to undersegmentation or misspelling. It has been shown that, in this case, it is more convenient to produce observations by transitions rather than by states [10]. To accommodate these changes, we have to define an additional HMM parameter $a^{0}_{ij}$, which stands for the probability of the null transition between states $i$ and $j$, i.e., the transition that produces no output, $a_{ij}$ being the conventional nonnull transition between these two states. We also define, for discrete HMMs, for instance, $b_{ij}(k)$ as the probability of observing the symbol $k$ given the transition between states $i$ and $j$. In this case, the stochastic constraints, for an $N$-state discrete HMM with an alphabet of size $M$, become:

$$\sum_{j=1}^{N} \left( a_{ij} + a^{0}_{ij} \right) = 1 \quad \text{and} \quad \sum_{k=1}^{M} b_{ij}(k) = 1. \tag{1}$$
Taking this into account, slight changes occur in the classical
Baum-Welch and Viterbi [19] algorithms for which the various
forward and backward recursions still hold.
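To make the modified constraints concrete, here is a small Python sketch (not from the paper; toy sizes and random numbers) that stores the nonnull transition probabilities a[i, j], the null transition probabilities a0[i, j], and the transition-conditioned emission probabilities b[i, j, k], and verifies the two normalizations of Eq. (1):

```python
import numpy as np

N, M = 3, 4  # number of states and alphabet size (arbitrary toy values)
rng = np.random.default_rng(0)

# Nonnull transitions a[i, j] and null transitions a0[i, j]: for each state i,
# the probabilities of all outgoing transitions must sum to one.
raw = rng.random((N, 2 * N))
raw /= raw.sum(axis=1, keepdims=True)
a, a0 = raw[:, :N], raw[:, N:]

# Emission probabilities b[i, j, k]: for each transition (i, j), a distribution
# over the M symbols of the alphabet.
b = rng.random((N, N, M))
b /= b.sum(axis=2, keepdims=True)

# Check the stochastic constraints of Eq. (1).
assert np.allclose((a + a0).sum(axis=1), 1.0)  # sum_j (a_ij + a0_ij) = 1
assert np.allclose(b.sum(axis=2), 1.0)         # sum_k b_ij(k) = 1
print("constraints of Eq. (1) satisfied")
```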
3 REPRESENTATION OF WORD IMAGES
Markovian modeling assumes that a word image is represented by
a sequence of observations. These observations should be
statistically independent once the underlying hidden state se-
quence is known. Therefore, we first preprocess each input image
to get rid of information that is not meaningful to recognition and
that may lead to dependence between observations (character
slant, etc.). Then, segmentation and feature extraction processes are
carried out to transform the image into an ordered sequence of
symbols.
3.1 Preprocessing
The goal of preprocessing is to reduce irrelevant information such
as noise and intraclass variability (e.g., character slant) that causes
high writer-sensitivity in classification, therefore increasing the
task complexity in a writer-independent recognizer. In our system,
the preprocessing stage consists of four steps [20]: baseline slant
normalization, lower case letter area (upper-baseline) normalization
when dealing with cursive words, character skew correction, and,
finally, smoothing (Fig. 1). The first two attempt to ensure a robust
extraction of our first feature set, mainly ascenders and descenders,
while the third step is required since the second feature set shows a
significant sensitivity to character slant.
Fig. 1. Preprocessing steps: (a) original image, (b) and (c) baseline slant normalization, (d) character slant normalization, (e) lower-case letter area normalization,
(f) definitive image after smoothing.

Baseline slant normalization is performed by aligning the
minima of the lower contour after having filtered those corre-
sponding to descenders. Upper-baseline normalization is similar
and consists of aligning the maxima of the upper contour after
having filtered those corresponding to ascenders or upper-case
letters. However, the transformation here is nonlinear since it must
keep the normalized lower-baseline horizontal. The ratio of the
number of filtered maxima over the total number of maxima is
used as an a priori selector of the writing style: cursive or upper-
case (in which case no normalization is done). Character skew is
estimated as the average slant of elementary segments obtained by
sampling the word image contour, without taking into account
horizontal and pseudohorizontal segments. Finally, we apply
smoothing to eliminate the noise appearing at the borders of the
word image due to the normalizations mentioned above.
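As a rough illustration of this skew estimate, the sketch below (a simplification with an assumed sampling step and horizontality threshold, not the authors' implementation) averages the slant of elementary contour segments while discarding horizontal and pseudohorizontal ones:

```python
import math

def estimate_char_skew(contour, step=5, horiz_tol_deg=30.0):
    """Average slant (in degrees from the vertical) of elementary contour segments.

    `contour` is a list of (x, y) pixel coordinates sampled along the word
    contour; `step` and `horiz_tol_deg` are illustrative choices, not the
    paper's actual parameters.
    """
    slants = []
    for i in range(0, len(contour) - step, step):
        (x0, y0), (x1, y1) = contour[i], contour[i + step]
        dx, dy = x1 - x0, y1 - y0
        angle_from_horizontal = abs(math.degrees(math.atan2(dy, dx)))
        # skip horizontal and pseudohorizontal segments
        if angle_from_horizontal < horiz_tol_deg or angle_from_horizontal > 180.0 - horiz_tol_deg:
            continue
        slants.append(math.degrees(math.atan2(dx, dy)))  # signed deviation from the vertical
    return sum(slants) / len(slants) if slants else 0.0
```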
3.2 Segmentation of Words into Characters
In speech recognition, the basic units correspond to phonetic
events, for instance, phonemes. As it is hard to achieve an a priori
explicit segmentation of words into those units, the techniques
employed consist of sampling the speech signal into successive
frames with a sufficiently high frequency. This representation is
suitable because such a frequency allows a slow description of the
speech signal in such a way that the different phonetic events can
more or less be separately detected using minimal supervised
learning techniques [9], [10]. When dealing with handwritten words,
the basic units are naturally the alphabet letters. The employed
segmentation techniques are numerous, but can be categorized
into either implicit or explicit methods. Implicit methods are
inspired by those considered in speech recognition and can either
work at the pixel column level [4], [12] or realize an a priori
scanning of the image with sliding windows [21]. Explicit
methods, by contrast, use some characteristic points, such as
upper (or lower) contour minima, intersection points, or spaces, to
propose possible segmentation points (SPs). Due to the bidimen-
sional character of off-line handwritten word images and to the
overlap between letters, implicit methods are less efficient here
than in speech recognition or on-line handwriting recognition.
Indeed, vertical sampling loses the sequential aspect of the strokes,
which is better represented by explicit methods. Moreover, in
implicit methods, SPs have to be learned also. Nevertheless,
implicit methods complement explicit ones and are particularly
efficient in dealing with discrete touching characters. On the other
hand, because of the ambiguity encountered in handwritten
words, it is impossible to correctly segment a word into characters
without resorting to the recognition phase. Indeed, the same pixel
representation may have several interpretations, according to
context. In Fig. 2, for instance, the group of letters to the right of e could be interpreted, in the absence of context, by many combinations of the letters i, m, n, r, u, v, or w.
We choose an explicit segmentation algorithm that deliberately
proposes a high number of SPs, offering in this way several
segmentation options, the best one to be validated during
recognition. This algorithm is based on the following two
hypotheses: 1) there exist natural SPs corresponding to disconnected letters; 2) the physical SPs between connected letters are located in the neighborhood of the image upper contour minima.
To segment a word, we make use of the upper and lower contours,
loops, and upper contour minima. Then, each minimum satisfying
some empirical rules gives rise to an SP. Mainly, we look in the
neighborhood of this minimum for the upper contour point that
permits a vertical transition from the upper contour to the lower
one without crossing any loop, while minimizing the vertical
transition histogram of the word image. If the crossing of a loop is
unavoidable, no SP is produced. This strategy may produce
correctly segmented, undersegmented, or oversegmented letters,
as shown in Fig. 3, for example.
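The following sketch gives one plausible reading of the segmentation-point search around an upper-contour minimum; the binary image, the loop mask, the search radius, and the cost are stand-ins for details the paper does not spell out:

```python
import numpy as np

def propose_sp(img, loop_mask, x_min, radius=4):
    """Look for a segmentation point near column `x_min`, an upper-contour minimum.

    `img` is a binary word image (1 = ink) and `loop_mask` marks pixels inside
    loops.  Among nearby columns whose vertical cut crosses no loop, the column
    with the smallest vertical transition count is returned; otherwise None.
    `radius` and this cost are illustrative stand-ins for the paper's rules.
    """
    w = img.shape[1]
    best_x, best_cost = None, None
    for x in range(max(0, x_min - radius), min(w, x_min + radius + 1)):
        if loop_mask[:, x].any():
            continue  # cutting here would cross a loop: no SP at this column
        col = img[:, x]
        cost = int(np.count_nonzero(col[1:] != col[:-1]))  # vertical transition histogram value
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```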
3.3 Feature Extraction
The goal of the feature extraction phase is to extract, in an ordered
way (suitable to Markovian modeling), a set of relevant features
that reduce redundancy in the word image while preserving the
discriminative information for recognition. Our main philosophy
in this step is that, unlike isolated character recognition, lexicon-
driven word recognition approaches do not require features to be
very discriminative at the character or pseudo character level
because other information, such as context (particular letter
ordering in lexicon words), word length, etc., are available and
permit high discrimination of words. Thus, we consider features at
the segment level with the aim of clustering letters into classes. In
our system, the sequence of segments obtained by the segmenta-
tion process is transformed into a sequence of symbols by
considering two sets of features.
The first feature set is based on global features, namely loops,
ascenders, and descenders. Ascenders (descenders) are encoded in
two ways according to their relative size compared to the height of
the upper (lower) writing zone. Loops are encoded in various ways
according to their membership in each of the three writing zones
and their relative size compared to the sizes of these zones. The
horizontal order of the median loop and the ascender (or descender) within a segment is also taken into account to ensure a better discrimination between letters such as b and d or p and q.¹
Each combination of these features within a segment is encoded by
a distinct symbol, leading in this way to an alphabet of 27 symbols.
For example, in Fig. 3, the first segment is encoded by the symbol
L, reflecting the existence of a large ascender and a loop located
above the core region. The second segment is encoded by the
symbol o, indicating the presence of a small loop within the body
of the writing. The third segment is represented by the symbol -,
which encodes shapes without any interesting feature, etc.
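The 27-symbol alphabet itself is not enumerated in this excerpt, so the following fragment only illustrates the idea of mapping a segment's combination of global features to a single shape symbol; the feature descriptors and the table entries are hypothetical, apart from the three examples (L, o, -) cited above:

```python
# Hypothetical encoder for the first feature set: the real 27-symbol alphabet
# combines ascender/descender sizes, loop zones and sizes, and their horizontal
# order; only a few illustrative entries are shown, matching the examples above.
SYMBOL_TABLE = {
    ("large_ascender", "loop_above_core"): "L",
    ("none", "small_loop_in_core"): "o",
    ("none", "none"): "-",
}

def encode_segment(ascender_descender, loop_descriptor):
    """Map one segment's combination of global features to a shape symbol."""
    return SYMBOL_TABLE.get((ascender_descender, loop_descriptor), "-")

print(encode_segment("large_ascender", "loop_above_core"))  # -> L
```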
The second feature set is based on the analysis of the
bidimensional contour transition histogram of each segment in
the horizontal and vertical directions. After a filtering phase
consisting of averaging each column (row) histogram value over a
five-pixel-wide window centered on this column (row) and
rounding the result, the histogram values may be equal to 2, 4, or 6.
In each histogram, we focus only on the median part, representing
the stable area of the segment, and we determine the dominant
transition number defined as the value k (2, 4, or 6) for which the
number of columns (rows) with a histogram value equal to k is maximum. Each different pair of dominant transition numbers is then encoded by a different symbol or class. After having created some further subclasses by a finer analysis of the segments, this coding leads to a set of 14 symbols. For instance, in Fig. 4, Letters B, C, and O, for which the pairs of dominant transition numbers are (6, 2), (4, 2), and (4, 4), are encoded by symbols called B, C, and O, respectively, but this, of course, is not always the case.

1. Henceforth, alphanumeric characters will be designated by courier format and feature symbols with italic style.

Fig. 2. Ambiguity in handwritten words, here the French word Chemin.
Fig. 3. Segmentation of words into letters or pseudoletters.
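As an illustration of the dominant-transition computation described above, the sketch below counts contour transitions per column and per row, smooths them over a five-sample window, and picks the most frequent value among 2, 4, and 6 over the central part of the profile; the exact extent of the "median part" is an assumption of this sketch:

```python
import numpy as np

def transition_profile(img, direction):
    """Ink/background transition counts per column ('vertical') or per row ('horizontal')."""
    if direction == "vertical":
        return np.abs(np.diff(img, axis=0)).sum(axis=0).astype(float)
    return np.abs(np.diff(img, axis=1)).sum(axis=1).astype(float)

def dominant_transition_number(img, direction):
    """Most frequent value in {2, 4, 6} over the central part of the smoothed profile.

    The five-sample smoothing window follows the text; taking the central half
    as the 'median (stable) part' is an assumption of this sketch.
    """
    counts = transition_profile(img, direction)
    smoothed = np.round(np.convolve(counts, np.ones(5) / 5.0, mode="same"))
    n = len(smoothed)
    middle = smoothed[n // 4 : n - n // 4] if n >= 8 else smoothed
    return max((2, 4, 6), key=lambda k: int(np.sum(middle == k)))

def shape_symbol(segment):
    """Pair of dominant transition numbers used to select the segment's symbol."""
    return (dominant_transition_number(segment, "horizontal"),
            dominant_transition_number(segment, "vertical"))
```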
We also use five segmentation features that try to reflect the
way segments are linked together. For disconnected segments, two
configurations are distinguished: If the space width is less than a
third of the average segment width (ASW), we consider that there
is no space, and encode this configuration by the symbol n.
Otherwise, we validate the space and encode it in two ways,
depending on whether the space width is smaller (symbol @) or
larger (symbol #) than ASW. If the two segments are connected, the
considered feature is the segmentation point vertical position
which is encoded in two ways (symbols s or u) depending on
whether the segmentation point is close to or far from the writing
baseline. Finally, given an input word image, the output of the
feature extraction process is a pair of symbolic descriptions of
equal length, each consisting of an alternating sequence of segment
shape symbols and associated segmentation point symbols (Fig. 5).
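The five segmentation symbols can be summarized as a small decision rule. The sketch below follows the thresholds given in the text (ASW/3 and ASW), with the connectivity test and the baseline-distance test left to the caller:

```python
def segmentation_symbol(connected, space_width=None, asw=None, sp_near_baseline=None):
    """Encode how two consecutive segments are linked, following the thresholds in the text.

    Disconnected segments: 'n' if the gap is below ASW/3 (treated as no space),
    '@' if it is below ASW, '#' otherwise.  Connected segments: 's' or 'u'
    depending on whether the segmentation point is close to or far from the
    writing baseline (the closeness test itself is left to the caller).
    """
    if connected:
        return "s" if sp_near_baseline else "u"
    if space_width < asw / 3.0:
        return "n"
    return "@" if space_width < asw else "#"
```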
4 MARKOVIAN MODELING OF HANDWRITTEN WORDS
This section presents the application of HMMs in handwritten
word recognition. After briefly describing some related works in
this field, we give the justifications behind the design of the model
we propose and we detail the steps of learning and recognition as
used in our system.
4.1 Use of HMMs in Handwritten Word Recognition
Recently, HMMs have been applied to several areas in hand-
writing recognition, including character recognition [22], on-line
word recognition [23], [24] and off-line word recognition. In the
latter application, Gillies [4] was one of the first to use an implicit
segmentation-based HMM for cursive word recognition. First, a
label is given to each pixel in the image according to its
membership in strokes, holes, and concavities. Then, the image is
transformed into a sequence of symbols which result from the
vector quantization of each pixel column. Each letter is character-
ized using a different discrete HMM, the parameters of which are
estimated on hand-segmented data. The Viterbi algorithm is used
in recognition and allows an implicit segmentation of words into
letters as a by-product of the word matching process. Magdi and
Gader [12] use a similar technique in which the observations are
based on the location of black-white and white-black transitions on
each image column and a 12-state left-to-right HMM is designed
for each character. Cho et al. [21] also use an implicit segmentation
technique in which a cursive word image is first split into a
sequence of overlapping vertical gray-scale bitmap frames, which
are then encoded into discrete symbols using principal component
analysis and vector quantization. A word is modeled as an
interconnection network of character and ligature HMMs. To
improve the recognition strategy, several combinations of Forward
and Backward Viterbi procedures were investigated. Chen et al. [6]
use an explicit segmentation-based continuous density variable
duration HMM where the observations are based on geometrical
and topological features, pixel distributions, etc. Each letter is
identified with a state which can account for up to four segments
per letter. The parameters of the HMM are estimated using the
lexicon and the manually labeled training data. A modified Viterbi
algorithm is applied to provide several outputs, which are
postprocessed using a general string editing method. Finally,
Bunke et al. [25] propose an HMM approach to recognize cursive
words produced by cooperative writers. The features used in their
scheme are based on the edges of the skeleton graph of a word. A
semicontinuous HMM (the Isadora system [26]) is considered for
each character and the number of Gaussians was defined by
manual inspection of the data set. Recognition is performed using
a beam search-driven Viterbi algorithm.
4.2 The Proposed Model
As shown above, several HMM architectures can be considered for
handwritten word recognition. This stems from the fact that
handwriting is certainly not a Markovian process and, even if it
were so, the correct HMM architecture is actually not known. The
usual solution to overcome this problem is to first make structural
assumptions and then use parameter estimation to improve the
probability of generating the training data by the models. In our
case, the assumptions to be made are related to the behavior of the
segmentation process. As our segmentation process may produce
either a correct segmentation of a letter, a letter omission, or an
oversegmentation of a letter into two or three segments, we built
an eight-state HMM having three paths to take into account these
configurations (Fig. 6). In this model, observations are emitted along transitions. Transition $t_{07}$, which emits the null symbol, models the letter omission case. Transition $t_{06}$ emits a symbol encoding a correctly segmented letter shape, while transition $t_{67}$ emits a symbol encoding the nature of the segmentation point associated with this shape. Null transition $t_{36}$ models the case of oversegmentation into only two segments. Transitions $t_{01}$, $t_{23}$, and $t_{56}$ are associated with the shapes of the first, second, and third parts of an oversegmented letter, while $t_{12}$ and $t_{45}$ model the nature of the segmentation points that gave rise to this oversegmentation. Note that the rare occurrence of splitting a letter into three pieces makes the associated parameters likely not to be reliably estimated. The solution to this problem is to share the transitions involved in this phenomenon ($t_{34}$, $t_{36}$, $t_{45}$, $t_{56}$) over all character models, by calling for the tied-states principle. Nevertheless, this procedure is not carried out for letters M, W, m, or w, for which the probability of segmentation into three pieces is high and, therefore, there are enough examples to separately train the third-segment parameters for each of these letters. Finally, a refinement of the character model consisted of considering context-dependent models for upper-case letters depending on their position in the word: first position, whether in an upper-case or cursive word, or any different position in an upper-case word. The motivation behind this is that features extracted from these two categories of letters can be very different since they are based on global features, such as ascenders, which strongly depend on the writing style.

Fig. 4. Transition histograms of segmented shapes.
Fig. 5. Pair of feature sequences representing a word (word sequence) image.
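The transitions named in the text can be collected into a table. This is one plausible reading of Fig. 6 (the role of $t_{34}$ is not detailed in this excerpt and is treated here as a shared null transition), not the authors' exact specification:

```python
# One plausible reading of the eight-state character model of Fig. 6
# (states 0..7; observations are emitted along transitions).
CHARACTER_MODEL_TRANSITIONS = {
    (0, 7): "null transition: letter omission",
    (0, 6): "shape symbol of a correctly segmented letter",
    (6, 7): "segmentation symbol associated with that shape",
    (0, 1): "shape of the first part of an oversegmented letter",
    (1, 2): "segmentation symbol inside the oversegmented letter",
    (2, 3): "shape of the second part",
    (3, 6): "null transition: oversegmentation into only two segments",
    (3, 4): "assumed null transition (its role is not detailed in this excerpt)",
    (4, 5): "segmentation symbol inside the oversegmented letter",
    (5, 6): "shape of the third part",
}
# t34, t36, t45, and t56 are tied across all character models, except for
# M, W, m, and w, which keep their own third-segment parameters.
```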
Our model architecture is somewhat similar to that of other
approaches, such as [2], [27], but with some differences. Here, the
first segment presented to a character model is produced by two
different transitions depending on whether it corresponds to the
entire shape of a correctly segmented character ($t_{06}$) or to the first part of an oversegmented character ($t_{01}$), while in [2], [27], for example, the same transition is shared between these two configurations. Our architecture allows the transitions in the model to be fed by homogeneous data sources, leading to less variability and higher accuracy (e.g., the first part of an oversegmented d and a correctly segmented d, which are very different, would be presented to different kinds of transitions, $t_{01}$ and $t_{06}$, respectively). In other words, the variability coming from the inhomogeneity in the source data, since it is known a priori, is eliminated by separate modeling of the two data sources. In
addition, we have a special model for interword space, in the case
where the input image contains more than one word. This model
simply consists of two states linked by two transitions, modeling a
space (in which case only the symbols corresponding to spaces @
or # can be emitted) or no space between a pair of words (Fig. 7).
4.3 The Learning Phase
Since the exact orthographic transcription (labeling) of each
training word image is available, the word model is made up of
the concatenation of the appropriate letter models, the final state of
an HMM becoming the initial state of the next one, and so on
(Fig. 8).
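A minimal sketch of this concatenation, assuming a hypothetical letter_hmm(ch) factory that returns a transition table for each letter; states are renumbered so that the final state of one elementary HMM coincides with the initial state of the next, as in Fig. 8:

```python
def build_word_model(word, letter_hmm):
    """Concatenate elementary letter HMMs into a word model, as in Fig. 8.

    `letter_hmm(ch)` is a hypothetical factory returning
    {"n_states": int, "transitions": {(i, j): value}} for the letter `ch`.
    States are renumbered so that the final state of one letter model becomes
    the initial state of the next.
    """
    transitions, offset = {}, 0
    for ch in word:
        hmm = letter_hmm(ch)
        for (i, j), value in hmm["transitions"].items():
            transitions[(i + offset, j + offset)] = value
        offset += hmm["n_states"] - 1  # merge final and initial states
    return {"n_states": offset + 1, "transitions": transitions}
```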
Note that, here, we use an embedded Baum-Welch training
algorithm for which the segments produced by the segmentation
algorithm need not be manually labeled. This is an important
consideration for two reasons: First, manually segmenting a
database is a very expensive process and is therefore not desirable;
second, assuming we have a sufficient learning database,
embedded Baum-Welch training allows the recognizer to capture
contextual effects and permits the segmentation of the feature
sequence into letters and the reestimation of the associated
transitions so as to optimize the likelihood of the training database.
Thus, the recognizer decides for itself what the optimal segmenta-
tion might be, rather than being heavily constrained by a priori
knowledge based on human intervention [9]. This is particularly
true if we bear in mind the inherent incorrect assumptions made
about the HMM structure. From an implementation point of view,
given a word composed of L letters, a new parameter correspond-
ing to the index of the currently processed letter is added to the
probabilities involved in the Baum-Welch algorithm. Then, the
results of the final forward (initial backward) probabilities at the
last (initial) state of the elementary HMM associated with a letter
are moved forward (backward) to become the initial forward (final
backward) probabilities at the initial (last) state of the elementary
HMM associated with the following (previous) letter. If $\alpha^{l}_{t}(i)$ ($\beta^{l}_{t}(i)$) denotes the forward (backward) probability associated with the letter of index $l$, then this process is carried out according to the following equations:

$$\alpha^{l+1}_{t}(0) = \alpha^{l}_{t}(N-1), \qquad l = 0, \ldots, L-2, \; t = 0, 1, \ldots, T-1, \tag{2}$$

$$\beta^{l+1}_{t}(0) = \beta^{l}_{t}(N-1), \qquad l = 0, \ldots, L-2, \; t = 0, 1, \ldots, T-1, \tag{3}$$
where $0$ and $N-1$ are the initial and final states of the elementary letter HMMs and $t$ is the time index. In addition to the learning set, we
use a validation set on which the reestimated model is tested after
each training iteration. The training stops when the increase in the likelihood of the training set becomes sufficiently small or, more formally, when the following inequality becomes true:

$$\Delta_{t} = \frac{P_{T}^{(t)} - P_{T}^{(t-1)}}{P_{T}^{(t-1)}} < \varepsilon. \tag{4}$$

Here, $P_{T}^{(t)}$ is the likelihood of the training set at iteration $t$, $\Delta_{t}$ is the normalized increase of $P_{T}$, and $\varepsilon$ is a sufficiently small threshold, typically $10^{-3}$, $10^{-4}$, etc. Once the training phase is over, the stored
optimal model parameters are those corresponding to the iteration
maximizing the likelihood of the validation set (and not the last
iteration). This strategy allows the model to acquire a better
generalization over unknown samples.
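A schematic training loop consistent with the stopping rule (4) and the validation-based model selection described above; reestimate and log_likelihood are placeholders for the embedded Baum-Welch reestimation and the likelihood computation, not actual APIs:

```python
def train(model, train_set, valid_set, reestimate, log_likelihood, eps=1e-4, max_iter=100):
    """Baum-Welch iterations with the stopping rule of Eq. (4) and validation-based selection.

    `reestimate(model, data)` and `log_likelihood(model, data)` are assumed
    helpers standing in for the embedded reestimation and likelihood computations.
    """
    best_model, best_valid = model, log_likelihood(model, valid_set)
    prev = log_likelihood(model, train_set)
    for _ in range(max_iter):
        model = reestimate(model, train_set)
        cur = log_likelihood(model, train_set)
        valid = log_likelihood(model, valid_set)
        if valid > best_valid:                   # keep the best model on the validation set
            best_model, best_valid = model, valid
        if abs(cur - prev) / max(abs(prev), 1e-12) < eps:   # normalized increase, cf. Eq. (4)
            break
        prev = cur
    return best_model
```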
4.4 The Recognition Phase
The recognition process consists of determining the word $\hat{w}$ maximizing the a posteriori probability that a word $w$ has generated an unknown observation sequence $O$,

$$\Pr(\hat{w} \mid O) = \max_{w} \Pr(w \mid O). \tag{5}$$

Applying Bayes' rule, we obtain the fundamental equation of pattern recognition,

$$\Pr(w \mid O) = \frac{\Pr(O \mid w)\,\Pr(w)}{\Pr(O)}. \tag{6}$$
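Since Pr(O) in (6) does not depend on w, a lexicon-driven recognizer can rank the dynamic lexicon by Pr(O|w)Pr(w). The sketch below illustrates this decision rule; word_model and score_sequence stand in for the concatenated letter-HMM construction and the Viterbi/forward scoring, respectively:

```python
import math

def recognize(observations, lexicon, word_prior, word_model, score_sequence):
    """Return the lexicon word maximizing Pr(O|w) Pr(w), cf. Eqs. (5) and (6).

    `word_model(w)` builds the concatenated letter-HMM for word w and
    `score_sequence(model, O)` returns log Pr(O|w); both are placeholders.
    """
    best_word, best_score = None, -math.inf
    for w in lexicon:
        log_post = score_sequence(word_model(w), observations) + math.log(word_prior(w))
        if log_post > best_score:
            best_word, best_score = w, log_post
    return best_word
```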
Fig. 6. The character model.
Fig. 7. The interword space model.
