Short Papers
An HMM-Based Approach for Off-Line
Unconstrained Handwritten Word Modeling
and Recognition
A. El-Yacoubi, M. Gilloux, R. Sabourin, Member, IEEE, and C.Y. Suen, Fellow, IEEE
Abstract: This paper describes a hidden Markov model-based approach
designed to recognize off-line unconstrained handwritten words for large
vocabularies. After preprocessing, a word image is segmented into letters or
pseudoletters and represented by two feature sequences of equal length, each
consisting of an alternating sequence of shape-symbols and segmentation-
symbols, which are both explicitly modeled. The word model is made up of the
concatenation of appropriate letter models consisting of elementary HMMs and an
HMM-based interpolation technique is used to optimally combine the two feature
sets. Two rejection mechanisms are considered depending on whether or not the
word image is guaranteed to belong to the lexicon. Experiments carried out on
real-life data show that the proposed approach can be successfully used for
handwritten word recognition.
Index Terms: Handwriting modeling, preprocessing, segmentation, feature extraction, hidden Markov models, word recognition, rejection.
1 INTRODUCTION
HANDWRITING is one of the easiest and most natural ways of
communication between humans and computers. However, early
investigations in automatic handwriting recognition were limited
by the memory and power of the computers available at that time
which did not permit the design of real-time systems. Thanks to
the recent progress in electronics and to the latest generation of
computers, these problems have been overcome; therefore, since
the beginning of the 1980s, there has been a dramatic increase in research in this field. According to the way handwriting data are
generated, two classes are distinguished: If the data provided to
the system correspond to the pixels of a static image obtained with
a scanner or a CCD camera after the writing is completed, then we
are in the off-line recognition case. If the data correspond to the
sequence of pixels (defined by their coordinates) drawn by the user
on a digitized tablet and transmitted to the system during the
writing, then we are in the on-line recognition case. Off-line and
on-line systems are also distinguished by the applications they are
devoted to. The former are dedicated to bank check processing,
mail sorting, commercial forms reading, etc., while the latter are mainly dedicated to the pen computing industry and to security domains such as signature verification and author authentication. Off-line
handwriting recognition is a more difficult task, because the
temporal information, such as the number and the order of the
strokes and the pressure, is not available as in the on-line case. On
the other hand, off-line systems can achieve huge economic
benefits even with low recognition rates, while on-line systems
must achieve high recognition rates to be used in a commercial
system. In the remainder of this paper, we shall talk about off-line
handwriting recognition.
Despite the impressive progress achieved in handwriting
recognition, the results are still far from human performance. This
is a reason why researchers have limited their studies to particular
problems and applications. In this context, isolated character
recognition can be seen as a less complicated task where
satisfactory solutions are already available. In word recognition
tasks, the application specifies the lexicon of possible words. For
small lexicons, as in bank check processing, most approaches are
global, where a word is considered as an indivisible entity [1], [2],
[3]. For large lexicons, as in postal applications [4], [5], [6], the
segmentation of words into basic units such as letters is required.
Owing to the difficulty of this operation, most successful
approaches are segmentation-recognition methods in which words
are first loosely segmented into letters or pieces of letters, and a
dynamic programming technique is used in recognition to choose the
definitive segmentation [7], [8]. Although these methods are less
robust when the segmentation process fails to split a pair of letters
(or more), they have many advantages over global ones. Indeed,
for a given learning database, it is more reliable to train a small set
of letters than whole words. Furthermore, unlike analytic
approaches, global approaches are possible only for lexicon-driven
problems and do not satisfy the portability criterion since, for each
new application, the set of the lexicon words must be trained.
During the last decade, hidden Markov models (HMMs), which
can be thought of as a generalization of dynamic programming
techniques [9], have become the predominant approach to
automatic speech recognition [9], [10], [11]. These stochastic
models have been shown to be well-adapted to summarize
variability phenomena involved in time-varying signals. The
success of HMMs in speech recognition has recently led many
researchers to apply them to handwriting recognition by repre-
senting each word image as a sequence of observations. According
to the way this representation is carried out, two approaches can be
distinguished: implicit segmentation [4], [12], which leads to a
speech-like representation of the handwritten word image, and
explicit segmentation [5], [6], which requires a segmentation
algorithm to split words into letters or pseudoletters.
In this paper, we propose an explicit segmentation-based HMM
approach to recognize unconstrained handwritten words (upper-
case, cursive and mixed). This system uses three sets of features: The
first two are related to the shape of the segmented units, while the
features of the third set describe segmentation points between
these units. The first set is based on global features, such as loops,
ascenders, and descenders, and the second set is based on features
obtained by the analysis of the bidimensional contour transition
histogram of each segment. Finally, segmentation features corre-
spond to either spaces, possibly occurring between letters or
words, or the vertical position of segmentation points that split
connected letters. Given that the two sets of shape-features are
separately extracted from the image, we represent each word by
two feature sequences of equal length, each consisting of an
alternating sequence of shape-symbols and segmentation-symbols.
A. El-Yacoubi, R. Sabourin, and C.Y. Suen are with the Centre for Pattern Recognition and Machine Intelligence, Department of Computer Science, Concordia University, 1455 de Maisonneuve Boulevard West, Suite GM-606, Montréal, Canada H3G 1M8. A. El-Yacoubi is also with the Departamento de Informatica, Pontificia Universidade Catolica do Parana, Av. Imaculada Conceicao, 1155-Prado Velho, 80.215-901 Curitiba-PR, Brazil. R. Sabourin is also with École de Technologie Supérieure, Laboratoire d'Imagerie, de Vision et d'Intelligence Artificielle (LIVIA), 1100 Notre-Dame Ouest, Montréal, Canada H3C 1K3. E-mail: yacoubi@ppgia.pucpr.br.
M. Gilloux is with Service de Recherche Technique de La Poste, Département Reconnaissance, Modélisation et Optimisation (RMO), 10, rue de l'île Mâbon, 44063 Nantes Cedex 02, France.
Manuscript received 26 May 1998; revised 10 Mar. 1999.
Recommended for acceptance by J. Hull.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107682.

In the problem we are dealing with, we consider a vocabulary which is large but dynamically limited. For example, in city name
recognition, the contextual knowledge brought by the postal code
identity can be used to reduce the lexicon of possible city names to
a small size. Since the entire vocabulary of words is large, it is more
realistic to model basic units, such as letters, rather than whole
words. Indeed, this modeling needs only a reasonable number of
models to train (and to store). Then, each word (or word sequence)
model can be dynamically built by concatenating letter models.
This modeling is also more appropriate for available learning
databases, which often do not contain all the possible words to be
recognized. Our system also contains a mechanism to reject
unreliable decisions.
This paper is organized as follows. Section 2 describes the
fundamentals of hidden Markov models. Section 3 details the steps
of preprocessing, segmentation, and feature extraction. Section 4
deals with the application of HMMs to handwritten word
recognition in a dynamic vocabulary. Section 5 presents the
experiments performed to validate the approach. Section 6
concerns the rejection mechanism considered by our system.
Finally, Section 7 gives some concluding remarks and perspectives.
2 HIDDEN MARKOV MODELS
Hidden Markov models have been applied in several areas during
the last 15 years, including speech recognition [9], [10], [11],
language modeling [13], handwriting recognition [4], [5], [6], on-
line signature verification [14], etc. A hidden Markov model is a
doubly stochastic process, with an underlying stochastic process
that is not observable (hence the word hidden), but can be
observed through another stochastic process that produces the
sequence of observations [11]. The hidden process consists of a set
of states connected to each other by transitions with probabilities,
while the observed process consists of a set of outputs or
observations, each of which may be emitted by each state
according to some probability density function (pdf). Depending
on the nature of this pdf, several HMM classes can be
distinguished. If the observations are naturally discrete or
quantized using vector quantization [15], and drawn from an
alphabet or a codebook, the HMM is said to be discrete [10], [11]. If
these observations are continuous, we are dealing with a
continuous HMM [11], [16], with a continuous pdf usually
approximated by a mixture of normal distributions. Another
family of HMMs, a compromise between discrete and continuous
HMMs, are semi-continuous HMMs [17] that mutually optimize
the vector quantized codebook and HMM parameters under a
unified probabilistic framework. Although HMMs have some
limitations such as the assumption of conditional independence of
observations given the state sequence, these limitations are behind
the well-defined theoretical foundations of HMMs and the
existence of powerful algorithms for decoding and training.
Particularly, a procedure called the Baum-Welch algorithm [11]
can iteratively and automatically adjust HMM parameters given a
training set of observation sequences. This algorithm, which is an
implementation of the EM (expectation-maximization) algorithm
[18] in the HMM case, guarantees that the model converges to a
local maximum of the probability of observation of the training set
according to the maximum likelihood estimation (MLE) criterion.
The local maximum depends on the initial HMM parameters.
In some applications, it is useful to allow transitions with no
output in order to model for instance a missing event in a given
stochastic process, e.g., the absence of an expected character in a
word due to undersegmentation or misspelling. It has been shown that, in this case, it is more convenient to produce observations by transitions rather than by states [10]. To accommodate these changes, we have to define an additional HMM parameter $a^{0}_{ij}$, which stands for the probability of the null transition between states $i$ and $j$, i.e., the transition that produces no output, $a_{ij}$ being the conventional nonnull transition between these two states. We also define, for discrete HMMs, for instance, $b_{ij}(k)$ as the probability of observing the symbol $k$ given the transition between states $i$ and $j$. In this case, the stochastic constraints, for an $N$-state discrete HMM with an alphabet of size $M$, become:

$$\sum_{j=1}^{N} \left( a_{ij} + a^{0}_{ij} \right) = 1 \quad \text{and} \quad \sum_{k=1}^{M} b_{ij}(k) = 1. \tag{1}$$
Taking this into account, slight changes occur in the classical
Baum-Welch and Viterbi [19] algorithms for which the various
forward and backward recursions still hold.
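To make the modified constraints concrete, here is a small Python sketch (not from the paper; toy sizes and random numbers) that stores the nonnull transition probabilities a[i, j], the null transition probabilities a0[i, j], and the transition-conditioned emission probabilities b[i, j, k], and verifies the two normalizations of Eq. (1):

```python
import numpy as np

N, M = 3, 4  # number of states and alphabet size (arbitrary toy values)
rng = np.random.default_rng(0)

# Nonnull transitions a[i, j] and null transitions a0[i, j]: for each state i,
# the probabilities of all outgoing transitions must sum to one.
raw = rng.random((N, 2 * N))
raw /= raw.sum(axis=1, keepdims=True)
a, a0 = raw[:, :N], raw[:, N:]

# Emission probabilities b[i, j, k]: for each transition (i, j), a distribution
# over the M symbols of the alphabet.
b = rng.random((N, N, M))
b /= b.sum(axis=2, keepdims=True)

# Check the stochastic constraints of Eq. (1).
assert np.allclose((a + a0).sum(axis=1), 1.0)  # sum_j (a_ij + a0_ij) = 1
assert np.allclose(b.sum(axis=2), 1.0)         # sum_k b_ij(k) = 1
print("constraints of Eq. (1) satisfied")
```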
3 REPRESENTATION OF WORD IMAGES
Markovian modeling assumes that a word image is represented by
a sequence of observations. These observations should be
statistically independent once the underlying hidden state se-
quence is known. Therefore, we first preprocess each input image
to get rid of information that is not meaningful to recognition and
that may lead to dependence between observations (character
slant, etc.). Then, segmentation and feature extraction processes are
carried out to transform the image into an ordered sequence of
symbols.
3.1 Preprocessing
The goal of preprocessing is to reduce irrelevant information such
as noise and intraclass variability (e.g., character slant) that causes
high writer-sensitivity in classification, therefore increasing the
task complexity in a writer-independent recognizer. In our system,
the preprocessing stage consists of four steps [20]: baseline slant
normalization, lower case letter area (upper-baseline) normalization
when dealing with cursive words, character skew correction, and,
finally, smoothing (Fig. 1). The first two attempt to ensure a robust
extraction of our first feature set, mainly ascenders and descenders,
while the third step is required since the second feature set shows a
significant sensitivity to character slant.
Fig. 1. Preprocessing steps: (a) original image, (b) and (c) baseline slant normalization, (d) character slant normalization, (e) lower-case letter area normalization,
(f) definitive image after smoothing.

Baseline slant normalization is performed by aligning the
minima of the lower contour after having filtered those corre-
sponding to descenders. Upper-baseline normalization is similar
and consists of aligning the maxima of the upper contour after
having filtered those corresponding to ascenders or upper-case
letters. However, the transformation here is nonlinear since it must
keep the normalized lower-baseline horizontal. The ratio of the
number of filtered maxima over the total number of maxima is
used as an a priori selector of the writing style: cursive or upper-
case (in which case no normalization is done). Character skew is
estimated as the average slant of elementary segments obtained by
sampling the word image contour, without taking into account
horizontal and pseudohorizontal segments. Finally, we apply
smoothing to eliminate the noise appearing at the borders of the
word image due to the normalizations mentioned above.
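As a rough illustration of this skew estimate, the sketch below (a simplification with an assumed sampling step and horizontality threshold, not the authors' implementation) averages the slant of elementary contour segments while discarding horizontal and pseudohorizontal ones:

```python
import math

def estimate_char_skew(contour, step=5, horiz_tol_deg=30.0):
    """Average slant (in degrees from the vertical) of elementary contour segments.

    `contour` is a list of (x, y) pixel coordinates sampled along the word
    contour; `step` and `horiz_tol_deg` are illustrative choices, not the
    paper's actual parameters.
    """
    slants = []
    for i in range(0, len(contour) - step, step):
        (x0, y0), (x1, y1) = contour[i], contour[i + step]
        dx, dy = x1 - x0, y1 - y0
        angle_from_horizontal = abs(math.degrees(math.atan2(dy, dx)))
        # skip horizontal and pseudohorizontal segments
        if angle_from_horizontal < horiz_tol_deg or angle_from_horizontal > 180.0 - horiz_tol_deg:
            continue
        slants.append(math.degrees(math.atan2(dx, dy)))  # signed deviation from the vertical
    return sum(slants) / len(slants) if slants else 0.0
```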
3.2 Segmentation of Words into Characters
In speech recognition, the basic units correspond to phonetic
events, for instance, phonemes. As it is hard to achieve an a priori
explicit segmentation of words into those units, the techniques
employed consist of sampling the speech signal into successive
frames with a sufficiently high frequency. This representation is
suitable because such a frequency allows a slow description of the
speech signal in such a way that the different phonetic events can
more or less be separately detected using minimal supervised
learning techniques [9], [10]. When dealing with handwritten words,
the basic units are naturally the alphabet letters. The employed
segmentation techniques are numerous, but can be categorized
into either implicit or explicit methods. Implicit methods are
inspired by those considered in speech recognition and can either
work at the pixel column level [4], [12] or realize an a priori
scanning of the image with sliding windows [21]. Explicit
methods, by contrast, use some characteristic points, such as
upper (or lower) contour minima, intersection points, or spaces, to
propose possible segmentation points (SPs). Due to the bidimen-
sional character of off-line handwritten word images and to the
overlap between letters, implicit methods are less efficient here
than in speech recognition or on-line handwriting recognition.
Indeed, vertical sampling loses the sequential aspect of the strokes,
which is better represented by explicit methods. Moreover, in
implicit methods, SPs have to be learned also. Nevertheless,
implicit methods complement explicit ones and are particularly
efficient in dealing with discrete touching characters. On the other
hand, because of the ambiguity encountered in handwritten
words, it is impossible to correctly segment a word into characters
without resorting to the recognition phase. Indeed, the same pixel
representation may have several interpretations, according to
context. In Fig. 2, for instance, the group of letters to the right of e could be interpreted, in the absence of context, by many combinations of the letters i, m, n, r, u, v, or w.
We choose an explicit segmentation algorithm that deliberately
proposes a high number of SPs, offering in this way several
segmentation options, the best one to be validated during
recognition. This algorithm is based on the following two
hypotheses: 1) there exist natural SPs corresponding to disconnected letters; 2) the physical SPs between connected letters are located in the neighborhood of the image upper contour minima.
To segment a word, we make use of the upper and lower contours,
loops, and upper contour minima. Then, each minimum satisfying
some empirical rules gives rise to an SP. Mainly, we look in the
neighborhood of this minimum for the upper contour point that
permits a vertical transition from the upper contour to the lower
one without crossing any loop, while minimizing the vertical
transition histogram of the word image. If the crossing of a loop is
unavoidable, no SP is produced. This strategy may produce
correctly segmented, undersegmented, or oversegmented letters,
as shown in Fig. 3, for example.
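The following sketch gives one plausible reading of the segmentation-point search around an upper-contour minimum; the binary image, the loop mask, the search radius, and the cost are stand-ins for details the paper does not spell out:

```python
import numpy as np

def propose_sp(img, loop_mask, x_min, radius=4):
    """Look for a segmentation point near column `x_min`, an upper-contour minimum.

    `img` is a binary word image (1 = ink) and `loop_mask` marks pixels inside
    loops.  Among nearby columns whose vertical cut crosses no loop, the column
    with the smallest vertical transition count is returned; otherwise None.
    `radius` and this cost are illustrative stand-ins for the paper's rules.
    """
    w = img.shape[1]
    best_x, best_cost = None, None
    for x in range(max(0, x_min - radius), min(w, x_min + radius + 1)):
        if loop_mask[:, x].any():
            continue  # cutting here would cross a loop: no SP at this column
        col = img[:, x]
        cost = int(np.count_nonzero(col[1:] != col[:-1]))  # vertical transition histogram value
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```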
3.3 Feature Extraction
The goal of the feature extraction phase is to extract, in an ordered
way (suitable to Markovian modeling), a set of relevant features
that reduce redundancy in the word image while preserving the
discriminative information for recognition. Our main philosophy
in this step is that, unlike isolated character recognition, lexicon-
driven word recognition approaches do not require features to be
very discriminative at the character or pseudo character level
because other information, such as context (particular letter
ordering in lexicon words), word length, etc., are available and
permit high discrimination of words. Thus, we consider features at
the segment level with the aim of clustering letters into classes. In
our system, the sequence of segments obtained by the segmenta-
tion process is transformed into a sequence of symbols by
considering two sets of features.
The first feature set is based on global features, namely loops,
ascenders, and descenders. Ascenders (descenders) are encoded in
two ways according to their relative size compared to the height of
the upper (lower) writing zone. Loops are encoded in various ways
according to their membership in each of the three writing zones
and their relative size compared to the sizes of these zones. The
horizontal order of the median loop and the ascender (or descender) within a segment is also taken into account to ensure a better discrimination between letters such as b and d or p and q.¹
Each combination of these features within a segment is encoded by
a distinct symbol, leading in this way to an alphabet of 27 symbols.
For example, in Fig. 3, the first segment is encoded by the symbol
L, reflecting the existence of a large ascender and a loop located
above the core region. The second segment is encoded by the
symbol o, indicating the presence of a small loop within the body
of the writing. The third segment is represented by the symbol -,
which encodes shapes without any interesting feature, etc.
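The 27-symbol alphabet itself is not enumerated in this excerpt, so the following fragment only illustrates the idea of mapping a segment's combination of global features to a single shape symbol; the feature descriptors and the table entries are hypothetical, apart from the three examples (L, o, -) cited above:

```python
# Hypothetical encoder for the first feature set: the real 27-symbol alphabet
# combines ascender/descender sizes, loop zones and sizes, and their horizontal
# order; only a few illustrative entries are shown, matching the examples above.
SYMBOL_TABLE = {
    ("large_ascender", "loop_above_core"): "L",
    ("none", "small_loop_in_core"): "o",
    ("none", "none"): "-",
}

def encode_segment(ascender_descender, loop_descriptor):
    """Map one segment's combination of global features to a shape symbol."""
    return SYMBOL_TABLE.get((ascender_descender, loop_descriptor), "-")

print(encode_segment("large_ascender", "loop_above_core"))  # -> L
```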
The second feature set is based on the analysis of the
bidimensional contour transition histogram of each segment in
the horizontal and vertical directions. After a filtering phase
consisting of averaging each column (row) histogram value over a
five-pixel-wide window centered on this column (row) and
rounding the result, the histogram values may be equal to 2, 4, or 6.
In each histogram, we focus only on the median part, representing
the stable area of the segment, and we determine the dominant
transition number defined as the value k (2, 4, or 6) for which the
number of columns (rows) with a histogram value equal to k is maximum. Each different pair of dominant transition numbers is then encoded by a different symbol or class. After having created some further subclasses by a finer analysis of the segments, this coding leads to a set of 14 symbols. For instance, in Fig. 4, Letters B, C, and O, for which the pairs of dominant transition numbers are (6, 2), (4, 2), and (4, 4), are encoded by symbols called B, C, and O, respectively, but this, of course, is not always the case.

1. Henceforth, alphanumeric characters will be designated by courier format and feature symbols with italic style.

Fig. 2. Ambiguity in handwritten words, here the French word Chemin.
Fig. 3. Segmentation of words into letters or pseudoletters.
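As an illustration of the dominant-transition computation described above, the sketch below counts contour transitions per column and per row, smooths them over a five-sample window, and picks the most frequent value among 2, 4, and 6 over the central part of the profile; the exact extent of the "median part" is an assumption of this sketch:

```python
import numpy as np

def transition_profile(img, direction):
    """Ink/background transition counts per column ('vertical') or per row ('horizontal')."""
    if direction == "vertical":
        return np.abs(np.diff(img, axis=0)).sum(axis=0).astype(float)
    return np.abs(np.diff(img, axis=1)).sum(axis=1).astype(float)

def dominant_transition_number(img, direction):
    """Most frequent value in {2, 4, 6} over the central part of the smoothed profile.

    The five-sample smoothing window follows the text; taking the central half
    as the 'median (stable) part' is an assumption of this sketch.
    """
    counts = transition_profile(img, direction)
    smoothed = np.round(np.convolve(counts, np.ones(5) / 5.0, mode="same"))
    n = len(smoothed)
    middle = smoothed[n // 4 : n - n // 4] if n >= 8 else smoothed
    return max((2, 4, 6), key=lambda k: int(np.sum(middle == k)))

def shape_symbol(segment):
    """Pair of dominant transition numbers used to select the segment's symbol."""
    return (dominant_transition_number(segment, "horizontal"),
            dominant_transition_number(segment, "vertical"))
```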
We also use five segmentation features that try to reflect the
way segments are linked together. For disconnected segments, two
configurations are distinguished: If the space width is less than a
third of the average segment width (ASW), we consider that there
is no space, and encode this configuration by the symbol n.
Otherwise, we validate the space and encode it in two ways,
depending on whether the space width is smaller (symbol @) or
larger (symbol #) than ASW. If the two segments are connected, the
considered feature is the segmentation point vertical position
which is encoded in two ways (symbols s or u) depending on
whether the segmentation point is close to or far from the writing
baseline. Finally, given an input word image, the output of the
feature extraction process is a pair of symbolic descriptions of
equal length, each consisting of an alternating sequence of segment
shape symbols and associated segmentation point symbols (Fig. 5).
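The five segmentation symbols can be summarized as a small decision rule. The sketch below follows the thresholds given in the text (ASW/3 and ASW), with the connectivity test and the baseline-distance test left to the caller:

```python
def segmentation_symbol(connected, space_width=None, asw=None, sp_near_baseline=None):
    """Encode how two consecutive segments are linked, following the thresholds in the text.

    Disconnected segments: 'n' if the gap is below ASW/3 (treated as no space),
    '@' if it is below ASW, '#' otherwise.  Connected segments: 's' or 'u'
    depending on whether the segmentation point is close to or far from the
    writing baseline (the closeness test itself is left to the caller).
    """
    if connected:
        return "s" if sp_near_baseline else "u"
    if space_width < asw / 3.0:
        return "n"
    return "@" if space_width < asw else "#"
```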
4 MARKOVIAN MODELING OF HANDWRITTEN WORDS
This section presents the application of HMMs in handwritten
word recognition. After briefly describing some related works in
this field, we give the justifications behind the design of the model
we propose and we detail the steps of learning and recognition as
used in our system.
4.1 Use of HMMs in Handwritten Word Recognition
Recently, HMMs have been applied to several areas in hand-
writing recognition, including character recognition [22], on-line
word recognition [23], [24] and off-line word recognition. In the
latter application, Gillies [4] was one of the first to use an implicit
segmentation-based HMM for cursive word recognition. First, a
label is given to each pixel in the image according to its
membership in strokes, holes, and concavities. Then, the image is
transformed into a sequence of symbols which result from the
vector quantization of each pixel column. Each letter is character-
ized using a different discrete HMM, the parameters of which are
estimated on hand-segmented data. The Viterbi algorithm is used
in recognition and allows an implicit segmentation of words into
letters as a by-product of the word matching process. Magdi and
Gader [12] use a similar technique in which the observations are
based on the location of black-white and white-black transitions on
each image column and a 12-state left-to-right HMM is designed
for each character. Cho et al. [21] also use an implicit segmentation
technique in which a cursive word image is first split into a
sequence of overlapping vertical gray-scale bitmap frames, which
are then encoded into discrete symbols using principal component
analysis and vector quantization. A word is modeled as an
interconnection network of character and ligature HMMs. To
improve the recognition strategy, several combinations of Forward
and Backward Viterbi procedures were investigated. Chen et al. [6]
use an explicit segmentation-based continuous density variable
duration HMM where the observations are based on geometrical
and topological features, pixel distributions, etc. Each letter is
identified with a state which can account for up to four segments
per letter. The parameters of the HMM are estimated using the
lexicon and the manually labeled training data. A modified Viterbi
algorithm is applied to provide several outputs, which are
postprocessed using a general string editing method. Finally,
Bunke et al. [25] propose an HMM approach to recognize cursive
words produced by cooperative writers. The features used in their
scheme are based on the edges of the skeleton graph of a word. A
semicontinuous HMM (the Isadora system [26]) is considered for
each character and the number of Gaussians was defined by
manual inspection of the data set. Recognition is performed using
a beam search-driven Viterbi algorithm.
4.2 The Proposed Model
As shown above, several HMM architectures can be considered for
handwritten word recognition. This stems from the fact that
handwriting is certainly not a Markovian process and, even if it
were so, the correct HMM architecture is actually not known. The
usual solution to overcome this problem is to first make structural
assumptions and then use parameter estimation to improve the
probability of generating the training data by the models. In our
case, the assumptions to be made are related to the behavior of the
segmentation process. As our segmentation process may produce
either a correct segmentation of a letter, a letter omission, or an
oversegmentation of a letter into two or three segments, we built
an eight-state HMM having three paths to take into account these
configurations (Fig. 6). In this model, observations are emitted along transitions. Transition $t_{07}$, which emits the null symbol, models the letter omission case. Transition $t_{06}$ emits a symbol encoding a correctly segmented letter shape, while transition $t_{67}$ emits a symbol encoding the nature of the segmentation point associated with this shape. Null transition $t_{36}$ models the case of oversegmentation into only two segments. Transitions $t_{01}$, $t_{23}$, and $t_{56}$ are associated with the shapes of the first, second, and third parts of an oversegmented letter, while $t_{12}$ and $t_{45}$ model the nature of the segmentation points that gave rise to this oversegmentation. Note that the rare occurrence of splitting a letter into three pieces makes the associated parameters likely not to be reliably estimated. The solution to this problem is to share the transitions involved in this phenomenon ($t_{34}$, $t_{36}$, $t_{45}$, $t_{56}$) over all character models, by calling for the tied-states principle. Nevertheless, this procedure is not carried out for letters M, W, m, or w, for which the probability of segmentation into three pieces is high and, therefore, there are enough examples to separately train the third-segment parameters for each of these letters. Finally, a refinement of the character model consisted of considering context-dependent models for upper-case letters depending on their position in the word: first position, whether in an upper-case or cursive word, or any different position in an upper-case word. The motivation behind this is that features extracted from these two categories of letters can be very different since they are based on global features, such as ascenders, which strongly depend on the writing style.

Fig. 4. Transition histograms of segmented shapes.
Fig. 5. Pair of feature sequences representing a word (word sequence) image.
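The transitions named in the text can be collected into a table. This is one plausible reading of Fig. 6 (the role of $t_{34}$ is not detailed in this excerpt and is treated here as a shared null transition), not the authors' exact specification:

```python
# One plausible reading of the eight-state character model of Fig. 6
# (states 0..7; observations are emitted along transitions).
CHARACTER_MODEL_TRANSITIONS = {
    (0, 7): "null transition: letter omission",
    (0, 6): "shape symbol of a correctly segmented letter",
    (6, 7): "segmentation symbol associated with that shape",
    (0, 1): "shape of the first part of an oversegmented letter",
    (1, 2): "segmentation symbol inside the oversegmented letter",
    (2, 3): "shape of the second part",
    (3, 6): "null transition: oversegmentation into only two segments",
    (3, 4): "assumed null transition (its role is not detailed in this excerpt)",
    (4, 5): "segmentation symbol inside the oversegmented letter",
    (5, 6): "shape of the third part",
}
# t34, t36, t45, and t56 are tied across all character models, except for
# M, W, m, and w, which keep their own third-segment parameters.
```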
Our model architecture is somewhat similar to that of other
approaches, such as [2], [27], but with some differences. Here, the
first segment presented to a character model is produced by two
different transitions depending on whether it corresponds to the
entire shape of a correctly segmented character ($t_{06}$) or to the first part of an oversegmented character ($t_{01}$), while in [2], [27], for example, the same transition is shared between these two configurations. Our architecture allows the transitions in the model to be fed by homogeneous data sources, leading to less variability and higher accuracy (e.g., the first part of an oversegmented d and a correctly segmented d, which are very different, would be presented to different kinds of transitions, $t_{01}$ and $t_{06}$, respectively). In other words, the variability coming from the inhomogeneity in the source data, since it is known a priori, is eliminated by separate modeling of the two data sources. In
addition, we have a special model for interword space, in the case
where the input image contains more than one word. This model
simply consists of two states linked by two transitions, modeling a
space (in which case only the symbols corresponding to spaces @
or # can be emitted) or no space between a pair of words (Fig. 7).
4.3 The Learning Phase
Since the exact orthographic transcription (labeling) of each
training word image is available, the word model is made up of
the concatenation of the appropriate letter models, the final state of
an HMM becoming the initial state of the next one, and so on
(Fig. 8).
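A minimal sketch of this concatenation, assuming a hypothetical letter_hmm(ch) factory that returns a transition table for each letter; states are renumbered so that the final state of one elementary HMM coincides with the initial state of the next, as in Fig. 8:

```python
def build_word_model(word, letter_hmm):
    """Concatenate elementary letter HMMs into a word model, as in Fig. 8.

    `letter_hmm(ch)` is a hypothetical factory returning
    {"n_states": int, "transitions": {(i, j): value}} for the letter `ch`.
    States are renumbered so that the final state of one letter model becomes
    the initial state of the next.
    """
    transitions, offset = {}, 0
    for ch in word:
        hmm = letter_hmm(ch)
        for (i, j), value in hmm["transitions"].items():
            transitions[(i + offset, j + offset)] = value
        offset += hmm["n_states"] - 1  # merge final and initial states
    return {"n_states": offset + 1, "transitions": transitions}
```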
Note that, here, we use an embedded Baum-Welch training
algorithm for which the segments produced by the segmentation
algorithm need not be manually labeled. This is an important
consideration for two reasons: First, manually segmenting a
database is a very expensive process and is therefore not desirable;
second, assuming we have a sufficient learning database,
embedded Baum-Welch training allows the recognizer to capture
contextual effects and permits the segmentation of the feature
sequence into letters and the reestimation of the associated
transitions so as to optimize the likelihood of the training database.
Thus, the recognizer decides for itself what the optimal segmenta-
tion might be, rather than being heavily constrained by a priori
knowledge based on human intervention [9]. This is particularly
true if we bear in mind the inherent incorrect assumptions made
about the HMM structure. From an implementation point of view,
given a word composed of L letters, a new parameter correspond-
ing to the index of the currently processed letter is added to the
probabilities involved in the Baum-Welch algorithm. Then, the
results of the final forward (initial backward) probabilities at the
last (initial) state of the elementary HMM associated with a letter
are moved forward (backward) to become the initial forward (final
backward) probabilities at the initial (last) state of the elementary
HMM associated with the following (previous) letter. If $\alpha^{l}_{t}(i)$ ($\beta^{l}_{t}(i)$) denotes the forward (backward) probability associated with the letter of index $l$, then this process is carried out according to the following equations:

$$\alpha^{l+1}_{t}(0) = \alpha^{l}_{t}(N-1), \qquad l = 0, \ldots, L-2, \; t = 0, 1, \ldots, T-1, \tag{2}$$

$$\beta^{l+1}_{t}(0) = \beta^{l}_{t}(N-1), \qquad l = 0, \ldots, L-2, \; t = 0, 1, \ldots, T-1, \tag{3}$$
where $0$ and $N-1$ are the initial and final states of the elementary letter HMMs and $t$ is the time index. In addition to the learning set, we
use a validation set on which the reestimated model is tested after
each training iteration. The training stops when the increase in the likelihood of the training set becomes sufficiently small or, more formally, when the following inequality becomes true:

$$\Delta_{t} = \frac{P_{T}^{(t)} - P_{T}^{(t-1)}}{P_{T}^{(t-1)}} < \varepsilon. \tag{4}$$

Here, $P_{T}^{(t)}$ is the likelihood of the training set at iteration $t$, $\Delta_{t}$ is the normalized increase of $P_{T}$, and $\varepsilon$ is a sufficiently small threshold, typically $10^{-3}$, $10^{-4}$, etc. Once the training phase is over, the stored
optimal model parameters are those corresponding to the iteration
maximizing the likelihood of the validation set (and not the last
iteration). This strategy allows the model to acquire a better
generalization over unknown samples.
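A schematic training loop consistent with the stopping rule (4) and the validation-based model selection described above; reestimate and log_likelihood are placeholders for the embedded Baum-Welch reestimation and the likelihood computation, not actual APIs:

```python
def train(model, train_set, valid_set, reestimate, log_likelihood, eps=1e-4, max_iter=100):
    """Baum-Welch iterations with the stopping rule of Eq. (4) and validation-based selection.

    `reestimate(model, data)` and `log_likelihood(model, data)` are assumed
    helpers standing in for the embedded reestimation and likelihood computations.
    """
    best_model, best_valid = model, log_likelihood(model, valid_set)
    prev = log_likelihood(model, train_set)
    for _ in range(max_iter):
        model = reestimate(model, train_set)
        cur = log_likelihood(model, train_set)
        valid = log_likelihood(model, valid_set)
        if valid > best_valid:                   # keep the best model on the validation set
            best_model, best_valid = model, valid
        if abs(cur - prev) / max(abs(prev), 1e-12) < eps:   # normalized increase, cf. Eq. (4)
            break
        prev = cur
    return best_model
```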
4.4 The Recognition Phase
The recognition process consists of determining the word $\hat{w}$ maximizing the a posteriori probability that a word $w$ has generated an unknown observation sequence $O$,

$$\Pr(\hat{w} \mid O) = \max_{w} \Pr(w \mid O). \tag{5}$$

Applying Bayes' rule, we obtain the fundamental equation of pattern recognition,

$$\Pr(w \mid O) = \frac{\Pr(O \mid w)\,\Pr(w)}{\Pr(O)}. \tag{6}$$
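Since Pr(O) in (6) does not depend on w, a lexicon-driven recognizer can rank the dynamic lexicon by Pr(O|w)Pr(w). The sketch below illustrates this decision rule; word_model and score_sequence stand in for the concatenated letter-HMM construction and the Viterbi/forward scoring, respectively:

```python
import math

def recognize(observations, lexicon, word_prior, word_model, score_sequence):
    """Return the lexicon word maximizing Pr(O|w) Pr(w), cf. Eqs. (5) and (6).

    `word_model(w)` builds the concatenated letter-HMM for word w and
    `score_sequence(model, O)` returns log Pr(O|w); both are placeholders.
    """
    best_word, best_score = None, -math.inf
    for w in lexicon:
        log_post = score_sequence(word_model(w), observations) + math.log(word_prior(w))
        if log_post > best_score:
            best_word, best_score = w, log_post
    return best_word
```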
Fig. 6. The character model.
Fig. 7. The interword space model.
