
IJDAR (1999) 2: 90–110
International Journal on Document Analysis and Recognition
© Springer-Verlag 1999

Offline cursive script word recognition – a survey

Tal Steinherz¹, Ehud Rivlin², Nathan Intrator¹

¹ School of Mathematical Sciences, Sackler Faculty of Exact Sciences, Tel-Aviv University, Ramat Aviv 69978, Israel; e-mail: {talstz,nin}@math.tau.ac.il
² Department of Computer Science, Technion, Technion City 32000, Israel; e-mail: ehudr@cs.technion.ac.il

Received September 21, 1998 / Revised September 2, 1999

Correspondence to: E. Rivlin
Abstract. We review the field of offline cursive word recognition. We mainly deal with the various methods that were proposed to realize the core of recognition in a word recognition system. These methods are discussed in view of the two most important properties of such a system: the size and nature of the lexicon involved, and whether or not a segmentation stage is present. We classify the field into three categories: segmentation-free methods, which compare a sequence of observations derived from a word image with similar references of words in the lexicon; segmentation-based methods, that look for the best match between consecutive sequences of primitive segments and letters of a possible word; and the perception-oriented approach, that relates to methods that perform a human-like reading technique, in which anchor features found all over the word are used to bootstrap a few candidates for a final evaluation phase.

Key words: Offline cursive handwritten word recognition – Segmentation – Survey
1 Introduction
The field of offline cursive word recognition has made
great progress during the past ten years. Many methods
have been developed in an attempt to satisfy the need
for such systems that exists in various applications like
automatic reading of postal addresses and bank checks,
processing documents such as forms, etc.
Most of these methods, while presenting a large spec-
trum of perspectives on the problem, share a common
structure, having the same modules. In the flow chart
given in Fig. 1, three common alternative structures of
word recognition systems are presented. The typical mo-
dules are preprocessing, then a possible segmentation
or fragmentation phase, feature extraction, the core of
recognition, and post-processing. Preprocessing usually
includes normalization, noise reduction, reference line
finding, and either contour or skeleton tracing if nec-
essary. Next, there is the segmentation phase and its
substitutes. In a segmentation process, in contrast with
simple fragmentation or splitting into pieces, there is an
attempt to split the word image into segments that relate
to characters. Some methods prefer to avoid segmenta-
tion altogether for reasons that will be discussed later
on. In the latter case a new problem may arise due to the fact
that most recognition modules, which come next, require
a one-dimensional signal of features, and cannot handle
features taken directly from the word image. Segmenta-
tion can be bypassed when the features used are global,
i.e., they are located in the word resolution and there-
fore they can be organized in the order they appear in
the word from left to right. However, when local features
are preferred, one needs to divide the word image into
sequential fragments, before the feature extraction stage
takes place. In this case the fragmentation process substitutes
for the full segmentation. Next, a feature extraction
process takes place. When high resolution features are
used, the extraction process is more sensitive to noise.
It is common to use code books in this stage when the
feature space is discrete. The recognition process follows
next. This process is heavily influenced by the nature of
the segmentation process, as will be discussed later on.
The recognition process is followed by post-processing.
This process relates to lexicon lookup, string correction,
and re-evaluation of a word probability with respect to
syntax and context issues.
Most of this survey focuses on the algorithms that
were proposed in order to realize the recognition phase.
The other modules that usually constitute a word recog-
nition system are briefly discussed in Sect. 2.
One can classify the field of offline cursive word recog-
nition into three categories according to the size and
nature of the lexicon involved: large; limited, but dy-
namic; small and specific. Small lexicons do not include
more than 100 words, while limited lexicons may go up
to 1000. Large lexicons refer to any lexicon size beyond
that. When a dynamic lexicon (in contrast with specific
or constant) is used, it means that the words that will
be relevant during a recognition task are not available

Fig. 1. Three alternative structures of a word recognition system. Each alternative differs in the way it handles the segmentation phase
during training because they belong to an unknown sub-
set of a much larger lexicon. This classification coincides
with the different modeling techniques associated with
the different recognition methods. When small or limited
lexicons are involved a model-discriminant approach is
often used. Using this approach, each word is represented
by a unique model. Given an observation sequence $O_1^T$,
one goes over all word models and finds the word $W_i$
associated with the model that has the maximum a posteriori
probability $\Pr(W_i \mid O_1^T)$. Using the Bayes rule,

$$\max_i \Pr(W_i \mid O_1^T) = \max_i \frac{\Pr(O_1^T \mid W_i)\,\Pr(W_i)}{\Pr(O_1^T)}$$

$\Pr(W_i)$ is usually assumed to have a uniform distribution, because statistics on the frequency of appearance of each lexicon word are unavailable (unless specified otherwise). Since $\Pr(O_1^T)$ is also independent of $W_i$, the a posteriori probability reduces to the score given by the word model, $\Pr(O_1^T \mid W_i)$.
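
As a concrete illustration of this decision rule, consider the following minimal sketch of a model-discriminant recognizer; the `log_likelihood` method is a hypothetical stand-in for whatever scoring the chosen word models provide:

```python
import math

def recognize(observations, word_models):
    """Model-discriminant recognition: score every word model against the
    observation sequence and return the most likely word.

    With a uniform prior Pr(W_i) and Pr(O_1^T) independent of the word,
    the MAP decision reduces to maximizing the likelihood Pr(O_1^T | W_i).
    `word_models` maps each lexicon word to a model exposing a
    (hypothetical) log_likelihood method.
    """
    best_word, best_score = None, -math.inf
    for word, model in word_models.items():
        score = model.log_likelihood(observations)  # log Pr(O_1^T | W_i)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

The exhaustive scan over all word models is also what makes this scheme impractical for large lexicons, as discussed next.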
Practically, a system that is based on a model-dis-
criminant method cannot handle large lexicons. In this
case, usually a single model acts as a hypothesis gener-
ator that reacts to the input observations and produces
a ranked list of candidate words. Note that in this hy-
pothesis (path)-discriminant method, the generator pro-
duces spurious words (outside the lexicon), besides the
legal ones with their associated matching score. During
the process the lexicon is used to verify the existence
of complete words or prefixes of hypotheses. However,
models in both large or limited lexicon environments
have a lot in common. Since the lexicon available during
training is large in both cases, reliable training is not
possible when only a few samples of each word are available. There-
fore, models used in these scenarios are built from letter
models and hence might be called letter-oriented. This is
based on the assumption that a specific letter looks the
same when it appears in different locations and neigh-
borhoods (regarding the other letters that surround it).
Empirically, this assumption is very solid if one selects
valuable features that are invariant in this manner, in
spite of the noise that characterizes cursive script due to
the ligatures between letters.
Clearly, the larger the lexicon is, the more flexible the
application that utilizes it can be, but the recognition be-
comes more difficult and the results get less satisfactory.
Furthermore, all methods that were proposed for larger
lexicons are applicable to smaller ones as well but are
less suitable, meaning they will probably do worse.
Besides the lexicon size and nature, a major issue
that the recognition method should relate to is the seg-
mentation problem. This issue is one of the most impor-
tant decisions one needs to make when starting to de-
sign a word recognition system. Ideally a perfect segmen-
tation algorithm would split a cursive word image into
complete characters. Then, using character recognition
techniques, the word would be recognized with high con-
fidence. Unfortunately such a segmentation algorithm is
not available. Furthermore it may never be available due
to ambiguity in cursive words that was best expressed
by Sayre's paradox [71]: "To recognize a letter, one must
know where it starts and where it ends, to isolate a letter,
one must recognize it first". This conflict was reflected
in the field of cursive word recognition methods result-
ing in a situation of having very few methods that rely
on pure segmentation followed by a separate recognition
phase. Three examples of methods of this kind ([88], [60]
and [66]) will be discussed in the relevant section.
Based on this last observation two approaches were
developed. One implements a segmentation-free
approach, i.e., there is no attempt to split the word im-
age into segments that relate to characters. Still, it is
possible that the image would be split into pieces in or-
der to produce a sequence of observations, i.e., symbols.
Instead of a letter-by-letter recognition, one tries to rec-
ognize the whole word as one entity, by searching for a
word with the most similar complete description to the
one obtained from the whole input image.
The other approach uses a segmentation-based meth-
od. Given a sequence of primitive segments derived from
a word image, and a group of possible words, we seek
for a top-level segmentation that maximizes the average
matching score between respective characters and seg-
ments. Top-level segmentation means that each segment
is a union of one or more consecutive primitive segments,
under the condition that each primitive segment appears
exactly once. In other words, a top-level segmenta-
tion is a subset of the primitive segmentation points.
The term primitive with respect to segments stands for
elementary segments that were created by the segmen-
tation algorithm that was used. We find this approach
preserves the semantic meaning of symbols with respect
to characters, while the segmentation-free approach can
prevent errors caused by unsuccessful pre-segmentation.
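
To make this search concrete, the following sketch finds the best top-level segmentation for one candidate word by dynamic programming; `char_score` is a hypothetical character recognizer returning a match score for a letter against a union of consecutive primitive segments, and the optimal segmentation itself could be recovered by back-tracing through the table:

```python
def match_word(primitives, word, char_score):
    """Best average matching score between a candidate word and a
    sequence of primitive segments, where each letter consumes one or
    more consecutive primitives and every primitive is used exactly once.

    best[i][j] = best total score of matching the first i primitives
    against the first j letters of `word`.
    """
    n, m = len(primitives), len(word)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):  # letter j-1 consumes primitives k..i-1
                if best[k][j - 1] > NEG:
                    cand = best[k][j - 1] + char_score(word[j - 1], primitives[k:i])
                    if cand > best[i][j]:
                        best[i][j] = cand
    # maximizing the total score also maximizes the average over m letters
    return best[n][m] / m if best[n][m] > NEG else NEG
```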
In both approaches many methods use dynamic-pro-
gramming tools in order to find the best interpretation.
Optimization problems often use dynamic-programming
techniques when the optimal solution is a combination of
optimal solutions to partial problems. In a word recog-
nition case the optimal interpretation of a word image
may be constructed by concatenating the optimal in-
terpretations to disjoint parts of the word image. One
should refer to [3] and [18] for further explanations re-
garding dynamic programming. A very important appli-
cation that is also commonly used by methods of both
approaches is a hidden Markov Model (HMM). Some find
the HMMs to be a particular case of the more general
dynamic-programming field.
When small or limited lexicons are used (in segmen-
tation-free or segmentation-based approaches), each
word is associated with a separate HMM that is a con-
catenation of the relevant letter sub-HMMs as illustrated
in Fig. 2.

Fig. 2. A single word HMM, built by concatenating the relevant letter sub-HMMs
When large lexicons are used, a single HMM in which
different paths (state sequences) represent different word
interpretations is used. Such HMMs can be described as
a graph with a clique topology, where each node repre-
sents a letter sub-HMM and the arcs represent transi-
tions between them (Fig. 3). Section A briefly reviews
HMMs and some observations about their role in
word recognition.

Fig. 3. A single HMM that can handle a large lexicon. Each one of the 26 letters is associated with a sub-HMM that is connected to all the others. The resulting model gains a clique topology in which different paths represent different word interpretations
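
For the small-lexicon case of Fig. 2, the following generic sketch shows how a concatenated word HMM could be scored with the forward algorithm; it assumes discrete observation symbols and numpy arrays, and leaves implicit the construction of the transition matrix by chaining letter sub-HMMs in block form:

```python
import numpy as np

def forward_log_score(log_pi, log_A, log_B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space.

    log_pi: (S,) log initial state distribution;
    log_A:  (S, S) log transition matrix (for a word HMM, a block chain
            of the relevant letter sub-HMMs, left to right);
    log_B:  (S, V) log emission matrix over V observation symbols;
    obs:    sequence of symbol indices.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_t(j) = logsum_i [alpha_{t-1}(i) + log_A[i, j]] + log_B[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))
```

Plugging such a score in as the word-model likelihood of the model-discriminant loop sketched earlier yields a basic HMM-based recognizer.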
In this paper we have decided to use the type of the
segmentation scheme as the major criterion for classify-
ing the recognition method used. As many recognition
methods can be easily adapted to handle both limited
and large lexicons, the size and nature of the lexicon
is not used as a major classification criterion. However,
the relationship between each approach and the various
lexicon size and nature is discussed.
We take perception-oriented methods as forming a
different/separate category. These methods differ from
both the segmentation-free and segmentation-based ori-
ented methods. The common principle to the methods
that belong to this class is first to identify some char-
acters and then continue with trial-and-error techniques
on the remaining gaps, using the provided
lexicon for help.
dividual characters that are present in the word image,
and do not try to recognize the word as a whole. The
implementation of the recognition phase is not segment-
oriented and it does not follow the regular scheme of a
left to right search.
For practical reasons there is a common assumption
that an input word is not necessarily cursive. It is likely
that there will be a mixture of cursive and hand-printed
writing, of lower case and upper case, etc. Therefore it
is common to find methods that propose extra model-
ing for more than one style. This could be achieved by
doubling each model and training each copy to handle a
different writing style, or by combining redundant sub-
models parallel to the existing ones in parts of the word
that may be written in another style. Since this treat-
ment can be applied to any method, we do not specifically
mention the methods that employ it.
We believe that surveys are important tools that help
to synchronize different research efforts around the world
and make use of the knowledge and previous experience
that was acquired by others. The reader might be inter-
ested in other relevant surveys involving online cursive
recognition [82], OCRs [42,84], and segmentation tech-
niques [52,8].
2 Stages in a word recognition system
In this section a global structure of a word recognition
system is described. The following structure is a union of
all common operations that usually appear in systems of
this kind. A typical complete word recognition process
consists of the following parts: preprocessing, a possi-
ble segmentation or fragmentation, feature extraction,
recognition, and post-processing. The core of a recog-
nition system is the algorithm that produces word in-
terpretation given a sequence of observations in either
one of the various word representation levels that will
be mentioned. This is the main issue of this survey and
it will be discussed widely in the next sections.
The preprocessing starting point depends on the en-
vironment in which the system is running. It may in-
clude external word segmentation (extraction) from a
multi-word neighborhood and other various document
processing techniques. Given a stand-alone word, a few
normalization operations are performed, among which
are:
- Skew correction: a rotation transformation that brings the word orientation parallel to the horizontal;
- Slant correction: a shear transformation that attempts to make all the vertical strokes erect (see the sketch after this list);
- Smoothing: including all different kinds of noise reduction;
- Scaling: invariance to size (used in rare cases only).
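
As an illustration of one of these normalizations, here is a minimal slant-correction sketch: a horizontal shear applied row by row to a binary word image. The slant angle is assumed to have been estimated beforehand (e.g., from the average orientation of near-vertical strokes); this is not the survey's own code:

```python
import numpy as np

def deslant(image, slant_angle_rad):
    """Remove slant from a binary word image (2-D numpy array) with a
    horizontal shear: each row is shifted so that strokes leaning by
    `slant_angle_rad` from the vertical become erect.
    """
    h, w = image.shape
    shear = np.tan(slant_angle_rad)
    # rows closer to the top (smaller y) are shifted further
    offsets = [int(round(shear * (h - 1 - y))) for y in range(h)]
    base = min(offsets)
    out = np.zeros((h, w + max(offsets) - base), dtype=image.dtype)
    for y in range(h):
        o = offsets[y] - base
        out[y, o:o + w] = image[y]
    return out
```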
Another important procedure in a major part of the sys-
tems is reference line finding. This procedure is essential
for the feature extraction stage that comes next. Op-
tional contour or skeleton tracing are also parts of the
preprocessing phase.
In Fig. 4 one can observe some of the various prepro-
cessing algorithms that are commonly used. First, the
original image is presented on the left, then the image
after going through skew correction, slant correction, ref-
erence line finding, and eventually a thinning algorithm
that produces the word skeleton. The algorithms were
executed in a row, i.e., each algorithm took as input the
image produced by its predecessor.
Pre-segmentation is the process of word segmentation
into primitive segments. Many algorithms have been pro-
posed for this task and we recommend two surveys that
review this field [52] and [8]. In the case of a segmenta-
tion-free method, this stage may be ignored or replaced
with splitting the image into sequential fragments that
do not attempt to match complete characters.
The role of the feature extraction stage is to retrieve
observations out of the word image. There are several
classes of features. Segmentation-free methods use either
raw features, which are pixel-wise like strokes, or global
symbolic features such as ascenders, descenders, loops,
etc. Segmentation-based methods look for local features
in the segment that is being evaluated. The most pop-
ular features in this case are the global features (ascen-
ders, descenders, loops) and local irregularities such as
X and T crossings, end points, sharp curvatures, etc.
Other, less symbolic alternatives are different kinds of
pixel moments, distribution in the different parts of the
segment, etc. A feature is assumed to be more reliable
if it is less sensitive to noise and to the variance in writ-
ing style, and if it has good ability to distinguish among
words or letters as required.
After the recognition module has finished running, some
methods use post-processing techniques to improve the
recognition results. Some of these operations such as lex-
icon lookup and string correction are mentioned in this
paper because they are built in during the recognition
task. However additional syntax and context are very
popular to bust or disqualify legal words according to
the circumstances. A special case appears when large
lexicons are involved. In this case a hypothesis genera-
tor outputs possible words that might not be present in
the lexicon. Therefore, one may find it very difficult to
search for an optimal solution under the constraint of
being legal, i.e., to appear in the given large lexicon. A
unique post-processing was developed for this purpose,
based on the minimum edit-distance that will be men-
tioned later on in other aspects of recognition. Given a
hypothesis that is the optimal interpretation of the input
word image, we find the lexicon word that is the most
similar to the hypothesis. Similarity between words is
measured as the number of operations (insertion, deletion,
substitution) necessary to transform one word into
the other. From a different point of view this process is
considered as correction of garbled words [62]. Note that
regular post-processing of this kind requires going over
all the words in the lexicon and performing non-trivial
calculation for each one of them. This limits the practical
size of the lexicon, which is assumed to be large.
Therefore, a proper usage of this process is in combina-
tion with some kind of lexicon reduction.
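
The minimum edit-distance itself is a classic dynamic-programming computation; a compact sketch of this post-processing step, assuming unit costs for all three operations:

```python
def edit_distance(hypothesis, word):
    """Minimum number of insertions, deletions, and substitutions needed
    to transform `hypothesis` into `word` (Levenshtein distance)."""
    m, n = len(hypothesis), len(word)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all remaining characters
    for j in range(n + 1):
        d[0][j] = j  # insert all remaining characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hypothesis[i - 1] == word[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # substitution
    return d[m][n]

def nearest_lexicon_word(hypothesis, lexicon):
    """Scan the (possibly reduced) lexicon for the closest legal word."""
    return min(lexicon, key=lambda word: edit_distance(hypothesis, word))
```

The full lexicon scan in `nearest_lexicon_word` is exactly the non-trivial cost noted above, hence the suggested combination with lexicon reduction.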
The interested reader may refer to [80] for more de-
tails on a complete handwritten text recognition system.
3 Segmentation-free methods
In a segmentation-free method, one should find the best
interpretation possible for an observation sequence de-
rived from the word image without performing a mean-
ingful segmentation first. An observation sequence can
be classified into three categories according to the rep-
resentation level of the word it stands for. The first cat-
egory relates to observations that are based on low-level
features taken directly from the word image. Such fea-
tures include smoothed traces (quantized/normalized
fragments) of the word contour, pieces of strokes between
anchor points, edges of a polygonal approximation of the
image skeleton, etc. The second category aggregates such
low-level features to serve as primitives. For example,
neighboring strokes can be merged into a smoothed pat-
tern, that will constitute a primitive. The main differ-
ence between the current category and the former one
is in the nature of the relevant feature space continu-
ous in contrast with discrete. The last category involves
methods that use even higher-level features of a word
image. The most popular features are the most irregu-
lar, i.e., holistic features that are hard to miss and are
invariant with respect to all the different writing styles.
In this case holistic means global with respect to a whole
word resolution, meaning that features of this kind, such
as ascenders, descenders, loops, i dots, t strokes, etc., are
prominent even in an image of a complete word. These
features may be sub-classified according to size, location
or orientation. These features will be referred to as sym-
bols.
In this section we discuss different algorithms that
were proposed for comparison between a pair of obser-
vation sequences. Most methods use either a minimum
edit-distance calculation based on dynamic-program-
ming, or a resemblance estimation provided by HMMs.

However, there are some specialized methods that in-
clude different comparison procedures. Nevertheless, in
most cases, such systems can utilize the minimum edit-
distance as well. The top-level organization of this sec-
tion matches the various comparison approaches (spe-
cialized, minimum edit-distance, HMMs). Each subsec-
tion is further sorted according to the feature level of the
word image representation that was used. The discussion
in this section will be in view of the size and nature of
the lexicon that the application may use. Table 1 shows
the location of each method that will be mentioned later
with respect to the comparison technique and represen-
tation level.
3.1 Specialized methods
In what follows we will discuss methods in which an ob-
servation sequence is mapped to a space with a distance
metric on complete sequences. All these methods are of
model-discriminant nature. Since finding the most likely
word is similar to that in other model-discriminant categories,
i.e., comparison between the input sequence and all
stored references, we focus on the unique details of each
of the methods.
1. Low level
Govindaraju et al., [36], represented a word image
as a sequence of strokes in the temporal dimension.
Temporal information, i.e., a complete linear order
among the strokes, is extracted by traversing the
strokes between consecutive anchor points such as
peaks, valleys, intersection and end points. When an
intersection point is encountered, the traversal pro-
ceeds along the smoothest path, determined heuris-
tically based on orientation, trend and a gradient
smoothness criterion. Finally, a feature vector associ-
ated with the word image is obtained. No details were
specified regarding the matching algorithm. However,
the dynamic-programming or edit-distance techniques seem
to be a good choice in this scenario.
Gorsky [35] used a holographic representation,
meaning that each sequence of consecutive stroke
quanta that share approximately the same direction
(which will be referred to as fragments) is mapped
into a single point in a special parameter space. The
following features constitute the coordinates of a
point: global x-order (number of fragments along the
x-axis), local order of a fragment among the neigh-
boring fragments, and direction. These features are
insensitive to most of the distortions produced by
different writing styles and may also overcome miss-
ing or additional stroke components. The intensity
of a point depends on the length of the fragment it
originated from and bonuses for distinguishing prop-
erties. Both indices and values were discrete and nor-
malized, therefore the resulting holograph is a three-
dimensional matrix. A certain word prototype, i.e.,
model, is created by mapping a set of words written
by various people to a single holograph, i.e., matrix.
Comparison between an input word and a prototype
both represented in holographic form could be carried
out either by cross-correlation or by computing the
percentage of word fragments that are "explained"
by the prototype.
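
A sketch of the second comparison option, under the simplifying assumption that both holographs are stored as equally sized 3-D intensity matrices (the quantization details of [35] are omitted):

```python
import numpy as np

def fraction_explained(input_holo, prototype_holo, threshold=0.0):
    """Fraction of the input word's fragment intensity that falls on
    cells the prototype also marks, i.e., fragments "explained" by the
    prototype (a simplified reading of the comparison described above).
    """
    explained = input_holo[prototype_holo > threshold].sum()
    total = input_holo.sum()
    return float(explained / total) if total > 0 else 0.0
```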
2. Medium level
In [63] a Markov model with no hidden states was
used to model a word. The set of legal observations
derived from a word image is used to define the states
of the model. Experiments with two sets of obser-
vations (8 strokes or 42 graphemes) were made.
Choosing the order of the Markov model was based
on statistical criteria, and was eventually found to
be 2. Therefore, each word model was trained and as-
signed transition probabilities considering all the pos-
sible triple combinations of observations. The a pri-
ori probability of a word was also taken into account.
Hence, the probability of a word Markov model $M_i$
given the observed sequence $Q$ is evaluated as follows:

$$P(M_i \mid Q) = \frac{P(Q \mid M_i)\,P(M_i)}{P(Q)} = \frac{P(M_i)\,P(X_1 \mid M_i)\,P(X_2 \mid X_1, M_i)\,P(X_3 \mid X_1, X_2, M_i) \cdots P(X_n \mid X_{n-2}, X_{n-1}, M_i)}{P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_{n-2}, X_{n-1})}$$
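
In practice this amounts to accumulating trigram log-probabilities; the following sketch uses hypothetical probability tables and, for brevity, omits the lower-order terms at the start of the sequence:

```python
from math import log

EPS = 1e-9  # floor for unseen trigrams (hypothetical smoothing)

def word_model_log_score(obs, word_trigrams, background_trigrams, log_prior):
    """Score log P(M_i | Q) for a second-order Markov word model, as in
    the ratio above: word-conditioned trigram probabilities over the
    unconditioned (background) chain, plus the word prior log P(M_i).
    """
    score = log_prior
    for t in range(2, len(obs)):
        tri = (obs[t - 2], obs[t - 1], obs[t])
        score += log(word_trigrams.get(tri, EPS))       # P(X_t | X_{t-2}, X_{t-1}, M_i)
        score -= log(background_trigrams.get(tri, EPS)) # P(X_t | X_{t-2}, X_{t-1})
    return score
```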
3. High level
Guillevic et al. [38] extracted seven types of
features from an input word image. Each feature is
associated with its relative position in the image (in
percents). In the training process, each lexicon word
goes through the same process but the feature loca-
tions are relative to the characters they belong to.
Thus, in the recognition process, for every lexicon
word, we translate positions of features in the in-
put image, that were given in percents, to charac-
ter locations of the word being compared and rate
the matching. The distance between an input feature
vector and a lexicon word feature vector is computed
as the minimum shift in feature locations needed in
order to match between the two.
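
A much-simplified sketch of this position-matching idea: each word is described by an ordered list of (feature type, position) pairs, and only identical type sequences are allowed to match; the paper's translation of percent positions into character locations is abstracted away:

```python
def feature_shift_distance(input_feats, word_feats):
    """Distance between two ordered feature lists as the total shift in
    feature positions needed to align them; sequences whose feature
    types disagree are treated as unmatchable.
    """
    if [t for t, _ in input_feats] != [t for t, _ in word_feats]:
        return float("inf")
    return sum(abs(p - q) for (_, p), (_, q) in zip(input_feats, word_feats))
```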
A mixture of segmentation-free principles and a seg-
mentation algorithm takes place in the methods pre-
sented by Madhvanath et al. [53–56]. The produced
segments are used only to define the location of a
feature inside a word image. In the first case ([53]), a
concatenation of all features found results in a feature
vector used for comparison with references using
Euclidean distance. A few supporting mechanisms
were supplemented; feature equivalence rules enabled
one to define interchangeable sets of features. Macro-
features, i.e., a combination of features, lead to the
recognition of a specific phrase with very high confidence.
In addition a new feature, "point of return",
is included. However, this feature seems less robust
and more author-dependent. On the other hand if a
small lexicon of the order of ten words was engaged,
the ascender and descender features suffice to distin-
guish lexicon entries. In a more sophisticated method,
all features are sorted according to their type and
sub-sorted by their position from left to right [54–

References

Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition
Bellman, R.: Dynamic Programming
Rabiner, L., Juang, B.-H.: Fundamentals of speech recognition