
IJDAR (1999) 2: 90–110
International Journal on Document Analysis and Recognition
© Springer-Verlag 1999

Offline cursive script word recognition – a survey

Tal Steinherz¹, Ehud Rivlin², Nathan Intrator¹

¹ School of Mathematical Sciences, Sackler Faculty of Exact Sciences, Tel-Aviv University, Ramat Aviv 69978, Israel; e-mail: {talstz,nin}@math.tau.ac.il
² Department of Computer Science, Technion, Technion City 32000, Israel; e-mail: ehudr@cs.technion.ac.il

Received September 21, 1998 / Revised September 2, 1999

Correspondence to: E. Rivlin
Abstract. We review the field of offline cursive word recognition. We mainly deal with the various methods that were proposed to realize the core of recognition in a word recognition system. These methods are discussed in view of the two most important properties of such a system: the size and nature of the lexicon involved, and whether or not a segmentation stage is present. We classify the field into three categories: segmentation-free methods, which compare a sequence of observations derived from a word image with similar references of words in the lexicon; segmentation-based methods, that look for the best match between consecutive sequences of primitive segments and letters of a possible word; and the perception-oriented approach, that relates to methods that perform a human-like reading technique, in which anchor features found all over the word are used to bootstrap a few candidates for a final evaluation phase.

Key words: Offline cursive handwritten word recognition – Segmentation – Survey
1 Introduction
The field of offline cursive word recognition has made
great progress during the past ten years. Many methods
have been developed in an attempt to satisfy the need
for such systems that exists in various applications like
automatic reading of postal addresses and bank checks,
processing documents such as forms, etc.
Most of these methods, while presenting a large spec-
trum of perspectives on the problem, share a common
structure, having the same modules. In the flow chart
given in Fig. 1, three common alternative structures of
word recognition systems are presented. The typical mo-
dules are preprocessing, then a possible segmentation
or fragmentation phase, feature extraction, the core of
recognition, and post-processing. Preprocessing usually
includes normalization, noise reduction, reference line
finding, and either contour or skeleton tracing if nec-
essary. Next, there is the segmentation phase and its
substitutes. In a segmentation process, in contrast with
simple fragmentation or splitting into pieces, there is an
attempt to split the word image into segments that relate
to characters. Some methods prefer to avoid segmenta-
tion altogether for reasons that will be discussed later
on. In the latter case a new problem may arise due to the fact
that most recognition modules, which come next, require
a one-dimensional signal of features, and cannot handle
features taken directly from the word image. Segmenta-
tion can be bypassed when the features used are global,
i.e., they are located in the word resolution and there-
fore they can be organized in the order they appear in
the word from left to right. However, when local features
are preferred, one needs to divide the word image into
sequential fragments, before the feature extraction stage
takes place. In this case the fragmentation process substitutes
for the full segmentation. Next, a feature extraction
process takes place. When high resolution features are
used, the extraction process is more sensitive to noise.
It is common to use code books in this stage when the
feature space is discrete. The recognition process follows
next. This process is heavily influenced by the nature of
the segmentation process, as will be discussed later on.
The recognition process is followed by post-processing.
This process relates to lexicon lookup, string correction,
and re-evaluation of a word probability with respect to
syntax and context issues.
Most of this survey focuses on the algorithms that
were proposed in order to realize the recognition phase.
The other modules that usually constitute a word recog-
nition system are briefly discussed in Sect. 2.
One can classify the field of offline cursive word recog-
nition into three categories according to the size and
nature of the lexicon involved: large; limited, but dy-
namic; small and specific. Small lexicons do not include
more than 100 words, while limited lexicons may go up
to 1000. Large lexicons refer to any lexicon size beyond
that. When a dynamic lexicon (in contrast with specific
or constant) is used, it means that the words that will
be relevant during a recognition task are not available

Fig. 1. Three alternative structures of a word recognition system. Each alternative differs in the way it handles the segmentation phase
during training because they belong to an unknown sub-
set of a much larger lexicon. This classification coincides
with the different modeling techniques associated with
the different recognition methods. When small or limited
lexicons are involved a model-discriminant approach is
often used. Using this approach, each word is represented
by a unique model. Given an observation sequence $O_1^T$,
one goes over all word models and finds the word $W_i$
associated with the model that has the maximum a posteriori
probability $\Pr(W_i \mid O_1^T)$. Using the Bayes rule,

$$\max_i \Pr(W_i \mid O_1^T) = \max_i \frac{\Pr(O_1^T \mid W_i)\,\Pr(W_i)}{\Pr(O_1^T)}$$

$\Pr(W_i)$ is usually assumed to have a uniform distribution, because statistics on the frequency of appearance of each lexicon word are unavailable (unless specified otherwise). Since $\Pr(O_1^T)$ is also independent of $W_i$, the a posteriori probability reduces to the score given by the word model, $\Pr(O_1^T \mid W_i)$.
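
As a concrete illustration of this decision rule, consider the following minimal sketch of a model-discriminant recognizer; the `log_likelihood` method is a hypothetical stand-in for whatever scoring the chosen word models provide:

```python
import math

def recognize(observations, word_models):
    """Model-discriminant recognition: score every word model against the
    observation sequence and return the most likely word.

    With a uniform prior Pr(W_i) and Pr(O_1^T) independent of the word,
    the MAP decision reduces to maximizing the likelihood Pr(O_1^T | W_i).
    `word_models` maps each lexicon word to a model exposing a
    (hypothetical) log_likelihood method.
    """
    best_word, best_score = None, -math.inf
    for word, model in word_models.items():
        score = model.log_likelihood(observations)  # log Pr(O_1^T | W_i)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

The exhaustive scan over all word models is also what makes this scheme impractical for large lexicons, as discussed next.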
Practically, a system that is based on a model-dis-
criminant method cannot handle large lexicons. In this
case, usually a single model acts as a hypothesis gener-
ator that reacts to the input observations and produces
a ranked list of candidate words. Note that in this hy-
pothesis (path)-discriminant method, the generator pro-
duces spurious words (outside the lexicon), besides the
legal ones with their associated matching score. During
the process the lexicon is used to verify the existence
of complete words or prefixes of hypotheses. However,
models in both large or limited lexicon environments
have a lot in common. Since the lexicon available during
training is large in both cases, reliable training is not
possible when only a few samples of each word are available. There-
fore, models used in these scenarios are built from letter
models and hence might be called letter-oriented. This is
based on the assumption that a specific letter looks the
same when it appears in different locations and neigh-
borhoods (regarding the other letters that surround it).
Empirically, this assumption is very solid if one selects
valuable features that are invariant in this manner, in
spite of the noise that characterizes cursive script due to
the ligatures between letters.
Clearly, the larger the lexicon is, the more flexible the
application that utilizes it can be, but the recognition be-
comes more difficult and the results get less satisfactory.
Furthermore, all methods that were proposed for larger
lexicons are applicable to smaller ones as well but are
less suitable, meaning they will probably do worse.
Besides the lexicon size and nature, a major issue
that the recognition method should relate to is the seg-
mentation problem. This issue is one of the most impor-
tant decisions one needs to make when starting to de-
sign a word recognition system. Ideally a perfect segmen-
tation algorithm would split a cursive word image into
complete characters. Then, using character recognition
techniques, the word would be recognized with high con-
fidence. Unfortunately such a segmentation algorithm is
not available. Furthermore it may never be available due
to ambiguity in cursive words that was best expressed
by Sayre's paradox [71]: "To recognize a letter, one must
know where it starts and where it ends, to isolate a letter,
one must recognize it first". This conflict was reflected
in the field of cursive word recognition methods result-
ing in a situation of having very few methods that rely
on pure segmentation followed by a separate recognition
phase. Three examples of methods of this kind ([88], [60]
and [66]) will be discussed in the relevant section.
Based on this last observation two approaches were
developed. One implements a segmentation-free
approach, i.e., there is no attempt to split the word im-
age into segments that relate to characters. Still, it is
possible that the image would be split into pieces in or-
der to produce a sequence of observations, i.e., symbols.
Instead of a letter-by-letter recognition, one tries to rec-
ognize the whole word as one entity, by searching for a
word with the most similar complete description to the
one obtained from the whole input image.
The other approach uses a segmentation-based meth-
od. Given a sequence of primitive segments derived from
a word image, and a group of possible words, we seek
for a top-level segmentation that maximizes the average
matching score between respective characters and seg-
ments. Top-level segmentation means that each segment
is a union of one or more consecutive primitive segments,
under the condition that each primitive segment appears
exactly once. In other words, a top-level segmenta-
tion is a subset of the primitive segmentation points.
The term primitive with respect to segments stands for
elementary segments that were created by the segmen-
tation algorithm that was used. We find this approach
preserves the semantic meaning of symbols with respect
to characters, while the segmentation-free approach can
prevent errors caused by unsuccessful pre-segmentation.
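
To make this search concrete, the following sketch finds the best top-level segmentation for one candidate word by dynamic programming; `char_score` is a hypothetical character recognizer returning a match score for a letter against a union of consecutive primitive segments, and the optimal segmentation itself could be recovered by back-tracing through the table:

```python
def match_word(primitives, word, char_score):
    """Best average matching score between a candidate word and a
    sequence of primitive segments, where each letter consumes one or
    more consecutive primitives and every primitive is used exactly once.

    best[i][j] = best total score of matching the first i primitives
    against the first j letters of `word`.
    """
    n, m = len(primitives), len(word)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):  # letter j-1 consumes primitives k..i-1
                if best[k][j - 1] > NEG:
                    cand = best[k][j - 1] + char_score(word[j - 1], primitives[k:i])
                    if cand > best[i][j]:
                        best[i][j] = cand
    # maximizing the total score also maximizes the average over m letters
    return best[n][m] / m if best[n][m] > NEG else NEG
```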
In both approaches many methods use dynamic-pro-
gramming tools in order to find the best interpretation.
Optimization problems often use dynamic-programming
techniques when the optimal solution is a combination of
optimal solutions to partial problems. In a word recog-
nition case the optimal interpretation of a word image
may be constructed by concatenating the optimal in-
terpretations to disjoint parts of the word image. One
should refer to [3] and [18] for further explanations re-
garding dynamic programming. A very important appli-
cation that is also commonly used by methods of both
approaches is a hidden Markov Model (HMM). Some find
the HMMs to be a particular case of the more general
dynamic-programming field.
When small or limited lexicons are used (in segmen-
tation-free or segmentation-based approaches), each
word is associated with a separate HMM that is a con-
catenation of the relevant letter sub-HMMs as illustrated
in Fig. 2.

Fig. 2. A single word HMM, built by concatenating the relevant letter sub-HMMs
When large lexicons are used, a single HMM in which
different paths (state sequences) represent different word
interpretations is used. Such HMMs can be described as
a graph with a clique topology, where each node repre-
sents a letter sub-HMM and the arcs represent transi-
tions between them (Fig. 3). Section A briefly reviews
HMMs and some observations about their role in
word recognition.

Fig. 3. A single HMM that can handle a large lexicon. Each one of the 26 letters is associated with a sub-HMM that is connected to all the others. The resulting model gains a clique topology in which different paths represent different word interpretations
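
For the small-lexicon case of Fig. 2, the following generic sketch shows how a concatenated word HMM could be scored with the forward algorithm; it assumes discrete observation symbols and numpy arrays, and leaves implicit the construction of the transition matrix by chaining letter sub-HMMs in block form:

```python
import numpy as np

def forward_log_score(log_pi, log_A, log_B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space.

    log_pi: (S,) log initial state distribution;
    log_A:  (S, S) log transition matrix (for a word HMM, a block chain
            of the relevant letter sub-HMMs, left to right);
    log_B:  (S, V) log emission matrix over V observation symbols;
    obs:    sequence of symbol indices.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_t(j) = logsum_i [alpha_{t-1}(i) + log_A[i, j]] + log_B[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))
```

Plugging such a score in as the word-model likelihood of the model-discriminant loop sketched earlier yields a basic HMM-based recognizer.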
In this paper we have decided to use the type of the
segmentation scheme as the major criterion for classify-
ing the recognition method used. As many recognition
methods can be easily adapted to handle both limited
and large lexicons, the size and nature of the lexicon
is not used as a major classification criterion. However,
the relationship between each approach and the various
lexicon size and nature is discussed.
We take perception-oriented methods as forming a
different/separate category. These methods differ from
both the segmentation-free and segmentation-based ori-
ented methods. The common principle to the methods
that belong to this class is first to identify some char-
acters and then continue with trial-and-error techniques
on the remaining gaps, using the provided
lexicon for help.
dividual characters that are present in the word image,
and do not try to recognize the word as a whole. The
implementation of the recognition phase is not segment-
oriented and it does not follow the regular scheme of a
left to right search.
For practical reasons there is a common assumption
that an input word is not necessarily cursive. It is likely
that there will be a mixture of cursive and hand-printed
writing, of lower case and upper case, etc. Therefore it
is common to find methods that propose extra model-
ing for more than one style. This could be achieved by
doubling each model and training each copy to handle a
different writing style, or by combining redundant sub-
models parallel to the existing ones in parts of the word
that may be written in another style. Since this treat-
ment can be applied to any method, we do not specifically
mention the methods that employ it.
We believe that surveys are important tools that help
to synchronize different research efforts around the world
and make use of the knowledge and previous experience
that was acquired by others. The reader might be inter-
ested in other relevant surveys involving online cursive
recognition [82], OCRs [42,84], and segmentation tech-
niques [52,8].
2 Stages in a word recognition system
In this section a global structure of a word recognition
system is described. The following structure is a union of
all common operations that usually appear in systems of
this kind. A typical complete word recognition process
consists of the following parts: preprocessing, a possi-
ble segmentation or fragmentation, feature extraction,
recognition, and post-processing. The core of a recog-
nition system is the algorithm that produces word in-
terpretation given a sequence of observations in either
one of the various word representation levels that will
be mentioned. This is the main issue of this survey and
it will be discussed widely in the next sections.
The preprocessing starting point depends on the en-
vironment in which the system is running. It may in-
clude external word segmentation (extraction) from a
multi-word neighborhood and other various document
processing techniques. Given a stand-alone word, a few
normalization operations are performed, among which
are:
- Skew correction: a rotation transformation that brings the word orientation parallel to the horizontal;
- Slant correction: a shear transformation that attempts to make all the vertical strokes erect (see the sketch after this list);
- Smoothing: including all different kinds of noise reduction;
- Scaling: invariance to size (used in rare cases only).
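
As an illustration of one of these normalizations, here is a minimal slant-correction sketch: a horizontal shear applied row by row to a binary word image. The slant angle is assumed to have been estimated beforehand (e.g., from the average orientation of near-vertical strokes); this is not the survey's own code:

```python
import numpy as np

def deslant(image, slant_angle_rad):
    """Remove slant from a binary word image (2-D numpy array) with a
    horizontal shear: each row is shifted so that strokes leaning by
    `slant_angle_rad` from the vertical become erect.
    """
    h, w = image.shape
    shear = np.tan(slant_angle_rad)
    # rows closer to the top (smaller y) are shifted further
    offsets = [int(round(shear * (h - 1 - y))) for y in range(h)]
    base = min(offsets)
    out = np.zeros((h, w + max(offsets) - base), dtype=image.dtype)
    for y in range(h):
        o = offsets[y] - base
        out[y, o:o + w] = image[y]
    return out
```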
Another important procedure in a major part of the sys-
tems is reference line finding. This procedure is essential
for the feature extraction stage that comes next. Op-
tional contour or skeleton tracing are also parts of the
preprocessing phase.
In Fig. 4 one can observe some of the various prepro-
cessing algorithms that are commonly used. First, the
original image is presented on the left, then the image
after going through skew correction, slant correction, ref-
erence line finding, and eventually a thinning algorithm
that produces the word skeleton. The algorithms were
executed in a row, i.e., each algorithm took as input the
image produced by its predecessor.
Pre-segmentation is the process of word segmentation
into primitive segments. Many algorithms have been pro-
posed for this task and we recommend two surveys that
review this field [52] and [8]. In the case of a segmenta-
tion-free method, this stage may be ignored or replaced
with splitting the image into sequential fragments that
do not attempt to match complete characters.
The role of the feature extraction stage is to retrieve
observations out of the word image. There are several
classes of features. Segmentation-free methods use either
raw features, which are pixel-wise like strokes, or global
symbolic features such as ascenders, descenders, loops,
etc. Segmentation-based methods look for local features
in the segment that is being evaluated. The most pop-
ular features in this case are the global features (ascen-
ders, descenders, loops) and local irregularities such as
X and T crossings, end points, sharp curvatures, etc.
Other, less symbolic alternatives are different kinds of
pixel moments, distribution in the different parts of the
segment, etc. A feature is assumed to be more reliable
if it is less sensitive to noise and to the variance in writ-
ing style, and if it has good ability to distinguish among
words or letters as required.
After the recognition module has finished running, some
methods use post-processing techniques to improve the
recognition results. Some of these operations such as lex-
icon lookup and string correction are mentioned in this
paper because they are built in during the recognition
task. However additional syntax and context are very
popular to bust or disqualify legal words according to
the circumstances. A special case appears when large
lexicons are involved. In this case a hypothesis genera-
tor outputs possible words that might not be present in
the lexicon. Therefore, one may find it very difficult to
search for an optimal solution under the constraint of
being legal, i.e., to appear in the given large lexicon. A
unique post-processing was developed for this purpose,
based on the minimum edit-distance that will be men-
tioned later on in other aspects of recognition. Given a
hypothesis that is the optimal interpretation of the input
word image, we find the lexicon word that is the most
similar to the hypothesis. Similarity between words is
measured as the number of operations (insertion, deletion,
substitution) necessary to transform one word into
the other. From a different point of view this process is
considered as correction of garbled words [62]. Note that
regular post-processing of this kind requires going over
all the words in the lexicon and performing non-trivial
calculation for each one of them. This limits the practical
size of the lexicon, which is assumed to be large.
Therefore, a proper usage of this process is in combina-
tion with some kind of lexicon reduction.
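
The minimum edit-distance itself is a classic dynamic-programming computation; a compact sketch of this post-processing step, assuming unit costs for all three operations:

```python
def edit_distance(hypothesis, word):
    """Minimum number of insertions, deletions, and substitutions needed
    to transform `hypothesis` into `word` (Levenshtein distance)."""
    m, n = len(hypothesis), len(word)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all remaining characters
    for j in range(n + 1):
        d[0][j] = j  # insert all remaining characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hypothesis[i - 1] == word[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # substitution
    return d[m][n]

def nearest_lexicon_word(hypothesis, lexicon):
    """Scan the (possibly reduced) lexicon for the closest legal word."""
    return min(lexicon, key=lambda word: edit_distance(hypothesis, word))
```

The full lexicon scan in `nearest_lexicon_word` is exactly the non-trivial cost noted above, hence the suggested combination with lexicon reduction.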
The interested reader may refer to [80] for more de-
tails on a complete handwritten text recognition system.
3 Segmentation-free methods
In a segmentation-free method, one should find the best
interpretation possible for an observation sequence de-
rived from the word image without performing a mean-
ingful segmentation first. An observation sequence can
be classified into three categories according to the rep-
resentation level of the word it stands for. The first cat-
egory relates to observations that are based on low-level
features taken directly from the word image. Such fea-
tures include smoothed traces (quantized/normalized
fragments) of the word contour, pieces of strokes between
anchor points, edges of a polygonal approximation of the
image skeleton, etc. The second category aggregates such
low-level features to serve as primitives. For example,
neighboring strokes can be merged into a smoothed pat-
tern, that will constitute a primitive. The main differ-
ence between the current category and the former one
is in the nature of the relevant feature space continu-
ous in contrast with discrete. The last category involves
methods that use even higher-level features of a word
image. The most popular features are the most irregu-
lar, i.e., holistic features that are hard to miss and are
invariant with respect to all the different writing styles.
In this case holistic means global with respect to a whole
word resolution, meaning that features of this kind, such
as ascenders, descenders, loops, i dots, t strokes, etc., are
prominent even in an image of a complete word. These
features may be sub-classified according to size, location
or orientation. These features will be referred to as sym-
bols.
In this section we discuss different algorithms that
were proposed for comparison between a pair of obser-
vation sequences. Most methods use either a minimum
edit-distance calculation based on dynamic-program-
ming, or a resemblance estimation provided by HMMs.

However, there are some specialized methods that in-
clude different comparison procedures. Nevertheless, in
most cases, such systems can utilize the minimum edit-
distance as well. The top-level organization of this sec-
tion matches the various comparison approaches (spe-
cialized, minimum edit-distance, HMMs). Each subsec-
tion is further sorted according to the feature level of the
word image representation that was used. The discussion
in this section will be in view of the size and nature of
the lexicon that the application may use. Table 1 shows
the location of each method that will be mentioned later
with respect to the comparison technique and represen-
tation level.
3.1 Specialized methods
In what follows we will discuss methods in which an ob-
servation sequence is mapped to a space with a distance
metric on complete sequences. All these methods are of
model-discriminant nature. Since finding the most likely
word is similar to that in other model-discriminant categories,
i.e., comparison between the input sequence and all
stored references, we focus on the unique details of each
of the methods.
1. Low level
Govindaraju et al., [36], represented a word image
as a sequence of strokes in the temporal dimension.
Temporal information, i.e., a complete linear order
among the strokes, is extracted by traversing the
strokes between consecutive anchor points such as
peaks, valleys, intersection and end points. When an
intersection point is encountered, the traversal pro-
ceeds along the smoothest path, determined heuris-
tically based on orientation, trend and a gradient
smoothness criterion. Finally, a feature vector associ-
ated with the word image is obtained. No details were
specified regarding the matching algorithm. However,
the dynamic-programming or edit-distance techniques seem
to be a good choice in this scenario.
Gorsky [35] used a holographic representation,
meaning that each sequence of consecutive stroke
quanta that share approximately the same direction
(which will be referred to as fragments) is mapped
into a single point in a special parameter space. The
following features constitute the coordinates of a
point: global x-order (number of fragments along the
x-axis), local order of a fragment among the neigh-
boring fragments, and direction. These features are
insensitive to most of the distortions produced by
different writing styles and may also overcome miss-
ing or additional stroke components. The intensity
of a point depends on the length of the fragment it
originated from and bonuses for distinguishing prop-
erties. Both indices and values were discrete and nor-
malized, therefore the resulting holograph is a three-
dimensional matrix. A certain word prototype, i.e.,
model, is created by mapping a set of words written
by various people to a single holograph, i.e., matrix.
Comparison between an input word and a prototype
both represented in holographic form could be carried
out either by cross-correlation or by computing the
percentage of word fragments that are "explained"
by the prototype.
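
A sketch of the second comparison option, under the simplifying assumption that both holographs are stored as equally sized 3-D intensity matrices (the quantization details of [35] are omitted):

```python
import numpy as np

def fraction_explained(input_holo, prototype_holo, threshold=0.0):
    """Fraction of the input word's fragment intensity that falls on
    cells the prototype also marks, i.e., fragments "explained" by the
    prototype (a simplified reading of the comparison described above).
    """
    explained = input_holo[prototype_holo > threshold].sum()
    total = input_holo.sum()
    return float(explained / total) if total > 0 else 0.0
```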
2. Medium level
In [63] a Markov model with no hidden states was
used to model a word. The set of legal observations
derived from a word image is used to define the states
of the model. Experiments with two sets of obser-
vations (8 strokes or 42 graphemes) were made.
Choosing the order of the Markov model was based
on statistical criteria, and was eventually found to
be 2. Therefore, each word model was trained and as-
signed transition probabilities considering all the pos-
sible triple combinations of observations. The a pri-
ori probability of a word was also taken into account.
Hence, the probability of a word Markov model $M_i$
given the observed sequence $Q$ is evaluated as follows:

$$P(M_i \mid Q) = \frac{P(Q \mid M_i)\,P(M_i)}{P(Q)} = \frac{P(M_i)\,P(X_1 \mid M_i)\,P(X_2 \mid X_1, M_i)\,P(X_3 \mid X_1, X_2, M_i) \cdots P(X_n \mid X_{n-2}, X_{n-1}, M_i)}{P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_{n-2}, X_{n-1})}$$
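
In practice this amounts to accumulating trigram log-probabilities; the following sketch uses hypothetical probability tables and, for brevity, omits the lower-order terms at the start of the sequence:

```python
from math import log

EPS = 1e-9  # floor for unseen trigrams (hypothetical smoothing)

def word_model_log_score(obs, word_trigrams, background_trigrams, log_prior):
    """Score log P(M_i | Q) for a second-order Markov word model, as in
    the ratio above: word-conditioned trigram probabilities over the
    unconditioned (background) chain, plus the word prior log P(M_i).
    """
    score = log_prior
    for t in range(2, len(obs)):
        tri = (obs[t - 2], obs[t - 1], obs[t])
        score += log(word_trigrams.get(tri, EPS))       # P(X_t | X_{t-2}, X_{t-1}, M_i)
        score -= log(background_trigrams.get(tri, EPS)) # P(X_t | X_{t-2}, X_{t-1})
    return score
```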
3. High level
Guillevic et al. [38] extracted seven types of
features from an input word image. Each feature is
associated with its relative position in the image (in
percents). In the training process, each lexicon word
goes through the same process but the feature loca-
tions are relative to the characters they belong to.
Thus, in the recognition process, for every lexicon
word, we translate positions of features in the in-
put image, that were given in percents, to charac-
ter locations of the word being compared and rate
the matching. The distance between an input feature
vector and a lexicon word feature vector is computed
as the minimum shift in feature locations needed in
order to match between the two.
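
A much-simplified sketch of this position-matching idea: each word is described by an ordered list of (feature type, position) pairs, and only identical type sequences are allowed to match; the paper's translation of percent positions into character locations is abstracted away:

```python
def feature_shift_distance(input_feats, word_feats):
    """Distance between two ordered feature lists as the total shift in
    feature positions needed to align them; sequences whose feature
    types disagree are treated as unmatchable.
    """
    if [t for t, _ in input_feats] != [t for t, _ in word_feats]:
        return float("inf")
    return sum(abs(p - q) for (_, p), (_, q) in zip(input_feats, word_feats))
```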
A mixture of segmentation-free principles and a seg-
mentation algorithm takes place in the methods pre-
sented by Madhvanath et al. [53–56]. The produced
segments are used only to define the location of a
feature inside a word image. In the first case ([53]), a
concatenation of all features found results in a feature
vector used for comparison with references using
Euclidean distance. A few supporting mechanisms
were supplemented; feature equivalence rules enabled
one to define interchangeable sets of features. Macro-
features, i.e., a combination of features, lead to the
recognition of a specific phrase with very high confidence.
In addition a new feature, "point of return",
is included. However, this feature seems less robust
and more author-dependent. On the other hand if a
small lexicon of the order of ten words was engaged,
the ascender and descender features suffice to distin-
guish lexicon entries. In a more sophisticated method,
all features are sorted according to their type and
sub-sorted by their position from left to right [54–

References

Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition
Bellman, R.: Dynamic Programming
Rabiner, L., Juang, B.-H.: Fundamentals of speech recognition