Journal ArticleDOI

A survey of methods and strategies in character segmentation

01 Jul 1996-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 18, Iss: 7, pp 690-706
TL;DR: Segmentation methods are reviewed under four headings, from classical approaches that partition the input image into subimages which are then classified, through recognition-based and hybrid strategies, to holistic approaches that avoid segmentation by recognizing entire character strings as units.
Abstract: Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the "classical" approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

Summary (9 min read)


Introduction

  • Character segmentation has long been a critical area of the OCR process.
  • This paper provides a review of these advances.
  • The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources.
  • The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages.
  • Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

An implementation of step 1, the segmentation step, requires answering a simply-posed question:

  • The many researchers and developers who have tried to provide an algorithmic answer to this question find themselves in a Catch-22 situation.
  • But to determine such a resemblance the pattern must be segmented from the document image.
  • Each stage depends on the other, and in complex cases it is paradoxical to seek a pattern that will match a member of the system’s recognition alphabet of symbols without incorporating detailed knowledge of the structure of those symbols into the process.
  • Furthermore, the segmentation decision is not a local decision, independent of previous and subsequent decisions.
  • Even a series of satisfactory pattern matches can be judged incorrect if contextual requirements on the system output are not satisfied.

In fact, researchers have been aware of the limitations of the classical approach for many years.

  • Researchers in the 1960s and 1970s observed that segmentation caused more errors than shape distortions in reading unconstrained characters, whether hand- or machine-printed.
  • The problem was often masked in experimental work by the use of databases of well-segmented patterns, or by scanning character strings printed with extra spacing.
  • In commercial applications stringent requirements for document preparation were imposed.
  • The well-known tests of commercial printed text OCR systems by University of Nevada, Las Vegas [64][65] consistently ascribe a high proportion of errors to segmentation.

A somewhat different point of view is proposed in this paper.

  • The division according to use or non-use of recognition in the process fails to make clear the fundamental distinctions among present-day approaches.
  • This stage may propose the substitution of two letters for a single letter output by the classifier.
  • The process represents only a trivial advance on traditional methods that segment independent of recognition.
  • In the example just cited, segmentation is done in two stages, one before and one after image classification.

A more profound interaction between the two aspects of recognition occurs when a classifier is invoked to select the segments from a set of possibilities.

  • In this family of approaches segmentation and classification are integrated.
  • To some observers it even appears that the classifier performs segmentation since, conceptually at least, it could select the desired segments by exhaustive evaluation of all possible sets of subimages of the input image.
  • After reviewing available literature, the authors have concluded that there are three "pure" strategies for segmentation, plus numerous hybrid approaches that are weighted combinations of these three.

In strategy (1) the criterion for good segmentation is the agreement of general properties of the segments obtained with those expected for valid characters.

  • Examples of such properties are height, width, separation from neighboring components, disposition along a baseline, etc. Sections 2 and 3 describe contrasting strategies: one in which segmentation is based on image features, and a second in which classification is used to select from segmentation candidates generated without regard to image content.
  • They can be applied to the recognition of any vocabulary.
  • Markov models appear frequently in the literature, justifying further subclassification of holistic and recognition-based strategies, as indicated in Fig.

2. Dissection techniques for segmentation

  • By dissection is meant the decomposition of the image into a sequence of subimages using general features (as, for example, in Fig. 5).
  • This is opposed to later methods that divide the image into subimages independent of content.
  • In earlier systems the two terms were equivalent, since dissection constituted the entire segmentation process.
  • In many current studies, as the authors shall see, segmentation is a complex process, and there is a need for a term such as "dissection" to distinguish the image-cutting subprocess from the overall segmentation, which may use contextual knowledge and/or character shape description.

In the late 1950s and early 1960s, during the earliest attempts to automate character recognition,

  • Research was focused on the identification of isolated images.
  • Handprinted characters were printed in boxes that were invisible to the scanner, or else the writer was constrained in ways that aided both segmentation and recognition.
  • A very thorough survey of the state of the art in 1961 [79] gives only implicit acknowledgment of the existence of the segmentation problem.
  • Segmentation is not shown at all in the master diagram constructed to accompany discussion of recognition stages.
  • In the several pages devoted to preprocessing (mainly thresholding) the function is indicated only peripherally as part of the operation of registering a character image.

In machine printing, vertical whitespace often serves to separate successive characters.

  • In applications such as billing, where document layout is specifically designed for OCR, additional spacing is built into the fonts used.
  • The notion of detecting the vertical white space between successive characters has naturally been an important concept in dissecting images of machine print or handprint.

In many machine print applications involving limited font sets each character occupies a block of fixed width.

  • The pitch, or number of characters per unit of horizontal distance, provides a basis for segmentation: character boundaries should be approximately equally spaced at the distance corresponding to the pitch.
  • This provides a global basis for segmentation, since separation points are not independent.
  • Segmentation points not lying near these boundaries can be rejected as probably due to broken characters.
  • One well-documented early commercial machine that dealt with a relatively unconstrained environment was the reader IBM installed at the U. S. Social Security Administration in 1965 [38].
  • There was no way for SSA to impose constraints on the printing process.

In the SSA reader segmentation was accomplished in two scans of a print line by a flying-spot scanner.

  • On the initial scan, from left to right, the character pitch distance D was estimated by analog circuitry.
  • On the return scan, right to left, the actual segmentation decisions were made using parameter D. The principal rule applied was that a double white column triggered a segmentation boundary.
  • If none was found within distance D, then segmentation was forced (a minimal sketch of this rule follows the list).
  • Hoffman and McCullough [43] generalized this process and gave it a more formal framework (see Fig. 5).
  • In their formulation the segmentation stage consisted of three steps.
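
The following is a minimal Python sketch of the double-white-column rule with forced segmentation at pitch distance D described above; the binary-image convention (ink = 1), the left-to-right scan direction, and all names are illustrative assumptions, not details taken from [38] or [43].

    import numpy as np

    def ssa_style_cuts(line_img, pitch_d):
        """Return candidate cut columns for a binary line image (ink = 1)."""
        blank = ~line_img.any(axis=0)          # True where a column contains no ink
        width = line_img.shape[1]
        cuts, col = [0], 1
        while col < width:
            if blank[col - 1] and blank[col]:  # double white column triggers a boundary,
                cuts.append(col)               # then the rest of the white run is skipped
                while col < width and blank[col]:
                    col += 1
            elif col - cuts[-1] >= pitch_d:    # no boundary within pitch distance D: force one
                cuts.append(col)
                col += 1
            else:
                col += 1
        return cuts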

3. Detection of end-of-character.

  • Sectioning, step 2, was the critical step.
  • An estimate of character pitch was a parameter of the process, although in experiments it was specified for 12-character per inch typewriting.
  • The vertical projection (also called the "vertical histogram") of a print line, Fig. 6a, consists of a simple running count of the black pixels in each column.
  • In [66], in segmenting Kanji handprinted addresses, columns where the projection fell below a predefined threshold were candidates for splitting the image (a short sketch of this idea follows the list).
  • In [1] the projection was first obtained, then the ratio of second derivative of this curve to its height was used as a criterion for choosing separating columns (see Fig. 6b).
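
A short sketch of projection-based candidate selection in the spirit of [66]: columns whose vertical projection falls below a threshold are proposed as split points. The threshold value and the function name are illustrative choices, not parameters from the cited work.

    import numpy as np

    def projection_split_candidates(line_img, threshold=1):
        projection = line_img.sum(axis=0)                  # black-pixel count per column
        return np.flatnonzero(projection <= threshold)     # candidate splitting columns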

A peak-to-valley function was designed to improve on this method in [59].

  • A minimum of the projection is located and the projection value noted.
  • The sum of the differences between this minimum value and the peaks on each side is calculated.
  • The ratio of the sum to the minimum value itself (plus 1, presumably to avoid division by zero) is the discriminator used to select segmentation boundaries.
  • This ratio exhibits a preference for a low valley with high peaks on both sides (see the sketch after this list).
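
A sketch of the peak-to-valley discriminator of [59] as just described. Locating the flanking peaks by searching a fixed window on each side of the valley is a simplification made here; the cited work does not necessarily bound the search this way.

    import numpy as np

    def peak_to_valley_score(projection, col, window=10):
        v = projection[col]                                          # projection value at the candidate valley
        left_peak = projection[max(0, col - window):col + 1].max()
        right_peak = projection[col:col + window + 1].max()
        # high score for a deep valley flanked by high peaks; +1 avoids division by zero
        return ((left_peak - v) + (right_peak - v)) / (v + 1)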

A different kind of prefiltering was used in [57] to sharpen discrimination in the vicinity of holes

  • In addition to the projection itself, the difference between upper and lower profiles of the pattern was used in a formula analogous to that of [1].
  • Here the "upper profile" is a function giving the maximum y-value of the black pixels for each column in the pattern array.
  • The lower profile is defined similarly on the minimum y-value in each column (both profiles are sketched below).
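
A sketch of the upper and lower profiles and their column-wise difference; assigning zero to columns with no ink is a choice of this sketch.

    import numpy as np

    def profile_difference(line_img):
        height, width = line_img.shape
        diff = np.zeros(width)
        for col in range(width):
            ys = np.flatnonzero(line_img[:, col])     # rows containing ink in this column
            if ys.size:
                diff[col] = ys.max() - ys.min()       # upper profile minus lower profile
        return diff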

An experimental comparison of character segmentation by projection analysis against an alternative segmenter has also been reported.

  • Both segmenters were tested on a large data base (272,870 handprinted digits) using the same follow-on classifier.
  • An algorithm may assume that some but not all input characters can be connected.
  • One of the earliest studies to use contour analysis for the detection of likely segmentation points was reported in [69].
  • This scheme was refined in a later technique [51][52], which determines not only "how" to segment characters but also "when" to segment them.
  • Then, several possible segmentation paths are generated.

In [85] an algorithm was constructed based on a categorization of the vertexes of stroke elements at

  • Segmentation consists in detecting the most likely contact point among the various vertexes proposed by analysis of the image, and performing a cut similar in concept to that illustrated in Fig.
  • Methods for defining splitting paths have been examined in a number of other studies as well.
  • The algorithm of [17] performs background analysis to extract the face-up and face-down valleys, strokes and loop regions of component images.
  • A "marriage score matrix" is then used to decide which pair of valleys is the most appropriate.
  • The separating path is deduced by combining three lines respectively segmenting the upper valley, the stroke and the lower valley.

A distance transform is applied to the input image in [31] in order to compute the splitting path.

  • The objective is to find a path that stays as far from character strokes as possible without excessive curvature.
  • This is achieved by employing the distance transform as a cost function (a sketch of such a cost map follows this list).
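
A sketch of turning the distance transform into a path cost, assuming a binary image with ink = 1; SciPy's Euclidean distance transform is used here for convenience, although [31] does not prescribe a particular implementation.

    from scipy.ndimage import distance_transform_edt

    def path_cost_map(char_img):
        dist_to_ink = distance_transform_edt(char_img == 0)  # distance of each background pixel to the nearest stroke
        return 1.0 / (1.0 + dist_to_ink)                     # cheap far from strokes, expensive next to them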

A shortest-path method investigated in [84] produces an "optimum" segmentation path using costs accumulated row by row.

  • The path is computed iteratively by considering successive rows in the image.
  • A one dimensional cost array contains the accumulated cost of a path emanating from a pre-determined starting point at the top of the image to each column of the current row.
  • Several tries can be made from different starting points.
  • The selection of the best solution is based on classification confidence (which is obtained using a neural network).
  • Redundant shortest-path calculations are avoided in order to improve segmentation speed (a simplified sketch of the row-by-row accumulation follows this list).
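
A simplified sketch of the row-by-row accumulation described above: acc[r, c] holds the cheapest cost of a path from the top row to pixel (r, c), where each step may stay in the same column or move to an adjacent one. The cost map could be, for instance, the one sketched earlier; multiple starting points and the speed-ups mentioned above are omitted.

    import numpy as np

    def min_cost_splitting_path(cost):
        h, w = cost.shape
        acc = cost.astype(float).copy()
        back = np.zeros((h, w), dtype=int)
        for r in range(1, h):
            for c in range(w):
                lo, hi = max(0, c - 1), min(w, c + 2)            # predecessors: columns c-1, c, c+1
                best = lo + int(np.argmin(acc[r - 1, lo:hi]))
                back[r, c] = best
                acc[r, c] += acc[r - 1, best]
        path = [int(np.argmin(acc[-1]))]                         # cheapest end point on the bottom row
        for r in range(h - 1, 0, -1):
            path.append(int(back[r, path[-1]]))
        return path[::-1], float(acc[-1].min())                  # column per row, and the total path cost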

In recognition of cursive writing it is common to analyze the image of a character string in order to locate reference lines such as the baseline.

  • This permits the ready detection of ascenders and descenders, features that can serve as "landmarks" for segmentation of the image.
  • This technique was applied to online recognition in pioneering work by Frischkopf and Harmon [36].
  • Using an estimate of character width, they dissected the image into patterns centered about the landmarks, and divided remaining image components on width alone.
  • This scheme does not succeed with letters such as "u", "n", "m", which do not contain landmarks.
  • The basic method for detecting ascenders and descenders has been adopted by many other researchers in later years.

2.2 Dissection with contextual postprocessing: graphemes

  • The system seeks to correct such errors by minimizing an edit distance between recognition output and words in a given lexicon (a minimal edit-distance sketch follows this list).
  • Thus it does not directly evaluate alternative segmentation hypotheses; it merely tries to correct poorly made ones.
  • A non-Markovian system reported in [12] uses a spell-checker to correct repeatedly-made merge and split errors in a complete text, rather than in single words as above.
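
A minimal sketch of edit-distance postprocessing: the recognizer's raw output string is replaced by the nearest word in a lexicon. The unit costs and the simple nearest-word rule are illustrative; the cited systems use more elaborate costs and statistics.

    def edit_distance(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution (0 if characters match)
            prev = cur
        return prev[-1]

    def correct(raw_output, lexicon):
        return min(lexicon, key=lambda word: edit_distance(raw_output, word))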

An alternative approach still based on dissection is to divide the input image into subimages that are not necessarily individual characters.

  • The dissection is performed at stable image features that may occur within or between characters; for example, a sharp downward indentation can occur in the center of an "M" or at the connection of two touching characters.
  • A contextual mapping function from grapheme classes to symbols can then complete the recognition process.
  • The dissection step of this process is sometimes called "pre-segmentation" or, when the intent is to leave no composite characters, "over-segmentation".
  • The classes recognized by the classifier did not correspond to letters, but to specific shapes that could be reliably segmented (typically combinations of letters, but also portions of letters).

As in [72], the grapheme concept has been applied mainly to cursive script by later researchers.

  • Techniques for dissecting cursive script are based on heuristic rules derived from visual observation.
  • There is no "magic" rule and it is not feasible to segment all handwritten words into perfectly separated characters in the absence of recognition.
  • In practice, this means that a single character decomposes into at most two graphemes, and conversely, a single grapheme represents at most a two- or three-character sequence.
  • The line segments that form connections between characters in cursive script are known as "ligatures".
  • Thus some dissection techniques for script seek "lower ligatures", connections near the baseline that link most lowercase characters (a crude detection heuristic is sketched after this list).
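
A heuristic sketch only, not a rule from the cited papers: columns whose ink is confined to a thin band just above a crudely estimated baseline are proposed as lower-ligature candidates. The band width and the baseline estimate are illustrative assumptions.

    import numpy as np

    def ligature_candidates(word_img, band=3):
        baseline = np.flatnonzero(word_img.any(axis=1)).max()    # crude baseline: lowest row containing ink
        candidates = []
        for col in range(word_img.shape[1]):
            ys = np.flatnonzero(word_img[:, col])
            if ys.size and ys.min() >= baseline - band:          # all ink in this column sits near the baseline
                candidates.append(col)
        return candidates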

In typical systems these problems are treated at a later contextual stage that jointly addresses segmentation and recognition.

  • Such processing is included in the system since cursive writing is often ambiguous without the help of lexical context.
  • The quality of segmentation still remains very much dependent on the effectiveness of the dissection scheme that produces the graphemes.
  • Dissection techniques based on the principle of detecting ligatures were developed in [22], [61] and [53].
  • The last study was based on a dual approach: — the detection of possible pre-segmentation zones, — the use of a "pre-recognition" algorithm, whose aim was not to recognize characters, but to evaluate whether a subimage defined by the pre-segmenter was likely to constitute a valid character.
  • These paths were chosen to respect several heuristic rules expressing continuity and connectivity constraints.

A similar presegmenter was presented in [42].

  • In this case analysis of the upper contour, and a set of rules based on contour direction, closure detection, and zone location, were used.
  • Upper contour analysis was also used in [47] for a pre-segmentation algorithm that served as part of the second stage of a hybrid recognition system.
  • The first stage of this system also implemented a form of the hit and deflect strategy previously mentioned.

A technique for segmenting handwritten strings of variable length was described in [27].

  • It employs upper and lower contour analysis and a splitting technique based on the hit and deflect strategy (a simplified hit-and-deflect cut is sketched after this list).
  • In this study presegmentation points were chosen in the neighborhood of these minima and emergency segmentation performed between points that were highly separated.
  • The method requires handwriting to be previously deslanted in order to ensure proper separation.
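
A much simplified sketch of a hit-and-deflect style cut: the path moves downward from a chosen start column and, whenever the straight move would hit ink, it deflects to a nearby background column. The deflection order and the behaviour when no deflection is possible are choices of this sketch.

    def hit_and_deflect_path(img, start_col):
        """img: 2D numpy array with ink = 1; returns the (row, column) cut path."""
        h, w = img.shape
        col, path = start_col, []
        for row in range(h):
            if img[row, col]:                        # straight move hits ink: try to deflect sideways
                for delta in (1, -1, 2, -2):
                    c = col + delta
                    if 0 <= c < w and not img[row, c]:
                        col = c
                        break
                # if no background neighbour exists, the path simply crosses the stroke
            path.append((row, col))
        return path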

3. Recognition-based segmentation

  • Methods considered here also segment words into individual units (which are usually letters).
  • However, the principle of operation is quite different.
  • Rather, the image is divided systematically into many overlapping pieces without regard to content.
  • Systems using such a principle perform "recognition-based" segmentation: letter segmentation is a byproduct of letter recognition, which may itself be driven by contextual analysis.
  • Thus the possibly misleading connotations of "segmentation-free" are avoided in the authors' terminology.

In recognition-based techniques, recognition can be performed by following either a serial or a parallel strategy.

  • In the first case, e.g. [11], recognition is done iteratively in a left-to-right scan of words, searching for a "satisfactory" recognition result.
  • The parallel method [48] proceeds in a more global way.
  • It generates a lattice of all (or many) possible feature-to-letter combinations.
  • The final decision is found by choosing an optimal path through the lattice (a small dynamic-programming sketch follows this list).
  • The windowing process can operate directly on the image pixels, or it can be applied in the form of weightings or groupings of positional feature measurements made on the images.
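
A small dynamic-programming sketch of the parallel, lattice-based idea: primitive cut positions define candidate segments spanning up to a few primitives, each segment is scored by a caller-supplied classifier-confidence function, and the highest-scoring consistent sequence is selected. The interface and the max_span limit are illustrative assumptions, not details of [48].

    def best_lattice_path(cuts, score, max_span=3):
        """cuts: sorted cut positions including both ends; score(lo, hi) -> classifier confidence."""
        n = len(cuts)
        best = [float("-inf")] * n
        best[0], choice = 0.0, [0] * n
        for j in range(1, n):
            for i in range(max(0, j - max_span), j):
                s = best[i] + score(cuts[i], cuts[j])
                if s > best[j]:
                    best[j], choice[j] = s, i
        segments, j = [], n - 1                      # backtrack the best sequence of segments
        while j > 0:
            segments.append((cuts[choice[j]], cuts[j]))
            j = choice[j]
        return segments[::-1], best[-1]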

As easy to state as these principles are, they were a long time in developing. Probably the earliest

  • Theoretical and experimental application of the concept is reported by Kovalevsky [48].
  • He developed a solution under the assumption that segmentation occurred along columns.
  • Kovalevsky’s model (Fig. 10) assumes that the probability of observing a given version of a prototype character is a spherically symmetric function of the difference between the two images.
  • Then the optimal objective function for segmentation is the sum of the squared distances between segmented images and matching prototypes (this objective is sketched after this list).
  • This process was implemented in hardware to produce a working OCR system.
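
A sketch of the objective just described: for a candidate set of cut columns, the cost is the sum, over segments, of the squared distance to the best-matching prototype. Width normalisation by zero-padding (and equal image heights) are assumptions of this sketch, not details of [48].

    import numpy as np

    def kovalevsky_objective(line_img, cuts, prototypes):
        """cuts: increasing column positions; prototypes: list of 2D arrays of the same height."""
        total = 0.0
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            segment = line_img[:, lo:hi]
            total += min(squared_distance(segment, p) for p in prototypes)
        return total

    def squared_distance(a, b):
        w = max(a.shape[1], b.shape[1])
        pa = np.pad(np.asarray(a, dtype=float), ((0, 0), (0, w - a.shape[1])))   # pad to a common width
        pb = np.pad(np.asarray(b, dtype=float), ((0, 0), (0, w - b.shape[1])))
        return float(((pa - pb) ** 2).sum())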

A technique combining dynamic programming and neural net recognition was proposed in [10].

  • This technique, called "Shortest Path Segmentation", selects the optimal consistent combination of cuts from a predefined set of windows.
  • Given this set of candidate cuts, all possible "legal" segments are constructed by combination.
  • The paths of this graph represent all the legal segmentations of the word.
  • Each node of the graph is then assigned a "distance" obtained by the neural net recognizer.
  • The method of "selective attention" [30] takes neural networks even further in the handling of segmentation problems.

A Hidden Markov Model (often abbreviated HMM) models variations in printing or cursive writing as an underlying probabilistic structure which is not directly observable.

  • This structure consists of a set of states plus transition probabilities between states.
  • In addition, the observations that the system makes on an image are represented as random variables whose distribution depends on the state.
  • These observations constitute a sequential feature representation of the input image.
  • The survey [34] provides an introduction to the use of HMMs in recognition applications (a minimal decoding sketch follows this list).
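
A minimal decoding sketch with exactly the ingredients listed above: a set of states, transition probabilities between states, and per-state observation probabilities. The Viterbi recursion below returns the most likely state sequence for a sequence of observation indices; this is generic HMM machinery rather than a particular system from the survey.

    import numpy as np

    def viterbi(observations, start_p, trans_p, emit_p):
        """start_p[s], trans_p[s, s'], emit_p[s, o] are probabilities; observations are indices."""
        logp = np.log(start_p) + np.log(emit_p[:, observations[0]])
        back = []
        for o in observations[1:]:
            step = logp[:, None] + np.log(trans_p)   # step[i, j]: best score of being in i, then moving to j
            back.append(step.argmax(axis=0))
            logp = step.max(axis=0) + np.log(emit_p[:, o])
        states = [int(logp.argmax())]                # backtrack the most likely state sequence
        for b in reversed(back):
            states.append(int(b[states[-1]]))
        return states[::-1]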

A method stemming from concepts used in machine vision for recognition of occluded objects has also been reported.

  • Here various features and their positions of occurrence are recorded for an image.
  • The positions are quantized into bins such that the evidence for each character indicated in a bin can be summed.
  • These scores are subjected to contextual processing using a predefined lexicon in order to recognize words.

A method that recognizes word feature graphs is presented in [71]. This system attempts to match

  • Subgraphs of features with predefined character prototypes.
  • Dynamic programming was used with a warping function that permitted the process to skip unnecessary features.
  • In the system proposed in [5] a sequence of structural features (like x- and y-extrema, curvature signs, cusps, crossings, penlifts, and closures) was extracted from the word to generate all the legible sequences of letters.
  • Then, the "aspect" of the word (which was deduced from ascender and descender detection) was taken into account to choose the best solution(s) among the list of generated words.
  • The letters were predicted by finding in the letter tree the paths compatible with the extracted features and were verified by checking their compatibility with the word dictionary.

A different approach uses the concept of regularities and singularities [77]. In this system, a stroke graph representing the word is obtained after skeletonization.

  • The "singular parts", which are supposed to convey most of the information, were deduced by eliminating "regular part" of the word (the sinusoidlike path joining all cursive ligatures).
  • The most robust features and characters (the "anchors") were then detected from a description chain derived from these singular parts and dynamic matching was used for analyzing the remaining parts.

A top-down directed word verification method called "backward matching" (see Fig. 14) has also been proposed.

  • In cursive word recognition, all letters do not have the same discriminating power, and some of them are easier to recognize.
  • So, in this method, recognition is not performed in a left-to-right scan, but follows a "meaningful" order which depends on the visual and lexical significance of the letters.
  • Moreover, this order also follows an edge-toward-center movement, as in human vision [82].
  • Matching between symbolic and physical descriptions can be performed at the letter, feature and even sub-feature levels.
  • This system is an attempt to provide a general framework allowing efficient cooperation between low-level and high-level recognition processes.

4. Mixed strategies: "Oversegmenting"

  • Two radically different segmentation strategies have been considered to this point.
  • One (Section 2) attempts to choose the correct segmentation points (at least for generating graphemes) by a general analysis of image features.

In this section intermediate approaches, essentially hybrids of the first two, are discussed.

  • This family of methods also uses presegmenting, with requirements that are not as strong as in the grapheme approach.
  • Here a great deal of effort was expended in analyzing the shapes of pairs of touching digits in the neighborhood of contact, leading to algorithms for determining likely separation boundaries.
  • Each candidate segmentation was tested separately by classification, and the split giving the highest recognition confidence was accepted.
  • In the first step a set of likely cutting paths is determined, and the input image is divided into elementary components by separating along each path.
  • All combinations meeting certain acceptability constraints (such as size, position, etc.) are produced and scored by classification confidence (see the sketch after this list).
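
A sketch of the combination step just described: adjacent elementary components are merged into candidate characters subject to a simple width constraint, each candidate is scored by a caller-supplied classifier confidence, and dynamic programming selects the highest-confidence consistent sequence. The constraint and the interface are illustrative assumptions.

    import numpy as np

    def best_oversegmentation(pieces, score, max_merge=3, max_width=40):
        """pieces: elementary subimages (same height) in left-to-right order; score(img) -> confidence."""
        n = len(pieces)
        best = [float("-inf")] * (n + 1)
        best[0], choice = 0.0, [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(max(0, j - max_merge), j):
                candidate = np.hstack(pieces[i:j])          # recombine adjacent elementary components
                if candidate.shape[1] > max_width:          # acceptability constraint on size
                    continue
                s = best[i] + score(candidate)
                if s > best[j]:
                    best[j], choice[j] = s, i
        groups, j = [], n                                   # backtrack which pieces form each character
        while j > 0:
            groups.append((choice[j], j))
            j = choice[j]
        return groups[::-1], best[-1]                       # best[-1] is -inf if no admissible grouping exists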

It is also possible to carry out an oversegmenting procedure sequentially by evaluating trial separation boundaries one at a time.

  • In this work a neural net was trained to detect likely cutting columns for machine printed characters using neighborhood characteristics.
  • Using these as a base, the optimization algorithm recursively explored a tree of possible segmentation hypotheses.
  • The left column was fixed at each step, and various right columns were evaluated using recognition confidence.
  • Recursion is used to vary the left column as well, but pruning rules are employed to avoid testing all possible combinations.

A holistic process recognizes an entire word as a unit. A major drawback of this class of methods is that their use is restricted to a predefined lexicon of words.

  • This point is especially critical when training on word samples is required: a training stage is thus mandatory to expand or modify the lexicon of possible words.
  • Recognition was based on the comparison of a collection of simple features extracted from the whole word against a lexicon of "codes" representing the "theoretical" shape of the possible words (a rough word-shape matching sketch follows this list).
  • This strategy still typifies recent holistic methods.
  • The "middle zone" was not delimited by straight lines, but by means of smooth curves ated to every feature and uncertainty coefficients were introduced to make this representation more tolerant to distortion by avoiding binary decisions.
  • Moreover, this second system also implement several Markov models at different recognition stages (word recognition and cheque amount recognition).
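
A heuristic sketch of the holistic "code" idea mentioned above: each lexicon word is mapped to a coarse shape code (ascender, descender, or neutral per letter), and candidates are ranked by the similarity between that code and a code derived elsewhere from the word image. The letter classes and the similarity measure are illustrative assumptions, not the features used by the cited systems.

    import difflib

    ASCENDERS, DESCENDERS = set("bdfhklt"), set("gjpqy")

    def shape_code(word):
        return "".join("A" if c in ASCENDERS else "D" if c in DESCENDERS else "x"
                       for c in word.lower())

    def rank_lexicon(observed_code, lexicon):
        similarity = lambda w: difflib.SequenceMatcher(None, observed_code, shape_code(w)).ratio()
        return sorted(lexicon, key=similarity, reverse=True)    # most plausible words first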

In the machine printed text area characters are regular, so that feature representations are stable, and in a long document repetitions of the most common words occur with predictable frequency.

  • In [45] these characteristics were combined to cluster the ten most common short words with good accuracy, as a precursor to word recognition.
  • It was suggested that identification of the clusters could be done on the basis of unigram and bigram frequencies.
  • More general applications require a dynamic generation stage of holistic descriptions.
  • The system was able to achieve a 50% size reduction with under 2% error.

6. Concluding remarks

  • Methods for treating the problem of segmentation in character recognition have developed remarkably in the last decade.
  • It is hoped that this comprehensive discussion will provide insight into the concepts involved, and perhaps provoke further advances in the area.
  • For cursive script from many writers and a large vocabulary, at the other extreme, methods of ever increasing sophistication are being pursued.
  • The authors have not attempted to compare the effectiveness of algorithms, or to discuss the crucial topic of evaluation.
  • The authors apologize to researchers whose important contributions may have been overlooked.



A SURVEY OF METHODS AND STRATEGIES IN
CHARACTER SEGMENTATION
Richard G. Casey and Eric Lecolinet
ENST Paris and IBM Almaden Research Center
ENST Paris
ABSTRACT
Character segmentation has long been a critical area of the OCR process. The higher
recognition rates for isolated characters vs. those obtained for words and connected
character strings well illustrate this fact. A good part of recent progress in reading
unconstrained printed and written text may be ascribed to more insightful handling of
segmentation.
This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list
sources. Segmentation methods are listed under four main headings. What may be
termed the "classical" approach consists of methods that partition the input image into
subimages, which are then classified. The operation of attempting to decompose the
image into classifiable units is called "dissection". The second class of methods avoids
dissection, and segments the image either explicitly, by classification of prespecified
windows, or implicitly by classification of subsets of spatial features collected from the
image as a whole. The third strategy is a hybrid of the first two, employing dissection
together with recombination rules to define potential segments, but using classification
to select from the range of admissible segmentation possibilities offered by these
subimages. Finally, holistic approaches that avoid segmentation by recognizing entire
character strings as units are described.

KEYWORDS
Optical character recognition, character segmentation, survey, holistic recognition, Hidden Markov Models, graphemes, contextual methods, recognition-based segmentation

1. Introduction
1.1. The role of segmentation in recognition processing
Character segmentation is an operation that seeks to decompose an image of a sequence of characters into subimages of individual symbols. It is one of the decision processes in a system for optical character recognition (OCR). Its decision, that a pattern isolated from the image is that of a character (or some other identifiable unit), can be right or wrong. It is wrong sufficiently often to make a major contribution to the error rate of the system.
In what may be called the "classical" approach to OCR, Fig. 1, segmentation is the initial step in a
three-step procedure:
Given a starting point in a document image:
1. Find the next character image.
2. Extract distinguishing attributes of the character image.
3. Find the member of a given symbol set whose attributes best match those of the input, and output
its identity.
This sequence is repeated until no additional character images are found (a toy, self-contained sketch of this loop is given below).
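
The following is a toy illustration of this loop on a binary line image (ink = 1): characters are assumed to be separated by blank columns, the "features" are simply the raw pixels, and classification is nearest-prototype matching. All of this is an illustrative simplification of the three steps, not the method of any particular system discussed in this paper.

    import numpy as np

    def classical_ocr(line_img, prototypes):
        """line_img: 2D 0/1 array; prototypes: dict label -> 2D 0/1 array of the same height."""
        ink_cols = line_img.any(axis=0)
        labels, col = [], 0
        while col < line_img.shape[1]:
            if not ink_cols[col]:
                col += 1
                continue
            start = col                                   # step 1: isolate the next character image
            while col < line_img.shape[1] and ink_cols[col]:
                col += 1
            segment = line_img[:, start:col]
            # steps 2 and 3: here the "attributes" are raw pixels, matched to the nearest prototype
            labels.append(min(prototypes, key=lambda c: pixel_distance(segment, prototypes[c])))
        return labels

    def pixel_distance(a, b):
        w = max(a.shape[1], b.shape[1])
        pa = np.pad(np.asarray(a, dtype=float), ((0, 0), (0, w - a.shape[1])))   # pad to a common width
        pb = np.pad(np.asarray(b, dtype=float), ((0, 0), (0, w - b.shape[1])))
        return float(((pa - pb) ** 2).sum())
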
An implementation of step 1, the segmentation step, requires answering a simply-posed question: "What constitutes a character?" The many researchers and developers who have tried to provide an algorithmic answer to this question find themselves in a Catch-22 situation. A character is a pattern that resembles one of the symbols the system is designed to recognize. But to determine such a resemblance the pattern must be segmented from the document image. Each stage depends on the other, and in complex cases it is paradoxical to seek a pattern that will match a member of the system’s recognition alphabet of symbols without incorporating detailed knowledge of the structure of those symbols into the process.
Furthermore, the segmentation decision is not a local decision, independent of previous and subsequent decisions. Producing a good match to a library symbol is necessary, but not sufficient, for reliable recognition. That is, a poor match on a later pattern can cast doubt on the correctness of the current segmentation/recognition result. Even a series of satisfactory pattern matches can be judged incorrect if contextual requirements on the system output are not satisfied. For example, the letter sequence "cl" can often closely resemble a "d", but usually such a choice will not constitute a contextually valid result.
Thus it is seen that the segmentation decision is interdependent with local decisions regarding shape similarity, and with global decisions regarding contextual acceptability. This sentence summarizes the refinement of character segmentation processes in the past 40 years or so. Initially, designers sought to perform segmentation as per the "classical" sequence listed above. As faster, more powerful electronic circuitry has encouraged the application of OCR to more complex documents, designers have realized that step 1 can not be divorced from the other facets of the recognition process.
In fact, researchers have been aware of the limitations of the classical approach for many years.
Researchers in the 1960s and 1970s observed that segmentation caused more errors than shape distortions
in reading unconstrained characters, whether hand- or machine-printed. The problem was often masked
in experimental work by the use of databases of well-segmented patterns, or by scanning character strings
printed with extra spacing. In commercial applications stringent requirements for document preparation
were imposed. By the beginning of the 1980’s workers were beginning to encourage renewed research
interest [73] to permit extension of OCR to less constrained documents.
The problems of segmentation persist today. The well-known tests of commercial printed text OCR systems by University of Nevada, Las Vegas [64][65] consistently ascribe a high proportion of errors to segmentation. Even when perfect patterns, the bitmapped characters that are input to digital printers, were recognized, commercial systems averaged 0.5% spacing errors. This is essentially a segmentation error by a process that attempts to isolate a word subimage. The article [6] emphatically illustrates the woes of current machine-print recognition systems as segmentation difficulties increase (see Fig. 2). The degradation in performance of NIST tests of handwriting recognition on segmented [86] and unsegmented [88] images underscores the continuing need for refinement and fresh approaches in this area. On the positive side of the ledger, the study [29] shows the dramatic improvements that can be obtained when a thoroughgoing segmentation scheme replaces one of prosaic design.
Some authors previously have surveyed segmentation, often as part of a more comprehensive work,
e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]. In the present
paper we present a survey whose focus is character segmentation, and which attempts to provide broad
coverage of the topic.
1.2 Organization of methods
A major problem in discussing segmentation is how to classify methods. Tappert et al. [81], for example, speak of "external" vs. "internal" segmentation, depending on whether recognition is required in the process. Dunn and Wang [20] use "straight segmentation" and "segmentation-recognition" for a
similar dichotomization.
A somewhat different point of view is proposed in this paper. The division according to use or non-use of recognition in the process fails to make clear the fundamental distinctions among present-day approaches. For example, it is not uncommon in text recognition to use a spelling corrector as a postprocessor. This stage may propose the substitution of two letters for a single letter output by the classifier. This is in effect a use of recognition to resegment the subimage involved. However, the process represents only a trivial advance on traditional methods that segment independent of recognition.

In this paper the distinction between methods is based on how segmentation and classification interact in the overall process. In the example just cited, segmentation is done in two stages, one before and one after image classification. Basically an unacceptable recognition result is re-examined and modified by an (implied) resegmentation. This is a rather "loose" coupling of segmentation and classification.
A more profound interaction between the two aspects of recognition occurs when a classifier is
invoked to select the segments from a set of possibilities. In this family of approaches segmentation and
classification are integrated. To some observers it even appears that the classifier performs segmentation
since, conceptually at least, it could select the desired segments by exhaustive evaluation of all possible
sets of subimages of the input image.
After reviewing available literature, we have concluded that there are three "pure" strategies for segmentation, plus numerous hybrid approaches that are weighted combinations of these three. The elemental strategies are:
1. the classical approach, in which segments are identified based on "character-like" properties. This
process of cutting up the image into meaningful components is given a special name, "dissection",
in discussions below.
2. recognition-based segmentation, in which the system searches the image for components that match
classes in its alphabet.
3. holistic methods, in which the system seeks to recognize words as a whole, thus avoiding the need
to segment into characters.
In strategy (1) the criterion for good segmentation is the agreement of general properties of the segments obtained with those expected for valid characters. Examples of such properties are height, width, separation from neighboring components, disposition along a baseline, etc. In method (2) the criterion is recognition confidence, perhaps including syntactic or semantic correctness of the overall result. Holistic methods (3) in essence revert to the classical approach with words as the alphabet to be read. The reader interested to obtain an early illustration of these basic techniques may glance ahead to Fig. 6 for examples of dissection processes, Fig. 13 for a recognition-based strategy, and Fig. 16 for a holistic approach.
Although examples of these basic strategies are offered below, much of the literature reviewed for this survey reports a blend of methods, using combinations of dissection, recognition searching, and word characteristics. Thus, although the paper necessarily has a discrete organization, the situation is perhaps better conceived as in Fig. 3. Here the three fundamental strategies occupy orthogonal axes: hybrid methods can be represented as weighted combinations of these lying at points in the intervening space. There is a continuous space of segmentation strategies rather than a discrete set of classes with well-defined boundaries. Of course, such a space exists only conceptually; it is not meaningful to assign precise weights to the elements of a particular combination.

Citations
Journal ArticleDOI
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentification, handwriting learning tools are also considered.

2,653 citations

Journal ArticleDOI
TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
Abstract: A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

592 citations


Cites methods from "A survey of methods and strategies ..."

  • ...Since documents usually have clean backgrounds, binarization methods [10] are often adopted for segmenting characters....

Journal ArticleDOI
TL;DR: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.
Abstract: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.

544 citations

Journal ArticleDOI
01 May 2001
TL;DR: The historical evolution of CR systems is presented, the available CR techniques, with their superiorities and weaknesses, are reviewed and directions for future research are suggested.
Abstract: Character recognition (CR) has been extensively studied in the last half century and has progressed to a level that is sufficient to produce technology-driven applications. Now, rapidly growing computational power is enabling the implementation of the present CR methodologies and is creating an increasing demand in many emerging application domains which require more advanced methodologies. This paper serves as a guide and update for readers working in the CR area. First, the historical evolution of CR systems is presented. Then, the available CR techniques, with their superiorities and weaknesses, are reviewed. Finally, the current status of CR is discussed and directions for future research are suggested. Special attention is given to off-line handwriting recognition, since this area requires more research in order to reach the ultimate goal of machine simulation of human reading.

517 citations

Book ChapterDOI
05 Sep 2010
TL;DR: It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.
Abstract: We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs - text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines - one open source and one proprietary - with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

503 citations

References
Journal ArticleDOI
TL;DR: The state of the art of online handwriting recognition during a period of renewed activity in the field is described, based on an extensive review of the literature, including journal articles, conference proceedings, and patents.
Abstract: This survey describes the state of the art of online handwriting recognition during a period of renewed activity in the field. It is based on an extensive review of the literature, including journal articles, conference proceedings, and patents. Online versus offline recognition, digitizer technology, and handwriting properties and recognition problems are discussed. Shape recognition algorithms, preprocessing and postprocessing techniques, experimental systems, and commercial products are examined. >

922 citations


Additional excerpts

  • ...e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]....

  • ...A major problem in discussing segmentation is how to classify methods. Tappert et al [81], for...

Journal ArticleDOI
TL;DR: In this paper, a word image is transformed through a hierarchy of representation levels: points, contours, features, letters, and words, and a unique feature representation is generated bottom-up from the image using statistical dependences between letters and features.
Abstract: Cursive script word recognition is the problem of transforming a word from the iconic form of cursive writing to its symbolic form. Several component processes of a recognition system for isolated offline cursive script words are described. A word image is transformed through a hierarchy of representation levels: points, contours, features, letters, and words. A unique feature representation is generated bottom-up from the image using statistical dependences between letters and features. Ratings for partially formed words are computed using a stack algorithm and a lexicon represented as a trie. Several novel techniques for low- and intermediate-level processing for cursive script are described, including heuristics for reference line finding, letter segmentation based on detecting local minima along the lower contour and areas with low vertical profiles, simultaneous encoding of contours and their topological relationships, extracting features, and finding shape-oriented events. Experiments demonstrating the performance of the system are also described. >

502 citations

Journal ArticleDOI
TL;DR: Describes a complete system for the recognition of off-line handwriting, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness.
Abstract: Describes a complete system for the recognition of off-line handwriting. Preprocessing techniques are described, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness. Representation of the image is discussed and the skeleton and stroke features used are described. A recurrent neural network is used to estimate probabilities for the characters represented in the skeleton. The operation of the hidden Markov model that calculates the best word in the lexicon is also described. Issues of vocabulary choice, rejection, and out-of-vocabulary word recognition are discussed.

271 citations

Journal ArticleDOI
01 Jul 1992
TL;DR: A pattern- oriented segmentation method for optical character recognition that leads to document structure analysis is presented, and an extended form of pattern-oriented segmentation, tabular form recognition, is considered.
Abstract: A pattern-oriented segmentation method for optical character recognition that leads to document structure analysis is presented. As a first example, segmentation of handwritten numerals that touch are treated. Connected pattern components are extracted, and spatial interrelations between components are measured and grouped into meaningful character patterns. Stroke shapes are analyzed and a method of finding the touching positions that separates about 95% of connected numerals correctly is described. Ambiguities are handled by multiple hypotheses and verification by recognition. An extended form of pattern-oriented segmentation, tabular form recognition, is considered. Images of tabular forms are analyzed, and frames in the tabular structure are extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized. >

243 citations


"A survey of methods and strategies ..." refers methods in this paper

  • ...e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]....

  • ...the positive side of the ledger, the study [29] shows the dramatic improvements that can be obtained...

  • ...The strategy in a simple form is illustrated in [29]....

Journal ArticleDOI
TL;DR: A cursive script recognition program which has correctly identified 79 per cent of a test sample of 84 words, which compares favorably in performance level with previously reported programs appropriately “normalized”, while not requiring input pertaining to stroke sequence and stroke segmentation that is essential to these other programs.

232 citations


"A survey of methods and strategies ..." refers background or methods in this paper

  • ...The first reported use of this concept was probably [72], a report on a system for off-line cursive script recognition....

  • ...As in [72], the grapheme concept has been applied mainly to cursive script by later researchers....

Frequently Asked Questions (10)
Q1. What are the contributions mentioned in the paper "A survey of methods and strategies in character segmentation" ?

This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. 

The authors apologize to researchers whose important contributions may have been overlooked. 

Upper contour analysis was also used in [47] for a pre-segmentation algorithm that served as part of the second stage of a hybrid recognition system. 

By testing their adjacency relationships to perform merging, or their size and aspect ratios to trigger splitting mechanisms, much of the segmentation task can be accurately performed at a low cost in computation. 

In [67], words and letters were represented by means of tree dictionaries: possible words were described by a letter tree (also called a "trie") and letters were described by a feature tree. 

Splitting of an image classified as connected is then accomplished by finding characteristic landmarks of the image that are likely to be segmentation points, rejecting those that appear to be situated within a character, and implementing a suitable cutting path. 

In many current studies, as the authors shall see, segmentation is a complex process, and there is a need for a term such as "dissection" to distinguish the image-cutting subprocess from the overall segmentation, which may use contextual knowledge and/or character shape description.

The authors noted that the technique was heavily dependent on the quality of the input images, and tended to fail on both very heavy or very light printing. 

The twin facts that early OCR development dealt with constrained inputs, while research was mainly concerned with representation and classification of individual symbols, explain why segmentation is so rarely mentioned in pre-70s literature.

As the system knows in advance what it is searching for, it can make use of high-level contextual knowledge to improve recognition, even at low-level stages.