Journal ArticleDOI

A survey of methods and strategies in character segmentation

01 Jul 1996-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 18, Iss: 7, pp 690-706
TL;DR: Segmentation methods are reviewed under four headings, from classical approaches that partition the input image into subimages which are then classified, through recognition-based and hybrid strategies, to holistic approaches that avoid segmentation by recognizing entire character strings as units.
Abstract: Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the "classical" approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

Summary (9 min read)


Introduction

  • Character segmentation has long been a critical area of the OCR process.
  • This paper provides a review of these advances.
  • The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources.
  • The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages.
  • Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

An implementation of step 1, the segmentation step, requires answering a simply-posed question:

  • The many researchers and developers who have tried to provide an algorithmic answer to this question find themselves in a Catch-22 situation.
  • But to determine such a resemblance the pattern must be segmented from the document image.
  • Each stage depends on the other, and in complex cases it is paradoxical to seek a pattern that will match a member of the system’s recognition alphabet of symbols without incorporating detailed knowledge of the structure of those symbols into the process.
  • Furthermore, the segmentation decision is not a local decision, independent of previous and subsequent decisions.
  • Even a series of satisfactory pattern matches can be judged incorrect if contextual requirements on the system output are not satisfied.

In fact, researchers have been aware of the limitations of the classical approach for many years.

  • Researchers in the 1960s and 1970s observed that segmentation caused more errors than shape distortions in reading unconstrained characters, whether hand- or machine-printed.
  • The problem was often masked in experimental work by the use of databases of well-segmented patterns, or by scanning character strings printed with extra spacing.
  • In commercial applications stringent requirements for document preparation were imposed.
  • The well-known tests of commercial printed text OCR systems by University of Nevada, Las Vegas [64][65] consistently ascribe a high proportion of errors to segmentation.

A somewhat different point of view is proposed in this paper.

  • The division according to use or non-use of recognition in the process fails to make clear the fundamental distinctions among present-day approaches.
  • This stage may propose the substitution of two letters for a single letter output by the classifier.
  • The process represents only a trivial advance on traditional methods that segment independent of recognition.
  • In the example just cited, segmentation is done in two stages, one before and one after image classification.

A more profound interaction between the two aspects of recognition occurs when a classifier is invoked to select the segments from a set of possibilities.

  • In this family of approaches segmentation and classification are integrated.
  • To some observers it even appears that the classifier performs segmentation since, conceptually at least, it could select the desired segments by exhaustive evaluation of all possible sets of subimages of the input image.
  • After reviewing available literature, the authors have concluded that there are three "pure" strategies for segmentation, plus numerous hybrid approaches that are weighted combinations of these three.

In strategy (1) the criterion for good segmentation is the agreement of general properties of the segments obtained with those expected for valid characters.

  • Examples of such properties are height, width, separation from neighboring components, disposition along a baseline, etc. Sections 2 and 3 describe contrasting strategies: one in which segmentation is based on image features, and a second in which classification is used to select from segmentation candidates generated without regard to image content.
  • They can be applied to the recognition of any vocabulary.
  • Markov models appear frequently in the literature, justifying further subclassification of holistic and recognition-based strategies, as indicated in Fig.

2. Dissection techniques for segmentation

  • By dissection is meant the decomposition of the image into a sequence of subimages using general features (as, for example, in Fig. 5).
  • This is opposed to later methods that divide the image into subimages independent of content.
  • In earlier systems the two terms were equivalent, since dissection constituted the entire segmentation process.
  • In many current studies, as the authors shall see, segmentation is a complex process, and there is a need for a term such as "dissection" to distinguish the image-cutting subprocess from the overall segmentation, which may use contextual knowledge and/or character shape description.

In the late 1950s and early 1960s, during the earliest attempts to automate character recognition,

  • Research was focused on the identification of isolated images.
  • Handprinted characters were printed in boxes that were invisible to the scanner, or else the writer was constrained in ways that aided both segmentation and recognition.
  • A very thorough survey of the state of the art in 1961 [79] gives only implicit acknowledgment of the existence of the segmentation problem.
  • Segmentation is not shown at all in the master diagram constructed to accompany discussion of recognition stages.
  • In the several pages devoted to preprocessing (mainly thresholding) the function is indicated only peripherally as part of the operation of registering a character image.

In machine printing, vertical whitespace often serves to separate successive characters.

  • In applications such as billing, where document layout is specifically designed for OCR, additional spacing is built into the fonts used.
  • The notion of detecting the vertical white space between successive characters has naturally been an important concept in dissecting images of machine print or handprint.

In many machine print applications involving limited font sets each character occupies a block of fixed width.

  • The pitch, or number of characters per unit of horizontal distance, provides a basis for segmentation: character boundaries should be approximately equally spaced at the distance corresponding to the pitch.
  • This provides a global basis for segmentation, since separation points are not independent.
  • Segmentation points not lying near these boundaries can be rejected as probably due to broken characters.
  • One well-documented early commercial machine that dealt with a relatively unconstrained environment was the reader IBM installed at the U. S. Social Security Administration in 1965 [38].
  • There was no way for SSA to impose constraints on the printing process.

In the SSA reader segmentation was accomplished in two scans of a print line by a flying-spot scanner.

  • On the initial scan, from left to right, the character pitch distance D was estimated by analog circuitry.
  • On the return scan, right to left, the actual segmentation decisions were made using parameter D. The principal rule applied was that a double white column triggered a segmentation boundary.
  • If none was found within distance D, then segmentation was forced (a minimal sketch of this rule follows the list).
  • Hoffman and McCullough [43] generalized this process and gave it a more formal framework (see Fig. 5).
  • In their formulation the segmentation stage consisted of three steps.
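
The following is a minimal Python sketch of the double-white-column rule with forced segmentation at pitch distance D described above; the binary-image convention (ink = 1), the left-to-right scan direction, and all names are illustrative assumptions, not details taken from [38] or [43].

    import numpy as np

    def ssa_style_cuts(line_img, pitch_d):
        """Return candidate cut columns for a binary line image (ink = 1)."""
        blank = ~line_img.any(axis=0)          # True where a column contains no ink
        width = line_img.shape[1]
        cuts, col = [0], 1
        while col < width:
            if blank[col - 1] and blank[col]:  # double white column triggers a boundary,
                cuts.append(col)               # then the rest of the white run is skipped
                while col < width and blank[col]:
                    col += 1
            elif col - cuts[-1] >= pitch_d:    # no boundary within pitch distance D: force one
                cuts.append(col)
                col += 1
            else:
                col += 1
        return cuts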

3. Detection of end-of-character.

  • Sectioning, step 2, was the critical step.
  • An estimate of character pitch was a parameter of the process, although in experiments it was specified for 12-character per inch typewriting.
  • The vertical projection (also called the "vertical histogram") of a print line, Fig. 6a, consists of a simple running count of the black pixels in each column.
  • In [66], in segmenting Kanji handprinted addresses, columns where the projection fell below a predefined threshold were candidates for splitting the image (a short sketch of this idea follows the list).
  • In [1] the projection was first obtained, then the ratio of second derivative of this curve to its height was used as a criterion for choosing separating columns (see Fig. 6b).
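
A short sketch of projection-based candidate selection in the spirit of [66]: columns whose vertical projection falls below a threshold are proposed as split points. The threshold value and the function name are illustrative choices, not parameters from the cited work.

    import numpy as np

    def projection_split_candidates(line_img, threshold=1):
        projection = line_img.sum(axis=0)                  # black-pixel count per column
        return np.flatnonzero(projection <= threshold)     # candidate splitting columns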

A peak-to-valley function was designed to improve on this method in [59].

  • A minimum of the projection is located and the projection value noted.
  • The sum of the differences between this minimum value and the peaks on each side is calculated.
  • The ratio of the sum to the minimum value itself (plus 1, presumably to avoid division by zero) is the discriminator used to select segmentation boundaries.
  • This ratio exhibits a preference for a low valley with high peaks on both sides (see the sketch after this list).
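
A sketch of the peak-to-valley discriminator of [59] as just described. Locating the flanking peaks by searching a fixed window on each side of the valley is a simplification made here; the cited work does not necessarily bound the search this way.

    import numpy as np

    def peak_to_valley_score(projection, col, window=10):
        v = projection[col]                                          # projection value at the candidate valley
        left_peak = projection[max(0, col - window):col + 1].max()
        right_peak = projection[col:col + window + 1].max()
        # high score for a deep valley flanked by high peaks; +1 avoids division by zero
        return ((left_peak - v) + (right_peak - v)) / (v + 1)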

A different kind of prefiltering was used in [57] to sharpen discrimination in the vicinity of holes

  • In addition to the projection itself, the difference between upper and lower profiles of the pattern was used in a formula analogous to that of [1].
  • Here the "upper profile" is a function giving the maximum y-value of the black pixels for each column in the pattern array.
  • The lower profile is defined similarly on the minimum y-value in each column (both profiles are sketched below).
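
A sketch of the upper and lower profiles and their column-wise difference; assigning zero to columns with no ink is a choice of this sketch.

    import numpy as np

    def profile_difference(line_img):
        height, width = line_img.shape
        diff = np.zeros(width)
        for col in range(width):
            ys = np.flatnonzero(line_img[:, col])     # rows containing ink in this column
            if ys.size:
                diff[col] = ys.max() - ys.min()       # upper profile minus lower profile
        return diff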

An experimental comparison of character segmentation by projection analysis against an alternative segmenter has also been reported.

  • Both segmenters were tested on a large data base (272,870 handprinted digits) using the same follow-on classifier.
  • An algorithm may assume that some but not all input characters can be connected.
  • One of the earliest studies to use contour analysis for the detection of likely segmentation points was reported in [69].
  • This scheme was refined in a later technique [51][52], which determines not only "how" to segment characters but also "when" to segment them.
  • Then, several possible segmentation paths are generated.

In [85] an algorithm was constructed based on a categorization of the vertexes of stroke elements at

  • Segmentation consists in detecting the most likely contact point among the various vertexes proposed by analysis of the image, and performing a cut similar in concept to that illustrated in Fig.
  • Methods for defining splitting paths have been examined in a number of other studies as well.
  • The algorithm of [17] performs background analysis to extract the face-up and face-down valleys, strokes and loop regions of component images.
  • A "marriage score matrix" is then used to decide which pair of valleys is the most appropriate.
  • The separating path is deduced by combining three lines respectively segmenting the upper valley, the stroke and the lower valley.

A distance transform is applied to the input image in [31] in order to compute the splitting path.

  • The objective is to find a path that stays as far from character strokes as possible without excessive curvature.
  • This is achieved by employing the distance transform as a cost function (a sketch of such a cost map follows this list).
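
A sketch of turning the distance transform into a path cost, assuming a binary image with ink = 1; SciPy's Euclidean distance transform is used here for convenience, although [31] does not prescribe a particular implementation.

    from scipy.ndimage import distance_transform_edt

    def path_cost_map(char_img):
        dist_to_ink = distance_transform_edt(char_img == 0)  # distance of each background pixel to the nearest stroke
        return 1.0 / (1.0 + dist_to_ink)                     # cheap far from strokes, expensive next to them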

A shortest-path method investigated in [84] produces an "optimum" segmentation path using costs accumulated row by row.

  • The path is computed iteratively by considering successive rows in the image.
  • A one dimensional cost array contains the accumulated cost of a path emanating from a pre-determined starting point at the top of the image to each column of the current row.
  • Several tries can be made from different starting points.
  • The selection of the best solution is based on classification confidence (which is obtained using a neural network).
  • Redundant shortest-path calculations are avoided in order to improve segmentation speed (a simplified sketch of the row-by-row accumulation follows this list).
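
A simplified sketch of the row-by-row accumulation described above: acc[r, c] holds the cheapest cost of a path from the top row to pixel (r, c), where each step may stay in the same column or move to an adjacent one. The cost map could be, for instance, the one sketched earlier; multiple starting points and the speed-ups mentioned above are omitted.

    import numpy as np

    def min_cost_splitting_path(cost):
        h, w = cost.shape
        acc = cost.astype(float).copy()
        back = np.zeros((h, w), dtype=int)
        for r in range(1, h):
            for c in range(w):
                lo, hi = max(0, c - 1), min(w, c + 2)            # predecessors: columns c-1, c, c+1
                best = lo + int(np.argmin(acc[r - 1, lo:hi]))
                back[r, c] = best
                acc[r, c] += acc[r - 1, best]
        path = [int(np.argmin(acc[-1]))]                         # cheapest end point on the bottom row
        for r in range(h - 1, 0, -1):
            path.append(int(back[r, path[-1]]))
        return path[::-1], float(acc[-1].min())                  # column per row, and the total path cost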

In recognition of cursive writing it is common to analyze the image of a character string in order to locate reference lines such as the baseline.

  • This permits the ready detection of ascenders and descenders, features that can serve as "landmarks" for segmentation of the image.
  • This technique was applied to online recognition in pioneering work by Frischkopf and Harmon [36].
  • Using an estimate of character width, they dissected the image into patterns centered about the landmarks, and divided remaining image components on width alone.
  • This scheme does not succeed with letters such as "u", "n", "m", which do not contain landmarks.
  • The basic method for detecting ascenders and descenders has been adopted by many other researchers in later years.

2.2 Dissection with contextual postprocessing: graphemes

  • The system seeks to correct such errors by minimizing an edit distance between recognition output and words in a given lexicon (a minimal edit-distance sketch follows this list).
  • Thus it does not directly evaluate alternative segmentation hypotheses; it merely tries to correct poorly made ones.
  • A non-Markovian system reported in [12] uses a spell-checker to correct repeatedly-made merge and split errors in a complete text, rather than in single words as above.
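
A minimal sketch of edit-distance postprocessing: the recognizer's raw output string is replaced by the nearest word in a lexicon. The unit costs and the simple nearest-word rule are illustrative; the cited systems use more elaborate costs and statistics.

    def edit_distance(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution (0 if characters match)
            prev = cur
        return prev[-1]

    def correct(raw_output, lexicon):
        return min(lexicon, key=lambda word: edit_distance(raw_output, word))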

An alternative approach still based on dissection is to divide the input image into subimages that are not necessarily individual characters.

  • The dissection is performed at stable image features that may occur within or between characters; for example, a sharp downward indentation can occur in the center of an "M" or at the connection of two touching characters.
  • A contextual mapping function from grapheme classes to symbols can then complete the recognition process.
  • The dissection step of this process is sometimes called "pre-segmentation" or, when the intent is to leave no composite characters, "over-segmentation".
  • The classes recognized by the classifier did not correspond to letters, but to specific shapes that could be reliably segmented (typically combinations of letters, but also portions of letters).

As in [72], the grapheme concept has been applied mainly to cursive script by later researchers.

  • Techniques for dissecting cursive script are based on heuristic rules derived from visual observation.
  • There is no "magic" rule and it is not feasible to segment all handwritten words into perfectly separated characters in the absence of recognition.
  • In practice, this means that a single character decomposes into at most two graphemes, and conversely, a single grapheme represents at most a two- or three-character sequence.
  • The line segments that form connections between characters in cursive script are known as "ligatures".
  • Thus some dissection techniques for script seek "lower ligatures", connections near the baseline that link most lowercase characters (a crude detection heuristic is sketched after this list).
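
A heuristic sketch only, not a rule from the cited papers: columns whose ink is confined to a thin band just above a crudely estimated baseline are proposed as lower-ligature candidates. The band width and the baseline estimate are illustrative assumptions.

    import numpy as np

    def ligature_candidates(word_img, band=3):
        baseline = np.flatnonzero(word_img.any(axis=1)).max()    # crude baseline: lowest row containing ink
        candidates = []
        for col in range(word_img.shape[1]):
            ys = np.flatnonzero(word_img[:, col])
            if ys.size and ys.min() >= baseline - band:          # all ink in this column sits near the baseline
                candidates.append(col)
        return candidates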

In typical systems these problems are treated at a later contextual stage that jointly addresses segmentation and recognition.

  • Such processing is included in the system since cursive writing is often ambiguous without the help of lexical context.
  • The quality of segmentation still remains very much dependent on the effectiveness of the dissection scheme that produces the graphemes.
  • Dissection techniques based on the principle of detecting ligatures were developed in [22], [61] and [53].
  • The last study was based on a dual approach: — the detection of possible pre-segmentation zones, — the use of a "pre-recognition" algorithm, whose aim was not to recognize characters, but to evaluate whether a subimage defined by the pre-segmenter was likely to constitute a valid character.
  • These paths were chosen to respect several heuristic rules expressing continuity and connectivity constraints.

A similar presegmenter was presented in [42].

  • In this case analysis of the upper contour, and a set of rules based on contour direction, closure detection, and zone location, were used.
  • Upper contour analysis was also used in [47] for a pre-segmentation algorithm that served as part of the second stage of a hybrid recognition system.
  • The first stage of this system also implemented a form of the hit and deflect strategy previously mentioned.

A technique for segmenting handwritten strings of variable length was described in [27].

  • It employs upper and lower contour analysis and a splitting technique based on the hit and deflect strategy (a simplified hit-and-deflect cut is sketched after this list).
  • In this study presegmentation points were chosen in the neighborhood of these minima and emergency segmentation performed between points that were highly separated.
  • The method requires handwriting to be previously deslanted in order to ensure proper separation.
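
A much simplified sketch of a hit-and-deflect style cut: the path moves downward from a chosen start column and, whenever the straight move would hit ink, it deflects to a nearby background column. The deflection order and the behaviour when no deflection is possible are choices of this sketch.

    def hit_and_deflect_path(img, start_col):
        """img: 2D numpy array with ink = 1; returns the (row, column) cut path."""
        h, w = img.shape
        col, path = start_col, []
        for row in range(h):
            if img[row, col]:                        # straight move hits ink: try to deflect sideways
                for delta in (1, -1, 2, -2):
                    c = col + delta
                    if 0 <= c < w and not img[row, c]:
                        col = c
                        break
                # if no background neighbour exists, the path simply crosses the stroke
            path.append((row, col))
        return path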

3. Recognition-based segmentation

  • Methods considered here also segment words into individual units (which are usually letters).
  • However, the principle of operation is quite different.
  • Rather, the image is divided systematically into many overlapping pieces without regard to content.
  • Systems using such a principle perform "recognition-based" segmentation: letter segmentation is a byproduct of letter recognition, which may itself be driven by contextual analysis.
  • Thus the possibly misleading connotations of "segmentation-free" are avoided in the authors' terminology.

In recognition-based techniques, recognition can be performed by following either a serial or a parallel strategy.

  • In the first case, e.g. [11], recognition is done iteratively in a left-to-right scan of words, searching for a "satisfactory" recognition result.
  • The parallel method [48] proceeds in a more global way.
  • It generates a lattice of all (or many) possible feature-to-letter combinations.
  • The final decision is found by choosing an optimal path through the lattice (a small dynamic-programming sketch follows this list).
  • The windowing process can operate directly on the image pixels, or it can be applied in the form of weightings or groupings of positional feature measurements made on the images.
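
A small dynamic-programming sketch of the parallel, lattice-based idea: primitive cut positions define candidate segments spanning up to a few primitives, each segment is scored by a caller-supplied classifier-confidence function, and the highest-scoring consistent sequence is selected. The interface and the max_span limit are illustrative assumptions, not details of [48].

    def best_lattice_path(cuts, score, max_span=3):
        """cuts: sorted cut positions including both ends; score(lo, hi) -> classifier confidence."""
        n = len(cuts)
        best = [float("-inf")] * n
        best[0], choice = 0.0, [0] * n
        for j in range(1, n):
            for i in range(max(0, j - max_span), j):
                s = best[i] + score(cuts[i], cuts[j])
                if s > best[j]:
                    best[j], choice[j] = s, i
        segments, j = [], n - 1                      # backtrack the best sequence of segments
        while j > 0:
            segments.append((cuts[choice[j]], cuts[j]))
            j = choice[j]
        return segments[::-1], best[-1]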

As easy to state as these principles are, they were a long time in developing. Probably the earliest

  • Theoretical and experimental application of the concept is reported by Kovalevsky [48].
  • He developed a solution under the assumption that segmentation occurred along columns.
  • Kovalevsky’s model (Fig. 10) assumes that the probability of observing a given version of a prototype character is a spherically symmetric function of the difference between the two images.
  • Then the optimal objective function for segmentation is the sum of the squared distances between segmented images and matching prototypes (this objective is sketched after this list).
  • This process was implemented in hardware to produce a working OCR system.
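
A sketch of the objective just described: for a candidate set of cut columns, the cost is the sum, over segments, of the squared distance to the best-matching prototype. Width normalisation by zero-padding (and equal image heights) are assumptions of this sketch, not details of [48].

    import numpy as np

    def kovalevsky_objective(line_img, cuts, prototypes):
        """cuts: increasing column positions; prototypes: list of 2D arrays of the same height."""
        total = 0.0
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            segment = line_img[:, lo:hi]
            total += min(squared_distance(segment, p) for p in prototypes)
        return total

    def squared_distance(a, b):
        w = max(a.shape[1], b.shape[1])
        pa = np.pad(np.asarray(a, dtype=float), ((0, 0), (0, w - a.shape[1])))   # pad to a common width
        pb = np.pad(np.asarray(b, dtype=float), ((0, 0), (0, w - b.shape[1])))
        return float(((pa - pb) ** 2).sum())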

A technique combining dynamic programming and neural net recognition was proposed in [10].

  • This technique, called "Shortest Path Segmentation", selects the optimal consistent combination of cuts from a predefined set of windows.
  • Given this set of candidate cuts, all possible "legal" segments are constructed by combination.
  • The paths of this graph represent all the legal segmentations of the word.
  • Each node of the graph is then assigned a "distance" obtained by the neural net recognizer.
  • The method of "selective attention" [30] takes neural networks even further in the handling of segmentation problems.

A Hidden Markov Model (often abbreviated HMM) models variations in printing or cursive writing as an underlying probabilistic structure which is not directly observable.

  • This structure consists of a set of states plus transition probabilities between states.
  • In addition, the observations that the system makes on an image are represented as random variables whose distribution depends on the state.
  • These observations constitute a sequential feature representation of the input image.
  • The survey [34] provides an introduction to the use of HMMs in recognition applications (a minimal decoding sketch follows this list).
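
A minimal decoding sketch with exactly the ingredients listed above: a set of states, transition probabilities between states, and per-state observation probabilities. The Viterbi recursion below returns the most likely state sequence for a sequence of observation indices; this is generic HMM machinery rather than a particular system from the survey.

    import numpy as np

    def viterbi(observations, start_p, trans_p, emit_p):
        """start_p[s], trans_p[s, s'], emit_p[s, o] are probabilities; observations are indices."""
        logp = np.log(start_p) + np.log(emit_p[:, observations[0]])
        back = []
        for o in observations[1:]:
            step = logp[:, None] + np.log(trans_p)   # step[i, j]: best score of being in i, then moving to j
            back.append(step.argmax(axis=0))
            logp = step.max(axis=0) + np.log(emit_p[:, o])
        states = [int(logp.argmax())]                # backtrack the most likely state sequence
        for b in reversed(back):
            states.append(int(b[states[-1]]))
        return states[::-1]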

A method stemming from concepts used in machine vision for recognition of occluded objects has also been reported.

  • Here various features and their positions of occurrence are recorded for an image.
  • The positions are quantized into bins such that the evidence for each character indicated in a bin can be summed.
  • These scores are subjected to contextual processing using a predefined lexicon in order to recognize words.

A method that recognizes word feature graphs is presented in [71]. This system attempts to match

  • Subgraphs of features with predefined character prototypes.
  • Dynamic programming was used with a warping function that permitted the process to skip unnecessary features.
  • In the system proposed in [5] a sequence of structural features (like x- and y-extrema, curvature signs, cusps, crossings, penlifts, and closures) was extracted from the word to generate all the legible sequences of letters.
  • Then, the "aspect" of the word (which was deduced from ascender and descender detection) was taken into account to choose the best solution(s) among the list of generated words.
  • The letters were predicted by finding in the letter tree the paths compatible with the extracted features and were verified by checking their compatibility with the word dictionary.

A different approach uses the concept of regularities and singularities [77]. In this system, a stroke graph representing the word is obtained after skeletonization.

  • The "singular parts", which are supposed to convey most of the information, were deduced by eliminating "regular part" of the word (the sinusoidlike path joining all cursive ligatures).
  • The most robust features and characters (the "anchors") were then detected from a description chain derived from these singular parts and dynamic matching was used for analyzing the remaining parts.

A top-down directed word verification method called "backward matching" (see Fig. 14) has also been proposed.

  • In cursive word recognition, all letters do not have the same discriminating power, and some of them are easier to recognize.
  • So, in this method, recognition is not performed in a left-to-right scan, but follows a "meaningful" order which depends on the visual and lexical significance of the letters.
  • Moreover, this order also follows an edge-toward-center movement, as in human vision [82].
  • Matching between symbolic and physical descriptions can be performed at the letter, feature and even sub-feature levels.
  • This system is an attempt to provide a general framework allowing efficient cooperation between low-level and high-level recognition processes.

4. Mixed strategies: "Oversegmenting"

  • Two radically different segmentation strategies have been considered to this point.
  • One (Section 2) attempts to choose the correct segmentation points (at least for generating graphemes) by a general analysis of image features.

In this section intermediate approaches, essentially hybrids of the first two, are discussed.

  • This family of methods also uses presegmenting, with requirements that are not as strong as in the grapheme approach.
  • Here a great deal of effort was expended in analyzing the shapes of pairs of touching digits in the neighborhood of contact, leading to algorithms for determining likely separation boundaries.
  • Each candidate segmentation was tested separately by classification, and the split giving the highest recognition confidence was accepted.
  • In the first step a set of likely cutting paths is determined, and the input image is divided into elementary components by separating along each path.
  • All combinations meeting certain acceptability constraints (such as size, position, etc.) are produced and scored by classification confidence (see the sketch after this list).
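
A sketch of the combination step just described: adjacent elementary components are merged into candidate characters subject to a simple width constraint, each candidate is scored by a caller-supplied classifier confidence, and dynamic programming selects the highest-confidence consistent sequence. The constraint and the interface are illustrative assumptions.

    import numpy as np

    def best_oversegmentation(pieces, score, max_merge=3, max_width=40):
        """pieces: elementary subimages (same height) in left-to-right order; score(img) -> confidence."""
        n = len(pieces)
        best = [float("-inf")] * (n + 1)
        best[0], choice = 0.0, [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(max(0, j - max_merge), j):
                candidate = np.hstack(pieces[i:j])          # recombine adjacent elementary components
                if candidate.shape[1] > max_width:          # acceptability constraint on size
                    continue
                s = best[i] + score(candidate)
                if s > best[j]:
                    best[j], choice[j] = s, i
        groups, j = [], n                                   # backtrack which pieces form each character
        while j > 0:
            groups.append((choice[j], j))
            j = choice[j]
        return groups[::-1], best[-1]                       # best[-1] is -inf if no admissible grouping exists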

It is also possible to carry out an oversegmenting procedure sequentially by evaluating trial separation boundaries one at a time.

  • In this work a neural net was trained to detect likely cutting columns for machine printed characters using neighborhood characteristics.
  • Using these as a base, the optimization algorithm recursively explored a tree of possible segmentation hypotheses.
  • The left column was fixed at each step, and various right columns were evaluated using recognition confidence.
  • Recursion is used to vary the left column as well, but pruning rules are employed to avoid testing all possible combinations.

A holistic process recognizes an entire word as a unit. A major drawback of this class of methods is that their use is restricted to a predefined lexicon of words.

  • This point is especially critical when training on word samples is required: a training stage is thus mandatory to expand or modify the lexicon of possible words.
  • Recognition was based on the comparison of a collection of simple features extracted from the whole word against a lexicon of "codes" representing the "theoretical" shape of the possible words (a rough word-shape matching sketch follows this list).
  • This strategy still typifies recent holistic methods.
  • The "middle zone" was not delimited by straight lines, but by means of smooth curves ated to every feature and uncertainty coefficients were introduced to make this representation more tolerant to distortion by avoiding binary decisions.
  • Moreover, this second system also implement several Markov models at different recognition stages (word recognition and cheque amount recognition).
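
A heuristic sketch of the holistic "code" idea mentioned above: each lexicon word is mapped to a coarse shape code (ascender, descender, or neutral per letter), and candidates are ranked by the similarity between that code and a code derived elsewhere from the word image. The letter classes and the similarity measure are illustrative assumptions, not the features used by the cited systems.

    import difflib

    ASCENDERS, DESCENDERS = set("bdfhklt"), set("gjpqy")

    def shape_code(word):
        return "".join("A" if c in ASCENDERS else "D" if c in DESCENDERS else "x"
                       for c in word.lower())

    def rank_lexicon(observed_code, lexicon):
        similarity = lambda w: difflib.SequenceMatcher(None, observed_code, shape_code(w)).ratio()
        return sorted(lexicon, key=similarity, reverse=True)    # most plausible words first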

In the machine printed text area characters are regular, so that feature representations are stable, and in a long document repetitions of the most common words occur with predictable frequency.

  • In [45] these characteristics were combined to cluster the ten most common short words with good accuracy, as a precursor to word recognition.
  • It was suggested that identification of the clusters could be done on the basis of unigram and bigram frequencies.
  • More general applications require a dynamic generation stage of holistic descriptions.
  • The system was able to achieve a 50% size reduction with under 2% error.

6. Concluding remarks

  • Methods for treating the problem of segmentation in character recognition have developed remarkably in the last decade.
  • It is hoped that this comprehensive discussion will provide insight into the concepts involved, and perhaps provoke further advances in the area.
  • For cursive script from many writers and a large vocabulary, at the other extreme, methods of ever increasing sophistication are being pursued.
  • The authors have not attempted to compare the effectiveness of algorithms, or to discuss the crucial topic of evaluation.
  • The authors apologize to researchers whose important contributions may have been overlooked.



A SURVEY OF METHODS AND STRATEGIES IN
CHARACTER SEGMENTATION
Richard G. Casey and Eric Lecolinet
ENST Paris and IBM Almaden Research Center
ENST Paris
ABSTRACT
Character segmentation has long been a critical area of the OCR process. The higher
recognition rates for isolated characters vs. those obtained for words and connected
character strings well illustrate this fact. A good part of recent progress in reading
unconstrained printed and written text may be ascribed to more insightful handling of
segmentation.
This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list
sources. Segmentation methods are listed under four main headings. What may be
termed the "classical" approach consists of methods that partition the input image into
subimages, which are then classified. The operation of attempting to decompose the
image into classifiable units is called "dissection". The second class of methods avoids
dissection, and segments the image either explicitly, by classification of prespecified
windows, or implicitly by classification of subsets of spatial features collected from the
image as a whole. The third strategy is a hybrid of the first two, employing dissection
together with recombination rules to define potential segments, but using classification
to select from the range of admissible segmentation possibilities offered by these
subimages. Finally, holistic approaches that avoid segmentation by recognizing entire
character strings as units are described.

KEYWORDS
Optical character recognition, character segmentation, survey, holistic recognition, Hidden Markov Models, graphemes, contextual methods, recognition-based segmentation

1. Introduction
1.1. The role of segmentation in recognition processing
Character segmentation is an operation that seeks to decompose an image of a sequence of characters into subimages of individual symbols. It is one of the decision processes in a system for optical character recognition (OCR). Its decision, that a pattern isolated from the image is that of a character (or some other identifiable unit), can be right or wrong. It is wrong sufficiently often to make a major contribution to the error rate of the system.
In what may be called the "classical" approach to OCR, Fig. 1, segmentation is the initial step in a
three-step procedure:
Given a starting point in a document image:
1. Find the next character image.
2. Extract distinguishing attributes of the character image.
3. Find the member of a given symbol set whose attributes best match those of the input, and output
its identity.
This sequence is repeated until no additional character images are found (a toy, self-contained sketch of this loop is given below).
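
The following is a toy illustration of this loop on a binary line image (ink = 1): characters are assumed to be separated by blank columns, the "features" are simply the raw pixels, and classification is nearest-prototype matching. All of this is an illustrative simplification of the three steps, not the method of any particular system discussed in this paper.

    import numpy as np

    def classical_ocr(line_img, prototypes):
        """line_img: 2D 0/1 array; prototypes: dict label -> 2D 0/1 array of the same height."""
        ink_cols = line_img.any(axis=0)
        labels, col = [], 0
        while col < line_img.shape[1]:
            if not ink_cols[col]:
                col += 1
                continue
            start = col                                   # step 1: isolate the next character image
            while col < line_img.shape[1] and ink_cols[col]:
                col += 1
            segment = line_img[:, start:col]
            # steps 2 and 3: here the "attributes" are raw pixels, matched to the nearest prototype
            labels.append(min(prototypes, key=lambda c: pixel_distance(segment, prototypes[c])))
        return labels

    def pixel_distance(a, b):
        w = max(a.shape[1], b.shape[1])
        pa = np.pad(np.asarray(a, dtype=float), ((0, 0), (0, w - a.shape[1])))   # pad to a common width
        pb = np.pad(np.asarray(b, dtype=float), ((0, 0), (0, w - b.shape[1])))
        return float(((pa - pb) ** 2).sum())
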
An implementation of step 1, the segmentation step, requires answering a simply-posed question: "What constitutes a character?" The many researchers and developers who have tried to provide an algorithmic answer to this question find themselves in a Catch-22 situation. A character is a pattern that resembles one of the symbols the system is designed to recognize. But to determine such a resemblance the pattern must be segmented from the document image. Each stage depends on the other, and in complex cases it is paradoxical to seek a pattern that will match a member of the system’s recognition alphabet of symbols without incorporating detailed knowledge of the structure of those symbols into the process.
Furthermore, the segmentation decision is not a local decision, independent of previous and subsequent decisions. Producing a good match to a library symbol is necessary, but not sufficient, for reliable recognition. That is, a poor match on a later pattern can cast doubt on the correctness of the current segmentation/recognition result. Even a series of satisfactory pattern matches can be judged incorrect if contextual requirements on the system output are not satisfied. For example, the letter sequence "cl" can often closely resemble a "d", but usually such a choice will not constitute a contextually valid result.
Thus it is seen that the segmentation decision is interdependent with local decisions regarding shape similarity, and with global decisions regarding contextual acceptability. This sentence summarizes the refinement of character segmentation processes in the past 40 years or so. Initially, designers sought to perform segmentation as per the "classical" sequence listed above. As faster, more powerful electronic circuitry has encouraged the application of OCR to more complex documents, designers have realized that step 1 can not be divorced from the other facets of the recognition process.
In fact, researchers have been aware of the limitations of the classical approach for many years.
Researchers in the 1960s and 1970s observed that segmentation caused more errors than shape distortions
in reading unconstrained characters, whether hand- or machine-printed. The problem was often masked
in experimental work by the use of databases of well-segmented patterns, or by scanning character strings
printed with extra spacing. In commercial applications stringent requirements for document preparation
were imposed. By the beginning of the 1980’s workers were beginning to encourage renewed research
interest [73] to permit extension of OCR to less constrained documents.
The problems of segmentation persist today. The well-known tests of commercial printed text OCR systems by University of Nevada, Las Vegas [64][65] consistently ascribe a high proportion of errors to segmentation. Even when perfect patterns, the bitmapped characters that are input to digital printers, were recognized, commercial systems averaged 0.5% spacing errors. This is essentially a segmentation error by a process that attempts to isolate a word subimage. The article [6] emphatically illustrates the woes of current machine-print recognition systems as segmentation difficulties increase (see Fig. 2). The degradation in performance of NIST tests of handwriting recognition on segmented [86] and unsegmented [88] images underscores the continuing need for refinement and fresh approaches in this area. On the positive side of the ledger, the study [29] shows the dramatic improvements that can be obtained when a thoroughgoing segmentation scheme replaces one of prosaic design.
Some authors previously have surveyed segmentation, often as part of a more comprehensive work,
e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]. In the present
paper we present a survey whose focus is character segmentation, and which attempts to provide broad
coverage of the topic.
1.2 Organization of methods
A major problem in discussing segmentation is how to classify methods. Tappert et al. [81], for example, speak of "external" vs. "internal" segmentation, depending on whether recognition is required in the process. Dunn and Wang [20] use "straight segmentation" and "segmentation-recognition" for a
similar dichotomization.
A somewhat different point of view is proposed in this paper. The division according to use or non-use of recognition in the process fails to make clear the fundamental distinctions among present-day approaches. For example, it is not uncommon in text recognition to use a spelling corrector as a postprocessor. This stage may propose the substitution of two letters for a single letter output by the classifier. This is in effect a use of recognition to resegment the subimage involved. However, the process represents only a trivial advance on traditional methods that segment independent of recognition.

In this paper the distinction between methods is based on how segmentation and classification interact in the overall process. In the example just cited, segmentation is done in two stages, one before and one after image classification. Basically an unacceptable recognition result is re-examined and modified by an (implied) resegmentation. This is a rather "loose" coupling of segmentation and classification.
A more profound interaction between the two aspects of recognition occurs when a classifier is
invoked to select the segments from a set of possibilities. In this family of approaches segmentation and
classification are integrated. To some observers it even appears that the classifier performs segmentation
since, conceptually at least, it could select the desired segments by exhaustive evaluation of all possible
sets of subimages of the input image.
After reviewing available literature, we have concluded that there are three "pure" strategies for segmentation, plus numerous hybrid approaches that are weighted combinations of these three. The elemental strategies are:
1. the classical approach, in which segments are identified based on "character-like" properties. This
process of cutting up the image into meaningful components is given a special name, "dissection",
in discussions below.
2. recognition-based segmentation, in which the system searches the image for components that match
classes in its alphabet.
3. holistic methods, in which the system seeks to recognize words as a whole, thus avoiding the need
to segment into characters.
In strategy (1) the criterion for good segmentation is the agreement of general properties of the segments obtained with those expected for valid characters. Examples of such properties are height, width, separation from neighboring components, disposition along a baseline, etc. In method (2) the criterion is recognition confidence, perhaps including syntactic or semantic correctness of the overall result. Holistic methods (3) in essence revert to the classical approach with words as the alphabet to be read. The reader interested to obtain an early illustration of these basic techniques may glance ahead to Fig. 6 for examples of dissection processes, Fig. 13 for a recognition-based strategy, and Fig. 16 for a holistic approach.
Although examples of these basic strategies are offered below, much of the literature reviewed for this survey reports a blend of methods, using combinations of dissection, recognition searching, and word characteristics. Thus, although the paper necessarily has a discrete organization, the situation is perhaps better conceived as in Fig. 3. Here the three fundamental strategies occupy orthogonal axes: hybrid methods can be represented as weighted combinations of these lying at points in the intervening space. There is a continuous space of segmentation strategies rather than a discrete set of classes with well-defined boundaries. Of course, such a space exists only conceptually; it is not meaningful to assign precise weights to the elements of a particular combination.

Citations
Journal ArticleDOI
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentification, handwriting learning tools are also considered.

2,653 citations

Journal ArticleDOI
TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
Abstract: A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

592 citations


Cites methods from "A survey of methods and strategies ..."

  • ...Since documents usually have clean backgrounds, binarization methods [10] are often adopted for segmenting characters....

Journal ArticleDOI
TL;DR: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.
Abstract: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.

544 citations

Journal ArticleDOI
01 May 2001
TL;DR: The historical evolution of CR systems is presented, the available CR techniques, with their superiorities and weaknesses, are reviewed and directions for future research are suggested.
Abstract: Character recognition (CR) has been extensively studied in the last half century and has progressed to a level that is sufficient to produce technology-driven applications. Now, rapidly growing computational power is enabling the implementation of the present CR methodologies and is creating an increasing demand in many emerging application domains which require more advanced methodologies. This paper serves as a guide and update for readers working in the CR area. First, the historical evolution of CR systems is presented. Then, the available CR techniques, with their superiorities and weaknesses, are reviewed. Finally, the current status of CR is discussed and directions for future research are suggested. Special attention is given to off-line handwriting recognition, since this area requires more research in order to reach the ultimate goal of machine simulation of human reading.

517 citations

Book ChapterDOI
05 Sep 2010
TL;DR: It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.
Abstract: We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs - text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines - one open source and one proprietary - with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

503 citations

References
Journal ArticleDOI
TL;DR: The state of the art of online handwriting recognition during a period of renewed activity in the field is described, based on an extensive review of the literature, including journal articles, conference proceedings, and patents.
Abstract: This survey describes the state of the art of online handwriting recognition during a period of renewed activity in the field. It is based on an extensive review of the literature, including journal articles, conference proceedings, and patents. Online versus offline recognition, digitizer technology, and handwriting properties and recognition problems are discussed. Shape recognition algorithms, preprocessing and postprocessing techniques, experimental systems, and commercial products are examined. >

922 citations


Additional excerpts

  • ...e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]....

  • ...A major problem in discussing segmentation is how to classify methods. Tappert et al [81], for...

Journal ArticleDOI
TL;DR: In this paper, a word image is transformed through a hierarchy of representation levels: points, contours, features, letters, and words, and a unique feature representation is generated bottom-up from the image using statistical dependences between letters and features.
Abstract: Cursive script word recognition is the problem of transforming a word from the iconic form of cursive writing to its symbolic form. Several component processes of a recognition system for isolated offline cursive script words are described. A word image is transformed through a hierarchy of representation levels: points, contours, features, letters, and words. A unique feature representation is generated bottom-up from the image using statistical dependences between letters and features. Ratings for partially formed words are computed using a stack algorithm and a lexicon represented as a trie. Several novel techniques for low- and intermediate-level processing for cursive script are described, including heuristics for reference line finding, letter segmentation based on detecting local minima along the lower contour and areas with low vertical profiles, simultaneous encoding of contours and their topological relationships, extracting features, and finding shape-oriented events. Experiments demonstrating the performance of the system are also described. >

502 citations

Journal ArticleDOI
TL;DR: Describes a complete system for the recognition of off-line handwriting, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness.
Abstract: Describes a complete system for the recognition of off-line handwriting. Preprocessing techniques are described, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness. Representation of the image is discussed and the skeleton and stroke features used are described. A recurrent neural network is used to estimate probabilities for the characters represented in the skeleton. The operation of the hidden Markov model that calculates the best word in the lexicon is also described. Issues of vocabulary choice, rejection, and out-of-vocabulary word recognition are discussed.

271 citations

Journal ArticleDOI
01 Jul 1992
TL;DR: A pattern- oriented segmentation method for optical character recognition that leads to document structure analysis is presented, and an extended form of pattern-oriented segmentation, tabular form recognition, is considered.
Abstract: A pattern-oriented segmentation method for optical character recognition that leads to document structure analysis is presented. As a first example, segmentation of handwritten numerals that touch are treated. Connected pattern components are extracted, and spatial interrelations between components are measured and grouped into meaningful character patterns. Stroke shapes are analyzed and a method of finding the touching positions that separates about 95% of connected numerals correctly is described. Ambiguities are handled by multiple hypotheses and verification by recognition. An extended form of pattern-oriented segmentation, tabular form recognition, is considered. Images of tabular forms are analyzed, and frames in the tabular structure are extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized. >

243 citations


"A survey of methods and strategies ..." refers methods in this paper

  • ...e.g., cursive recognition [36] [19] [20] [55] [58] [81], or document analysis [23] [29]....

  • ...the positive side of the ledger, the study [29] shows the dramatic improvements that can be obtained...

  • ...The strategy in a simple form is illustrated in [29]....

Journal ArticleDOI
TL;DR: A cursive script recognition program which has correctly identified 79 per cent of a test sample of 84 words, which compares favorably in performance level with previously reported programs appropriately “normalized”, while not requiring input pertaining to stroke sequence and stroke segmentation that is essential to these other programs.

232 citations


"A survey of methods and strategies ..." refers background or methods in this paper

  • ...The first reported use of this concept was probably [72], a report on a system for off-line cursive script recognition....

  • ...As in [72], the grapheme concept has been applied mainly to cursive script by later researchers....

Frequently Asked Questions (10)
Q1. What are the contributions mentioned in the paper "A survey of methods and strategies in character segmentation" ?

This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. 

The authors apologize to researchers whose important contributions may have been overlooked. 

Upper contour analysis was also used in [47] for a pre-segmentation algorithm that served as part of the second stage of a hybrid recognition system. 

By testing their adjacency relationships to perform merging, or their size and aspect ratios to trigger splitting mechanisms, much of the segmentation task can be accurately performed at a low cost in computation. 

In [67], words and letters were represented by means of tree dictionaries: possible words were described by a letter tree (also called a "trie") and letters were described by a feature tree. 

Splitting of an image classified as connected is then accomplished by finding characteristic landmarks of the image that are likely to be segmentation points, rejecting those that appear to be situated within a character, and implementing a suitable cutting path. 

In many current studies, as the authors shall see, segmentation is a complex process, and there is a need for a term such as "dissection" to distinguish the image-cutting subprocess from the overall segmentation, which may use contextual knowledge and/or character shape description.

The authors noted that the technique was heavily dependent on the quality of the input images, and tended to fail on both very heavy or very light printing. 

The twin facts that early OCR development dealt with constrained inputs, while research was mainly concerned with representation and classification of individual symbols, explain why segmentation is so rarely mentioned in pre-70s literature.

As the system knows in advance what it is searching for, it can make use of high-level contextual knowledge to improve recognition, even at low-level stages.