
Showing papers on "Optical character recognition published in 1992"


Journal ArticleDOI
TL;DR: The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described, and the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.
Abstract: Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
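To make the X-Y tree idea concrete, here is a minimal recursive X-Y cut sketch over a binary page image (NumPy array, 1 = ink). It is not Gobbledoc's implementation; the gap threshold `min_gap` and the cut-selection rule are illustrative assumptions.

```python
import numpy as np

def xy_cut(page, min_gap=10, top=0, left=0, blocks=None):
    """Recursively split a binary page (1 = ink) into rectangular blocks.

    min_gap is a hypothetical threshold: the smallest run of blank rows or
    columns treated as a cut. Returns blocks as (top, left, bottom, right).
    """
    if blocks is None:
        blocks = []
    h, w = page.shape
    rows = page.sum(axis=1)   # horizontal projection profile
    cols = page.sum(axis=0)   # vertical projection profile

    def blank_runs(profile):
        runs, start = [], None
        for i, v in enumerate(profile):
            if v == 0 and start is None:
                start = i
            elif v != 0 and start is not None:
                runs.append((start, i))
                start = None
        if start is not None:
            runs.append((start, len(profile)))
        # keep only interior gaps wide enough to be real separators
        return [(a, b) for a, b in runs
                if b - a >= min_gap and a > 0 and b < len(profile)]

    row_cuts, col_cuts = blank_runs(rows), blank_runs(cols)
    if not row_cuts and not col_cuts:
        blocks.append((top, left, top + h, left + w))   # leaf block
        return blocks

    # cut along the widest blank gap, then recurse on the two halves
    widest_row = max((b - a for a, b in row_cuts), default=0)
    widest_col = max((b - a for a, b in col_cuts), default=0)
    if widest_row >= widest_col:
        a, b = max(row_cuts, key=lambda r: r[1] - r[0])
        xy_cut(page[:a, :], min_gap, top, left, blocks)
        xy_cut(page[b:, :], min_gap, top + b, left, blocks)
    else:
        a, b = max(col_cuts, key=lambda r: r[1] - r[0])
        xy_cut(page[:, :a], min_gap, top, left, blocks)
        xy_cut(page[:, b:], min_gap, top, left + b, blocks)
    return blocks
```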

466 citations


Journal ArticleDOI
Ching Y. Suen, C. Nadal, R. Legault, T.A. Mai, Louisa Lam
01 Jul 1992
TL;DR: It is shown that it is possible to reduce the substitution rate to a desired level while maintaining a fairly high recognition rate in the classification of totally unconstrained handwritten ZIP code numerals.
Abstract: Four independently developed expert algorithms for recognizing unconstrained handwritten numerals are presented. All have high recognition rates. Different experimental approaches for incorporating these recognition methods into a more powerful system are also presented. The resulting multiple-expert system proves that the consensus of these methods tends to compensate for individual weaknesses, while preserving individual strengths. It is shown that it is possible to reduce the substitution rate to a desired level while maintaining a fairly high recognition rate in the classification of totally unconstrained handwritten ZIP code numerals. If reliability is of the utmost importance, substitutions can be avoided completely (reliability=100%) while retaining a recognition rate above 90%. Results are compared with those for some of the most effective numeral recognition systems found in the literature.
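The reported trade-off between substitution and rejection can be illustrated with a generic consensus rule (not the paper's specific combination schemes): accept a digit only when enough experts agree, otherwise reject. Raising the agreement threshold lowers the substitution rate at the cost of a lower recognition rate.

```python
from collections import Counter

def combine(expert_labels, min_agreement):
    """Return the majority label if at least min_agreement experts concur, else None (reject)."""
    label, votes = Counter(expert_labels).most_common(1)[0]
    return label if votes >= min_agreement else None

def rates(decisions, truths):
    """Recognition, substitution, and rejection rates over a labeled test set."""
    n = len(truths)
    recognized = sum(d == t for d, t in zip(decisions, truths) if d is not None)
    rejected = sum(d is None for d in decisions)
    substituted = n - recognized - rejected
    return recognized / n, substituted / n, rejected / n

# Example: four experts vote on one digit; requiring unanimity rejects it.
print(combine(["3", "3", "3", "5"], min_agreement=3))  # -> "3"
print(combine(["3", "3", "3", "5"], min_agreement=4))  # -> None (rejected)
```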

422 citations


Journal ArticleDOI
01 Jul 1992
TL;DR: A pattern- oriented segmentation method for optical character recognition that leads to document structure analysis is presented, and an extended form of pattern-oriented segmentation, tabular form recognition, is considered.
Abstract: A pattern-oriented segmentation method for optical character recognition that leads to document structure analysis is presented. As a first example, the segmentation of touching handwritten numerals is treated. Connected pattern components are extracted, and spatial interrelations between components are measured and grouped into meaningful character patterns. Stroke shapes are analyzed and a method of finding the touching positions that separates about 95% of connected numerals correctly is described. Ambiguities are handled by multiple hypotheses and verification by recognition. An extended form of pattern-oriented segmentation, tabular form recognition, is considered. Images of tabular forms are analyzed, and frames in the tabular structure are extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized.
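The paper's separation method analyzes stroke shapes around candidate touching positions; the sketch below shows only the simplest baseline idea it improves upon, cutting a connected pattern where the vertical projection is weakest near its center.

```python
import numpy as np

def split_touching_pair(binary):
    """Cut a connected two-digit pattern (2-D array, 1 = ink) at the thinnest
    column in the central third of its width; a crude stand-in for the paper's
    stroke-shape analysis."""
    width = binary.shape[1]
    profile = binary.sum(axis=0)              # ink count per column
    lo, hi = width // 3, 2 * width // 3       # search only near the middle
    cut = lo + int(np.argmin(profile[lo:hi]))
    return binary[:, :cut], binary[:, cut:]
```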

243 citations


Patent
10 Jan 1992
TL;DR: In this paper, an electronic organizer is provided that incorporates an internal electronic scanner and a touch sensitive display screen to enter text and image data; the entered data is arranged in a relational database format, which allows the operator to quickly and easily enter and retrieve related information between a number of different databases with a minimal amount of effort.
Abstract: An electronic organizer is provided that incorporates an internal electronic scanner and a touch sensitive display screen to enter text and image data. The internal electronic scanner permits both machine generated text and image data to be scanned and directly entered into the electronic organizer. Hand-printed text data is also entered directly via the touch sensitive display screen using a stylus or pen. The scanned machine generated text, the scanned image data and the hand-printed text can either be preserved as an image-oriented bit map, or optical character recognition routines can be applied to the data to identify characters and convert the identified characters to computer coded text data. Data entered into the electronic organizer is arranged in a relational database format, which permits the operator to quickly and easily enter and retrieve related information between a number of different databases with a minimal amount of effort. A small document transport mechanism is provided to aid in the scanning of small size documents.

229 citations


Journal ArticleDOI
TL;DR: It is shown that neural network classifiers with single-layer training can be applied efficiently to complex real-world classification problems such as the recognition of handwritten digits, and that, provided appropriate data representations and learning rules are used, performance comparable to that obtained by more complex networks can be achieved.
Abstract: It is shown that neural network classifiers with single-layer training can be applied efficiently to complex real-world classification problems such as the recognition of handwritten digits. The STEPNET procedure, which decomposes the problem into simpler subproblems which can be solved by linear separators, is introduced. Provided appropriate data representations and learning rules are used, performance comparable to that obtained by more complex networks can be achieved. Results from two different databases are presented: a European database comprising 8700 isolated digits and a zip code database from the US Postal Service comprising 9000 segmented digits. A hardware implementation of the classifier is briefly described.
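As an illustration of the underlying idea (decompose digit classification into subproblems that single-layer linear separators can solve), not of the published STEPNET procedure itself, a pairwise scheme with scikit-learn perceptrons might look like the following; the small built-in digits set is only a stand-in for the databases mentioned above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

# Small built-in digits set as a stand-in for the isolated-digit databases.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each pairwise subproblem is handled by a single-layer linear separator.
clf = OneVsOneClassifier(Perceptron(max_iter=1000, tol=1e-3))
clf.fit(X_train, y_train)
print("pairwise linear separators, test accuracy:", clf.score(X_test, y_test))
```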

196 citations


Journal ArticleDOI
01 Jul 1992
TL;DR: The state of the art in handwriting recognition, especially in cursive word recognition, is surveyed, and some basic notions are reviewed in the field of picture recognition, particularly line image recognition.
Abstract: The state of the art in handwriting recognition, especially in cursive word recognition, is surveyed, and some basic notions are reviewed in the field of picture recognition, particularly line image recognition. The usefulness of 'regular' versus 'singular' classes of features is stressed. These notions are applied to obtain a graph, G, representing a line image, and also to find an 'axis' as the regular part of G. The complement of the axis in G consists of the 'tarsi', the singular parts of G, which correspond to informative features of a cursive word. A segmentation of the graph is obtained, giving a symbolic description chain (SDC). Using one or more tarsi as robust anchors, possible words in a list of words are selected. Candidate words are examined to see if the other letters fit the rest of the SDC. Good results are obtained for clean images of words written by several persons.

183 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors extend the dynamic time warping algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications.
Abstract: The authors extend the dynamic time warping (DTW) algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications. Although direct application of the optimality principle reduced the computational complexity somewhat, the DPW (or image alignment) problem is exponential in the dimensions of the image. It is shown that by applying constraints to the image alignment problem, e.g., limiting the class of possible distortions, one can reduce the computational complexity dramatically, and find the optimal solution to the constrained problem in linear time. A statistical model, the planar hidden Markov model (PHMM), describing statistical properties of images is proposed. The PHMM approach was evaluated using a set of isolated handwritten digits. An overall digit recognition accuracy of 95% was achieved. It is expected that the advantage of this approach will be even more significant for harder tasks, such as cursive-writing recognition and spotting.
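The constrained two-dimensional warping and the PHMM are the paper's contribution; for context, the one-dimensional DTW recurrence that it generalizes can be written in a few lines (feature sequences assumed to be 1-D NumPy arrays).

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, or match along the warping path
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```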

162 citations


Patent
15 Jul 1992
TL;DR: In this paper, a bit-mapped representation of the page is stored in a memory means such as the memory of a computer system, and a processor processes the bit-mapped image to produce an output comprising coded character representations of the text on the page.
Abstract: A system for recognition of characters on a medium. The system includes a scanner for scanning a medium such as a page of printed text and graphics and producing a bit-mapped representation of the page. The bit-mapped representation of the page is then stored in a memory means such as the memory of a computer system. A processor processes the bit-mapped image to produce an output comprising coded character representations of the text on the page. The present invention discloses parsing a page to allow for production of the output characters in a logical sequence, a combination of feature detection methods and template matching methods for recognition of characters, and a number of methods for feature detection, such as the use of statistical data and polygon fitting.
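The patent combines feature detection with template matching; as a hedged illustration of the template-matching half only, a normalized-correlation score between a size-normalized glyph and a stored template could be computed as follows (the normalization and the classification rule are assumptions, not taken from the patent).

```python
import numpy as np

def template_score(glyph, template):
    """Normalized correlation between a candidate glyph and a stored template,
    both 2-D arrays already scaled to the same shape."""
    g = glyph.astype(float) - glyph.mean()
    t = template.astype(float) - template.mean()
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    return float((g * t).sum() / denom) if denom else 0.0

def classify(glyph, templates):
    """templates: {character: template array}; returns the best-matching character."""
    return max(templates, key=lambda c: template_score(glyph, templates[c]))
```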

152 citations


Journal ArticleDOI
TL;DR: It is shown that the image representation, vectorization techniques, and optical character recognition subsystem are quite general and that the methodology implemented in the system can be generalized to the acquisition of other classes of line drawings.
Abstract: A system for the automatic acquisition of land register maps is described. The system converts paper-based documents into digital form for integration into an existing database. Processing a map begins with its digitization by a scanning device. A key step in the system is the conversion from raster format to graph representation, a special binary image format suitable for processing line structures. Subsequent steps include vectorization of the line structures and recognition of the symbols interspersed in the drawing. The system requires operator interaction only to resolve ambiguities and correct errors in the automatic processes. The final result is a set of descriptors for all detected map entities, which is then stored in a database. It is shown that the image representation, vectorization techniques, and optical character recognition subsystem are quite general and that the methodology implemented in the system can be generalized to the acquisition of other classes of line drawings.

140 citations


Journal ArticleDOI
01 Jul 1992
TL;DR: It is argued that it is time for a major change of approach to optical character recognition (OCR) research, and new OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components.
Abstract: It is argued that it is time for a major change of approach to optical character recognition (OCR) research. The traditional approach, focusing on the correct classification of isolated characters, has been exhausted. The demonstration of the superiority of a new classification method under operational conditions requires large experimental facilities and databases beyond the resources of most researchers. In any case, even perfect classification of individual characters is insufficient for the conversion of complex archival documents to a useful computer-readable form. Many practical OCR tasks require integrated treatment of entire documents and well-organized typographic and domain-specific knowledge. New OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components. They should also exploit the unavoidable interaction with human operators to improve themselves without explicit 'training'.

Journal ArticleDOI
01 Jul 1992
TL;DR: In this paper, an optical character recognition (OCR) engine that is omnifont and reasonably robust on individual degraded characters is presented, but the weakest link is its handling of characters which are difficult to segment.
Abstract: An optical character recognition (OCR) engine that is omnifont and reasonably robust on individual degraded characters is presented. The weakest link is its handling of characters which are difficult to segment. The engine is divided into four phases: segmentation, image recognition, ambiguity resolution, and document analysis. The features are zonal and reduce the image to a blurred, gray-level representation. The classifier is data-driven, trained offline, and model-free. Handcrafted features and decision trees tend to be brittle in the presence of noise. To satisfy the needs of full-text applications, the system captures the structure of the document so that, when viewed in a word processor or spreadsheet program, the formatting of the optically recognized document reflects that of the original document. To satisfy the needs of the forms market, a proofing and correction tool displays 'pop-up' images of uncertain characters.
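The zonal features mentioned here can be sketched as follows: the character bitmap is reduced to a small grid of local ink densities, i.e. a blurred, gray-level representation. The 4x4 grid is an illustrative choice, not the engine's actual zoning.

```python
import numpy as np

def zonal_features(char, grid=4):
    """Mean ink density in each cell of a grid x grid zoning of the character image."""
    h, w = char.shape
    feats = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            zone = char[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            feats[i, j] = zone.mean() if zone.size else 0.0
    return feats.ravel()   # feature vector fed to the data-driven classifier
```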

Journal ArticleDOI
01 Jul 1992
TL;DR: Intense research performed over the past 15 years to answer the most pressing recognition problems is described, and the man-machine interfaces made possible by online handwriting recognition and anticipated advances in both hardware and software are discussed.
Abstract: For large-alphabet languages, like Japanese, handwriting input using an online recognition technique is essential for input accuracy and speed. However, there are serious problems that prevent high recognition accuracy of unconstrained handwriting. First, the thousands of ideographic Japanese characters of Chinese origin (called Kanji) can be written with wide variations in the number and order of strokes and significant shape distortions. Also, writing box-free recognition of characters is required to create a better man-machine interface. Intense research performed over the past 15 years to answer the most pressing recognition problems is described. Prototype systems are also described. The man-machine interfaces made possible by online handwriting recognition and anticipated advances in both hardware and software are discussed.

Journal ArticleDOI
01 Jul 1992
TL;DR: An intelligent forms processing system (IFPS) is described which provides capabilities for automatically indexing form documents for storage/retrieval to/from a document library and for capturing information from scanned form images using intelligent character recognition (ICR).
Abstract: This paper describes an intelligent forms processing system (IFPS) which provides capabilities for automatically indexing form documents for storage/retrieval to/from a document library and for capturing information from scanned form images using intelligent character recognition (ICR). The system also provides capabilities for efficiently storing form images. IFPS consists of five major processing components: (1) An interactive document analysis stage that analyzes a blank form in order to define a model of each type of form to be accepted by the system; the parameters of each model are stored in a form library. (2) A form recognition module that collects features of an input form in order to match it against one represented in the form library; the primary features used in this step are the pattern of lines defining data areas on the form. (3) A data extraction component that registers the selected model to the input form, locates data added to the form in fields of interest, and removes the data image to a separate image area. A simple mask defining the center of the data region suffices to initiate the extraction process; search routines are invoked to track data that extends beyond the masks. Other special processing is called on to detect lines that intersect the data image and to delete the lines with minimum distortion to the rest of the image. (4) An ICR unit that converts the extracted image data to symbol code for input to data base or other conventional processing systems. Three types of ICR logic have been implemented in order to accommodate monospace typing, proportionally spaced machine text, and handprinted alphanumerics. (5) A forms dropout module that removes the fixed part of a form and retains only the data filled in for storage. The stored data can be later combined with the fixed form to reconstruct the original form. This provides for extremely efficient storage of form images, thus making possible the storage of a very large number of forms in the system. IFPS is implemented as part of a larger image management system called Image and Records Management system (IRM). It is being applied in forms data management in several state government applications.
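Two of the five components lend themselves to a compact sketch: cropping registered data fields (component 3, without the mask-tracking and line-removal logic) and forms dropout (component 5). Field coordinates and image registration are assumed to be available; the names below are hypothetical, not from the paper.

```python
import numpy as np

def extract_fields(form, fields):
    """fields: {name: (top, left, bottom, right)} in registered image coordinates."""
    return {name: form[t:b, l:r] for name, (t, l, b, r) in fields.items()}

def forms_dropout(filled, blank):
    """Keep only ink present in the filled form but absent from the blank template,
    so that just the filled-in data needs to be stored."""
    return np.logical_and(filled > 0, blank == 0).astype(np.uint8)
```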

Journal ArticleDOI
Shuichi Tsujimoto, Haruo Asada
01 Jul 1992
TL;DR: Experiments have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.
Abstract: The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.

Proceedings ArticleDOI
Lawrence O'Gorman
30 Aug 1992
TL;DR: Three techniques are described for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression, and for subsampling the text image to fit on the computer screen while maintaining readability.
Abstract: Describes some of the document processing techniques used in the RightPages electronic library system. Since the system deals with scanned images of document pages, these techniques are critical to the use and appearance of the system. The author describes three techniques: (1) for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression; (2) for subsampling the text image to fit on the computer screen while maintaining readability; and (3) a document layout analysis technique to determine text blocks.
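The noise-reduction technique in the RightPages system is more elaborate than this, but the basic effect can be sketched by dropping tiny connected components from the binary page (the minimum size used here is an arbitrary example value).

```python
import numpy as np
from scipy.ndimage import label

def despeckle(binary, min_size=4):
    """Remove connected components smaller than min_size pixels from a binary page."""
    labeled, _ = label(binary)
    sizes = np.bincount(labeled.ravel())
    keep = sizes >= min_size
    keep[0] = False                     # label 0 is the background
    return keep[labeled].astype(np.uint8)
```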

Journal ArticleDOI
TL;DR: A simple method is presented for automatically identifying regions in envelope images which are candidates for being the destination address, and the success of the texture-based segmentation algorithm for identifying address blocks is demonstrated.

Journal ArticleDOI
H.S. Baird
01 Jul 1992
TL;DR: An experimental printed-page reader that is easy to adapt to various languages is described, and an attempt has been made to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods.
Abstract: An experimental printed-page reader that is easy to adapt to various languages is described. Changing the target language may involve simultaneous changes in symbol sets, typefaces, sizes of text, page layouts, linguistic contexts, and imaging defects. The strategy has been to isolate the effects of these sources of variation within separate, independent engineering subsystems. In this way, it has been possible to construct, with a minimum of manual effort, classifiers for arbitrary combinations of symbols, typefaces, sizes, and imaging defects. An attempt has been made to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods. For some tasks it has been feasible to avoid language dependency altogether. Linguistic context can be exploited through data-directed filtering algorithms in a uniform and modular manner, so that preexisting tools developed by computational linguistics can readily be applied. These principles are illustrated by trials on English, Swedish, Tibetan, and special technical texts.

Journal ArticleDOI
01 Jul 1992
TL;DR: In this paper, the architecture of a reading machine designed to achieve a high rate of correct interpretation of text as well as high speed in performing the interpretation is described, and the refinement of the architecture for a specialized reading machine, to find and interpret addresses on a stream of postal letters, is also described.
Abstract: The architecture of a reading machine designed to achieve a high rate of correct interpretation of text as well as high speed in performing the interpretation is described. The refinement of the architecture for a specialized reading machine, to find and interpret addresses on a stream of postal letters, is also described. The addresses can be either machine-printed or handwritten. The primary subtasks correspond to finding the block of text corresponding to the destination address, recognizing characters and words within the address, and interpreting the text using postal directories. The need for multiple algorithms and multiple scales for recognition (holistic and analytic) and for methods for combining results of multiple algorithms, the efficacy of artificial neural nets and fuzzy matching, and the feasibility of reading unconstrained handwritten words when there exist accompanying numeric fields that limit word choices are shown.

Journal ArticleDOI
TL;DR: A system to recognize handwritten Chinese characters is presented; in its first stage, a new efficient algorithm based on accumulated chain codes is proposed for line approximation.

Proceedings ArticleDOI
30 Nov 1992
TL;DR: A method for the recognition of multifont printed characters is proposed, giving emphasis to the identification of structural descriptions of character shapes using prototypes and accomplishing robustness to noise with less than two prototypes per class, on the average.
Abstract: A method for the recognition of multifont printed characters is proposed, giving emphasis to the identification of structural descriptions of character shapes using prototypes. Noise and shape variations are modeled as series of transformations from groups of features in the data to features in each prototype. Thus, the method manages systematically the relative distortion between a candidate shape and its prototype, accomplishing robustness to noise with less than two prototypes per class, on the average. Our method uses a flexible matching between components and a flexible grouping of the individual components to be matched. A number of shape transformations are defined. Also, a measure of the amount of distortion that these transformations cause is given. The problem of classification of character shapes is defined as a problem of optimization among the possible transformations that map an input shape into prototypical shapes. Some tests with hand printed numerals confirmed the method's high robustness level.

Patent
25 Mar 1992
TL;DR: In this article, a system and method are disclosed for enabling the technique of deferred processing of OCR scanned mail to be compatible with existing techniques for mechanical sortation of mail that use standard sort barcode formats which are common to a given destination postal system.
Abstract: A system and method are disclosed for enabling the technique of deferred processing of OCR scanned mail to be compatible with existing techniques for mechanical sortation of mail that use standard sort barcode formats which are common to a given destination postal system. This enables deferred OCR processed mail to be sorted on an unsegregated basis along with other types of mail which have not been processed by the deferred OCR technique. This allows the OCR encoded mail to be processed along with other types of encoded mail whose standard sort barcode has been imprinted using prior technology such as OCR or manual code desks.

Patent
14 Dec 1992
TL;DR: In this article, a process and system for processing a digitally stored image on a digital computer is described, which scans and digitizes an image, separates text from non-text components, enhances and deskews the image, compresses the resulting image file, and stores the enhanced, deskewed, and compressed file for later transmission, optical character recognition, or high quality printing or viewing of the image.
Abstract: This specification discloses a process and system for processing a digitally stored image on a digital computer. The system scans and digitizes an image, separates text from non-text components, enhances and deskews the image, compresses the resulting image file, and stores the enhanced, deskewed, and compressed file for later transmission, optical character recognition, or high quality printing or viewing of the image.
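The patent does not spell out its deskew algorithm; a common projection-profile heuristic, shown here only as a sketch, rotates the binarized page through a range of candidate angles and keeps the angle that makes the row ink-count profile sharpest.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, angles=np.arange(-5.0, 5.25, 0.25)):
    """Angle (degrees) maximizing the variance of the horizontal projection profile."""
    def sharpness(angle):
        rotated = rotate(binary, angle, reshape=False, order=0)
        return np.var(rotated.sum(axis=1))
    return max(angles, key=sharpness)

def deskew(binary):
    return rotate(binary, estimate_skew(binary), reshape=False, order=0)
```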

Book
29 Jul 1992
TL;DR: A new approach to visual recognition is offered that avoids the limitations of shape- and attribute-based methods; it has been used to recognize trees, bushes, grass, and trails in ground-level scenes of a natural environment, and it improves its recognition abilities by exploiting the context provided by what it has previously recognized.
Abstract: An autonomous vehicle that is to operate outdoors must be able to recognize features of the natural world as they appear in ground-level imagery. Geometric reconstruction alone is insufficient for an agent to plan its actions intelligently--objects in the world must be recognized, and not just located. Most work in visual recognition by computer has focused on recognizing objects by their geometric shape, or by the presence or absence of some prespecified collection of locally measurable attributes (e.g., spectral reflectance, texture, or distinguished markings). On the other hand, most entities in the natural world defy compact description of their shapes, and have no characteristic features with discriminatory power. As a result, image-understanding research has achieved little success towards recognizing natural scenes. In this thesis we offer a new approach to visual recognition that avoids these limitations and has been used to recognize trees, bushes, grass, and trails in ground-level scenes of a natural environment. Reliable recognition is achieved by employing an architecture with a number of innovative aspects. These include: context-controlled generation of hypotheses instead of universal partitioning; a hypothesis comparison scheme that allows a linear growth in computational complexity as the recognition vocabulary is increased; recognition at the level of complete contexts instead of individual objects; and provisions for contextual information to guide processing at all levels. Recognition results are added to a persistent, labeled, three-dimensional model of the environment which is used as context for interpreting subsequent imagery. In this way, the system constructs a description of the objects it sees, and, at the same time, improves its recognition abilities by exploiting the context provided by what it has previously recognized.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A method is described which combines dynamic programming and a neural network recognizer for segmenting and recognizing character strings; it has achieved a per-zip-code raw recognition rate of 81% on a 2368 handwritten zip-code test set.
Abstract: The authors describe a method which combines dynamic programming and a neural network recognizer for segmenting and recognizing character strings. The method selects the optimal consistent combination of cuts from a set of candidate cuts generated using heuristics. The optimal segmentation is found by representing the image, the candidate segments, and their scores as a graph in which the shortest path corresponds to the optimal interpretation. The scores are given by neural net outputs for each segment. A significant advantage of the method is that the labor required to segment images manually is eliminated. The system was trained on approximately 7000 unsegmented handwritten zip codes provided by the United States Postal Service. The system has achieved a per-zip-code raw recognition rate of 81% on a 2368 handwritten zip-code test set.
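The shortest-path formulation can be sketched directly: nodes are candidate cut positions, an edge spans the segment between two cuts, and its cost is the negative log of the recognizer's score for that segment. `recognize` below is a hypothetical stand-in for the neural network, returning a (label, score) pair.

```python
import math

def best_segmentation(columns, cuts, recognize):
    """columns: sliceable image columns; cuts: sorted candidate cut positions,
    including 0 and the image width. Returns (labels, cost) of the cheapest path."""
    n = len(cuts)
    cost = [math.inf] * n        # best cost of any path reaching each cut
    back = [None] * n            # (previous cut index, label) for backtracking
    cost[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            label, score = recognize(columns[cuts[i]:cuts[j]])
            c = cost[i] - math.log(max(score, 1e-9))
            if c < cost[j]:
                cost[j], back[j] = c, (i, label)
    labels, j = [], n - 1
    while back[j] is not None:
        i, label = back[j]
        labels.append(label)
        j = i
    return labels[::-1], cost[n - 1]
```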

Journal ArticleDOI
01 Jul 1992
TL;DR: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout.
Abstract: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting on the pixel level, the formation of elementary geometric objects on which layout analysis as well as the definition of character objects is based is described. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified and technical solutions put forward.

Journal ArticleDOI
01 Jul 1992
TL;DR: Computer Vision, Graphics, and Image Processing (CVGIP); CVGIP: Graphical Models and Image Processing (CVGIP GMIP); and CVGIP: Image Understanding (CVGIP IU)
Abstract: Computer Vision, Graphics, and Image Processing (CVGIP); CVGIP: Graphical Models and Image Processing (CVGIP GMIP); CVGIP: Image Understanding (CVGIP IU); IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI); Machine Vision and Applications Journal (MVA); Image and Vision Computing (IVC); International Journal of Pattern Recognition and Artificial Intelligence (PRAI); Pattern Recognition (PR); Pattern Recognition Letters (PRL); International Journal of Computer Vision (IJCV)

Patent
21 Jan 1992
TL;DR: In this article, a method and apparatus for processing image data of dot-matrix/ink-jet printed text to perform Optical Character Recognition (OCR) of such image data is disclosed.
Abstract: Method and apparatus are disclosed for processing image data of dot-matrix/ink-jet printed text to perform Optical Character Recognition (OCR) of such image data. In the method and apparatus, the image data is viewed for detecting if dot-matrix/ink-jet printed text is present. Any detected dot-matrix/ink-jet produced text is then pre-processed by determining the image characteristic thereof by forming a histogram of pixel density values in the image data. A 2-D spatial averaging operation as a second pre-processing step smooths the dots of the characters into strokes and reduces the dynamic range of the image data. The resultant spatially averaged image data is then contrast stretched in a third pre-processing step to darken dark regions of the image data and lighten light regions of the image data. Edge enhancement is then applied to the contrast stretched image data in a fourth pre-processing step to bring out higher frequency line details. The edge enhanced image data is then binarized and applied to a dot-matrix/ink jet neural network classifier for recognizing characters in the binarized image data from a predetermined set of symbols prior to OCR.
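A hedged sketch of the pre-processing chain described above (2-D spatial averaging, contrast stretching, edge enhancement, binarization); the filter sizes and thresholds are illustrative guesses, not values from the patent, and the histogram-based characterization and neural classifier steps are omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def preprocess_dot_matrix(img, smooth=3, sharpen=1.0, threshold=128):
    """img: grayscale array, dark text on a light background. Returns a binary image."""
    img = img.astype(float)
    smoothed = uniform_filter(img, size=smooth)              # 2-D spatial averaging: dots -> strokes
    lo, hi = smoothed.min(), smoothed.max()
    stretched = (smoothed - lo) / max(hi - lo, 1e-9) * 255   # contrast stretch to full range
    blurred = gaussian_filter(stretched, sigma=1.0)
    enhanced = np.clip(stretched + sharpen * (stretched - blurred), 0, 255)  # unsharp edge enhancement
    return (enhanced < threshold).astype(np.uint8)           # binarize: 1 = ink
```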

Proceedings ArticleDOI
30 Aug 1992
TL;DR: The main features of the so-called SARAT system are the segmentation into single characters through recognition, contour-based features, statistical distance classification, and a word module.
Abstract: Presents a new system for the automatic recognition of Arabic printed text. The system is still under development. Here the concept of the so-called SARAT system is presented together with some very promising first results. The main features of the system are the segmentation into single characters through recognition, contour-based features, statistical distance classification, and a word module.

Journal ArticleDOI
TL;DR: An optical character recognition (OCR) system that uses a multilayer perceptron (MLP) neural network classifier is described; the MLP has the advantage of being fast, easily trainable, and capable of creating arbitrary partitions of the input feature space.