Showing papers on "Optical character recognition" published in 1988


Patent
21 Dec 1988
TL;DR: In this article, a document processing system for processing documents having data printed thereon including pre-printed markings visible to the eye and machine-readable characters includes a scanner for capturing the color image of the document being processed.
Abstract: A document processing system for processing documents having data printed thereon including pre-printed markings visible to the eye and machine-readable characters includes a scanner for capturing the color image of the document being processed. Circuitry is provided for reducing the contrast between the pre-printed marking and the background surrounding a machine-readable character based upon the hue of the pre-printed markings, such that the pre-printed markings and the background surrounding a character are not distinguishable with respect to light reflected from the document thereby generating a filtered image of the document. An optical character recognition device receives the filtered image of the document representing character information only for identifying the characters appearing on the document.

110 citations


Journal ArticleDOI
TL;DR: A handprinted symbol recognition system for identifying through computer techniques free-form, unconstrained handprinting in which accuracy is decoupled from efficiency converts a thin-line (skeletonized) figure output from a preprocessing system into segment-oriented lists.

85 citations


Journal ArticleDOI
Henry S. Baird
TL;DR: Large-scale statistically-significant trials, in the context of a mixed-font, variable-size optical character recognition (OCR) system, have shown that the technique is superior to simpler, fixed mappings, and is effective in generalizing common characteristics in mixtures of fonts.
Abstract: A general technique for combining the strengths of structural shape analysis with statistical classification is proposed. The approach is to construct a function, called a feature identification mapping, from the representation generated by structural analysis to the one required for statistical classification. It is shown that if a certain continuity property holds for the parameterizations of the structural shape types, then it is possible to infer the mapping automatically. Inference is slow and heuristic, but is highly automated, controlled by only a few statistical parameters, and is applicable uniformly to all shape types. In addition, if the shape types are sufficiently elementary, the resulting mapping can be computed quickly using kD-trees. Large-scale statistically-significant trials, in the context of a mixed-font, variable-size optical character recognition (OCR) system, have shown that the technique is superior to simpler, fixed mappings, and is effective in generalizing common characteristics in mixtures of fonts.

68 citations


Proceedings ArticleDOI
14 Nov 1988
TL;DR: The author assesses the current status of the field and places the problem of Chinese recognition into perspective with other areas of optical character recognition.
Abstract: The author assesses the current status of the field and places the problem of Chinese recognition into perspective with other areas of optical character recognition. Early experiments are briefly reviewed, and sources of more up-to-date information, including review articles, are indicated. Advances in computer technology that have had a significant impact on the problem are discussed, and a sampling of relatively recent research on the classification of both printed and handprinted ideographs is presented. Included in the discussion are techniques of preprocessing (character location and segmentation) and hierarchical classification.

59 citations


Journal ArticleDOI
TL;DR: This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas and uses it as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high level representation of the contents of a document.
Abstract: The realization of the paper-free office seems to be more difficult than expected. Therefore, good paper-computer interfaces are necessary to transform paper documents into an electronic form, which allows the use of a filing and retrieval system. An electronic document page is an optically scanned and digitized representation of a printed page. Document analysis is the problem of interpreting and labeling the constituents of the document. Although there are very reliable optical character recognition (OCR) methods, the process could be very inefficient. To prune the search space and to become more efficient, some search supporting methods have to be developed. This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas. The procedure is used as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high level representation of the contents of a document. We have implemented our method in Common Lisp on a Symbolics 3640 Workstation and have run it for a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of our approach.

43 citations


Proceedings ArticleDOI
14 Nov 1988
TL;DR: Preliminary results are presented to show how the initial stages of syntactic verification can improve character recognition performance.
Abstract: An optical character recognition (OCR) system is developed for recognizing handwritten and handprinted addresses which include a British postcode written within character boxes. The system makes use of syntactic information concerning postcodes and a postcode database which interacts with the character recognition process to ensure that only valid postcodes are recognized. Postulated valid postcodes are then verified using semantic features of the remainder of the address, to produce a final postcode which both matches the input characters and is compatible with the remainder of the address. Preliminary results are presented to show how the initial stages of syntactic verification can improve character recognition performance.

17 citations
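
One way to picture the interaction between character candidates, postcode syntax, and a postcode database is the following minimal sketch. It assumes the recognizer emits a short ranked list of `(character, score)` candidates per character box. The regular expression is a simplified approximation of the postcode format, and `best_valid_postcode` and the toy database are hypothetical; the paper's actual syntax rules and scoring are not given in the abstract.

```python
import re
from itertools import product

# Simplified postcode shape without the space (an assumption, not the full
# official format): 1-2 letters, a digit, an optional letter or digit,
# then a digit and two letters.
POSTCODE_SYNTAX = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")


def best_valid_postcode(candidates, database):
    """Pick the highest-scoring candidate string that is a valid postcode.

    candidates: one list per character box of (char, score) pairs.
    database:   set of known postcodes (hypothetical lookup table).
    Returns None when no combination passes both checks.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        text = "".join(ch for ch, _ in combo)
        if not POSTCODE_SYNTAX.fullmatch(text):
            continue  # rejected by syntax
        if text not in database:
            continue  # syntactically fine but not a real postcode
        score = sum(s for _, s in combo)
        if score > best_score:
            best, best_score = text, score
    return best
```

In this sketch a box whose top-ranked candidate is a letter can still be read as a digit when only the digit yields a string that passes both checks, which is the kind of correction syntactic verification is meant to provide.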


Proceedings ArticleDOI
14 Nov 1988
TL;DR: Considering that the characters on the license plate of a moving vehicle may be severely blurred, a method of recognizing the province name on a license plate is proposed using the projection histogram of Chinese characters, which has strong noise-resistance and a high processing speed.
Abstract: Considering that the characters on the license plate of a moving vehicle may be severely blurred, a method is proposed of recognizing the province name on a license plate. Using the projection histogram of Chinese characters, the appropriate feature value is extracted, fuzzy matching and dynamic programming are applied, and the 29 province names of China are correctly classified. The simulation results show that this method has strong noise-resistance and a high processing speed. The recognition rate is more than 90% for the blurred characters, and the recognition time is less than one second.

12 citations
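
As a rough illustration of projection histograms combined with a dynamic-programming match, the sketch below computes row and column projections of a binary character image and compares two profiles with a dynamic time warping distance. Dynamic time warping is used here as a stand-in: the abstract does not specify the paper's feature values, fuzzy matching, or dynamic program, and the function names are hypothetical.

```python
def projections(img):
    """Row and column projection histograms of a binary image (rows of 0/1)."""
    rows = [sum(row) for row in img]
    cols = [sum(col) for col in zip(*img)]
    return rows, cols


def dtw_distance(a, b):
    """Dynamic time warping distance between two profiles.

    Allows local stretching, so a slightly wider or narrower character can
    still align with its template.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

A classifier along these lines would compute both projections for an unknown character and choose the template with the smallest combined distance.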


Proceedings ArticleDOI
Koichi Kise, K. Yamada, N. Tanaka, Noboru Babaguchi, Yoshikazu Tezuka
14 Nov 1988
TL;DR: The authors present the visiting card understanding system, whose output is suitable for the input of a visiting-card database and is applicable to many kinds of documents.
Abstract: The authors present the visiting card understanding system, whose output is suitable for the input of a visiting-card database. The system consists of two modules. One is a document model which represents the hierarchical knowledge about the layout structure of visiting cards. The other is an understanding module which interprets the document model to generate and test hierarchical hypotheses about the contents of a visiting card. Since the understanding module is fundamentally independent of document type, the system is applicable to many kinds of documents.

12 citations


Proceedings ArticleDOI
14 Nov 1988
TL;DR: A feasibility study was undertaken to investigate methods of preprocessing envelope images to extract the address region from the image in the presence of the other data and presort the addresses into subclasses suitable for recognition by an optical character recognition system.
Abstract: A feasibility study was undertaken to investigate methods of preprocessing envelope images to extract the address region from the image in the presence of the other data and presort the addresses into subclasses suitable for recognition by an optical character recognition system with separate recognition channels for machine and handwritten address classes. Preliminary results are presented that are based on trials using a sample of nearly 1000 envelope images. The data show a successful address extraction rate of around 98% and a correct classification rate of about 90%.

10 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: The concept of model driven segmentation allows quick focussing of the analysis on important regions of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document.
Abstract: The task of document recognition requires the scanning of a paper document and the analysis of its content and structure. The resulting electronic representation has to capture the content as well as the logic and layout structure of the document. The first step in the recognition process is scanning, filtering and binarization of the paper document. Based on the preprocessing results we delineate key areas like address or signature for a letter, or the abstract for a report. This segmentation procedure uses a specific document layout model. The validity of this segmentation can be verified in a second step by using the results of more time-consuming procedures like text/graphic classification, optical character recognition (OCR) and the comparison with more elaborate models for specific document parts. Thus our concept of model driven segmentation allows quick focussing of the analysis on important regions. The segmentation is able to operate directly on the raster image of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document. A test version for the analysis of simple business letters has been implemented.

9 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: The performance of some ANN-based classifiers is evaluated and compared on factors such as recognition accuracy and reliability of classification, fault tolerance to misregistration and spatial quantization, and computational costs in training and classification.
Abstract: Artificial neural networks (ANN) provide a robust computational paradigm for character recognition. The character classifier needs to have the capability to separate arbitrarily shaped regions in the pattern space. Recent developments in such ANN-based classifiers and their learning methods are reviewed. Such classifiers are based on multilayer (hidden layer) ANN or higher-order correlation ANN and use backpropagation learning. The performance of some ANN-based classifiers is evaluated and their relative performance is compared. The performance evaluation is based on factors such as recognition accuracy and reliability of classification, fault tolerance to misregistration and spatial quantization, and computational costs in training and classification.

Proceedings ArticleDOI
11 Dec 1988
TL;DR: The first prototype 3-D intelligent image sensor has been designed, fabricated, and tested, and it is a monolithic character recognition system including about 10K transistors and 3K diodes, which is implemented in a 3D IC process with laser-crystallization technology.
Abstract: The first prototype 3-D intelligent image sensor has been designed, fabricated, and tested. It is a monolithic character recognition system including about 10K transistors and 3K diodes, which are implemented in a 3-D IC process with laser-crystallization technology. With this system, incomplete character inputs are recognized and complete character outputs are displayed. It features asynchronous parallel data processing and parallel access to memory.

Proceedings ArticleDOI
19 Feb 1988
TL;DR: Algorithms are designed for automating the generation of loops with minimum redundancy from bit-map, identifying those loops thus generated if they are simple ones, decomposing the complex loops into simpler interpretable shapes, and finally establishing succinct description files for the graphics.
Abstract: Automating the input of mixed text/graphic documents requires more than just a character recognition system. We require algorithms to separate text strings from graphics and also to recognize the graphic and generate a description file for it. In this paper recent results on the recognition and structural description of graphics are reported. During machine recognition of the graphics, some heuristics are introduced to equip the system with a certain amount of decision making functions so as to narrow and optimize the search. Algorithms are designed for automating the generation of loops with minimum redundancy from bit-map, identifying those loops thus generated if they are simple ones, decomposing the complex loops into simpler interpretable shapes, and finally establishing succinct description files for the graphics. Error corrections on misalignments introduced by the feeding mechanism have been given consideration. Extensive experiments have been done on various graphics, and satisfactory results have been obtained. This technique is also useful for the analysis of computer vision segmented images.

Proceedings ArticleDOI
25 Oct 1988
TL;DR: By extracting and connecting basic character patterns based on the descriptions, this method improves description ability of patterns and processing speed and can remove unstable writing movements and can separate strokes stably.
Abstract: The present paper reports an on-line recognition method of cursive Korean characters. In the present method, we treat a Korean character pattern as a finite sequence of basic character patterns. After extracting candidate basic character patterns from an input character pattern, we determine basic character patterns making a Korean character from the candidates by connecting processing. We described basic character patterns and their connected patterns used in the present method according to their features. By extracting and connecting basic character patterns based on the descriptions, we improve description ability of patterns and processing speed. Precise description ability of patterns and extracting candidates can remove unstable writing movements and can separate strokes stably.

Proceedings ArticleDOI
14 Nov 1988
TL;DR: A text recognition system for Japanese documents is described, consisting of a personal computer, which is used as a controller; an image scanner; and a recognition unit.
Abstract: A text recognition system for Japanese documents is described. The system is composed of a personal computer, which is used as a controller; an image scanner; and a recognition unit. There are four processing stages: text-line segmentation, character segmentation, character recognition, and postprocessing using the Japanese dictionary. Experimental results of the tests for Japanese handwritten technical reports are presented.

Proceedings ArticleDOI
25 Oct 1988
TL;DR: High character recognition performance is obtained at a reading speed of 8 Japanese characters per second which is sufficient for hand-scanning data input operations and the recognition rate is higher than 98% for about 3,300 Japanese characters.
Abstract: A prototype OCR is constructed using a very compact parallel processing unit. This unit, designed for interactive character recognition applications, is equipped with a hand-scanner for input and a personal computer for word and/or image processing. The heart of this unit is a bit-serial Single Instruction Multiple Data stream (SIMD) array processor constructed with four identical cellular array LSIs (AAP2). The processor is fully programmable and the complex process of Japanese character recognition can be carried out with a single program package. Its architecture permits flexible and high-speed SIMD operations to process bitline data such as local fields of scanned documents. The processor components were integrated into one board and confirmed to be more than ten times faster than present image processors of the same size through various image processing tests. High character recognition performance is obtained at a reading speed of 8 Japanese characters per second which is sufficient for hand-scanning data input operations. The recognition rate is higher than 98% for about 3,300 Japanese characters.

Proceedings ArticleDOI
22 Aug 1988
TL;DR: In this article, the maximum-likelihood strategy, an important tool in the field of statistical decision theory, is applied to the image classification problem; the strategy can be implemented in a standard image correlation system, and excellent classification results can be obtained.
Abstract: An essential feature of a practical automatic image recognition system is the ability to tolerate certain types of variations within images. The recognition of images subject to intrinsic variations can be treated as a sorting task in which an image is identified as a member of some class of images. Herein, the maximum-likelihood strategy, an important tool in the field of statistical decision theory, is applied to the image classification problem. We show that the strategy can be implemented in a standard image correlation system and that excellent classification results can be obtained.

Proceedings ArticleDOI
14 Nov 1988
TL;DR: A trial approach is presented for the realization of an optical character reader for printed Chinese characters (kanji) taking advantage of optical/digital hybrid processing and recognition results are promising.
Abstract: A trial approach is presented for the realization of an optical character reader for printed Chinese characters (kanji). This approach is different from conventional ones as far as kanji features and recognition are concerned. Use is made of structural features based on only the horizontal and vertical strokes in a kanji, taking advantage of optical/digital hybrid processing. The recognition experimental results are promising: 99.6%, 0.3%, and 0.1% for the correct, rejected, and error rates, respectively.

Proceedings ArticleDOI
16 Dec 1988
TL;DR: In this paper, a novel statistical approach for recognizing handwritten Arabic characters is introduced, which involves, as a first step, digitization of the segmented character and the secondary characters are then isolated and identified separately.
Abstract: This paper introduces a novel statistical approach for recognizing handwritten Arabic characters. The proposed method involves, as a first step, digitization of the segmented character. The secondary characters are then isolated and identified separately thereby reducing the recognition issue to a 20 class problem. The moments of the horizontal and vertical projections of the remaining primary characters are estimated and normalized with respect to the zero order moment. Simple measures of shape are obtained from the normalized moments and incorporated into a feature vector. Classification is accomplished using quadratic discriminant functions. Results confirming that the method shows considerable merit are presented.
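
The normalization of projection moments by the zero-order moment can be sketched as follows. The sketch computes raw moments of a one-dimensional projection profile, divides by the profile's total mass, and derives two shape measures. Which measures the paper actually uses is not stated in the abstract, so the centroid and spread below are assumed examples, and the function names are hypothetical.

```python
def projection_moments(profile, max_order=3):
    """Raw moments of orders 1..max_order, normalized by the zero-order moment.

    profile[i] is the number of ink pixels at position i of a projection.
    """
    m0 = sum(profile)  # zero-order moment: total ink
    if m0 == 0:
        raise ValueError("empty profile")
    return [sum((i ** k) * p for i, p in enumerate(profile)) / m0
            for k in range(1, max_order + 1)]


def centroid_and_spread(profile):
    """Two simple shape measures: mean position and variance around it."""
    m1, m2 = projection_moments(profile, 2)
    return m1, m2 - m1 * m1
```

Applying `centroid_and_spread` to both the horizontal and the vertical projection of a character gives a four-element feature vector that a discriminant classifier could consume.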

Patent
14 Sep 1988
TL;DR: In this article, the pattern information is corrected by work stations 5-1 to 5-m placed at a location apart from the store and forward exchange 1, so the correction is executed efficiently independently of location.
Abstract: PURPOSE: To efficiently correct a bit of pattern information by comparing a bit of image information with the pattern information at an external work station. CONSTITUTION: The pattern information distributed to a work station (W/S) 5-m and the image information corresponding thereto are sent to a CPU 13 via an I/O interface circuit 12 in the station (W/S) 5-m, and the CPU 13 displays the pattern information and the image information corresponding thereto on a display device 14. Then the pattern information displayed on the display device 14 and the image information corresponding thereto are compared, and the console is operated to correct the pattern information. When an error exists in the recognition of the image information by an optical character recognition device 10, the correction is applied. Since the pattern information is corrected by work stations 5-1 to 5-m placed at a location apart from the store and forward exchange 1, the correction is executed efficiently independently of location.

Proceedings ArticleDOI
14 Nov 1988
TL;DR: The results presented conclude that the n-tuple recognizer is capable of achieving approximately 60% correct first-choice classification on totally unconstrained handwritten characters from any writer and over 90% correct first-choice classification on characters from one writer if the class size is between 15 and 20.
Abstract: The recognition of relatively unconstrained handwritten characters using the n-tuple recognition technique in two application areas is reported. In the case of Electronic Paper, a high resolution flat panel display with a transparent digitizer on its surface, the character set consists of 73 distinct characters and in the case of British postcodes the character set consists of 36 distinct characters. The technique is investigated as a fast method of producing a ranked list of likely classes for each character. The results presented conclude that the n-tuple recognizer is capable of achieving approximately 60% correct first-choice classification on totally unconstrained handwritten characters from any writer and over 90% correct first-choice classification on unconstrained handwritten characters from one writer if the class size is between 15 and 20. In cases where the actual character is not at the top of the ranked list it is usually in a high-ranked position. Recognition performance deteriorates rapidly as the class size exceeds 20.
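
For readers unfamiliar with the n-tuple technique, here is a minimal sketch of the general idea, not the configuration used in the paper. Fixed random groups of `n` pixel positions each address a per-class memory; training records which pixel patterns were seen, and recognition ranks classes by how many groups recognise their pattern. The class name, parameter defaults, and seed are assumptions for illustration.

```python
import random


class NTupleClassifier:
    """Toy n-tuple recognizer over flat binary images (lists of 0/1)."""

    def __init__(self, num_pixels, n=4, num_tuples=20, seed=0):
        rng = random.Random(seed)
        # Each tuple is a fixed random selection of n distinct pixel positions.
        self.tuples = [rng.sample(range(num_pixels), n)
                       for _ in range(num_tuples)]
        self.memory = {}  # label -> one set of seen patterns per tuple

    def _addresses(self, pixels):
        return [tuple(pixels[i] for i in t) for t in self.tuples]

    def train(self, pixels, label):
        mem = self.memory.setdefault(label, [set() for _ in self.tuples])
        for seen, addr in zip(mem, self._addresses(pixels)):
            seen.add(addr)

    def rank(self, pixels):
        """Return class labels ordered from most to least likely."""
        addrs = self._addresses(pixels)
        scores = {label: sum(a in seen for seen, a in zip(mem, addrs))
                  for label, mem in self.memory.items()}
        return sorted(scores, key=scores.get, reverse=True)
```

Because `rank` returns every class in score order, the output is the kind of ranked candidate list the abstract describes, which a later stage could prune or verify.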

Proceedings ArticleDOI
Masayuki Kimura, T. Ejima, Hirotomo Aso, H. Yashiro, N. Son, M. Suzuki
14 Nov 1988
TL;DR: An intelligent character recognition system with high accuracy and high speed is realized by integrating image-type and logical-type information processing modules, namely associative pattern matching and structural analysis of characters.
Abstract: An intelligent character recognition system with high accuracy and high speed is realized by integrating image-type and logical-type information processing modules. For image-type information processing, a technique called associative pattern matching is proposed, and its usefulness is verified by experiments. The results show that it is useful for rough classification of input patterns. For logical-type processing, structural analysis of characters is realized by a relaxation matching technique, which is described in detail, and some experimental results are shown. The integration of these two techniques is discussed.

Patent
29 Jul 1988
TL;DR: In this paper, the authors propose to reduce the amount of information of images for recognition, by finding a difference of projecting histogram, and extracting the feature of a character pattern, which is extracted by matching with the pattern of a differential dictionary.
Abstract: PURPOSE: To reduce the amount of information of images for recognition, by finding a difference of projecting histogram, and extracting the feature of a character pattern. CONSTITUTION: A document and journal is read optically with a scanner 1, and the picture element of a read image, after being binarized at a binarization part 2, is stored in a picture memory 3. A binarized picture element is accumulated in one direction at a projecting histogram generating part 4, and the projecting histogram is generated, and the difference between the adjacent values of the projecting histograms is found at a histogram difference generating part 8. A collating part 9 performs a character recognition processing by performing the matching with the pattern of a differential dictionary 11 based on the difference.
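
The histogram-difference feature described above can be sketched in a few lines. The sketch accumulates binarized pixels column by column (the abstract says only "one direction", so the choice of columns is an assumption), takes differences between adjacent histogram values, and picks the dictionary entry with the smallest absolute deviation. The function names and the distance measure are hypothetical.

```python
def histogram_difference(img):
    """Differences between adjacent column sums of a binary image."""
    hist = [sum(col) for col in zip(*img)]
    return [b - a for a, b in zip(hist, hist[1:])]


def match(diff, dictionary):
    """Return the key of the dictionary pattern closest to diff (L1 distance)."""
    return min(
        dictionary,
        key=lambda k: sum(abs(x - y) for x, y in zip(diff, dictionary[k])),
    )
```

For an image with W columns the feature has only W - 1 integers, which reflects the stated purpose of reducing the amount of image information used for recognition.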

Proceedings ArticleDOI
14 Nov 1988
TL;DR: An optical handwritten numeral recognition system is presented which uses a G3 facsimile transceiver (fax) as the input device and the recognition rate is from 95% to 99.5% depending on the sample.
Abstract: An optical handwritten numeral recognition system is presented which uses a G3 facsimile transceiver (fax) as the input device. Flexible recognition algorithms perform raster scanning of the Huffman code produced by the fax only once. The hanger-chain algorithms separate handwritten numerals and obtain the first features at the same time. All the recognition software is coded in Turbo C language. The system recognition speed is from 10 to 100 characters per second, and the recognition rate is from 95% to 99.5% depending on the sample.

Proceedings ArticleDOI
25 Oct 1988
TL;DR: A method to estimate the unexpected rotational angle of the image is proposed and using the pipelined CORDIC array processor architecture to rotate the image back quickly will increase the performance of the automatic document input system.
Abstract: In the document analysis system or the understanding system[1,2], the rotation of the document's image will cause optical character recognition error. Then the document must be scanned and recognized again. This phenomenon will degrade the performance of the automatic document input system. In this paper, we propose a method to estimate the unexpected rotational angle of the image. And we suggest using the pipelined CORDIC array processor architecture to rotate the image back quickly. Thus the performance of the automatic document input system will increase.
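
To show why CORDIC suits fast image rotation, the sketch below rotates a single coordinate pair using only the shift-and-add style iteration that CORDIC is built on. It is a floating-point illustration of the rotation mode for one point, not the pipelined array architecture of the paper, and the skew-angle estimation step is not covered because the abstract does not describe it. The function name and iteration count are assumptions.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) counterclockwise by angle (radians) with CORDIC.

    Converges for |angle| up to about 1.74 rad. In hardware the
    multiplications by 2**-i are bit shifts and the arctangents come
    from a small lookup table.
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated scale factor of the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, y / gain
```

Rotating a skewed page back would apply the same fixed sequence of micro-rotations to every pixel coordinate with the negated skew angle, which is what makes a pipelined array of identical stages attractive.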

Patent
23 Dec 1988
TL;DR: A character pattern indicated by a character code converted by the character recognition section is displayed together with an image pattern extracted from a character picture that was not recognized, so as to allow manual correction.
Abstract: PURPOSE: To attain efficient and accurate correction by displaying a character pattern indicated by a character code converted by a character recognition section together with an image pattern extracted from a character picture not recognized so as to allow manual correction. CONSTITUTION: A picture of an OCR original 1 to be read is inputted to a picture input section 2 and stored once in an input picture storage section 3. A character recognition section 4 converts the stored picture into a character code recognized normally and a reject code not recognized and stores the result to a code data storage section 5. A character pattern generating section 6 generates a character pattern indicated by the character code being the result of conversion by the normal recognition and a character picture extracting means 12 extracts a character picture not recognized by the recognition section 4 to generate the image pattern. The character pattern recognized normally and the image pattern not recognized are displayed on a display section 8. Thus, the operator uses an input device 11 to operate a correction processing means 10 to attain efficient and accurate correction.

01 Nov 1988
TL;DR: An investigation into the feasibility of placing machine-readable symbology (bar codes or OCR text) on map products and the issues that surfaced during the design and testing of this prototype system are documented.
Abstract: This report documents an investigation into the feasibility of placing machine-readable symbology (bar codes or OCR text) on map products. The approach to this research included a survey of optical-scanning devices, procurement of suitable devices, and interfacing the equipment to a personal computer for the development of a prototype automated feature attribute access system. This report documents the issues that surfaced during the design and testing of this prototype system.

Proceedings ArticleDOI
25 Oct 1988
TL;DR: This paper presents major achievements made towards the development of a high-speed optical character recognition (OCR) workstation for characters of various fonts and sizes based upon an efficient feature extraction concept centred around an edge-vectorization technique.
Abstract: This paper presents major achievements made towards the development of a high-speed optical character recognition (OCR) workstation for characters of various fonts and sizes. The system is based upon an efficient feature extraction concept centred around an edge-vectorization technique. The resulting edges are mapped into a feature space from where a binary feature vector is built and subsequently fed to a standard statistical Bayesian classifier. The technique has been demonstrated on an IBM-PC/XT (without coprocessor) to operate at least 25 times the speed of conventional OCR techniques, achieving a 100% recognition rate with learned characters and 87% with unlearned.