Showing papers on "Optical character recognition" published in 1987


Journal ArticleDOI
TL;DR: The selection of a set of moments that provide good discrimination between characters, the comparison of three classification schemes, the choice of a weighting vector that improves the classification performance, and a series of experiments to determine how the recognition rate is affected by the number of library feature vector sets are presented.
Abstract: An investigation of the use of two-dimensional moments as features for recognition has resulted in the development of a systematic method of character recognition. The method has been applied to six machine-printed fonts. Documents used to test the method contained 24 lines of alphanumeric characters. Before scanning a document to be processed, a training document having the same font must be scanned and stored in memory. Characters on the training document are isolated by contour tracing, and then the 2D moments of each character are computed and stored in a library of feature vectors. The document to be recognized is then scanned, and the 2D moments of its characters are compared with those in the library for classification. In this paper we present the selection of a set of moments that provide good discrimination between characters, the comparison of three classification schemes, the selection of a weighting vector that improves the classification performance, and a series of experiments to determine how the recognition rate is affected by the number of library feature vector sets. Recognition rates between 98.5% and 99.7% have been achieved for all fonts tested.
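The library-matching step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of scale-normalised central moments up to order 3, the weighted squared-distance rule, the uniform weights, and the names `moment_features`, `classify`, `LIBRARY` and `WEIGHTS` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binary glyph: 1 = ink, 0 = background


def moment_features(img: Image, max_order: int = 3) -> List[float]:
    """Scale-normalised central moments mu_pq for 2 <= p + q <= max_order."""
    pixels = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = float(len(pixels))
    if m00 == 0:
        raise ValueError("glyph has no ink pixels")
    cx = sum(x for x, _ in pixels) / m00  # centroid gives translation invariance
    cy = sum(y for _, y in pixels) / m00
    feats = []
    for order in range(2, max_order + 1):
        for p in range(order, -1, -1):
            q = order - p
            mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
            feats.append(mu / m00 ** (1 + order / 2))  # divide out glyph size
    return feats


def classify(feats: Sequence[float],
             library: Sequence[Tuple[str, Sequence[float]]],
             weights: Sequence[float]) -> str:
    """Weighted squared-distance nearest neighbour over the feature library."""
    def dist(ref: Sequence[float]) -> float:
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, feats, ref))
    return min(library, key=lambda entry: dist(entry[1]))[0]


# Toy "training document": a vertical and a horizontal stroke (hypothetical data).
BAR_V = [[0, 0, 1, 0, 0]] * 5
BAR_H = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
LIBRARY = [("I", moment_features(BAR_V)), ("-", moment_features(BAR_H))]
WEIGHTS = [1.0] * len(LIBRARY[0][1])  # uniform; a tuned vector would replace this

# A shorter vertical stroke at a different position is still closest to "I".
UNKNOWN = [[0, 0, 0, 1, 0]] * 4 + [[0] * 5]
print(classify(moment_features(UNKNOWN), LIBRARY, WEIGHTS))
```

Storing several feature vector sets per character, as the abstract's last experiment does, would simply mean more than one library entry per label.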

146 citations


Patent
26 Jun 1987
TL;DR: A digital imaging file processing system for processing and filing digitized documents is described, in which a unique header page in each group of documents to be digitized gives the document processor such pertinent information as the scanning resolution, whether the processor is to look for yellow highlight marks, and the copy quality of the documents in the batch.
Abstract: A digital imaging file processing system for processing and filing digitized documents is disclosed. In the preferred embodiment a distributed data processing system implements the invention and is comprised of a central computer linked to the following components: a document processor; optical character recognition device; mass storage devices; a printer; and at least one intelligent workstation computer. The document processor automatically digitizes a stack of documents without the need for user intervention. A unique header page is used in each group of documents to be digitized, and gives the document processor such pertinent information as the scanning resolution, whether the processor is to look for yellow highlight marks, and the copy quality of the documents in the batch. Other information includes instructions per the use or non-use of the optical character recognition device, and filing or indexing instructions. A unique yellow highlight mark detection method and means are disclosed. The method comprises illuminating the documents to be processed with light of different wavelengths and comparing the reflected output in volts. The highlight marks can be used to identify the title of a document as well as key words therein, and the location within the document of the highlighted regions. The digitized documents can be classified by title, key words or other methods.
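The idea of comparing reflected output under two illuminations can be illustrated with a small sketch. It is not the patented method: it only assumes that a yellow mark reflects long-wavelength light about as well as white paper but absorbs blue light, and the thresholds `dark` and `blue_ratio`, like the function names, are hypothetical values that would need calibration on real hardware.

```python
from typing import List, Sequence, Tuple


def classify_sample(v_long: float, v_blue: float,
                    dark: float = 0.3, blue_ratio: float = 0.5) -> str:
    """Label one scan position from two reflectance voltages.

    v_long: sensor voltage under long-wavelength (yellow/red) illumination.
    v_blue: sensor voltage under short-wavelength (blue) illumination.
    Both thresholds are hypothetical and would need calibration.
    """
    if v_long < dark:
        return "ink"        # dark under both illuminations
    if v_blue < blue_ratio * v_long:
        return "highlight"  # yellow absorbs blue, reflects long wavelengths
    return "paper"          # white paper reflects both


def highlighted_runs(scanline: Sequence[Tuple[float, float]]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of highlighted spans."""
    runs, start = [], None
    for i, (v_long, v_blue) in enumerate(scanline):
        marked = classify_sample(v_long, v_blue) == "highlight"
        if marked and start is None:
            start = i
        elif not marked and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scanline)))
    return runs
```

In this sketch, ink pixels inside a highlighted word interrupt a run, so a practical version would merge nearby runs before reporting the highlighted region.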

109 citations


Patent
26 Mar 1987
TL;DR: A thresholding algorithm selection apparatus for selecting the thresholding algorithm to be applied to a specimen in a digital imaging process is disclosed, in which the specimen is preliminarily scanned and thresholded using a predetermined thresholding algorithm.
Abstract: A thresholding algorithm selection apparatus for selecting the thresholding algorithm to be applied to a specimen in a digital imaging process is disclosed. The specimen is preliminarily scanned and thresholded using a predetermined thresholding algorithm. The resulting preliminary digital image is divided into a number of cells or regions. Each cell is classified according to its optical characteristics; e.g. textual, photographic, etc. The cell classification is effected in the preferred embodiment by comparing the average white and black run-lengths of each cell to experimental average run-length values which are representative of the different types of specimens. The specimen is re-scanned and thresholded with an appropriate thresholding algorithm for each cell. Additionally, the digital imaging system may contain an optical character recognition means, in which case the cell characterization step is used to trigger the optical character recognition means in a textual region, while causing the optical character recognition means to ignore non-textual cells.
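The run-length comparison used for cell classification might look like the sketch below. The reference values in `REFERENCE` are invented for illustration (the patent refers to experimentally determined averages that are not given here), and the nearest-reference rule is an assumption about how the comparison could be made.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Cell = Sequence[Sequence[int]]  # thresholded cell: 1 = black, 0 = white


def _mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def average_run_lengths(cell: Cell) -> Tuple[float, float]:
    """Return (mean white run length, mean black run length) over all rows."""
    runs: Dict[int, List[int]] = {0: [], 1: []}
    for row in cell:
        for value, group in groupby(row):
            runs[value].append(len(list(group)))
    return _mean(runs[0]), _mean(runs[1])


# Hypothetical reference averages (white, black); real values would be measured.
REFERENCE = {"textual": (4.0, 2.0), "photographic": (1.5, 1.5)}


def classify_cell(cell: Cell,
                  reference: Dict[str, Tuple[float, float]] = REFERENCE) -> str:
    """Pick the specimen type whose reference run lengths are closest."""
    white, black = average_run_lengths(cell)
    return min(reference,
               key=lambda k: (white - reference[k][0]) ** 2
                             + (black - reference[k][1]) ** 2)
```

A cell labelled "textual" would then be passed to the character recognition stage, while other cells would be skipped.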

30 citations


Journal ArticleDOI
J. Voisin1, Pierre A. Devijver1
TL;DR: The multiedit/condensing technique offers an automatic solution to this problem which avoids the proliferation of references without impairing the recognition performance, as demonstrated by experimental results in a print recognition context.
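For orientation, the condensing half of such a scheme can be sketched with Hart's classical condensed nearest neighbour rule, which keeps only as many reference samples as are needed to classify the whole training set correctly with a 1-NN rule. This is a generic textbook sketch under that assumption, not the procedure of this paper, and the multiedit step (which first removes samples lying in class-overlap regions) is not shown.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def nearest_label(x: Sequence[float], refs: Sequence[Sample]) -> str:
    """Label of the reference sample closest to x (squared Euclidean distance)."""
    def sq_dist(ref: Sample) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, ref[0]))
    return min(refs, key=sq_dist)[1]


def condense(samples: Sequence[Sample]) -> List[Sample]:
    """Hart's condensing: grow a reference store until it classifies all samples."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for sample in samples:
            # Add any sample the current store still gets wrong.
            if sample not in store and nearest_label(sample[0], store) != sample[1]:
                store.append(sample)
                changed = True
    return store
```

On two well-separated clusters the store typically shrinks to one reference per class, which is the "fewer references, same decisions" effect the summary alludes to.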

19 citations


Proceedings ArticleDOI
11 May 1987
TL;DR: The development and implementation of a new algorithm for automated text string separation which is relatively independent of changes in text font style and size, and of string orientation is described.
Abstract: An automated system for document analysis is extremely desirable. A digitized image consisting of a mixture of text and graphics should be segmented in order to more efficiently represent both the areas of text and graphics. This paper describes the development and implementation of a new algorithm for automated text string separation which is relatively independent of changes in text font style and size, and of string orientation. The algorithm does not explicitly recognize individual characters. The principal components of the algorithm are the generation of connected components and the application of the Hough transform in order to logically group together components into character strings which may then be separated from the graphics. The algorithm outputs two images, one containing text strings, and the other graphics. These images may then be processed by suitable character recognition and graphics recognition systems. The performance of the algorithm, both in terms of its effectiveness and computational efficiency, was evaluated using several test images. The results of the evaluations are described. The superior performance of this algorithm compared to other techniques is clear from the evaluations.
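The grouping step, applying a Hough transform to connected components, can be sketched by voting with component centroids. This is a simplified illustration and not the published algorithm: it assumes the components have already been extracted and reduced to centroids, and the angular step, the `rho_res` bin width and the function name `dominant_string` are arbitrary choices for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dominant_string(centroids: Sequence[Point],
                    theta_step_deg: int = 5,
                    rho_res: float = 2.0) -> Tuple[int, List[int]]:
    """Return (normal angle in degrees, component indices) of the fullest Hough cell.

    Each centroid votes for every line x*cos(theta) + y*sin(theta) = rho passing
    through it; components that share a cell are roughly collinear.
    """
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(centroids):
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            rho_bin = math.floor(rho / rho_res + 0.5)  # quantise rho
            cells[(theta_deg, rho_bin)].append(idx)
    (theta_deg, _), members = max(cells.items(), key=lambda kv: len(kv[1]))
    return theta_deg, members
```

The returned angle is that of the line's normal, so a string running at 45 degrees is reported with 135. Because the vote uses only component positions, no character needs to be recognised, which matches the abstract's point that the method is independent of font and orientation.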

14 citations


Proceedings ArticleDOI
Yoshitake Tsuji1, Jun Tsukumo1, Ko Asai1
14 Oct 1987
TL;DR: A hierarchical image segmentation is described, which separates a document image into its entities and was successful in reading 99.30% of the Japanese characters and Chinese ideographs, as used in printed text.
Abstract: A fundamental problem in machine vision is to detect and identify special objects in an image. In the field of machine reading of existing printed matter and books, a very important technique is the extraction and recognition of characters in desired text lines from a document image. This paper describes a hierarchical image segmentation, which separates a document image into its entities. Furthermore, a character segmentation with a minimum variance criterion and a character recognition based on three improved loci features have been developed as two elemental methods for reading books. In experiments using different commercial Japanese pocket books, 99% of text lines were correctly extracted, and 99.30% of the Japanese characters and Chinese ideographs used in the printed text were read successfully.

7 citations


Journal ArticleDOI
TL;DR: Approaches are advanced for pattern recognition when a large number of classes must be identified, and multilevel encoded multiple-iconic filters are considered for this problem.
Abstract: Approaches are advanced for pattern recognition when a large number of classes must be identified. Multilevel encoded multiple-iconic filters are considered for this problem. Hierarchical arrangements of iconic filters and/or preprocessing stages are described. A theoretical basis for the sidelobe level and noise effects of filters designed for large class problems is advanced. Experimental data are provided for an optical character recognition case study.

6 citations


01 Jan 1987
TL;DR: Three theses are combined into a treatise in four parts: (1) a literature study on parallel processing and pattern recognition, (2) the design and implementation, (3) results, conclusions and recommendations, and (4) references and appendices.
Abstract: Three theses are combined into a treatise consisting of four parts: 1 A Literature Study, 2 The Design and Implementation, 3 Results, Conclusions and Recommendations, and 4 References and Appendices A through F. The first Part is a self-contained literature study on Parallel Processing and Pattern Recognition. It consists of four Chapters. The first Chapter introduces the NCube four/+ parallel computer on which the knowledge-based optical character-recognition system has been implemented; it also discusses computers based on other structures than a hypercube structure. The second Chapter is a survey on Parallel Processing. In this section the most common methods to parallelise a problem on existing architectures are described. The third Chapter provides some background to Pattern Recognition. An important issue will be the difference between statistical and syntactical Pattern Recognition. In Chapter four the pattern-recognising process is examined more closely and is partitioned in preprocessing, classification and postprocessing. Some currently available Optical Character Recognition systems are discussed and as an extension of our literature research a model of the recogniser is presented. The second Part specifies in extenso the design, realization and implementation of the model presented in Chapter four of Part 1. Part 2 consists of four Chapters, being Chapters 5 through 8. Chapter five describes the programming environment consisting of a Scheme interpreter and a concurrent user interface that has been implemented. In Chapters 6, 7 and 8 the design and implementation of the Preprocessor, Classifier and Postprocessor, respectively, are thoroughly investigated. In the third Part the results of the implemented system, some (tentative) conclusions and recommendations are presented. Part 3 contains two Chapters, 9 and 10. In Chapter nine the Results are presented, and Chapter ten provides the reader with the conclusions. Here, as a conclusion, we want to remark explicitly that it has turned out to be possible to combine Artificial Intelligence with traditional Pattern Recognition and that the resulting system can be speeded up by parallelisation. The fourth Part holds the appendices. Appendix A contains statistical knowledge about the fonts used in test results. In appendix B the trigram frequency tables of the ICCA Journal, the UNIX manual and this treatise can be found. Appendix C describes pre- and suffixes used in the spelling corrector. Appendix D describes the Frames used. Appendix E contains all C sources of implemented programs. Appendix F comprises the documentation concerning the Flisp interpreter.

6 citations



Proceedings ArticleDOI
13 Oct 1987
TL;DR: A powerful Chinese multifont recognition system is developed that can recognize different character styles at the same time within the same program and can also recognize characters of different sizes.
Abstract: Because the difficulty of inputting Chinese characters is a barrier to the integration of computers, Chinese, and communication (C & C & C), we studied and developed a powerful Chinese multifont recognition system. The algorithm we developed for this system can recognize different character styles at the same time within the same program. It can also recognize characters of different sizes.

3 citations


Proceedings ArticleDOI
27 Mar 1987
TL;DR: A special set of multi-level multi-filters (referred to as "iconic filters") is described for use in the super-class recognition problem and the utility of iconic filters for large multi-class problems is demonstrated.
Abstract: New large class filters are used for the automatic recognition of characters. A special set of multi-level multi-filters (referred to as "iconic filters") is described for use in the super-class recognition problem. Different types of iconic filters are considered and the results of such tests are reported. System performance under simulated nonideal conditions is detailed. The observed behavior of iconic filters is quantified, and an initial explanation for the effect of a large number of training set images is provided. New solutions are advanced, and the utility of iconic filters for large multi-class problems is demonstrated. This is the first data to be reported on the use of multi-level multi-filters for a large class pattern recognition problem.

Proceedings ArticleDOI
30 Apr 1987
TL;DR: A description is given of a prototype address block finding system which is implemented on a LISP machine to facilitate the extension of the complexity of feature interpretation on an Expert System.
Abstract: A description is given of a prototype address block finding system which is currently under development. This system is intended to support a wider application of machine vision and robotic technologies for the handling of mail. Monochrome digital images of mail are subjected to a sequence of binarization, clustering, and ranking algorithms which are designed to automatically determine the position, extent, and orientation of the destination address block. The prototype system is implemented on a LISP machine to facilitate the extension of the complexity of feature interpretation on an Expert System. Success rates for the end-to-end process were tested on 40 mail images representing a wide variety of mail characteristics.

01 May 1987
TL;DR: Beyond the scope originally intended for Phase I, ERIM has implemented that end-to-end system and has determined that it achieves ninety percent digit identification on limited test data.
Abstract: This report describes the Phase I activities in Advanced Research in Recognition of Handwritten Address ZIP Codes conducted for the United States Postal Service at the Environmental Research Institute of Michigan. These activities include an in-depth review of the optical character recognition literature, the development of a handwritten addresses digitized image data base, the development of a hardware and software testbed for investigating the recognition of handwritten address, and the design of a prototype end-to-end ZIP Code recognition system. Beyond the scope originally intended for Phase I, ERIM has implemented that end-to-end system and has determined that it achieves ninety percent digit identification on limited test data. Featured within the overall activities is the concept that development of image algorithms is an incremental process. This concept is strongly reflected in the testbed architecture that has resulted from this work. This approach is unique in that it enables continued system refinement in a way that is both understandable and meaningful. A plan for such refinement of the prototype system is proposed for Phase II of this project.

Proceedings ArticleDOI
30 Apr 1987
TL;DR: A new character recognition scheme using improved extended octal code as primitive is introduced, which has certain advantages such as flexible size, orientation, variations, fewer learning samples needed and lower degree of ambiguity.
Abstract: This paper concerns a critique of several line-drawing pattern recognition methods such as picture descriptive language[15], Berthod and Maroy's methods[1], extended Freeman's chain code [4,22], tree grammar[5-6] and array grammar[3] [20,21]. Then a new character recognition scheme using improved extended octal code as primitive is introduced, which has certain advantages such as flexible size, orientation, variations, fewer learning samples needed and lower degree of ambiguity. Finally the concept of semantic pattern recognition is discussed.
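As background to the chain-code family of methods mentioned here, the classical Freeman 8-direction chain code can be sketched as below. This is the standard textbook code, not the paper's improved extended octal code, whose definition is not given in the abstract; the direction numbering (0 = east, counter-clockwise, y axis pointing up) is one common convention chosen for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

# Freeman 8-direction codes: 0 = east, increasing counter-clockwise (y axis up).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(points: Sequence[Point]) -> List[int]:
    """Encode a closed path of 8-connected points as Freeman direction codes."""
    codes = []
    closed = list(points[1:]) + [points[0]]  # wrap around to the start point
    for (x0, y0), (x1, y1) in zip(points, closed):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def first_difference(codes: Sequence[int]) -> List[int]:
    """Cyclic first difference; unchanged when the shape turns by 45-degree steps."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]
```

For a unit square traced counter-clockwise from its lower-left corner, `chain_code` gives `[0, 2, 4, 6]`, and the first difference `[2, 2, 2, 2]` stays the same if the square is rotated by 90 degrees, which hints at why code-based primitives can tolerate changes in orientation.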

Patent
15 May 1987
TL;DR: The contents of dictionaries are optimized by having a character recognition part recognize a document in which dictionary information for a character dictionary or knowledge dictionary is entered, and by registering that information in the character dictionary or knowledge dictionary.
Abstract: PURPOSE: To optimize the contents of dictionaries by recognizing a document where dictionary information to a character dictionary or knowledge dictionary is entered by a character recognition part and registering the dictionary information to the character dictionary or knowledge dictionary. CONSTITUTION: Knowledge information to be added to a document 1 for knowledge dictionary generation is written in specific format, the character recognition part 2 recognizes knowledge information in the document 1, and a control part 3 adds it to the knowledge dictionary 5. When knowledge information is corrected or deleted, its contents are written in the document 1 and recognized by the recognition part 2. When the dictionary is corrected or deleted or when information is added to the dictionary, the control part 3 compiles the contents of the dictionary 5 of source images and stores it in a knowledge dictionary load module 4. The contents of this module 4 are loaded in the memory in the control part 3 before original recognition is started and the recognition result is copied in this memory on the basis of the stored knowledge information.

Proceedings ArticleDOI
Hiroyuki Kami1, Tsutomu Temma1, Ko Asai1
21 Aug 1987
TL;DR: It has been shown that this two stage discriminant analysis method is further applicable to character sequence recognition without the need for a character isolation process.
Abstract: Two stage discriminant analysis has been proposed for multi-class recognition. In the second stage, multiple discriminant analysis is applied to identify each set of classes that is not distinctly classified in the first stage. The proposed method is applied to character recognition in order to evaluate it. The recognition rate was 99.3% for 91 categories of alphanumerics and special symbols. The recognition speed was 20 milliseconds per character, when this analysis program was executed on image pipelined processors. It has been shown that this method is further applicable to character sequence recognition without the need for a character isolation process.

01 Jan 1987
TL;DR: This file was produced by optical character recognition of the original, so there may be some mistakes both in the text and in the formatting.
Abstract: This file was produced by optical character recognition of the original. There may be some mistakes both in the text and in the formatting.