
Showing papers on "Optical character recognition" published in 1983


Journal ArticleDOI
J. M. White1, G. D. Rohrer1
TL;DR: Two new, cost-effective thresholding algorithms for use in extracting binary images of characters from machine- or hand-printed documents are described, with a more aggressive approach directed toward specialized, high-volume applications which justify extra complexity.
Abstract: Two new, cost-effective thresholding algorithms for use in extracting binary images of characters from machine- or hand-printed documents are described. The creation of a binary representation from an analog image requires such algorithms to determine whether a point is converted into a binary one because it falls within a character stroke or a binary zero because it does not. This thresholding is a critical step in Optical Character Recognition (OCR). It is also essential for other Character Image Extraction (CIE) applications, such as the processing of machine-printed or handwritten characters from carbon copy forms or bank checks, where smudges and scenic backgrounds, for example, may have to be suppressed. The first algorithm, a nonlinear, adaptive procedure, is implemented with a minimum of hardware and is intended for many CIE applications. The second is a more aggressive approach directed toward specialized, high-volume applications which justify extra complexity.

283 citations
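
The thresholding step in the White and Rohrer abstract above decides, point by point, whether a pixel belongs to a character stroke. The sketch below is a generic local-mean adaptive threshold, not either of the two algorithms the paper describes; the function name `adaptive_threshold` and the parameters `window` and `offset` are assumptions chosen for illustration.

```python
def adaptive_threshold(image, window=3, offset=10):
    """Binarize a grayscale image given as a list of rows (0 = black, 255 = white).

    A pixel becomes 1 (stroke) when it is darker than the mean of its local
    window by more than `offset`; otherwise it becomes 0 (background).
    `window` and `offset` are illustrative values, not taken from the paper.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image border.
            rows = range(max(0, y - half), min(height, y + half + 1))
            cols = range(max(0, x - half), min(width, x + half + 1))
            values = [image[r][c] for r in rows for c in cols]
            local_mean = sum(values) / len(values)
            if image[y][x] < local_mean - offset:
                binary[y][x] = 1
    return binary


# A dark vertical stroke on a background that brightens from left to right.
page = [
    [100, 110, 40, 130, 140],
    [100, 110, 40, 130, 140],
    [100, 110, 40, 130, 140],
]
result = adaptive_threshold(page)
```

Because each pixel is compared with its own neighbourhood rather than with one fixed cutoff, the gradual change in background brightness does not affect which column is marked as stroke.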


Journal ArticleDOI
TL;DR: An algorithm for text recognition/correction that effectively merges a bottom-up refinement process that is based on the utilization of transitional probabilities and letter confusion probabilities, known as the Viterbi algorithm [VA], together with a top-down process based on searching a trie structure representation of a lexicon.
Abstract: The capabilities of present commercial machines for producing correct text by recognizing words in print, handwriting and speech are very limited. For example, most optical character recognition [OCR] machines are limited to a few fonts of machine print, or text that is handprinted under certain constraints; any deviation from these constraints will produce highly garbled text. This paper describes an algorithm for text recognition/correction that effectively merges a bottom-up refinement process that is based on the utilization of transitional probabilities and letter confusion probabilities, known as the Viterbi algorithm [VA], together with a top-down process based on searching a trie structure representation of a lexicon. The algorithm is applicable to text containing an arbitrary number of character substitution errors such as that produced by OCR machines.

109 citations
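
The abstract above combines two sources of evidence: letter confusion and transition probabilities (bottom-up) and a trie-structured lexicon (top-down). The sketch below is a simplified illustration of that idea, not the paper's algorithm: it walks the trie exhaustively and scores each same-length word with the two probability tables, rather than running the Viterbi dynamic program. All names (`build_trie`, `correct`) and all probability values are hypothetical.

```python
import math


def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root


def correct(observed, trie, confusion, transition):
    """Return the lexicon word of the same length that best explains `observed`.

    confusion[(true, seen)] : P(seen | true), a hypothetical OCR error model
    transition[(prev, cur)] : P(cur | prev), a hypothetical letter bigram model
    Missing entries fall back to a small floor probability.
    Only substitution errors are handled, so candidates must match in length.
    """
    floor = 1e-6
    best = (-math.inf, None)

    def walk(node, depth, prev, score, prefix):
        nonlocal best
        if depth == len(observed):
            # Accept the path only if it ends on a complete lexicon word.
            if "$" in node and score > best[0]:
                best = (score, prefix)
            return
        for letter, child in node.items():
            if letter == "$":
                continue
            p_conf = confusion.get((letter, observed[depth]), floor)
            p_trans = transition.get((prev, letter), floor)
            walk(child, depth + 1, letter,
                 score + math.log(p_conf) + math.log(p_trans),
                 prefix + letter)

    walk(trie, 0, "^", 0.0, "")  # "^" stands for the start of the word
    return best[1]


trie = build_trie(["cat", "cot", "car"])
# Made-up probabilities: the reader mistakes "a" for "e" more often than "o" for "e".
confusion = {("c", "c"): 0.9, ("t", "t"): 0.9, ("r", "r"): 0.9,
             ("a", "e"): 0.08, ("o", "e"): 0.02}
transition = {("^", "c"): 0.1, ("c", "a"): 0.3, ("c", "o"): 0.3,
              ("a", "t"): 0.2, ("o", "t"): 0.2, ("a", "r"): 0.2}
corrected = correct("cet", trie, confusion, transition)
```

Restricting the search to paths that exist in the trie is what guarantees the output is a lexicon word; the probability tables only rank the candidates.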


Patent
01 Aug 1983
TL;DR: A method is presented for recognizing and providing an output corresponding to a character, in which the character is received by an imager, digitized, and transmitted to a memory.
Abstract: A method for recognizing and providing an output corresponding to a character in which the character is received by an imager, digitized, and transmitted to a memory. Data in the memory is read in a sequence which circumnavigates the test character. Only data representative of the periphery of the character are read. During the circumnavigation, character parameters, such as height, width, perimeter, area and waveform are determined. The character parameters are compared with reference character parameters and the ASCII code for the reference character which matches the character is provided as an output.

52 citations
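
The patent abstract above derives character parameters (height, width, perimeter, area) and matches them against reference parameters to emit an ASCII code. The sketch below illustrates that parameter-matching step under stated assumptions: it scans the whole bitmap instead of circumnavigating the periphery as the patent does, it omits the waveform parameter, and the nearest-match rule and the names `character_parameters` and `recognize` are hypothetical.

```python
def character_parameters(bitmap):
    """Return (height, width, perimeter, area) of the 1-pixels in `bitmap`.

    Assumes the bitmap contains at least one stroke pixel.
    Perimeter is counted as the number of pixel edges facing background.
    """
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    cols = [x for row in bitmap for x, v in enumerate(row) if v]
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    area = sum(sum(row) for row in bitmap)
    perimeter = 0
    for y, row in enumerate(bitmap):
        for x, v in enumerate(row):
            if not v:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < len(bitmap) and 0 <= nx < len(row)
                if not inside or not bitmap[ny][nx]:
                    perimeter += 1
    return height, width, perimeter, area


def recognize(bitmap, references):
    """Return the ASCII code of the reference whose parameters are closest."""
    params = character_parameters(bitmap)

    def distance(item):
        # Squared Euclidean distance between parameter tuples.
        return sum((a - b) ** 2 for a, b in zip(params, item[1]))

    char, _ = min(references.items(), key=distance)
    return ord(char)


letter_i = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
letter_l = [[1, 0, 0],
            [1, 0, 0],
            [1, 1, 1]]
references = {"I": character_parameters(letter_i),
              "L": character_parameters(letter_l)}
code = recognize(letter_l, references)
```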


Journal ArticleDOI
Richard G. Casey1, C. R. Jih1
TL;DR: A previously developed classification technique, based on decision trees, has been extended in order to improve reading accuracy in an environment of considerable character variation, including the possibility that documents in the same font style may be produced using quite different print technologies.
Abstract: A low-cost optical character recognition (OCR) system can be realized by means of a document scanner connected to a CPU through an interface. The interface performs elementary image processing functions, such as noise filtering and thresholding of the video image from the scanner. The processor receives a binary image of the document, formats the image into individual character patterns, and classifies the patterns one-by-one. A CPU implementation is highly flexible and avoids much of the development and manufacturing costs for special-purpose, parallel circuitry typically used in commercial OCR. A processor-based recognition system has been investigated for reading documents printed in fixed-pitch conventional type fonts, such as occur in routine office typing. Novel, efficient methods for tracking a print line, resolving it into individual character patterns, detecting underscores, and eliminating noise have been devised. A previously developed classification technique, based on decision trees, has been extended in order to improve reading accuracy in an environment of considerable character variation, including the possibility that documents in the same font style may be produced using quite different print technologies. The system has been tested on typical office documents, and also on artificial stress documents, obtained from a variety of typewriters.

27 citations
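
The Casey and Jih abstract above mentions resolving a print line into individual character patterns for fixed-pitch fonts. The abstract does not give the method, so the sketch below shows only the simplest version of the idea: cutting a binary line image into equal-width cells. The function name `split_fixed_pitch` and the parameters `pitch` and `start` are assumptions for illustration.

```python
def split_fixed_pitch(line, pitch, start=0):
    """Cut a binary text-line image into character cells `pitch` columns wide.

    `line` is a list of rows; `start` is the column where the first cell begins.
    Trailing columns that do not fill a whole cell are dropped.
    """
    width = len(line[0])
    cells = []
    for left in range(start, width - pitch + 1, pitch):
        cells.append([row[left:left + pitch] for row in line])
    return cells


# Two rows, seven columns: two full 3-column cells plus one leftover column.
line = [
    [1, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1],
]
cells = split_fixed_pitch(line, pitch=3)
```

Each resulting cell could then be passed to a classifier one at a time, which matches the one-by-one classification the abstract describes.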


Journal ArticleDOI
01 Oct 1983
TL;DR: Human factors research has not been directed at understanding this process.
Abstract: A main task of secretaries and typists is to retype documents after they have been edited in pencil by principals. Increasingly, they use word processing systems to do this. In addition, some principals type their own revisions after first making them in pencil. We have informally observed that people using text editors spend much of their time in (a) visual search (looking back and forth between the manuscript and the screen); (b) decision making (deciding how to locate the right place in the computer file, deciding how to make the revision); and (c) rereading. Actual time spent executing a command seems small in comparison. Human factors research has not been directed at understanding this process.

9 citations



01 Mar 1983
TL;DR: The DMA Subtask objectives are provided and the general structure of the Handprinted Symbol Recognition System is outlined, covering the key issues of information content, problems in the thinning or vectorization of a character, shape measurement and feature extraction, and finally character recognition or labeling.
Abstract: This NORDA Technical Note is composed of five chapters. The first chapter presents an overview of optical character recognition (OCR) and its relation to the automated cartography environment. It provides the DMA Subtask objectives and discusses them in the light of symbol digitizing and information transformations. The division of a total OCR system into data acquisition/document management and isolated character recognition is considered along with NORDA's recent tasking (FY-82) prototype for DMA production centers. Chapter Two presents a discussion of the different ways in which recognition systems are constructed. In particular, it considers the differences in approach necessary for constrained and free-form OCR. Chapter Three describes the DMA environment in which a handprinted OCR system must operate and discusses performance requirements. The general structure of the Handprinted Symbol Recognition System is outlined in Chapter Four. This material considers the key issues of information content, problems in the thinning or vectorization of a character, shape measurement and feature extraction, and finally character recognition or labeling. The interaction between each of these elements is emphasized. Chapter Five provides a brief summary of the current Subtask accomplishments and status along with areas where work is in progress toward developing other handprinted OCR capabilities for DMA.

1 citation


Proceedings ArticleDOI
22 Jun 1983
TL;DR: A handheld camera which translates inkprint into computer readable text is considered as the input device for a voice output reading machine for the visually impaired, and automatic thresholding and magnification control algorithms which work well on newsprint and good quality type are presented.
Abstract: A handheld camera which translates inkprint into computer readable text has a wide variety of potential applications, such as the input device for a voice output reading machine for the visually impaired. Commercial optical character recognition systems typically operate on controlled input and read with high speed and accuracy. As an input device for a voice output reading machine, much slower speeds of 200 words per minute are acceptable, but the text input is much less controlled in terms of quality, type size, style and format. In order to acquire text from a complex format which may include multiple columns, pictures, and graphs, operator control of the scanning is important. Automatic control of threshold and magnification combined with user control of the scanning sequence based on direct feedback from the camera image offers a potentially efficient input structure. Automatic thresholding and magnification control algorithms which work well on newsprint and good quality type are presented. Based on these results, spatial resolution and quantization requirements can be established for a system which will read text with an order of magnitude variation in size. Direct conversion auditory and tactile feedback for user control of scanning are considered.