
Showing papers on "Optical character recognition published in 1998"


Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, a new learning paradigm called graph transformer networks (GTN) is proposed, which allows multimodule document recognition systems to be trained globally with gradient-based methods; convolutional neural networks are shown to outperform all other techniques on a standard handwritten digit recognition task.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations


Proceedings ArticleDOI
16 Aug 1998
TL;DR: Compared with some traditional text location methods, this method has the following advantages: 1) low computational cost; 2) robust to font size; and 3) high accuracy.
Abstract: Automatic text location (without character recognition capabilities) deals with extracting image regions that contain text only. The images of these regions can then be fed to an optical character recognition module or highlighted for users. This is very useful in a number of applications such as database indexing and converting paper documents to their electronic versions. The performance of our automatic text location algorithm is shown in several applications. Compared with some traditional text location methods, our method has the following advantages: 1) low computational cost; 2) robust to font size; and 3) high accuracy.

560 citations


Patent
22 Oct 1998
TL;DR: In this paper, an optical-input print reading device with voice output for people with impaired or no vision is presented, in which the user provides input to the system through hand gestures.
Abstract: An optical-input print reading device with voice output for people with impaired or no vision in which the user provides input to the system from hand gestures. Images of the text to be read, on which the user performs finger- and hand-based gestural commands, are input to a computer, which decodes the text images into their symbolic meanings through optical character recognition, and further tracks the location and movement of the hand and fingers in order to interpret the gestural movements into their command meaning. In order to allow the user to select text and align printed material, feedback is provided to the user through audible and tactile means. Through a speech synthesizer, the text is spoken audibly. For users with residual vision, visual feedback of magnified and image enhanced text is provided. Multiple cameras of the same or different field of view can improve performance. In addition, alternative device configurations allow portable operation, including the use of cameras located on worn platforms, such as eyeglasses, or on a fingertip system. The use of gestural commands is natural, allowing for rapid training and ease of use. The device also has application as an aid in learning to read, and for data input and image capture for home and business uses.

425 citations


Journal ArticleDOI
TL;DR: A complete Optical Character Recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented, and extension of the work to Devnagari, the third most popular script in the world, is discussed.

381 citations


Journal ArticleDOI
TL;DR: In this article, the authors present the state of Arabic character recognition research throughout the last two decades.

319 citations


Journal ArticleDOI
TL;DR: A new image compression technique called DjVu is presented that enables fast transmission of document images over low-speed connections, while faithfully reproducing the visual aspect of the document, including color, fonts, pictures, and paper texture.

312 citations


Journal ArticleDOI
TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.
Abstract: Concerns the extraction of rotation invariant texture features and the use of such features in script identification from document images. Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures. These features are then used in an attempt to solve a practical but hitherto mostly overlooked problem in document image processing - the identification of the script of a machine printed document. Automatic script and language recognition is an essential front-end process for the efficient and correct use of OCR and language translation products in a multilingual environment. Six languages (Chinese, English, Greek, Russian, Persian, and Malayalam) are chosen to demonstrate the potential of such a texture-based approach in script identification.
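The rotation-invariance idea behind multi-channel Gabor features can be sketched in a few lines. This is a minimal illustration, not the authors' extension: it filters the image with a small bank of real Gabor kernels at several orientations and pools the response energies across orientations, so the per-frequency features no longer depend on the texture's absolute rotation. Function names and parameter values are my own.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=2.0, size=15):
    """Real-valued Gabor kernel at a given radial frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates into the filter's orientation.
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return env * np.cos(2.0 * np.pi * freq * xr)

def rotation_invariant_features(image, freqs=(0.1, 0.2, 0.3), n_orient=8):
    """Mean filter-response energy per frequency, pooled over orientations.

    Pooling (here: averaging) across an orientation bank removes the
    dependence on the texture's absolute rotation.
    """
    feats = []
    for f in freqs:
        energies = []
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            kern = gabor_kernel(f, theta)
            # Circular filtering via FFT keeps the sketch short and fast.
            resp = np.abs(np.fft.ifft2(np.fft.fft2(image) *
                                       np.fft.fft2(kern, s=image.shape)))
            energies.append(resp.mean())
        feats.append(np.mean(energies))   # orientation-pooled energy
    return np.array(feats)
```

Because a 90-degree image rotation simply permutes the orientation bank, the pooled features of an image and its rotated copy agree.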

293 citations


Proceedings ArticleDOI
03 Jan 1998
TL;DR: This paper applies an interpolation filter, multi-frame integration and a combination of four filters to solve the problems of character recognition for videos: low resolution characters and extremely complex backgrounds.
Abstract: Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented, the names of people and places or descriptions of objects. In this paper, two difficult problems of character recognition for videos are addressed: low resolution characters and extremely complex backgrounds. We apply an interpolation filter, multi-frame integration and a combination of four filters to solve these problems. Segmenting characters is done by a recognition-based segmentation method and intermediate character recognition results are used to improve the segmentation. The overall recognition results are good enough for use in news indexing. Performing video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
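Multi-frame integration exploits the fact that a caption stays fixed over many frames while the background moves. A minimal sketch (not the paper's exact filter chain; names and modes are illustrative) combines co-registered frames per pixel:

```python
import numpy as np

def integrate_frames(frames, mode="mean"):
    """Combine co-registered video frames containing a static caption.

    Averaging suppresses moving-background variation around the static
    text; taking the per-pixel minimum instead favours dark text on a
    changing, lighter background. Both are common integration choices.
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    if mode == "mean":
        return stack.mean(axis=0)
    if mode == "min":
        return stack.min(axis=0)
    raise ValueError("mode must be 'mean' or 'min'")
```

On synthetic frames where one pixel holds a constant dark caption value and the rest fluctuates, the minimum keeps the caption intact while the mean smooths the background.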

284 citations


Journal ArticleDOI
TL;DR: Describes a complete system for the recognition of off-line handwriting, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness.
Abstract: Describes a complete system for the recognition of off-line handwriting. Preprocessing techniques are described, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness. Representation of the image is discussed and the skeleton and stroke features used are described. A recurrent neural network is used to estimate probabilities for the characters represented in the skeleton. The operation of the hidden Markov model that calculates the best word in the lexicon is also described. Issues of vocabulary choice, rejection, and out-of-vocabulary word recognition are discussed.

271 citations


Patent
15 Jun 1998
TL;DR: In this article, an optically scanned image (34, 208) of at least a portion of document containing visual data, in a particular format, representing information related to the financial transaction was generated.
Abstract: The present invention provides financial transaction processing systems and methods. One preferred embodiment of a method according to one aspect of the present invention includes generating an optically scanned image (34, 208) of at least a portion of document containing visual data, in a particular format, representing information related to the financial transaction. Recognition characteristics (32, 204) are generated from the scanned image and are compared (40, 220) to respective sets of reference recognition characteristics generated from respective other transaction documents having different respective formats to determine therefrom whether the particular format of the visual data matches one of the respective formats of the other documents. When such a match is found to exist, location is determined (40, 218) of a field in the scanned image to which optical character recognition may be applied to generate therefrom the information, based upon the respective format found to match the particular format of the visual data. Optical character recognition is then utilized to generate said visual data (60, 232) from said location.

266 citations


Journal ArticleDOI
TL;DR: A new statistical approach based on global typographical features is proposed to the widely neglected problem of font recognition that aims at the identification of the typeface, weight, slope and size of the text from an image block without any knowledge of the content of that text.
Abstract: A new statistical approach based on global typographical features is proposed to the widely neglected problem of font recognition. It aims at the identification of the typeface, weight, slope and size of the text from an image block without any knowledge of the content of that text. The recognition is based on a multivariate Bayesian classifier and operates on a given set of known fonts. The effectiveness of the adopted approach has been evaluated on a set of 280 fonts. Font recognition accuracies of about 97 percent were reached on high-quality images. In addition, rates higher than 99.9 percent were obtained for weight and slope detection. Experiments have also shown the system's robustness to document language and text content and its sensitivity to text length.
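A multivariate Bayesian classifier of the kind described can be sketched as a Gaussian class-conditional model: fit one multivariate normal per font class and assign a sample to the class with the highest posterior. This is a generic illustration under equal priors, not the authors' feature set or exact model.

```python
import numpy as np

class GaussianBayesClassifier:
    """Multivariate Bayesian classifier with one Gaussian per class.

    Class-conditional densities are multivariate normals fitted by
    maximum likelihood; prediction picks the class with the highest
    log-likelihood (equal priors assumed).
    """
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible on few samples.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_[c] = (mu, np.linalg.inv(cov),
                               np.linalg.slogdet(cov)[1])
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, icov, logdet = self.params_[c]
            d = X - mu
            # Log-likelihood up to a constant: -0.5 (log|S| + d^T S^-1 d)
            ll = -0.5 * (logdet + np.einsum('ij,jk,ik->i', d, icov, d))
            scores.append(ll)
        return self.classes_[np.argmax(np.stack(scores), axis=0)]
```

On well-separated synthetic feature clusters the classifier recovers the class labels exactly, which is the regime the high reported accuracies suggest.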

Proceedings ArticleDOI
16 Aug 1998
TL;DR: This work has developed a scheme for automatically extracting text from digital images and videos for content annotation and retrieval that results in segmented characters that can be directly processed by an OCR system to produce ASCII text.
Abstract: Efficient content-based retrieval of image and video databases is an important application due to rapid proliferation of digital video data on the Internet and corporate intranets. Text either embedded or superimposed within video frames is very useful for describing the contents of the frames, as it enables both keyword and free-text based search, automatic video logging, and video cataloging. We have developed a scheme for automatically extracting text from digital images and videos for content annotation and retrieval. We present our approach to robust text extraction from video frames, which can handle complex image backgrounds, deal with different font sizes, font styles, and font appearances such as normal and inverse video. Our algorithm results in segmented characters that can be directly processed by an OCR system to produce ASCII text. Results from our experiments with over 5000 frames obtained from twelve MPEG video streams demonstrate the good performance of our system in terms of text identification accuracy and computational efficiency.

Patent
29 Apr 1998
TL;DR: In this paper, a processor-based fax routing method receives digital data representing a facsimile document and performs OCR on the image data extracting therefrom texts for the keyword, the name of the addressee, and other text present in the document.
Abstract: A processor-based fax routing method receives digital data representing a facsimile document. Without performing optical character recognition ("OCR"), the method identifies in the image data a keyword block of text, and an addressee-name block of text that is located near the keyword block of text. The fax routing method then performs OCR on the image data extracting therefrom texts for the keyword, the name of the addressee, and other text present in the facsimile. Using probabilities computed between the text of the name of the addressee and names in a list of possible addressees, and between the keyword and keywords in a list of keywords, the fax routing method determines an addressee for the document. The fax routing method then converts all text into email addressed to the fax's addressee, and stores the email onto an email server from which it may be retrieved.

BookDOI
01 Apr 1998
TL;DR: Contributions include evaluating the performance of techniques for the extraction of primitives from line drawings composed of horizontal and vertical lines, and the development of a general framework for intelligent document image retrieval.
Abstract: Evaluating the performance of techniques for the extraction of primitives from line drawings composed of horizontal and vertical lines, J.F. Arias et al; the development of a general framework for intelligent document image retrieval, D. Doermann et al; prediction of OCR accuracy using a neural network, J. Gonzalez et al; evaluating Japanese document recognition in the Internet/intranet environment, T. Hong et al; DocBrowse - a system for textual and graphical querying on degraded document image data, M.Y. Jaisimha et al; language identification in complex, unoriented and degraded document images, D. Lee et al; document analysis and the World Wide Web, D. Lopresti and J. Zhou; language-independent and segmentation-free optical character recognition, J. Makhoul et al; documents on the move - DA&IR-driven mail piece processing today and tomorrow, U. Miletzki; priming the recognizer, G. Nagy and Y. Xu; semiautomatic production of highly accurate word bounding box ground truth, R.P. Rogers et al; SPAM - a scientific paper access method, A.L. Spitz; automated CAD conversion with the machine drawing understanding system, L. Wenyin and D. Dori. (Part contents)

Journal ArticleDOI
TL;DR: This work proposes a new text location algorithm suitable for a number of applications, including conversion of newspaper advertisements from paper documents to their electronic versions, World Wide Web search, color image indexing and video indexing, with emphasis on extracting important text with large size and high contrast.

Journal ArticleDOI
Berrin Yanikoglu, Peter A. Sandon
TL;DR: This work introduces a new segmentation algorithm, guided in part by the global characteristics of the handwriting, which finds the successive segmentation points by evaluating a cost function at each point along the baseline.

Book
01 Jan 1998
TL;DR: Alpaydin and Gurgen, Comparison of Statistical and Neural Classifiers and their Applications to Optical Character Recognition and Speech Classification and Chen and Chang, Learning Algorithms and Applications of Principal Component Analysis.
Abstract: Lampinen, Pattern Recognition. Alpaydin and Gurgen, Comparison of Statistical and Neural Classifiers and their Applications to Optical Character Recognition and Speech Classification. Sun and Nekovei, Medical Imaging. Takeda and Omatu, Paper Currency Recognition. Cordella and Stefano, Neural Network Classification Reliability: Problems and Applications. Yagi, Kobayaski, and Matsumoto, Parallel Analog Image Processing: Solving Regularization Problems with Architecture Inspired by the Vertebrate Retinal Circuit. Setiono, Algorithmic Techniques and their Applications. Chen and Chang, Learning Algorithms and Applications of Principal Component Analysis. Merat and Villalobos, Learning Evaluation and Pruning Techniques.

Journal ArticleDOI
TL;DR: In this article, an off-line recognition system based on multifeature and multilevel classification is presented for handwritten Chinese characters, where 10 classes of multifeatures, such as peripheral shape features, stroke density features, and stroke direction features, are used in this system.
Abstract: In this paper, an off-line recognition system based on multifeature and multilevel classification is presented for handwritten Chinese characters. Ten classes of multifeatures, such as peripheral shape features, stroke density features, and stroke direction features, are used in this system. The multilevel classification scheme consists of a group classifier and a five-level character classifier, where two new technologies, overlap clustering and Gaussian distribution selector are developed. Experiments have been conducted to recognize 5,401 daily-used Chinese characters. The recognition rate is about 90 percent for a unique candidate, and 98 percent for multichoice with 10 candidates.

Book ChapterDOI
04 Nov 1998
TL;DR: A new approach to table structure recognition as well as to layout analysis that realizes a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on the detection of some separators such as delineation or significant white space to analyze a page from the top-down.
Abstract: This paper presents a new approach to table structure recognition as well as to layout analysis. The discussed recognition process differs significantly from existing approaches as it realizes a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on the detection of some separators such as delineation or significant white space to analyze a page from the top down. The following analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent, optionally neglects textual contents and can thus be applied to arbitrary mixed-mode documents (with or without tables) of any language, and even operates on low quality OCR documents (e.g., facsimiles).

Journal ArticleDOI
TL;DR: A machine-printed and handwritten text classification method that automatically determines whether texts segmented from a document image are machine-printed or handwritten, to facilitate the later optical character recognition task.

Patent
Robert Cooperman
30 Apr 1998
TL;DR: In this paper, a method for detecting insets in the structure of a document page so as to further complement the document layout and textual information provided in an optical character recognition system is presented.
Abstract: The present invention is a method for detecting insets in the structure of a document page so as to further complement the document layout and textual information provided in an optical character recognition system. A system employing the present method preferably includes a document layout analysis system wherein the inset detection methodology is used to extend the capability of an associated character recognition package to more accurately recreate the document being processed.

Journal ArticleDOI
TL;DR: An NN classification scheme based on an enhanced multilayer perceptron (MLP) is presented and an end-to-end system for form-based handprint OCR applications designed by the National Institute of Standards and Technology (NIST) Visual Image Processing Group is described.
Abstract: Over the last five years or so, neural network (NN)-based approaches have been steadily gaining performance and popularity for a wide range of optical character recognition (OCR) problems, from isolated digit recognition to handprint recognition. We present an NN classification scheme based on an enhanced multilayer perceptron (MLP) and describe an end-to-end system for form-based handprint OCR applications designed by the National Institute of Standards and Technology (NIST) Visual Image Processing Group. The enhancements to the MLP are based on (i) neuron activation functions that reduce the occurrences of singular Jacobians; (ii) successive regularization to constrain the volume of the weight space; and (iii) Boltzmann pruning to constrain the dimension of the weight space. Performance characterization studies of NN systems evaluated at the first OCR systems conference and the NIST form-based handprint recognition system are also summarized.

Patent
Shmuel Ur
07 Apr 1998
TL;DR: In this article, a computer system is provided for transferring graphical textual information into an application program, which consists of an information transfer means, activated in response to a user action on an input device of the computer system, for identifying on the computer display screen a user selected source of textual information, and transferring said textual information as a bit image into a first predetermined location of computer memory.
Abstract: A computer system is provided for transferring graphical textual information into an application program. The arrangement comprises an information transfer means, activated in response to a user action on an input device of the computer system, for identifying on the computer display screen a user selected source of textual information, and transferring said textual information as a bit image into a first predetermined location of the computer memory. The arrangement using optical character recognition logic (OCR) coupled to the information transfer means, for generating a character code for each character image identified in the image stored in the first memory location. The generated character codes are stored by said information transfer means into a second predetermined location of the computer memory. The source information is available to this second location to be inserted into a destination application program by being pasted into a user defined screen location.

Journal ArticleDOI
John D. Hobby
TL;DR: Rather than depending on feature points, a more robust procedure refines the matching transformation with an optimization algorithm; the transformation matches a scanned image to the machine-readable document description that was used to print the original.
Abstract: Since optical character recognition systems often require very large amounts of training data for optimum performance, it is important to automate the process of finding ground truth character identities for document images. This is done by finding a transformation that matches a scanned image to the machine-readable document description that was used to print the original. Rather than depend on finding feature points, a more robust procedure is to follow up by using an optimization algorithm to refine the transformation. The function to optimize can be based on the character bounding boxes – it is not necessary to have access to the actual character shapes used when printing the original.
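The bounding-box objective can be illustrated with a toy refinement step: a brute-force search over small integer translations that minimizes the summed squared distance between corresponding box centres. This is a stand-in for the paper's optimization algorithm, with hypothetical names and a translation-only transformation for simplicity.

```python
import numpy as np

def refine_translation(scan_boxes, truth_boxes, search=5):
    """Find the integer (dx, dy) aligning scanned character boxes to
    ground-truth boxes by brute force over a small search window.

    Boxes are (x, y, w, h); the objective is the summed squared distance
    between corresponding box centres -- no character shapes are needed,
    only the bounding boxes.
    """
    sc = np.array([(x + w / 2, y + h / 2) for x, y, w, h in scan_boxes])
    tc = np.array([(x + w / 2, y + h / 2) for x, y, w, h in truth_boxes])
    best, best_cost = (0, 0), np.inf
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            cost = np.sum((sc + (dx, dy) - tc) ** 2)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

A real system would optimize a richer transformation (scale, shear) with a continuous optimizer, but the box-centre objective is the same idea.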

Proceedings ArticleDOI
07 Jul 1998
TL;DR: A PC based number plate recognition system is presented, using the Niblack algorithm, which was found to outperform all binarization techniques previously used in similar systems.
Abstract: A PC based number plate recognition system is presented. Digital gray-level images of cars are thresholded using the Niblack algorithm, which was found to outperform all binarization techniques previously used in similar systems. A simple yet highly effective rule-based algorithm detects the position and size of number plates. Characters are segmented from the thresholded plate using blob-colouring, and passed as 15×15 pixel bitmaps to a neural network based optical character recognition (OCR) system. A novel dimension reduction technique reduces the neural network inputs from 225 to 50 features. Six small networks in parallel are used, each recognising six characters. The system can recognize single and double line plates under varying lighting conditions and slight rotation. Successful recognition of complete registration plates is about 86.1%.
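The Niblack rule itself is standard: each pixel is thresholded at T = m + k·s, where m and s are the mean and standard deviation over a local window and k is a tuning constant, typically negative for dark text on a light plate. A sketch using integral images so the whole image is thresholded in linear time (window size and k here are illustrative, not the paper's values):

```python
import numpy as np

def niblack_threshold(image, window=15, k=-0.2):
    """Niblack local binarization: T = m + k * s over a sliding window,
    where m and s are the local mean and standard deviation.

    Implemented with summed-area tables so it stays O(N) in pixel count.
    Returns a boolean image: True = background, False = (dark) ink.
    """
    img = np.asarray(image, dtype=np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    # Integral images of the values and their squares.
    s1 = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    s2 = np.cumsum(np.cumsum(padded ** 2, axis=0), axis=1)
    s1 = np.pad(s1, ((1, 0), (1, 0)))
    s2 = np.pad(s2, ((1, 0), (1, 0)))
    h, w = img.shape
    n = window * window
    def box(S):
        # Window sum for every pixel via four integral-image lookups.
        return (S[window:window + h, window:window + w]
                - S[window:window + h, :w]
                - S[:h, window:window + w]
                + S[:h, :w])
    m = box(s1) / n
    var = np.maximum(box(s2) / n - m ** 2, 0.0)
    thresh = m + k * np.sqrt(var)
    return img > thresh
```

Note the classic Niblack caveat: in perfectly flat regions s is zero and the threshold collapses to the local mean, which is why later variants add a correction term.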

Journal ArticleDOI
TL;DR: A methodology for OCR that exhibits the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character and word levels; and the training is performed automatically on data that is also not presegmented.

Proceedings Article
16 Aug 1998
TL;DR: A camera system which translates Japanese texts in a scene using a digital camera, which extracts character strings from a region which a user specifies, and translates them into English.
Abstract: We propose a camera system which translates Japanese texts in a scene. The system is portable and consists of four components: digital camera, character image extraction process, character recognition process, and translation process. The system extracts character strings from a region which the user specifies and translates them into English.

Journal ArticleDOI
TL;DR: A method for recognizing characters on graphical designs and a new projection feature that separates text-line regions from backgrounds, and adaptive thresholding in displacement matching are introduced are proposed.
Abstract: A method for recognizing characters on graphical designs is proposed. A new projection feature that separates text-line regions from backgrounds, and adaptive thresholding in displacement matching are introduced. Experimental results for newspaper headlines with graphical designs show a recognition rate of 97.7 percent.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: An automatic mosaicing process for document images is described, using an image pyramid and sequential similarity to reduce computation time; results are presented for binarised document images captured with a digital camera.
Abstract: If it is impossible to capture all the image in one scan with the available equipment, a montage can be made from separately scanned pieces. We describe an automatic mosaicing process for document images. The image shifts are found by a correlation technique, using an image pyramid and sequential similarity to reduce computation time. Image placement and overlap is used to reject incorrect solutions. We present results for binarised document images with data captured using a digital camera.
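The pyramid-based shift search can be sketched coarse-to-fine: estimate the translation on a heavily downsampled pair, then double it and refine with a small local search at each finer level. This sketch uses a plain mean-squared difference over the overlap rather than the paper's sequential similarity test, and all names are illustrative.

```python
import numpy as np

def ssd_overlap(a, b, ty, tx):
    """Mean squared difference between b and a shifted by (ty, tx),
    computed over the overlapping region only."""
    h, w = a.shape
    ys = slice(max(0, ty), min(h, h + ty))
    xs = slice(max(0, tx), min(w, w + tx))
    ys_a = slice(max(0, -ty), min(h, h - ty))
    xs_a = slice(max(0, -tx), min(w, w - tx))
    ov_a, ov_b = a[ys_a, xs_a], b[ys, xs]
    if ov_a.size == 0:
        return np.inf
    return np.mean((ov_a - ov_b) ** 2)

def estimate_shift(ref, moved, levels=3, search=4):
    """Coarse-to-fine estimate of the integer (dy, dx) translating `ref`
    onto `moved`, using an image pyramid and a small correlation-style
    search at each level."""
    def downsample(im):
        h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2
        im = im[:h, :w]
        return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2]
                       + im[0::2, 1::2] + im[1::2, 1::2])

    pyr = [(np.asarray(ref, float), np.asarray(moved, float))]
    for _ in range(levels - 1):
        a, b = pyr[-1]
        pyr.append((downsample(a), downsample(b)))

    dy = dx = 0
    for a, b in reversed(pyr):          # coarsest level first
        dy, dx = 2 * dy, 2 * dx        # propagate estimate to finer grid
        best, best_err = (dy, dx), np.inf
        for ddy in range(-search, search + 1):
            for ddx in range(-search, search + 1):
                ty, tx = dy + ddy, dx + ddx
                err = ssd_overlap(a, b, ty, tx)
                if err < best_err:
                    best, best_err = (ty, tx), err
        dy, dx = best
    return dy, dx
```

The pyramid keeps the per-level search window small (here ±4 pixels), which is the computational saving the abstract refers to; a consistency check on the overlap, as in the paper, would reject incorrect solutions.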

Proceedings ArticleDOI
16 Sep 1998
TL;DR: A system for reliably establishing correspondences between printed words and their electronic counterparts, without performing optical character recognition, which might have interesting applications in document database retrieval, since it allows an electronic document to be indexed by a printed version of itself.
Abstract: A common authoring technique involves making annotations on a printed draft and then typing the corrections into a computer at a later date. In this paper, we describe a system that goes some way towards automating this process. The author simply passes the annotated documents through a sheetfeed scanner and then brings up the electronic document in a text editor. The system then works out where the annotated words are and allows the author to skip from one annotation to the next at the touch of a key. At the heart of the system lies a procedure for reliably establishing correspondences between printed words and their electronic counterparts, without performing optical character recognition. This procedure might have interesting applications in document database retrieval, since it allows an electronic document to be indexed by a printed version of itself.