Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published within this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.


Papers
Patent
14 Dec 1992
TL;DR: A process and system for processing a digitally stored image on a digital computer is described. The system scans and digitizes an image, separates text from non-text components, enhances and deskews the image, compresses the resulting image file, and stores the enhanced, deskewed, and compressed file for later transmission, optical character recognition, or high-quality printing or viewing of the image.
Abstract: This specification discloses a process and system for processing a digitally stored image on a digital computer. The system scans and digitizes an image, separates text from non-text components, enhances and deskews the image, compresses the resulting image file, and stores the enhanced, deskewed, and compressed file for later transmission, optical character recognition, or high-quality printing or viewing of the image.

62 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: A schema for the description of shapes of Devanagari characters and its application in their recognition is presented. It exploits certain features of the script both to reduce the search space and to create a reference with respect to which correspondence can be established during the matching process.
Abstract: The paper presents a schema for the description of shapes of Devanagari characters and its application in their recognition. It exploits certain features of the script both to reduce the search space and to create a reference with respect to which correspondence can be established during the matching process. The description prototypes are constructed using the real-life script after segmentation, so that the aberrations introduced during the inevitable process of segmentation are accounted for in the description. The approach has been tested on printed Devanagari text, with a success rate of approximately 70% without any post-processing and 88% correct recognition with the help of a word dictionary.

62 citations

Patent
02 Feb 1990
TL;DR: The output of the neural network is processed by an optical character recognition post-processor, which corrects erroneous symbol identifications made by the network and identifies special symbols and symbol cases not identifiable by the neural network following character normalization.
Abstract: Character images which are to be sent to a neural network trained to recognize a predetermined set of symbols are first processed by an optical character recognition pre-processor which normalizes the character images. The output of the neural network is processed by an optical character recognition post-processor. The post-processor corrects erroneous symbol identifications made by the neural network. The post-processor identifies special symbols and symbol cases not identifiable by the neural network following character normalization. For characters identified by the neural network with low scores, the post-processor attempts to find and separate adjacent characters which are kerned and characters which are touching. The touching characters are separated in one of nine successively initiated processes depending upon the geometric parameters of the image. When all else fails, the post-processor selects either the second or third highest scoring symbol identified by the neural network based upon the likelihood of the second or third highest scoring symbol being confused with the highest scoring symbol.

62 citations
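
The last-resort step in the abstract above (choosing the second or third highest scoring symbol according to how likely it is to be confused with the top symbol) could be sketched roughly as follows. This is only an illustration, not the patented method: the function name `fallback_symbol`, the `confusion` table, and its `(true_symbol, reported_symbol)` key layout are all assumptions introduced here.

```python
from typing import Dict, Sequence, Tuple


def fallback_symbol(
    ranked: Sequence[str],
    confusion: Dict[Tuple[str, str], float],
) -> str:
    """Pick the 2nd or 3rd ranked symbol that is more often confused with the 1st.

    `ranked` lists candidate symbols from highest to lowest network score.
    `confusion[(true_symbol, reported_symbol)]` is a hypothetical likelihood
    that the network reports `reported_symbol` when the input is `true_symbol`.
    """
    top, second, third = ranked[0], ranked[1], ranked[2]
    # How likely is it that each alternative was misread as the top symbol?
    p_second = confusion.get((second, top), 0.0)
    p_third = confusion.get((third, top), 0.0)
    # On a tie, keep the alternative the network itself ranked higher.
    return second if p_second >= p_third else third
```

For example, with candidates ranked `["O", "Q", "0"]` and a table in which the digit zero is misread as the letter O more often than Q is, the sketch returns the third-ranked `"0"`.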

Proceedings ArticleDOI
24 Jan 2011
TL;DR: Modifications to the text/non-text segmentation algorithm presented by Bloomberg are described. They result in significant improvements and achieve better segmentation accuracy than the original algorithm on UW-III, UNLV, ICDAR 2009 page segmentation competition test images, and circuit diagram datasets.
Abstract: Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg [1], which is also available in his open-source Leptonica library [2]. The modifications result in significant improvements and achieve better segmentation accuracy than the original algorithm on UW-III, UNLV, ICDAR 2009 page segmentation competition test images, and circuit diagram datasets.

62 citations

Journal ArticleDOI
TL;DR: An original hybrid MLP-SVM method for unconstrained handwritten digits recognition, based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors).
Abstract: This paper presents an original hybrid MLP-SVM method for unconstrained handwritten digit recognition. Specialized Support Vector Machines (SVMs) are introduced to significantly improve the multilayer perceptron (MLP) performance in local areas around the separating surfaces between each pair of digit classes in the input pattern space. This hybrid architecture is based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors). Specialized local SVMs are introduced to detect the correct class among these two classification hypotheses. The hybrid MLP-SVM recognizer achieves a recognition rate of 98.01% on a real mail zip code digit recognition task. By introducing a rejection mechanism based on the distances provided by the local SVMs, the error/reject trade-off performance of the recognition system is better than that of several classifiers reported in recent research.

62 citations
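
The decision rule described in the hybrid MLP-SVM abstract above (take the two strongest MLP outputs, let a pair-specific SVM choose between them, and reject when the SVM distance is small) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the authors' implementation: `hybrid_decide`, `PairClassifier`, the sign convention of the distance, and `reject_margin` are all hypothetical names and choices, and the specialists are stand-in callables rather than trained SVMs.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical type: a pairwise specialist maps a feature vector to a signed
# distance. Positive favors the lower class index of the pair, negative the higher.
PairClassifier = Callable[[Sequence[float]], float]


def hybrid_decide(
    mlp_scores: Sequence[float],
    features: Sequence[float],
    pair_svms: Dict[Tuple[int, int], PairClassifier],
    reject_margin: float = 0.0,
) -> Optional[int]:
    """Return a digit class, or None to reject the sample."""
    # Rank classes by MLP output and keep the two best hypotheses.
    ranked = sorted(range(len(mlp_scores)), key=lambda c: mlp_scores[c], reverse=True)
    top, runner_up = ranked[0], ranked[1]

    # Specialists are keyed by the ordered pair (low, high).
    pair = (min(top, runner_up), max(top, runner_up))
    svm = pair_svms.get(pair)
    if svm is None:
        # No specialist was trained for this pair: trust the MLP.
        return top

    distance = svm(features)
    if abs(distance) < reject_margin:
        # Too close to the separating surface: reject instead of guessing.
        return None
    return pair[0] if distance > 0 else pair[1]
```

Keying specialists only by frequently confused pairs mirrors the abstract's observation that a few pairs account for most MLP substitutions; every other pair falls through to the plain MLP decision.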


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Feature (computer vision): 128.2K papers, 1.7M citations, 85% related
Image segmentation: 79.6K papers, 1.8M citations, 85% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    186
2022    425
2021    333
2020    448
2019    430
2018    357