scispace - formally typeset
Topic

Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published on this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.


Papers
Journal Article
01 Jul 1992
TL;DR: The state of the art in handwriting recognition, especially in cursive word recognition, is surveyed, and some basic notions are reviewed in the field of picture recognition, particularly line image recognition.
Abstract: The state of the art in handwriting recognition, especially in cursive word recognition, is surveyed, and some basic notions are reviewed in the field of picture recognition, particularly line image recognition. The usefulness of 'regular' versus 'singular' classes of features is stressed. These notions are applied to obtain a graph, G, representing a line image, and also to find an 'axis' as the regular part of G. The complements of the axis in G are the 'tarsi', singular parts of G, which correspond to informative features of a cursive word. A segmentation of the graph is obtained, giving a symbolic description chain (SDC). Using one or more as robust anchors, possible words in a list of words are selected. Candidate words are examined to see if the other letters fit the rest of the SDC. Good results are obtained for clean images of words written by several persons.

183 citations

Journal Article
TL;DR: A probabilistic model for scene text recognition is introduced that integrates similarity, language properties, and lexical decision; by fusing these information sources in one model, it eliminates unrecoverable errors that result from sequential processing, improving accuracy.
Abstract: Scene text recognition (STR) is the recognition of text anywhere in the environment, such as signs and storefronts. Relative to document recognition, it is challenging because of font variability, minimal language context, and uncontrolled conditions. Much information available to solve this problem is frequently ignored or used sequentially. Similarity between character images is often overlooked as useful information. Because of language priors, a recognizer may assign different labels to identical characters. Directly comparing characters to each other, rather than only to a model, helps ensure that similar instances receive the same label. Lexicons improve recognition accuracy but are used post hoc. We introduce a probabilistic model for STR that integrates similarity, language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a bottom-up method for shortening messages by reducing the dependency between weakly supported hypotheses. By fusing information sources in one model, we eliminate unrecoverable errors that result from sequential processing, improving accuracy. In experimental results recognizing text from images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19 percent, the lexicon reduces word recognition error by 35 percent, and sparse belief propagation reduces the lexicon words considered by 99.9 percent with a 12X speedup and no loss in accuracy.

182 citations

Proceedings Article
20 Nov 2003
TL;DR: An integrated OCR system for mathematical documents, called INFTY, is presented, which shows high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.
Abstract: An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures, i.e., layout analysis, character recognition, structure analysis of mathematical expressions, and manual error correction. In those procedures, several novel techniques are utilized for better recognition performance. Experimental results on about 500 pages of mathematical documents showed high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.

182 citations

Journal Article
TL;DR: This work focuses on techniques that classify single-page typeset document images without using OCR results, and brings to light important issues in designing a document classifier, including the definition of document classes, the choices of document features and feature representation, and the choice of classification algorithm and learning mechanism.
Abstract: Document image classification is an important step in Office Automation, Digital Libraries, and other document image analysis applications. There is great diversity in document image classifiers: they differ in the problems they solve, in the use of training data to construct class models, and in the choice of document features and classification algorithms. We survey this diverse literature using three components: the problem statement, the classifier architecture, and performance evaluation. This brings to light important issues in designing a document classifier, including the definition of document classes, the choice of document features and feature representation, and the choice of classification algorithm and learning mechanism. We emphasize techniques that classify single-page typeset document images without using OCR results. Developing a general, adaptable, high-performance classifier is challenging due to the great variety of documents, the diverse criteria used to define document classes, and the ambiguity that arises due to ill-defined or fuzzy document classes.

181 citations

Proceedings Article
27 Nov 1995
TL;DR: First experiments along highways in the Netherlands show that the CLPR-system has an error rate of 0.02% at a recognition rate of 98.51%.
Abstract: A car license plate recognition system (CLPR-system) has been developed to identify vehicles by the contents of their license plate for speed-limit enforcement. This type of application puts high demands on the reliability of the CLPR-system. A combination of neural and fuzzy techniques is used to guarantee a very low error rate at an acceptable recognition rate. First experiments along highways in the Netherlands show that the system has an error rate of 0.02% at a recognition rate of 98.51%. These results are also compared with other published CLPR-systems.

180 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Feature (computer vision): 128.2K papers, 1.7M citations, 85% related
Image segmentation: 79.6K papers, 1.8M citations, 85% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Performance
Metrics
Number of papers in the topic in previous years
Year    Papers
2023    186
2022    425
2021    333
2020    448
2019    430
2018    357