scispace - formally typeset
Search or ask a question
Topic

Optical character recognition

About: Optical character recognition is a research topic. Over the lifetime, 7342 publications have been published within this topic receiving 158193 citations. The topic is also known as: OCR & optical character reader.


Papers
More filters
Journal ArticleDOI
TL;DR: A methodology for OCR that exhibits the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character and word levels; and the training is performed automatically on data that is also not presegmented.

47 citations

Proceedings ArticleDOI
S.-s. Kuo1, O.E. Agazzi1
27 Apr 1993
TL;DR: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented, where two statistical models, called pseudo-2D hidden Markov models (P2-DHMMs), are created for representing the actual keyword and all the other extraneous words, respectively.
Abstract: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo-2D hidden Markov models (P2-DHMMs), are created for representing the actual keyword and all the other extraneous words, respectively. Dynamic programming is then used for matching an unknown input word with the two models and making a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough to characterize printed words efficiently. These models facilitate a nice 'elastic matching' property in both horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database which contains about 26000 words. A recognition accuracy of 99% is achieved when words in testing and training sets are in the same font size. An accuracy of 96% is achieved when they are in different sizes. In the latter case, the conventional 1-D HMM approach achieves only 70% accuracy rate. >

47 citations

Journal ArticleDOI
TL;DR: Empirical study shows that the proposed artificial immune system (AIS)-based pattern classification approach exhibits very good generalization ability in generating a smaller prototype library from a larger one and at the same time giving a substantial improvement in the classification accuracy of the underlying NN classifier.
Abstract: Artificial immune system (AIS)-based pattern classification approach is relatively new in the field of pattern recognition. The study explores the potentiality of this paradigm in the context of prototype selection task that is primarily effective in improving the classification performance of nearest-neighbor (NN) classifier and also partially in reducing its storage and computing time requirement. The clonal selection model of immunology has been incorporated to condense the original prototype set, and performance is verified by employing the proposed technique in a practical optical character recognition (OCR) system as well as for training and testing of a set of benchmark databases available in the public domain. The effect of control parameters is analyzed and the efficiency of the method is compared with another existing techniques often used for prototype selection. In the case of the OCR system, empirical study shows that the proposed approach exhibits very good generalization ability in generating a smaller prototype library from a larger one and at the same time giving a substantial improvement in the classification accuracy of the underlying NN classifier. The improvement in performance has been statistically verified. Consideration of both OCR data and public domain datasets demonstrate that the proposed method gives results better than or at least comparable to that of some existing techniques.

47 citations

Proceedings ArticleDOI
M. Hamanaka1, Keiji Yamada, J. Tsukumo
20 Oct 1993
TL;DR: It is shown that an offline character recognition method is effective for use in an online Japanese character recognition, and has been improved with developments in nonlinear shape normalization, nonlinear pattern matching, and the normalization-cooperated feature extraction method.
Abstract: It is shown that an offline character recognition method is effective for use in an online Japanese character recognition. Major conventional online recognition methods have restricted the number and the order of strokes. The offline method removes these restrictions, based on pattern matching of orientation feature patterns. It has been improved with developments in nonlinear shape normalization, nonlinear pattern matching, and the normalization-cooperated feature extraction method. It was used to examine 52,944 online Kanji characters in 1,064 categories. The recognition rate achieved 95.1%, and the cumulation recognition rate within the best five candidates was 99.3%. >

47 citations

Proceedings ArticleDOI
26 Jul 2009
TL;DR: A two-stage approach for word-wise identification of English, Devnagari and Bengali (Bangla) scripts is proposed, which allows identifying scripts with high speed, yet less accuracy when dealing with noisy data.
Abstract: A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition accuracy and processing speed. The 1st stage allows identifying scripts with high speed, yet less accuracy when dealing with noisy data. The advanced 2nd stage processes only those samples that yield low recognition confidence in the first stage. For both stages a rough character segmentation is performed and features are computed on segmented character components. Features used in the 1st stage are a 64-dimensional chain-code-histogram feature, while 400-dimensional gradient features are used in the 2nd stage. Final classification of a word to a particular script is done via majority voting of each recognized character component of the word. Extensive experiments with various confidence scores were conducted and reported here. The overall recognition accuracy and speed is remarkable. Correct classification of 98.51% on 11,123 test words is achieved, even when the recognition-confidence is as high as 95% at both stages.

47 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
87% related
Feature (computer vision)
128.2K papers, 1.7M citations
85% related
Image segmentation
79.6K papers, 1.8M citations
85% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023186
2022425
2021333
2020448
2019430
2018357