Topic
Optical character recognition
About: Optical character recognition is a research topic. Over its lifetime, 7342 publications have been published within this topic, receiving 158193 citations. The topic is also known as OCR and optical character reader.
Papers
••
TL;DR: A methodology for OCR with the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character and word levels; and automatic training on data that is also not presegmented.
47 citations
••
27 Apr 1993
TL;DR: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented, where two statistical models, called pseudo-2D hidden Markov models (P2-DHMMs), are created for representing the actual keyword and all the other extraneous words, respectively.
Abstract: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo-2D hidden Markov models (P2-DHMMs), are created for representing the actual keyword and all the other extraneous words, respectively. Dynamic programming is then used for matching an unknown input word with the two models and making a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough to characterize printed words efficiently. These models provide an 'elastic matching' property in both horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database which contains about 26000 words. A recognition accuracy of 99% is achieved when words in the testing and training sets are in the same font size. An accuracy of 96% is achieved when they are in different sizes. In the latter case, the conventional 1-D HMM approach achieves an accuracy of only 70%.
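The matching step described in this abstract (dynamic programming against a keyword model and an extraneous-word model, followed by a maximum likelihood decision) can be sketched as follows. This is a simplified illustration only: it uses an ordinary 1-D discrete HMM scored with the Viterbi algorithm, not the paper's pseudo-2D models, and the function names and toy probabilities are assumptions made for the example.

```python
import math

def log_matrix(rows):
    """Convert a matrix of probabilities to log-probabilities (0 -> -inf)."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row] for row in rows]

def viterbi_log_score(obs, log_start, log_trans, log_emit):
    """Best-path log-likelihood of a discrete observation sequence under an HMM."""
    n_states = len(log_start)
    # score[s] = best log-probability of any state path that ends in state s
    score = [log_start[s] + log_emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        score = [
            max(score[p] + log_trans[p][s] for p in range(n_states)) + log_emit[s][symbol]
            for s in range(n_states)
        ]
    return max(score)

def is_keyword(obs, keyword_model, filler_model):
    """Maximum likelihood decision: keyword if its model scores higher than the filler."""
    return viterbi_log_score(obs, *keyword_model) > viterbi_log_score(obs, *filler_model)

# Toy (hypothetical) models over two observation symbols, 0 and 1.
# Keyword model: left-to-right, state 0 mostly emits 0, state 1 mostly emits 1.
keyword_model = (
    log_matrix([[1.0, 0.0]])[0],
    log_matrix([[0.5, 0.5], [0.0, 1.0]]),
    log_matrix([[0.9, 0.1], [0.1, 0.9]]),
)
# Filler model: one state that emits both symbols with equal probability.
filler_model = (
    log_matrix([[1.0]])[0],
    log_matrix([[1.0]]),
    log_matrix([[0.5, 0.5]]),
)

print(is_keyword([0, 0, 1, 1], keyword_model, filler_model))
print(is_keyword([1, 0, 1, 0], keyword_model, filler_model))
```

Working in log space avoids numerical underflow on long observation sequences, which is why the scores are summed rather than multiplied.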
47 citations
••
TL;DR: An empirical study shows that the proposed artificial immune system (AIS)-based pattern classification approach exhibits very good generalization ability in generating a smaller prototype library from a larger one and at the same time giving a substantial improvement in the classification accuracy of the underlying NN classifier.
Abstract: The artificial immune system (AIS)-based pattern classification approach is relatively new in the field of pattern recognition. This study explores the potential of this paradigm for the prototype selection task, which is primarily effective in improving the classification performance of the nearest-neighbor (NN) classifier and also partially in reducing its storage and computing time requirements. The clonal selection model of immunology has been incorporated to condense the original prototype set, and performance is verified by employing the proposed technique in a practical optical character recognition (OCR) system as well as for training and testing on a set of benchmark databases available in the public domain. The effect of control parameters is analyzed and the efficiency of the method is compared with other existing techniques often used for prototype selection. In the case of the OCR system, an empirical study shows that the proposed approach exhibits very good generalization ability in generating a smaller prototype library from a larger one and at the same time giving a substantial improvement in the classification accuracy of the underlying NN classifier. The improvement in performance has been statistically verified. Consideration of both OCR data and public domain datasets demonstrates that the proposed method gives results better than or at least comparable to those of some existing techniques.
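To make the prototype selection task concrete, the sketch below condenses a labelled sample set for a nearest-neighbor classifier. It does not implement the clonal selection method of the abstract; it substitutes Hart's condensed nearest neighbor rule, a classic baseline for the same task, and the function names and toy data are assumptions made for illustration.

```python
def nearest_label(point, prototypes):
    """Label of the prototype closest to `point` (squared Euclidean distance)."""
    closest = min(
        prototypes,
        key=lambda proto: sum((a - b) ** 2 for a, b in zip(point, proto[0])),
    )
    return closest[1]

def condense(samples):
    """Hart's condensed nearest neighbor rule (a stand-in, not the AIS method).

    Starts from one sample and keeps adding every sample that the current
    prototype set misclassifies, until a full pass adds nothing.
    """
    prototypes = [samples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in samples:
            already_kept = (point, label) in prototypes
            if not already_kept and nearest_label(point, prototypes) != label:
                prototypes.append((point, label))
                changed = True
    return prototypes

# Toy data: two well-separated clusters with three samples each.
samples = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((1.0, 1.1), "b"),
]
prototypes = condense(samples)
print(len(samples), "samples condensed to", len(prototypes), "prototypes")
```

The condensed set still classifies every original sample correctly, which is the storage and computation benefit the abstract refers to; whether accuracy on unseen data also improves depends on the selection method and the data.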
47 citations
••
NEC
TL;DR: It is shown that an offline character recognition method is effective for use in online Japanese character recognition, and has been improved with developments in nonlinear shape normalization, nonlinear pattern matching, and the normalization-cooperated feature extraction method.
Abstract: It is shown that an offline character recognition method is effective for use in online Japanese character recognition. Major conventional online recognition methods have restricted the number and the order of strokes. The offline method removes these restrictions, based on pattern matching of orientation feature patterns. It has been improved with developments in nonlinear shape normalization, nonlinear pattern matching, and the normalization-cooperated feature extraction method. It was used to examine 52,944 online Kanji characters in 1,064 categories. The recognition rate achieved was 95.1%, and the cumulative recognition rate within the best five candidates was 99.3%.
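The two figures reported in this abstract correspond to top-1 and top-5 accuracy. A minimal sketch of how a cumulative recognition rate within the best k candidates is typically computed is given below; the function name and the toy candidate lists are illustrative assumptions, not data from the paper.

```python
def cumulative_recognition_rate(ranked_candidates, true_labels, k):
    """Fraction of samples whose true label is among the first k ranked candidates."""
    hits = sum(
        1
        for candidates, truth in zip(ranked_candidates, true_labels)
        if truth in candidates[:k]
    )
    return hits / len(true_labels)

# Toy example: each inner list is a recognizer's candidates, best first.
ranked = [
    ["日", "目", "白"],
    ["木", "本", "未"],
    ["土", "士", "工"],
    ["人", "入", "八"],
]
truth = ["日", "本", "工", "大"]

for k in (1, 2, 3):
    print(k, cumulative_recognition_rate(ranked, truth, k))
```

With k = 1 this is the plain recognition rate; larger k can only keep the rate the same or raise it.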
47 citations
••
26 Jul 2009
TL;DR: A two-stage approach for word-wise identification of English, Devnagari and Bengali (Bangla) scripts is proposed, in which the first stage identifies scripts at high speed but with lower accuracy on noisy data.
Abstract: A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition accuracy and processing speed. The 1st stage identifies scripts at high speed, but with lower accuracy when dealing with noisy data. The advanced 2nd stage processes only those samples that yield low recognition confidence in the 1st stage. For both stages a rough character segmentation is performed and features are computed on segmented character components. The 1st stage uses a 64-dimensional chain-code-histogram feature, while the 2nd stage uses 400-dimensional gradient features. Final classification of a word to a particular script is done via majority voting of each recognized character component of the word. Extensive experiments with various confidence scores were conducted and are reported here. The overall recognition accuracy and speed are remarkable. A correct classification rate of 98.51% on 11,123 test words is achieved, even when the recognition confidence is as high as 95% at both stages.
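The control flow of such a two-stage scheme, with majority voting over character components, might look like the sketch below. It rests on several assumptions that the abstract does not settle: the cascade is applied per character component, each classifier returns a (label, confidence) pair, and the classifiers here are trivial stand-ins rather than the chain-code or gradient feature classifiers of the paper.

```python
from collections import Counter

def classify_component(component, fast_classifier, accurate_classifier, threshold):
    """Use the fast stage; fall back to the accurate stage on low confidence."""
    label, confidence = fast_classifier(component)
    if confidence < threshold:
        label, confidence = accurate_classifier(component)
    return label

def identify_script(components, fast_classifier, accurate_classifier, threshold=0.95):
    """Script of a word, decided by majority vote over its character components."""
    votes = Counter(
        classify_component(c, fast_classifier, accurate_classifier, threshold)
        for c in components
    )
    return votes.most_common(1)[0][0]

# Stand-in classifiers (hypothetical): each component carries precomputed outputs.
def fast(component):
    return component["fast"]

def accurate(component):
    return component["accurate"]

word = [
    {"fast": ("Bangla", 0.99), "accurate": ("Bangla", 0.99)},
    {"fast": ("Roman", 0.60), "accurate": ("Bangla", 0.97)},  # sent to stage 2
    {"fast": ("Roman", 0.98), "accurate": ("Roman", 0.99)},
]
print(identify_script(word, fast, accurate))
```

Raising the threshold sends more components to the slower second stage, which is the accuracy versus speed tradeoff the abstract describes.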
47 citations