scispace - formally typeset
Topic

Optical character recognition

About: Optical character recognition is a research topic. Over the lifetime, 7342 publications have been published within this topic receiving 158193 citations. The topic is also known as: OCR & optical character reader.


Papers
Proceedings ArticleDOI
01 Dec 2017
TL;DR: An efficient approach to generating suggestions for misspelt Tamil words using the n-gram technique on the stemmed forms of words with two different hash-tables; indexing the hash-tables by word length speeds up finding appropriate suggestions while reducing the number of inappropriate ones.
Abstract: A spell checker is a tool that finds and corrects misspelt words in a text document. Spelling error detection and correction techniques are widely used by text editing systems, machine translation systems, optical character recognition systems, search engines and speech recognition systems. Though spell checkers for European and Indian languages are well developed, there are few for Tamil, perhaps because Tamil is morphologically rich and agglutinative, which makes the task challenging. This paper proposes an efficient approach to generating suggestions for misspelt words in Tamil. The proposed approach applies the n-gram technique to the stemmed forms of words using two different hash-tables, choosing the better of the two to generate the most suitable alternatives to misspelt words while speeding up the lookup. Indexing the hash-tables by word length speeds up finding appropriate suggestions while reducing the number of inappropriate ones. Test results show that the suggestions generated by the system reach 95% accuracy, as approved by a Tamil scholar.
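The two-hash-table idea can be sketched in a few lines: one table maps character n-grams of a word to the words containing them, and a second maps word lengths to words, so a lookup only compares similarly sized candidates. This is a minimal illustrative sketch, not the paper's exact method; the class name, thresholds, and the Dice-coefficient ranking are assumptions:

```python
from collections import defaultdict

def ngrams(word, n=2):
    """Set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

class SpellSuggester:
    def __init__(self, lexicon, n=2):
        self.n = n
        # hash-table 1: n-gram -> words containing it
        self.by_ngram = defaultdict(set)
        # hash-table 2: word length -> words of that length
        self.by_length = defaultdict(set)
        for w in lexicon:
            self.by_length[len(w)].add(w)
            for g in ngrams(w, n):
                self.by_ngram[g].add(w)

    def suggest(self, misspelt, max_len_diff=1, top_k=3):
        # candidates must share at least one n-gram with the misspelt word...
        cands = set()
        for g in ngrams(misspelt, self.n):
            cands |= self.by_ngram[g]
        # ...and are pruned to similar lengths, which cuts inappropriate hits
        cands = {w for w in cands
                 if abs(len(w) - len(misspelt)) <= max_len_diff}

        def dice(a, b):
            ga, gb = ngrams(a, self.n), ngrams(b, self.n)
            return 2 * len(ga & gb) / ((len(ga) + len(gb)) or 1)

        return sorted(cands, key=lambda w: -dice(misspelt, w))[:top_k]
```

For example, `SpellSuggester(["spell", "spelt", "shell"]).suggest("spel")` ranks "spell" and "spelt" ahead of "shell", since the latter shares only one bigram with the query.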

48 citations

Patent
18 Jan 2012
TL;DR: In this article, a method and system for recognizing license plate characters utilizing machine learning classifiers is presented. The method, however, is limited to a single image and cannot handle a large number of images.
Abstract: A method and system for recognizing a license plate character utilizing a machine learning classifier. A license plate image of a vehicle can be captured by an image capturing unit and segmented into license plate character images. Each character image can be preprocessed to remove local background variation and to define a local feature utilizing a quantization transformation. A classification margin for each character image can be identified utilizing a set of machine learning classifiers, each binary in nature. Each binary classifier can be trained utilizing samples of one character as the positive class and all other characters, as well as non-character images, as the negative class. The character type associated with the classifier reporting the largest classification margin is determined and declared as the OCR result.
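The margin-voting step can be sketched as follows: each character class has its own binary classifier, and the declared OCR result is the class whose classifier yields the largest classification margin. The linear scorers and toy feature vectors below are illustrative stand-ins, not the patent's actual classifiers or features:

```python
def linear_margin(weights, bias, features):
    """Signed score of a binary classifier: positive means 'this character';
    larger magnitude means a wider classification margin."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify_character(feature_vec, classifiers):
    """classifiers: dict mapping character label -> (weights, bias).
    Returns the label with the largest margin, plus that margin."""
    best_label, best_margin = None, float("-inf")
    for label, (w, b) in classifiers.items():
        m = linear_margin(w, b, feature_vec)
        if m > best_margin:
            best_label, best_margin = label, m
    return best_label, best_margin
```

In a real one-vs-rest setup each `(weights, bias)` pair would come from training on one character's samples against all others plus non-character images, as the patent describes.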

48 citations

Proceedings ArticleDOI
14 Oct 2002
TL;DR: An effective approach for a PDA-based sign translation system that efficiently embeds multi-resolution, adaptive search in a hierarchical framework with different emphases at each layer, and introduces an intensity-based OCR method to recognize characters in various fonts and lighting conditions.
Abstract: In this paper, we propose an effective approach for a PDA-based sign system that presents the user with a sign translator. Its main functionality comprises three parts: detection, recognition and translation. Automatic detection and recognition of text in natural scenes is a prerequisite for automatic sign translation. To make the system robust to the variety of natural scenes, the detection approach efficiently embeds multi-resolution, adaptive search in a hierarchical framework with different emphases at each layer. We also introduce an intensity-based OCR method to recognize characters in various fonts and lighting conditions, employing the Gabor transform to obtain local features and LDA for feature selection and classification. The recognition rate is 92.4% on a test set collected from natural signs. Sign text differs from ordinary sentences: it is brief, with many abbreviations and place names. Here we only briefly introduce a rule-based place-name translation. We have integrated all these functions on a PDA, which can capture a sign image, automatically segment and recognize the Chinese sign, and translate it into English.

48 citations

Journal ArticleDOI
TL;DR: A deep learning (DL) system with two convolutional neural network (CNN) architectures (named HMB1 and HMB2), applying optimization, regularization, and dropout techniques, is introduced to serve as a baseline for future research on handwritten Arabic text.
Abstract: Optical character recognition of English text, whether printed or handwritten, may be considered one of the most important research topics. Although excellent results have been reached for English text, there is a lack of such research for Arabic text. This is because of the nature of the Arabic alphabet and the multiplicity of forms of the same letter. Arabic handwritten character recognition (AHCR) systems involve several issues and challenges, from finding a suitable public Arabic handwritten text dataset, through segmentation and feature extraction, to recognition and classification. The paper's objectives are twofold. First, a large and complex Arabic handwritten characters dataset (HMBD) is presented for the training, testing, and validation phases, along with a discussion of its collection, preparation, cleaning, and preprocessing. Second, we introduce a deep learning (DL) system with two convolutional neural network (CNN) architectures (named HMB1 and HMB2), applying optimization, regularization, and dropout techniques. This system can serve as a baseline for future research on handwritten Arabic text. Different performance metrics were calculated, such as accuracy, recall, precision, and F1. Sixteen experiments were applied to the described system using HMBD and two other datasets, CMATER and AIA9k. Experimental results were captured and compared to study the effects of weight initializers, optimizers, data augmentation, and regularization on overfitting and accuracy. The He uniform weight initializer and AdaDelta optimizer reported the highest accuracies, and data augmentation improved accuracy further. HMB1 reported a testing accuracy of 98.4% with 865,840 records using augmentation on HMBD. The CMATER and AIA9k datasets were used to validate generalization; with data augmentation, the best testing accuracies were 100% and 99.0%, respectively.
A cross-over validation between the described architectures and a previous state-of-the-art architecture and dataset was performed in two phases. First, the previous control architecture could not generalize to the dataset presented in the current study. Second, the architectures described in this study generalized to the control dataset, with higher accuracies (97.3% and 96.8% for HMB1 and HMB2, respectively) than the accuracy reported in the selected control study.
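The per-class metrics the paper reports (precision, recall, F1) can be computed from predicted and true labels as sketched below. The macro-averaging and the toy labels in the usage example are illustrative choices, not taken from the paper:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute precision, recall, and F1 per class,
    then average the per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

For instance, with `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, class "a" scores F1 = 2/3 and class "b" scores F1 = 4/5, giving a macro F1 of 11/15.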

48 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: This paper presents a scheme for separating machine-printed from hand-written text lines in both Bangla and Devnagari, based on structural and statistical features; the classification accuracy is about 98.3%.
Abstract: There are many types of documents where machine-printed and hand-written texts appear intermixed. Since the optical character recognition (OCR) methodologies for machine-printed and hand-written texts are different, it is necessary to separate these two types of text before feeding them to the respective OCR systems. In this paper, we present such a scheme for both Bangla and Devnagari characters. The scheme is based on the structural and statistical features of the machine-printed and hand-written text lines. The classification scheme has an accuracy of about 98.3%.
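The paper's exact structural and statistical features are not reproduced here, but the general idea can be sketched: compute a simple statistic of a text line and threshold it, since machine-printed lines are far more regular than handwritten ones. The specific feature below (coefficient of variation of connected-component heights) and its threshold are hypothetical illustrations:

```python
import statistics

def is_machine_printed(component_heights, cv_threshold=0.12):
    """Toy printed-vs-handwritten test for one text line.
    component_heights: heights of the line's connected components.
    Printed text tends to have near-uniform heights (low coefficient
    of variation); handwriting varies much more."""
    mean = statistics.mean(component_heights)
    cv = statistics.pstdev(component_heights) / mean
    return cv <= cv_threshold
```

A real system, like the one in the paper, would combine several such structural and statistical features rather than thresholding a single one.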

48 citations


Network Information
Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations (87% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (85% related)
- Image segmentation: 79.6K papers, 1.8M citations (85% related)
- Convolutional neural network: 74.7K papers, 2M citations (84% related)
- Deep learning: 79.8K papers, 2.1M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  186
2022  425
2021  333
2020  448
2019  430
2018  357