Optical character recognition

About: Optical character recognition is a research topic. Over its lifetime, 7,342 publications have been published within this topic, receiving 158,193 citations. The topic is also known as OCR and optical character reader.


Papers
Proceedings ArticleDOI
24 Apr 2018
TL;DR: A Convolutional Neural Network based Optical Character Recognition system (OCR) which accurately digitizes Ancient Sanskrit manuscripts (Devanagari Script) that are not necessarily in good condition.
Abstract: Ancient Sanskrit manuscripts are a rich source of knowledge about Science, Mathematics, Hindu mythology, Indian civilization, and culture. It therefore becomes critical that access to these manuscripts is made easy, to share this knowledge with the world and to facilitate further research on this Ancient literature. In this paper, we propose a Convolutional Neural Network (CNN) based Optical Character Recognition system (OCR) which accurately digitizes Ancient Sanskrit manuscripts (Devanagari Script) that are not necessarily in good condition. We use an image segmentation algorithm for calculating pixel intensities to identify letters in the image. The OCR considers typical compound characters (half letter combinations) as separate classes in order to improve the segmentation accuracy. The novelty of the OCR is its robustness to image quality, image contrast, font style and font size, which makes it an ideal choice for digitizing soiled and poorly maintained Sanskrit manuscripts.

40 citations
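
The abstract above mentions segmenting letters from pixel intensities but does not say which algorithm is used. As a rough illustration only, the sketch below uses a vertical projection profile, a common technique that is an assumption here and not the authors' method: it counts dark pixels per column and splits a text line wherever a column contains no ink. The function names, the list-of-rows grayscale input, and the threshold of 128 are all hypothetical. Devanagari words are joined by a headline (shirorekha), so a real system would have to deal with that line before column splitting could separate letters.

```python
from typing import List, Tuple


def column_ink_profile(image: List[List[int]], threshold: int = 128) -> List[int]:
    """Count dark pixels (intensity below `threshold`) in each column.

    `image` is a hypothetical grayscale text line given as a list of rows,
    with 0 = black ink and 255 = white paper.
    """
    if not image:
        return []
    width = len(image[0])
    return [sum(1 for row in image if row[x] < threshold) for x in range(width)]


def split_columns(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of inked columns begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # the run ended at an empty column
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run reaches the right edge
    return spans
```

Each returned span could then be cropped and passed to a classifier; how the paper actually forms its character classes is not specified beyond the abstract.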

Journal ArticleDOI
TL;DR: This paper focuses on the applicability of the features inspired by the visual ventral stream for handwritten character recognition, and an analysis is conducted to evaluate the robustness of this approach to orientation, scale and translation distortions.
Abstract: This paper focuses on the applicability of the features inspired by the visual ventral stream for handwritten character recognition. A set of scale- and translation-invariant C2 features is first extracted from all images in the dataset. Three standard classifiers (kNN, ANN and SVM) are then trained on a training set and compared on a separate test set. To achieve a higher recognition rate, a two-stage classifier was designed with different preprocessing in the second stage. Experiments performed to validate the method on the well-known MNIST database and on standard Farsi digits and characters exhibit high recognition rates and compete with some of the best existing approaches. Moreover, an analysis is conducted to evaluate the robustness of this approach to orientation, scale and translation distortions.

40 citations

Proceedings ArticleDOI
10 Sep 2001
TL;DR: A new connected component based segmentation algorithm which automatically extracts text regions from natural scene images is proposed in this paper, utilizing a multichannel decomposition method to locate text blocks in complex backgrounds.
Abstract: A new connected component based segmentation algorithm which automatically extracts text regions from natural scene images is proposed in this paper. This approach utilizes a multichannel decomposition method to locate text blocks in complex backgrounds. Block alignment analysis and recognition confidence values are used in the combination and identification of the connected components. The algorithm is applied to a test image database and shows promising results.

40 citations
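
The abstract above builds on connected components but gives no implementation details. The following is a minimal sketch of the generic labeling step only, assuming a binary mask as input and 4-connectivity. It does not cover the paper's multichannel decomposition, block alignment analysis, or confidence-based merging, and the function name and input format are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def connected_components(mask: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return bounding boxes of 4-connected foreground regions.

    `mask` is a hypothetical binary image (truthy = foreground). Each box is
    (top, left, bottom, right) with inclusive coordinates, in scan order.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Tuple[int, int, int, int]] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Candidate text regions would then be chosen by filtering and grouping these boxes; the criteria the paper uses for that are not given in the abstract.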

Book
01 Jan 2007
TL;DR: This book discusses OCR Technologies for Machine Printed and Hand Printed Japanese Text, Meta-Data Extraction from Bibliographic Documents for the Digital Library, and Biometric and Forensic Aspects of Digital Document Processing.
Abstract (table of contents): Reading Systems: An Introduction to Digital Document Processing; Document Structure and Layout Analysis; OCR Technologies for Machine Printed and Hand Printed Japanese Text; Multi-Font Printed Tibetan OCR; On OCR of a Printed Indian Script; A Bayesian Network Approach for On-line Handwriting Recognition; New Advances and New Challenges in On-Line Handwriting Recognition and Electronic Ink Management; Off-Line Roman Cursive Handwriting Recognition; Robustness Design of Industrial Strength Recognition Systems; Arabic Cheque Processing System: Issues and Future Trends; OCR of Printed Mathematical Expressions; The State of the Art of Document Image Degradation Modelling; Advances in Graphics Recognition; An Introduction to Super-Resolution Text; Meta-Data Extraction from Bibliographic Documents for the Digital Library; Document Information Retrieval; Biometric and Forensic Aspects of Digital Document Processing; Web Document Analysis; Semantic Structure Analysis of Web Documents; Bank Cheque Data Mining: Integrated Cheque Recognition Technologies.

40 citations

Journal Article
TL;DR: This article presents an optical character recognition system for printed Urdu, which the authors describe as the third most widely understood language in the world, and notes that few efforts have been made to make the script understandable to computers.
Abstract: This paper deals with an Optical Character Recognition system for printed Urdu, a popular Pakistani/Indian script. Urdu is the third most widely understood language in the world, especially in the subcontinent, but few efforts have been made to make it understandable to computers. A lot of work has been done in the field of literature and Islamic studies in Urdu, which has to be computerized. In the proposed system, individual characters are recognized using our own proposed methods and algorithms. The feature detection methods are simple and robust. Supervised learning is used to train the feed-forward neural network. A prototype of the system has been tested on printed Urdu characters and currently achieves 98.3% character-level accuracy on average. Although the system is script/language independent, we have designed it for Urdu characters only.

40 citations
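
The abstract above reports character-level accuracy without defining how it is computed. One simple reading, assumed here and not confirmed by the source, is the fraction of isolated characters whose predicted label matches the ground truth. The function name and the label sequences are hypothetical.

```python
from typing import Sequence


def character_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the expected one.

    Assumes one prediction per ground-truth character (isolated-character
    recognition), so the two sequences must have equal, non-zero length.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one character is required")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)
```

For recognition of whole lines, where characters can be inserted or dropped, an edit-distance-based character error rate is the more usual choice; this sketch covers only the aligned case.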


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Feature (computer vision): 128.2K papers, 1.7M citations, 85% related
Image segmentation: 79.6K papers, 1.8M citations, 85% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  186
2022  425
2021  333
2020  448
2019  430
2018  357