Book Chapter

An application of deep learning in character recognition: an overview

TL;DR: This chapter gives a detailed account of state-of-the-art deep learning techniques for Arabic-like, Latin, and symbolic scripts.
Abstract: OCR (optical character recognition) is a basic building block of automated document analysis. A robust automated document analysis system can have an impact on a wide sphere of life. Many researchers have worked to build OCR systems for various languages with a high degree of accuracy, high character recognition rates, and low error rates. Deep learning is the state-of-the-art technique, producing more efficient and accurate results than other techniques. Every language, and indeed every script, has its own challenges: for example, scripts whose characters are well separated are less challenging than cursive scripts, in which characters are attached to one another. In this chapter, we give a detailed account of state-of-the-art deep learning techniques for Arabic-like, Latin, and symbolic scripts.
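
The contrast between well-separated and cursive scripts can be made concrete with a classical, non-deep-learning segmentation baseline. The sketch below is not from the chapter; it assumes a binarized text-line image stored as rows of 0/1 pixels and cuts it at columns that contain no ink (a vertical projection profile). The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of 0 (background) / 1 (ink) pixels


def column_profile(img: Image) -> List[int]:
    """Count ink pixels in each column (the vertical projection profile)."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def segment_by_gaps(img: Image) -> List[Tuple[int, int]]:
    """Return [start, end) column spans of ink separated by empty columns."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, count in enumerate(column_profile(img)):
        if count > 0 and start is None:
            start = c  # entering a run of ink columns
        elif count == 0 and start is not None:
            spans.append((start, c))  # an empty column ends the run
            start = None
    if start is not None:  # ink runs up to the right edge
        spans.append((start, len(img[0])))
    return spans


# Two well-separated glyphs: the empty column 2 splits them.
separated = [[1, 1, 0, 1, 1],
             [1, 1, 0, 1, 1]]
# The same glyphs joined by a connecting stroke, as in cursive scripts.
joined = [[1, 1, 0, 1, 1],
          [1, 1, 1, 1, 1]]

print(segment_by_gaps(separated))  # [(0, 2), (3, 5)]
print(segment_by_gaps(joined))     # [(0, 5)]
```

When a connecting stroke fills the gap, the cut disappears and the joined glyphs come back as a single span, one reason segmentation-free approaches are attractive for cursive scripts.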
Citations
Journal Article
TL;DR: The proposed deep transfer-based learning approach achieved phenomenal recognition rates for Pashto ligatures on the benchmark FAST-NU Pashto dataset.
Abstract: Over the past decades, text recognition technologies have focused heavily on non-cursive, isolated scripts. A text recognition system for the cursive Pashto script would be a significant contribution, allowing traditional, cultural, and educational Pashto literature to be converted into machine-readable form. We propose deep learning architectures based on transfer learning for the recognition of Pashto ligatures. For recognition analysis and evaluation, the ligature images in the dataset are preprocessed with data augmentation techniques (negatives, contours, and rotations) to increase the variation of each sample and the size of the original dataset. Rich feature representations are automatically extracted from the Pashto ligature images by the convolutional layers of convolutional neural network (CNN) architectures, using a fine-tuning approach. Pretrained CNN architectures (AlexNet, GoogLeNet, and VGG, i.e., VGG-16 and VGG-19) are used for classification by feeding the extracted features to a fully connected layer and a softmax layer. The proposed deep transfer-based learning achieves phenomenal recognition rates for Pashto ligatures on the benchmark FAST-NU Pashto dataset: accuracies of 97.24%, 97.46%, and 99.03% are achieved using the AlexNet, GoogLeNet, and VGGNet architectures, respectively.

11 citations
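
A minimal sketch of the three augmentations the abstract names, assuming 8-bit grayscale images stored as nested Python lists. The abstract gives neither the rotation angles nor how contours are computed, so the 90-degree rotation and the 4-neighbour boundary extraction below are illustrative stand-ins, not the authors' procedure.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale image, values 0..255


def negative(img: Gray) -> Gray:
    """Invert intensities so dark strokes become light and vice versa."""
    return [[255 - p for p in row] for row in img]


def rotate90(img: Gray) -> Gray:
    """Rotate 90 degrees clockwise (illustrative; the paper's angles are not stated)."""
    return [list(row) for row in zip(*img[::-1])]


def contour(binary: List[List[int]]) -> List[List[int]]:
    """Keep only ink pixels (1) that touch background or the border (4-neighbourhood)."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or binary[ny][nx] == 0:
                    out[y][x] = 1  # boundary pixel
                    break
    return out


print(negative([[0, 255], [100, 30]]))  # [[255, 0], [155, 225]]
print(rotate90([[1, 2], [3, 4]]))       # [[3, 1], [4, 2]]
```

Each function returns a new image, so one source sample can yield several augmented copies.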

Proceedings Article
05 Jul 2019
TL;DR: A hybrid deep neural network architecture with skip connections, which combines convolutional and recurrent neural network, is proposed to recognize the Urdu scene text.
Abstract: In this work, we present a benchmark and a hybrid deep neural network for Urdu text recognition in natural scene images. Recognizing text in natural scene images is a challenging task that has attracted the attention of the computer vision and pattern recognition communities. In recent years, scene text recognition has been widely studied, and state-of-the-art results have been achieved using deep neural network models. However, most research has addressed English text, and less attention has been given to other languages. In this paper, we investigate the problem of Urdu text recognition in natural scene images. Urdu is a cursive script written from right to left, in which two or more characters are joined to form a word. Recognizing cursive text in natural images is considered an open problem because of the variations in its representation. A hybrid deep neural network architecture with skip connections, which combines convolutional and recurrent neural networks, is proposed to recognize Urdu scene text. We introduce a new dataset of 11,500 manually cropped Urdu word images from natural scenes and report baseline results. The network is trained on whole word images, avoiding traditional character-based classification. Data augmentation with contrast stretching and histogram equalization is used to further enlarge the dataset. Experimental results on original and augmented word images show state-of-the-art performance.

4 citations


Cites background from "An application of deep learning in ..."

  • ...Most of the works are performed on isolated character recognition [14] as automatic segmentation of Urdu text is a challenging task....
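
Contrast stretching and histogram equalization are standard intensity transforms. The following sketch shows textbook global versions on 8-bit grayscale images stored as nested lists; the choice of global (rather than adaptive) equalization is an assumption, since the abstract does not specify the variant used.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale image, values 0..255


def contrast_stretch(img: Gray) -> Gray:
    """Linearly map the image's [min, max] range onto the full 0..255 range."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def equalize(img: Gray) -> Gray:
    """Global histogram equalization using the cumulative distribution function."""
    flat = [p for row in img for p in row]
    n = len(flat)
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, running = [0] * 256, 0
    for v in range(256):
        running += hist[v]
        cdf[v] = running
    cdf_min = next(c for c in cdf if c > 0)  # count at the darkest occupied level
    if n == cdf_min:  # single intensity: leave unchanged
        return [row[:] for row in img]
    lut = [round((cdf[v] - cdf_min) * 255 / (n - cdf_min)) for v in range(256)]
    return [[lut[p] for p in row] for row in img]


print(contrast_stretch([[100, 120, 200]]))   # [[0, 51, 255]]
print(equalize([[10, 10, 20, 30, 30, 40]]))  # [[0, 0, 64, 191, 191, 255]]
```

Stretching keeps the relative spacing of gray levels, while equalization spreads them according to how often each level occurs.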


Journal Article
TL;DR: It was found that the main marketing problems solved with machine learning were related to consumer behavior, recommender systems, forecasting, marketing segmentation, and text analysis—content analysis.
Abstract: Even though machine learning (ML) applications are not novel, they have gained popularity partly due to advances in computing power. This study explores the adoption of ML methods in marketing applications through a bibliographic review of the period 2008–2022. In this period, the adoption of ML in marketing has grown significantly. This growth has been quite heterogeneous, ranging from classical methods such as artificial neural networks to hybrid methods that combine different techniques to improve results. In general, we observed maturity in the use of ML in marketing and increasing specialization in the types of problems solved. Strikingly, the types of ML methods used to solve marketing problems vary widely, including deep learning, supervised learning, reinforcement learning, unsupervised learning, and hybrid methods. Finally, we found that the main marketing problems solved with machine learning were related to consumer behavior, recommender systems, forecasting, marketing segmentation, and text analysis (content analysis).

2 citations

Posted Content
TL;DR: An enhanced method is proposed for detecting critical points from the vertical and horizontal direction-length features of handwriting strokes for online Arabic script recognition; it achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.
Abstract: Online Arabic cursive character recognition is still a big challenge because of existing complexities, including Arabic cursive script styles, writing speed, writer mood, and so forth. Due to these unavoidable constraints, the accuracy of online Arabic character recognition is still low and leaves room for improvement. In this research, an enhanced method is proposed for detecting the desired critical points from the vertical and horizontal direction-length features of handwriting strokes for online Arabic script recognition. Each extracted stroke feature divides every isolated character into meaningful patterns known as tokens. A minimal feature set is extracted from these tokens to classify characters using a multilayer perceptron trained with the back-propagation algorithm and a modified sigmoid activation function. Two milestones are achieved in this work: first, a fixed number of tokens is attained; second, the number of the most repetitive tokens is minimized. For experiments, handwritten Arabic characters are selected from the OHASD benchmark dataset to test and evaluate the proposed method. The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.
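
The abstract does not say how its sigmoid is modified. One common variant, assumed here purely for illustration, adds a steepness parameter (called `slope` below). The sketch shows that activation, its derivative, and one back-propagation step for a single sigmoid unit under squared error; it is not the paper's network.

```python
import math
from typing import List, Tuple


def sigmoid(x: float, slope: float = 1.0) -> float:
    """Logistic sigmoid with a steepness parameter (hypothetical 'modification')."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def sigmoid_grad(x: float, slope: float = 1.0) -> float:
    """Derivative of sigmoid(x, slope) with respect to x: slope * s * (1 - s)."""
    s = sigmoid(x, slope)
    return slope * s * (1.0 - s)


def train_step(w: List[float], b: float, x: List[float], target: float,
               lr: float = 0.5, slope: float = 1.0) -> Tuple[List[float], float, float]:
    """One back-propagation step for a single sigmoid unit.

    Returns the updated weights, the updated bias, and the squared error before the update.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z, slope)
    delta = (y - target) * sigmoid_grad(z, slope)  # dE/dz for E = 0.5 * (y - target)**2
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    new_b = b - lr * delta
    return new_w, new_b, 0.5 * (y - target) ** 2


w, b, loss_before = train_step([0.1, -0.2], 0.0, [1.0, 0.5], target=1.0)
_, _, loss_after = train_step(w, b, [1.0, 0.5], target=1.0)
print(loss_before > loss_after)  # True
```

A larger `slope` makes the unit switch more sharply around zero and scales its gradient accordingly.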
References
Proceedings Article
16 Jun 2012
TL;DR: In this paper, biologically plausible, wide and deep artificial neural network architectures are proposed for tasks such as the recognition of handwritten digits or traffic signs, achieving near-human performance on MNIST and outperforming humans on a traffic sign benchmark.
Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.

3,717 citations
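
The abstract states that several columns see differently preprocessed inputs and that their predictions are averaged. A minimal sketch of that averaging step, assuming each column already outputs a probability vector over the same classes; the networks themselves are omitted and the values are toy numbers.

```python
from typing import List, Sequence


def average_predictions(column_probs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the class-probability vectors produced by each column."""
    n_cols = len(column_probs)
    n_classes = len(column_probs[0])
    return [sum(p[c] for p in column_probs) / n_cols for c in range(n_classes)]


def committee_predict(column_probs: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest averaged probability."""
    avg = average_predictions(column_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Toy outputs from three columns over three classes.
columns = [[0.6, 0.3, 0.1],   # this column alone would pick class 0
           [0.2, 0.5, 0.3],
           [0.1, 0.6, 0.3]]
print(committee_predict(columns))  # 1
```

Averaging lets columns that are confident and correct outvote a single column that is confident and wrong.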

Journal Article
TL;DR: This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies, significantly outperforming a state-of-the-art HMM-based system.
Abstract: Recognizing lines of unconstrained handwritten text is a challenging task. The difficulty of segmenting cursive or overlapping characters, combined with the need to exploit surrounding context, has led to low recognition rates for even the best current recognizers. Most recent progress in the field has been made either through improved preprocessing or through advances in language modeling. Relatively little work has been done on the basic recognition algorithms. Indeed, most systems rely on the same hidden Markov models that have been used for decades in speech and handwriting recognition, despite their well-known shortcomings. This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies. In experiments on two large unconstrained handwriting databases, our approach achieves word recognition accuracies of 79.7 percent on online data and 74.1 percent on offline data, significantly outperforming a state-of-the-art HMM-based system. In addition, we demonstrate the network's robustness to lexicon size, measure the individual influence of its hidden layers, and analyze its use of context. Last, we provide an in-depth discussion of the differences between the network and HMMs, suggesting reasons for the network's superior performance.

1,686 citations
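
The paper's network is a bidirectional LSTM; the sketch below swaps in a much simpler scalar tanh recurrence to show only the bidirectional idea: one pass reads the sequence left to right, another right to left, and each time step receives both states. The weights are arbitrary illustrative values, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def run_rnn(xs: Sequence[float], w_x: float, w_h: float, b: float) -> List[float]:
    """Scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), with h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bidirectional(xs: Sequence[float],
                  fwd: Tuple[float, float, float] = (1.0, 0.5, 0.0),
                  bwd: Tuple[float, float, float] = (0.8, 0.6, 0.1)) -> List[Tuple[float, float]]:
    """Pair each step's forward (past-context) state with its backward (future-context) state."""
    forward = run_rnn(xs, *fwd)
    backward = run_rnn(list(xs)[::-1], *bwd)[::-1]  # re-align to original time order
    return list(zip(forward, backward))


out_a = bidirectional([0.1, 0.2, 0.3])
out_b = bidirectional([0.1, 0.2, -0.9])  # only the last input differs
print(out_a[0][0] == out_b[0][0])  # True: the forward state at t=0 has not seen the change
print(out_a[0][1] == out_b[0][1])  # False: the backward state at t=0 has
```

Changing only the last input leaves the forward state at the first step unchanged but alters the backward state there, which is how a bidirectional network can use context on both sides of a character.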

Proceedings Article
01 Nov 2012
TL;DR: This paper combines the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows them to use a common framework to train highly-accurate text detector and character recognizer modules.
Abstract: Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.

900 citations
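
The abstract does not detail how its lexicon is used. As a simple stand-in, assumed here rather than taken from the paper, a raw character string from the recognizer can be snapped to the nearest lexicon word by Levenshtein (edit) distance. The function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def lexicon_correct(raw: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon word closest to the raw string (ties go to the earlier word)."""
    return min(lexicon, key=lambda word: levenshtein(raw.lower(), word.lower()))


print(levenshtein("kitten", "sitting"))                         # 3
print(lexicon_correct("HOTFL", ["HOTEL", "MOTEL", "HOSTEL"]))  # HOTEL
```

Restricting the output to lexicon words trades coverage of unseen words for robustness to single-character recognition errors.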

Book Chapter
08 Dec 2008
TL;DR: This paper introduces a globally trained offline handwriting recogniser that takes raw pixel data as input and does not require any alphabet specific preprocessing, and can therefore be used unchanged for any language.
Abstract: Offline handwriting recognition—the automatic transcription of images of handwritten text—is a challenging task that combines computer vision with sequence learning. In most systems the two elements are handled separately, with sophisticated preprocessing techniques used to extract the image features and sequential models such as HMMs used to provide the transcriptions. By combining two recent innovations in neural networks—multidimensional recurrent neural networks and connectionist temporal classification—this paper introduces a globally trained offline handwriting recogniser that takes raw pixel data as input. Unlike competing systems, it does not require any alphabet specific preprocessing, and can therefore be used unchanged for any language. Evidence of its generality and power is provided by data from a recent international Arabic recognition competition, where it outperformed all entries (91.4% accuracy compared to 87.2% for the competition winner) despite the fact that neither author understands a word of Arabic.

729 citations
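
Connectionist temporal classification (CTC) lets a network emit one label per frame, including a special blank, without pre-segmented training data. The abstract does not say which decoder was used; the simplest one, best-path (greedy) decoding, is sketched below, assuming label 0 is the blank and `alphabet[i - 1]` maps label `i` to a character. The helper `one_hot_frames` is hypothetical toy input.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: label 0 is the CTC blank


def best_path_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax label per frame, merge repeats, drop blanks.

    Label i > 0 maps to alphabet[i - 1].
    """
    labels = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev: Optional[int] = None
    for lab in labels:
        if lab != prev and lab != BLANK:
            chars.append(alphabet[lab - 1])
        prev = lab
    return "".join(chars)


def one_hot_frames(labels: Sequence[int], n_labels: int) -> List[List[float]]:
    """Toy per-frame distributions that peak at the given labels."""
    return [[0.8 if i == lab else 0.2 / (n_labels - 1) for i in range(n_labels)]
            for lab in labels]


frames = one_hot_frames([1, 1, 0, 1, 2, 2, 0], n_labels=3)  # blank + alphabet "ab"
print(best_path_decode(frames, "ab"))  # aab
```

The blank between the two `a` frames is what lets the decoder emit a doubled letter; without it, repeated labels are merged into one.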

Proceedings Article
18 Sep 2011
TL;DR: This work applies the same architecture to NIST SD 19, a more challenging dataset including lower and upper case letters, and obtains the best results published so far for both NIST digits and NIST letters.
Abstract: In 2010, after many years of stagnation, the MNIST handwriting recognition benchmark record dropped from 0.40% error rate to 0.35%. Here we report 0.27% for a committee of seven deep CNNs trained on graphics cards, narrowing the gap to human performance. We also apply the same architecture to NIST SD 19, a more challenging dataset including lower and upper case letters. A committee of seven CNNs obtains the best results published so far for both NIST digits and NIST letters. The robustness of our method is verified by analyzing 78125 different 7-net committees.

504 citations