Journal ArticleDOI

Deep optical character recognition: a case of Pashto language

01 Mar 2020-Journal of Electronic Imaging (Society of Photo-optical Instrumentation Engineers)-Vol. 29, Iss: 2, pp 023002
TL;DR: The proposed deep transfer-based learning has achieved phenomenal recognition rates for Pashto ligatures on the benchmark FAST-NU Pashto dataset.
Abstract: Over the past decades, text recognition technologies have focused immensely on noncursive isolated scripts. A text recognition system for the cursive Pashto script will serve as a great contribution, allowing the traditional, cultural, and educational Pashto literature to be converted into machine-readable form. We propose the use of deep learning architectures based on transfer learning for the recognition of Pashto ligatures. For recognition analysis and evaluation, the ligature images in the dataset are preprocessed by data augmentation techniques, i.e., negatives, contours, and rotations, to increase the variation of each sample and the size of the original dataset. Rich feature representations are automatically extracted from the Pashto ligature images using the deep convolutional layers of convolutional neural network (CNN) architectures with a fine-tuning approach. Pretrained CNN architectures, AlexNet, GoogLeNet, and VGG (VGG-16 and VGG-19), are used for classification by feeding the extracted features to a fully connected layer and a softmax layer. The proposed deep transfer-based learning has achieved phenomenal recognition rates for Pashto ligatures on the benchmark FAST-NU Pashto dataset. An accuracy of 97.24%, 97.46%, and 99.03% is achieved using the AlexNet, GoogLeNet, and VGGNet architectures, respectively.
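The augmentation step described above (negatives, contours, and rotations) can be sketched with plain NumPy; the contour here is a simple gradient-magnitude edge map, and all function names are illustrative rather than taken from the paper.

```python
import numpy as np

def augment(img: np.ndarray) -> dict:
    """Produce the three augmented variants named in the abstract:
    negative, contour (crude gradient-magnitude edge map), and rotation."""
    negative = 255 - img                           # invert grayscale intensities
    gy, gx = np.gradient(img.astype(float))        # intensity gradients
    contour = np.clip(np.hypot(gx, gy), 0, 255).astype(np.uint8)
    rotated = np.rot90(img)                        # 90-degree rotation as one example
    return {"negative": negative, "contour": contour, "rotated": rotated}

# toy 4x4 "ligature" image: black strokes on a white background
img = np.array([[0, 0, 255, 255],
                [0, 0, 255, 255],
                [255, 255, 0, 0],
                [255, 255, 0, 0]], dtype=np.uint8)
variants = augment(img)
print(variants["negative"][0, 0])  # 255: a black pixel becomes white
```

In practice each variant would be added to the training set alongside the original image, multiplying the number of samples per ligature class.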
Citations
Journal ArticleDOI
TL;DR: The VGG architecture outperforms the state-of-the-art techniques and a number of ConvNet architectures in Alzheimer's disease detection, achieving identification test set accuracies of 99.27% (MCI/AD), 98.89% (AD/CN) and 97.06% (MCI/CN).
Abstract: Machine learning and deep learning play a crucial role in the identification of various diseases, including neurological, skin, eye, blood, and cancerous conditions. Deep learning algorithms show promise for the prediction of Alzheimer's disease from MRI scans. Alzheimer's disease is becoming more common among people aged 65 years or above. The disease becomes severe before symptoms appear and causes a brain disorder that cannot be cured by medicines or other therapies and treatments, so early diagnosis is necessary to slow its progression. Detection and prevention of Alzheimer's disease is an active research area nowadays. In this paper, we employed convolutional network architectures using frozen features extracted from the source dataset ImageNet for binary and ternary classification. All experiments were carried out on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which consists of MRI scans. The performance of the proposed system is demonstrated for the classification of Alzheimer's disease versus mild cognitive impairment, normal controls versus mild cognitive impairment, and cognitively normal versus Alzheimer's disease. The results show that the VGG architecture outperforms the state-of-the-art techniques and a number of ConvNet architectures (AlexNet, GoogLeNet, ResNet, DenseNet, Inception-v3, Inception-ResNet) in Alzheimer's disease detection, achieving identification test set accuracies of 99.27% (MCI/AD), 98.89% (AD/CN) and 97.06% (MCI/CN).
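The freeze-feature setup described above, where convolutional features learned on a source dataset are held fixed and only a classifier head is trained, can be sketched in miniature; here a fixed random projection stands in for the frozen convolutional extractor, and the data are synthetic, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for frozen convolutional features: a fixed random projection + ReLU
W_frozen = rng.normal(size=(64, 16))
def extract_features(x):                 # x: (n, 64) flattened inputs
    return np.maximum(x @ W_frozen, 0)   # weights are never updated

# trainable logistic-regression head for binary classification (e.g. AD vs CN)
def train_head(feats, labels, lr=0.1, epochs=200):
    w = np.zeros(feats.shape[1])
    for _ in range(epochs):
        z = np.clip(feats @ w, -30, 30)           # keep the sigmoid numerically safe
        p = 1 / (1 + np.exp(-z))
        w -= lr * feats.T @ (p - labels) / len(labels)
    return w

# synthetic two-class data with a clear mean shift between classes
x0 = rng.normal(-1, 1, size=(50, 64))
x1 = rng.normal(+1, 1, size=(50, 64))
X = np.vstack([x0, x1])
y = np.r_[np.zeros(50), np.ones(50)]

feats = extract_features(X)              # frozen features, computed once
w = train_head(feats, y)                 # only the head is trained
acc = ((feats @ w > 0) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

The design point is that only the small head is optimized, which is why freeze-feature transfer works even with modest medical-imaging datasets.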

41 citations

Journal ArticleDOI
TL;DR: A novel binarization technique is found to be capable of handling almost all types of degradations without any parameter tuning, and is the winner of the DIBCO 2019 competition.
Abstract: Binarization of document images still attracts researchers, especially when degraded document images are considered. This is evident from the recent Document Image Binarization Competition (DIBCO 2019), in which researchers from all over the world participated. In this paper, we present a novel binarization technique which is found to be capable of handling almost all types of degradations without any parameter tuning. The present method is based on an ensemble of three classical clustering algorithms (Fuzzy C-means, K-medoids and K-means++) to group the pixels as foreground or background, after application of a coherent image normalization method. It has been tested on four publicly available datasets used in the DIBCO series (2016, 2017, 2018 and 2019) and gives promising results on all of them. In addition, this method is the winner of the DIBCO 2019 competition.
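The ensemble idea, where several clustering views of the pixel intensities are combined by majority vote, can be sketched in miniature; here a simple 1-D 2-means run from three different initializations stands in for the Fuzzy C-means / K-medoids / K-means++ trio, and all values and names are illustrative, not the paper's method.

```python
import numpy as np

def two_means(pixels, init):
    """1-D 2-means clustering of grayscale intensities; returns a boolean
    foreground mask (darker cluster = ink)."""
    c = np.array(init, dtype=float)               # two cluster centres
    for _ in range(20):
        assign = np.abs(pixels[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):               # guard against empty clusters
                c[k] = pixels[assign == k].mean()
    return assign == c.argmin()                   # foreground = darker centre

def ensemble_binarize(img):
    """Majority vote over three clustering runs with different initializations."""
    px = img.astype(float).ravel()
    votes = sum(two_means(px, init).astype(int)
                for init in [(0, 255), (64, 192), (px.min(), px.max())])
    return (votes >= 2).reshape(img.shape)        # pixel is ink if 2 of 3 agree

# toy 3x3 patch: dark ink pixels against a bright background
img = np.array([[ 10,  20, 240],
                [ 15, 230, 250],
                [  5, 245, 235]], dtype=np.uint8)
mask = ensemble_binarize(img)
print(mask)
```

The vote makes the result robust to any single clustering run converging poorly, which is the same motivation the paper gives for combining three different algorithms.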

19 citations

Journal ArticleDOI
TL;DR: This article proposes a ligature‐based recognition system for the cursive Pashto script using four pre‐trained CNN models with a fine‐tuning approach, achieving the highest recognition rate of up to 99.31% for Pashto ligatures with the DenseNet convolutional neural network architecture.

6 citations


Cites methods from "Deep optical character recognition:..."

  • ...Zahoor et al. (2020) employed AlexNet, GoogLeNet and VGGNet for Pashto ligature identification and reported the highest accuracy of 99.03% using VGGNet....


Journal ArticleDOI
TL;DR: The Deep CNN model is the best of the three models in terms of accuracy and loss on the recognition of Pashto handwritten digits and characters combined.
Abstract: Pashto is one of the most ancient and historical languages in the world and is spoken in Pakistan and Afghanistan. Various languages like Urdu, English, Chinese, and Japanese have OCR applications, but very little work has been done on the Pashto language in this respect. It is more difficult for OCR applications to recognize handwritten characters and digits, because handwriting is influenced by the writer's hand dynamics. Moreover, there was no publicly available dataset of handwritten Pashto digits before this study, and consequently no prior work on the recognition of Pashto handwritten digits and characters combined. To achieve this objective, a dataset of Pashto handwritten digits consisting of 60,000 images was created. Three deep learning convolutional neural network models, i.e., a plain CNN, LeNet, and a Deep CNN, were trained and tested on both the Pashto handwritten character and digit datasets. From the simulations, the Deep CNN achieved 99.42 percent accuracy for Pashto handwritten digits, 99.17 percent for handwritten characters, and 70.65 percent for combined digits and characters. Similarly, the LeNet and CNN models achieved slightly lower accuracies (LeNet: 98.82, 99.15, and 69.82 percent; CNN: 98.30, 98.74, and 66.53 percent) for the Pashto handwritten digit, character, and combined recognition datasets, respectively. Based on these results, the Deep CNN model is the best of the three in terms of accuracy and loss.

6 citations

Journal ArticleDOI
TL;DR: A gold-standard Pashto dataset, one of the first open-access datasets that directly maps line images to their corresponding text in the Pashto language, together with a segmentation app developed using textbox-expanding algorithms.
Abstract: The article aims to introduce a gold-standard Pashto dataset and a segmentation app. The Pashto dataset consists of 300 line images and corresponding Pashto text from three selected books. A line image is simply an image consisting of one text line from a scanned page. To our knowledge, this is one of the first open access datasets which directly maps line images to their corresponding text in the Pashto language. We also introduce the development of a segmentation app using textbox expanding algorithms, a different approach to OCR segmentation. The authors discuss the steps to build a Pashto dataset and develop our unique approach to segmentation. The article starts with the nature of the Pashto alphabet and its unique diacritics which require special considerations for segmentation. Needs for datasets and a few available Pashto datasets are reviewed. Criteria of selection of data sources are discussed and three books were selected by our language specialist from the Afghan Digital Repository. The authors review previous segmentation methods and introduce a new approach to segmentation for Pashto content. The segmentation app and results are discussed to show readers how to adjust variables for different books. Our unique segmentation approach uses an expanding textbox method which performs very well given the nature of the Pashto scripts. The app can also be used for Persian and other languages using the Arabic writing system. The dataset can be used for OCR training, OCR testing, and machine learning applications related to content in Pashto.
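As a point of comparison for the segmentation discussion above, classical line segmentation by horizontal projection profile can be sketched in a few lines; note this is a generic illustration, not the paper's textbox-expanding algorithm, and all names and thresholds are assumptions.

```python
import numpy as np

def segment_lines(page, ink_threshold=128):
    """Split a grayscale page image into line images using a horizontal
    projection profile: consecutive rows containing ink form one line."""
    ink_rows = (page < ink_threshold).any(axis=1)      # rows with any dark pixel
    lines, start = [], None
    for i, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = i                                   # a run of text begins
        elif not has_ink and start is not None:
            lines.append(page[start:i])                 # run ends: emit line image
            start = None
    if start is not None:
        lines.append(page[start:])                      # page ends mid-line
    return lines

# toy page: two "text lines" separated by one blank row
page = np.full((5, 6), 255, dtype=np.uint8)
page[0:2, 1:5] = 0      # first line of ink
page[3:5, 0:4] = 0      # second line of ink
lines = segment_lines(page)
print(len(lines))       # 2 line images
```

Profile-based splitting like this struggles with overlapping diacritics, which is precisely the difficulty with Pashto script that motivates the paper's expanding-textbox approach.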

4 citations

References
Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Proceedings Article
01 Jan 2009

694 citations


"Deep optical character recognition:..." refers background or methods in this paper

  • ...ImageNet is one of the largest benchmark datasets used for computer vision research and development.(19) It is a dataset of images that is based on WordNet....


  • ..., ImageNet.(19) Target pretrained CNN models are created using TL....


  • ...The CNN architectures (AlexNet, GoogleNet, and VGGNet) are used for training on the source dataset, i.e., ImageNet.(19) Target pretrained CNN models are created using TL....


Posted Content
TL;DR: In this paper, the state-of-the-art deep learning architectures and their optimization for medical image segmentation and classification are discussed, along with the challenges of deep learning-based methods for medical imaging and open research issues.
Abstract: The healthcare sector is different from other industries: it is a high-priority sector, and people expect the highest level of care and service regardless of cost. Yet it has not met social expectations even though it consumes a huge percentage of the budget. Mostly, the interpretation of medical data is done by medical experts, and image interpretation by human experts is quite limited due to its subjectivity, the complexity of the images, the extensive variation that exists across different interpreters, and fatigue. After the success of deep learning in other real-world applications, it is also providing exciting solutions with good accuracy for medical imaging and is seen as a key method for future applications in the health sector. In this chapter, we discuss state-of-the-art deep learning architectures and their optimization for medical image segmentation and classification. In the last section, we discuss the challenges of deep learning-based methods for medical imaging and open research issues.

300 citations

Proceedings ArticleDOI
16 Oct 2016
TL;DR: Experimental results show high accuracy on food/non-food classification and food recognition using a GoogLeNet model based on a deep convolutional neural network.
Abstract: The recent past has seen a lot of development in the field of image-based dietary assessment. Food image classification and recognition are crucial steps for dietary assessment. In the last couple of years, advancements in deep learning and convolutional neural networks proved to be a boon for image classification and recognition tasks, specifically for food recognition because of the wide variety of food items. In this paper, we report experiments on food/non-food classification and food recognition using a GoogLeNet model based on a deep convolutional neural network. The experiments were conducted on two image datasets that we created ourselves, where the images were collected from existing image datasets, social media, and imaging devices such as smartphones and wearable cameras. Experimental results show a high accuracy of 99.2% on food/non-food classification and 83.6% on food category recognition.

170 citations


"Deep optical character recognition:..." refers methods in this paper

  • ...GoogleNet is an efficient DNN architecture developed by Szegedy and was the winner of the ILSVRC held in the year 2014.(20) It consists of a total of 22 layers and 4 million parameters....
