Bio: Asghar Ali is an academic researcher from the University of New South Wales. The author has contributed to research in topics: Convolutional neural network & Cursive. The author has an h-index of 3 and has co-authored 4 publications receiving 26 citations.
12 Mar 2018
TL;DR: A Convolutional Neural Network is applied as a classifier, as CNN approaches have been reported to provide high accuracy for natural scene text detection and recognition.
Abstract: In this paper we investigate the challenging problem of cursive text recognition in natural scene images. In particular, we have focused on isolated Urdu character recognition in natural scenes that could not be handled by traditional Optical Character Recognition (OCR) techniques developed for Arabic and Urdu scanned documents. We also present a dataset of Urdu characters segmented from images of signboards, street scenes, shop scenes and advertisement banners containing Urdu text. A variety of deep learning techniques have been proposed by researchers for natural scene text detection and recognition. In this work, a Convolutional Neural Network (CNN) is applied as a classifier, as CNN approaches have been reported to provide high accuracy for natural scene text detection and recognition. A dataset of manually segmented characters was developed and deep learning based data augmentation techniques were applied to further increase the size of the dataset. The network is trained with filter sizes of 3x3, 5x5 and mixed 3x3 and 5x5, with stride values of 1 and 2. The CNN model is trained with various learning rates and state-of-the-art results are achieved.
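To get a feel for how the filter sizes and strides mentioned in the abstract change the size of a feature map, the standard convolution output-size formula can be evaluated directly. This is only a minimal sketch: the 32x32 input size, the zero padding and the function name `conv_output_size` are assumptions made for illustration and are not taken from the paper.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel does not fit inside the padded input")
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical 32x32 input; the abstract does not state the image size.
INPUT_SIZE = 32

for kernel in (3, 5):
    for stride in (1, 2):
        out = conv_output_size(INPUT_SIZE, kernel, stride)
        print(f"kernel {kernel}x{kernel}, stride {stride}: {out}x{out}")
```

A stride of 2 roughly halves each spatial dimension, while the larger 5x5 filter trims a slightly wider border than the 3x3 filter.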
01 Dec 2018
TL;DR: A new Convolutional Neural Network architecture is proposed for synthetic Urdu and English character recognition in natural scene images using three separate sub-models of the CNN which are then fused in one feature vector.
Abstract: In this paper, a new Convolutional Neural Network (CNN) architecture is proposed for synthetic Urdu and English character recognition in natural scene images. The features are extracted using three separate sub-models of the CNN which are then fused in one feature vector. The network is purely trained on the synthetic character images of English and Urdu texts in natural images. For English text, the Chars74k-Font dataset is used and for Urdu text, the synthetic dataset is created by automatically cropping the image patches from four background image datasets and then putting characters at random positions within the image patch. The network is evaluated on a combined synthetic dataset of English and Urdu characters and the separate synthetic characters of Urdu and English datasets. The experimental results show that the network performs well on synthetic datasets.
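The synthetic-data step described in the abstract (crop a patch from a background image, then put a character at a random position inside it) can be sketched in a few lines. Everything below is a hypothetical illustration: images are plain nested lists of grey levels, the glyph is a tiny binary mask, and the names `crop_patch` and `place_glyph` are invented here, not taken from the paper.

```python
import random


def crop_patch(background, size, rng):
    """Cut a random size x size patch out of a 2D grey-level image."""
    height, width = len(background), len(background[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in background[top:top + size]]


def place_glyph(patch, glyph, rng, ink=255):
    """Paste a binary glyph mask at a random position inside the patch."""
    patch_h, patch_w = len(patch), len(patch[0])
    glyph_h, glyph_w = len(glyph), len(glyph[0])
    top = rng.randint(0, patch_h - glyph_h)
    left = rng.randint(0, patch_w - glyph_w)
    out = [row[:] for row in patch]  # copy so the input patch is untouched
    for r in range(glyph_h):
        for c in range(glyph_w):
            if glyph[r][c]:
                out[top + r][left + c] = ink
    return out, (top, left)


# Hypothetical data: a 64x64 textured background and a 3x3 cross-shaped glyph.
rng = random.Random(0)
background = [[(7 * r + 3 * c) % 200 for c in range(64)] for r in range(64)]
glyph = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]

patch = crop_patch(background, 16, rng)
sample, position = place_glyph(patch, glyph, rng)
```

Seeding the random generator keeps the generated samples reproducible, which is convenient when the same synthetic set has to be regenerated for later experiments.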
05 Jul 2019
TL;DR: A hybrid deep neural network architecture with skip connections, which combines convolutional and recurrent neural networks, is proposed to recognize Urdu scene text.
Abstract: In this work, we present a benchmark and a hybrid deep neural network for Urdu text recognition in natural scene images. Recognizing text in natural scene images is a challenging task, which has attracted the attention of the computer vision and pattern recognition communities. In recent years, scene text recognition has been widely studied, and state-of-the-art results have been achieved using deep neural network models. However, most of this research addresses English text, and less attention has been given to other languages. In this paper, we investigate the problem of Urdu text recognition in natural scene images. Urdu is a cursive script written from right to left, in which two or more characters are joined to form a word. Recognizing cursive text in natural images is considered an open problem due to variations in its representation. A hybrid deep neural network architecture with skip connections, which combines convolutional and recurrent neural networks, is proposed to recognize Urdu scene text. We introduce a new dataset of 11500 manually cropped Urdu word images from natural scenes and show the baseline results. The network is trained on the whole word image, avoiding the traditional character-based classification. Data augmentation with contrast stretching and histogram equalization is used to further increase the size of the dataset. The experimental results on original and augmented word images show state-of-the-art performance of the network.
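The two augmentation operations named in the abstract, contrast stretching and histogram equalization, are standard image transforms. The sketch below shows their textbook forms on an 8-bit greyscale image stored as nested lists. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the tiny 2x2 example image are made up here.

```python
def contrast_stretch(img, lo=0, hi=255):
    """Linearly map the image's [min, max] grey range onto [lo, hi]."""
    flat = [p for row in img for p in row]
    pmin, pmax = min(flat), max(flat)
    if pmax == pmin:  # flat image: nothing to stretch
        return [row[:] for row in img]
    span = pmax - pmin
    return [[round(lo + (p - pmin) * (hi - lo) / span) for p in row]
            for row in img]


def equalize_histogram(img, levels=256):
    """Remap grey levels through the normalised cumulative histogram."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # only one grey level present
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]


# Hypothetical low-contrast 2x2 word-image crop.
image = [[50, 100],
         [150, 200]]
stretched = contrast_stretch(image)
equalized = equalize_histogram(image)
```

Each transform yields one extra variant of every word image, so applying both would grow a dataset to three times its original size if the originals are kept.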
01 Sep 2019
TL;DR: The dataset aims to support the research community in developing and evaluating algorithms for Urdu text in natural scenes, and contains ground truths in the form of word-level bounding boxes, the script of the text and the text transcription.
Abstract: Multi-lingual text in natural scene images conveys useful information and is a fundamental tool for tourists to interact with their environment. Multi-lingual text detection and recognition in natural scenes has therefore become a challenging problem for researchers in the last few years. Recently, a large-scale multi-lingual dataset for scene text detection and script identification was published by ICDAR, which contains scene images with text in six different scripts including Arabic. This paper presents a novel dataset and benchmark for Urdu text in natural scenes. Currently, no dataset for Urdu text in natural scenes is publicly available. Urdu is written in a cursive script that is derived from Arabic script and shares many of its characters. Therefore, the proposed dataset could be helpful for multi-lingual text detection, recognition and script identification. The aim of this dataset is to support the research community in developing and evaluating algorithms for Urdu text in natural scenes. The Urdu-Text dataset contains 1400 complete scene images and 8200 segmented words. The images in the dataset contain a broad variety of text instances in multiple orientations with small and large font sizes. The dataset contains ground truths in the form of bounding boxes at the word level, the script of the text and the text transcription. The performance of three deep neural networks is evaluated to measure the robustness of the Urdu-Text dataset.
TL;DR: This research work addresses the competency and limitations of the existing IE techniques related to data pre-processing, data extraction and transformation, and representations for huge volumes of multidimensional unstructured data and presents a systematic literature review of state-of-the-art techniques for a variety of big data.
Abstract: Information extraction (IE) is the process of extracting useful information from unstructured or semi-structured data. Big data raises new challenges for IE techniques with the rapid growth of multifaceted, also called multidimensional, unstructured data. Traditional IE systems cannot deal efficiently with this deluge of unstructured big data. The volume and variety of big data demand improved computational capabilities from these IE systems. It is necessary to understand the competency and limitations of the existing IE techniques related to data pre-processing, data extraction and transformation, and representations for huge volumes of multidimensional unstructured data. Numerous studies have been conducted on IE, addressing the challenges and issues for different data types such as text, image, audio and video. Very little consolidated research has been conducted to investigate the task-dependent and task-independent limitations of IE covering all data types in a single study. This research work addresses this limitation and presents a systematic literature review of state-of-the-art techniques for a variety of big data, consolidating all data types. Recent challenges of IE are also identified and summarized. Potential solutions are proposed, giving future research directions in big data IE. The research is significant in terms of recent trends and challenges related to big data analytics. The outcome of the research and its recommendations will help to improve big data analytics by making it more productive.
TL;DR: A methodology is proposed that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images; a FasterRCNN detector based on Resnet50 features performed best, with an AP of 0.98.
Abstract: Urdu text is a cursive script and belongs to a non-Latin family of other cursive scripts like Arabic, Chinese, and Hindi. Urdu text poses a challenge for detection/localization from natural scene images, and consequently for recognition of individual ligatures in scene images. In this paper, a methodology is proposed that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, a custom FasterRCNN algorithm has been used in conjunction with well-known CNNs like Squeezenet, Googlenet, Resnet18, and Resnet50 for detection and localization in images of size $320\times 240$ pixels. For ligature orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained and tested on datasets containing randomly oriented ligatures. Recognition of ligatures was done using a Two Stream Deep Neural Network (TSDNN). In our experiments, five sets of datasets, containing 4.2K and 51K Urdu-text-embedded synthetic images, were generated using the CLE annotation text to evaluate the different tasks of detection, orientation prediction, and recognition of ligatures. These synthetic images contain 132 and 1600 unique ligatures, corresponding to 4.2K and 51K images respectively, with 32 variations of each ligature (4 backgrounds and 8 font-color variations). In addition, 1094 real-world images containing more than 12K Urdu characters were used for the TSDNN's evaluation. Finally, all four detectors were evaluated and compared on their ability to detect/localize Urdu text using average precision (AP). The FasterRCNN based on Resnet50 features was found to be the best detector, with an AP of 0.98, while the Squeezenet, Googlenet, and Resnet18 based detectors had testing APs of 0.65, 0.88, and 0.87 respectively. The RRNN achieved an accuracy of 79% and 99% for the 4K and 51K images respectively. Similarly, for character classification in ligatures, the TSDNN attained a partial sequence recognition rate of 94.90% and 95.20% for the 4K and 51K images respectively, and a partial sequence recognition rate of 76.60% was attained for real-world images.
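The detectors above are compared by average precision (AP). The abstract does not say which AP variant was used, so the sketch below shows one common, non-interpolated definition: detections are ranked by confidence, and the precision at each true-positive rank is averaged over the number of ground-truth objects. The function name, the example scores and the assumption that each detection has already been marked as a true or false positive (for example by an overlap test against the ground truth) are all hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated AP: mean of precision at each true-positive rank.

    Ground-truth objects that are never detected contribute a precision
    of zero, because the sum is divided by num_ground_truth.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(zip(scores, is_true_positive),
                    key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_ground_truth


# Hypothetical detector output: four detections, three text regions in total.
scores = [0.9, 0.8, 0.7, 0.6]
flags = [True, False, True, True]
ap = average_precision(scores, flags, num_ground_truth=3)
```

With this definition a detector reaches an AP of 1.0 only when every ground-truth region is found and no false positive is ranked above a true positive.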
01 Jan 2021
TL;DR: A weighted naïve Bayes classifier (WNBC)-based deep learning process is used in this framework to effectively detect the text and to recognize the character from the scene images.
Abstract: Text obtained in natural scenes contains various information; therefore, it is extensively used in various applications to understand the image scenarios and also to retrieve the visual information. The semantic information provided by a scene image is very valuable for human beings to understand the whole environment. But the text in such natural images has a flexible appearance in an unconstrained environment, which makes text identification and character recognition more challenging. Therefore, a weighted naive Bayes classifier (WNBC)-based deep learning process is used in this framework to effectively detect the text and to recognize the characters in the scene images. Natural scene images normally carry some noise, and to remove it, a guided image filter is introduced at the pre-processing stage. The features that are useful for the classification process are extracted using the Gabor transform and stroke width transform techniques. Finally, with these extracted features, text detection and character recognition are achieved by the WNBC and deep neural network-based adaptive galactic swarm optimization. Performance metrics such as accuracy, F1-score, precision, mean absolute error, mean square error and recall are then evaluated to assess the proposed method.
TL;DR: A multi-scale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images is proposed, and experimental results show that the aggregation of multi-scale and multi-level features and their fusion is more effective and outperforms other methods on the Urdu character image and Chars74K datasets.
Abstract: The accuracy of current natural scene text recognition algorithms is limited by the poor performance of character recognition methods for these images. The complex backgrounds, variations in the writing, text size, orientations, low resolution and multi-language text make recognition of text in natural images a complex and challenging task. Conventional machine learning and deep learning-based methods have been developed that have achieved satisfactory results, but character recognition for cursive text such as Arabic and Urdu scripts in natural images is still an open research problem. The characters in the cursive text are connected and are difficult to segment for recognition. Variations in the shape of a character due to its different positions within a word make the recognition task more challenging than non-cursive text. Optical character recognition (OCR) techniques proposed for Arabic and Urdu scanned documents perform very poorly when applied to character recognition in natural images. In this paper, we propose a multi-scale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images. The network first aggregates multi-scale features of the convolutional layers by up-sampling and addition operations and then combines them with the high-level features. Finally, the outputs of the MSFA and MLFF networks are fused together to create more robust and powerful features. A comprehensive dataset of segmented Urdu characters is developed for the evaluation of the proposed network models. Synthetic text on the patches of images with real natural scene backgrounds is generated to increase the samples of infrequently used characters. The proposed model is evaluated on the Chars74K and ICDAR03 datasets. To validate the proposed model on the new Urdu character image dataset, we compare its performance with the histogram of oriented gradients (HoG) method. 
The experimental results show that the aggregation of multi-scale and multilevel features and their fusion is more effective, and outperforms other methods on the Urdu character image and Chars74K datasets.
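The aggregation step described in this abstract, combining feature maps of different scales by up-sampling and addition, can be illustrated on toy data. The sketch below is not the MSFA network itself: it uses nearest-neighbour up-sampling (the abstract does not say which interpolation is used), single-channel maps stored as nested lists, and invented function names.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour up-sampling: repeat every value factor x factor times."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def aggregate(fine, coarse, factor):
    """Bring the coarse map up to the fine map's resolution, then add them."""
    return add_maps(fine, upsample_nearest(coarse, factor))


# Hypothetical single-channel maps: a 4x4 early-layer map and a 2x2 deeper map.
fine = [[1] * 4 for _ in range(4)]
coarse = [[1, 2],
          [3, 4]]
fused = aggregate(fine, coarse, factor=2)
```

Addition keeps the channel count unchanged, unlike concatenation, so maps from several scales can be merged without widening the layers that follow.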