Author

Sanjana Gunna

Bio: Sanjana Gunna is an academic researcher from the International Institute of Information Technology, Hyderabad. The author has contributed to research on the topics of Word recognition and Unicode font, and has co-authored 2 publications.
Topics: Word recognition, Unicode font, Telugu, Malayalam

Papers
Book ChapterDOI
05 Sep 2021
TL;DR: In this article, the authors investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages and show that the learned features of the models transferred from other Indian languages are visually closer (and sometimes even better) to the individual model features than those transferred from English.
Abstract: Scene text recognition in low-resource Indian languages is challenging because of complexities like multiple scripts, fonts, text size, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and STAR-Net to ensure generalisability. To study the effect of change in different scripts, we initially run our experiments on synthetic word images rendered using Unicode fonts. We show that the transfer of English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning techniques among Indian languages due to similarity in their n-gram distributions and visual features like the vowels and conjunct characters. We then study the transfer learning among six Indian languages with varying complexities in fonts and word length statistics. We also demonstrate that the learned features of the models transferred from other Indian languages are visually closer (and sometimes even better) to the individual model features than those transferred from English. We finally set new benchmarks for scene-text recognition on Hindi, Telugu, and Malayalam datasets from IIIT-ILST and Bangla dataset from MLT-17 by achieving 6%, 5%, 2%, and 23% gains in Word Recognition Rates (WRRs) compared to previous works. We further improve the MLT-17 Bangla results by plugging in a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets.
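The cross-language transfer described above can be illustrated with a rough PyTorch sketch: keep the convolutional and recurrent layers of a model trained on one Indian language and re-initialize only the output layer for the target charset, then fine-tune end to end. The SceneTextCRNN class, the charset sizes, and the checkpoint path are hypothetical placeholders, not the authors' released code.

```python
# Minimal sketch of cross-language transfer for a CRNN-style recognizer.
# SceneTextCRNN, charset sizes, and "crnn_hindi.pth" are hypothetical.
import torch
import torch.nn as nn

class SceneTextCRNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Convolutional feature extractor (heavily simplified).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        # Sequence model over the width dimension.
        self.rnn = nn.LSTM(128, 256, bidirectional=True, batch_first=True)
        # Per-timestep classifier over the language's character set (+ CTC blank).
        self.classifier = nn.Linear(512, num_classes + 1)

    def forward(self, images):
        feats = self.backbone(images)              # (B, C, 1, W)
        feats = feats.squeeze(2).permute(0, 2, 1)  # (B, W, C)
        seq, _ = self.rnn(feats)
        return self.classifier(seq)                # (B, W, num_classes + 1)

# Transfer: start from a model trained on another Indian language (e.g. Hindi),
# keep the convolutional and recurrent layers, and re-initialize only the
# classifier for the target charset (e.g. Telugu).
hindi_model = SceneTextCRNN(num_classes=110)
hindi_model.load_state_dict(torch.load("crnn_hindi.pth"))  # hypothetical checkpoint

telugu_model = SceneTextCRNN(num_classes=120)
state = {k: v for k, v in hindi_model.state_dict().items()
         if not k.startswith("classifier")}
telugu_model.load_state_dict(state, strict=False)
# All layers remain trainable, so fine-tuning adapts every layer to the new script.
optimizer = torch.optim.Adam(telugu_model.parameters(), lr=1e-4)
```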

4 citations

Book ChapterDOI
05 Sep 2021
TL;DR: In this article, the authors compare various features like the size (width and height) of the word images and word length statistics and discover that these factors are critical for the scene-text recognition systems.
Abstract: Scene-text recognition is remarkably better in Latin languages than in non-Latin languages due to several factors like multiple fonts, simplistic vocabulary statistics, updated data generation tools, and writing systems. This paper examines the possible reasons for low accuracy by comparing English datasets with non-Latin languages. We compare various features like the size (width and height) of the word images and word length statistics. Over the last decade, generating synthetic datasets with powerful deep learning techniques has tremendously improved scene-text recognition. Several controlled experiments are performed on English by varying the number of (i) fonts used to create the synthetic data and (ii) word images created. We discover that these factors are critical for scene-text recognition systems. The English synthetic datasets utilize over 1400 fonts, while Arabic and other non-Latin datasets utilize fewer than 100 fonts for data generation. Since some of these languages are spoken across different regions, we garner additional fonts through a region-based search to improve the scene-text recognition models in Arabic and Devanagari. We improve the Word Recognition Rates (WRRs) on the Arabic MLT-17 and MLT-19 datasets by 24.54% and 2.32% compared to previous works or baselines. We achieve WRR gains of 7.88% and 3.72% for the IIIT-ILST and MLT-19 Devanagari datasets.
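The controlled font experiments can be sketched as follows, assuming Pillow is installed (ideally built with libraqm for complex-script shaping) and a local directory of .ttf files; the font directory, vocabulary, and output paths are placeholders, not the generator used in the paper.

```python
# Rough sketch of rendering synthetic word images while varying the number
# of fonts, in the spirit of the controlled experiments above.
import glob
import os
import random
from PIL import Image, ImageDraw, ImageFont

def render_word(word: str, font_path: str, height: int = 32) -> Image.Image:
    """Render a single word onto a plain background with the given font."""
    font = ImageFont.truetype(font_path, size=height - 8)
    left, top, right, bottom = font.getbbox(word)
    img = Image.new("L", (right - left + 16, height), color=255)
    draw = ImageDraw.Draw(img)
    draw.text((8 - left, (height - (bottom - top)) // 2 - top), word, font=font, fill=0)
    return img

all_fonts = sorted(glob.glob("fonts/devanagari/*.ttf"))  # placeholder directory
vocabulary = ["नमस्ते", "भारत", "पुस्तक"]                  # placeholder word list

# Controlled experiment: vary how many fonts the generator may use.
for num_fonts in (10, 50, 100):
    chosen = random.sample(all_fonts, min(num_fonts, len(all_fonts)))
    out_dir = f"synth/{num_fonts}_fonts"
    os.makedirs(out_dir, exist_ok=True)
    for i in range(1000):  # number of word images per setting
        word = random.choice(vocabulary)
        img = render_word(word, random.choice(chosen))
        img.save(f"{out_dir}/{i:05d}.png")
```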

3 citations


Cited by
Journal ArticleDOI
TL;DR: This work investigates the significant differences between Indian and Latin Scene Text Recognition (STR) systems and proposes using additional non-Unicode fonts alongside the commonly employed Unicode fonts to increase font diversity in synthetic data generators for Indian languages.
Abstract: Reading Indian scene texts is complex due to the use of regional vocabulary, multiple fonts/scripts, and text size. This work investigates the significant differences between Indian and Latin Scene Text Recognition (STR) systems. Recent STR works rely on synthetic generators that involve diverse fonts to ensure robust reading solutions. We propose using additional non-Unicode fonts alongside the commonly employed Unicode fonts to increase font diversity in such synthesizers for Indian languages. We also perform experiments on transfer learning among six different Indian languages. Our transfer learning experiments on synthetic images with common backgrounds provide an interesting insight: Indian scripts can benefit more from each other than from extensive English datasets. Our evaluations in real settings yield significant improvements over previous methods on four Indian languages from standard datasets like IIIT-ILST and MLT-17, and from a new dataset (which we release) containing 440 scene images with 500 Gujarati and 2535 Tamil words. Further enriching the synthetic dataset with non-Unicode fonts and multiple augmentations helps us achieve a remarkable Word Recognition Rate gain of over 33% on the IIIT-ILST Hindi dataset. We also present the results of lexicon-based transcription approaches for all six languages.
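A minimal sketch of lexicon-based transcription, one of the approaches reported above: snap the raw recognizer output to the nearest lexicon entry by edit distance. The lexicon and example prediction below are illustrative, not taken from the paper's data.

```python
# Small sketch of lexicon-based transcription via Levenshtein distance.
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, start=1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def lexicon_decode(raw: str, lexicon: list[str]) -> str:
    """Return the lexicon word closest to the raw prediction."""
    return min(lexicon, key=lambda w: edit_distance(raw, w))

telugu_lexicon = ["హైదరాబాద్", "తెలుగు", "పాఠశాల"]  # illustrative lexicon
print(lexicon_decode("తెలుగ", telugu_lexicon))       # -> "తెలుగు"
```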

1 citation

TL;DR: This thesis looks at all the parameters involved in the process of text recognition and determines the importance of those parameters through thorough experiments and proposes an error correction module for correcting the labels by utilizing the training data of real test datasets.
Abstract: Text recognition has been an active field in computer vision since before the beginning of the deep learning era. Due to the varied applications of recognition models, the research area has been classified into diverse categories based on the domain of the data used. Optical character recognition (OCR) is focused on scanned documents, whereas images with natural scenes and much more complex backgrounds fall into the category of scene text recognition. Scene text recognition has become an exciting area of research due to complexities and difficulties such as complex backgrounds, improper illumination, distorted and noisy images, inconsistent usage of fonts and font sizes, and text that is often not horizontally aligned. Such cases make the task of scene text recognition more complicated and challenging. In recent years, we have observed the rise of deep learning and, subsequently, an incremental growth in the recognition algorithms and datasets available for training and testing purposes. This surge has pushed the performance of recognizing text in natural scenes above that of the baseline models previously trained using hand-crafted features. Latin texts were the center of attention in most of these works, which did not deeply investigate scene text recognition for non-Latin languages. Upon scrutiny, we observe that the performance of the current best recognition models has reached above 90% on scene text benchmark datasets. However, these recognition models do not perform as well on non-Latin languages as they do on Latin (or English) datasets. This striking difference in performance across languages is a rising concern among researchers focusing on low-resource languages, and it is the motivation behind our work. Scene text recognition in low-resource non-Latin languages is difficult and challenging due to the inherent complex scripts, multiple writing systems, various fonts, and orientations. Despite such differences, we can also achieve Latin (English) text-like performance for low-resource non-Latin languages. In this thesis, we look at all the parameters involved in the process of text recognition and determine the importance of those parameters through thorough experiments. We use synthetic data for controlled experiments in which we test the aforementioned parameters in isolation to effectively identify the catalysts of text recognition. We analyse the complexity of the scripts via these synthetic data experiments. We present the results of our experiments with two baseline models, CRNN and STAR-Net, on available datasets to ensure generalisability. In addition to this, we also propose an error correction module for correcting the labels by utilizing the training data of real test datasets.
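A hedged sketch of the kind of error correction module mentioned at the end of the thesis abstract: a character-level BiLSTM that re-labels each character of a noisy prediction. The charset size and the equal-length input/output assumption are simplifications for illustration, not the thesis design.

```python
# Sketch of a character-level correction BiLSTM; sizes and the alignment
# assumption (equal-length input and output) are illustrative only.
import torch
import torch.nn as nn

class CorrectionBiLSTM(nn.Module):
    def __init__(self, charset_size: int, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(charset_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, charset_size)

    def forward(self, char_ids):              # (B, T) integer character ids
        x = self.embed(char_ids)              # (B, T, embed_dim)
        h, _ = self.rnn(x)                    # (B, T, 2 * hidden)
        return self.out(h)                    # (B, T, charset_size) logits

# Training pairs would come from (recognizer prediction, ground truth) on the
# real training split; here a single random batch stands in for that data.
model = CorrectionBiLSTM(charset_size=120)
noisy = torch.randint(0, 120, (8, 20))
clean = torch.randint(0, 120, (8, 20))
loss = nn.CrossEntropyLoss()(model(noisy).reshape(-1, 120), clean.reshape(-1))
loss.backward()
```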
Proceedings ArticleDOI
23 Jan 2023
TL;DR: In this article, N-gram language models such as unigram, bigram, and trigram, with maximum likelihood estimation and Laplace (add-1) smoothing, are used for automatic word completion and word prediction to save time, keystrokes, and misspellings.
Abstract: Automated word prediction, commonly called language modeling, is the task of predicting the next word. Word completion and word prediction greatly benefit disabled users who use physical or virtual keyboards on desktops and handheld systems. The objective of this paper is to predict the next word and sequences of words in Telugu sentences with a stochastic approach. N-gram language models such as unigram, bigram, and trigram, with maximum likelihood estimation and Laplace (add-1) smoothing, are used for automatic completion of a sentence by predicting the appropriate word, saving time, keystrokes, and misspellings. We use a large Telugu corpus drawn from Telugu Wikipedia pages, which contains words from different domains, to predict the correct word.
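A small illustrative sketch of bigram next-word prediction with add-1 (Laplace) smoothing, as described above; the tiny tokenized corpus stands in for the Telugu Wikipedia corpus used in the paper.

```python
# Bigram next-word prediction with add-1 (Laplace) smoothing.
from collections import Counter

corpus = [
    ["నేను", "పుస్తకం", "చదువుతున్నాను"],
    ["నేను", "పాఠశాలకు", "వెళ్తున్నాను"],
]  # placeholder sentences, already tokenized

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
vocab = list(unigrams)
V = len(vocab)

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-1 smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def predict_next(prev: str) -> str:
    """Most likely next word under the smoothed bigram model."""
    return max(vocab, key=lambda w: bigram_prob(prev, w))

print(predict_next("నేను"))  # suggests a continuation seen after "నేను"
```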
Proceedings ArticleDOI
19 Jan 2023
TL;DR: In this article, the hyper-parameters of the CRNN system are optimized using Taguchi's method, and the hyper-parameter-optimized CRNN network is shown to perform better than traditional CRNN-based systems.
Abstract: The Devanagari script is one of the most widely used scripts worldwide. The existing deep learning-based optical character recognition system for printed Devanagari scripts using the Convolutional Neural Network-Recurrent Neural Network (CRNN) is not robust enough to recognize any randomly printed, scanned Devanagari document. At present, the hyper-parameters of the CRNN system are selected either randomly or with trial-and-error or grid search methods. Moreover, there is no optimized way to choose the hyper-parameters of the CRNN that improves the recognition accuracy for Devanagari documents. Furthermore, the lack of standard Devanagari script datasets has hampered the development of word recognizers. In this paper, the hyper-parameters of the CRNN system are optimized using Taguchi's method. The performance of the hyper-parameter-optimized CRNN system is compared with the current state-of-the-art text recognition CRNN network. The results reveal that the CRNN optimized with Taguchi's method performs better than existing CRNN-based systems.
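A hedged sketch of Taguchi-style hyper-parameter selection using an L9 orthogonal array over three 3-level factors; the factors, levels, and train_and_evaluate stub are illustrative assumptions, not the paper's exact design.

```python
# Taguchi-style search: 9 runs from an L9 orthogonal array instead of the
# full 27-run grid; the best level per factor is picked by mean response.
levels = {
    "lr":     [1e-4, 5e-4, 1e-3],
    "hidden": [128, 256, 512],
    "batch":  [16, 32, 64],
}

# L9 orthogonal array: each row gives the level index (0-2) per factor.
L9 = [
    (0, 0, 0), (0, 1, 1), (0, 2, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 0),
    (2, 0, 2), (2, 1, 0), (2, 2, 1),
]

def train_and_evaluate(lr, hidden, batch) -> float:
    """Placeholder for training the recognizer and returning validation WRR."""
    return 0.0  # replace with a real training run

results = []
for li, hi, bi in L9:
    wrr = train_and_evaluate(levels["lr"][li], levels["hidden"][hi], levels["batch"][bi])
    results.append(((li, hi, bi), wrr))

# For each factor, average the response over runs at each level and keep the best
# (each level appears exactly three times per factor in the L9 array).
best = {}
for f, name in enumerate(["lr", "hidden", "batch"]):
    means = [sum(w for idx, w in results if idx[f] == lvl) / 3 for lvl in range(3)]
    best[name] = levels[name][means.index(max(means))]
print(best)
```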