scispace - formally typeset
Search or ask a question
Author

Praveen Krishnan

Bio: Praveen Krishnan is an academic researcher from International Institute of Information Technology, Hyderabad. The author has contributed to research in topics: Word (computer architecture) & Word recognition. The author has an hindex of 11, co-authored 24 publications receiving 481 citations. Previous affiliations of Praveen Krishnan include Facebook & Indian Institutes of Information Technology.

Papers
More filters
Proceedings ArticleDOI
01 Aug 2018
TL;DR: A modified CNN-RNN hybrid architecture is proposed with a major focus on effective training using: (i) efficient initialization of network using synthetic data for pretraining, (ii) image normalization for slant correction and (iii) domain specific data transformation and distortion for learning important invariances.
Abstract: The success of deep learning based models have centered around recent architectures and the availability of large scale annotated data. In this work, we explore these two factors systematically for improving handwritten recognition for scanned off-line document images. We propose a modified CNN-RNN hybrid architecture with a major focus on effective training using: (i) efficient initialization of network using synthetic data for pretraining, (ii) image normalization for slant correction and (iii) domain specific data transformation and distortion for learning important invariances. We perform a detailed ablation study to analyze the contribution of individual modules and present state of art results for the task of unconstrained line and word recognition on popular datasets such as IAM, RIMES and GW.

141 citations

Proceedings ArticleDOI
24 Apr 2018
TL;DR: This work proposes an End2End embedding framework which jointly learns both the text and image embeddings using state of the art deep convolutional architectures to enhance word spotting and recognition.
Abstract: Deep convolutional features for word images and textual embedding schemes have shown great success in word spotting. In this work, we follow these motivations to propose an End2End embedding framework which jointly learns both the text and image embeddings using state of the art deep convolutional architectures. The three major contributions of this work are: (i) an End2End embedding scheme to learn a common representation for word images and its labels, (ii) building a state of art word image descriptor and demonstrating its utility as off-the-shelf features for word spotting, and (iii) use of synthetic data as a complementary modality to further enhance word spotting and recognition. On the challenging IAM handwritten dataset, we report a mAP of 0.9509 for query-by-string based retrieval task. Under lexicon based word recognition, our proposed method report a 2.66 and 5.10 CER and WER respectively.

93 citations

Proceedings ArticleDOI
01 Oct 2016
TL;DR: A deep convolutional feature representation is proposed that achieves superior performance for word spotting and recognition for handwritten images and enables query-by-string by learning a common subspace for image and text using the embedded attribute framework.
Abstract: We propose a deep convolutional feature representation that achieves superior performance for word spotting and recognition for handwritten images. We focus on: -(i) enhancing the discriminative ability of the convolutional features using a reduced feature representation that can scale to large datasets, and (ii) enabling query-by-string by learning a common subspace for image and text using the embedded attribute framework. We present our results on popular datasets such as the IAM corpus and historical document collections from the Bentham and George Washington pages. On the challenging IAM dataset, we achieve a state of the art mAP of 91.58% on word spotting using textual queries and a mean word error rate of 6.69% for the word recognition task.

80 citations

Posted Content
TL;DR: This work renders synthetic data using open source fonts and incorporate data augmentation schemes and releases 9M synthetic handwritten word image corpus which could be useful for training deep network architectures and advancing the performance in handwritten word spotting and recognition tasks.
Abstract: Generating synthetic images is an art which emulates the natural process of image generation in a closest possible manner. In this work, we exploit such a framework for data generation in handwritten domain. We render synthetic data using open source fonts and incorporate data augmentation schemes. As part of this work, we release 9M synthetic handwritten word image corpus which could be useful for training deep network architectures and advancing the performance in handwritten word spotting and recognition tasks.

48 citations

Journal ArticleDOI
TL;DR: In this paper, the authors used a deep convolutional neural network with traditional classification loss to learn an efficient holistic representation for handwritten word images, which leads to state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size.
Abstract: We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with the region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between synthetic and real domain and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to a state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size. On the challenging iam dataset, our method is first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts which validates the generic nature of the proposed framework for learning word image representation.

47 citations


Cited by
More filters
17 Dec 2010
TL;DR: The authors survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, using a corpus of digitized texts containing about 4% of all books ever printed.
Abstract: L'article, publie dans Science, sur une des premieres utilisations analytiques de Google Books, fondee sur les n-grammes (Google Ngrams) We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can ...

735 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of the major applications of deep learning covering variety of areas is presented, study of the techniques and architectures used and further the contribution of that respective application in the real world are presented.
Abstract: Nowadays, deep learning is a current and a stimulating field of machine learning. Deep learning is the most effective, supervised, time and cost efficient machine learning approach. Deep learning is not a restricted learning approach, but it abides various procedures and topographies which can be applied to an immense speculum of complicated problems. The technique learns the illustrative and differential features in a very stratified way. Deep learning methods have made a significant breakthrough with appreciable performance in a wide variety of applications with useful security tools. It is considered to be the best choice for discovering complex architecture in high-dimensional data by employing back propagation algorithm. As deep learning has made significant advancements and tremendous performance in numerous applications, the widely used domains of deep learning are business, science and government which further includes adaptive testing, biological image classification, computer vision, cancer detection, natural language processing, object detection, face recognition, handwriting recognition, speech recognition, stock market analysis, smart city and many more. This paper focuses on the concepts of deep learning, its basic and advanced architectures, techniques, motivational aspects, characteristics and the limitations. The paper also presents the major differences between the deep learning, classical machine learning and conventional learning approaches and the major challenges ahead. The main intention of this paper is to explore and present chronologically, a comprehensive survey of the major applications of deep learning covering variety of areas, study of the techniques and architectures used and further the contribution of that respective application in the real world. Finally, the paper ends with the conclusion and future aspects.

499 citations

Journal ArticleDOI
TL;DR: In this article , the authors divide the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents) rather than marketing or hardware approach to conduct a comprehensive analysis.
Abstract: Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is based on the social value of Generation Z that online and offline selves are not different. With the technological development of deep learning-based high-precision recognition models and natural generation models, Metaverse is being strengthened with various factors, from mobile-based always-on access to connectivity with reality using virtual currency. The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse. This paper divides the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents) and three approaches (i.e., user interaction, implementation, and application) rather than marketing or hardware approach to conduct a comprehensive analysis. Furthermore, we describe essential methods based on three components and techniques to Metaverse’s representative Ready Player One, Roblox, and Facebook research in the domain of films, games, and studies. Finally, we summarize the limitations and directions for implementing the immersive Metaverse as social influences, constraints, and open challenges.

313 citations

Journal ArticleDOI
TL;DR: This paper divides the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents) and three approaches and describes essential methods based on three components and techniques to Metaverse’s representative Ready Player One, Roblox, and Facebook research in the domain of films, games, and studies.
Abstract: Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is based on the social value of Generation Z that online and offline selves are not different. With the technological development of deep learning-based high-precision recognition models and natural generation models, Metaverse is being strengthened with various factors, from mobile-based always-on access to connectivity with reality using virtual currency. The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse. This paper divides the concepts and essential techniques necessary for realizing the Metaverse into three components (i.e., hardware, software, and contents) and three approaches (i.e., user interaction, implementation, and application) rather than marketing or hardware approach to conduct a comprehensive analysis. Furthermore, we describe essential methods based on three components and techniques to Metaverse’s representative Ready Player One, Roblox, and Facebook research in the domain of films, games, and studies. Finally, we summarize the limitations and directions for implementing the immersive Metaverse as social influences, constraints, and open challenges.

241 citations

Proceedings ArticleDOI
Zhi Qiao1, Yu Zhou1, Dongbao Yang1, Yucan Zhou1, Weiping Wang1 
14 Jun 2020
TL;DR: This work proposes a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts and integrates the state-of-the-art ASTER method into the proposed framework as an exemplar.
Abstract: Scene text recognition is a hot research topic in computer vision. Recently, many recognition methods based on the encoder-decoder framework have been proposed, and they can handle scene texts of perspective distortion and curve shape. Nevertheless, they still face lots of challenges like image blur, uneven illumination, and incomplete characters. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts. The semantic information is used both in the encoder module for supervision and in the decoder module for initializing. In particular, the state-of-the-art ASTER method is integrated into the proposed framework as an exemplar. Extensive experiments demonstrate that the proposed framework is more robust for low-quality text images, and achieves state-of-the-art results on several benchmark datasets. The source code will be available.

189 citations