Author
Utpal Garain
Other affiliations: Tata Consultancy Services, Indian Institute of Information Technology and Management, Gwalior
Bio: Utpal Garain is an academic researcher from the Indian Statistical Institute. The author has contributed to research on topics including Bengali and optical character recognition. The author has an h-index of 25 and has co-authored 129 publications that have received 2019 citations. Previous affiliations of Utpal Garain include Tata Consultancy Services and the Indian Institute of Information Technology and Management, Gwalior.
Papers
01 Nov 2002
TL;DR: A new technique is presented for identification and segmentation of touching characters based on fuzzy multifactorial analysis and a predictive algorithm is developed for effectively selecting possible cut columns for segmenting the touching characters.
Abstract: One of the important reasons for the poor recognition rate of optical character recognition (OCR) systems is error in character segmentation. The existence of touching characters in scanned documents is a major problem in designing an effective character segmentation procedure. In this paper, a new technique is presented for identification and segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting possible cut columns for segmenting the touching characters. The proposed method has been applied to printed documents in Devnagari and Bangla, the two most popular scripts of the Indian sub-continent. The results obtained from a test-set of considerable size show that a reasonable improvement in recognition rate can be achieved with a modest increase in computations.
126 citations
01 Dec 2004
TL;DR: This paper aims at automatic understanding of online handwritten mathematical expressions (MEs) written on an electronic tablet using a context-free grammar to convert the input expressions into their corresponding TeX strings which are subsequently converted into MathML format.
Abstract: This paper aims at automatic understanding of online handwritten mathematical expressions (MEs) written on an electronic tablet. The proposed technique involves two major stages: symbol recognition and structural analysis. A combination of two different classifiers has been used to achieve high accuracy for the recognition of symbols. Several online and offline features are used in the structural analysis phase to identify the spatial relationships among symbols. A context-free grammar has been designed to convert the input expressions into their corresponding TeX strings, which are subsequently converted into MathML format. Contextual information has been used to correct several structure interpretation errors. A new method for evaluating performance of the proposed system has been formulated. Experiments on a dataset of considerable size strongly support the feasibility of the proposed system.
117 citations
Posted Content
TL;DR: It is found that the word2vec based query expansion methods perform similarly with and without any feedback information, and the proposed method fails to achieve comparable performance with statistical co-occurrence based feedback method such as RM3.
Abstract: In this paper, a framework for Automatic Query Expansion (AQE) is proposed using the distributed neural language model word2vec. Using semantic and contextual relations in a distributed and unsupervised framework, word2vec learns a low-dimensional embedding for each vocabulary entry. Using such a framework, we devise a query expansion technique in which terms related to a query are obtained by a K-nearest neighbor approach. We explore the performance of the AQE methods, with and without feedback query expansion, and a variant of simple K-nearest neighbor in the proposed framework. Experiments on standard TREC ad-hoc data (Disk 4, 5 with query sets 301-450, 601-700) and web data (WT10G data with query set 451-550) show significant improvement over standard term-overlapping based retrieval methods. However, the proposed method fails to achieve comparable performance with statistical co-occurrence based feedback methods such as RM3. We have also found that the word2vec based query expansion methods perform similarly with and without any feedback information.
105 citations
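The K-nearest-neighbor expansion step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_query`, the use of a mean query vector, and the three-dimensional `TOY_EMBEDDINGS` are assumptions made for illustration, standing in for vectors that would normally come from a trained word2vec model.

```python
import math


def _unit(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def expand_query(query_terms, embeddings, k=2):
    """Return the k vocabulary terms nearest (by cosine) to the mean query vector.

    `embeddings` maps each term to a list of floats (hypothetical toy data here).
    Query terms missing from the vocabulary are ignored.
    """
    unit = {term: _unit(vec) for term, vec in embeddings.items()}
    known = [t for t in query_terms if t in unit]
    if not known:
        return []
    dim = len(unit[known[0]])
    # Assumption: the query is represented by the centroid of its term vectors.
    centroid = _unit([sum(unit[t][i] for t in known) / len(known) for i in range(dim)])
    scores = {
        term: sum(a * b for a, b in zip(vec, centroid))
        for term, vec in unit.items()
        if term not in query_terms  # never propose a term already in the query
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical embeddings, invented for this example only.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.1, 0.2, 0.85],
}

print(expand_query(["car"], TOY_EMBEDDINGS, k=2))  # ['automobile', 'vehicle']
```

The expanded terms would then be appended to the original query before retrieval; how they are weighted, and how feedback documents are used, is left out of this sketch.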
23 Oct 2016
TL;DR: The competition results suggest that recognition of handwritten formulae remains a difficult structural pattern recognition task.
Abstract: This paper presents an overview of the 5th Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME). As in previous years, the main task is formula recognition from handwritten strokes (Task 1). Additional tasks include classification of isolated symbols (Task 2a), classification of isolated valid and invalid symbols (Task 2b), a new task on parsing formula structure from valid handwritten symbols (Task 3), and parsing expressions with matrices (Task 4, experimental). In total, eleven (11) research labs registered for the competition, with six (6) teams submitting results. Innovations for this CROHME included providing a corpus of formulae from Wikipedia to train language models, and an online system for result submission. The highest recognition rates were obtained by MyScript corporation (Task 1. 67.65%, 2a. 92.81%, 2b. 86.77%, 3. 84.38%, and 4. 68.40%). Using only provided training data, the highest recognition rates were obtained by WIRIS corporation (Task 1. 49.61%, Task 3. 78.80%, Task 4. 56.40%), the Tokyo University of Agriculture and Technology (Task 2a. 92.28%), and RIT (Task 2b. 83.34%). The competition results suggest that recognition of handwritten formulae remains a difficult structural pattern recognition task.
101 citations
01 Sep 2014
TL;DR: CROHME 2014, the latest edition of the competition dedicated to on-line handwritten mathematical expression recognition, features two new tasks: isolated symbol recognition with a reject option for invalid symbol hypotheses, and recognition of expressions that contain matrices.
Abstract: We present the outcome of the latest edition of the CROHME competition, dedicated to on-line handwritten mathematical expression recognition. In addition to the standard full expression recognition task from previous competitions, CROHME 2014 features two new tasks. The first is dedicated to isolated symbol recognition including a reject option for invalid symbol hypotheses, and the second concerns recognizing expressions that contain matrices. System performance is improving relative to previous competitions. Data and evaluation tools used for the competition are publicly available.
94 citations
Cited by
TL;DR: A review of the OCR work done on Indian language scripts and the scope of future work and further steps needed for Indian script OCR development is presented.
Abstract: Intensive research has been done on optical character recognition (OCR) and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available in the market, but most of these systems work for Roman, Chinese, Japanese and Arabic characters. There is not a sufficient body of work on Indian language character recognition, although there are 12 major scripts in India. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover the introduction and the properties of Indian scripts. In Section 3, we discuss different methodologies in OCR development as well as research work done on Indian script recognition. In Section 4, we discuss the scope of future work and further steps needed for Indian script OCR development. In Section 5, we conclude the paper.
592 citations
03 Apr 2017
TL;DR: This work proposes a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches them using learned distributed representations, on the hypothesis that matching with distributed representations complements matching with traditional local representations.
Abstract: Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favourable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or 'duet' performs significantly better than either neural network individually on a Web page ranking task, and significantly outperforms traditional baselines and other recently proposed models based on neural networks.
489 citations
Patent
12 Mar 2010
TL;DR: A system and method for automatically providing content associated with captured information is described, in which the system receives input by a user and automatically provides content or links to content associated with the input.
Abstract: A system and method for automatically providing content associated with captured information is described. In some examples, the system receives input by a user, and automatically provides content or links to content associated with the input. In some examples, the system receives input via text entry or by capturing text from a rendered document, such as a printed document, an object, an audio stream, and so on.
438 citations
Patent
06 Oct 2010
TL;DR: A device for capturing rendered text is described, which incorporates one or more visual sensors that receive visual information as a part of capturing rendered text, and a visual information disposition subsystem for disposing of visual information received by the visual sensors.
Abstract: A device for capturing rendered text is described. The device incorporates one or more visual sensors that receive visual information as a part of capturing rendered text. The visual sensors are collectively capable of capturing both text that is permanently printed on a page, and text that is displayed transitorily on a dynamic device. The device further incorporates a visual information disposition subsystem for disposing of visual information received by the visual sensors. The device further incorporates a package that bears the visual sensors and the visual information disposition subsystem, and is suitable to be held in a human hand.
420 citations
Patent
01 Apr 2005
TL;DR: A portable device having scanning, imaging or other data-capture capability is described, which can indicate to the user when enough information has been captured to uniquely identify a source document.
Abstract: A portable device having scanning, imaging or other data-capture capability is described. In some cases, the portable device can indicate to the user when enough information has been captured to uniquely identify a source document. In some cases, the portable device calculates timestamps and location-stamps indicating when and where a data capture occurred. In some cases, the portable device is controlled by gestures. In some cases, the portable scanning device has associated billing and content/service subscription information.
381 citations