Author

Gaurav Harit

Bio: Gaurav Harit is an academic researcher from Indian Institute of Technology, Jodhpur. The author has contributed to research in topics: Character (mathematics) & Image segmentation. The author has an h-index of 13 and has co-authored 73 publications, which have received 523 citations. Previous affiliations of Gaurav Harit include Indian Institutes of Technology & Indian Institute of Technology Delhi.


Papers
Journal ArticleDOI
TL;DR: A review of OCR work on Indian scripts, mainly Bangla and Devanagari, the two most popular scripts in India, covering the various methodologies and their reported results.
Abstract: The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR for various Indian scripts, such as Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, and Gujarati. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India. We summarize most of the published papers on this topic and analyse the various methodologies and their reported results. Future directions of research in OCR for Indian scripts are also given.

70 citations

Journal ArticleDOI
TL;DR: A functional, unobtrusive Indian Sign Language recognition system was implemented with the Microsoft Kinect camera and tested on real-world data, offering a novel, low-cost and easy-to-use application for Indian Sign Language recognition.
Abstract: People with speech disabilities communicate in sign language and therefore have trouble communicating with those who do not know it. There is a need for an interpretation system which could act as a bridge between them and those who do not know their sign language. A functional unobtrusive Indian Sign Language recognition system was implemented and tested on real-world data. A vocabulary of 140 symbols was collected using 18 subjects, totalling 5041 images. The vocabulary consisted mostly of two-handed signs drawn from a wide repertoire of words of technical and daily-use origins. The system was implemented using Microsoft Kinect, which makes the effect of surrounding light conditions and object colour on the system negligible. The work proposes a novel, low-cost and easy-to-use application for Indian Sign Language recognition using the Microsoft Kinect camera. In the fingerspelling category of our dataset, we achieved recognition rates above 90% for 13 signs and 100% recognition for 3 signs, with 16 distinct letters overall (A, B, D, E, F, G, H, K, P, R, T, U, W, X, Y, Z) recognised with an average accuracy rate of 90.68%.

49 citations

Journal ArticleDOI
TL;DR: The novelty of the approach lies in the formulation of appropriate rules of character decomposition for segmenting the character skeleton into stroke segments and then grouping them for extraction of meaningful shape components.

35 citations

Journal ArticleDOI
TL;DR: This work proposes a new action scoring system termed Reference Guided Regression (RGR), which comprises a Deep Metric Learning Module that learns the similarity between any two action videos based on their ground truth scores given by the judges, and a Score Estimation Module that uses the resemblance of a video to a reference video to give the assessment score.
Abstract: Automated vision-based score estimation models can be used to provide an alternate opinion to avoid judgment bias. Existing works have learned score estimation models by regressing the video representation to the ground truth score provided by judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video, which would capture the temporal variations vis-a-vis the reference video and map those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns the similarity between any two action videos based on their ground truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video to give the assessment score. The proposed scoring model is tested on Olympics Diving and Gymnastic vaults, and the model outperforms the existing state-of-the-art scoring models.

29 citations
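
The reference-guided idea in the abstract above, scoring a video by its resemblance to reference videos with known judge scores, can be illustrated with a small sketch. This is not the authors' RGR architecture: the learned Deep Metric Learning Module is replaced here by plain Euclidean distance on hypothetical precomputed embeddings, and the Score Estimation Module by a distance-weighted average. The function names, the `temperature` parameter, and the toy embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_score(query, references, temperature=1.0):
    """Estimate a score for `query` from (embedding, judge_score) references.

    Stand-in for a learned metric: closer references get larger weights,
    and the estimate is the weighted average of their judge scores.
    """
    if not references:
        raise ValueError("at least one reference video is required")
    dists = [euclidean(query, emb) for emb, _ in references]
    d_min = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(weights)
    return sum(w * score for w, (_, score) in zip(weights, references)) / total


# Hypothetical reference videos: (embedding, score awarded by judges).
references = [([0.0, 0.0], 60.0), ([1.0, 1.0], 90.0)]

# A query close to the first reference is pulled toward that reference's score.
print(estimate_score([0.1, 0.1], references))
```

Because the estimate is tied to specific reference videos, the references with the largest weights indicate which known performances the query most resembles, which is the kind of explanation the abstract says plain regression lacks.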

Journal ArticleDOI
TL;DR: A contour-based thinning method for skeletonizing printed, noisy, isolated character images, which uses shape characteristics of text to obtain a skeleton that is nearly the same as the true character shape.

27 citations


Cited by
Journal Article
TL;DR: A survey of current topics in document image understanding from a technical point of view, covering the methods and approaches proposed for recognizing various kinds of documents.
Abstract: Document image understanding aims to extract and classify individual data meaningfully from paper-based documents. To date, many methods and approaches have been proposed with regard to recognition of various kinds of documents, various technical problems for extensions of OCR, and requirements for practical usage. Although the technical research issues in the early stage were regarded as complementary to traditional OCR, which depends on character recognition techniques, the application ranges and related issues are now widely investigated and should be established progressively. This paper addresses current topics in document image understanding from a technical point of view as a survey.
Key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations

Journal ArticleDOI
01 Apr 2007
TL;DR: Call for papers for Special Issue of ACM Transactions on Multimedia Computing, Communications and Applications on Interactive Digital Television.

201 citations

Journal ArticleDOI
TL;DR: A method for automatically obtaining object representations suitable for retrieval from generic video shots that includes associating regions within a single shot to represent a deforming object and an affine factorization method that copes with motion degeneracy.
Abstract: We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions. In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).

162 citations

Journal ArticleDOI
01 Nov 2011
TL;DR: This paper discusses the state of the art, from the 1970s onward, of machine-printed and handwritten Devanagari optical character recognition (OCR).
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. The state of the art, from the 1970s onward, of machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight the beneficial directions of the research to date. Moreover, the paper also contains a comprehensive bibliography of many selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines, and beats Tesseract's state-of-the-art table detection system by a significant margin.
Abstract: Table detection is a crucial step in many document analysis applications, as tables are used for presenting essential information to the reader in a structured manner. It is a hard problem due to varying layouts and encodings of the tables. Researchers have proposed numerous techniques for table detection based on layout analysis of documents. Most of these techniques fail to generalize because they rely on hand-engineered features which are not robust to layout variations. In this paper, we present a deep learning based method for table detection. In the proposed method, document images are first pre-processed. These images are then fed to a Region Proposal Network followed by a fully connected neural network for table detection. The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines. We evaluated the method on the publicly available UNLV dataset, where it beats Tesseract's state-of-the-art table detection system by a significant margin.

159 citations