Author

Gaurav Harit

Bio: Gaurav Harit is an academic researcher from Indian Institute of Technology, Jodhpur. The author has contributed to research in topics: Character (mathematics) & Image segmentation. The author has an h-index of 13 and has co-authored 73 publications that have received 523 citations. Previous affiliations of Gaurav Harit include Indian Institutes of Technology & Indian Institute of Technology Delhi.


Papers
Journal ArticleDOI
TL;DR: A thinning methodology for character images that adapts to local character shape while constructing the thinned skeleton and produces fewer spurious branches than other thinning methods.
Abstract: In this paper we propose a thinning methodology applicable to character images. It is novel in terms of its ability to adapt to local character shape while constructing the thinned skeleton. Our method does not produce many of the distortions in the character shapes which normally result from the use of existing thinning algorithms. The proposed thinning methodology is based on the medial axis of the character. The skeleton has a width of one pixel. As a by-product of our thinning approach, the skeleton also gets segmented into strokes in vector form. Hence further stroke segmentation is not required. We have conducted experiments with printed and handwritten characters in several scripts such as English, Bengali, Hindi, Kannada and Tamil. We obtain less spurious branches compared to other thinning methods. Our method does not use any kind of post processing.

8 citations

Journal ArticleDOI
TL;DR: In this paper, the authors developed topographic features of strokes visible with respect to views from different directions (e.g. North, South, East, and West) for optical character recognition (OCR).
Abstract: Feature selection and extraction plays an important role in different classification based problems such as face recognition, signature verification, optical character recognition (OCR) etc. The performance of OCR highly depends on the proper selection and extraction of feature set. In this paper, we present novel features based on the topography of a character as visible from different viewing directions on a 2D plane. By topography of a character we mean the structural features of the strokes and their spatial relations. In this work we develop topographic features of strokes visible with respect to views from different directions (e.g. North, South, East, and West). We consider three types of topographic features: closed region, convexity of strokes, and straight line strokes. These features are represented as a shape-based graph which acts as an invariant feature set for discriminating very similar type characters efficiently. We have tested the proposed method on printed and handwritten Bengali and Hindi character images. Initial results demonstrate the efficacy of our approach.

7 citations

Journal ArticleDOI
TL;DR: This work proposes a novel clustering strategy, tailored towards the specific requirements of clustering in video data, that takes care of many of the problems with traditional clustering schemes applied to the heterogeneous feature space of video.

7 citations

Book ChapterDOI
12 Jan 2012
TL;DR: A novel technique for detecting concave regions as structural information in character images by analyzing the sequence of discrete turns taken to describe the character stroke, with the added advantage of detecting the same concave regions of a particular character written by different individuals.
Abstract: In this paper, we present a novel technique for detection of concave regions as a structural information of character images. The problem difficulty lies in reporting all concavities irrespective of the viewing direction on the 2D plane. In our approach, we detect concave regions by analyzing the sequence of discrete turns taken to describe the character stroke; hence, it becomes view-invariant. The proposed method has the added advantage of detecting same concave regions of a particular character written by different individuals. We have tested our method on printed and handwritten Bangla and Hindi isolated character images. Initial results demonstrate the efficacy of our approach.

7 citations
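
Illustrative sketch (not from the paper): the abstract above describes detecting concavities from the sequence of discrete turns along a stroke. The snippet below is a minimal sketch of that general idea under loud assumptions: the stroke is already available as a polyline of integer points, a "turn" is the sign of the cross product at each interior vertex, and a candidate concave region is a run of at least `min_turns` turns in the same direction. The names `turn_signs` and `turn_runs` and the `min_turns` threshold are hypothetical and are not taken from the authors' method.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def turn_signs(stroke: List[Point]) -> List[int]:
    """Turn direction at each interior vertex: +1 left, -1 right, 0 straight."""
    signs = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(stroke, stroke[1:], stroke[2:]):
        # z-component of the cross product of the incoming and outgoing segments
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        signs.append((cross > 0) - (cross < 0))
    return signs


def turn_runs(signs: List[int], min_turns: int = 2) -> List[Tuple[int, int, int]]:
    """Maximal runs of same-direction turns as (first_index, last_index, sign).

    Straight vertices (sign 0) are skipped and do not break a run.
    The min_turns threshold is an assumption made for this sketch.
    """
    runs = []
    current, start, last, count = 0, -1, -1, 0
    for i, s in enumerate(signs):
        if s == 0:
            continue
        if s == current:
            count += 1
            last = i
        else:
            if current != 0 and count >= min_turns:
                runs.append((start, last, current))
            current, start, last, count = s, i, i, 1
    if current != 0 and count >= min_turns:
        runs.append((start, last, current))
    return runs


# A "U"-shaped stroke and the same stroke rotated by 90 degrees
u_shape = [(0, 2), (0, 0), (2, 0), (2, 2)]
u_rotated = [(-y, x) for x, y in u_shape]
# An "S"-like stroke whose two turns go in opposite directions
s_shape = [(0, 0), (2, 0), (2, 2), (4, 2)]
```

Because the cross-product sign does not change under rotation, `u_shape` and `u_rotated` yield the same turn sequence, which mirrors the view-invariance the abstract claims. A mirrored stroke would flip every sign, so a fuller treatment would have to decide how to handle reflections.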

Proceedings ArticleDOI
23 Sep 2007
TL;DR: It is shown through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.
Abstract: In this paper we present an application of latent semantic analysis (LSA) for indexing and retrieval of document images with text. The query is specified as a set of word images and the documents which best match with the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.

7 citations


Cited by
Journal Article
TL;DR: This paper surveys current topics in document image understanding from a technical point of view, covering methods and approaches proposed for recognizing various kinds of documents.
Abstract: The subject about document image understanding is to extract and classify individual data meaningfully from paper-based documents. Until today, many methods/approaches have been proposed with regard to recognition of various kinds of documents, various technical problems for extensions of OCR, and requirements for practical usages. Of course, though the technical research issues in the early stage are looked upon as complementary attacks for the traditional OCR which is dependent on character recognition techniques, the application ranges or related issues are widely investigated or should be established progressively. This paper addresses current topics about document image understanding from a technical point of view as a survey. key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations

Journal ArticleDOI
01 Apr 2007
TL;DR: Call for papers for Special Issue of ACM Transactions on Multimedia Computing, Communications and Applications on Interactive Digital Television.
Abstract: Call for papers for Special Issue of ACM Transactions on Multimedia Computing, Communications and Applications on Interactive Digital Television

201 citations

Journal ArticleDOI
TL;DR: A method for automatically obtaining object representations suitable for retrieval from generic video shots that includes associating regions within a single shot to represent a deforming object and an affine factorization method that copes with motion degeneracy.
Abstract: We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions. In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).

162 citations

Journal ArticleDOI
01 Nov 2011
TL;DR: In this paper, the state of the art since the 1970s in machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed.
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. State of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and it is also tried to highlight the beneficial directions of the research till date. Moreover, the paper also contains a comprehensive bibliography of many selected papers appeared in reputed journals and conference proceedings as an aid for the researchers working in the field of Devanagari OCR.

159 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines and beats Tesseract's state of the art table detection system by a significant margin.
Abstract: Table detection is a crucial step in many document analysis applications as tables are used for presenting essential information to the reader in a structured manner. It is a hard problem due to varying layouts and encodings of the tables. Researchers have proposed numerous techniques for table detection based on layout analysis of documents. Most of these techniques fail to generalize because they rely on hand engineered features which are not robust to layout variations. In this paper, we have presented a deep learning based method for table detection. In the proposed method, document images are first pre-processed. These images are then fed to a Region Proposal Network followed by a fully connected neural network for table detection. The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines. We have done our evaluations on publicly available UNLV dataset where it beats Tesseract's state of the art table detection system by a significant margin.

159 citations