Home
/
Authors
/
Gaurav Harit

Author

Gaurav Harit

Other affiliations: Indian Institutes of Technology, Indian Institute of Technology Delhi, Indian Institute of Technology Kharagpur

Bio: Gaurav Harit is an academic researcher from Indian Institute of Technology, Jodhpur. The author has contributed to research in topics: Character (mathematics) & Image segmentation. The author has an hindex of 13, co-authored 73 publications receiving 523 citations. Previous affiliations of Gaurav Harit include Indian Institutes of Technology & Indian Institute of Technology Delhi.

Papers published on a yearly basis

2023
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

Detecting Missed and Anomalous Action Segments Using Approximate String Matching Algorithm

[...]

Hiteshi Jain¹, Gaurav Harit¹•Institutions (1)

Indian Institute of Technology, Jodhpur¹

16 Dec 2017

TL;DR: An exemplar based Approximate String Matching (ASM) technique is proposed for detecting such anomalous and missing segments in action sequences and shows promising alignment and missed/anomalous notification results over this dataset.

...read moreread less

Abstract: We forget action steps and perform some unwanted action movements as amateur performers during our daily exercise routine, dance performances, etc. To improve our proficiency, it is important that we get a feedback on our performances in terms of where we went wrong. In this paper, we propose a framework for analyzing and issuing reports of action segments that were missed or anomalously performed. This involves comparing the performed sequence with the standard action sequence and notifying when misalignments occur. We propose an exemplar based Approximate String Matching (ASM) technique for detecting such anomalous and missing segments in action sequences. We compare the results with those obtained from the conventional Dynamic Time Warping (DTW) algorithm for sequence alignment. It is seen that the alignment of the action sequences under conventional DTW fails in the presence of missed action segments and anomalous segments due to its boundary condition constraints. The performance of the two techniques has been tested on a complex aperiodic human action dataset with Warm up exercise sequences that we developed from correct and incorrect executions by multiple people. The proposed ASM technique shows promising alignment and missed/anomalous notification results over this dataset.

...read moreread less

7 citations

Proceedings Article•DOI•

Improved geometric feature graph: a script independent representation of word images for compression, and retrieval

[...]

Gaurav Harit¹, Richa Jain¹, Santanu Chaudhury¹•Institutions (1)

Indian Institutes of Technology¹

31 Aug 2005

TL;DR: A new representation scheme for word images which exploits the structural features of the word image skeleton in the form of a graph called as the geometric feature graph (GFG).

...read moreread less

Abstract: In this paper, we discuss a new representation scheme for word images which exploits the structural features. The word image features are represented in the form of a graph called as the geometric feature graph (GFG). The GFG is encoded in the form of a string which serves as a compressed representation of the word image skeleton. We demonstrate reconstruction, and retrieval of word images for 3 different scripts using the GFG string.

...read moreread less

5 citations

Proceedings Article•DOI•

An Integrated Scheme for Compression and Interactive Access to Document Images

[...]

Gaurav Harit¹, Ritu Garg¹, Santanu Chaudhury¹•Institutions (1)

Indian Institute of Technology Delhi¹

05 Mar 2007

TL;DR: An integrated scheme for document image compression is presented which preserves the layout structure, and still allows the display of textual portions to adapt to the user preferences and screen area, and derives an SVG representation of the complete document image.

...read moreread less

Abstract: We present an integrated scheme for document image compression which preserves the layout structure, and still allows the display of textual portions to adapt to the user preferences and screen area. We encode the layout structure of the document images in an XML representation. The textual components and picture components are compressed separately into different representations. We derive an SVG (scalable vector graphics) representation of the complete document image. Compression is achieved since the word-images are encoded using specifications for geometric primitives that compose a word. A document rendered from its SVG representation can be adapted for display and interactive access through common browsers on desktop as well as mobile devices. We demonstrate the effectiveness of the proposed scheme for document access

...read moreread less

5 citations

Proceedings Article•DOI•

Unsupervised Temporal Segmentation of Human Action Using Community Detection

[...]

Hiteshi Jain¹, Gaurav Harit¹•Institutions (1)

Indian Institute of Technology, Jodhpur¹

01 Oct 2018

TL;DR: This work presents a novel community detection-based human action segmentation algorithm that marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks.

...read moreread less

Abstract: Temporal segmentation of complex human action videos into action primitives plays a pivotal role in building models for human action understanding Studies in the past have introduced unsupervised frameworks for deriving a known number of motion primitives from action videos Our work focuses towards answering a question: Given a set of videos with humans performing an activity, can the action primitives be derived from them without specifying any prior knowledge about the count for the constituting sub-actions categories? To this end, we present a novel community detection-based human action segmentation algorithm Our work marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks We test our proposed technique over the stitched Weizmann dataset and MHADI01-s motion capture dataset and our technique outperforms the state-of-the-art techniques of complex action segmentation without the count of actions being pre-specified

...read moreread less

5 citations

Proceedings Article•DOI•

An Unsupervised Sequence-to-Sequence Autoencoder Based Human Action Scoring Model

[...]

Hiteshi Jain¹, Gaurav Harit¹•Institutions (1)

Indian Institute of Technology, Jodhpur¹

01 Nov 2019

TL;DR: This work introduces a novel sequence-to-sequence autoencoder-based scoring model which learns the representation from only expert performances and judges an unknown performance based on how well it can be regenerated from the learned model.

...read moreread less

Abstract: Developing a model for the task of assessing quality of human action is a key research area in computer vision. The quality assessment task has been posed as a supervised regression problem, where models have been trained to predict score, given action representation features. However, human proficiency levels can widely vary and so do their scores. Providing all such performance variations and their respective scores is an expensive solution as it requires a domain expert to annotate many videos. The question arises - Can we exploit the variations of the performances from that of expert and map the variations to their respective scores? To this end, we introduce a novel sequence-to-sequence autoencoder-based scoring model which learns the representation from only expert performances and judges an unknown performance based on how well it can be regenerated from the learned model. We evaluated our model in predicting scores of a complex Sun- Salutation action sequence, and demonstrate that our model gives remarkable prediction accuracy compared to the baselines.

...read moreread less

5 citations

1
2
3
…
4
5
6
7
8
9
10
…
11
12
13
14
15

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

Document Analysis and Recognition

[...]

Takahiro Watanabe

25 Mar 1999-IEICE Transactions on Information and Systems

TL;DR: This paper addresses current topics about document image understanding from a technical point of view as a survey and proposes methods/approaches for recognition of various kinds of documents.

...read moreread less

Abstract: The subject about document image understanding is to extract and classify individual data meaningfully from paper-based documents. Until today, many methods/approaches have been proposed with regard to recognition of various kinds of documents, various technical problems for extensions of OCR, and requirements for practical usages. Of course, though the technical research issues in the early stage are looked upon as complementary attacks for the traditional OCR which is dependent on character recognition techniques, the application ranges or related issues are widely investigated or should be established progressively. This paper addresses current topics about document image understanding from a technical point of view as a survey. key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

...read moreread less

222 citations

Journal Article•DOI•

ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP)

[...]

Newton Lee

01 Apr 2007

TL;DR: Call for papers for Special Issue of ACM Transactions on Multimedia Computing, Communications and Applications on Interactive Digital Television.

...read moreread less

Abstract: Call for papers for Special Issue of ACM Transactions on Multimedia Computing, Communications and Applications on Interactive Digital Television

...read moreread less

201 citations

Journal Article•DOI•

Object Level Grouping for Video Shots

[...]

Josef Sivic¹, Frederik Schaffalitzky¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

10 Apr 2006-International Journal of Computer Vision

TL;DR: A method for automatically obtaining object representations suitable for retrieval from generic video shots that includes associating regions within a single shot to represent a deforming object and an affine factorization method that copes with motion degeneracy.

...read moreread less

Abstract: We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions. In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).

...read moreread less

162 citations

Journal Article•DOI•

Offline Recognition of Devanagari Script: A Survey

[...]

R. Jayadevan¹, Satish R. Kolhe, Pradeep M. Patil², Umapada Pal•Institutions (2)

Pune Institute of Computer Technology¹, Vishwakarma Institute of Technology²

01 Nov 2011

TL;DR: In this paper, the state of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in various sections of the paper.

...read moreread less

Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. State of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and it is also tried to highlight the beneficial directions of the research till date. Moreover, the paper also contains a comprehensive bibliography of many selected papers appeared in reputed journals and conference proceedings as an aid for the researchers working in the field of Devanagari OCR.

...read moreread less

159 citations

Proceedings Article•DOI•

Table Detection Using Deep Learning

[...]

Azka Gilani, Shah Rukh Qasim¹, Imran Malik¹, Faisal Shafait¹•Institutions (1)

University of the Sciences¹

01 Nov 2017

TL;DR: The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines and beats Tesseract's state of the art table detection system by a significant margin.

...read moreread less

Abstract: Table detection is a crucial step in many document analysis applications as tables are used for presenting essential information to the reader in a structured manner. It is a hard problem due to varying layouts and encodings of the tables. Researchers have proposed numerous techniques for table detection based on layout analysis of documents. Most of these techniques fail to generalize because they rely on hand engineered features which are not robust to layout variations. In this paper, we have presented a deep learning based method for table detection. In the proposed method, document images are first pre-processed. These images are then fed to a Region Proposal Network followed by a fully connected neural network for table detection. The proposed method works with high precision on document images with varying layouts that include documents, research papers, and magazines. We have done our evaluations on publicly available UNLV dataset where it beats Tesseract's state of the art table detection system by a significant margin.

...read moreread less

159 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110

Collapse