Showing papers by "Gaurav Harit" published in 2018


Proceedings ArticleDOI
01 Oct 2018
TL;DR: This work presents a novel community detection-based human action segmentation algorithm that marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks.
Abstract: Temporal segmentation of complex human action videos into action primitives plays a pivotal role in building models for human action understanding. Past studies have introduced unsupervised frameworks for deriving a known number of motion primitives from action videos. Our work focuses on the following question: given a set of videos of humans performing an activity, can the action primitives be derived without any prior knowledge of the number of constituent sub-action categories? To this end, we present a novel community detection-based human action segmentation algorithm. Our work marks the existence of community structures in human action videos, where the consecutive frames around the key poses group together to form communities similar to those in social networks. We test the proposed technique on the stitched Weizmann dataset and the MHADI01-s motion capture dataset, where it outperforms state-of-the-art techniques for complex action segmentation without the number of actions being pre-specified.

5 citations


Book ChapterDOI
18 Dec 2018
TL;DR: A new segmentation-free approach is proposed which matches convex shape portions of symbols occurring in various layouts, such as subscript, superscript, and fraction, and is able to spot symbols present in a handwritten expression.
Abstract: Recognition of touching characters in mathematical expressions is a challenging problem in the field of document image analysis. Various approaches for recognizing touching maths symbols have been reported in the literature, but they mainly deal with printed expressions and handwritten numeral strings. In this work, a new segmentation-free approach is proposed which matches convex shape portions of symbols occurring in various layouts, such as subscript, superscript, and fraction, and is able to spot symbols present in a handwritten expression. Our contribution lies in the design of a novel feature which can handle touching symbols effectively in the presence of handwriting variations. This recognition-based approach helps in spotting symbols in an expression even amid the clutter created by other symbols.

3 citations


Book ChapterDOI
TL;DR: This paper proposes a new way of representing layout, called attributed paths, which admits a string edit distance based match measure, and shows that layout-based retrieval using attributed paths is computationally efficient and more effective.
Abstract: A document is rich in its layout. The entities of interest can be scattered over the document page. Traditional layout matching has involved modeling layout structure as grids, graphs, and spatial histograms of patches. In this paper we propose a new way of representing layout, which we call attributed paths. This representation admits a string edit distance based match measure. Our experiments show that layout based retrieval using attributed paths is computationally efficient and more effective. It also offers flexibility in tuning the match criterion. We have demonstrated effectiveness of attributed paths in performing layout based retrieval tasks on datasets of floor plan images [14] and journal pages [1].

2 citations
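
The string edit distance match mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Node` attribute set (a block label plus a relative size), the `substitution_cost` function, and the unit insert/delete cost are all assumptions chosen for illustration. The sketch only shows how a weighted edit distance lets the match criterion be tuned through its cost functions.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical path element: (block label, relative size of the block).
Node = Tuple[str, float]


def substitution_cost(a: Node, b: Node) -> float:
    """Cost of matching two path elements (assumed form, tunable)."""
    if a[0] != b[0]:
        return 1.0  # different block types: full cost
    return min(1.0, abs(a[1] - b[1]))  # same type: penalise attribute difference


def path_edit_distance(p: Sequence[Node], q: Sequence[Node],
                       sub: Callable[[Node, Node], float] = substitution_cost,
                       indel: float = 1.0) -> float:
    """Weighted string edit distance between two attributed paths."""
    # prev[j] holds the distance between the processed prefix of p and q[:j].
    prev = [j * indel for j in range(len(q) + 1)]
    for i in range(1, len(p) + 1):
        curr = [i * indel] + [0.0] * len(q)
        for j in range(1, len(q) + 1):
            curr[j] = min(prev[j] + indel,       # delete p[i-1]
                          curr[j - 1] + indel,   # insert q[j-1]
                          prev[j - 1] + sub(p[i - 1], q[j - 1]))  # match/substitute
        prev = curr
    return prev[-1]


# Two made-up page layouts expressed as attributed paths.
page_a = [("title", 0.1), ("text", 0.5), ("figure", 0.3)]
page_b = [("title", 0.1), ("text", 0.25), ("figure", 0.3), ("text", 0.2)]
distance = path_edit_distance(page_a, page_b)
```

Swapping in a different `sub` function or `indel` cost changes what counts as a similar layout without touching the dynamic program itself.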


Proceedings ArticleDOI
18 Dec 2018
TL;DR: This work proposes an approximation of Gaussian Process and applies it to Classification and Regression tasks using a greedy approach to subset selection and the inducing input choice to approximate the kernel matrix, resulting in faster retrieval timings.
Abstract: In this work we propose an approximation of Gaussian Processes (GP) and apply it to classification and regression tasks. We primarily target the problem of visual object categorization using a greedy variant of Gaussian Processes. To deal with the prohibitive training and inference cost of GP, we devise a greedy approach to subset selection and the choice of inducing inputs to approximate the kernel matrix, resulting in faster retrieval timings. A localized combination of kernel functions is designed and used in a framework of sparse approximations to Gaussian Processes for visual object categorization and generic regression tasks. Through exhaustive experimentation and empirical results we demonstrate the effectiveness of the proposed approach when compared with other kernel-based methods.

1 citation


Journal ArticleDOI
TL;DR: This paper introduces a query-specific DTW distance, which enables effective computation of global principal alignments for novel queries, and uses query expansion (QE) to further improve the performance of the recently proposed Direct Query Classifier (DQC).
Abstract: In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method, and in general such methods have been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, the classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) present in a language. The DQC requires indexing cut-portions (n-grams) of the word image, and the DTW distance has been used for indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query-specific DTW distance, which enables effective computation of global principal alignments for novel queries. Since the proposed query-specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, the proposed query-specific DTW distance uses both the class mean vectors and the query information for computing the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query-specific DTW. This also allows us to seamlessly adapt our solution to new fonts, styles and collections. We have demonstrated the utility of the proposed technique on 3 different datasets. The proposed query-specific DTW performs well compared to the previous DTW approximations.

1 citation
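
To make the cost argument in the abstract above concrete, here is a minimal sketch of the classic DTW distance that the query-specific variant approximates. It is the textbook dynamic program, not the paper's query-specific method, and it assumes 1-D feature sequences with an absolute-difference local cost for brevity; word image features would normally be vector sequences. The nested loop over both sequences is the quadratic cost that makes plain DTW slow for indexing.

```python
from math import inf
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic DTW between two 1-D sequences, O(len(x) * len(y)) time."""
    n, m = len(x), len(y)
    # cost[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] repeats a match with y[j-1]
                                 cost[i][j - 1],      # y[j-1] repeats a match with x[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# A stretched copy of a sequence aligns at zero cost; a shifted one does not.
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
shifted = dtw_distance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```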


Proceedings ArticleDOI
18 Dec 2018
TL;DR: This work introduces a novel Community Detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representation and proposes a technique to learn the temporal order of these key poses from these imperfect videos.
Abstract: Just as good representation and theory are needed to explain human actions, so are good action videos needed for learning segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting actions performed by human experts are required. A lack of experts in a domain leads to a reduced number of videos and hence to poor learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel community detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representation. Human actions are composed of distinguishable key poses which form dense communities in graph structures. Anomalous poses held for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from these imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is seen to improve the segmentation performance of complex human actions with the help of some imperfect performances. The efficacy of our approach has been illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, that have been developed using random executions from amateur performers.

Journal ArticleDOI
TL;DR: This paper designs and implements a human–machine interaction application, which enables a visually challenged person to locate and manipulate personal objects in her/his neighborhood, and develops a moment-based human servoing algorithm which is able to generate commands that help the visually impaired human to localize his hand with respect to the object of interest.
Abstract: In this paper, we design and implement a human–machine interaction application, which enables a visually challenged person to locate and manipulate personal objects in her/his neighborhood. In this setting, we need to develop a tool (embedded in a mobile phone) which is capable of sensing, computing, and guiding the human arm toward the object. This involves solving the following two subproblems: (1) recognition of objects in the input images, and (2) generating control signals, to guide the human for navigation, to reach the desired destination. For the former subproblem, we adapt the bag-of-words framework for recognition and matching on mobile phones. For the latter subproblem, we have developed a moment-based human servoing algorithm which is able to generate commands that help the visually impaired human to localize his hand with respect to the object of interest. All necessary computations take place on the mobile phone. The proposed object recognition and vision-based control design are deployed on a low-/mid-end mobile phone. This can lead to a wide range of applications. With our proposed design and implementation, we demonstrate that our application is effective and accurate, with a high reliability of convergence for different experimental settings.
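
As a rough illustration of how image moments can drive guidance commands of the kind described above, the sketch below computes region centroids from raw moments and maps the hand-to-object offset to a direction. It is a deliberately simplified assumption, not the paper's servoing algorithm: the binary masks, the `guidance` function, the tolerance `tol`, and the four-word command vocabulary are all hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary image: 1 marks pixels of the region of interest


def centroid(mask: Mask) -> Tuple[float, float]:
    """Centroid (x, y) computed from the raw image moments m00, m10, m01."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            m00 += v        # area
            m10 += x * v    # first-order moment along x
            m01 += y * v    # first-order moment along y
    if m00 == 0:
        raise ValueError("empty mask")
    return m10 / m00, m01 / m00


def guidance(hand: Mask, target: Mask, tol: float = 0.5) -> str:
    """Map the centroid offset to a command (hypothetical vocabulary)."""
    hx, hy = centroid(hand)
    tx, ty = centroid(target)
    dx, dy = tx - hx, ty - hy
    if abs(dx) <= tol and abs(dy) <= tol:
        return "stop"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"  # image y grows downward


# Made-up masks: the hand is at the top-left, the object toward the right.
hand_mask = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0]]
object_mask = [[0, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 1]]
command = guidance(hand_mask, object_mask)
```

Issuing one command per frame and recomputing the centroids after each movement gives a simple closed loop; a real system would also need the recognition step to produce the masks.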