Book Chapter

Modeling sense disambiguation of human pose: recognizing action at a distance by key poses

TL;DR: A methodology for recognizing actions at a distance by watching human poses and deriving descriptors that capture their motion patterns; results on four standard activity recognition datasets show its efficacy compared with the present state of the art.
Abstract: We propose a methodology for recognizing actions at a distance by watching human poses and deriving descriptors that capture their motion patterns. Human poses often carry a strong visual sense (intended meaning) that describes the related action unambiguously. Identifying the intended meaning of a pose is nevertheless challenging, because poses vary widely and this variability leads to visual sense ambiguity. From a large vocabulary of poses (visual words), we prune out ambiguous poses and extract key poses (or key words) using a centrality measure of graph connectivity [1]. Under this framework, finding the key poses for a given sense (i.e., action type) amounts to constructing a graph with poses as vertices and then identifying the most "important" vertices in the graph (following centrality theory). Results on four standard activity recognition datasets show the efficacy of our approach compared with the present state of the art.
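
The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean edge weight, the `radius` used to decide which poses are connected, the choice of eccentricity as the centrality measure, and the function names are all assumptions made for illustration. Poses are vertices, shortest-path distances are computed with Floyd-Warshall, and the vertices with the smallest eccentricity are returned as candidate key poses.

```python
import math


def eccentricity_ranking(descriptors, radius):
    """Rank pose vertices from most to least central (smallest eccentricity first).

    descriptors: list of equal-length numeric tuples (hypothetical pose descriptors).
    radius: hypothetical threshold; two poses share an edge if their
            Euclidean distance is at most this value.
    """
    n = len(descriptors)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(i + 1, n):
            d = math.dist(descriptors[i], descriptors[j])
            if d <= radius:
                dist[i][j] = dist[j][i] = d

    # Floyd-Warshall: all-pairs shortest path lengths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k

    # Eccentricity of a vertex: distance to the farthest vertex.
    ecc = [max(row) for row in dist]
    order = sorted(range(n), key=lambda v: ecc[v])
    return order, ecc


def key_poses(descriptors, radius, q):
    """Return the indices of the q most central poses (q is an assumed parameter)."""
    order, _ = eccentricity_ranking(descriptors, radius)
    return order[:q]
```

For five one-dimensional descriptors spaced one unit apart and `radius=1.0`, the graph is a path and the middle pose is the most central one. In practice one graph would be built per action type, so that each action receives its own set of key poses.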
Citations
Christopher M. Bishop
01 Jan 2006
TL;DR: Covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, sampling methods, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: A novel approach for key poses selection is proposed, which models the descriptor space utilizing a manifold learning technique to recover the geometric structure of the descriptors on a lower dimensional manifold and develops a PageRank-based centrality measure.
Abstract: In action recognition, bag of visual words based approaches have been shown to be successful, for which the quality of codebook is critical. In a large vocabulary of poses (visual words), some key poses play a more decisive role than others in the codebook. This paper proposes a novel approach for key poses selection, which models the descriptor space utilizing a manifold learning technique to recover the geometric structure of the descriptors on a lower dimensional manifold. A PageRank-based centrality measure is developed to select key poses according to the recovered geometric structure. In each step, a key pose is selected from the manifold and the remaining model is modified to maximize the discriminative power of selected codebook. With the obtained codebook, each action can be represented with a histogram of the key poses. To solve the ambiguity between some action classes, a pairwise subdivision is executed to select discriminative codebooks for further recognition. Experiments on benchmark datasets showed that our method is able to obtain better performance compared with other state-of-the-art methods.
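
As a rough illustration of what a PageRank-style centrality over a pose graph looks like, the sketch below runs the textbook PageRank power iteration on a weighted adjacency matrix. It is not the modified measure developed in the paper above: the damping factor of 0.85, the fixed iteration count, and the function name are assumptions, and the manifold learning and codebook update steps are omitted.

```python
def pagerank(weights, damping=0.85, iters=100):
    """Textbook PageRank by power iteration.

    weights: square list of lists; weights[i][j] >= 0 is the (assumed)
             similarity on the edge from pose i to pose j.
    Returns a list of scores that sum to 1.
    """
    n = len(weights)
    out_strength = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_strength[i] == 0:
                # Vertex with no outgoing edges: spread its score uniformly.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    if weights[i][j]:
                        new_rank[j] += damping * rank[i] * weights[i][j] / out_strength[i]
        rank = new_rank
    return rank
```

On a star-shaped graph the hub receives the highest score, which matches the intuition that a key pose is one strongly connected to many other poses.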

52 citations


Cites background from "Modeling sense disambiguation of hu..."

  • ...Mukherjee et al. [27] selected several key poses for each action through analysis for action cycles....

    [...]

Journal Article
TL;DR: A graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses and introduces a “meaningful” threshold on centrality measure that selects key poses for each action type.
Abstract: In this paper, we propose a graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses. The proposed methodology follows a bag-of-word approach that starts with a large vocabulary of poses (visual words) and derives a refined and compact codebook of key poses using centrality measure of graph connectivity. We introduce a “meaningful” threshold on centrality measure that selects key poses for each action type. Our contribution includes a novel pose descriptor based on histogram of oriented optical flow evaluated in a hierarchical fashion on a video frame. This pose descriptor combines both pose information and motion pattern of the human performer into a multidimensional feature vector. We evaluate our methodology on four standard activity-recognition datasets demonstrating the superiority of our method over the state-of-the-art.
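
A single-level histogram of oriented optical flow can be sketched as follows. This only illustrates the general idea of binning flow vectors by direction; the bin count, the magnitude weighting, the L1 normalization, and the function name are assumptions, and the hierarchical evaluation over the frame described in the abstract is not reproduced here.

```python
import math


def hoof(flow_vectors, bins=8):
    """Histogram of oriented optical flow for one image region.

    flow_vectors: iterable of (u, v) flow components (assumed to be
                  precomputed by some optical flow method).
    Each vector votes for the bin of its direction, weighted by its magnitude.
    Returns an L1-normalized histogram with `bins` entries.
    """
    hist = [0.0] * bins
    bin_width = 2.0 * math.pi / bins
    for u, v in flow_vectors:
        magnitude = math.hypot(u, v)
        if magnitude == 0.0:
            continue  # zero flow has no direction
        angle = math.atan2(v, u) % (2.0 * math.pi)
        # Clamp in case rounding pushes the angle to exactly 2*pi.
        index = min(int(angle / bin_width), bins - 1)
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

A hierarchical descriptor could then be formed by computing this histogram on the whole frame and on progressively smaller sub-regions and concatenating the results, though the exact scheme used in the paper is not stated in this listing.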

43 citations


Cites result from "Modeling sense disambiguation of hu..."

  • ...We will show in the result section that the theory of meaningfulness gives better result compared to the procedure of selecting q-best poses for each action type [18]....

    [...]

  • ...These key poses can either be selected by choosing q-best (with q fixed) poses for each action type [18], or by introducing a suitable threshold on the graph centrality measure using the concept of meaningfulness [17]....

    [...]

Proceedings Article
28 Nov 2011
TL;DR: A graph theoretic approach for recognizing interactions between two human performers present in a video clip and applies the same centrality measure on all possible combinations of the key poses of the two performers to select the set of 'key pose doublets' that best represent the corresponding action.
Abstract: In this paper, we propose a graph theoretic approach for recognizing interactions between two human performers present in a video clip. We watch primarily the human poses of each performer and derive descriptors that capture the motion patterns of the poses. From an initial dictionary of poses (visual words), we extract key poses (or key words) by ranking the poses on the centrality measure of graph connectivity. We argue that the key poses are graph nodes which share a close semantic relationship (in terms of some suitable edge weight function) with all other pose nodes and hence are said to be the central part of the graph. We apply the same centrality measure on all possible combinations of the key poses of the two performers to select the set of 'key pose doublets' that best represent the corresponding action. The results on standard interaction recognition dataset show the robustness of our approach when compared to the present state of the art method.

22 citations


Cites background or methods from "Modeling sense disambiguation of hu..."

  • ...the interaction descriptors (as action descriptors in [9]) from the dictionary of ‘key pose doublets’ Ψ for recognition....

    [...]

  • ...We make a two-fold contribution to enhance the approach of [9] for recognizing interaction between multiple human performers....

    [...]

  • ...In [9], a new pose descriptor is proposed using a gradient weighted optical flow feature combining both global and local features....

    [...]

  • ...The poses from Sj , j = 1, 2 are placed in a graph as nodes and the edge between each two poses stands for the dissimilarity in terms of a semantic relationship between them, measured using some form of weight function [9]....

    [...]

  • ...We use multidimensional pose descriptor corresponding to each performer of each frame of an action video as suggested in [9]....

    [...]

Journal Article
TL;DR: The Layered Elastic Motion Tracking (LEMT) method is adopted, a hybrid feature representation is presented to integrate both of the shape and motion features, and a Region-based Mixture Model (RMM) is proposed to be utilized for action classification.

15 citations


Cites methods from "Modeling sense disambiguation of hu..."

  • ...There were 4 university teams [20, 29, 35, 36] who participated in the AVAC Challenge and the UT-Tower dataset was used to evaluate each participant’s method....

    [...]

References
Journal Article
TL;DR: Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

13,327 citations

Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
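
The gradient-based iteration mentioned in this abstract can be illustrated in one dimension. The sketch below is a simplified illustration, not the paper's full method: it estimates a single translation `h` such that `g(x)` is approximately `f(x + h)`, uses a central-difference gradient, and weights all samples equally. The function name, the finite-difference step `eps`, and the iteration count are assumptions.

```python
import math


def estimate_shift(f, g, xs, h0=0.0, iters=20, eps=1e-5):
    """Estimate h such that g(x) ~= f(x + h) over the sample points xs.

    Each iteration linearizes f around the current estimate and applies
    the least-squares (Newton-Raphson style) update
        h <- h + sum(f'(x+h) * (g(x) - f(x+h))) / sum(f'(x+h)^2).
    """
    h = h0
    for _ in range(iters):
        numerator = 0.0
        denominator = 0.0
        for x in xs:
            # Central-difference approximation of the intensity gradient.
            grad = (f(x + h + eps) - f(x + h - eps)) / (2.0 * eps)
            numerator += grad * (g(x) - f(x + h))
            denominator += grad * grad
        if denominator == 0.0:
            break  # flat signal: the gradient carries no information
        h += numerator / denominator
    return h
```

The estimate is driven by the image gradient rather than by an exhaustive search over candidate shifts, which is why far fewer potential matches need to be examined. As with any such linearization, the starting guess is assumed to be reasonably close to the true shift.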

12,944 citations

Journal Article
TL;DR: This work characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links that connect them.
Abstract: Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network ...

12,634 citations


"Modeling sense disambiguation of hu..." refers background or methods in this paper

  • ..., action type) we rank the poses in order of “importance” using centrality measure of graph connectivity [12]....

    [...]

  • ...An equivalent problem exists in social network analysis [1, 12] (viz....

    [...]

  • ...To make e explicit we adopt eccentricity as a measure of graph connectivity [12]....

    [...]

Book
01 Aug 2006

8,923 citations