Modeling sense disambiguation of human pose: recognizing action at a distance by key poses
TL;DR: The paper proposes a methodology for recognizing actions at a distance by observing human poses and deriving descriptors that capture their motion patterns, and shows the efficacy of this approach against the present state of the art.
Abstract: We propose a methodology for recognizing actions at a distance by watching the human poses and deriving descriptors that capture the motion patterns of the poses. Human poses often carry a strong visual sense (intended meaning) which describes the related action unambiguously. Identifying the intended meaning of poses is nevertheless challenging, because poses vary widely and these variations lead to visual sense ambiguity. From a large vocabulary of poses (visual words) we prune out ambiguous poses and extract key poses (or key words) using a centrality measure of graph connectivity [1]. Under this framework, finding the key poses for a given sense (i.e., action type) amounts to constructing a graph with poses as vertices and then identifying the most "important" vertices in the graph (following centrality theory). The results on four standard activity recognition datasets show the efficacy of our approach when compared to the present state of the art.
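The graph-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the edge weights or the centrality measure, so the Gaussian similarity kernel, the eigenvector centrality, and the names `eigenvector_centrality`, `key_poses`, `sigma`, and `q` below are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def eigenvector_centrality(weights, iters=200, tol=1e-10):
    """Power iteration on a symmetric, non-negative weight matrix.

    Iterating with (W + I) instead of W keeps the same eigenvectors
    but avoids oscillation on bipartite-like graphs.
    """
    n = weights.shape[0]
    score = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iters):
        nxt = weights @ score + score
        nxt /= np.linalg.norm(nxt)
        if np.linalg.norm(nxt - score) < tol:
            return nxt
        score = nxt
    return score

def key_poses(descriptors, q, sigma=1.0):
    """Return indices of the q most central poses for one action type.

    descriptors: (n_poses, dim) array of pose descriptors (hypothetical).
    Edge weight = Gaussian similarity between descriptors (an assumption).
    """
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    centrality = eigenvector_centrality(weights)
    return np.argsort(centrality)[::-1][:q]

# Four mutually similar poses and one isolated pose: the isolated
# pose is weakly connected and therefore ranks last.
poses = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
selected = key_poses(poses, q=4)
```

In this toy input the isolated pose has almost no edge weight, so it receives the lowest centrality and is left out. That is the intuition behind pruning poses that are not representative of an action type.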
Citations
10,141 citations
48 citations
Cites background from "Modeling sense disambiguation of hu..."
...Mukherjee et al. [27] selected several key poses for each action through analysis for action cycles....
[...]
41 citations
Cites result from "Modeling sense disambiguation of hu..."
...We will show in the result section that the theory of meaningfulness gives better result compared to the procedure of selecting q-best poses for each action type [18]....
[...]
...These key poses can either be selected by choosing q-best (with q fixed) poses for each action type [18], or by introducing a suitable threshold on the graph centrality measure using the concept of meaningfulness [17]....
[...]
21 citations
Cites background or methods from "Modeling sense disambiguation of hu..."
...the interaction descriptors (as action descriptors in [9]) from the dictionary of ‘key pose doublets’ Ψ for recognition....
[...]
...We make a two-fold contribution to enhance the approach of [9] for recognizing interaction between multiple human performers....
[...]
...In [9], a new pose descriptor is proposed using a gradient weighted optical flow feature combining both global and local features....
[...]
...The poses from Sj , j = 1, 2 are placed in a graph as nodes and the edge between each two poses stands for the dissimilarity in terms of a semantic relationship between them, measured using some form of weight function [9]....
[...]
...We use multidimensional pose descriptor corresponding to each performer of each frame of an action video as suggested in [9]....
[...]
11 citations
Cites methods from "Modeling sense disambiguation of hu..."
...There were 4 university teams [20, 29, 35, 36] who participated in the AVAC Challenge and the UT-Tower dataset was used to evaluate each participant’s method....
[...]
References
22,762 citations
"Modeling sense disambiguation of hu..." refers methods in this paper
...An optimum (local) lower bound on the codebook size of S can be estimated by Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [16] or one can directly employ X-means algorithm [17] which is a divisive clustering technique where splitting decision depends on the local BIC score (i.e., does BIC increase or decrease upon splitting the parent cluster into child ones)....
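As a rough illustration of BIC-based codebook sizing, the sketch below scans candidate codebook sizes with plain k-means and keeps the size with the highest BIC. It is not X-means, which splits clusters recursively. It assumes a spherical Gaussian model with one shared variance and hard assignments, and it uses the convention in which a higher BIC is better. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def farthest_first_init(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from all centers."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    return np.array(centers, dtype=float)

def assign(points, centers):
    """Index of the nearest center for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations; an empty cluster keeps its previous center."""
    centers = farthest_first_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

def bic_score(points, centers, labels):
    """BIC = log-likelihood - (free parameters / 2) * log(n_points)."""
    r, d = points.shape
    k = centers.shape[0]
    sse = ((points - centers[labels]) ** 2).sum()
    var = max(sse / (d * (r - k)), 1e-12)  # shared per-dimension variance
    counts = np.bincount(labels, minlength=k)
    nz = counts[counts > 0]
    loglik = ((nz * np.log(nz / r)).sum()
              - 0.5 * r * d * np.log(2.0 * np.pi * var)
              - sse / (2.0 * var))
    n_params = (k - 1) + k * d + 1  # mixing weights, means, variance
    return loglik - 0.5 * n_params * np.log(r)

def pick_codebook_size(points, candidates):
    """Return the candidate size with the highest BIC, plus all scores."""
    scores = {}
    for k in candidates:
        centers, labels = kmeans(points, k)
        scores[k] = bic_score(points, centers, labels)
    return max(scores, key=scores.get), scores

# Synthetic stand-in for pose descriptors: three well-separated blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                   for c in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]])
best_k, all_scores = pick_codebook_size(blobs, range(1, 7))
```

A flat scan like this refits the whole data set for every candidate size. X-means instead decides each split locally, which is why the excerpt describes the BIC comparison in terms of a parent cluster and its children.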
[...]
17,092 citations
16,640 citations
"Modeling sense disambiguation of hu..." refers methods in this paper
...An optimum (local) lower bound on the codebook size of S can be estimated by Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [16] or one can directly employ X-means algorithm [17] which is a divisive clustering technique where splitting decision depends on the local BIC score (i.e., does BIC increase or decrease upon splitting the parent cluster into child ones)....
[...]
14,045 citations
"Modeling sense disambiguation of hu..." refers background or methods in this paper
...Google uses centrality measures to rank webpages [13] and recently this ranking technique has spelled success in feature ranking for object recognition [14] and video-action recognition [8] tasks....
[...]
...[13]) and it has been recently used with success in feature ranking for object recognition [14] and feature mining task [8]....
[...]