
Book Chapter · DOI

Modeling sense disambiguation of human pose: recognizing action at a distance by key poses

08 Nov 2010 · pp. 244-255

TL;DR: A methodology for recognizing actions at a distance by observing human poses and deriving descriptors that capture the motion patterns of the poses; results on four standard activity recognition datasets show the efficacy of the approach compared to the present state of the art.

Abstract: We propose a methodology for recognizing actions at a distance by watching the human poses and deriving descriptors that capture the motion patterns of the poses. Human poses often carry a strong visual sense (intended meaning) which describes the related action unambiguously. Identifying the intended meaning of poses is nevertheless a challenging task because of their variability, and such variations in poses lead to visual sense ambiguity. From a large vocabulary of poses (visual words) we prune out ambiguous poses and extract key poses (or key words) using a centrality measure of graph connectivity [1]. Under this framework, finding the key poses for a given sense (i.e., action type) amounts to constructing a graph with poses as vertices and then identifying the most "important" vertices in the graph (following centrality theory). The results on four standard activity recognition datasets show the efficacy of our approach when compared to the present state of the art.
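The abstract's graph-centrality idea can be illustrated with a minimal sketch. The Gaussian edge-weight function, the use of plain eigenvector centrality, and the descriptor shape are assumptions made here for illustration only; the authors' actual descriptor and centrality measure may differ [1].

```python
import numpy as np

def select_key_poses(descriptors, top_k=10, sigma=1.0, iters=200):
    """Rank pose descriptors of one action type by graph centrality (sketch)."""
    X = np.asarray(descriptors, dtype=float)
    # Pairwise squared Euclidean distances between pose descriptors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel as an assumed edge-weight (similarity) function; no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Eigenvector centrality via power iteration on the weighted adjacency matrix.
    c = np.full(len(X), 1.0 / len(X))
    for _ in range(iters):
        c = W @ c
        c /= np.linalg.norm(c) + 1e-12
    # The most central (least ambiguous) poses are kept as key poses.
    return np.argsort(-c)[:top_k]

# Toy usage with 60 random 32-D pose descriptors.
rng = np.random.default_rng(0)
print(select_key_poses(rng.normal(size=(60, 32)), top_k=5))
```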



Citations

Journal Article · DOI
TL;DR: A novel approach to key pose selection that models the descriptor space with a manifold learning technique, recovering the geometric structure of the descriptors on a lower-dimensional manifold, and develops a PageRank-based centrality measure to select key poses.
Abstract: In action recognition, bag-of-visual-words approaches have been shown to be successful, and for them the quality of the codebook is critical. In a large vocabulary of poses (visual words), some key poses play a more decisive role than others in the codebook. This paper proposes a novel approach for key pose selection, which models the descriptor space using a manifold learning technique to recover the geometric structure of the descriptors on a lower-dimensional manifold. A PageRank-based centrality measure is developed to select key poses according to the recovered geometric structure. In each step, a key pose is selected from the manifold and the remaining model is modified to maximize the discriminative power of the selected codebook. With the obtained codebook, each action can be represented by a histogram of the key poses. To resolve the ambiguity between some action classes, a pairwise subdivision is executed to select discriminative codebooks for further recognition. Experiments on benchmark datasets show that our method obtains better performance compared with other state-of-the-art methods.
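A rough sketch of the codebook-and-histogram step described above: once key poses are chosen, each frame descriptor of a video is assigned to its nearest key pose and the video is represented by a normalized histogram of those assignments. The function names and the nearest-neighbour assignment rule are illustrative assumptions, not the cited paper's exact pipeline.

```python
import numpy as np

def bag_of_key_poses(frame_descriptors, key_pose_codebook):
    """Encode a video as a normalized histogram over a key-pose codebook (sketch).

    frame_descriptors: (n_frames, d) descriptors, one per frame.
    key_pose_codebook: (n_key_poses, d) selected key-pose descriptors.
    """
    F = np.asarray(frame_descriptors, dtype=float)
    C = np.asarray(key_pose_codebook, dtype=float)
    # Hard-assign every frame to its nearest key pose (Euclidean distance).
    d2 = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)   # (n_frames, n_key_poses)
    assignments = d2.argmin(axis=1)
    # Histogram of key-pose occurrences, normalized to sum to 1.
    hist = np.bincount(assignments, minlength=len(C)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage: 120 frames, codebook of 8 key poses, 16-D descriptors.
rng = np.random.default_rng(1)
print(bag_of_key_poses(rng.normal(size=(120, 16)), rng.normal(size=(8, 16))))
```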

48 citations


Cites background from "Modeling sense disambiguation of hu..."

  • ...Mukherjee et al. [27] selected several key poses for each action through analysis for action cycles....


Journal Article · DOI
TL;DR: A graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses; a “meaningful” threshold on the centrality measure selects key poses for each action type.
Abstract: In this paper, we propose a graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses. The proposed methodology follows a bag-of-words approach that starts with a large vocabulary of poses (visual words) and derives a refined and compact codebook of key poses using a centrality measure of graph connectivity. We introduce a “meaningful” threshold on the centrality measure that selects key poses for each action type. Our contribution includes a novel pose descriptor based on histograms of oriented optical flow evaluated in a hierarchical fashion on a video frame. This pose descriptor combines both pose information and the motion pattern of the human performer into a multidimensional feature vector. We evaluate our methodology on four standard activity-recognition datasets, demonstrating the superiority of our method over the state-of-the-art.
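The hierarchical histogram-of-oriented-optical-flow descriptor described above can be sketched as follows. The pyramid levels, bin count, and magnitude weighting are assumptions; in practice the flow field would come from an optical-flow estimator rather than random numbers.

```python
import numpy as np

def hierarchical_hoof(flow, levels=(1, 2, 4), bins=8):
    """Hierarchical histogram of oriented optical flow for one frame (sketch).

    flow: (H, W, 2) array of per-pixel (dx, dy) optical-flow vectors.
    Returns the concatenation of magnitude-weighted orientation histograms
    computed on a 1x1, 2x2 and 4x4 spatial grid (by default).
    """
    dx, dy = flow[..., 0], flow[..., 1]
    ang = np.arctan2(dy, dx)                 # orientation in [-pi, pi]
    mag = np.hypot(dx, dy)                   # flow magnitude used as weight
    H, W = ang.shape
    feats = []
    for g in levels:                         # grid of g x g cells per level
        ys = np.linspace(0, H, g + 1, dtype=int)
        xs = np.linspace(0, W, g + 1, dtype=int)
        for i in range(g):
            for j in range(g):
                a = ang[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                m = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                h, _ = np.histogram(a, bins=bins, range=(-np.pi, np.pi), weights=m)
                feats.append(h / (h.sum() + 1e-12))
    return np.concatenate(feats)

# Toy usage: a random 120x160 flow field -> (1 + 4 + 16) * 8 = 168-D descriptor.
rng = np.random.default_rng(2)
print(hierarchical_hoof(rng.normal(size=(120, 160, 2))).shape)
```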

41 citations


Cites result from "Modeling sense disambiguation of hu..."

  • ...We will show in the result section that the theory of meaningfulness gives better result compared to the procedure of selecting q-best poses for each action type [18]....


  • ...These key poses can either be selected by choosing q-best (with q fixed) poses for each action type [18], or by introducing a suitable threshold on the graph centrality measure using the concept of meaningfulness [17]....


Proceedings Article · DOI
28 Nov 2011
TL;DR: A graph theoretic approach for recognizing interactions between two human performers in a video clip; the same centrality measure is applied to all possible combinations of the key poses of the two performers to select the set of 'key pose doublets' that best represents the corresponding action.
Abstract: In this paper, we propose a graph theoretic approach for recognizing interactions between two human performers present in a video clip. We primarily watch the human poses of each performer and derive descriptors that capture the motion patterns of the poses. From an initial dictionary of poses (visual words), we extract key poses (or key words) by ranking the poses on the centrality measure of graph connectivity. We argue that the key poses are graph nodes which share a close semantic relationship (in terms of some suitable edge weight function) with all other pose nodes and hence can be said to form the central part of the graph. We apply the same centrality measure on all possible combinations of the key poses of the two performers to select the set of 'key pose doublets' that best represents the corresponding action. The results on a standard interaction recognition dataset show the robustness of our approach when compared to the present state-of-the-art method.
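A hedged sketch of the 'key pose doublet' idea: pair every key pose of performer A with every key pose of performer B, describe each pair by concatenating the two pose descriptors, and rank the pairs with the same kind of graph centrality used for single poses. The concatenation and the Gaussian edge weights are illustrative assumptions, not the cited paper's exact formulation.

```python
import itertools
import numpy as np

def key_pose_doublets(key_poses_a, key_poses_b, top_k=5, sigma=1.0, iters=200):
    """Rank all (pose of A, pose of B) combinations by graph centrality (sketch)."""
    A = np.asarray(key_poses_a, dtype=float)
    B = np.asarray(key_poses_b, dtype=float)
    pairs = list(itertools.product(range(len(A)), range(len(B))))
    # One descriptor per doublet: concatenation of the two pose descriptors.
    D = np.stack([np.concatenate([A[i], B[j]]) for i, j in pairs])
    # Same recipe as for single poses: Gaussian-weighted graph + power iteration.
    d2 = ((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    c = np.full(len(D), 1.0 / len(D))
    for _ in range(iters):
        c = W @ c
        c /= np.linalg.norm(c) + 1e-12
    order = np.argsort(-c)[:top_k]
    return [pairs[k] for k in order]          # indices of the selected doublets

# Toy usage: 6 key poses per performer, 16-D descriptors.
rng = np.random.default_rng(3)
print(key_pose_doublets(rng.normal(size=(6, 16)), rng.normal(size=(6, 16))))
```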

21 citations


Cites background or methods from "Modeling sense disambiguation of hu..."

  • ...the interaction descriptors (as action descriptors in [9]) from the dictionary of ‘key pose doublets’ Ψ for recognition....


  • ...We make a two-fold contribution to enhance the approach of [9] for recognizing interaction between multiple human performers....


  • ...In [9], a new pose descriptor is proposed using a gradient weighted optical flow feature combining both global and local features....


  • ...The poses from Sj , j = 1, 2 are placed in a graph as nodes and the edge between each two poses stands for the dissimilarity in terms of a semantic relationship between them, measured using some form of weight function [9]....


  • ...We use multidimensional pose descriptor corresponding to each performer of each frame of an action video as suggested in [9]....


Journal Article · DOI
TL;DR: The Layered Elastic Motion Tracking (LEMT) method is adopted, a hybrid feature representation is presented to integrate both shape and motion features, and a Region-based Mixture Model (RMM) is proposed for action classification.
Abstract: State-of-the-art performance in human action recognition is achieved by the use of dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in low-resolution (LR) videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM encodes the spatial layout of features without any need for body-part segmentation. Experimental results show that the approach is effective and, more importantly, more general for LR recognition tasks.

11 citations


Cites methods from "Modeling sense disambiguation of hu..."

  • ...There were 4 university teams [20, 29, 35, 36] who participated in the AVAC Challenge and the UT-Tower dataset was used to evaluate each participant’s method....



References
Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,762 citations


"Modeling sense disambiguation of hu..." refers methods in this paper

  • ...An optimum (local) lower bound on the codebook size of S can be estimated by Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [16] or one can directly employ X-means algorithm [17] which is a divisive clustering technique where splitting decision depends on the local BIC score (i.e., does BIC increase or decrease upon splitting the parent cluster into child ones)....

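The excerpt above mentions estimating a lower bound on the codebook size with AIC/BIC or the X-means algorithm. A minimal sketch of the BIC route, using scikit-learn's GaussianMixture (an assumption for illustration, not the authors' X-means implementation), sweeps candidate codebook sizes and keeps the one with the lowest BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bic_codebook_size(descriptors, k_min=2, k_max=20, seed=0):
    """Pick a codebook size by minimizing BIC over candidate cluster counts."""
    X = np.asarray(descriptors, dtype=float)
    best_k, best_bic = k_min, np.inf
    for k in range(k_min, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=seed).fit(X)
        bic = gmm.bic(X)                      # lower BIC = better size/fit trade-off
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Toy usage: descriptors drawn from 5 well-separated clusters.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=8 * c, scale=1.0, size=(100, 4)) for c in range(5)])
print(bic_codebook_size(X))
```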


Book
25 Nov 1994
TL;DR: This book presents mathematical representations of social networks in the social and behavioral sciences, including dyadic and triadic interaction models, the relationships between actor and group measures, and the structure of networks.
Abstract: Part I. Introduction: Networks, Relations, and Structure: 1. Relations and networks in the social and behavioral sciences 2. Social network data: collection and application Part II. Mathematical Representations of Social Networks: 3. Notation 4. Graphs and matrixes Part III. Structural and Locational Properties: 5. Centrality, prestige, and related actor and group measures 6. Structural balance, clusterability, and transitivity 7. Cohesive subgroups 8. Affiliations, co-memberships, and overlapping subgroups Part IV. Roles and Positions: 9. Structural equivalence 10. Blockmodels 11. Relational algebras 12. Network positions and roles Part V. Dyadic and Triadic Methods: 13. Dyads 14. Triads Part VI. Statistical Dyadic Interaction Models: 15. Statistical analysis of single relational networks 16. Stochastic blockmodels and goodness-of-fit indices Part VII. Epilogue: 17. Future directions.

17,092 citations

Journal Article · DOI
TL;DR: A brief Technometrics review of Bishop's Pattern Recognition and Machine Learning.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

16,640 citations


"Modeling sense disambiguation of hu..." refers methods in this paper

  • ...An optimum (local) lower bound on the codebook size of S can be estimated by Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) [16] or one can directly employ X-means algorithm [17] which is a divisive clustering technique where splitting decision depends on the local BIC score (i.e., does BIC increase or decrease upon splitting the parent cluster into child ones)....



Journal Article · DOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext, and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,045 citations


"Modeling sense disambiguation of hu..." refers background or methods in this paper

  • ...Google uses centrality measures to rank webpages [13] and recently this ranking technique has spelled success in feature ranking for object recognition [14] and video-action recognition [8] tasks....


  • ...[13]) and it has been recently used with success in feature ranking for object recognition [14] and feature mining task [8]....

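The excerpts above note that the ranking technique behind Google's webpage ranking [13] is the centrality measure reused for ranking poses and features. A minimal damped PageRank power iteration over a weighted adjacency matrix, purely for illustration and not tied to any particular implementation in the cited papers:

```python
import numpy as np

def pagerank(W, damping=0.85, iters=100, tol=1e-10):
    """Damped PageRank scores for a weighted adjacency matrix W (n x n)."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    # Column-stochastic transition matrix; dangling nodes get a uniform column.
    col_sums = W.sum(axis=0)
    P = np.where(col_sums > 0, W / np.maximum(col_sums, 1e-12), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = damping * (P @ r) + (1.0 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy usage: a small 4-node weighted (undirected) graph.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
print(pagerank(W))
```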