scispace - formally typeset
Journal ArticleDOI

Recognizing Human Action at a Distance in Video by Key Poses

TL;DR: A graph-theoretic technique for recognizing human actions at a distance in video by modeling the visual senses associated with poses; a "meaningful" threshold on a graph-centrality measure selects the key poses for each action type.
Abstract: In this paper, we propose a graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses. The proposed methodology follows a bag-of-words approach that starts with a large vocabulary of poses (visual words) and derives a refined and compact codebook of key poses using a centrality measure of graph connectivity. We introduce a "meaningful" threshold on the centrality measure that selects key poses for each action type. Our contribution includes a novel pose descriptor based on the histogram of oriented optical flow, evaluated in a hierarchical fashion on a video frame. This pose descriptor combines both the pose information and the motion pattern of the human performer into a multidimensional feature vector. We evaluate our methodology on four standard activity-recognition datasets, demonstrating the superiority of our method over the state-of-the-art.
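The hierarchical HOOF descriptor the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, pyramid depth, and bin count are assumptions, a precomputed dense optical-flow field is taken as input, and the gradient weighting of the flow used in the paper is omitted.

```python
import numpy as np

def hoof(flow, n_bins=8):
    """Histogram of oriented optical flow for one region.
    flow: (H, W, 2) array of per-pixel (dx, dy). Each flow vector votes
    into an orientation bin, weighted by its magnitude; the histogram is
    L1-normalized so the descriptor is invariant to region size."""
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)                      # in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def hierarchical_hoof(flow, levels=3, n_bins=8):
    """Concatenate HOOF over a spatial pyramid: level l splits the frame
    into 2**l x 2**l cells, so the spatial layout of motion is kept."""
    h, w = flow.shape[:2]
    feats = []
    for level in range(levels):
        cells = 2 ** level
        ys = np.linspace(0, h, cells + 1, dtype=int)
        xs = np.linspace(0, w, cells + 1, dtype=int)
        for i in range(cells):
            for j in range(cells):
                feats.append(hoof(flow[ys[i]:ys[i + 1], xs[j]:xs[j + 1]], n_bins))
    return np.concatenate(feats)

# Toy flow field: uniform rightward motion over a 32x32 frame.
flow = np.zeros((32, 32, 2))
flow[..., 0] = 1.0
desc = hierarchical_hoof(flow)   # (1 + 4 + 16) cells x 8 bins = 168 dims
```

Per-frame descriptors of this kind would then serve as the "visual words" that the codebook-refinement step operates on.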
Citations
Journal ArticleDOI
TL;DR: The thrust of this survey is on the utilization of depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data.
Abstract: A number of review or survey articles have previously appeared on human action recognition where either vision sensors or inertial sensors are used individually. Considering that each sensor modality has its own limitations, a number of previously published papers have shown that the fusion of vision and inertial sensor data improves the accuracy of recognition. This survey article provides an overview of the recent investigations where vision and inertial sensors are used simultaneously to perform human action recognition more effectively. The thrust of this survey is on the utilization of depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data. An overview of the components necessary to achieve fusion of data from depth and inertial sensors is provided. In addition, a review of the publicly available datasets that include simultaneously captured depth and inertial data is presented.
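As a toy illustration of the feature-level fusion this survey covers, one modality-agnostic scheme is to standardize each modality's feature matrix independently and concatenate the per-sample vectors before a single classifier. The function names and dimensions below are hypothetical, and this is only one of the fusion schemes surveyed (decision-level fusion is another).

```python
import numpy as np

def zscore(x, eps=1e-8):
    """Standardize each feature dimension so no modality dominates
    purely because of its numeric scale."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse_features(depth_feats, inertial_feats):
    """Feature-level fusion: standardize each modality independently,
    then concatenate the per-sample vectors for a single classifier."""
    return np.hstack([zscore(depth_feats), zscore(inertial_feats)])

# Toy data: 10 samples with 6-dim depth features and 3-dim IMU features.
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=(10, 6)), rng.normal(size=(10, 3)))
```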

294 citations


Cites methods from "Recognizing Human Action at a Dista..."

  • ...The approaches developed based on video sequences can be classified into template-based approaches, where emphasis is placed on low- and mid-level features, and model-based approaches where emphasis is placed on high-level features [45]....


Journal ArticleDOI
TL;DR: STIP-based detectors are robust in detecting interest points from video in spatio-temporal domain and related public datasets useful for comparing performances of various techniques are summarized.
Abstract: Over the past two decades, human action recognition from video has been an important area of research in computer vision. Its applications include surveillance systems, human–computer interaction and various real-world applications where one of the actors is a human being. A number of review works on human action recognition have been published by several researchers. However, there is a gap in the literature when it comes to methodologies of STIP-based detectors for human action recognition. This paper presents a comprehensive review of STIP-based methods for human action recognition. STIP-based detectors are robust in detecting interest points from video in the spatio-temporal domain. This paper also summarizes related public datasets useful for comparing the performance of various techniques.

152 citations


Cites methods from "Recognizing Human Action at a Dista..."

  • ...The methods of human action recognition from image frames or video sequences are broadly classified as template-based approach (emphasis on collecting low- and mid-level features) and model-based approach (emphasis on feature for high-level interaction) [7]....


Journal ArticleDOI
TL;DR: A new performance metric addressing and unifying the qualitative and quantitative aspects of the performance measures is proposed, which has been tested on several activity recognition algorithms participating in the ICPR 2012 HARL competition.

76 citations


Cites methods from "Recognizing Human Action at a Dista..."

  • ...HoF features were calculated according to [58] in a hierarchical way using a pyramid....


Journal ArticleDOI
TL;DR: A Bag of Expression (BoE) framework, based on the bag-of-words method, for recognizing human action in simple and realistic scenarios, which outperforms existing bag-of-words-based approaches when evaluated using the same performance evaluation methods.

49 citations


Cites background from "Recognizing Human Action at a Dista..."

  • ...It is captured in a controlled environment with a simple background, and a few videos contain camera motion and zooming effects (Mukherjee et al., 2011)....


Proceedings ArticleDOI
09 Jan 2012
TL;DR: A sparse-representation-based dictionary learning technique is used to address dance classification as a new problem in computer vision, and a new action descriptor is presented that represents a dance video and overcomes a limitation of the "Bag-of-Words" model.
Abstract: In this paper, we address an interesting application of computer vision, namely the classification of Indian Classical Dance (ICD). To the best of our knowledge, the problem has not been addressed so far in the computer vision domain. To deal with this problem, we use a sparse representation based dictionary learning technique. First, we represent each frame of a dance video by a pose descriptor based on the histogram of oriented optical flow (HOOF), computed in a hierarchical manner. The pose basis is learned using an on-line dictionary learning technique. Finally, each video is represented sparsely as a dance descriptor by pooling the pose descriptors of all the frames. In this work, dance videos are classified using a support vector machine (SVM) with an intersection kernel. Our contributions here are two-fold: first, to address dance classification as a new problem in computer vision, and second, to present a new action descriptor for dance videos that overcomes a limitation of the "Bag-of-Words" model. We have tested our algorithm on our own ICD dataset, created from videos collected from YouTube; an accuracy of 86.67% is achieved on this dataset. Since we also propose a new action descriptor, we have tested our algorithm on the well-known KTH dataset, where the performance of the system is comparable to the state-of-the-art.
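The intersection-kernel SVM mentioned in this abstract rests on the histogram intersection kernel, which suits L1-normalized histogram descriptors like HOOF. Below is a minimal numpy sketch of the kernel itself; the variable names and toy data are illustrative, not the paper's pipeline.

```python
import numpy as np

def intersection_kernel(A, B):
    """Histogram intersection kernel: K[i, j] = sum_k min(A[i, k], B[j, k]).
    Suited to L1-normalized histogram descriptors; the resulting Gram
    matrix can be handed to an SVM with a precomputed kernel."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Toy "dance descriptors": 4 videos as L1-normalized 5-bin histograms.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(4, 5)))
X /= X.sum(axis=1, keepdims=True)
K = intersection_kernel(X, X)
```

With scikit-learn, a Gram matrix like `K` would typically be passed to `SVC(kernel='precomputed')`; that pairing is standard practice rather than something this abstract spells out.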

39 citations


Cites background or methods from "Recognizing Human Action at a Dista..."

  • ...We use motion and oriented gradient information to build a pose descriptor of each frame [21]....


  • ...For the details, please go through [21]....


  • ...Where, in [21], they start with a large vocabulary of poses (visual words) and derive a refined and compact codebook of key poses....


  • ...Moreover, it is significantly better than another related algorithm proposed in [21]....


References
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
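A stripped-down sketch of the per-cell orientation histograms at the core of HOG follows. It uses unsigned orientations and deliberately omits the overlapping-block contrast normalization that Dalal and Triggs report is important, so it illustrates the idea rather than reproducing the full descriptor.

```python
import numpy as np

def hog_cells(img, cell=8, n_bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by
    gradient magnitude: the core of HOG. Block-level contrast
    normalization is omitted here for brevity."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned, [0, 180)
    h, w = img.shape
    out = np.zeros((h // cell, w // cell, n_bins))
    for i in range(h // cell):
        for j in range(w // cell):
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            b = np.minimum((a / 180.0 * n_bins).astype(int), n_bins - 1)
            out[i, j] = np.bincount(b, weights=m, minlength=n_bins)
    return out

# Toy image with a vertical edge: all gradient energy is horizontal,
# so every weighted vote lands in the 0-degree bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
H = hog_cells(img)
```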

31,952 citations


"Recognizing Human Action at a Dista..." refers background or result in this paper

  • ...Table I shows that hierarchical implementations of HOOF/HOG in multiple layers work better than their raw counterparts, but our proposed hierarchical HOOF computed on gradient weighted optic flow [using (1)] outperforms all others....


  • ...We show in Table I the results of our experiments using raw HOOF [20] (row 3) and HOG [21] (row 4) features as well as hierarchical HOOF without using (1) (row 5) and hierarchical HOG (row 6)....


  • ...HOG feature on 3-layer architecture [21] 98 74....


  • ...HOG feature on 3-layer architecture [21] 62....


  • ...2(a)] and the histogram of oriented gradient (HOG) [21] [derived from Fig....


Book
Christopher M. Bishop1
17 Aug 2006
TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations


Journal ArticleDOI
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations


"Recognizing Human Action at a Dista..." refers methods in this paper

  • ...An optimum (local) lower bound on the codebook size of S can be estimated by Akaike information criterion, or Bayesian information criterion [23] or one can directly employ X-means algorithm [24]....

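The snippet above mentions estimating the codebook size with the Bayesian information criterion or the X-means algorithm. A self-contained sketch of that idea, assuming a hard-assignment spherical-Gaussian model of the clusters (the approximation X-means uses) and a toy k-means implementation; none of this code comes from the paper:

```python
import numpy as np

def kmeans(X, k, iters=30, restarts=5):
    """Plain Lloyd's k-means with a few random restarts; returns the
    labels and sum of squared errors of the best run."""
    best = None
    for seed in range(restarts):
        rng = np.random.default_rng(seed)
        C = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(iters):
            labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if (labels == j).any():
                    C[j] = X[labels == j].mean(axis=0)
        sse = ((X - C[labels]) ** 2).sum()
        if best is None or sse < best[1]:
            best = (labels, sse)
    return best

def bic(X, labels, k, sse):
    """BIC of a hard-assignment spherical-Gaussian mixture fit;
    lower is better."""
    n, d = X.shape
    var = max(sse / (n * d), 1e-12)          # shared spherical variance
    counts = np.bincount(labels, minlength=k)
    nz = counts[counts > 0]
    log_lik = (nz * np.log(nz / n)).sum() \
        - 0.5 * n * d * (np.log(2 * np.pi * var) + 1)
    n_params = (k - 1) + k * d + 1           # mixing weights, means, variance
    return -2 * log_lik + n_params * np.log(n)

# Toy data: two tight, well-separated blobs; BIC should bottom out at k=2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
scores = {}
for k in (1, 2, 3, 4):
    labels, sse = kmeans(X, k)
    scores[k] = bic(X, labels, k, sse)
best_k = min(scores, key=scores.get)
```

The mixing-proportion term in the log-likelihood is what penalizes needless splits: halving a cluster lowers the variance term only slightly but costs each point a log-probability of belonging to a smaller component.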

Book
25 Nov 1994
TL;DR: This book presents mathematical representations of social networks in the social and behavioral sciences, covering centrality and prestige measures, structural balance, cohesive subgroups, blockmodels, and dyadic and triadic interaction models.
Abstract: Part I. Introduction: Networks, Relations, and Structure: 1. Relations and networks in the social and behavioral sciences 2. Social network data: collection and application Part II. Mathematical Representations of Social Networks: 3. Notation 4. Graphs and matrices Part III. Structural and Locational Properties: 5. Centrality, prestige, and related actor and group measures 6. Structural balance, clusterability, and transitivity 7. Cohesive subgroups 8. Affiliations, co-memberships, and overlapping subgroups Part IV. Roles and Positions: 9. Structural equivalence 10. Blockmodels 11. Relational algebras 12. Network positions and roles Part V. Dyadic and Triadic Methods: 13. Dyads 14. Triads Part VI. Statistical Dyadic Interaction Models: 15. Statistical analysis of single relational networks 16. Stochastic blockmodels and goodness-of-fit indices Part VII. Epilogue: 17. Future directions.
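The key-poses paper draws its centrality measure from the material this book covers in its chapter on centrality and prestige. A minimal numpy sketch of two standard centrality measures on an adjacency matrix, with a simple threshold selecting "key" nodes; the pose-similarity graph and the mean-based threshold below are made-up toys, not the paper's "meaningful" threshold.

```python
import numpy as np

def degree_centrality(A):
    """Normalized degree centrality of an undirected graph given by
    adjacency matrix A: 1.0 means linked to every other node."""
    n = len(A)
    return A.sum(axis=1) / (n - 1)

def eigenvector_centrality(A, iters=100):
    """Eigenvector centrality by power iteration: a node is central
    when it is connected to other central nodes."""
    x = np.ones(len(A))
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

# Toy pose-similarity graph: node 0 is connected to every other node
# (a "key pose" candidate); the rest are sparsely connected.
A = np.array([[0, 1, 1, 1, 1],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
c = degree_centrality(A)
key_nodes = np.flatnonzero(c >= c.mean())    # simple threshold on centrality
```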

17,104 citations