Book Chapter DOI

Recognizing Human Actions by Their Pose

TL;DR: This paper proposes to recognize human actions by evaluating a distribution over a set of predefined static poses, referred to as pose primitives, yielding a generally applicable approach that also works on still images or on images taken from a moving camera.
Abstract: The topic of human action recognition from image sequences has gained increasing interest in recent years. Interestingly, the majority of approaches are restricted to dynamic motion features and are therefore not universally applicable. In this paper, we propose to recognize human actions by evaluating a distribution over a set of predefined static poses, which we refer to as pose primitives. We aim at a generally applicable approach that also works on still images, or on images taken from a moving camera. Experimental validation takes varying video sequence lengths into account and emphasizes the possibility of action recognition from single images, which we believe is an often overlooked but nevertheless important aspect of action recognition. The proposed approach uses a set of training video sequences to estimate pose and action class representations. To incorporate the local temporal context of poses, atomic subsequences of poses expressed as n-grams are explored. Action classes can then be represented by histograms of pose-primitive n-grams, which allows for action recognition by means of histogram comparison. Although the suggested action recognition method is independent of the underlying low-level representation of poses, the representation remains important for targeting practical problems. Thus, to deal with common problems in video-based action recognition, e.g. articulated poses and cluttered backgrounds, a recently introduced Histogram of Oriented Gradients (HOG) based descriptor is extended using a non-negative matrix factorization reconstruction.
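The pose-primitive n-gram representation described above is easy to make concrete. The sketch below builds normalized bigram histograms from sequences of pose-primitive labels and compares two videos by histogram intersection; the pose labels and sequences are hypothetical, and the paper's actual low-level pose descriptor (the NMF-extended HOG representation) is not modeled here.

```python
from collections import Counter

def pose_ngrams(pose_sequence, n=2):
    """All length-n subsequences (n-grams) of a pose-primitive label sequence."""
    return [tuple(pose_sequence[i:i + n])
            for i in range(len(pose_sequence) - n + 1)]

def ngram_histogram(pose_sequence, n=2):
    """Normalized histogram of pose-primitive n-grams for one video."""
    counts = Counter(pose_ngrams(pose_sequence, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def histogram_intersection(h1, h2):
    """Histogram similarity in [0, 1]; 1.0 means identical distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Hypothetical pose-label sequences for two clips of the same cyclic action
clip_a = ["p1", "p2", "p3", "p1", "p2", "p3"]
clip_b = ["p2", "p3", "p1", "p2", "p3", "p1"]
sim = histogram_intersection(ngram_histogram(clip_a), ngram_histogram(clip_b))
```

Histogram intersection is only one of several possible comparison measures; the abstract leaves the concrete choice of histogram comparison open.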
Citations
Proceedings Article DOI
11 Jul 2012
TL;DR: In this article, a method for reliable recognition of construction workers and their actions using color and depth data from a Microsoft Kinect sensor is presented. The method is based on machine learning techniques, in which meaningful visual features are extracted based on the estimated body pose of workers.
Abstract: In this paper we present a novel method for reliable recognition of construction workers and their actions using color and depth data from a Microsoft Kinect sensor. Our algorithm is based on machine learning techniques, in which meaningful visual features are extracted based on the estimated body pose of workers. We adopt a bag-of-poses representation for worker actions and combine it with powerful discriminative classifiers to achieve accurate action recognition. The discriminative framework is able to focus on the visual aspects that are distinctive and can detect and recognize actions from different workers. We train and test our algorithm on 80 videos from four workers involved in five drywall-related construction activities. These videos were all collected inside a dining hall facility under construction. The proposed algorithm is further validated by recognizing the actions of a construction worker who was never seen in the training dataset. Experimental results show that our method achieves an average precision of 85.28 percent. The results reflect the promise of the proposed method for automated assessment of craftsmen productivity, safety, and occupational health in indoor environments.
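The bag-of-poses pipeline above can be sketched in a few lines. For brevity, a simple nearest-centroid rule stands in for the powerful discriminative classifiers used in the paper, and the pose-frequency vectors and action labels below are invented for illustration:

```python
def centroid(vectors):
    """Component-wise mean of equal-length bag-of-poses vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(vector, centroids):
    """Assign the action label whose class centroid is nearest (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(vector, centroids[label]))

# Hypothetical normalized pose-frequency vectors per training clip;
# the action labels are invented and do not come from the paper's dataset.
train = {
    "hammering": [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]],
    "sanding":   [[0.10, 0.20, 0.70], [0.20, 0.10, 0.70]],
}
centroids = {label: centroid(clips) for label, clips in train.items()}
pred = classify([0.65, 0.25, 0.10], centroids)
```

A discriminative classifier such as an SVM would replace the nearest-centroid rule in practice, but the bag-of-poses feature vector feeding it has the same shape.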

74 citations

Journal Article DOI
TL;DR: In this paper, the authors examine how the notion of "action" should be understood and defined in robotics research, with the aim of equipping autonomous robots with robust manipulation skills.
Abstract: Understanding and defining the meaning of “action” is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for ...

19 citations

01 Jan 2012
TL;DR: A novel method for reliable recognition of construction workers and their actions using color and depth data from a Microsoft Kinect sensor is presented. The method is based on machine learning techniques, adopts a bag-of-poses representation for worker actions, and combines it with powerful discriminative classifiers to achieve accurate action recognition.

9 citations

Journal Article DOI
TL;DR: A recurrent neural network solving the approximate nonnegative matrix factorization (NMF) problem is presented, and it is proved that local solutions of the NMF optimization problem correspond to as many stable steady-state points of the network dynamics.

6 citations


Cites background from "Recognizing Human Actions by Their ..."

  • ...text mining [28], document clustering [29,30], image reconstruction [31], human action recognition [32], discovering muscle synergies [33], EEG classification [34] and music transcription [35,36]....


References
Proceedings Article DOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
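The magnitude-weighted orientation voting at the heart of HOG can be illustrated with a minimal per-cell histogram (unsigned gradients, 9 orientation bins). This sketch deliberately omits the vote interpolation, overlapping block grouping, and local contrast normalization that the paper shows are important for good results:

```python
import math

def cell_hog(cell, bins=9):
    """Orientation histogram for one cell: each interior pixel votes for the
    bin of its (unsigned, 0-180 degree) gradient orientation, weighted by the
    gradient magnitude. Hard binning is used here; real HOG interpolates
    votes between neighboring bins."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(ang * bins / 180.0), bins - 1)] += mag
    return hist

# A vertical step edge: all gradient energy lands in the 0-degree bin
cell = [[0, 0, 0, 10, 10, 10]] * 6
hist = cell_hog(cell)
```

In the full descriptor, such cell histograms are grouped into overlapping blocks and contrast-normalized before concatenation.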

31,952 citations


"Recognizing Human Actions by Their ..." refers background or methods in this paper

  • ...The data set does not show any humans and was used in other recent contributions [6]....

  • ...In this paper, we stay close to the parameters suggested in [6]....

  • ...Every pixel calculates a vote for an edge orientation histogram for its cell, weighted by the magnitude (as suggested in [6], we used 9 orientation bins)....

  • ...In [6], a Histogram of Oriented Gradients (HOG) descriptor was introduced for pedestrian detection in raw images....

Journal Article DOI
21 Oct 1999, Nature
TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
Abstract: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

11,500 citations


"Recognizing Human Actions by Their ..." refers background in this paper

  • ...For example, in [18] face images were decomposed into a set of meaningful parts, e....


01 Jan 1999
TL;DR: In this article, non-negative matrix factorization is used to learn parts of faces and semantic features of text, which is in contrast to principal components analysis and vector quantization that learn holistic, not parts-based, representations.

9,604 citations

Proceedings Article
01 Jan 2000
TL;DR: Two different multiplicative algorithms for non-negative matrix factorization are analyzed and one algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence.
Abstract: Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
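The KL-divergence variant of these multiplicative updates is compact enough to state directly. The sketch below implements the Lee-Seung update rules for the generalized Kullback-Leibler objective D(V||WH) in pure Python; on a small rank-1 toy matrix the factorization recovers V almost exactly.

```python
import math
import random

def matmul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kl_divergence(V, WH):
    """Generalized KL divergence D(V||WH); assumes strictly positive V."""
    return sum(V[i][j] * math.log(V[i][j] / WH[i][j]) - V[i][j] + WH[i][j]
               for i in range(len(V)) for j in range(len(V[0])))

def nmf_kl(V, r, iters=200, seed=0):
    """Lee-Seung multiplicative updates minimizing D(V||WH)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        WH = matmul(W, H)
        # H[a][u] *= (sum_i W[i][a] * V[i][u] / WH[i][u]) / sum_k W[k][a]
        for a in range(r):
            col = sum(W[k][a] for k in range(n))
            for u in range(m):
                H[a][u] *= sum(W[i][a] * V[i][u] / WH[i][u]
                               for i in range(n)) / col
        WH = matmul(W, H)
        # W[i][a] *= (sum_u H[a][u] * V[i][u] / WH[i][u]) / sum_v H[a][v]
        for a in range(r):
            row = sum(H[a][v] for v in range(m))
            for i in range(n):
                W[i][a] *= sum(H[a][u] * V[i][u] / WH[i][u]
                               for u in range(m)) / row
    return W, H

# Rank-1 toy matrix V = outer([1, 2], [3, 1]); a rank-1 NMF can fit it exactly
V = [[3.0, 1.0], [6.0, 2.0]]
W, H = nmf_kl(V, r=1)
```

The multiplicative form keeps W and H non-negative given a non-negative initialization, and the auxiliary-function argument in the paper shows that D(V||WH) is non-increasing under each update.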

7,345 citations


"Recognizing Human Actions by Their ..." refers background in this paper

  • ...Under these update rules the divergence D(V||WH) is non-increasing, see also [19] for further details....
