Journal ArticleDOI

A Novel Human Action Recognition and Behaviour Analysis Technique using SWFHOG

01 Jan 2020-International Journal of Advanced Computer Science and Applications (The Science and Information (SAI) Organization Limited)-Vol. 11, Iss: 4
TL;DR: The proposed SWFHOG method shows promising results compared with earlier methods, and its robustness is tested against camera view-angle changes and imperfect actions using the Weizmann robustness-testing datasets.
Abstract: In this paper, a new local feature called the Salient Wavelet Feature with Histogram of Oriented Gradients (SWFHOG) is introduced for human action recognition and behaviour analysis. In the proposed approach, the regions carrying the most information are selected based on their entropies. The SWF feature descriptor is formed from the wavelet sub-bands obtained by applying wavelet decomposition to the selected regions. To improve accuracy further, the SWF feature vector is combined with the Histogram of Oriented Gradients global feature descriptor to form the SWFHOG feature descriptor. The proposed algorithm is evaluated for action recognition on the publicly available KTH, Weizmann, UT Interaction, and UCF Sports datasets. The highest accuracy of 98.33% is achieved for the UT Interaction dataset. The SWFHOG feature descriptor is also tested for behaviour analysis, identifying actions as normal or abnormal: the actions from the SBU Kinect and UT Interaction datasets are divided into two sets, Normal Behaviour and Abnormal Behaviour. For behaviour analysis, 95% recognition accuracy is achieved on the SBU Kinect dataset and 97% on the UT Interaction dataset. The robustness of the proposed SWFHOG algorithm is tested against camera view-angle changes and imperfect actions using the Weizmann robustness-testing datasets. The proposed SWFHOG method shows promising results compared with earlier methods.
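The pipeline in the abstract (entropy-based region selection, wavelet decomposition of the selected regions, concatenation of sub-band features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the block size, the number of retained blocks, the single-level Haar decomposition, and the mean/std sub-band statistics are all assumptions made for the sketch.

```python
import numpy as np

def block_entropy(block, bins=16):
    """Shannon entropy of a block's intensity histogram (values in [0, 1])."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def haar_decompose(block):
    """One level of 2D Haar decomposition: LL, LH, HL, HH sub-bands."""
    a = block[0::2, 0::2]; b = block[0::2, 1::2]
    c = block[1::2, 0::2]; d = block[1::2, 1::2]
    return ((a + b + c + d) / 4, (a + b - c - d) / 4,
            (a - b + c - d) / 4, (a - b - c + d) / 4)

def swf_sketch(frame, block=16, top_k=4):
    """Keep the top_k highest-entropy blocks and concatenate simple
    statistics of their wavelet sub-bands into one feature vector."""
    h, w = frame.shape
    blocks = [frame[i:i + block, j:j + block]
              for i in range(0, h - block + 1, block)
              for j in range(0, w - block + 1, block)]
    blocks.sort(key=block_entropy, reverse=True)   # most informative first
    feats = []
    for blk in blocks[:top_k]:
        for sub in haar_decompose(blk):
            feats.extend([sub.mean(), sub.std()])
    return np.asarray(feats)

rng = np.random.default_rng(0)
desc = swf_sketch(rng.random((64, 64)))
print(desc.shape)  # (32,): 4 blocks x 4 sub-bands x 2 statistics
```

In the paper, the SWF vector is then concatenated with a HOG descriptor of the same frame to form the SWFHOG feature.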


References
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
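The core of the HOG computation described above, magnitude-weighted voting into unsigned orientation bins accumulated over spatial cells, can be sketched as follows. The overlapping block normalization that the abstract highlights as important is omitted here for brevity, and the cell size and bin count are merely typical defaults, so this is an illustrative reduction rather than the full Dalal-Triggs descriptor.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Magnitude-weighted votes into unsigned (0-180 degree) orientation
    bins, accumulated per non-overlapping cell and L2-normalized.
    Dalal-Triggs block normalization over groups of cells is omitted."""
    gy, gx = np.gradient(img.astype(float))        # gradients along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h = (img.shape[0] // cell) * cell
    w = (img.shape[1] // cell) * cell
    out = np.zeros((h // cell, w // cell, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(h):
        for j in range(w):
            out[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    norm = np.linalg.norm(out, axis=2, keepdims=True) + 1e-9
    return out / norm

img = np.tile(np.arange(16, dtype=float), (16, 1))  # horizontal intensity ramp
feat = hog_cells(img)
print(feat.shape)             # (2, 2, 9)
print(np.argmax(feat[0, 0]))  # bin 0: the ramp's gradient points along +x
```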

31,952 citations


Additional excerpts

  • ...As HOG was originally designed for person detection by Dalal and Triggs [25], it is a perfect candidate for human action recognition....

Proceedings ArticleDOI
15 Oct 2005
TL;DR: It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.
Abstract: A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.

2,699 citations


Additional excerpts

  • ...For the Weizmann dataset, slightly higher accuracy is achieved with a structural average based method [20]....

Proceedings ArticleDOI
13 Oct 2003
TL;DR: This work builds on the idea of the Harris and Forstner interest point operators and detects local structures in space-time where the image values have significant local variations in both space and time to detect spatio-temporal events.
Abstract: Local image features or interest points provide compact and abstract representations of patterns in an image. We propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for its interpretation. To detect spatio-temporal events, we build on the idea of the Harris and Forstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We then estimate the spatio-temporal extents of the detected events and compute their scale-invariant spatio-temporal descriptors. Using such descriptors, we classify events and construct video representation in terms of labeled space-time points. For the problem of human motion analysis, we illustrate how the proposed method allows for detection of walking people in scenes with occlusions and dynamic backgrounds.
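A minimal sketch of the space-time corner measure behind this detector: build the 3x3 spatio-temporal second-moment matrix from gradients of the (t, y, x) volume and evaluate the extended Harris response H = det(mu) - k*trace(mu)^3. The Gaussian scale-space smoothing of the original method is replaced here by a crude 3x3x3 box average, and the constant k is just an illustrative choice, so this shows the response function only, not Laptev's full multi-scale detector.

```python
import numpy as np

def box3(a):
    """3x3x3 box average via shifted-copy summation (edge padding)."""
    p = np.pad(a, 1, mode="edge")
    out = np.zeros_like(a)
    t, y, x = a.shape
    for dt in range(3):
        for dy in range(3):
            for dx in range(3):
                out += p[dt:dt + t, dy:dy + y, dx:dx + x]
    return out / 27.0

def st_harris(vol, k=0.005):
    """Extended Harris response H = det(mu) - k*trace(mu)^3 on a (t, y, x)
    volume, where mu is the locally averaged spatio-temporal second-moment
    (structure tensor) matrix built from the volume's gradients."""
    grads = np.gradient(vol.astype(float))  # [d/dt, d/dy, d/dx]
    mu = np.empty(vol.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            mu[..., i, j] = box3(grads[i] * grads[j])
    det = np.linalg.det(mu)
    tr = np.trace(mu, axis1=-2, axis2=-1)
    return det - k * tr**3

# A bright patch flashing at frame 5 is a genuine space-time event:
vol = np.zeros((10, 16, 16))
vol[5, 7:10, 7:10] = 1.0
H = st_harris(vol)
t, y, x = np.unravel_index(np.argmax(np.abs(H)), H.shape)
print(t)  # near frame 5, where the event occurs
```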

2,232 citations

Proceedings ArticleDOI
17 Oct 2005
TL;DR: The method is fast, does not require video alignment and is applicable in many scenarios where the background is known, and the robustness of the method is demonstrated to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low quality video.
Abstract: Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. (2004) for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low-quality video.
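The shape representation underlying this method solves the Poisson equation Delta U = -1 inside the silhouette with U = 0 outside; U then encodes how deep each point lies within the shape. Here is a minimal 2D sketch using plain Jacobi iteration; the paper itself works on 3D space-time volumes and derives further features (saliency, orientation) from U, so this only illustrates the basic equation.

```python
import numpy as np

def poisson_depth(mask, iters=2000):
    """Jacobi iteration for the Poisson equation Delta U = -1 inside a
    binary silhouette, with U = 0 outside; U grows toward the interior."""
    u = np.zeros(mask.shape)
    for _ in range(iters):
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(mask, avg + 0.25, 0.0)  # grid spacing h = 1
    return u

mask = np.zeros((21, 21), dtype=bool)
mask[3:18, 8:13] = True              # a vertical "torso" bar
u = poisson_depth(mask)
print(np.unravel_index(np.argmax(u), u.shape))  # (10, 10): deepest point
```

The maximum of U sits at the medial interior of the bar, which is exactly the property the paper exploits to separate a torso from protruding limbs.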

2,186 citations


Additional excerpts

  • ...For the Weizmann dataset, slightly higher accuracy is achieved with a structural average based method [20]....

  • ...To evaluate the robustness of the proposed SWFHOG algorithm to irregularities such as occlusion, unusual ways of performing an action, varied backgrounds and view angles, the Weizmann robustness dataset is used for testing....

  • ...Recognition accuracy (%) achieved with SWF variants:

        Variant   Weizmann   KTH     UT1     UT2     UCF
        SWF_A     97.6       95      97.33   97.83   96.2
        SWF_D     97         93.83   97.00   97.83   95.6
        SWF_AD    97.6       94.33   97.67   98.33   96.2
        SWFHOG    98.5       97.5    98.33   98.67   96.8

    The graph in Fig....

  • ...The Weizmann camera view-angle-change dataset contains videos of a walking action recorded from ten different camera view angles ranging from 0° to 90°....

  • ...The Weizmann robustness-testing and camera view-angle-change datasets are specifically recorded with such challenges....

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper generalizes the traditional MACH filter to video (3D spatiotemporal volume), and vector valued data, and analyzes the response of the filter in the frequency domain to avoid the high computational cost commonly incurred in template-based approaches.
Abstract: In this paper we introduce a template-based method for recognizing human actions called action MACH. Our approach is based on a maximum average correlation height (MACH) filter. A common limitation of template-based methods is their inability to generate a single template using a collection of examples. MACH is capable of capturing intra-class variability by synthesizing a single Action MACH filter for a given action class. We generalize the traditional MACH filter to video (3D spatiotemporal volume), and vector valued data. By analyzing the response of the filter in the frequency domain, we avoid the high computational cost commonly incurred in template-based approaches. Vector valued data is analyzed using the Clifford Fourier transform, a generalization of the Fourier transform intended for both scalar and vector-valued data. Finally, we perform an extensive set of experiments and compare our method with some of the most recent approaches in the field by using publicly available datasets, and two new annotated human action datasets which include actions performed in classic feature films and sports broadcast television.
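The frequency-domain trick mentioned in the abstract, evaluating a correlation filter as a pointwise spectral product instead of a sliding window, can be illustrated in 2D with a plain matched template. The actual MACH filter is synthesized from many training examples and, in this paper, extended to 3D volumes and vector-valued data via the Clifford Fourier transform; this sketch only shows why the spectral route avoids the sliding-window cost.

```python
import numpy as np

def fft_correlate(scene, template):
    """Cross-correlation via pointwise spectral product: multiply the
    scene spectrum by the conjugate template spectrum and invert.
    This replaces the costly sliding-window evaluation."""
    F = np.fft.fft2(scene)
    H = np.fft.fft2(template, s=scene.shape)  # zero-pad template
    return np.real(np.fft.ifft2(F * np.conj(H)))

scene = np.zeros((32, 32))
scene[10:14, 20:24] = 1.0        # a 4x4 bright patch
template = np.ones((4, 4))
resp = fft_correlate(scene, template)
print(np.unravel_index(np.argmax(resp), resp.shape))  # (10, 20): patch origin
```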

1,316 citations


Additional excerpts

  • ...The UCF Sports dataset [28, 29] contains video clips recorded at various sporting events and is a realistic dataset....
