Journal ArticleDOI

A Novel Approach for Human Action Recognition from Silhouette Images

04 Mar 2017-Iete Journal of Research (Taylor & Francis)-Vol. 63, Iss: 2, pp 160-171
TL;DR: In this paper, the authors proposed a novel human action recognition (HAR) technique for human silhouette sequence based on spatio-temporal body parts movement (STBPM) and action-code classification (ACC).
Abstract: In this work, we propose a novel human action recognition (HAR) technique for human silhouette sequences based on spatio-temporal body parts movement (STBPM) and action-code classification (ACC). The STBPM feature is designed to accumulate the signatures of the movements of the several body parts involved in performing an action. ACC is a code-based classifier for HAR that requires no training; the code for any action is created by analyzing the STBPM features. The proposed approach is view-independent (except for the top view) and scale-invariant. Experimental results on the publicly available Weizmann, MuHAVi, and IXMAS datasets clearly show that the proposed technique outperforms related research works in terms of human action detection accuracy.
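The STBPM/ACC idea — per-part movement signatures quantized into a training-free symbolic code — can be sketched roughly as follows. This is an illustrative toy, not the authors' exact feature: the bounding-box split, the threshold, and all function names are our own assumptions. Extents are normalized by silhouette height, mirroring the paper's scale invariance.

```python
import numpy as np

def part_extents(sil):
    """Width of the upper and lower halves of the silhouette bounding box,
    normalized by silhouette height (a crude stand-in for STBPM features)."""
    ys, xs = np.nonzero(sil)
    top, bottom = ys.min(), ys.max()
    mid = (top + bottom) // 2
    upper = xs[ys <= mid]
    lower = xs[ys > mid]
    w = lambda a: (a.max() - a.min() + 1) if a.size else 0
    h = bottom - top + 1  # normalize by height for scale invariance
    return w(upper) / h, w(lower) / h

def action_code(sils, thresh=0.05):
    """Quantize frame-to-frame extent changes into a symbolic code string;
    codes can then be matched without any training."""
    ext = np.array([part_extents(s) for s in sils])
    code = []
    for d in np.diff(ext, axis=0):
        for v in d:
            code.append('+' if v > thresh else '-' if v < -thresh else '0')
    return ''.join(code)

# toy silhouettes: constant-width torso, lower half widening frame to frame
frames = []
for t in range(4):
    s = np.zeros((20, 20), dtype=bool)
    s[2:10, 8:12] = True                  # torso: constant width
    s[10:18, 8 - 2 * t: 12 + 2 * t] = True  # legs: widening
    frames.append(s)
print(action_code(frames))  # → 0+0+0+  (upper half static, lower half expanding)
```

A query sequence would be classified by comparing its code string against the stored code of each known action.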
Citations
Journal ArticleDOI
TL;DR: This research proposes a hybrid strategy for efficient classification of human activities from a given video sequence by integrating four major steps: segmenting the moving objects by fusing novel uniform segmentation and expectation maximization, extracting a new set of fused features using local binary patterns with histograms of oriented gradients and Haralick features, selecting features with a Euclidean distance and joint entropy-PCA-based method, and classifying them using a multi-class support vector machine.
Abstract: Human activity monitoring in video sequences is an intriguing computer vision domain with a colossal range of applications, e.g., surveillance systems, human-computer interaction, and traffic control systems. In this research, our primary focus is on proposing a hybrid strategy for efficient classification of human activities from a given video sequence. The proposed method integrates four major steps: (a) segmenting the moving objects by fusing novel uniform segmentation and expectation maximization, (b) extracting a new set of fused features using local binary patterns with histograms of oriented gradients and Haralick features, (c) selecting features by a novel Euclidean distance and joint entropy-PCA-based method, and (d) classifying features using a multi-class support vector machine. Three benchmark datasets (MIT, CAVIAR, and BMW-10) are used to train the classifier for human classification; for testing, we utilize multi-camera pedestrian videos along with the MSR Action, INRIA, and CASIA datasets. Additionally, the results are validated on a dataset recorded by our research group. For action recognition, four publicly available datasets are selected, Weizmann, KTH, UIUC, and MuHAVi, on which recognition rates of 95.80, 99.30, 99, and 99.40%, respectively, are achieved, confirming the effectiveness of the proposed work. Promising results are achieved in terms of greater precision compared to existing techniques.
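The "fused features" step above — local binary patterns concatenated with a gradient-orientation descriptor — can be sketched in miniature. This is a simplified assumption of the idea: a basic 8-neighbour LBP histogram and a single global orientation histogram (real HOG uses cells and block normalization), serially fused by concatenation.

```python
import numpy as np

def lbp_hist(img):
    """8-neighbour local binary pattern histogram (256 bins, normalized)."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    h = np.bincount(code.ravel(), minlength=256).astype(float)
    return h / h.sum()

def grad_hist(img, bins=9):
    """Coarse magnitude-weighted histogram of unsigned gradient orientations
    (a HOG-like descriptor without cells or blocks)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    h, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return h / (h.sum() + 1e-9)

def fused_feature(img):
    # serial fusion: concatenate the two descriptors into one vector
    return np.concatenate([lbp_hist(img), grad_hist(img)])

img = (np.arange(64).reshape(8, 8) % 7).astype(np.uint8)
f = fused_feature(img)
print(f.shape)  # → (265,): 256 LBP bins + 9 orientation bins
```

The fused vectors would then pass through feature selection before reaching the SVM.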

105 citations

Journal ArticleDOI
TL;DR: A novel human action recognition method is contributed by embedding the proposed frames-fusion step, which works on the principle of pixel similarity, into existing techniques to improve recognition rate and accuracy.
Abstract: In video sequences, human action recognition is a challenging computer vision problem due to motion variation, person-to-person differences within frames, and video recording settings. Over the last few years, applications of human activity recognition have increased significantly. Many techniques for human action recognition have been implemented in the literature, but they still face problems with foreground-region contrast, segmentation, feature extraction, and feature selection. This article contributes a novel human action recognition method by embedding the proposed frames fusion, which works on the principle of pixel similarity. An improved hybrid feature extraction increases the recognition rate and allows efficient classification in complex environments. The design consists of four phases: (a) enhancement of video frames, (b) threshold-based background subtraction and construction of a saliency map, (c) feature extraction and selection, and (d) a neural network (NN) for human action classification. Results have been tested on five benchmark datasets, Weizmann, KTH, UIUC, MuHAVi, and WVU, obtaining recognition rates of 97.2, 99.8, 99.4, 99.9, and 99.9%, respectively. Contingency tables and graphical curves support our claims. Comparison with existing techniques confirms the recognition rate and accuracy of the proposed method.
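Two of the phases above — pixel-similarity frame fusion and threshold-based background subtraction — admit a compact sketch. The median background model, the thresholds, and the fusion rule below are our own illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def background_model(frames):
    """Per-pixel temporal median as a simple static background estimate."""
    return np.median(np.stack(frames), axis=0)

def foreground_mask(frame, bg, thresh=25):
    """Threshold-based background subtraction."""
    return np.abs(frame.astype(int) - bg) > thresh

def fuse_frames(f1, f2, sim_thresh=10):
    """Pixel-similarity fusion: average pixels that agree across frames,
    keep the newer value where they disagree (moving regions)."""
    similar = np.abs(f1.astype(int) - f2.astype(int)) <= sim_thresh
    return np.where(similar, (f1.astype(int) + f2.astype(int)) // 2, f2)

rng = np.random.default_rng(0)
bg_true = rng.integers(0, 50, size=(16, 16))
frames = [bg_true.copy() for _ in range(5)]
frames[4] = frames[4].copy()
frames[4][4:8, 4:8] += 100            # a bright moving object in the last frame

bg = background_model(frames[:4])
mask = foreground_mask(frames[4], bg)
fused = fuse_frames(frames[3], frames[4])
print(mask.sum())  # → 16: exactly the 4x4 object pixels are detected
```

The resulting mask would feed the saliency-map construction and feature-extraction stages.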

81 citations

Journal ArticleDOI
TL;DR: This article considers the problems of multiple human detection and classification using a novel statistical weighted segmentation and a rank correlation-based feature selection approach, and proves the significance of the proposed method compared to other techniques.
Abstract: Human action recognition from a video sequence has received much attention lately in the field of computer vision due to its range of applications in surveillance, healthcare, smart homes, and tele-immersion, to name but a few. However, it still faces several challenges, such as human variations, occlusion, changes in illumination, and complex backgrounds. In this article, we consider the problems of multiple human detection and classification using a novel statistical weighted segmentation and a rank correlation-based feature selection approach. Initially, preprocessing is performed on a set of frames to remove existing noise and to make the foreground maximally distinguishable from the background. A novel weighted segmentation method is also introduced for human extraction prior to feature extraction. Ternary features are exploited, including color, shape, and texture, which are later combined using a serial-based feature fusion method. To avoid redundancy, a rank correlation-based feature selection technique is employed, which acts as a feature optimizer and leads to improved classification accuracy. The proposed method is validated on six datasets, Weizmann, KTH, MuHAVi, WVU, UCF Sports, and MSR Action, and evaluated with seven performance measures. A fair comparison with existing work is also provided, which proves the significance of the proposed method compared to other techniques.
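Rank correlation-based feature selection can be illustrated with a minimal Spearman-style score: rank each feature and the target, then keep the features whose rank correlation is strongest. This is a generic sketch under our own assumptions (no tie handling, continuous target), not the paper's exact optimizer.

```python
import numpy as np

def ranks(x):
    """Simple rank transform (no tie averaging; fine for distinct values)."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    rx -= rx.mean(); ry -= ry.mean()
    return (rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry))

def select_features(X, y, k):
    """Keep the k features whose |rank correlation| with the target is largest."""
    scores = np.array([abs(spearman(X[:, j], y)) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(1)
y = np.arange(40, dtype=float)
X = np.column_stack([
    y + rng.normal(0, 1, 40),    # informative feature
    rng.normal(0, 1, 40),        # pure noise
    -y + rng.normal(0, 1, 40),   # informative (negatively correlated)
])
print(sorted(select_features(X, y, 2).tolist()))  # → [0, 2]: the noise column is dropped
```

Acting as a feature optimizer, such a filter shrinks the fused ternary feature vector before classification.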

42 citations

Journal ArticleDOI
TL;DR: A new foreground detection architecture is introduced based on information extracted from a Gaussian mixture model combined with the uniform motion of the magnitude of optical flow (MOF), along with a new representation technique for human action recognition based on the superposition of human silhouettes.
Abstract: The recognition of human actions in video sequences remains a challenging task in the computer vision community. Several techniques have been proposed to date, such as silhouette detection, local space-time features, and optical flow. In this paper, supervised learning followed by unsupervised learning based on the principle of the auto-encoder is proposed to address the problem. We introduce a new foreground detection architecture based on information extracted from a Gaussian mixture model (GMM) combined with the uniform motion of the magnitude of optical flow (MOF). We then use a fast dynamic frame-skipping technique to avoid frames that contain irrelevant motion, making it possible to decrease the computational complexity of silhouette extraction. Furthermore, a new representation technique is presented that constructs an informative concept for human action recognition based on the superposition of human silhouettes; we call this approach the history of binary motion image (HBMI). Our method has been evaluated by classification on the IXMAS, Weizmann, and KTH datasets. A sparse stacked auto-encoder (SSAE), an instance of a deep learning strategy, is presented for efficient human activity detection, with a Softmax classifier (SMC) for classification. The objective of this deep learning classifier is to learn feature hierarchies in which higher-level features are built from lower-level ones, providing an agile, robust, and simple method. The results prove the robustness of our proposed approach to irregularities in the performance of an action, shape distortion, changes of viewpoint, and significant changes of scale.
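The two silhouette-level ideas above — frame skipping and superposing binary silhouettes into a single motion template — reduce to a few lines. The change threshold and the toy sequence are our own assumptions; the paper's GMM/MOF foreground detector is not reproduced here.

```python
import numpy as np

def keep_informative(sils, min_change=1):
    """Fast frame skipping: drop frames whose silhouette barely changes
    (XOR with the last kept frame measures the amount of new motion)."""
    kept = [sils[0]]
    for s in sils[1:]:
        if np.logical_xor(s, kept[-1]).sum() >= min_change:
            kept.append(s)
    return kept

def binary_motion_image(silhouettes):
    """Superpose the binary silhouettes of a sequence into one template
    (an HBMI-style representation: the union of all silhouette pixels)."""
    return np.any(np.stack(silhouettes), axis=0)

# toy sequence: a 3x3 block sliding right across a 10x10 frame
sils = []
for t in range(4):
    s = np.zeros((10, 10), dtype=bool)
    s[4:7, 1 + 2 * t: 4 + 2 * t] = True
    sils.append(s)

kept = keep_informative(sils)
hbmi = binary_motion_image(kept)
print(len(kept), hbmi.sum())  # → 4 27: every frame moves; the union is a 3x9 swath
```

The superposed image compactly encodes where motion occurred over the whole sequence, ready to feed an auto-encoder.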

36 citations

Journal ArticleDOI
TL;DR: An improved cascaded design for human motion analysis is presented; it consolidates four phases: acquisition and preprocessing, frame segmentation, features extraction and dimensionality reduction, and classification.
Abstract: Human motion analysis has received a lot of attention in the computer vision community over the last few years. This research domain is supported by a wide spectrum of applications, including video surveillance, patient monitoring systems, and pedestrian detection, to name a few. In this study, an improved cascaded design for human motion analysis is presented; it consolidates four phases: (i) acquisition and preprocessing, (ii) frame segmentation, (iii) feature extraction and dimensionality reduction, and (iv) classification. The implemented architecture takes advantage of the CIE-Lab and National Television System Committee colour spaces, and also performs contrast stretching using the proposed red-green-blue* colour space enhancement technique. A parallel design utilising an attention-based motion estimation and segmentation module is also proposed in order to avoid the detection of false moving regions. In addition to these contributions, the proposed feature selection technique, called entropy-controlled principal components with weights minimisation, further improves the classification accuracy. The authors' claims are supported by a comparison between six state-of-the-art classifiers tested on five standard benchmark datasets, Weizmann, KTH, UIUC, MuHAVi, and WVU, where the results reveal correct classification rates of 96.55, 99.50, 99.40, 100, and 100%, respectively.
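The dimensionality-reduction phase rests on principal components. A loose analogue, plain PCA via SVD, is sketched below; the paper's entropy-controlled weighting and weights minimisation are not reproduced, and the synthetic data is our own.

```python
import numpy as np

def pca_reduce(X, k):
    """Project data onto the k top principal components (via SVD).
    Returns the projected data and the variance carried by each component."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, s[:k] ** 2 / len(X)

rng = np.random.default_rng(2)
# 100 samples in 5-D whose variance lives mostly in 2 latent directions
Z = rng.normal(size=(100, 2)) * np.array([5.0, 2.0])
A = rng.normal(size=(2, 5))
X = Z @ A + rng.normal(0, 0.1, size=(100, 5))

Y, var = pca_reduce(X, 2)
print(Y.shape)  # → (100, 2): 5-D features compressed to 2 components
```

A selection rule (entropy-controlled in the paper) would then decide how many components to keep and how to weight them.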

19 citations

References
Proceedings ArticleDOI
23 Jun 2008
TL;DR: A new method for video classification that builds upon and extends several recent ideas including local space-time features,space-time pyramids and multi-channel non-linear SVMs is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Abstract: The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show the benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
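The multi-channel non-linear SVM mentioned above typically combines per-channel histogram kernels. A common choice (which we assume here for illustration) is the exponential chi-square kernel, with channels, e.g. one per feature type, averaged into a single kernel matrix:

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """Exponential chi-square kernel between two sets of histograms
    (rows of H1 and H2 are normalized histograms)."""
    d = np.zeros((len(H1), len(H2)))
    for i, h in enumerate(H1):
        num = (h - H2) ** 2
        den = h + H2 + 1e-12
        d[i] = 0.5 * (num / den).sum(axis=1)
    return np.exp(-gamma * d)

def multichannel_kernel(channels_a, channels_b):
    """Average the per-channel kernels (one channel per feature type)."""
    Ks = [chi2_kernel(a, b) for a, b in zip(channels_a, channels_b)]
    return np.mean(Ks, axis=0)

rng = np.random.default_rng(3)
A = rng.random((4, 10))
A /= A.sum(axis=1, keepdims=True)     # normalize rows into histograms
K = chi2_kernel(A, A)
print(np.allclose(np.diag(K), 1.0))   # → True: identical histograms give kernel 1
```

The resulting precomputed kernel matrix is what a kernel SVM trains on; per-channel weights can replace the plain mean.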

3,833 citations

Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper constructs video representations in terms of local space-time features, integrates such representations with SVM classification schemes for recognition, and introduces a new video database of human actions for evaluation.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other related approaches for action recognition.

3,238 citations

Journal ArticleDOI
TL;DR: A view-based approach to the representation and recognition of human movement is presented, and a recognition method matching temporal templates against stored instances of views of known actions is developed.
Abstract: A view-based approach to the representation and recognition of human movement is presented. The basis of the representation is a temporal template: a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using aerobics exercises as a test domain, we explore the representational power of a simple, two-component version of the templates: the first value is a binary value indicating the presence of motion, and the second value is a function of the recency of motion in a sequence. We then develop a recognition method that matches temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real time on standard platforms.
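The second template component described above, a function of the recency of motion, is the classic motion history image (MHI): each motion mask sets its pixels to a maximum value tau, and all other pixels decay by one per frame. A minimal sketch (the toy masks are our own):

```python
import numpy as np

def motion_history_image(masks, tau):
    """Temporal template whose pixel value encodes the recency of motion:
    set to tau where motion occurs, decayed by 1 per frame elsewhere."""
    H = np.zeros(masks[0].shape)
    for D in masks:  # D: binary motion mask of one frame
        H = np.where(D, tau, np.maximum(H - 1, 0))
    return H

# toy motion: a pixel activates at successive positions, left to right
masks = [np.zeros((1, 5), dtype=bool) for _ in range(5)]
for t in range(5):
    masks[t][0, t] = True

H = motion_history_image(masks, tau=5)
print(H[0])  # → [1. 2. 3. 4. 5.]: the most recent motion is brightest
```

Thresholding H at any positive value yields the binary presence-of-motion component, so both template values come from the same recursion.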

2,932 citations

Proceedings ArticleDOI
15 Oct 2005
TL;DR: It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.
Abstract: A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.
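Spatio-temporal interest detectors of this family typically score each pixel by the energy of a quadrature pair of 1-D temporal Gabor filters (after spatial smoothing), which responds strongly to periodic motion. A sketch of the temporal part, with filter parameters chosen by us for illustration:

```python
import numpy as np

def periodic_response(signal, tau=2.0, omega=0.25):
    """Energy of a quadrature pair of 1-D temporal Gabor filters applied
    to a single pixel's intensity over time; large for periodic motion."""
    t = np.arange(-10, 11)
    g = np.exp(-t**2 / (2 * tau**2))
    h_even = -np.cos(2 * np.pi * omega * t) * g
    h_odd = -np.sin(2 * np.pi * omega * t) * g
    r_ev = np.convolve(signal, h_even, mode='same')
    r_od = np.convolve(signal, h_odd, mode='same')
    return r_ev**2 + r_od**2

t = np.arange(128)
periodic = np.sin(2 * np.pi * 0.25 * t)  # oscillating pixel (periodic motion)
constant = np.ones(128)                  # static pixel
print(periodic_response(periodic).mean() > periodic_response(constant).mean())  # → True
```

Local maxima of this response over space and time anchor the spatio-temporally windowed "cuboids" on which recognition is built.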

2,699 citations

Proceedings ArticleDOI
17 Oct 2005
TL;DR: The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known; its robustness to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video is demonstrated.
Abstract: Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. (2004) for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment, and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
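The Poisson-equation idea can be shown in 2D: solve -ΔU = 1 inside the silhouette with U = 0 on the boundary, so U grows large deep inside thick parts (torso) and stays small in thin protrusions (limbs). The paper works on volumetric space-time shapes; this 2D Jacobi-iteration sketch, with a toy silhouette of our own, only illustrates the principle.

```python
import numpy as np

def poisson_interior(mask, iters=500):
    """Solve -ΔU = 1 inside the silhouette (U = 0 outside) by Jacobi
    iteration: each interior pixel becomes the neighbour average plus 1/4."""
    U = np.zeros(mask.shape)
    for _ in range(iters):
        nb = (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
              np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(mask, (nb + 1.0) / 4.0, 0.0)
    return U

mask = np.zeros((15, 15), dtype=bool)
mask[2:13, 5:10] = True   # thick "torso"
mask[7, 0:5] = True       # thin protruding "limb"

U = poisson_interior(mask)
print(U[7, 7] > U[7, 2])  # → True: deep torso interior exceeds the thin limb
```

Derivatives and level sets of U then yield the saliency, dynamics, and structure features the abstract lists.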

2,186 citations