Proceedings ArticleDOI

An Improved Model for Human Activity Recognition by Integrated feature Approach and Optimized SVM

TL;DR: In this paper, an integrated feature approach using Histogram of Gradient (HOG) local feature descriptor and Principal Component Analysis (PCA) as a global feature is proposed in order to automatically recognize human activity from a video sequence.
Abstract: Lately, human activity recognition has attracted an increasing amount of attention from the research and industry communities. It is a steadily growing research area with valuable applications in surveillance, healthcare, and many real-life problems. This paper presents a technique to automatically recognize human activity from a video sequence. An integrated feature approach using the Histogram of Gradient (HOG) local feature descriptor and Principal Component Analysis (PCA) as a global feature is proposed. An optimized Support Vector Machine (SVM) and an Artificial Neural Network (ANN) are used as classifiers. The proposed model is trained and tested on the benchmark KTH dataset, and the results obtained are comparable with existing methods. The proposed technique achieved an activity recognition accuracy of 99.21 percent. The experimental results confirm that the integrated feature approach and classifier optimization techniques improve the performance of human activity recognition.
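The integrated-feature pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the crude whole-frame gradient histogram stands in for a real HOG descriptor, and a grid search over `C` and `gamma` stands in for whatever SVM optimization the paper actually uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def gradient_histogram(frame, bins=9):
    """Crude stand-in for a HOG descriptor: histogram of gradient
    orientations over the whole frame (real HOG uses local cells)."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

# Synthetic "video frames": two activity classes with different texture.
frames = np.stack(
    [rng.random((32, 32)) * (1 + 0.5 * c) + c * np.linspace(0, 1, 32)
     for c in (0, 1) for _ in range(40)]
)
labels = np.repeat([0, 1], 40)

local_feats = np.array([gradient_histogram(f) for f in frames])   # local (HOG-like)
global_feats = PCA(n_components=5).fit_transform(frames.reshape(len(frames), -1))
features = np.hstack([local_feats, global_feats])                 # integrated features

# "Optimized SVM": grid search over the RBF kernel hyper-parameters.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=3)
grid.fit(features, labels)
print(round(grid.best_score_, 2))
```

The key design point is the concatenation: a local descriptor and a global PCA projection are fused into one vector before classification, so the SVM sees both kinds of evidence at once.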
Citations
Journal ArticleDOI
01 Jun 2022-Entropy
TL;DR: In this paper, the authors improved the traditional cuckoo algorithm to optimize not only continuous hyper-parameters but also integer and mixed hyper-parameters.
Abstract: Activity recognition methods often include some hyper-parameters set by experience, which greatly affects their effectiveness. However, existing hyper-parameter optimization algorithms are mostly designed for continuous hyper-parameters, and rarely for integer or mixed hyper-parameters. To solve this problem, this paper improves the traditional cuckoo algorithm. The improved algorithm can optimize not only continuous hyper-parameters but also integer and mixed hyper-parameters. The proposed method is validated on the hyper-parameters of Least Squares Support Vector Machine (LS-SVM) and Long Short-Term Memory (LSTM) models, comparing activity recognition performance before and after optimization on a smart home activity recognition dataset. The results show that the improved cuckoo algorithm can effectively improve the performance of the model in activity recognition.

6 citations
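The mixed-hyper-parameter idea above can be sketched with a simplified cuckoo search: Lévy-flight moves toward the current best nest, plus random abandonment of the worst nests, with integer dimensions snapped to the grid after every move. The search space and the objective below are hypothetical stand-ins, not the paper's LS-SVM/LSTM setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixed search space: (name, low, high, is_integer)
space = [("C", 0.01, 100.0, False),      # continuous (e.g. an LS-SVM penalty)
         ("units", 4, 128, True)]        # integer (e.g. LSTM hidden units)

def objective(x):
    # Hypothetical stand-in for a validation error; minimum at C=10, units=64.
    c, units = x
    return (np.log10(c) - 1.0) ** 2 + ((units - 64) / 64.0) ** 2

def clip_round(x):
    out = []
    for v, (_, lo, hi, is_int) in zip(x, space):
        v = min(max(v, lo), hi)
        out.append(round(v) if is_int else v)   # integers snap to the grid
    return np.array(out, dtype=float)

def levy_step(size, beta=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths.
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

n_nests, n_iter, pa = 15, 200, 0.25
nests = np.array([clip_round([rng.uniform(lo, hi) for _, lo, hi, _ in space])
                  for _ in range(n_nests)])
fitness = np.array([objective(x) for x in nests])

for _ in range(n_iter):
    best = nests[fitness.argmin()]
    for i in range(n_nests):
        cand = clip_round(nests[i] + 0.1 * levy_step(len(space)) * (nests[i] - best))
        f = objective(cand)
        if f < fitness[i]:
            nests[i], fitness[i] = cand, f
    # Abandon a fraction pa of the worst nests (discovery by the host bird).
    worst = fitness.argsort()[-int(pa * n_nests):]
    for i in worst:
        nests[i] = clip_round([rng.uniform(lo, hi) for _, lo, hi, _ in space])
        fitness[i] = objective(nests[i])

best = nests[fitness.argmin()]
print(best, fitness.min())
```

The one change that makes the search "mixed" is `clip_round`: the Lévy update itself is continuous, and the integer dimensions are repaired after each move rather than handled by a separate discrete operator.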

Journal ArticleDOI
Aihua Sun
TL;DR: In this article, video shortening is utilized to discover key frames and select relevant sequences in two datasets; the selected frames are resized using adaptive frame cropping, and the resulting images are given as input to the proposed Convolutional Neural Network (CNN).
Abstract: Human Activity Recognition (HAR) is a field that analyzes raw time-series signals from sensors embedded in smartphones and wearable devices to determine what people are doing. It has become very popular in many smart home environments, especially for monitoring behavior in ambient assisted living to help the elderly and support their rehabilitation. Such a system goes through a series of steps to acquire data, remove noise and distortions, extract features, select features, and classify them. Recently, a number of state-of-the-art techniques have proposed ways to extract and select features, which are then classified using traditional machine learning. Although many techniques use simple feature extraction processes, they are unable to identify complicated actions. As high-performance computers become more common, many HAR systems now use deep learning algorithms to swiftly discover and categorize characteristics. In this study, video shortening is utilized to discover key frames and select relevant sequences in two datasets. The selected frames are resized using adaptive frame cropping, and the resulting images are given as input to the proposed Convolutional Neural Network (CNN). The experiments evaluate various activities in terms of classification rate. The results show that the CNN achieved a classification rate of 98.22% on the Weizmann action dataset.
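The preprocessing chain described here (key-frame discovery, adaptive cropping, resizing to a fixed CNN input) can be sketched as below. The frame-difference scoring, the activity threshold, and the nearest-neighbour resize are all illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(2)

def key_frames(video, top_k=4):
    """Select frames with the largest change from their predecessor."""
    diffs = np.abs(np.diff(video, axis=0)).mean(axis=(1, 2))
    return np.sort(diffs.argsort()[-top_k:]) + 1   # +1: diff i scores frame i+1

def adaptive_crop_resize(frame, out=(16, 16), thresh=0.5):
    """Crop to the bounding box of 'active' pixels, then nearest-neighbour
    resize to a fixed CNN input size."""
    ys, xs = np.where(frame > thresh)
    if len(ys) == 0:                      # nothing active: keep the full frame
        crop = frame
    else:
        crop = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    ry = np.linspace(0, crop.shape[0] - 1, out[0]).round().astype(int)
    rx = np.linspace(0, crop.shape[1] - 1, out[1]).round().astype(int)
    return crop[np.ix_(ry, rx)]

# Synthetic 20-frame clip with a bright square moving left to right.
video = rng.random((20, 32, 32)) * 0.3
for t in range(20):
    video[t, 5:12, t:t + 7] = 1.0

idx = key_frames(video)
inputs = np.stack([adaptive_crop_resize(video[i]) for i in idx])
print(idx, inputs.shape)
```

Whatever the scoring rule, the output is a small fixed-shape batch, which is the point: the CNN downstream only ever sees a few informative, consistently sized crops instead of the whole clip.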
References
Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper constructs video representations in terms of local space-time features, integrates them with SVM classification schemes for recognition, and presents action recognition results on a new video database.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other related approaches for action recognition.

3,238 citations
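A common way to turn local space-time features into a video representation for an SVM is a bag-of-words pipeline, sketched below on synthetic clips. The random cuboid sampling and temporal-gradient descriptor are simplifying assumptions standing in for real space-time interest point detection.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def cuboid_descriptors(video, n=30, size=5):
    """Sample small space-time cuboids and describe each by its flattened
    temporal-gradient pattern (a stand-in for real local space-time
    feature descriptors)."""
    T, H, W = video.shape
    feats = []
    for _ in range(n):
        t = rng.integers(0, T - size)
        y = rng.integers(0, H - size)
        x = rng.integers(0, W - size)
        cube = video[t:t + size, y:y + size, x:x + size]
        feats.append(np.diff(cube, axis=0).ravel())
    return np.array(feats)

def make_video(label):
    v = rng.random((16, 24, 24)) * 0.2
    if label:                      # class 1: strong periodic motion
        v += 0.8 * np.sin(np.arange(16))[:, None, None]
    return v

videos = [make_video(c) for c in (0, 1) for _ in range(10)]
labels = np.repeat([0, 1], 10)

# Bag-of-words: cluster all descriptors into a visual vocabulary, then
# represent each video as a histogram over the vocabulary entries.
all_desc = np.vstack([cuboid_descriptors(v) for v in videos])
vocab = KMeans(n_clusters=8, n_init=4, random_state=0).fit(all_desc)
hists = np.array([np.bincount(vocab.predict(cuboid_descriptors(v)),
                              minlength=8) for v in videos])

clf = SVC(kernel="rbf").fit(hists, labels)
print(clf.score(hists, labels))
```

The histogram step is what makes variable-length videos comparable: each clip becomes a fixed-length count vector regardless of its duration or how many local features fired.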

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A new ConvNet architecture for spatiotemporal fusion of video snippets is proposed, and its performance on standard benchmarks where this architecture achieves state-of-the-art results is evaluated.
Abstract: Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. We make the following findings: (i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters, (ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class prediction layer can boost accuracy, finally (iii) that pooling of abstract convolutional features over spatiotemporal neighbourhoods further boosts performance. Based on these studies we propose a new ConvNet architecture for spatiotemporal fusion of video snippets, and evaluate its performance on standard benchmarks where this architecture achieves state-of-the-art results.

2,057 citations
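The paper's finding (i), fusing the two streams at a convolution layer instead of at the softmax, can be illustrated in a few lines of numpy: stack the spatial and temporal feature maps channel-wise and mix them with a learned 1x1 convolution. The shapes and the random "learned" filter bank below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)

def conv_fusion(spatial, temporal, w):
    """Fuse two streams at a conv layer: concatenate the (C, H, W)
    feature maps channel-wise, then mix with a 1x1 convolution
    given as a (C_out, 2C) filter bank."""
    stacked = np.concatenate([spatial, temporal], axis=0)   # (2C, H, W)
    return np.einsum('oc,chw->ohw', w, stacked)             # (C_out, H, W)

C, H, W = 16, 7, 7
spatial = rng.standard_normal((C, H, W))    # appearance-stream conv maps
temporal = rng.standard_normal((C, H, W))   # motion-stream conv maps
w = rng.standard_normal((C, 2 * C)) / np.sqrt(2 * C)

fused = conv_fusion(spatial, temporal, w)
print(fused.shape)
```

After this point a single tower processes the fused maps, which is where the parameter saving over two full towers fused at the softmax comes from.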

Journal ArticleDOI
TL;DR: This research proposes a hybrid strategy for efficient classification of human activities from a given video sequence by integrating four major steps: segmenting moving objects by fusing novel uniform segmentation and expectation maximization, extracting a new set of fused features using local binary patterns with histogram of oriented gradients and Haralick features, selecting features via a novel Euclidean distance and joint entropy-PCA-based method, and classifying features using a multi-class support vector machine.
Abstract: Human activity monitoring in video sequences is an intriguing computer vision domain with numerous applications, e.g., surveillance systems, human-computer interaction, and traffic control systems. In this research, our primary focus is a hybrid strategy for efficient classification of human activities from a given video sequence. The proposed method integrates four major steps: (a) segment the moving objects by fusing novel uniform segmentation and expectation maximization, (b) extract a new set of fused features using local binary patterns with histogram of oriented gradients and Haralick features, (c) select features by a novel Euclidean distance and joint entropy-PCA-based method, and (d) classify features using a multi-class support vector machine. Three benchmark datasets (MIT, CAVIAR, and BMW-10) are used to train the classifier for human classification; for testing, we utilize multi-camera pedestrian videos along with the MSR Action, INRIA, and CASIA datasets. Additionally, the results are validated on a dataset recorded by our research group. For action recognition, four publicly available datasets are selected (Weizmann, KTH, UIUC, and Muhavi), achieving recognition rates of 95.80%, 99.30%, 99.00%, and 99.40%, respectively, which confirm the effectiveness of the proposed work. Promising results are achieved with greater precision compared to existing techniques.

105 citations
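Steps (b) and (c) of this hybrid pipeline, fusing several feature families and then pruning them before a multi-class SVM, can be sketched as follows. Every descriptor here is a deliberately simplified stand-in: the 2-bit LBP, the whole-image gradient histogram replacing HOG, the global intensity statistics replacing Haralick features, and a variance ranking plus PCA approximating the paper's Euclidean distance and joint entropy-PCA selection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(5)

def lbp_like(img):
    """Simplified local binary pattern: compare each pixel with its
    right and lower neighbours and histogram the 2-bit codes."""
    c = img[:-1, :-1]
    code = (img[:-1, 1:] > c).astype(int) + 2 * (img[1:, :-1] > c)
    return np.bincount(code.ravel(), minlength=4) / code.size

def grad_hist(img, bins=8):
    gy, gx = np.gradient(img)
    ang = np.arctan2(gy, gx) % np.pi
    h, _ = np.histogram(ang, bins=bins, range=(0, np.pi),
                        weights=np.hypot(gx, gy))
    return h / (h.sum() + 1e-8)

def texture_stats(img):
    """Crude stand-in for Haralick texture features: global statistics."""
    return np.array([img.mean(), img.std(), ((img - img.mean()) ** 3).mean()])

# Three synthetic classes differing by the strength of a diagonal stripe.
imgs = np.stack([rng.random((24, 24)) + 0.6 * c * np.eye(24)
                 for c in range(3) for _ in range(15)])
labels = np.repeat([0, 1, 2], 15)

# Fusion: concatenate all feature families into one vector per image.
fused = np.hstack([np.array([f(i) for i in imgs])
                   for f in (lbp_like, grad_hist, texture_stats)])

# Selection: variance ranking + PCA (an approximation of the paper's
# Euclidean distance and joint entropy-PCA-based selection).
keep = np.argsort(fused.var(axis=0))[-8:]
selected = PCA(n_components=5).fit_transform(fused[:, keep])

clf = SVC(decision_function_shape='ovr').fit(selected, labels)
print(clf.score(selected, labels))
```

The fusion-then-selection order matters: concatenation lets complementary cues vote together, while the selection stage keeps the final SVM input compact enough to train quickly.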

Proceedings ArticleDOI
01 Jan 2019
TL;DR: Experimental results show that the proposed methodology is reliable in complex realistic settings and applicable in security systems, e-learning, smart surveillance, violence detection, child abuse protection, elderly care, virtual games, intelligent video retrieval and human-computer interaction.
Abstract: Tracking human activities and analyzing their effects in real-life settings has become a task of high interest within the computer vision field, as it enables many industrial and commercial applications. Sensors and communication technologies are being used for capturing human movements and providing interactive interfaces for man-machine collaboration. However, introducing intelligent computing for better scene understanding is still an underexplored domain. In this paper, we have made an effort to let machines understand behaviors in outdoor environments by proposing a novel methodology to recognize human interactions. The objective of this research is to embed cognitive processes in information technologies to explore new directions of intelligent media. Our proposed human activity recognition (HAR) system recognizes eight complex human activities taken from the BIT-Interaction dataset: bow, boxing, handshake, high-five, hug, kick, pat and push. We have designed multiple feature algorithms along with a convolutional neural network (CNN) and evaluated the performance of our system against other state-of-the-art classifiers. Experimental results show that the proposed methodology is reliable in complex realistic settings and applicable in security systems, e-learning, smart surveillance, violence detection, child abuse protection, elderly care, virtual games, intelligent video retrieval and human-computer interaction.

83 citations

Journal ArticleDOI
TL;DR: A novel human action recognition method is proposed that embeds a frames-fusion step, working on the principle of pixel similarity, into existing techniques, improving recognition rate and accuracy.
Abstract: In video sequences, human action recognition is a challenging computer vision problem due to motion variation, person-to-person differences within frames, and varied video recording settings. Over the last few years, applications of human activity recognition have increased significantly. Many techniques for human action recognition have been implemented in the literature, but they still face problems with foreground region contrast, segmentation, feature extraction, and feature selection. This article contributes a novel human action recognition method by embedding a proposed frames-fusion step working on the principle of pixel similarity. An improved hybrid feature extraction increases the recognition rate and allows efficient classification in complex environments. The design consists of four phases: (a) enhancement of video frames, (b) threshold-based background subtraction and construction of a saliency map, (c) feature extraction and selection, and (d) a neural network (NN) for human action classification. Results have been tested on five benchmark datasets, including Weizmann, KTH, UIUC, Muhavi, and WVU, obtaining recognition rates of 97.2%, 99.8%, 99.4%, 99.9%, and 99.9%, respectively. Contingency tables and graphical curves support these claims. Comparison with existing techniques confirms the recognition rate and accuracy of the proposed method.

81 citations
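Phases (a)-(b) of this design, pixel-similarity frame fusion followed by threshold-based background subtraction into a saliency map, can be sketched as below. The similarity threshold, the fusion rule (average similar pixels, keep the newer value otherwise), and the toy scene are all assumptions for illustration, since the abstract does not give the exact formulas.

```python
import numpy as np

rng = np.random.default_rng(6)

def fuse_frames(f1, f2, sim_thresh=0.2):
    """Fuse two frames on the principle of pixel similarity: similar
    pixels are averaged (stable background), dissimilar pixels keep
    the newer value (moving foreground)."""
    similar = np.abs(f1 - f2) < sim_thresh
    return np.where(similar, (f1 + f2) / 2, f2)

def saliency_map(frame, background, thresh=0.3):
    """Threshold-based background subtraction into a binary saliency map."""
    return (np.abs(frame - background) > thresh).astype(float)

# Toy scene: a dim background and a bright actor moving to the right.
background = rng.random((32, 32)) * 0.2
f1 = background.copy(); f1[10:18, 8:16] += 0.7    # actor at position A
f2 = background.copy(); f2[10:18, 12:20] += 0.7   # actor moved right

fused = fuse_frames(f1, f2)
sal = saliency_map(fused, background)
print(sal.sum())
```

In this sketch the fused frame keeps the actor only at the newer position, so the saliency map lights up exactly the current foreground region; in the paper, feature extraction for the NN classifier would run on that salient region rather than the full frame.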