Journal Article

A Review on Human Activity Recognition Using Vision-Based Method

20 Jul 2017-Journal of Healthcare Engineering (Hindawi)-Vol. 2017, Article ID 3090343
TL;DR: This review highlights the advances of state-of-the-art activity recognition approaches, especially in activity representation and classification methods, and classifies the existing literature with a detailed taxonomy covering representation and classification methods, as well as the datasets used.
Abstract: Human activity recognition (HAR) aims to recognize activities from a series of observations on the actions of subjects and the environmental conditions. Vision-based HAR research is the basis of many applications, including video surveillance, health care, and human-computer interaction (HCI). This review highlights the advances of state-of-the-art activity recognition approaches, especially for the activity representation and classification methods. For the representation methods, we trace a chronological research trajectory from global representations to local representations and, more recently, depth-based representations. For the classification methods, we follow the categorization into template-based methods, discriminative models, and generative models, and review several prevalent methods. Next, representative and available datasets are introduced. To provide an overview of these methods and a convenient way of comparing them, we classify the existing literature with a detailed taxonomy covering representation and classification methods, as well as the datasets used. Finally, we investigate directions for future research.


Citations
Journal Article
TL;DR: This survey aims to provide a more comprehensive introduction to sensor-based human activity recognition (HAR) in terms of sensors, activities, data pre-processing, feature learning and classification, including both conventional approaches and deep learning methods.
Abstract: Increased life expectancy coupled with declining birth rates is leading to an aging population structure. Aging-caused changes, such as physical or cognitive decline, could affect people's quality of life and result in injuries, mental health problems, or a lack of physical activity. Sensor-based human activity recognition (HAR) is one of the most promising assistive technologies to support older people's daily life, with enormous potential in human-centred applications. Recent surveys in HAR focus either only on deep learning approaches or on one specific sensor modality. This survey aims to provide a more comprehensive introduction to HAR for newcomers and researchers. We first introduce the state-of-the-art sensor modalities in HAR. We then look more closely at the techniques involved in each step of wearable-sensor-centred HAR in terms of sensors, activities, data pre-processing, feature learning and classification, including both conventional approaches and deep learning methods. In the feature learning section, we focus on both hand-crafted features and features learned automatically by deep networks. We also present ambient-sensor-based HAR, including camera-based systems, and systems that combine wearable and ambient sensors. Finally, we identify the corresponding challenges in HAR to pose research problems for further improvement.

195 citations
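The survey above separates hand-crafted features from features learned by deep networks. As a rough illustration of the hand-crafted side only, the following Python sketch computes a few common time-domain statistics over one window of tri-axial accelerometer samples. The function name and the feature set (per-axis mean, standard deviation and range, mean magnitude, and signal magnitude area) are illustrative assumptions, not the survey's own pipeline.

```python
import math
from statistics import mean, pstdev


def handcrafted_features(window):
    """Compute simple time-domain features for one window of (x, y, z) samples.

    The feature set is an illustrative assumption, not the survey's own choice.
    """
    features = {}
    for axis, name in enumerate("xyz"):
        values = [sample[axis] for sample in window]
        features[f"mean_{name}"] = mean(values)
        features[f"std_{name}"] = pstdev(values)  # population standard deviation
        features[f"range_{name}"] = max(values) - min(values)
    # Euclidean magnitude of each sample, averaged over the window
    features["mean_mag"] = mean(math.sqrt(x * x + y * y + z * z) for x, y, z in window)
    # Signal magnitude area: mean of the summed absolute axis values
    features["sma"] = mean(abs(x) + abs(y) + abs(z) for x, y, z in window)
    return features


# Example: two samples from a hypothetical tri-axial accelerometer
print(handcrafted_features([(3.0, 4.0, 0.0), (-3.0, -4.0, 0.0)]))
```

A deep-learning pipeline would replace a function like this with features learned directly from the raw window; a conventional classifier would take the returned dictionary, flattened to a vector, as its input.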

Journal Article
TL;DR: The objective of this study was to systematically review the literature on machine and deep learning for sport-specific movement recognition using inertial measurement unit (IMU) and/or computer vision data inputs.
Abstract: Objective assessment of an athlete’s performance is of importance in elite sports to facilitate detailed analysis. The implementation of automated detection and recognition of sport-specific moveme...

147 citations


Cites background or methods from "A Review on Human Activity Recognition Using Vision-Based Method"

  • ...Vision-based methods for human activity recognition (Aggarwal & Xia, 2014; Bux et al., 2017; Ke et al., 2013; Zhang et al., 2017), semantic human activity recognition (Ziaeefard & Bergevin, 2015) and motion analysis in sport (Barris & Button, 2008) have also been reviewed....


  • ...Several challenges including occlusion, viewpoint variations, and environmental conditions may impact results, depending on the camera set-up (Poppe, 2010; Zhang et al., 2017)....


Journal Article
26 Feb 2021-Sensors
TL;DR: In this article, the authors propose a generic HAR framework for smartphone sensor data based on Long Short-Term Memory (LSTM) networks for time-series domains, together with a hybrid 4-layer CNN-LSTM network to improve recognition performance.
Abstract: Human Activity Recognition (HAR) employing inertial motion data has gained considerable momentum in recent years, both in research and industrial applications. Broadly, this has been driven by an acceleration in the building of intelligent and smart environments and systems that cover all aspects of human life, including healthcare, sports, manufacturing, and commerce. Such environments and systems necessitate and subsume activity recognition, aimed at recognizing the actions, characteristics, and goals of one or more individuals from a temporal series of observations streamed from one or more sensors. Because conventional Machine Learning (ML) techniques rely on handcrafted features in the extraction process, current research suggests that deep-learning approaches are more applicable to automated feature extraction from raw sensor data. In this work, a generic HAR framework for smartphone sensor data is proposed, based on Long Short-Term Memory (LSTM) networks for time-series domains. Four baseline LSTM networks are comparatively studied to analyze the impact of using different kinds of smartphone sensor data. In addition, a hybrid LSTM network called 4-layer CNN-LSTM is proposed to improve recognition performance. The HAR method is evaluated on the public smartphone-based UCI-HAR dataset through various combinations of sample generation processes (OW and NOW) and validation protocols (10-fold and LOSO cross-validation). Moreover, Bayesian optimization techniques are used in this study since they are well suited to tuning the hyperparameters of each LSTM network. The experimental results indicate that the proposed 4-layer CNN-LSTM network performs well in activity recognition, improving the average accuracy by up to 2.24% compared to prior state-of-the-art approaches.

106 citations
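The CNN-LSTM study above compares two sample generation processes (OW and NOW) and two validation protocols (10-fold and LOSO). The Python sketch below assumes that OW/NOW mean overlapping and non-overlapping sliding windows and that LOSO means leave-one-subject-out cross-validation; the helper names and the window sizes in the example are hypothetical, and this is not the authors' code.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(signal: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Split a 1-D signal into fixed-length windows.

    step < size gives overlapping windows (assumed OW); step == size gives
    non-overlapping windows (assumed NOW). Trailing samples that cannot fill
    a whole window are dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, step)]


def leave_one_subject_out(subjects: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


# Hypothetical example: 10 samples, window of 4
print(sliding_windows(list(range(10)), size=4, step=2))  # overlapping
print(sliding_windows(list(range(10)), size=4, step=4))  # non-overlapping
for fold in leave_one_subject_out(["a", "a", "b", "c"]):
    print(fold)
```

The point of LOSO is that no subject contributes windows to both the training and the test side of a fold. With overlapping windows, plain random k-fold splits can place near-duplicate windows on both sides, which tends to make k-fold estimates more optimistic than subject-wise ones.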

Journal Article
26 Apr 2018-Sensors
TL;DR: Given the obtained results, the rule-based systems represent a promising research line as they perform similarly to neural networks, but with a reduced computational cost, and support vector machines performed with a high specificity.
Abstract: Fall detection is a very important challenge that affects both elderly people and their carers. Improvements in fall detection would reduce the aid response time. This research focuses on a method for fall detection with a sensor placed on the wrist. Falls are detected using a published threshold-based solution, although a study on threshold tuning has also been carried out. The feature extraction is extended in order to balance the dataset for the minority class. Alternative models have been analyzed to reduce the computational constraints so that the solution can be embedded in smartphones or smart wristbands. Several published datasets are used, as described in the Materials and Methods section. Although these datasets do not include data from real falls of elderly people, a complete comparison study of fall-related datasets shows statistical differences between simulated falls and real falls from participants suffering from impairment diseases. Given the obtained results, rule-based systems represent a promising research line, as they perform similarly to neural networks but with a reduced computational cost. Furthermore, support vector machines achieved high specificity. However, further research is needed to validate the proposal in real online scenarios, and a slight improvement is still needed to reduce the number of false alarms.

105 citations


Cites background from "A Review on Human Activity Recognition Using Vision-Based Method"

  • ..., video systems [6]; nevertheless, the use of wearable devices is crucial because of the high percentage of elderly people and their desire to live autonomously in their own house [7]....
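The fall-detection study above builds on a published threshold-based detector whose exact rule is not given here. As a minimal sketch under that caveat, the following Python code flags wrist-accelerometer samples whose acceleration magnitude exceeds a fixed multiple of gravity, and scores binary predictions by sensitivity and specificity, the metric the abstract reports for support vector machines. The 2.5 g threshold and the function names are hypothetical placeholders.

```python
import math

G = 9.81  # standard gravity, m/s^2


def detect_falls(samples, threshold_g=2.5):
    """Return indices of samples whose acceleration magnitude exceeds threshold_g.

    samples: iterable of (x, y, z) accelerations in m/s^2.
    threshold_g: hypothetical threshold in multiples of g (not the published value).
    """
    hits = []
    for i, (x, y, z) in enumerate(samples):
        magnitude_g = math.sqrt(x * x + y * y + z * z) / G
        if magnitude_g > threshold_g:
            hits.append(i)
    return hits


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Resting at 1 g, then a hypothetical impact spike
print(detect_falls([(0.0, 0.0, 9.81), (0.0, 0.0, 30.0), (0.0, 0.0, 9.81)]))
print(sensitivity_specificity([1, 0, 0, 0], [1, 0, 0, 1]))
```

Each false alarm counts as a false positive and lowers specificity, so the false-alarm reduction the abstract calls for amounts to raising this number without losing sensitivity.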


Journal Article
01 Mar 2020
TL;DR: This paper reviews the existing methods according to the kind of issue they address and compares the datasets introduced for the human action recognition field.
Abstract: Within a large range of applications in computer vision, Human Action Recognition has become one of the most attractive research fields. Ambiguities in recognizing actions come not only from the difficulty of defining the motion of body parts, but also from many other challenges related to real-world problems such as camera motion, dynamic background, and bad weather conditions. There has been little research on human action recognition systems under real-world conditions, which motivates us to investigate this application domain seriously. Although a plethora of robust approaches have been introduced in the literature, they are still insufficient to fully cover these challenges. To compare the performance of these methods quantitatively and qualitatively, public datasets presenting various actions under several conditions and constraints have been recorded. In this paper, we review the existing methods according to the kind of issue they address. Moreover, we present a comparison of the existing datasets introduced for the human action recognition field.

103 citations

References
Proceedings Article
03 Dec 2012
TL;DR: A large deep convolutional neural network, with five convolutional layers (some followed by max-pooling layers) and three fully-connected layers ending in a 1000-way softmax, achieved state-of-the-art results on the ImageNet LSVRC-2010 classification task.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
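Two quantities in the ImageNet abstract above are easy to pin down in code: the top-1/top-5 error rates and dropout. The Python sketch below computes top-k error from per-class scores and applies dropout to a list of activations. It uses the "inverted" dropout variant, which rescales the surviving units during training; the paper instead multiplies outputs by 0.5 at test time, which is equivalent in expectation. The function names are illustrative.

```python
import random


def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


def dropout(activations, p, rng):
    """Inverted dropout for training: zero each unit with probability p and
    scale the survivors by 1 / (1 - p), so no rescaling is needed at test time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # two examples, three classes
labels = [2, 1]
print(top_k_error(scores, labels, k=1))  # neither true label is ranked first
print(top_k_error(scores, labels, k=2))  # both true labels are in the top two
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(0)))
```

Top-5 error is always less than or equal to top-1 error for the same scores, which is why the abstract's 17.0% sits below its 37.5%.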

Journal Article
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations
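The Nature review above credits backpropagation with indicating how a model should change the parameters of each layer. A minimal sketch, assuming a toy two-layer scalar network y_hat = w2 * tanh(w1 * x) with squared-error loss, applies the chain rule from the output back to the first layer and checks the result against a central finite-difference estimate. The network and the function names are hypothetical.

```python
import math


def forward(w1, w2, x):
    """Hidden activation h = tanh(w1 * x) and output y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(w1, w2, x, y):
    """Gradients of the loss w.r.t. w1 and w2, propagated layer by layer."""
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h               # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2               # error passed back to the hidden layer
    d_w1 = d_h * (1.0 - h * h) * x  # tanh'(u) = 1 - tanh(u)^2
    return d_w1, d_w2


def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2


w1, w2, x, y = 0.5, -0.3, 1.5, 0.8
print(backprop(w1, w2, x, y), numerical_grad(w1, w2, x, y))
```

A single gradient-descent step, w <- w - lr * grad, with a small learning rate then lowers the loss. Deep networks repeat the same local chain-rule step across many layers and parameters.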

Journal Article
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations
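The recognition step in the SIFT abstract above matches each feature against a database using nearest-neighbor search. The Python sketch below substitutes exhaustive Euclidean search for the fast nearest-neighbor algorithm the abstract mentions. It adds the 0.8 distance-ratio test described in the full paper, not in the abstract, to reject ambiguous matches. Descriptors are short tuples here, whereas real SIFT descriptors are 128-dimensional. The Hough clustering and least-squares pose verification stages are omitted, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def match_descriptors(query, database, ratio=0.8):
    """Return (query index, database index) pairs that pass the ratio test.

    A match is kept only if the nearest neighbor is clearly closer than the
    second-nearest one (distance ratio below `ratio`).
    """
    matches = []
    if len(database) < 2:
        return matches  # the ratio test needs at least two candidates
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


database = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = [(0.5, 0.0), (5.0, 5.0)]  # the second point is equidistant from all three
print(match_descriptors(query, database))
```

The second query point is rejected because all three database entries are equally close, which is the kind of ambiguous match the ratio test is designed to discard.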

Proceedings Article
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations


"A Review on Human Activity Recognition Using Vision-Based Method" refers to this paper for background

  • ...Dalal and Triggs [48] proposed the histogram of oriented gradients (HOG) descriptor and achieved great success in human detection with linear SVM classifier....
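The HOG pipeline summarized above depends on fine-scale gradients, fine orientation binning, and contrast normalization in overlapping blocks. A simplified Python sketch of those steps follows. It computes centered [-1, 0, 1] gradients and builds a magnitude-weighted 9-bin histogram of unsigned orientations (0 to 180 degrees) for one grayscale cell. It then L2-normalizes a block vector. It uses hard bin assignment instead of the interpolated voting of the full method, skips border pixels, and the function names are illustrative.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for the
    interior pixels of a grayscale cell (list of rows of floats)."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # centered [-1, 0, 1] mask
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude  # hard vote
    return hist


def l2_normalize(block, eps=1e-6):
    """L2 normalization v / sqrt(||v||^2 + eps^2) of a concatenated block vector."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]


ramp = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]  # intensity increases left to right
print(cell_histogram(ramp))  # all gradient energy lands in the 0-20 degree bin
print(l2_normalize([3.0, 4.0]))
```

A full descriptor would concatenate the histograms of neighboring cells into overlapping blocks, normalize each block, and feed the concatenated result to a linear SVM.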


Journal Article
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form; these results provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing

23,396 citations


"A Review on Human Activity Recognition Using Vision-Based Method" refers to this paper for methods

  • ...The authors first match feature points using two complementary descriptors (i.e., SURF and dense optical flow), then estimate the homography using RANSAC [54]....
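RANSAC, as introduced in the abstract above and used in the cited snippet for homography estimation, repeatedly fits a model to a minimal random sample and keeps the model with the largest consensus set. The Python sketch below applies the idea to the simplest case, a 2-D line fitted from two-point samples. A homography would instead need four point correspondences per sample. The iteration count, inlier threshold, and function names are illustrative assumptions, and the usual final least-squares refit on the inliers is omitted.

```python
import math
import random


def fit_line(p, q):
    """Line through p and q as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1.

    Returns None if the two points coincide.
    """
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Keep the two-point line whose consensus set (points within threshold) is largest."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # With (a, b) normalized, |a*x + b*y + c| is the point-to-line distance
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus three gross errors
pts = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(0.0, 10.0), (5.0, -5.0), (9.0, 0.0)]
model, inliers = ransac_line(pts)
print(model, len(inliers))
```

Because each hypothesis is built from a minimal sample, the three gross errors only spoil the iterations that happen to draw them, which is what lets RANSAC tolerate a significant percentage of outliers.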
