
Showing papers on "Activity recognition published in 2020"


Journal ArticleDOI
TL;DR: This work introduces a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames, and investigates a novel one-shot 3D activity recognition problem on this dataset.
Abstract: Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.

837 citations


Journal ArticleDOI
Yiqiang Chen, Xin Qin, Jindong Wang, Chaohui Yu, Wen Gao
TL;DR: FedHealth is proposed, the first federated transfer learning framework for wearable healthcare that performs data aggregation through federated learning, and then builds relatively personalized models by transfer learning.
Abstract: With the rapid development of computing technology, wearable devices make it easy to access people's health information. Smart healthcare has achieved great success by training machine learning models on large quantities of personal user data. However, there are two critical challenges. First, user data often exist as isolated islands, making it difficult to perform aggregation without compromising privacy and security. Second, models trained in the cloud fail at personalization. In this article, we propose FedHealth, the first federated transfer learning framework for wearable healthcare, to tackle these challenges. FedHealth performs data aggregation through federated learning and then builds relatively personalized models by transfer learning. Experiments on wearable activity recognition and a real Parkinson's disease auxiliary diagnosis application show that FedHealth achieves accurate and personalized healthcare without compromising privacy and security. FedHealth is general and extensible to many healthcare applications.

486 citations
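
The two ingredients FedHealth combines are federated aggregation of client models and a local transfer/personalization step. The following is a minimal PyTorch sketch of that pattern (not the authors' released code); the model, client data loaders, and hyperparameters are illustrative placeholders.

```python
# Hedged sketch: federated averaging + local fine-tuning, assuming each
# client holds its own PyTorch model and a DataLoader over its private data.
import copy
import torch
import torch.nn as nn

def federated_average(client_models):
    """Average the parameters of the client models into a new global model."""
    global_model = copy.deepcopy(client_models[0])
    avg_state = global_model.state_dict()
    for key in avg_state:
        avg_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_models]).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model

def personalize(global_model, local_loader, epochs=1, lr=1e-3):
    """Transfer step: fine-tune a copy of the global model on one user's data."""
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in local_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```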


Posted Content
TL;DR: This study presents a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition and proposes a new taxonomy to structure the deep methods by challenges.
Abstract: The vast proliferation of sensor devices and Internet of Things enables the applications of sensor-based activity recognition. However, there exist substantial challenges that could influence the performance of the recognition system in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, plenty of deep methods have been investigated to address the challenges in activity recognition. In this study, we present a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information for public datasets that can be used for evaluation in different challenge tasks. We then propose a new taxonomy to structure the deep methods by challenges. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss the open issues and provide some insights for future directions.

255 citations


Journal ArticleDOI
TL;DR: This paper proposes a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities from multimodal wearable sensory data, and exploits the independence of multi-modalities of sensory data and attentively identify salient regions that are indicative of human activities from inputs by the authors' recurrent convolutional attention networks.
Abstract: Recent years have witnessed the success of deep learning methods in human activity recognition (HAR). The longstanding shortage of labeled activity data inherently calls for a plethora of semisupervised learning methods, and one of the most challenging and common issues with semisupervised learning is the imbalanced distribution of labeled data over classes. Although the problem has long existed in broad real-world HAR applications, it is rarely explored in the literature. In this paper, we propose a semisupervised deep model for imbalanced activity recognition from multimodal wearable sensory data. We aim to address not only the challenges of multimodal sensor data (e.g., interperson variability and interclass similarity) but also the limited labeled data and class-imbalance issues simultaneously. In particular, we propose a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities. Furthermore, we exploit the independence of multi-modalities of sensory data and attentively identify salient regions that are indicative of human activities from inputs by our recurrent convolutional attention networks. Our experimental results demonstrate that the proposed model achieves a competitive performance compared to a multitude of state-of-the-art methods, both semisupervised and supervised ones, with 10% labeled training data. The results also show the robustness of our method over imbalanced, small training data sets.

245 citations


Journal ArticleDOI
TL;DR: The existing wireless sensing systems are surveyed in terms of their basic principles, techniques and system structures to describe how the wireless signals could be utilized to facilitate an array of applications including intrusion detection, room occupancy monitoring, daily activity recognition, gesture recognition, vital signs monitoring, user identification and indoor localization.
Abstract: With the advancement of wireless technologies and sensing methodologies, many studies have shown the success of re-using wireless signals (e.g., WiFi) to sense human activities and thereby realize a set of emerging applications, ranging from intrusion detection, daily activity recognition and gesture recognition to vital signs monitoring and user identification involving even finer-grained motion sensing. These applications can arguably brace various domains for smart home and office environments, including safety protection, well-being monitoring/management, smart healthcare and smart-appliance interaction. The movements of the human body impact the wireless signal propagation (e.g., reflection, diffraction and scattering), which provides great opportunities to capture human motions by analyzing the received wireless signals. Researchers take advantage of the existing wireless links among mobile/smart devices (e.g., laptops, smartphones, smart thermostats, smart refrigerators and virtual assistance systems) by either extracting the ready-to-use signal measurements or adopting frequency-modulated signals to detect the frequency shift. Due to its low-cost and non-intrusive sensing nature, wireless-based human activity sensing has drawn considerable attention and become a prominent research field over the past decade. In this paper, we survey the existing wireless sensing systems in terms of their basic principles, techniques and system structures. In particular, we describe how wireless signals can be utilized to facilitate an array of applications including intrusion detection, room occupancy monitoring, daily activity recognition, gesture recognition, vital signs monitoring, user identification and indoor localization. The future research directions and limitations of using wireless signals for human activity sensing are also discussed.

185 citations


Proceedings ArticleDOI
01 Feb 2020
TL;DR: This paper proposes a holistic deep learning-based activity recognition architecture, a convolutional neural network-long short-term memory network (CNN-LSTM), which not only improves the predictive accuracy of human activities from raw data but also reduces the complexity of the model while eliminating the need for advanced feature engineering.
Abstract: To understand human behavior and intrinsically anticipate human intentions, research into human activity recognition (HAR) using sensors in wearable and handheld devices has intensified. The ability of a system to use as few resources as possible to recognize a user's activity from raw data is what many researchers are striving for. In this paper, we propose a holistic deep learning-based activity recognition architecture, a convolutional neural network-long short-term memory network (CNN-LSTM). This CNN-LSTM approach not only improves the predictive accuracy of human activities from raw data but also reduces the complexity of the model while eliminating the need for advanced feature engineering. The CNN-LSTM network is both spatially and temporally deep. Our proposed model achieves 99% accuracy on the iSPL dataset, an internal dataset, and 92% accuracy on the UCI HAR public dataset. We also compared its performance against other approaches. It competes favorably against other deep neural network (DNN) architectures that have been proposed in the past and against machine learning models that rely on manually engineered feature datasets.

173 citations
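
A minimal PyTorch sketch of this family of models (1D convolutions for per-step feature extraction, an LSTM for temporal order) follows; the layer sizes and window length are illustrative assumptions, not the published configuration.

```python
# Hedged CNN-LSTM sketch for windowed sensor data of shape (batch, channels, time).
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_channels=9, n_classes=6, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, channels, time)
        feats = self.conv(x)              # (batch, 64, time/2)
        feats = feats.permute(0, 2, 1)    # (batch, time/2, 64) for the LSTM
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])        # classify from the last time step

model = CNNLSTM()
logits = model(torch.randn(8, 9, 128))    # e.g. 128-sample windows, 9 channels
```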


Journal ArticleDOI
TL;DR: This article focuses on the deep-learning-enhanced HAR in IoHT environments, and a semisupervised deep learning framework is designed and built for more accurate HAR, which efficiently uses and analyzes the weakly labeled sensor data to train the classifier learning model.
Abstract: Along with the advancement of several emerging computing paradigms and technologies, such as cloud computing, mobile computing, artificial intelligence, and big data, Internet of Things (IoT) technologies have been applied in a variety of fields. In particular, the Internet of Healthcare Things (IoHT) is becoming increasingly important in human activity recognition (HAR) due to the rapid development of wearable and mobile devices. In this article, we focus on deep-learning-enhanced HAR in IoHT environments. A semisupervised deep learning framework is designed and built for more accurate HAR, which efficiently uses and analyzes weakly labeled sensor data to train the classifier learning model. To better solve the problem of inadequately labeled samples, an intelligent autolabeling scheme based on a deep Q-network (DQN) is developed with a newly designed distance-based reward rule, which can improve the learning efficiency in IoT environments. A multisensor-based data fusion mechanism is then developed to seamlessly integrate on-body sensor data, context sensor data, and personal profile data, and a long short-term memory (LSTM)-based classification method is proposed to identify fine-grained patterns according to the high-level features contextually extracted from the sequential motion data. Finally, experiments and evaluations are conducted to demonstrate the usefulness and effectiveness of the proposed method using real-world data.

171 citations


Journal ArticleDOI
TL;DR: In this article, the authors focused on the critical role of machine learning in developing HAR applications based on inertial sensors in conjunction with physiological and environmental sensors; HAR is considered one of the most promising assistive technology tools to support the elderly's daily life by monitoring their cognitive and physical function through daily activities.
Abstract: In the last decade, Human Activity Recognition (HAR) has become a vibrant research area, especially due to the spread of electronic devices such as smartphones, smartwatches and video cameras in our daily lives. In addition, the advance of deep learning and other machine learning algorithms has allowed researchers to use HAR in various domains including sports, health and well-being applications. For example, HAR is considered one of the most promising assistive technology tools to support the elderly's daily life by monitoring their cognitive and physical function through daily activities. This survey focuses on the critical role of machine learning in developing HAR applications based on inertial sensors in conjunction with physiological and environmental sensors.

168 citations


Journal ArticleDOI
TL;DR: In this article, a deep neural network architecture for human activity recognition based on multiple sensor data is proposed, which encodes the time series of sensor data as images and leverages these transformed images to retain the necessary features for activity recognition.

123 citations
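
The abstract above is not shown in full, so the paper's exact encoding is unknown; one common way to turn windowed sensor time series into images for a 2D CNN is sketched below purely as an illustration.

```python
# Illustrative sketch (assumed, not the paper's method): each sensor channel
# of a fixed-length window becomes one row of a 2D grayscale image.
import numpy as np

def window_to_image(window):
    """window: (channels, time) float array -> (channels, time) image in [0, 1]."""
    mins = window.min(axis=1, keepdims=True)
    maxs = window.max(axis=1, keepdims=True)
    return (window - mins) / (maxs - mins + 1e-8)

window = np.random.randn(9, 128)     # 9 sensor channels, 128 samples
image = window_to_image(window)      # feed to a 2D CNN as a 1-channel image
```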


Journal ArticleDOI
06 Jan 2020-Sensors
TL;DR: This research proposes a hybrid feature selection process, combining a filter and a wrapper method, that works efficiently with limited hardware resources and provides satisfactory activity identification.
Abstract: Human activity recognition (HAR) techniques are playing a significant role in monitoring the daily activities of human life such as elderly care, investigation activities, healthcare, sports, and smart homes. Smartphones incorporating a variety of motion sensors, such as accelerometers and gyroscopes, are widely used inertial sensing platforms that can identify different physical conditions of humans. Much recent work has addressed human activity recognition. Smartphone sensor data produce high-dimensional feature vectors for identifying human activities, but not all features contribute equally to the identification process; including all of them creates the phenomenon known as the 'curse of dimensionality'. This research proposes a hybrid feature selection process, which combines a filter and a wrapper method. The process uses a sequential floating forward search (SFFS) to extract the desired features for better activity recognition. Features are then fed to a multiclass support vector machine (SVM) to create nonlinear classifiers by adopting the kernel trick for training and testing purposes. We validated our model with a benchmark dataset. Our proposed system works efficiently with limited hardware resources and provides satisfactory activity identification.

122 citations
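
The shape of this pipeline can be sketched with off-the-shelf tools: sequential floating forward selection followed by an RBF-kernel multiclass SVM. The sketch below uses mlxtend's SFS implementation and placeholder features/labels; it is not the authors' code, and the feature counts are assumptions.

```python
# Hedged sketch: SFFS feature selection wrapped around an RBF SVM.
import numpy as np
from sklearn.svm import SVC
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X = np.random.randn(500, 60)              # 60 hand-crafted features per window
y = np.random.randint(0, 6, size=500)     # 6 activity classes

svm = SVC(kernel="rbf", C=10.0, gamma="scale")
sffs = SFS(svm, k_features=20, forward=True, floating=True,
           scoring="accuracy", cv=5)
sffs = sffs.fit(X, y)
X_selected = sffs.transform(X)
svm.fit(X_selected, y)                     # final classifier on selected features
```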


Journal ArticleDOI
TL;DR: A comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition, with a main focus on device-free solutions; a new taxonomy is also proposed for categorizing the research work conducted in the field of activity recognition.

Journal ArticleDOI
TL;DR: In this paper, a novel cloud-edge based federated learning framework for in-home health monitoring is proposed, which learns a shared global model in the cloud from multiple homes at the network edges and achieves data privacy protection by keeping user data locally.
Abstract: In-home health monitoring has attracted great attention for the ageing population worldwide. With the abundant user health data accessed by Internet of Things (IoT) devices and recent development in machine learning, smart healthcare has seen many successful stories. However, existing approaches for in-home health monitoring do not pay sufficient attention to user data privacy and thus are far from being ready for large-scale practical deployment. In this paper, we propose FedHome, a novel cloud-edge based federated learning framework for in-home health monitoring, which learns a shared global model in the cloud from multiple homes at the network edges and achieves data privacy protection by keeping user data locally. To cope with the imbalanced and non-IID distribution inherent in user's monitoring data, we design a generative convolutional autoencoder (GCAE), which aims to achieve accurate and personalized health monitoring by refining the model with a generated class-balanced dataset from user's personal data. Besides, GCAE is lightweight to transfer between the cloud and edges, which is useful to reduce the communication cost of federated learning in FedHome. Extensive experiments based on realistic human activity recognition data traces corroborate that FedHome significantly outperforms existing widely-adopted methods.

Journal ArticleDOI
TL;DR: A novel hybrid fusion scheme is proposed to combine soft and hard fusion to push the classification performance to approximately 96% accuracy in identifying continuous activities and fall events.
Abstract: This paper presents a framework based on a multi-layer bi-LSTM (bidirectional Long Short-Term Memory) network for multimodal sensor fusion to sense and classify daily activity patterns and high-risk events such as falls. The data collected in this work are continuous activity streams from FMCW radar and three wearable inertial sensors on the wrist, waist, and ankle. Each activity has a variable duration in the data stream, so the transitions between activities can happen at random times within the stream, without resorting to conventional fixed-duration snapshots. The proposed bi-LSTM implements soft feature fusion between wearable sensor and radar data, as well as two robust hard-fusion methods using the confusion matrices of both sensors. A novel hybrid fusion scheme is then proposed to combine soft and hard fusion to push the classification performance to approximately 96% accuracy in identifying continuous activities and fall events. These fusion schemes, implemented with the proposed bi-LSTM network, are compared with a conventional sliding-window approach, and all are validated with a realistic “leaving one participant out” (L1PO) method (i.e., testing subjects unknown to the classifier). The developed hybrid-fusion approach stabilizes the classification performance among different participants, reducing the accuracy variance by up to 18.1% and increasing the minimum, worst-case accuracy by up to 16.2%.

Journal ArticleDOI
TL;DR: This work proposes a novel approach that utilizes the convolutional neural networks (CNNs) and the attention mechanism for HAR that is improved by incorporating attention into multihead CNNs for better feature extraction and selection.
Abstract: Together with the fast advancement of the Internet of Things (IoT), smart healthcare applications and systems are equipped with increasingly more wearable sensors and mobile devices. These sensors are used not only to collect data but also, and more importantly, to assist in tracking and analyzing the daily activities of their users. Various human activity recognition (HAR) approaches are used to enhance such tracking. Most of the existing HAR methods depend on exploratory case-based shallow feature learning architectures, which struggle with correct activity recognition when put into real-life practice. To tackle this problem, we propose a novel approach that utilizes convolutional neural networks (CNNs) and the attention mechanism for HAR. In the presented method, the activity recognition accuracy is improved by incorporating attention into multihead CNNs for better feature extraction and selection. Proof-of-concept experiments are conducted on a publicly available data set from the wireless sensor data mining (WISDM) lab. The results demonstrate a higher accuracy of our proposed approach in comparison with the current methods.
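
A rough sketch of what "attention incorporated into multihead CNNs" can look like follows; the head count, kernel sizes, and attention form are assumptions for illustration, not the authors' exact architecture.

```python
# Hedged sketch: several 1D-CNN "heads" with different kernel sizes, then a
# soft attention layer that weights time steps before classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiheadCNNAttention(nn.Module):
    def __init__(self, n_channels=3, n_classes=6, per_head=32):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Conv1d(n_channels, per_head, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        feat = per_head * 3
        self.attn = nn.Linear(feat, 1)        # one attention score per time step
        self.fc = nn.Linear(feat, n_classes)

    def forward(self, x):                     # x: (batch, channels, time)
        feats = torch.cat([F.relu(h(x)) for h in self.heads], dim=1)
        feats = feats.permute(0, 2, 1)        # (batch, time, feat)
        weights = torch.softmax(self.attn(feats), dim=1)
        context = (weights * feats).sum(dim=1)
        return self.fc(context)

logits = MultiheadCNNAttention()(torch.randn(4, 3, 128))
```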

Journal ArticleDOI
TL;DR: The Ada-HAR framework introduces an unsupervised online learning algorithm that is independent of the number of class constraints; the decision-tree-based classifier is the fastest method for modal evolution, and the system can monitor human activity in real time, regardless of the orientation of the smartphone.
Abstract: Human activity recognition (HAR) using smartphones provides significant healthcare guidance for telemedicine and long-term treatment. Machine learning and deep learning (DL) techniques are widely utilized for the scientific study of statistical models of human behaviors. However, the performance of existing HAR platforms is limited for complex physical activities. In this article, we propose an adaptive recognition and real-time monitoring system for human activities (Ada-HAR), which is expected to identify more human motions in dynamic situations. The Ada-HAR framework introduces an unsupervised online learning algorithm that is independent of the number of class constraints. Furthermore, the adopted hierarchical clustering and classification algorithms label and classify 12 activities (five dynamic, six static, and a series of transitions) autonomously. Finally, practical experiments have been performed to validate the effectiveness and robustness of the proposed algorithms. Compared with the methods mentioned in the literature, the results show that the DL-based classifier obtains a higher recognition rate (95.15% at the waist and 92.20% in the pocket). The decision-tree-based classifier is the fastest method for modal evolution. Moreover, the Ada-HAR system can monitor human activity in real time, regardless of the orientation of the smartphone.

Journal ArticleDOI
06 May 2020-Sensors
TL;DR: This paper demonstrates how human motions can be detected in a quasi-real-time scenario using a non-invasive method and produces a dataset that contains patterns of radio wave signals obtained using software-defined radios to establish if a subject is standing up or sitting down as a test case.
Abstract: Human motion detection is getting considerable attention in the field of Artificial Intelligence (AI) driven healthcare systems. Human motion can be used to provide remote healthcare solutions for vulnerable people by identifying particular movements such as falls, gait and breathing disorders. This can allow people to live more independent lifestyles while still having the safety of being monitored if more direct care is needed. At present, wearable devices can provide real-time monitoring by deploying equipment on a person's body. However, wearing devices all the time is uncomfortable, the elderly tend to forget to wear them, and being tracked constantly raises privacy concerns. This paper demonstrates how human motions can be detected in a quasi-real-time scenario using a non-invasive method. Patterns in the wireless signals represent particular human body motions, as each movement induces a unique change in the wireless medium. These changes can be used to identify particular body motions. This work produces a dataset that contains patterns of radio wave signals obtained using software-defined radios (SDRs) to establish whether a subject is standing up or sitting down as a test case. The dataset was used to create a machine learning model, which was used in a developed application to provide a quasi-real-time classification of the standing or sitting state. The machine learning model achieved 96.70% accuracy using the Random Forest algorithm with 10-fold cross-validation. A benchmark dataset from wearable devices was compared with the proposed dataset, and the results showed the proposed dataset to have a similar accuracy of nearly 90%. The machine learning models developed in this paper are tested for two activities, but the developed system is designed to be applicable to detecting and differentiating an arbitrary number of activities.
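
The evaluation described here, a Random Forest scored with 10-fold cross-validation, is easy to reproduce in outline; the sketch below uses placeholder features standing in for the SDR signal patterns, so it illustrates the protocol rather than the paper's data.

```python
# Hedged sketch: Random Forest + 10-fold cross-validation on placeholder features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.randn(400, 64)               # one feature vector per signal window
y = np.random.randint(0, 2, size=400)      # 0 = sitting down, 1 = standing up

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```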

Journal ArticleDOI
TL;DR: This paper proposes a layer-wise convolutional neural network (CNN) with local loss for the HAR task, and is the first to use a local-loss-based CNN for HAR in the ubiquitous and wearable computing arena.
Abstract: Recently, deep learning, which is able to extract features from data automatically, has achieved state-of-the-art performance across a variety of sensor-based human activity recognition (HAR) tasks. However, existing deep neural networks are usually trained with a global loss, and all hidden layer weights have to be kept in memory until the forward and backward pass has completed. This backward locking phenomenon prevents the reuse of memory, which is a crucial limitation for wearable activity recognition. In this paper, we propose a layer-wise convolutional neural network (CNN) with local loss for the HAR task. To our knowledge, this paper is the first to use a local-loss-based CNN for HAR in the ubiquitous and wearable computing arena. We performed experiments on five public HAR datasets: the UCI HAR, OPPORTUNITY, UniMiB-SHAR, PAMAP, and WISDM datasets. The results show that local loss works better than global loss for the tested baseline architectures. At no extra cost, the local loss can approach the state of the art on a variety of HAR datasets, even though the number of parameters is smaller. We believe that the layer-wise CNN with local loss can be used to update existing deep HAR methods.
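
A simplified sketch of the local-loss idea follows: each convolutional block gets its own auxiliary classifier and loss, and the input to the next block is detached so no end-to-end backward pass is needed. Block sizes and the auxiliary head are assumptions, not the published design.

```python
# Hedged sketch of layer-wise training with local losses (assumed, simplified).
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    def __init__(self, in_ch, out_ch, n_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.ReLU(), nn.MaxPool1d(2))
        self.aux = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                 nn.Linear(out_ch, n_classes))

    def forward(self, x):
        feats = self.body(x)
        return feats, self.aux(feats)        # features + local prediction

def train_step(blocks, optimizers, x, y, loss_fn=nn.CrossEntropyLoss()):
    for block, opt in zip(blocks, optimizers):
        feats, logits = block(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()                      # gradient stays inside this block
        opt.step()
        x = feats.detach()                   # no global backward locking
    return loss.item()

blocks = [LocalBlock(9, 32, 6), LocalBlock(32, 64, 6)]
optimizers = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]
train_step(blocks, optimizers, torch.randn(16, 9, 128), torch.randint(0, 6, (16,)))
```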

Journal ArticleDOI
24 Jan 2020
TL;DR: A case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions, using a large-scale real-world dataset.
Abstract: The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (1) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model. The pre-trained model (2) is then employed as feature extractor evaluating its use with a large scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset, and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset, and 35.24% on the Extrasensory dataset.
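
The "pre-trained CNN as feature extractor" idea can be sketched as follows: freeze a CNN trained on a source HAR dataset, run target windows through its convolutional trunk, and fit a lightweight classifier on the resulting features. The trunk below is a stand-in, not the topology selected in the paper.

```python
# Hedged sketch: frozen CNN trunk as feature extractor + shallow classifier.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

trunk = nn.Sequential(                       # pretend this was trained elsewhere
    nn.Conv1d(6, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten())
for p in trunk.parameters():
    p.requires_grad_(False)                  # frozen: used only as extractor

def extract(windows):                        # windows: (n, channels, time)
    with torch.no_grad():
        return trunk(windows).numpy()

X_train = extract(torch.randn(200, 6, 128))
y_train = torch.randint(0, 6, (200,)).numpy()
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```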

Journal ArticleDOI
TL;DR: The experiments show that the employment of personalization models improves, on average, the accuracy of machine learning algorithms, thus confirming the soundness of the approach and paving the way for future investigations on this topic.
Abstract: Recently, a significant amount of the literature on machine learning techniques has focused on the automatic recognition of activities performed by people. The main reason for this considerable interest is the increasing availability of devices able to acquire signals which, if properly processed, can provide information about human activities of daily living (ADL). The recognition of human activities is generally performed by machine learning techniques that process signals from wearable sensors and/or cameras appropriately arranged in the environment. Whatever the type of sensor, activities performed by human beings have a strong subjective character that is related to different factors, such as age, gender, weight, height, physical abilities, and lifestyle. Personalization models have been studied to take these subjective factors into account, and it has been demonstrated that using these models can improve the accuracy of machine learning algorithms. In this work we focus on the recognition of human activities using signals acquired by the accelerometer embedded in a smartphone. This research makes three main contributions. The first is the definition of a clear validation model that takes into account the problem of personalization and thus makes it possible to objectively evaluate the performance of machine learning algorithms. The second is the evaluation, on three different public datasets, of a personalization model which considers two aspects: the similarity between people related to physical aspects (age, weight, and height) and the similarity related to intrinsic characteristics of the signals produced by these people when performing activities. The third and last contribution is the development of a personalization model that considers both the physical and signal similarities. The experiments show that the employment of personalization models improves, on average, the accuracy, thus confirming the soundness of the approach and paving the way for future investigations on this topic.

Journal ArticleDOI
TL;DR: This paper examines how well different machine learning architectures generalize to new subjects by using Leave-One-Subject-Out Cross-Validation (LOSOCV), and shows that a CNN architecture with two convolutional layers and one-dimensional filters, accompanied by a sliding window and vector magnitude, generalizes better than the other architectures.
Abstract: Human Activity Recognition (HAR) has been attracting significant research attention because of the increasing availability of environmental and wearable sensors for collecting HAR data. In recent years, deep learning approaches have demonstrated great success due to their ability to model complex systems. However, these models are often evaluated on the same subjects as those used to train the model; thus, the provided accuracy estimates do not pertain to new subjects. Occasionally, one or a few subjects are selected for the evaluation, but such estimates depend highly on the subjects selected. Consequently, this paper examines how well different machine learning architectures generalize to new subjects by using Leave-One-Subject-Out Cross-Validation (LOSOCV). By changing the subject used for evaluation in each fold of the cross-validation, LOSOCV provides a subject-independent estimate of the performance for new subjects. Six feed-forward and convolutional neural network (CNN) architectures as well as four pre-processing scenarios have been considered. Results show that a CNN architecture with two convolutional layers and one-dimensional filters, accompanied by a sliding window and vector magnitude, generalizes better than the other architectures. For the same CNN, the accuracy improves from 85.1% when evaluated with LOSOCV to 99.85% when evaluated with the traditional 10-fold cross-validation, demonstrating the importance of using LOSOCV for the evaluation.
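
The LOSOCV protocol itself maps directly onto scikit-learn's grouped cross-validation, as the minimal sketch below shows; the classifier, feature vectors, and subject labels are placeholders for the architectures and data actually compared in the paper.

```python
# Hedged sketch: Leave-One-Subject-Out CV via LeaveOneGroupOut (group = subject id).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(600, 50)                    # feature vectors per window
y = np.random.randint(0, 6, size=600)           # activity labels
subjects = np.random.randint(0, 10, size=600)   # which subject produced each window

scores = cross_val_score(RandomForestClassifier(), X, y,
                         groups=subjects, cv=LeaveOneGroupOut())
print("subject-independent accuracy:", scores.mean())
```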

Journal ArticleDOI
TL;DR: WiAct, a passive WiFi-based human activity recognition system, which explores the correlations between body movement and the amplitude information in Channel State Information (CSI) to classify different activities and achieves an average accuracy of 94.2% for distinguishing ten actions.
Abstract: Nowadays, human behavior recognition research plays a pivotal role in the field of human-computer interaction. However, existing approaches mainly rely on video cameras, ambient sensors or wearable devices, which either require arduous deployment or raise privacy concerns. In this paper, we propose WiAct, a passive WiFi-based human activity recognition system, which explores the correlations between body movement and the amplitude information in Channel State Information (CSI) to classify different activities. The system designs a novel Adaptive Activity Cutting Algorithm (AACA) based on the difference in signal variance between the action and non-action parts, which adjusts the threshold adaptively to achieve the best trade-off between performance and robustness. Doppler-shift correlation values, extracted by using the correlation between the WiFi device's antennas, are used as classification features. An Extreme Learning Machine (ELM) is utilized for activity data classification because of its strong generalization ability and fast learning speed. We implement the WiAct prototype using commercial WiFi equipment and evaluate its performance in real-world environments. In the evaluation, WiAct achieves an average accuracy of 94.2% for distinguishing ten actions. We compare different experimental conditions and classification methods, and the results demonstrate its robustness.
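
The core intuition behind variance-based activity cutting is that action segments of a CSI amplitude stream have much higher short-term variance than idle segments. The toy sketch below uses a fixed threshold where AACA adapts it, so it only illustrates the principle.

```python
# Simplified sketch in the spirit of AACA: sliding-window variance + threshold.
import numpy as np

def cut_activity(amplitude, win=50, threshold=0.5):
    """Return a boolean mask marking samples that belong to an action segment."""
    variance = np.array([amplitude[i:i + win].var()
                         for i in range(len(amplitude) - win)])
    mask = variance > threshold
    return np.concatenate([mask, np.zeros(win, dtype=bool)])  # pad to input length

stream = np.concatenate([np.random.randn(500) * 0.1,      # idle
                         np.random.randn(300) * 2.0,       # action
                         np.random.randn(500) * 0.1])      # idle
action_mask = cut_activity(stream)
```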

Journal ArticleDOI
TL;DR: A survey of recent advances in WiFi vision problems, i.e., sensing, recognition, and detection by utilizing the channel state information (CSI) of the commodity WiFi devices, focuses on nine key applications of smart environments.
Abstract: Indoor human sensing, recognition, and detection, as key enablers of building smart environments, such as smart home, smart retail, and smart museum, have gained tremendous attention in recent years. Compared with traditional vision-based and wearable sensor-based solutions, radio-frequency (RF)-based approaches are more desirable with the contactless and nonline-of-sight nature. Among all RF-based approaches, WiFi-based approaches have been the focus of many researchers because of the ubiquitous availability and cost efficiency. In this article, we present a survey of recent advances in WiFi vision problems, i.e., sensing, recognition, and detection by utilizing the channel state information (CSI) of the commodity WiFi devices. We focus on nine key applications of smart environments, including WiFi imaging, vital sign monitoring, human identification, gesture recognition, gait recognition, daily activity recognition, fall detection, human detection, and indoor positioning. Such a survey can help readers have an overall understanding of sensing, recognition, and detection with commodity WiFi, and thus expedite the development of smart environments.

Journal ArticleDOI
14 Apr 2020-Sensors
TL;DR: This paper proposes an approach to recognize physical activities using only two axes of the smartphone accelerometer sensor and investigates the effectiveness and contribution of each axis of the accelerometer in the recognition of physical activities.
Abstract: Recognizing human physical activities from streaming smartphone sensor readings is essential for the successful realization of a smart environment. Physical activity recognition is one of the active research topics for providing users with adaptive services using smart devices. Existing physical activity recognition methods fall short of providing fast and accurate recognition of activities. This paper proposes an approach to recognize physical activities using only two axes of the smartphone accelerometer sensor. It also investigates the effectiveness and contribution of each axis of the accelerometer in the recognition of physical activities. To implement our approach, data on daily life activities are collected and labeled using the accelerometer from 12 participants. Furthermore, three machine learning classifiers are implemented to train the model on the collected dataset and to predict the activities. Our proposed approach provides more promising results compared to the existing techniques and presents a strong rationale for the effectiveness and contribution of each axis of an accelerometer for activity recognition. To ensure the reliability of the model, we also evaluate the proposed approach on the standard publicly available WISDM dataset and provide a comparative analysis with state-of-the-art studies. The proposed approach achieved 93% weighted accuracy with the Multilayer Perceptron (MLP) classifier, which is almost 13% higher than the existing methods.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this article, an actor-transformer model is proposed to learn and selectively extract information relevant for group activity recognition, which achieves state-of-the-art results on two publicly available benchmarks.
Abstract: This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on the locations of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-specific static and dynamic representations expressed by features from a 2D pose network and a 3D CNN, respectively. We empirically study different ways to combine these representations and show their complementary benefits. Experiments show what is important to transform and how it should be transformed. What is more, actor-transformers achieve state-of-the-art results on two publicly available benchmarks for group activity recognition, outperforming the previous best published results by a considerable margin.
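
A rough sketch of the actor-transformer idea follows: each detected actor is summarized by a feature vector (here random stand-ins for the pose-network and 3D-CNN features), a transformer encoder lets actors attend to each other, and pooled actor embeddings are classified into a group activity. Sizes and the pooling choice are assumptions.

```python
# Hedged sketch of a transformer encoder over per-actor features.
import torch
import torch.nn as nn

class ActorTransformer(nn.Module):
    def __init__(self, feat_dim=256, n_group_classes=8, n_heads=8, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.group_head = nn.Linear(feat_dim, n_group_classes)

    def forward(self, actor_feats):            # (batch, n_actors, feat_dim)
        refined = self.encoder(actor_feats)    # actors attend to each other
        return self.group_head(refined.mean(dim=1))   # pool over actors

logits = ActorTransformer()(torch.randn(2, 12, 256))   # 12 actors per clip
```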

Journal ArticleDOI
TL;DR: A novel attentive semantic recurrent neural network (RNN), namely, stagNet, is presented for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling.
Abstract: In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the “factor sharing” and “message passing” mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.

Journal ArticleDOI
TL;DR: A new human activity recognition method with a two-stage end-to-end convolutional neural network and a data augmentation method that achieves significantly improved recognition accuracy and reduced computational complexity.
Abstract: Human activity recognition has been widely used in healthcare applications such as elderly monitoring, exercise supervision, and rehabilitation monitoring. Compared with other approaches, sensor-based wearable human activity recognition is less affected by environmental noise and is therefore promising for providing higher recognition accuracy. However, one of the major issues of existing wearable human activity recognition methods is that, although the average recognition accuracy is acceptable, the recognition accuracy for some activities (e.g., ascending and descending stairs) is low, mainly due to relatively little training data and the complex behavior patterns of these activities. Another issue is that the recognition accuracy is low when the training data from the test subject are limited, which is a common case in real practice. In addition, the use of neural networks leads to large computational complexity and thus high power consumption. To address these issues, we propose a new human activity recognition method with a two-stage end-to-end convolutional neural network and a data augmentation method. Compared with the state-of-the-art methods (including neural network based methods and other methods), the proposed methods achieve significantly improved recognition accuracy and reduced computational complexity.
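
For under-represented activities of the kind this paper targets, sensor-window augmentation is a common remedy. The sketch below shows typical augmentations (jitter, scaling, time shift); the paper's exact augmentation method may differ.

```python
# Illustrative augmentations for a sensor window of shape (channels, time).
import numpy as np

def jitter(window, sigma=0.05):
    return window + np.random.normal(0.0, sigma, size=window.shape)

def scale(window, sigma=0.1):
    factors = np.random.normal(1.0, sigma, size=(window.shape[0], 1))
    return window * factors

def time_shift(window, max_shift=10):
    return np.roll(window, np.random.randint(-max_shift, max_shift + 1), axis=1)

window = np.random.randn(3, 128)
augmented = time_shift(scale(jitter(window)))   # extra training sample
```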

Journal ArticleDOI
04 Sep 2020
TL;DR: IMUTube is introduced, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data that improves the performance of a variety of models on known HAR datasets.
Abstract: The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive, and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways that we outline. This should lead to on-body, sensor-based HAR becoming yet another success story in large-dataset breakthroughs in recognition.
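
A toy sketch of the core signal-processing idea behind virtual IMU data follows: given 3D positions of a body joint tracked across video frames, numerical double differentiation yields a virtual accelerometer signal. Gravity and sensor-orientation handling, which the full pipeline also addresses, are omitted here.

```python
# Hedged sketch: virtual acceleration from tracked 3D joint positions.
import numpy as np

def virtual_acceleration(joint_xyz, fps=30.0):
    """joint_xyz: (frames, 3) positions in metres -> (frames-2, 3) accelerations."""
    dt = 1.0 / fps
    velocity = np.diff(joint_xyz, axis=0) / dt
    return np.diff(velocity, axis=0) / dt

wrist_positions = np.cumsum(np.random.randn(300, 3) * 0.01, axis=0)  # fake track
virtual_acc = virtual_acceleration(wrist_positions)
```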

Journal ArticleDOI
TL;DR: The survey aims to provide a systematic categorization and comparison framework of the state-of-the-art that drives the discussion to important open research challenges and future directions of multi-user activity recognition.

Journal ArticleDOI
20 May 2020-Entropy
TL;DR: A human activity recognition model that acquires signal data from motion node sensors including inertial sensors, i.e., gyroscopes and accelerometers is presented, which outperformed existing well-known statistical state-of-the-art methods by achieving an improved recognition accuracy.
Abstract: Advancements in wearable sensor technologies have a prominent effect on the daily life activities of humans. These wearable sensors are gaining more attention in healthcare for the elderly, to ensure their independent living and to improve their comfort. In this paper, we present a human activity recognition model that acquires signal data from motion node sensors including inertial sensors, i.e., gyroscopes and accelerometers. First, the inertial data is processed via multiple filters such as Savitzky-Golay, median and Hampel filters to examine lower/upper cutoff frequency behaviors. Second, it extracts a multifused model of statistical, wavelet and binary features to maximize the occurrence of optimal feature values. Then, adaptive moment estimation (Adam) and AdaDelta are introduced in a feature optimization phase to adopt learning rate patterns. These optimized patterns are further processed by a maximum entropy Markov model (MEMM) for empirical expectation and highest entropy, which measures signal variances for improved accuracy. Our model was experimentally evaluated on the University of Southern California Human Activity Dataset (USC-HAD) as a benchmark dataset and on the Intelligent Mediasporting behavior (IMSB) dataset, a new self-annotated sports dataset. For evaluation, we used the "leave-one-out" cross-validation scheme, and the results outperformed existing well-known statistical state-of-the-art methods, achieving improved recognition accuracies of 91.25%, 93.66% and 90.91% on the USC-HAD, IMSB, and Mhealth datasets, respectively. The proposed system should be applicable to man-machine interface domains, such as health exercises, robot learning, interactive games and pattern-based surveillance.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: This work addresses the issue of fusion of different modalities in the context of human activity recognition, making use of a state-of-the-art convolutional network architecture (Inception I3D) and a huge dataset (NTU RGB+D).
Abstract: Combining machine learning in neural networks with multimodal fusion strategies offers interesting potential for classification tasks, but the optimum fusion strategies for many applications have yet to be determined. Here we address this issue in the context of human activity recognition, making use of a state-of-the-art convolutional network architecture (Inception I3D) and a huge dataset (NTU RGB+D). As modalities we consider RGB video, optical flow, and skeleton data. We determine whether the fusion of different modalities can provide an advantage as compared to uni-modal approaches, and whether a more complex early fusion strategy can outperform the simpler late-fusion strategy by making use of statistical correlations between the different modalities. Our results show a clear performance improvement by multi-modal fusion and a substantial advantage of an early fusion strategy.
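
The contrast between the two fusion strategies can be sketched schematically: early fusion joins per-modality features before a single classifier, while late fusion averages per-modality class scores. The placeholder linear layers below stand in for the Inception I3D backbones, so this is only a structural illustration.

```python
# Hedged sketch: early vs. late fusion over two modality feature streams.
import torch
import torch.nn as nn

feat_rgb, feat_flow = nn.Linear(512, 256), nn.Linear(512, 256)   # stand-ins
early_head = nn.Linear(256 * 2, 60)                 # joint classifier
late_rgb_head, late_flow_head = nn.Linear(256, 60), nn.Linear(256, 60)

rgb, flow = torch.randn(4, 512), torch.randn(4, 512)

# Early fusion: concatenate modality features, classify once.
early_logits = early_head(torch.cat([feat_rgb(rgb), feat_flow(flow)], dim=1))

# Late fusion: classify each modality separately, then average the scores.
late_logits = (late_rgb_head(feat_rgb(rgb)) + late_flow_head(feat_flow(flow))) / 2
```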