
Showing papers on "Activity recognition" published in 2015


Proceedings Article
25 Jul 2015
TL;DR: This method adopts a deep convolutional neural network (CNN) to automate feature learning from the raw inputs in a systematic way, which makes it outperform other HAR algorithms, as verified in experiments on the Opportunity Activity Recognition Challenge and other benchmark datasets.
Abstract: This paper focuses on the human activity recognition (HAR) problem, in which the inputs are multichannel time series signals acquired from a set of body-worn inertial sensors and the outputs are predefined human activities. In this problem, extracting effective features for identifying activities is a critical but challenging task. Most existing work relies on heuristic, hand-crafted feature design and shallow feature learning architectures, which cannot find the distinguishing features needed to accurately classify different activities. In this paper, we propose a systematic feature learning method for the HAR problem. The method adopts a deep convolutional neural network (CNN) to automate feature learning from the raw inputs in a systematic way. Through the deep architecture, the learned features are regarded as higher-level abstract representations of the low-level raw time series signals. By leveraging the labelled information via supervised learning, the learned features are endowed with more discriminative power. Unified in one model, feature learning and classification are mutually enhanced. All these unique advantages of the CNN make it outperform other HAR algorithms, as verified in experiments on the Opportunity Activity Recognition Challenge and other benchmark datasets.
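A minimal PyTorch sketch of the idea follows: temporal (1D) convolutions learn features directly from multichannel sensor windows, and a linear classifier trained jointly on top performs the activity classification. The channel count, window length, and layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch only: a small 1D CNN for HAR on multichannel inertial-sensor windows.
# Channel count, window length, and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

class HarCNN(nn.Module):
    def __init__(self, in_channels=9, num_classes=18, window_len=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),  # temporal convolution over raw signals
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(64 * (window_len // 4), num_classes)

    def forward(self, x):              # x: (batch, channels, time)
        return self.classifier(self.features(x).flatten(1))

# Feature learning and classification are trained jointly with a supervised loss.
model = HarCNN()
windows = torch.randn(8, 9, 64)        # 8 windows of 9-channel sensor data
print(model(windows).shape)            # torch.Size([8, 18])
```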

894 citations


Proceedings ArticleDOI
01 Nov 2015
TL;DR: It is shown that on-device sensor and sensor-handling heterogeneities impair HAR performance significantly, and a novel clustering-based mitigation technique is proposed that is suitable for large-scale deployment of HAR, where heterogeneity of devices and their usage scenarios is intrinsic.
Abstract: The widespread presence of motion sensors on users' personal mobile devices has spawned a growing research interest in human activity recognition (HAR). However, when deployed at large scale, e.g., on multiple devices, the performance of a HAR system is often significantly lower than in reported research results. This is due to variations in training and test device hardware and their operating system characteristics, among others. In this paper, we systematically investigate sensor-, device- and workload-specific heterogeneities using 36 smartphones and smartwatches, covering 13 different device models from four manufacturers. Furthermore, we conduct experiments with nine users and investigate popular feature representation and classification techniques in HAR research. Our results indicate that on-device sensor and sensor-handling heterogeneities impair HAR performance significantly. Moreover, the impairments vary significantly across devices and depend on the type of recognition technique used. We systematically evaluate the effect of mobile sensing heterogeneities on HAR and propose a novel clustering-based mitigation technique suitable for large-scale deployment of HAR, where heterogeneity of devices and their usage scenarios is intrinsic.

561 citations


Proceedings ArticleDOI
13 Oct 2015
TL;DR: This work assembles signal sequences of accelerometers and gyroscopes into a novel activity image, which enables Deep Convolutional Neural Networks (DCNN) to automatically learn the optimal features from the activity image for the activity recognition task.
Abstract: Human physical activity recognition based on wearable sensors has applications relevant to our daily life, such as healthcare. How to achieve high recognition accuracy with low computational cost is an important issue in ubiquitous computing. Rather than exploring handcrafted features from time-series sensor signals, we assemble signal sequences of accelerometers and gyroscopes into a novel activity image, which enables Deep Convolutional Neural Networks (DCNN) to automatically learn the optimal features from the activity image for the activity recognition task. Our proposed approach is evaluated on three public datasets and outperforms the state of the art in terms of recognition accuracy and computational cost.
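A minimal sketch of the activity-image idea is given below, assuming a simple row-wise stacking of accelerometer and gyroscope channels (the paper's actual channel permutation scheme is more elaborate); a standard 2D CNN then learns features from the resulting image. All shapes and the class count are assumptions.

```python
# Sketch only: stack accelerometer and gyroscope channels into a 2D "activity image"
# and feed it to a small 2D CNN. The stacking scheme and sizes are assumptions.
import numpy as np
import torch
import torch.nn as nn

def to_activity_image(acc, gyro):
    """acc, gyro: arrays of shape (3, T). Returns a (1, 6, T) single-channel image."""
    return np.expand_dims(np.vstack([acc, gyro]), axis=0)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),  # collapse to one feature vector per image
    nn.Flatten(),
    nn.Linear(16, 6),              # e.g. six activity classes
)

acc, gyro = np.random.randn(3, 128), np.random.randn(3, 128)
img = torch.from_numpy(to_activity_image(acc, gyro)).float().unsqueeze(0)  # (1, 1, 6, 128)
print(cnn(img).shape)              # torch.Size([1, 6])
```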

496 citations


Journal ArticleDOI
19 Jan 2015-Sensors
TL;DR: This paper reviews the studies done so far that implement activity recognition systems on mobile phones and use only their on-board sensors, discusses their limitations, and presents various recommendations for future research.
Abstract: Physical activity recognition using embedded sensors has enabled many context-aware applications in different areas, such as healthcare. Initially, one or more dedicated wearable sensors were used for such applications. However, recently, many researchers started using mobile phones for this purpose, since these ubiquitous devices are equipped with various sensors, ranging from accelerometers to magnetic field sensors. In most of the current studies, sensor data collected for activity recognition are analyzed offline using machine learning tools. However, there is now a trend towards implementing activity recognition systems on these devices in an online manner, since modern mobile phones have become more powerful in terms of available resources, such as CPU, memory and battery. The research on offline activity recognition has been reviewed in several earlier studies in detail. However, work done on online activity recognition is still in its infancy and is yet to be reviewed. In this paper, we review the studies done so far that implement activity recognition systems on mobile phones and use only their on-board sensors. We discuss various aspects of these studies. Moreover, we discuss their limitations and present various recommendations for future research.

452 citations


Journal ArticleDOI
TL;DR: This work proposes a categorization of human activity methodologies and divides human activity classification methods into two large categories according to whether they use data from different modalities or not, and examines the requirements for an ideal human activity recognition dataset.
Abstract: Recognizing human activities from video sequences or still images is a challenging task due to problems such as background clutter, partial occlusion, changes in scale, viewpoint, lighting, and appearance. Many applications, including video surveillance systems, human-computer interaction, and robotics for human behavior characterization, require a multiple activity recognition system. In this work, we provide a detailed review of recent and state-of-the-art research advances in the field of human activity classification. We propose a categorization of human activity methodologies and discuss their advantages and limitations. In particular, we divide human activity classification methods into two large categories according to whether they use data from different modalities or not. Then, each of these categories is further analyzed into sub-categories, which reflect how they model human activities and what type of activities they are interested in. Moreover, we provide a comprehensive analysis of the existing, publicly available human activity classification datasets and examine the requirements for an ideal human activity recognition dataset. Finally, we report the characteristics of future research directions and present some open issues on human activity recognition.

395 citations


Journal ArticleDOI
07 Jun 2015
TL;DR: This paper finds that features from different channels could share some similar hidden structures, and proposes a joint learning model to simultaneously explore the shared and feature-specific components as an instance of heterogeneous multi-task learning for RGB-D activity recognition.
Abstract: In this paper, we focus on heterogeneous features learning for RGB-D activity recognition. We find that features from different channels (RGB, depth) could share some similar hidden structures, and then propose a joint learning model to simultaneously explore the shared and feature-specific components as an instance of heterogeneous multi-task learning. The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets. To efficiently train the joint model, a three-step iterative optimization algorithm is proposed, followed by a simple inference model. Extensive experimental results on four activity datasets have demonstrated the efficacy of the proposed method. A new RGB-D activity dataset focusing on human-object interaction is further contributed, which presents more challenges for RGB-D activity benchmarking.

387 citations


Proceedings ArticleDOI
09 Nov 2015
TL;DR: This work focuses its presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.
Abstract: Deep learning based approaches to facial analysis and video analysis have recently demonstrated high performance on a variety of key tasks such as face recognition, emotion recognition and activity recognition. In the case of video, information often must be aggregated across a variable length sequence of frames to produce a classification result. Prior work using convolutional neural networks (CNNs) for emotion recognition in video has relied on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial aggregation of information. Recurrent neural networks (RNNs) have seen an explosion of recent interest as they yield state-of-the-art performance on a variety of sequence analysis tasks. RNNs provide an attractive framework for propagating information over a sequence using a continuous valued hidden layer representation. In this work we present a complete system for the 2015 Emotion Recognition in the Wild (EmotiW) Challenge. We focus our presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.
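As a rough illustration of the aggregation idea (not the EmotiW system itself), the sketch below extracts per-frame features with a tiny CNN stand-in and aggregates them over the sequence with a GRU rather than temporal averaging; all layer sizes and the class count are assumptions.

```python
# Sketch only: per-frame CNN features aggregated by a recurrent layer (GRU)
# instead of temporal averaging. Sizes and class count are assumptions.
import torch
import torch.nn as nn

class CnnRnn(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, num_classes=7):
        super().__init__()
        self.frame_cnn = nn.Sequential(          # tiny stand-in for a face CNN
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(8 * 16, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames):                    # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.frame_cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)                    # h: (1, batch, hidden), last hidden state
        return self.head(h[-1])

model = CnnRnn()
clip = torch.randn(2, 10, 3, 48, 48)              # 2 clips of 10 frames each
print(model(clip).shape)                          # torch.Size([2, 7])
```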

328 citations


Journal ArticleDOI
TL;DR: Experimental results show that the solution outperforms four relevant works based on RGB-D image fusion, hierarchical Maximum Entropy Markov Model, Markov Random Fields, and Eigenjoints, respectively, and that the ability to recognize the activities in real time shows promise for applied use.
Abstract: In this paper, we present a method for recognizing human activities using information sensed by an RGB-D camera, namely the Microsoft Kinect. Our approach is based on the estimation of some relevant joints of the human body by means of the Kinect; three different machine learning techniques, i.e., K-means clustering, support vector machines, and hidden Markov models, are combined to detect the postures involved while performing an activity, to classify them, and to model each activity as a spatiotemporal evolution of known postures. Experiments were performed on the Kinect Activity Recognition Dataset, a new dataset, and on CAD-60, a public dataset. Experimental results show that our solution outperforms four relevant works based on RGB-D image fusion, hierarchical Maximum Entropy Markov Model, Markov Random Fields, and Eigenjoints, respectively. The performance we achieved, i.e., precision/recall of 77.3% and 76.7%, and the ability to recognize the activities in real time show promise for applied use.
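A minimal sketch of the posture-sequence idea follows: joint coordinates are clustered into postures with K-means, each activity is modeled by a transition matrix over postures, and a new sequence is assigned to the activity with the highest log-likelihood. This is a simplification; the paper additionally uses SVMs for posture classification and full hidden Markov models.

```python
# Sketch only: cluster skeleton frames into postures, model each activity by a
# posture-transition matrix, and classify a new posture sequence by log-likelihood.
import numpy as np
from sklearn.cluster import KMeans

def transition_matrix(seq, k, eps=1e-3):
    A = np.full((k, k), eps)
    for a, b in zip(seq[:-1], seq[1:]):
        A[a, b] += 1
    return A / A.sum(axis=1, keepdims=True)

def log_likelihood(seq, A):
    return sum(np.log(A[a, b]) for a, b in zip(seq[:-1], seq[1:]))

rng = np.random.default_rng(0)
frames = rng.normal(size=(600, 45))              # 600 frames x 15 joints x 3 coords (synthetic)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(frames)
postures = kmeans.labels_

# Pretend the first/second halves are two different activities used for training.
models = {"activity_A": transition_matrix(postures[:300], 8),
          "activity_B": transition_matrix(postures[300:], 8)}

test = kmeans.predict(rng.normal(size=(50, 45))) # posture sequence of a new clip
print(max(models, key=lambda name: log_likelihood(test, models[name])))
```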

292 citations


Proceedings ArticleDOI
01 Oct 2015
TL;DR: This paper constructs a CNN model and modifies the convolution kernel to suit the characteristics of tri-axial acceleration signals, and shows that the CNN works well, reaching an average accuracy of 93.8% without any feature extraction methods.
Abstract: In this paper, we propose an acceleration-based human activity recognition method using a popular deep architecture, the Convolutional Neural Network (CNN). In particular, we construct a CNN model and modify the convolution kernel to suit the characteristics of tri-axial acceleration signals. Also, for comparison, we use some widely used methods to accomplish the recognition task on the same dataset. The large dataset we constructed consists of 31,688 samples from eight typical activities. The experimental results show that the CNN works well and can reach an average accuracy of 93.8% without any feature extraction methods.

291 citations


Proceedings Article
25 Jul 2015
TL;DR: This paper presents a novel approach for complex activity recognition comprising two components: temporal pattern mining, which provides a mid-level feature representation for activities, encodes temporal relatedness among actions, and captures the intrinsic properties of activities; and adaptive multi-task learning, which captures relatedness among activities and selects discriminant features.
Abstract: Compared to simple actions, activities are much more complex, but semantically consistent with a human's real life. Techniques for action recognition from sensor-generated data are mature. However, there has been relatively little work on bridging the gap between actions and activities. To this end, this paper presents a novel approach for complex activity recognition comprising two components. The first component is temporal pattern mining, which provides a mid-level feature representation for activities, encodes temporal relatedness among actions, and captures the intrinsic properties of activities. The second component is adaptive Multi-Task Learning, which captures relatedness among activities and selects discriminant features. Extensive experiments on a real-world dataset demonstrate the effectiveness of our work.
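As a toy illustration of a mid-level temporal feature (the paper's pattern mining and adaptive multi-task learner are far richer), the sketch below counts ordered action pairs ("A before B") in an activity instance and uses the counts as a feature vector; the action names and pattern vocabulary are made up.

```python
# Sketch only: ordered action-pair counts as a mid-level feature for an activity
# instance. Action names and the pattern vocabulary are hypothetical.
from collections import Counter
from itertools import combinations

def pair_features(action_seq, vocab):
    """action_seq: list of detected action labels in temporal order."""
    counts = Counter((a, b) for a, b in combinations(action_seq, 2))
    return [counts[p] for p in vocab]

vocab = [("open_fridge", "pour_milk"), ("pour_milk", "drink"),
         ("open_fridge", "drink")]
instance = ["open_fridge", "pour_milk", "drink", "drink"]
print(pair_features(instance, vocab))   # [1, 2, 2]
```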

291 citations


Proceedings ArticleDOI
12 Feb 2015
TL;DR: Preliminary answers to how the field of mobile sensing can best make use of advances in deep learning towards robust and efficient sensor inference are provided by prototyping a low-power Deep Neural Network inference engine that exploits both the CPU and DSP of a mobile device SoC.
Abstract: Sensor-equipped smartphones and wearables are transforming a variety of mobile apps ranging from health monitoring to digital assistants. However, reliably inferring user behavior and context from noisy and complex sensor data collected under mobile device constraints remains an open problem, and a key bottleneck to sensor app development. In recent years, advances in the field of deep learning have resulted in nearly unprecedented gains in related inference tasks such as speech and object recognition. However, although mobile sensing shares many of the same data modeling challenges, we have yet to see deep learning be systematically studied within the sensing domain. If deep learning could lead to significantly more robust and efficient mobile sensor inference, it would revolutionize the field by rapidly expanding the number of sensor apps ready for mainstream usage. In this paper, we provide preliminary answers to this potentially game-changing question by prototyping a low-power Deep Neural Network (DNN) inference engine that exploits both the CPU and DSP of a mobile device SoC. We use this engine to study typical mobile sensing tasks (e.g., activity recognition) using DNNs, and compare results to learning techniques in more common usage. Our early findings provide illustrative examples of DNN usage that do not overburden modern mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we show DNNs can gracefully scale to larger numbers of inference classes and can be flexibly partitioned across mobile and remote resources. Collectively, these results highlight the critical need for further exploration as to how the field of mobile sensing can best make use of advances in deep learning towards robust and efficient sensor inference.

Journal ArticleDOI
TL;DR: An overview of state-of-the-art methods for activity recognition using semantic features is presented, organized around a semantic space that covers the most popular semantic features of an action, namely the human body, attributes, related objects, and scene context.

Journal ArticleDOI
01 Dec 2015
TL;DR: The power of the ensemble-of-classifiers approach for accelerometer-based activity recognition is explored, and a novel activity prediction model based on machine learning classifiers is built that provides better performance than the MLP-based recognition approach suggested in a previous study.
Abstract: Highlights: We propose and validate a novel activity recognition model. We examine the power of the ensemble-of-classifiers approach experimentally. The model uses J48, Logistic Regression, and MLP. The proposed recognition model is superior to the MLP-based recognition model suggested in a previous study. We suggest that researchers focus on the ensemble-of-classifiers approach for activity recognition. Activity recognition aims to detect physical activities such as walking, sitting, and jogging performed by humans. With the widespread adoption and usage of mobile devices in daily life, several advanced applications of activity recognition have been implemented and distributed all over the world. In this study, we explore the power of the ensemble-of-classifiers approach for accelerometer-based activity recognition and build a novel activity prediction model based on machine learning classifiers. Our approach utilizes J48 decision trees, Multi-Layer Perceptrons (MLP) and Logistic Regression, and combines these classifiers with the average-of-probabilities combination rule. The publicly available activity recognition dataset known as WISDM (Wireless Sensor Data Mining), which includes data from thirty-six users, was used during the experiments. According to the experimental results, our model provides better performance than the MLP-based recognition approach suggested in a previous study. These results strongly suggest that researchers apply the ensemble-of-classifiers approach to the activity recognition problem.
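A minimal scikit-learn sketch of the ensemble follows, with a decision tree standing in for WEKA's J48 and synthetic data standing in for the WISDM accelerometer features; soft voting implements the average-of-probabilities combination rule.

```python
# Sketch only: decision tree (stand-in for J48), logistic regression, and MLP
# combined by averaging class probabilities (soft voting). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=43, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("mlp", MLPClassifier(max_iter=500, random_state=0))],
    voting="soft")                      # average of predicted probabilities

ensemble.fit(X_tr, y_tr)
print("accuracy:", ensemble.score(X_te, y_te))
```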

Posted Content
TL;DR: This paper shows that deep activity recognition models provide better recognition accuracy of human activities, and avoid the expensive design of handcrafted features in existing systems, and utilize the massive unlabeled acceleration samples for unsupervised feature extraction.
Abstract: Despite the widespread installation of accelerometers in almost all mobile phones and wearable devices, activity recognition using accelerometers is still immature due to the poor recognition accuracy of existing recognition methods and the scarcity of labeled training data. We consider the problem of human activity recognition using triaxial accelerometers and deep learning paradigms. This paper shows that deep activity recognition models (a) provide better recognition accuracy of human activities, (b) avoid the expensive design of handcrafted features in existing systems, and (c) utilize the massive unlabeled acceleration samples for unsupervised feature extraction. Moreover, a hybrid approach of deep learning and hidden Markov models (DL-HMM) is presented for sequential activity recognition. This hybrid approach integrates the hierarchical representations of deep activity recognition models with the stochastic modeling of temporal sequences in the hidden Markov models. We show substantial recognition improvement on real world datasets over state-of-the-art methods of human activity recognition using triaxial accelerometers.
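A minimal numpy sketch of the hybrid idea is shown below, under the assumption that the deep model's per-window class posteriors serve as emission scores that an HMM-style Viterbi pass smooths over time; the posteriors and transition matrix here are synthetic stand-ins, and the paper's actual training procedure is not shown.

```python
# Sketch only: Viterbi smoothing of per-window class posteriors (stand-ins for a
# deep model's outputs) with an activity-transition matrix.
import numpy as np

def viterbi(log_post, log_trans):
    """log_post: (T, K) per-window log-posteriors; log_trans: (K, K)."""
    T, K = log_post.shape
    score, back = log_post[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # (K prev, K cur)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_post[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(4), size=30)    # 30 windows, 4 activities
trans = np.full((4, 4), 0.05) + np.eye(4) * 0.80   # self-transitions dominate
print(viterbi(np.log(posteriors), np.log(trans)))
```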

Book ChapterDOI
09 Nov 2015
TL;DR: This paper proposes an architecture of convnets with sensor data gathered from smartphone sensors to recognize activities and shows that convnet outperforms all the other state-of-the-art techniques in HAR, especially SVM, which achieved the previous best result for the data set.
Abstract: Human activity recognition (HAR) using smartphone sensors utilizes time-series, multivariate data to detect activities. Time-series data have inherent local dependency characteristics. Moreover, activities tend to be hierarchical and translation invariant in nature. Consequently, convolutional neural networks (convnets) exploit these characteristics, making them appropriate for dealing with time-series sensor data. In this paper, we propose a convnet architecture for recognizing activities from data gathered by smartphone sensors. Experiments show that increasing the number of convolutional layers increases performance, but the complexity of the derived features decreases with every additional layer. Moreover, preserving the information passed from layer to layer is more important than blindly increasing the hyperparameters to improve performance. The convnet structure can also benefit from a wider filter size and lower pooling size setting. Lastly, we show that the convnet outperforms all the other state-of-the-art techniques in HAR, especially SVM, which achieved the previous best result for this dataset.

Journal ArticleDOI
TL;DR: The goal of this research is to investigate the prospect of using built-in smartphone sensors as ubiquitous multi-modal data collection and transmission nodes in order to detect detailed construction equipment activities which can ultimately contribute to the process of simulation input modeling.

Journal ArticleDOI
17 Apr 2015-PLOS ONE
TL;DR: Smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients; common features were identified for all three populations, although the stroke population subset had some differences from both the able-bodied and elderly sets.
Abstract: Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naive Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
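The sketch below illustrates the filter-then-classify workflow on synthetic data. scikit-learn does not ship Relief-F, CFS, or FCBF, so a mutual-information filter stands in for the classifier-independent selection step, followed by one of the generic classifiers used in the study (Naive Bayes).

```python
# Sketch only: filter-based feature selection followed by a generic classifier.
# A mutual-information filter stands in for Relief-F/CFS/FCBF; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=76, n_informative=12,
                           n_classes=4, random_state=0)   # 76 signal features, as in the study

pipeline = make_pipeline(SelectKBest(mutual_info_classif, k=20), GaussianNB())
print("mean CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())
```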

Journal ArticleDOI
TL;DR: This paper proposes a multitask clustering framework for activity of daily living analysis from visual data gathered from wearable cameras and shows that the proposed approach outperforms several single-task and multitask learning methods.
Abstract: Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been a growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers’ life. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for activity of daily living analysis from visual data gathered from wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing everyday activities of multiple individuals are related, since typically people perform the same actions in similar environments (e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we propose to look for clustering partitions which are coherent among related tasks. In particular, two novel multitask clustering algorithms, derived from a common optimization problem, are introduced. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods.

Proceedings ArticleDOI
02 Mar 2015
TL;DR: An algorithm to recognize human activities targeting the camera from streaming videos is presented, enabling the robot to predict intended activities of the interacting person as early as possible and take fast reactions to such activities (e.g., avoiding harmful events targeting itself before they actually occur).
Abstract: In this paper, we present a core technology to enable robot recognition of human activities during human-robot interactions. In particular, we propose a methodology for early recognition of activities from robot-centric videos (i.e., first-person videos) obtained from a robot's viewpoint during its interaction with humans. Early recognition, which is also known as activity prediction, is an ability to infer an ongoing activity at its early stage. We present an algorithm to recognize human activities targeting the camera from streaming videos, enabling the robot to predict intended activities of the interacting person as early as possible and take fast reactions to such activities (e.g., avoiding harmful events targeting itself before they actually occur). We introduce the novel concept of 'onset' that efficiently summarizes pre-activity observations, and design a recognition approach to consider event history in addition to visual features from first-person videos. We propose to represent an onset using a cascade histogram of time series gradients, and we describe a novel algorithmic setup to take advantage of such onset for early recognition of activities. The experimental results clearly illustrate that the proposed concept of onset enables better/earlier recognition of human activities from first-person videos collected with a robot.
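A toy sketch of one ingredient of the onset representation, a histogram of time-series gradients, is given below; the cascading over multiple temporal scales and the choice of input feature are not shown, and the bin settings are assumptions.

```python
# Sketch only: a normalized histogram of frame-to-frame gradients of a time series,
# one ingredient of an onset descriptor. Bin count and range are assumptions.
import numpy as np

def gradient_histogram(signal, bins=8, value_range=(-1.0, 1.0)):
    grads = np.gradient(signal)                       # frame-to-frame change
    hist, _ = np.histogram(grads, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)                  # normalized histogram feature

onset_signal = np.sin(np.linspace(0, 2 * np.pi, 60))  # e.g. a motion feature before the activity
print(gradient_histogram(onset_signal))
```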

Journal ArticleDOI
TL;DR: An overview of existing approaches and current practices for activity recognition in multioccupant smart homes is provided, which presents the latest developments and highlights the open issues in this field.
Abstract: Human activity recognition in ambient intelligent environments like homes, offices, and classrooms has been the center of a lot of research for many years now. The aim is to recognize the sequence of actions by a specific person using sensor readings. Most of the research has been devoted to activity recognition of single occupants in the environment. However, living environments are usually inhabited by more than one person and possibly with pets. Hence, human activity recognition in the context of multioccupancy is more general, but also more challenging. The difficulty comes from mainly two aspects: resident identification, known as data association, and diversity of human activities. The present survey article provides an overview of existing approaches and current practices for activity recognition in multioccupant smart homes. It presents the latest developments and highlights the open issues in this field.

Journal ArticleDOI
01 Feb 2015
TL;DR: A novel near real-time sensor segmentation approach that incorporates the notions of both sensor and time correlation is presented.
Abstract: Activity recognition is fundamental to many of the services envisaged in pervasive computing and ambient intelligence scenarios. However, delivering sufficiently robust activity recognition systems that could be deployed with confidence in an arbitrary real-world setting remains an outstanding challenge. Developments in wireless, inexpensive and unobtrusive sensing devices have enabled the capture of large data volumes, upon which a variety of machine learning techniques have been applied in order to facilitate interpretation of and inference upon such data. Much of the documented research in this area has in the main focused on recognition across pre-segmented sensor data. Such approaches are insufficient for near real-time analysis as is required for many services, such as those envisaged by ambient assisted living. This paper presents a novel near real-time sensor segmentation approach that incorporates the notions of both sensor and time correlation.

Proceedings ArticleDOI
09 Jun 2015
TL;DR: The designed A-Wristocracy system improves upon state-of-the-art works on in-home activity recognition using wearables and makes it feasible to classify a large number of fine-grained and complex activities, through Deep Learning based data analytics and by exploiting multi-modal sensing on a wrist-worn device.
Abstract: In this work we present A-Wristocracy, a novel framework for recognizing very fine-grained and complex in-home activities of human users (particularly elderly people) with wrist-worn device sensing. Our A-Wristocracy system improves upon state-of-the-art works on in-home activity recognition using wearables, which are mostly able to detect coarse-grained ADLs (Activities of Daily Living) but not a large number of fine-grained and complex IADLs (Instrumental Activities of Daily Living), and are not able to distinguish similar activities performed in different contexts (such as sit on floor vs. sit on bed vs. sit on sofa). Our solution enables accurate detection of in-home ADLs/IADLs and contextual activities, which are all critically important for remote elderly care in tracking physical and cognitive capabilities. A-Wristocracy makes it feasible to classify a large number of fine-grained and complex activities, through Deep Learning based data analytics and by exploiting multi-modal sensing on the wrist-worn device. It requires only minimal additional infrastructure (a few Bluetooth beacons) for coarse-level location context. A-Wristocracy preserves direct user privacy by excluding camera/video imaging on the wearable or infrastructure. The classification procedure consists of practical feature set extraction from multi-modal wearable sensor suites, followed by a Deep Learning based supervised fine-level classification algorithm. We have collected exhaustive home-based ADL and IADL data from multiple users. Our classifier is validated to recognize 22 very fine-grained, complex daily activities (a much larger number than the 6-12 activities detected by state-of-the-art wearable-based works without camera/video) with high average test accuracies of 90% or more for two users in two different home environments.

Journal ArticleDOI
21 May 2015-Sensors
TL;DR: The experimental results show that the best combination of a pattern clustering method and an activity decision algorithm provides higher recognition accuracy for various activities, as compared with other data mining classification algorithms.
Abstract: This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had limited performance because they focused only on one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of the smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

Proceedings ArticleDOI
24 Mar 2015
TL;DR: This paper presents a shape and motion features approach to observe, track and recognize human silhouettes using a sequence of RGB-D images, and shows significant recognition results over state-of-the-art algorithms.
Abstract: Recent developments in depth sensors open up new challenging tasks in computer vision research areas, including human-computer interaction, computer games and surveillance systems. This paper presents a shape and motion features approach to observe, track and recognize human silhouettes using a sequence of RGB-D images. Under our proposed activity recognition framework, the procedure includes: detecting human silhouettes in the image sequence, removing noisy effects from the background and tracking human silhouettes using temporal continuity constraints of human motion information for each activity; extracting shape and motion features to capture richer motion information; and clustering these features and feeding them into a Hidden Markov Model (HMM) to train, model and recognize human activities based on transition and emission probability values. In the experiments, we demonstrate this approach on two challenging depth video datasets: one based on our own annotated database and the other based on a public database (i.e., MSRAction3D). Our approach shows significant recognition results over state-of-the-art algorithms.

Proceedings ArticleDOI
15 Sep 2015
TL;DR: An architecture based on head- and eye-tracking data is introduced in this study and several features are analyzed, showing promising results towards in-vehicle driver-activity recognition.
Abstract: This paper presents a novel approach to automated recognition of the driver's activity, which is a crucial factor for determining take-over readiness in conditionally autonomous driving scenarios. To this end, an architecture based on head- and eye-tracking data is introduced in this study and several features are analyzed. The proposed approach is evaluated on data recorded during a driving simulator study with 73 subjects performing different secondary tasks while driving in an autonomous setting. The proposed architecture shows promising results towards in-vehicle driver-activity recognition. Furthermore, a significant improvement in classification performance is demonstrated due to the consideration of novel features derived especially for the autonomous driving context.

Journal ArticleDOI
TL;DR: Radio-based methods utilize wireless transceivers in the environment as infrastructure and exploit radio communication characteristics to achieve high recognition accuracy, reduce energy cost and preserve user privacy; these and other radio-based activity recognition methods are surveyed.

Journal ArticleDOI
TL;DR: A novel phone-based dynamic recognition framework with evolving data streams for activity recognition that incorporates incremental and active learning for real-time recognition and adaptation in streaming settings.

Journal ArticleDOI
TL;DR: A recursively defined multilayered activity model is presented to represent four types of activities, and a shapelet-based framework is employed to recognize the activities represented in the model; experimental results show that the approach handles complex activities effectively.
Abstract: Highlights: We exploit time series shapelets for complex human activity recognition. We present a multilayered activity model to represent four types of activities. We implement a prototype system based on a smartphone for human activity recognition. Daily living and basketball play activity recognition are conducted for evaluation. Human activity recognition can be exploited to benefit ubiquitous applications using sensors. Current research on sensor-based activity recognition mainly uses data-driven or knowledge-driven approaches. In terms of complex activity recognition, most data-driven approaches suffer from portability, extensibility and interpretability problems, whilst knowledge-driven approaches are often weak in handling intricate temporal data. To address these issues, we exploit time series shapelets for complex human activity recognition. In this paper, we first describe the association between activity and time series transformed from sensor data. Then, we present a recursively defined multilayered activity model to represent four types of activities and employ a shapelet-based framework to recognize the various activities represented in the model. A prototype system was implemented to evaluate our approach on two public datasets. We also conducted two real-world case studies for system evaluation: daily living activity recognition and basketball play activity recognition. The experimental results show that our approach is capable of handling complex activities effectively. The results are interpretable and accurate, and our approach is fast and energy-efficient in real time.
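A minimal numpy sketch of the shapelet idea follows: the feature for a time series is its minimum Euclidean distance to a candidate shapelet over all sliding windows, so that activities containing the pattern become separable in this distance space. Shapelet discovery and the multilayered activity model are not shown.

```python
# Sketch only: minimum sliding-window distance from a time series to a candidate
# shapelet, used as a feature for activity classification. Data is synthetic.
import numpy as np

def shapelet_distance(series, shapelet):
    L = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, L)
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

rng = np.random.default_rng(1)
shapelet = np.sin(np.linspace(0, np.pi, 20))         # candidate subsequence
walking = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.normal(size=200)
sitting = 0.1 * rng.normal(size=200)

print(shapelet_distance(walking, shapelet))          # small: pattern present
print(shapelet_distance(sitting, shapelet))          # larger: pattern absent
```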

Journal ArticleDOI
TL;DR: This paper presents a novel Knowledge-driven approach for Concurrent Activity Recognition (KCAR), which exploits the Pyramid Match Kernel, with its strength in approximate matching on hierarchical concepts, to recognise activities at varying levels of granularity from a potentially noisy sensor sequence.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This work proposes a novel active learning technique which not only exploits the informativeness of the individual activity instances but also utilizes their contextual information during the query selection process, leading to a significant reduction in expensive manual annotation effort.
Abstract: Activity recognition in video has recently benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data to be labeled and entirely available at the outset. In contrast, we formulate a continuous learning framework for context-aware activity recognition from unlabeled video data which has two distinct advantages over most existing methods. First, we propose a novel active learning technique which not only exploits the informativeness of the individual activity instances but also utilizes their contextual information during the query selection process; this leads to a significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data become available. We formulate a conditional random field (CRF) model that encodes the context and devise an information-theoretic approach that utilizes the entropy and mutual information of the nodes to compute the set of most informative query instances, which need to be labeled by a human. These labels are combined with graphical inference techniques for incrementally updating the model as new videos come in. Experiments on four challenging datasets demonstrate that our framework achieves superior performance with a significantly smaller amount of manual labeling.
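As a simplified stand-in for the query-selection step (the paper combines entropy with mutual information over a CRF), the sketch below ranks unlabeled instances by the entropy of their class posteriors and sends the most uncertain ones to a human annotator.

```python
# Sketch only: entropy-based active-learning query selection over class posteriors.
# Posteriors are synthetic stand-ins for a model's predictions on unlabeled data.
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=1)

rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(5), size=100)       # 100 unlabeled activity instances
budget = 10
query_idx = np.argsort(entropy(posteriors))[-budget:]  # most uncertain instances
print(sorted(query_idx.tolist()))                      # indices to send for labeling
```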