A framework for mobile activity recognition

doi:10.14264/UQL.2017.608

Dissertation

22 May 2017

TL;DR: This thesis proposes a hybrid method that integrates Latent Dirichlet Allocation with conventional classifiers to learn a generic activity model from minimal annotated data, and a framework for low-level activity recognition with dynamically available sensors.

Abstract: Activity recognition is being applied in an increasing number of applications. These include health monitoring of the elderly, discovery of frequent behavioural patterns, monitoring of daily life activities (e.g. eating, tooth brushing, sleeping), and analysis of exercise activities (e.g. swimming, running). Current approaches to activity recognition usually follow the process of data preprocessing, feature extraction, activity model learning and activity recognition. Most previous research pipelines these steps and creates static models for processing activity data and recognizing activities. The static models have predefined data sources that are tightly coupled with the models and never change once the models are created. However, the static models are unable to deal with sensor failures and sensor replacements, which are quite common in real scenarios. Moreover, additional information provided by newly available data sources from dynamically discovered sensors may refine the activity model if this information can discriminatively characterize a specific activity class. The static models cannot leverage this additional information for self-refinement because they assume static data sources. The primary goal of our research is to design and develop frameworks for activity recognition with dynamically available data sources, and to propose and develop algorithms for activity model adaptation with the additional information provided by those data sources. In this thesis, we first provide a critical literature review in the areas of context modelling, context management, sensor modelling and sensors in mobile devices, activity recognition, activity model retraining and adaptation, and sensor dynamics in activity recognition. We then present the research on our activity recognition framework, which makes the following key contributions.
First, we propose a hybrid method that integrates Latent Dirichlet Allocation with conventional classifiers for learning a generic activity model with minimal annotated data. The hybrid method alleviates the problem of data sparsity and requires only a small amount of labelled activity data. Furthermore, it can deal with different variants of activity patterns since it is created with activity data from multiple users. The generic activity model serves as the starting point of our activity model adaptation with dynamically available sensor data. However, it can also serve as an independent component for other applications such as activity personalization. Second, based on the generic model, we propose a framework for low-level activity (e.g. running, walking) recognition with dynamically available sensors. The components of the framework include a basic classifier, instance selection and smoothing. Firstly, we use AdaBoost as our basic classifier, as it is flexible with respect to feature dimensionality and can automatically select the discriminative features during the learning process. Secondly, we propose to select the most informative instances for activity model adaptation in an unsupervised manner. The instances contain features of the new sensor data, and the information from new sensors is incorporated seamlessly through the adaptation process. Thirdly, we design smoothing methods by integrating graphical models such as the Hidden Markov Model and Conditional Random Field with the basic classifier AdaBoost. Finally, we propose a framework for high-level activity (e.g. making coffee) recognition with dynamically available contexts. We propose sensor and activity models to address sensor heterogeneity and the population of contextual information. Knowledge-driven and data-driven methods are proposed for incorporating the new contexts. The knowledge-driven method specifies the parameters of the new contexts with external knowledge in an unsupervised manner, and the data-driven method learns the parameters of the new contexts from the users' data using the proposed learning-to-rank technique and temporal regularization. Extensive experiments and comprehensive comparisons demonstrate the effectiveness of the proposed frameworks.

Summary

1.1 Motivation

Activity recognition has been widely applied over the past decades.
Recognizing human lifestyle can help to evaluate energy expenditure [3]; monitoring human activity in smart homes enables just-in-time activity guidance for elderly people and those suffering from cognitive deficiencies [20]; detecting walking and counting steps can help to monitor the health of the elderly [15].
Static models cannot make use of this additional information because their data sources are pre-defined.
Sensor readings of low-level activities are not semantically interpretable, so in general machine learning methods are employed to map the features extracted from continuous sensor readings to target activities.
In contrast, high-level activities are characterised with contexts that are human readable, and those contexts are processed from sensor data.

1.2 Challenges

Incorporating information provided by dynamically available sensors for activity recognition and activity model adaptation is a non-trivial task, and there are several challenges that need to be addressed in developing such frameworks.
People perform activities differently due to their differences in physical conditions, age, etc.
While a generic activity model can be achieved by learning the model with labelled data, annotating a large amount of activity data is expensive and time-consuming.
Challenge 2: How to perform the activity model adaptation to incorporate information provided by new sensors.
The knowledge-driven methods leverage the existing knowledge base or common sense to specify those parameters, while data-driven methods select the new instances to retrain the model and learn the parameters.
Challenge 6: How to exploit the temporal information in human behaviour for both low- and high-level activity learning, activity model adaptation and activity recognition.

1.3 Thesis statement and contributions

The authors address the aforementioned challenges and the shortcomings of conventional activity recognition methods that assume pre-defined data sources and create static activity recognition models.
This thesis proposes to design and develop activity recognition frameworks that are able to perform activity model adaptation when new data sources become available.
The adaptation process integrates the new information into the frameworks, and hence refines the activity models in terms of recognition accuracy, scalability and robustness.
The contributions of this thesis are presented in more detail in the following subsections.

1.3.1 Generic activity modelling

The authors design and develop a generic low-level activity modelling method that learns an activity model with minimal labelled data from different users.
For the activity model to be generic, the authors learn the model with data from multiple users so that it is able to cope with variants of activity patterns.
This process is repeated until it converges.
The authors evaluate the proposed method with a large number of datasets, and show that it outperforms the supervised method and conventional semi-supervised method.
The authors also examine the factors (e.g. labelling percentage) that have impact on the recognition performance.

1.3.2 Physical activity recognition with dynamically available sensors

The authors develop a framework for low-level physical activity (e.g. walking, running) recognition with dynamically available sensors and a semi-supervised learning method.
Since the sensor data from physical activities cannot be semantically interpreted, the authors incorporate the dynamically available sensor data by retraining the activity model with the selected instances that contain the features of the new sensor data.
Third, the authors design smoothing methods by integrating the graphical models such as Hidden Markov Model and Conditional Random Field with the basic classifier AdaBoost.
The smoothing methods leverage the temporal information embedded in human activities, namely that the current activity is likely to continue in the next time slot.
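The smoothing idea can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the thesis's AdaBoost-HMM or BoostCRF formulation: the per-slot classifier posteriors are used directly as emission scores, the initial state distribution is uniform, and `stay_prob` is a hypothetical parameter for the probability that an activity continues into the next slot.

```python
import math

def viterbi_smooth(posteriors, stay_prob=0.9):
    """Smooth per-slot class posteriors with a 'sticky' transition model.

    posteriors[t][k] is the classifier score for class k at time slot t
    (assumed strictly positive, at least two classes).
    stay_prob is a hypothetical self-transition probability.
    """
    n_classes = len(posteriors[0])
    switch_prob = (1.0 - stay_prob) / (n_classes - 1)
    log_trans = [[math.log(stay_prob if i == j else switch_prob)
                  for j in range(n_classes)] for i in range(n_classes)]

    # delta[k]: best log-score of any label sequence ending in class k
    delta = [math.log(p) for p in posteriors[0]]
    backptr = []
    for obs in posteriors[1:]:
        new_delta, ptr = [], []
        for j in range(n_classes):
            best_i = max(range(n_classes),
                         key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + math.log(obs[j]))
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)

    # Backtrack from the best final class to recover the label sequence
    state = max(range(n_classes), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With a high `stay_prob`, an isolated slot whose frame-wise prediction disagrees with its neighbours tends to be relabelled, while a sustained change of activity is kept.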
Finally, the authors investigate the conditions under which the opportunistically discovered sensors are beneficial to the recognition performance; they propose two hypotheses and validate them with controlled experiments.

1.3.3 High-level activity recognition and adaptation with dynamically avail-

The authors develop a framework for high-level activity recognition with dynamically discovered contexts.
There are different types of sensors in real environments, and even the sensor readings of the same types of sensors can be interpreted differently if they are used for different purposes.
The sensor models provide a description of how to process the sensor readings into proper contexts for recognizing activities, while the activity models interrelate the contexts with activities.
In the knowledge-driven method, the parameters of new contexts are obtained from third-party databases, while in the data-driven method the parameters are learned from the new activity data containing new contexts.
The temporal regularization is embedded into the activity learning process and it encourages the neighbouring instances to have the same activity class.
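One common way to express such a preference is a penalty on the difference between the score vectors of consecutive instances. The sketch below is an assumed form for illustration only; the summary does not give the regularizer actually used in the thesis, and `weight` is a hypothetical regularization weight.

```python
def temporal_penalty(scores, weight=1.0):
    """Penalise differences between class-score vectors of neighbouring instances.

    scores[t] is the vector of class scores for the instance at time t.
    The squared-difference form and `weight` are assumptions for illustration.
    """
    return weight * sum(
        sum((a - b) ** 2 for a, b in zip(current, following))
        for current, following in zip(scores, scores[1:])
    )
```

Added to a training loss, a term of this kind is zero when neighbouring instances receive identical scores and grows when the predicted class flips back and forth.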

1.4 Thesis structure

The remainder of this thesis is structured as follows: Chapter 2 surveys the related work, including context modelling, context management, sensor modelling, an overview of sensors in mobile devices, activity recognition methods, activity model retraining and adaptation, and sensor dynamics in activity recognition.
The authors discuss shortcomings of the existing approaches and analyse why they are inapplicable in dynamic environments.
Chapter 3 presents a generic activity modelling method that learns the generic activity model with minimum labelled data.
Chapter 4 describes and evaluates the framework of physical activity recognition and its adaptation with dynamically available sensor data.
Chapter 6 concludes the thesis with a summary of its contributions and discusses potential future research.

Critical Literature Survey

In order to create a framework for mobile activity recognition, several areas of previous related work need to be critically reviewed.
The reasons the authors review this related research are multifold.
Finally, the authors review the research on activity model adaptation and sensor dynamics in activity recognition, which is related to their work.
In what follows, the authors review several context modelling and management methods, followed by sensor modelling methods and the state-of-the-art sensors in mobile devices.
In this survey the authors discuss the advantages and shortcomings of the existing approaches and analyse why they are inapplicable in their case.

2.1 Context modelling and management

More than a decade of research on software engineering of context-aware applications has led to the approach that creates a context model for each context-aware application.
The context management system is used for gathering, preprocessing, and reasoning upon context information on behalf of the application based on this application’s context model.
This approach makes the design, development and management of context-aware applications easier and allows the gathered context to be reused across multiple context-aware applications.
The approach is an extension of the distributed computing middleware - the middleware is extended by the context management system that allows application designers to design the context model (context information types required by the application, sensor readings preprocessing rules, and a logic for reasoning upon context information to recognize situations).
The authors review representative context modelling and management methods.

2.1.1 Context modelling

Fact-based context modelling. Henricksen et al. [47, 48] proposed a fact-based Context Modelling Language (CML).
Therefore in this approach, the context information used by the application is predefined and the situations that need to be reasoned upon are also static.
In the ontological approach, context information (e.g. activity) is organized into a hierarchical structure of classes, with each class being described by a number of properties.
The datatype properties specify the characteristics of the classes, and the object properties define the relationships between classes.
Therefore, the domain of hasDrinkType is MakeHotDrink and the range is the DrinkType class.

2.1.2 Context management

The general architecture of the context management system is illustrated in Figure 2.3.
In the middle layer, the components include the context models that describe context information types required by the application and a repository of context information that is gathered based on these models.
The individuals and the relations between them, which represent the current facts, are then put into the ABox, and ontological reasoning is performed either to check knowledge consistency or to derive new knowledge (i.e. to find the most specific class for an individual).
In order to deal with the heterogeneity of sensors, they use Process Chains to pre-process the data from context sources into required context information, which is supported by SensorML that specifies the information required for processing sensor readings.
To facilitate the dynamic discovery of context sources, the context management system introduces several managers: the Context Source Manager, which manages communication with and for context sources and provides sensor discovery, registration and configuration services; the Application Context Subscription Manager, which stores the context subscriptions defined by application designers; and the Reconfiguration Manager, which performs cross-layer context mapping.

2.1.3 Discussion

Both of the context modelling and management approaches have their own disadvantages.
They focus on reliability and self-configuration of the system and do not address the problem of adapting the reasoning techniques due to dynamically discovered context sources, especially the ones of which the authors do not have prior knowledge.
In ontology-based context management, even though a probabilistic ontological framework [45] has been proposed to incorporate temporal information and data uncertainty, it is still vulnerable in realistic scenarios due to its guarantee of decidable reasoning procedures.
Moreover, the overhead of the realization process should not be overlooked [13].
Moreover, the inference rules are pre-defined with the domain knowledge, hence the static inference engine makes them inapplicable in dynamic environments as they are not able to adapt situation reasoning to new types of context information gathered from newly available sensors.

2.2.1 Sensor modelling

Sensor modelling for autonomic context management system.
To facilitate the mapping between context information and sensor observations, they proposed a sensor context model that is able to capture both static features and dynamically changing information for a sensor.
To support opportunistic discovery of the sensor models, they advocate the use of the IEEE 1451 standard that allows the sensors to introduce themselves to external systems that they can communicate with.
First, the TEDS data is extracted and decoded with the TEDS template when a sensor is discovered through physical communication interfaces.
Chen et al. [20] propose to model sensors for unsupervised activity recognition in smart environments.

2.2.2 Sensors in mobile devices

The prevalence of smart phones has offered an unprecedented opportunity for mobile sensing as multiple sensors in smart phones can provide various context information.
Miluzzo et al. [101] present the CenceMe application based on off-the-shelf mobile phones.
The application leverages the on-board accelerometers, microphone, GPS and Bluetooth to infer users' activity and social interactions.
The authors' discussion focuses on the accelerometer, gyroscope, microphone, GPS, Bluetooth and WiFi access points.
The related works are categorized according to the sensors they used, and the corresponding applications are also described, as listed in Table 2.1.

2.2.3 Discussion

The authors review the sensor modelling methods and the applications of the sensors in mobile devices.
From this literature the authors conclude that most previous works focus on on-board sensors of mobile devices such as GPS, microphone and accelerometer [76].
The architecture of the systems always includes sensing, learning and prediction, as shown in Figure 2.6, in which the context sources are tightly coupled with context information in the sensing system.
Some researchers assume a collaborative manner [11, 93, 79, 100, 109] of obtaining context information from other users.
The data-flow is predefined and the systems know the semantic meaning of the data sources.

2.3 Sensor-based activity recognition

Human activity is one of the most important high level contexts in the area of context-aware computing, due to the fact that many context-aware applications take actions based on human activities.
The following review of sensor-based activity recognition is divided into data-driven methods, knowledge-driven methods and a hybrid approach.

2.3.1 Data-driven methods

With data-driven approaches to activity recognition, the activities are modelled completely from the available data, and the traditional processes can be characterised by (1) Data acquisition, (2) Preprocessing, (3) Segmentation, (4) Feature extraction, and (5) Model training and classification.
Therefore, the degree of fitting that the created models have with the data is influenced by various factors, such as placement of sensors, training data, feature selection and classifiers.
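Steps (3) and (4) of this pipeline can be sketched as follows. The window length, step size and the three statistics (mean, standard deviation, range) are illustrative assumptions, not the feature set used in the thesis.

```python
import statistics

def sliding_windows(samples, window_size, step):
    """Segmentation: yield fixed-length, possibly overlapping windows."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]

def extract_features(window):
    """Feature extraction: mean, population standard deviation and range."""
    return [statistics.fmean(window),
            statistics.pstdev(window),
            max(window) - min(window)]

def to_feature_vectors(samples, window_size=4, step=2):
    """Turn a one-dimensional sensor stream into one feature vector per window."""
    return [extract_features(w)
            for w in sliding_windows(samples, window_size, step)]
```

Each resulting vector would then be passed to the model training and classification step; a window that straddles two activities (the middle window in a stream such as `[1, 1, 1, 1, 5, 5, 5, 5]`) shows up with a large standard deviation and range.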

Feature selection

Raw data generated from sensors are usually pre-processed into feature vectors before they are fed into recognition models.
Since a large number of features would introduce redundancy and jeopardise the accuracy and energy-efficiency of the model, it is wise to select a subset of the features that have the most discriminative power.
This is very important for mobile devices that are constrained by the limited battery power.
The previous works proposed various methods for feature selection in the wearable activity recognition area.
Chen et al. [22] propose an online LDA (Linear Discriminant Analysis) that is able to add or delete an instance dynamically.

Supervised learning

Supervised learning methods require the training data to be labelled.
The parameters of the models are learned by maximising the likelihood of the training data.
Commonly used generative models for human activity recognition include Naive Bayes [10, 115, 102, 138, 143], Hidden Markov Model [120, 80, 161, 81, 153], and Dynamic Bayesian Network [114].
Discriminative models: contrary to generative models, discriminative models directly model the conditional probability distribution of the latent activity classes given the sensor observations.
The training of AdaBoost follows an iterative process.
The output of the training process is an ensemble of weak learners (step 6).
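The iterative process can be sketched with the textbook discrete AdaBoost procedure over decision stumps. This is a minimal illustration for binary labels in {+1, -1}; the step numbering of the thesis's algorithm listing is not reproduced here, and all function names are hypothetical.

```python
import math

def train_stump(X, y, weights):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] <= threshold else -polarity
                         for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best

def stump_predict(stump, row):
    _, f, threshold, polarity = stump
    return polarity if row[f] <= threshold else -polarity

def adaboost_train(X, y, n_rounds=5):
    """Return an ensemble of (alpha, stump) pairs."""
    n = len(X)
    weights = [1.0 / n] * n                  # start from uniform weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, weights)   # weak learner on weighted data
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this learner
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified instances, then renormalise
        weights = [w * math.exp(-alpha * t * stump_predict(stump, row))
                   for w, row, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Because each stump is chosen by searching over all features, the ensemble only ever uses features that reduce the weighted error, which is one way to read the claim that boosting selects discriminative features during learning.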

Semi-supervised learning

The aforementioned supervised learning methods suffer from several drawbacks.
Firstly, the labelling of training data for activity modelling is time-consuming and sometimes error-prone, given the huge amount of sensor data in realistic scenarios.
To overcome the shortcomings of supervised learning, many researchers turn to semi-supervised techniques, in which the models are created with a small amount of labelled data, and the unlabelled instances classified with high confidence are added to the training data to refine the models.
In En-Co-training, different types of classifiers are trained on the same training data, and during the iterative learning process, the unlabelled instances on whose predicted label all the classifiers agree are used to augment the training data and retrain the classifiers.
The labelled data is regarded as positive class, while the unlabelled data is regarded as negative data to train the initial SVM.
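The general self-training loop described in this subsection (add confidently classified unlabelled instances to the training data, then retrain) can be sketched as below. The nearest-centroid classifier and the distance-based confidence score are stand-ins chosen for brevity; they are assumptions, not the classifiers used in the cited works.

```python
def centroids(X, y):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_with_confidence(cents, row):
    """Return (label, confidence); the confidence is a heuristic assumption."""
    dists = {c: sum((a - b) ** 2 for a, b in zip(row, cen)) ** 0.5
             for c, cen in cents.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    conf = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_iters=10):
    """Iteratively move confidently classified instances into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iters):
        cents = centroids(X_lab, y_lab)
        scored = [(row, *predict_with_confidence(cents, row)) for row in pool]
        confident = [(row, lab) for row, lab, conf in scored if conf >= threshold]
        if not confident:
            break                       # nothing left to add: converged
        for row, lab in confident:
            X_lab.append(row)
            y_lab.append(lab)
        pool = [row for row, _, conf in scored if conf < threshold]
    return centroids(X_lab, y_lab), pool
```

The function returns the refined model together with the instances that never reached the confidence threshold, which makes it easy to inspect what the loop declined to label.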

Active learning

Active learning is another way of reducing the labelling effort for human activity recognition.
The basic idea is that the most informative instances usually reside near the classification boundary, and the classifier is uncertain about the predicted activity classes.
There are many existing works using active learning methods to alleviate the training data collection and the labelling effort in the activity recognition pipeline.
They use the Growing Neural Gas (GNG) algorithm to select the most informative instances.
The basic idea is to query the user for the label of each node, so as to reduce the labelling effort.
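The observation that informative instances lie near the classification boundary is often implemented as margin-based uncertainty sampling. The sketch below shows that generic strategy, not the GNG-based method; `budget` is a hypothetical number of labels the user is willing to provide.

```python
def margin(probs):
    """Gap between the two most probable classes; small means uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_queries(pool_probs, budget):
    """Return indices of the `budget` unlabelled instances with the smallest margin."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]
```

Only the returned instances are shown to the user for labelling; the rest of the pool stays unlabelled.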

Transfer learning

In semi-supervised and active learning, the activity classes of the labelled data and the unlabelled are in the same domain, while in transfer learning, the labelled data in one domain is used to recognize the activities in another domain, so that the data acquisition in the target domain is not required.
Then the data in the source domain is labelled with the activities in the target domain along with the similarities, and this data is used to train the SVM activity model.
The loss function of each instance is further weighted with the similarity so that the dissimilar instance contributes less to the model training.
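A similarity-weighted loss of this kind can be written down directly. The sketch assumes a linear model with hinge loss (the usual SVM loss) and binary labels in {+1, -1}; the names `w`, `b` and `similarity` are illustrative, and the exact objective of the cited work may differ.

```python
def weighted_hinge_loss(w, b, X, y, similarity):
    """Sum of per-instance hinge losses, each scaled by its similarity weight.

    similarity[i] in [0, 1] says how closely source instance i matches the
    target-domain activity it was labelled with (assumed to be given).
    """
    total = 0.0
    for row, label, s in zip(X, y, similarity):
        score = sum(wi * xi for wi, xi in zip(w, row)) + b
        total += s * max(0.0, 1.0 - label * score)
    return total
```

An instance with similarity close to zero contributes almost nothing, so a misclassified but dissimilar source instance barely influences training.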
The authors in [51] propose to perform knowledge transferring both in the feature space and in the label space.
In the feature space, the instances in target domain are labelled with the labels of the source domain based on the similarity of the sensor readings distribution between them.

Zero-shot learning

In contrast to transfer learning, which recognises activities in the target domain using activity data from the source domain, zero-shot learning recognises new activity classes that have never been observed before.
Cheng et al. [24] first map the raw sensor data into semantic attributes such as upper arm back, upper arm down, and then define the activity-attribute matrix that shows the presence of each semantic attribute in different activity classes (including the unseen activities).
In [23], they extend the previous work by incorporating the sequence nature of the activity data with Conditional Random Fields (CRF).
Specifically, potential functions are defined between the variables (i.e. activity-attribute, attribute-feature vector, attribute-attribute), and the weights of the potential functions are learned by minimising the loss function on the training data.
Zero-shot learning suffers from the drawback that defining the activity-attribute matrix is non-trivial, especially when the target activity classes are diverse.
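The role of the activity-attribute matrix can be illustrated with a nearest-signature lookup: attribute detectors produce a score per attribute, and the activity whose row in the matrix is closest is predicted, even if no training data exists for it. The matching rule (squared distance), the attribute `legs_moving` and the activity names below are hypothetical; they are not taken from Cheng et al.

```python
def zero_shot_predict(attribute_scores, activity_attributes):
    """Return the activity whose attribute signature best matches the scores.

    activity_attributes maps each activity (seen or unseen) to a 0/1 vector
    over the same attributes, in the same order as attribute_scores.
    """
    def distance(signature):
        return sum((s - a) ** 2 for s, a in zip(attribute_scores, signature))
    return min(activity_attributes,
               key=lambda act: distance(activity_attributes[act]))

# Hypothetical matrix over (upper_arm_back, upper_arm_down, legs_moving)
EXAMPLE_MATRIX = {
    "rowing":   [1, 0, 0],
    "walking":  [0, 1, 1],
    "standing": [0, 1, 0],
}
```

The sketch also makes the drawback visible: every new activity needs a hand-specified row, and rows must stay distinguishable as the number of activities grows.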

Unsupervised learning

Without any labelled data, unsupervised learning methods aim to uncover latent activity patterns from the data.
Rashidi et al. [119] propose a technique to discover frequent patterns and model them as multiple HMMs.
In [54], the author performs Latent Dirichlet Allocation (LDA) on accelerometer readings to cluster daily routines such as commuting, office and lunch.
LDA explores the documents and clusters the frequently co-occurring words into the same latent topic.

Discussion

Both of the mining-based and the ontology-based approaches have their own disadvantages.
Activity models established from object probabilities mined from external information sources are general models that do not achieve high recognition performance, as people perform activities quite differently.
For the ontology-based method, even though data uncertainty and temporal reasoning are addressed by the probabilistic ontological framework, defining the ontology and specifying the weights of the axioms are non-trivial tasks.
Moreover, the explicitly specified order of actions in the ontology disregards the fact that activities can be performed differently by various users; even the same person may perform the same activity in various manners at different times.
The goal of their research is not to propose solutions to the aforementioned problems.

2.3.2 Knowledge-driven methods

The knowledge-driven approach leverages domain knowledge for activity modelling and inferring.
The knowledge-driven approach is more compelling than the data-driven for several reasons.
Since activities usually take place at different times and locations and involve different object interactions, this additional context can be used to better characterise the activities.
Moreover, the knowledge used for activity recognition can be explicitly specified or mined from external information sources, thus avoiding the processes of manual labelling, feature extraction and learning in the data-driven approaches to activity recognition.
Their advantages and disadvantages are also discussed.

Mining-based approach

The basic idea behind this approach is that the activity models can be created by mining knowledge from the existing external sources such as websites, which provide the instructions on performing the activities and the objects that are required.
Then given the objects used at a specific time point, which are usually captured through sensors, the probabilities of the activity classes that current activity belongs to are calculated and the one that has the maximum probability is chosen to label the current activity.
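This labelling rule can be sketched as a naive-Bayes-style computation. The sketch assumes that object observations are conditionally independent given the activity and that all activities are equally likely a priori; the probability values in the usage note are invented for illustration, and `floor` is a hypothetical smoothing constant for objects not mined for an activity.

```python
import math

def most_likely_activity(observed_objects, object_probs, floor=1e-6):
    """Return (best_activity, posterior) given the objects observed by sensors.

    object_probs[activity][obj] is P(obj | activity), assumed to have been
    mined from an external source beforehand.
    """
    log_scores = {
        activity: sum(math.log(probs.get(obj, floor))
                      for obj in observed_objects)
        for activity, probs in object_probs.items()
    }
    # Normalise into a posterior under a uniform prior (numerically stable)
    peak = max(log_scores.values())
    unnorm = {a: math.exp(s - peak) for a, s in log_scores.items()}
    total = sum(unnorm.values())
    posterior = {a: v / total for a, v in unnorm.items()}
    return max(posterior, key=posterior.get), posterior
```

For example, with invented probabilities `{"make_coffee": {"kettle": 0.6, "cup": 0.7, "coffee_jar": 0.8}, "make_tea": {"kettle": 0.6, "cup": 0.7, "tea_bag": 0.8}}`, observing a kettle, a cup and a coffee jar selects `make_coffee`, because the coffee jar is effectively unseen for `make_tea`.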
They create a hierarchical ontology of synonymous words for functionally similar objects.
By performing shrinkage over the ontology of objects, they calculate the probabilities for the unseen objects in a probabilistic way.
Instead of relying on object probabilities for activity recognition, Gu et al. [37] mine descriptive texts of activities from websites. They then use natural language processing methods to extract the objects used in the activities from the texts, and information retrieval methods to calculate the weight of each object with respect to different activities.

Ontology-based approach

Rather than mining object-usage information from external resources, the ontology-based approach explicitly specifies the activity models with a description-based method.
The activation of the sensor can be interpreted as the object-interaction.
The aggregation of sensor activations at a specific time point can be used to establish a situation, which is then reasoned against already established activity models.
Activity ontologies interlink the activities and contextual information through object properties, and activity recognition is equivalent to reasoning on a dynamically constructed situation against activity ontologies.
There may be several ontologies with different sets of axioms that are consistent with each other; only the one with the maximum a posteriori (MAP) probability is chosen for reasoning about the activity.

2.3.3 Hybrid methods

There are some works combining the knowledge-driven methods with data-driven methods for activity recognition.
They use domain knowledge to create a Dynamic Bayesian Network (DBN), and the DBN is able to infer the possible actions given the activity and object.
Then, the possible actions are used to label the accelerometer data and those labelled data are used to train boosting classifiers.
Notice that the training of the classifiers also takes into account the probability of the action as virtual evidence.
Then, they use the EM method to estimate the vision-object probabilities.

2.3.4 Other concerns

There are also many previous works addressing other aspects in human activity recognition.
They include robustness, segmentation, personalisation and additional features for activity recognition.

Robustness

Maekawa et al. [95] address the issue of scalability in activity recognition.
They employ the information of the end user to find other users whose sensor data may be similar to those of the end user, and model the activities of the end user in an unsupervised way with the data from other similar users.
In [77], the authors leverage the similarity (physical similarity, lifestyle similarity and sensor-data similarity) between different subjects to populate and enlarge the training data.
Kapitanova et al. [62] address the issue of fault tolerance.
Through observing the performance achieved by different models, it is possible to locate the failed sensors.

Segmentation

Continuous activity recognition usually incurs the problem of data segmentation.
Data collected in a small window may not be sufficient to recognize an activity, while a large window may cause data from different activity classes to fall into the same window.
Gu et al. [39] propose Max-Gain which is based on the observation that different feature items weight differently for different activities.
This approach has the danger of errors accumulating.
Krishnan et al. [72] propose a probabilistic approach to dynamically determine the window size, which, however, requires a large amount of training data.

Personalisation

Existing methods usually incorporate sensor data from multiple users to build a robust activity model, termed a general model.
However, as demonstrated in [150], a personalised activity model greatly outperforms a general one.
In [163], the authors first classify the samples with a Decision tree, and then apply a clustering method to re-organise the samples.
The parameters (thresholds in branch nodes) are re-estimated with the re-organised samples.
In the testing phase, the probability that a classifier is chosen for prediction is proportional to its weight.

Additional features

While most existing activity recognition systems employ physical signals or object usage as features, many works also consider additional context information to improve the recognition performance.
Lara et al. [78] incorporate vital signs into the feature vector.
Maekawa et al. [96] try to recognize ADLs (activities of daily living) by employing many kinds of sensors including cameras, microphones and accelerometers.
Instead, they use the energy harvesting power signal when the energy is harvested to power the device.
Wang et al. [149] use Wi-Fi signals for activity tracking and recognition.

2.3.5 Activity recognition in specific domains

Most of the previous works recognise physical activities (e.g. standing, sitting) or activities in daily lives (e.g. kitchen activities, activities in smart environment).
With the prevalence of various sensors and mobile devices, recently many researchers study the recognition of activity in a specific domain.
In [168], the authors present a system that monitors eating (nutrition monitoring) with a smart tablecloth.
In [40], the authors move a step ahead and detect sleep stages with sensors on a smartphone.
They use a linear Conditional Random Field to integrate these features and make further inference.

2.3.6 Discussion

Most of the aforementioned activity recognition techniques are inapplicable in the authors' scenario, since most of them use static models with predefined data sources and are not able to adapt to dynamically available sensors that may be potentially beneficial to the recognition accuracy.
Even though the knowledge-based methods can be used to deal with the unseen contexts provided by the sensors, the parameters specified by these methods are general knowledge and cannot achieve satisfactory performance, because they usually sacrifice recognition performance for the alleviation of labelling efforts.
On the other hand, some researchers personalise the activity model to a specific user for achieving high accuracy.
They do not consider the dynamically available sensors that are common in realistic scenarios.
The lack of related work and the importance of addressing sensor dynamics have motivated us to propose methods for incorporating the sensors dynamically for activity recognition.

2.4 Activity model retraining and adaptation

The authors first create a generic activity model with limited labelled data, and then perform activity model adaptation with the additional information provided by dynamically available sensors.
Therefore, their work is related to traditional semi-supervised activity recognition methods that select the instances classified with high confidence to retrain and adapt the activity model.
Moreover, those methods do not consider information provided by dynamically available sensors.
The related works are also reviewed in the previous sections.
The main problem with those methods is that labelled data of a specific user is required for the activity model to be adapted to his/her activity, and annotating activity data is time-consuming, expensive and error-prone.

2.5 Sensor dynamics in activity recognition

The state of the art activity recognition models usually rely on a static model, where only pre-defined data sources are considered while opportunistically available contexts that may potentially refine the systems are ignored.
Previous works show that more contextual information can further improve the activity recognition accuracy.
They build an ontology and reason for each location which activities could possibly happen.
Therefore, these approaches are not able to incorporate the information provided by dynamically available sensors.
In [35] the authors have to calculate the recognition losses of all the possible combinations of sensors at the training time, so that they are able to select the optimal set of sensors that save the most energy and meet the recognition accuracy requirement for any predicted activities.

3.1 Motivation

The primary goal of this thesis is to develop activity recognition frameworks that are able to incorporate sensor data from dynamically available sensors for low- and high-level activity recognition.
Therefore, the authors aim to create a generic activity model with minimum labelled data while maintaining a satisfactory accuracy.
The key contributions in this chapter are summarised as follows:
The demonstration is illustrated with pairwise training and testing that trains the activity model with one user’s activity data and tests it on another user.
The authors exploit the temporal information in the human activities during the topic assignment process of LDA.

3.2 Latent Dirichlet Allocation

The authors describe the motivation of using LDA for creating a generic activity model and analyse why it cannot be applied directly to activity data.
LDA models the likelihood of the words in the documents by introducing a latent layer of topics.
This characteristic makes it reasonable to leverage LDA to create a generic activity model, as each user may have different proportions of activity data, while the estimation of the parameters for each activity class utilizes the corresponding data from all the users, and the resulting parameters are globally shared among the users.
Therefore, it is infeasible to assume each activity class is multinomially distributed over the feature vectors.
Given 561-dimensional instances, the authors need to estimate a 561-dimensional mean vector and a 561 × 561 covariance matrix for each Gaussian component.
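To make the estimation burden concrete, the following minimal sketch counts the free parameters of a single Gaussian component. It assumes a full, symmetric covariance matrix, so only its upper triangle is counted; the function name is illustrative and does not come from the thesis.

```python
def gaussian_parameter_count(dim: int, full_covariance: bool = True) -> int:
    """Number of free parameters of one Gaussian component in `dim` dimensions."""
    mean_params = dim
    if full_covariance:
        # A covariance matrix is symmetric, so only the upper triangle is free.
        cov_params = dim * (dim + 1) // 2
    else:
        # Diagonal covariance (an assumed simplification): one variance per dimension.
        cov_params = dim
    return mean_params + cov_params


if __name__ == "__main__":
    print(gaussian_parameter_count(561))         # full covariance
    print(gaussian_parameter_count(561, False))  # diagonal covariance
```

For 561 dimensions this amounts to 158,202 free parameters per component with a full covariance matrix, against 1,122 with a diagonal one, which illustrates why limited labelled data is insufficient for a direct Gaussian fit.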

3.3 Conventional classifiers for posterior probability estimation

Since LDA cannot be applied to the activity data directly, the authors propose to combine it with conventional classifiers: AdaBoost, Decision Tree and RandomForest.
AdaBoost is detailed in Section 2.3.1, so the authors briefly introduce some of its notions along with those of Decision Tree and RandomForest.
The authors also describe how to estimate their posterior probabilities that are important to the hybrid approach.

3.3.2 Decision tree

A Decision tree is a flowchart-like structure where each leaf represents a target activity class, and each inner node represents a classification on a specific feature and the branches following that node represent the outcome of the classification; all the instances that satisfy the condition on a branch go through that branch to the next node for the further classification.
Training the Decision tree amounts to finding the feature on which to split the training data so that the reduction in information entropy after the split is maximal.
The training process iterates the splitting until all the instances at the leaf nodes belong to the same class.
In practice, a pruning technique is usually employed when the tree reaches the maximum depth to avoid overfitting.
Decision tree is one of the most widely used classifiers [129], and one advantage of Decision tree is that it is able to generate decision rules that are understandable and interpretable.
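The entropy-based split criterion described above can be sketched as follows. This is a minimal illustration for a single numeric feature with a binary threshold split; the function names and the midpoint candidate thresholds are assumptions made for the example, not details given in the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy before the split minus the weighted entropy after it."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after


def best_threshold(values, labels):
    """Midpoint between consecutive distinct values with the largest gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None  # the feature is constant and cannot be split
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

A full tree would apply `best_threshold` to every feature at every node and recurse on the two resulting partitions.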

3.3.3 Random forest

Random forest is an ensemble learning method that trains multiple Decision trees at the learning phase and outputs the class that is the majority predictions of those individual Decision trees.
An individual Decision tree may be sensitive to noisy training data, whereas the average of multiple uncorrelated Decision trees is not.
Therefore, the training of Random forest usually employs the bagging algorithm to de-correlate the ensemble of Decision trees.
The posterior probability of an instance given by Random forest is the average over the posterior probabilities given by the ensemble of Decision trees.
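The averaging of per-tree posteriors can be sketched as below. The sketch assumes each tree reports its posterior as a dictionary from class name to probability; this data layout and the function names are illustrative only.

```python
from collections import Counter


def forest_posterior(tree_posteriors):
    """Average the class posteriors given by the individual trees.

    tree_posteriors: one dict per tree, mapping class -> probability.
    """
    classes = sorted({c for post in tree_posteriors for c in post})
    n_trees = len(tree_posteriors)
    return {
        c: sum(post.get(c, 0.0) for post in tree_posteriors) / n_trees
        for c in classes
    }


def majority_vote(tree_posteriors):
    """Class predicted by most trees, each tree voting for its top class."""
    votes = Counter(max(post, key=post.get) for post in tree_posteriors)
    return votes.most_common(1)[0][0]
```

The averaged posterior, rather than the hard majority vote, is the quantity a hybrid approach can consume as a confidence value.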

3.3.4 Virtual evidence

In their case, the confidence value is the classified confidence (posterior probability) for each instance.
Training the classifiers is equivalent to minimising the loss function on the training dataset, and the loss function is usually the summed error made by each instance.
It is easy to incorporate the virtual evidence into the AdaBoost training, as it explicitly calculates the training error in the learning process.

3.4 Creating generic model

In the previous section, the authors described how to approximate the posterior probability of a classified instance using conventional classifiers.
The predictive likelihood in Eq.(2.3.5) can be approximated with the posterior probability of the classifiers as introduced in Section.
Therefore, the authors need to consider the topic assignments of neighbouring instances when sampling the topic for the current instance, formulated as follows:

$$P(x_i \mid x_k) \propto \frac{P(y_i = k \mid x_i) \prod_{j \in N(i) \setminus i} P(y_j = k \mid x_j)}{Z} \qquad (3.4.2)$$

where $N(i)$ indicates the neighbouring instances of $x_i$ and $Z$ is the normalization function.
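The right-hand side of Eq. (3.4.2) can be sketched as follows. The sketch assumes that the neighbourhood N(i) is a symmetric temporal window of a given radius around instance i, which the text does not specify; the function name and the uniform fallback for an all-zero score are likewise assumptions.

```python
def neighbour_aware_posterior(posteriors, i, radius=1):
    """Combine the classifier posterior of instance i with those of its neighbours.

    posteriors[j][k] holds P(y_j = k | x_j) as given by the classifier.
    Returns a normalised distribution over the K topics for instance i.
    """
    n, n_topics = len(posteriors), len(posteriors[0])
    # Assumed neighbourhood: instances within `radius` time steps, excluding i.
    neighbours = [
        j for j in range(max(0, i - radius), min(n, i + radius + 1)) if j != i
    ]
    scores = []
    for k in range(n_topics):
        score = posteriors[i][k]
        for j in neighbours:
            score *= posteriors[j][k]
        scores.append(score)
    z = sum(scores)  # normalisation term Z
    if z == 0.0:
        return [1.0 / n_topics] * n_topics  # fall back to a uniform distribution
    return [s / z for s in scores]
```

A topic for instance i would then be sampled from the returned distribution, so that an isolated low-confidence prediction is pulled towards the topics of its temporal neighbours.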
As the iterative process proceeds, the model is able to confidently estimate the labels corresponding to the instances; the "virtual evidence" then approximates the real assignment and the EM process (line 7) results in a more accurate classifier [110].

3.5.1 Datasets

In order to evaluate the proposed methods, the authors experiment with datasets that contain activity data from multiple users.
The Smartphone activity dataset [130], UCI HAR dataset [6] and Heterogeneity Dataset for Human Activity Recognition (HHAR) [134] meet the requirement.
The authors compute time domain features such as mean, standard deviation, median, zero crossing rate, variance and root mean square for each axis of the sensors with a 2-second sliding window and 50% overlap.
The activity data is collected from on-board accelerometers and gyroscopes on 8 smartphones and 2 smartwatches worn by 9 subjects performing six activities.
The summaries of the datasets are presented in Table.
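The time-domain feature extraction described above can be sketched for one sensor axis as follows. The sampling rate is a parameter because it differs between datasets; the function names are illustrative, and the population variance is an assumed choice.

```python
import math
import statistics


def window_features(samples):
    """Time-domain features of one window (at least two samples) of one axis."""
    mean = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=mean)
    # A zero crossing occurs when consecutive samples have opposite signs.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "median": statistics.median(samples),
        "zero_crossing_rate": crossings / (len(samples) - 1),
        "variance": variance,
        "rms": math.sqrt(sum(s * s for s in samples) / len(samples)),
    }


def sliding_windows(signal, sampling_rate_hz, window_seconds=2.0, overlap=0.5):
    """Yield fixed-size windows; 2 seconds and 50% overlap follow the text."""
    size = int(window_seconds * sampling_rate_hz)
    step = max(1, int(size * (1.0 - overlap)))
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]
```

Applying `window_features` to every window of every axis and concatenating the results gives one feature vector per window.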

3.5.2 Pre-analysis

To validate the motivation of creating a generic activity model, the authors examine the differences in performing activities among the users by training the activity model with data from one person and testing it on another.
The authors record the f1-score, $\text{f1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$, and present the cumulative distribution function and histogram of the f1-score for each dataset in Figure 3.1 and Figure 3.2 respectively.
Figure 3.2 shows that in most cases, the activity model achieves a low f1-score (0.2-0.8) if it is trained on one person and tested on the data from another.
This experiment demonstrates that people perform the activities quite differently, and that a model trained on an individual person does not generalise to other people who have different activity patterns.
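The f1-score used in this pre-analysis can be computed as in the sketch below. The text does not state how the per-class scores are aggregated, so the macro average shown here is an assumption, as are the function names.

```python
def f1_score(y_true, y_pred, positive):
    """f1-score of one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class f1-scores (assumed aggregation)."""
    classes = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)
```

In a pairwise evaluation, `y_true` would hold the labels of the test user and `y_pred` the predictions of a model trained on a different user.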

3.5.3 Comparison

The authors compare the proposed method that creates the generic activity model with the following methods: Semi-supervised method: traditional classifiers are trained with the initial labelled data and used to classify the unlabelled instances; the instances classified with high confidence are then selected to retrain the classifiers.
The hybrid of LDA and AdaBoost is compared with semi-supervised AdaBoost.
This is because the classifier is trained with limited labelled data from various users, and hence contains much uncertainty when it makes predictions.
From the figures the authors can also find that the standard deviation of the f1-score across different datasets can be as high as 11%.
The reason is that the activity data from the other subjects (except the one used for validation) is not diverse enough to create a generic activity model, and adding the activity data of the user used for validation is able to boost the recognition performance [152].
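One round of the semi-supervised baseline described in this section can be sketched as follows. The classifier is passed in as two callables, `fit` and `predict_proba`, which are placeholders for any conventional classifier; the confidence threshold of 0.9 is an assumed value, not one reported in the thesis.

```python
def select_confident(posteriors, threshold=0.9):
    """Return (index, predicted_class) pairs whose top posterior meets the threshold."""
    selected = []
    for idx, post in enumerate(posteriors):
        label = max(post, key=post.get)
        if post[label] >= threshold:
            selected.append((idx, label))
    return selected


def self_training_round(fit, predict_proba, labelled, unlabelled, threshold=0.9):
    """Train, classify the unlabelled pool, and move confident instances.

    labelled: list of (instance, label) pairs; unlabelled: list of instances.
    Returns the enlarged labelled set and the remaining unlabelled pool.
    """
    model = fit(labelled)
    posteriors = [predict_proba(model, x) for x in unlabelled]
    chosen = select_confident(posteriors, threshold)
    chosen_idx = {idx for idx, _ in chosen}
    new_labelled = labelled + [(unlabelled[idx], label) for idx, label in chosen]
    remaining = [x for idx, x in enumerate(unlabelled) if idx not in chosen_idx]
    return new_labelled, remaining
```

Repeating the round until no instance passes the threshold gives the usual self-training loop. Because the selected labels are the model's own predictions, early mistakes can be reinforced, which is consistent with the uncertainty noted above.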

3.6 Summary

The authors collaboratively create a generic activity model with partially labelled data by combining LDA and conventional classifiers.
Since the sensor data is not semantically interpretable, the authors employ a machine learning method to create the generic activity model that maps the sensor readings to target activity classes.
To alleviate the data annotation effort, the authors leverage LDA to collaboratively create the generic model with partially labelled data of various users.
The proposed generic activity modelling in this chapter addresses challenge 1 in Section 1.2, and the topic assignment in Section 3.4 addresses challenge 6 in Section 1.2.
In later chapters, the authors consider activity adaptation and refinement by incorporating dynamically available data sources.

4.1 Motivation

The previous chapter describes how to create a generic model for activity recognition with limited labelled data.
As described in Section 2.5, this contextual information includes the location of the user [124], vision features from the on-body camera [162], objects (e.g. cup) in the environment [96], etc.
The knowledge-driven methods require the sensor readings to be human readable (e.g. a door sensor monitors the door open event), so they cannot be applied to sensor readings that are not semantically interpretable.
The authors also propose a novel way of combining a basic classifier (i.e. AdaBoost) with graphical models (i.e. Hidden Markov model and Conditional Random Field) in order to exploit the temporal information to improve the recognition accuracy.

4.2 Framework

The workflow of their framework can be divided into three phases: modelling, learning to adapt and online prediction.
In the modelling phase, an initial activity model is created with currently available sensor data.
In the prediction phase, the initial model is combined with the graphical models to exploit the temporal information to further improve the recognition performance.
It should be noted that prediction is not the final stage.
To achieve this, the authors perform belief propagation on the predictions given by AdaBoost and select instances for retraining.

4.3.1 Basic modelling

As AdaBoost incrementally builds weak classifiers on the training dataset, it is more flexible with respect to changes in the dimensionality of the feature space.
When discriminative context is detected during the learning to adapt phase, all AdaBoost has to do is train a weak learner on the context and add it to the ensemble along with its weight, without the need to change the feature space and retrain the whole model.
Also, in each iteration, AdaBoost only chooses the weak learner with minimum training error.
It presents an effective and tractable way to automatically select the features with maximum discriminative power [82].
Therefore, it is not necessary to evaluate the discrimination of the new context manually.
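The way an ensemble can grow when a new feature appears is sketched below for a binary problem with labels in {-1, +1} and decision stumps as weak learners. Both choices are assumptions made to keep the example short, and restarting the instance weights from uniform when extending the ensemble is a further simplification of this sketch.

```python
import math


def stump_predict(stump, x):
    feature, threshold, polarity = stump
    if feature >= len(x):
        return 0  # abstain when the instance lacks this (newer) feature
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, weights):
    """Return the stump with minimum weighted training error, and that error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        for threshold in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for x, t, w in zip(X, y, weights)
                          if stump_predict(stump, x) != t)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def boost(X, y, rounds, ensemble=None):
    """Append up to `rounds` (alpha, stump) pairs to an existing ensemble."""
    ensemble = list(ensemble or [])
    weights = [1.0 / len(X)] * len(X)
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        if stump is None or err >= 0.5:
            break  # no weak learner better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * t * stump_predict(stump, x))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Calling `boost` again on instances that carry an extra column leaves the existing stumps untouched and only appends new ones; a stump on the new column is chosen only if it has the lowest weighted error, which mirrors the automatic feature selection described above.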

4.3.2 Belief propagation

As new sensors are dynamically discovered, the authors need to select instances that contain the new sensor data to adapt AdaBoost.
Therefore, there are strong correlations among the sequential predictions of the instances.
The multinomial distribution is iteratively updated by incorporating the messages from not only local observations, but also adjacent nodes.
Therefore, belief propagation consists of an inference step followed by several iterative update steps.
In the inference step, for each observation, AdaBoost generates a posterior probability distribution over the hidden activities using Eq.(2.3.1).
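The inference step followed by iterative updates can be illustrated with the simplified sketch below. It is not the thesis's exact update rule: it assumes a chain in which each instance exchanges messages with its two temporal neighbours, a synchronous update schedule, and a compatibility matrix that favours equal labels with a hypothetical weight of 0.8.

```python
def normalise(vec):
    total = sum(vec)
    return [v / total for v in vec]


def propagate_beliefs(local_posteriors, same_label_weight=0.8, iterations=3):
    """Refine per-instance class beliefs using messages from adjacent instances.

    local_posteriors[i][k] is the classifier posterior of class k for
    instance i (the local evidence); at least two classes are required.
    """
    n = len(local_posteriors)
    k = len(local_posteriors[0])
    off = (1.0 - same_label_weight) / (k - 1)
    # compat[a][b]: compatibility of label a with a neighbour carrying label b.
    compat = [[same_label_weight if a == b else off for b in range(k)]
              for a in range(k)]
    beliefs = [normalise(p) for p in local_posteriors]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            new = list(local_posteriors[i])
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    message = [sum(compat[a][b] * beliefs[j][b] for b in range(k))
                               for a in range(k)]
                    new = [v * m for v, m in zip(new, message)]
            updated.append(normalise(new))
        beliefs = updated
    return beliefs
```

An instance whose local posterior weakly disagrees with both neighbours ends up with a belief that follows them, which is the smoothing effect exploited when selecting instances for retraining.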

4.3.3 Instances selection

The authors introduce the method to select the instances for classifier retraining and adaptation.
The instances contain dynamically discovered context, and AdaBoost is able to automatically incorporate the new context if it is discriminative enough.
The authors perform instances selection after the belief propagation for the sake of selecting the informative and profitable instances to quickly converge the classifier without human intervention.

Measurements

First of all, the authors introduce the measurements that can evaluate the profitability of an instance, so that based on those quantitative criteria, the instances can be selected to adapt the model.
The first metric the authors consider is the "drift" in the posterior distribution before and after the belief propagation.
Therefore, the authors determine the final score for the profitability of an instance based on the corresponding scores for each of the metrics.
By increasing α3 the authors give more weight to the high-confidence instances, and then the model adapts conservatively and the convergence is quite slow.

Retraining

Upon selecting the instances for model adaptation, AdaBoost can automatically determine the discriminative power of the new context (if there is any) in the instances, and dynamically incorporate it for classification if it is discriminative enough.
The model is thus adapted to newly arriving data.
One issue that should be addressed when selecting the instances is that the amount of retraining data among different activity classes should be balanced during the adaptation process.
During the experiments the authors found that for an activity class with a small training dataset, the iterative process of training weak learners terminates unexpectedly early.
As a result, the trained ensemble of classifiers for that class overfits the small amount of data.

4.3.4 Sequential prediction

When the adapted AdaBoost is deployed for online prediction, the authors combine it with graphical models to further smooth outliers.
The authors introduce the methods of combining AdaBoost with Conditional Random Field, referred to as BoostCRF.
The modelling of discriminative classifiers is dissociated from the modelling of structured classifiers.
As shown in Figure 4.6a, the Hidden Markov model (HMM) models the joint distribution of those variables and naively assumes that the hidden state $y_k$ at each time step $k$ only depends on the hidden state at the previous time step, $y_{k-1}$, while the observation $x_k$ at time $k$ only depends on the hidden state at the same time slice.
Then the emission probability can be obtained according to Bayes’ rule:

$$p(x_k \mid y_k) = \frac{p(y_k \mid x_k)\, p(x_k)}{p(y_k)} \propto p(y_k \mid x_k) \qquad (4.3.10)$$

where the prior $p(y_k)$ is identical for different activities because the authors balance the training data over all the activity classes.
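Because the emission probability is proportional to the classifier posterior, the posteriors can be plugged directly into standard Viterbi decoding. The sketch below shows this under assumed inputs: a list of per-time-step posteriors, a row-stochastic transition matrix, and a uniform prior unless one is supplied. The function names are illustrative.

```python
import math


def safe_log(p):
    return math.log(max(p, 1e-12))  # guard against log(0)


def viterbi(posteriors, transition, prior=None):
    """Most likely state sequence, using classifier posteriors as scaled emissions.

    posteriors[t][s] ~ p(y_t = s | x_t); transition[r][s] = p(y_t = s | y_{t-1} = r).
    """
    n_states = len(transition)
    prior = prior or [1.0 / n_states] * n_states
    score = [safe_log(prior[s]) + safe_log(posteriors[0][s]) for s in range(n_states)]
    back = []
    for obs in posteriors[1:]:
        pointers, new_score = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + safe_log(transition[r][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + safe_log(transition[best_prev][s])
                             + safe_log(obs[s]))
        back.append(pointers)
        score = new_score
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With a transition matrix that favours staying in the same activity, an isolated prediction that disagrees with its neighbours is overridden, which is the outlier smoothing the combination is meant to provide.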

BoostCRF

Rather than modelling the joint distribution of the variables, Conditional Random Field (CRF) models the conditional distribution of the hidden variables over the observations.
Due to the flexible definition of the potential functions, CRF has various structures.
Therefore, the authors need to define local potential functions between observation and hidden node at each time step, and pairwise potential functions between consecutive hidden nodes.
Inspired by [83], the authors map the weak learners trained in AdaBoost to the local potential functions in CRF, while the weights of the potential functions are mapped to the weights of the weak learners.
Therefore, the authors multiply the pairwise potential functions by a constant (the average number of weak learners of the activity classes), so that the inferred results do not overfit the local evidence.

4.4 Experiment

The authors validate their methods introduced in the previous sections.
The authors firstly describe the datasets, and then specify the method to evaluate their approaches.

4.4.2 Set up

To validate the proposed framework, each of the datasets is divided into three portions, in accordance with the three stages in Figure 4.1.
Specifically, for all the datasets, the authors train the activity model with the first part of the dataset that contains only gyroscope data at the first stage.
The first dataset is personalised; the authors evenly partition the dataset into three parts and perform 6-fold cross validation.
In what follows, the authors validate the effectiveness of their framework in terms of several aspects, especially the ability to incorporate new context, the importance of belief propagation and instances selection, and the benefit of combining AdaBoost with graphical models.
Finally, the authors investigate the conditions under which their methods provide a marginal improvement or even jeopardise the initial model.

4.4.3 Incorporating new context

The authors validate their method by building an activity model with gyroscope data, and dynamically incorporating accelerometer data to refine the model.
The authors do not perform the iterative process to select the instances and retrain the model, as they found that additional iterations do not provide significant accuracy improvement according to their experiments.
Furthermore, the f1-score improvement in the SD-POCKET setting is marginal.
In order to confirm the usefulness of extra features, the authors count the proportion of weak learners that are trained on the new features during the retraining process.
In the SD-POCKET dataset, however, the retraining process terminates unexpectedly early for the activity Sitting, so fewer weak learners are trained on the retraining dataset and hence the new features cannot be sufficiently leveraged for performance improvement.

4.4.4 Role of belief propagation

The authors will examine the role that belief propagation plays in their framework.
Notice that the instances for retraining contain the dynamically available features (i.e. accelerometer data).
The result is presented in Figure 4.9, from which the authors can see that for most of the datasets, noBelief and noExtra provide marginal f1-score improvement.
In some cases, noExtra even experiences performance loss.
On the one hand, high-confidence instances are usually less informative and make less contribution to the f1-score improvement.

4.4.5 Role of graphical models

The authors evaluate the recognition performance by combining AdaBoost with CRF, which is to smooth the accidental predictions given by AdaBoost.
Actually, when the authors look at the results provided by BoostHMM, instances of some continuous activities are still sporadically classified as other classes.
For the datasets SAD and UCI, BoostCRF seems to present no advantage over BoostHMM.
It seems that one iteration is not enough for the model to converge, because more iterations can still improve the f1-score, as shown in Figure 4.11.
The authors only use the 80 gyroscope features while they build their model on all of the 561 features.

4.4.6 Investigation of the usefulness of extra context

The authors investigate the conditions under which the extra context cannot help with the accuracy improvement.
Secondly, if the initial model is not accurate enough, misclassified instances would be selected for retraining and jeopardise the model.
In order to validate the second assumption, the authors limit the size of initial training dataset, so that the initial model would overfit the dataset and result in an inaccurate classifier.
From the figure one can see that the adapted model would be negatively affected if the initial model is not accurate enough.

4.5 Summary

The authors develop a framework that automatically incorporates dynamically available sensors for low-level activity recognition.
In the framework, the most informative instances are selected to adapt and refine the initially created activity model.
The authors also leverage the temporal information of human behaviour to boost the performance, both in the off-line data analysis and online predictions.
The proposed method is able to select the valuable instances to adapt and refine the model without human intervention, and its combination with the graphical models is able to further improve the recognition accuracy.
Recognising high-level daily activities (e.g. kitchen activities) requires different types of environment-instrumented sensors and on-body sensors.

5.1 Introduction

The previous chapter describes the methods that automatically incorporate dynamically available on-body sensors (e.g. accelerometer, gyroscope) for low-level activity recognition and activity model adaptations.
The authors propose methods to approach the aforementioned challenges, and the key contributions are summarised as follows:
The authors propose a high-level activity recognition framework that is able to integrate dynamically available sensors upon their discovery, and to adapt the activity models to take advantage of these additional contexts produced by newly available sensors.
The authors develop a knowledge-driven method for incorporating dynamically available contexts without supervision.

5.2 Context modelling

Various types of sensors can be used for activity recognition, including body-worn sensors, object sensors and ambient sensors [126].
The sensor context models capture the necessary information about the sensors.
The context information in this model provides the guideline to pre-process the sensor readings into high-level context (e.g. interaction with objects) for activity recognition.
2. shows an example of activity modelling.
The result of such an adaptation is illustrated in the grey part with dash lines in Figure.

5.3 Problem definition

The authors introduce the concepts and definitions used in this chapter and then formally define activity recognition as a classification problem.
C indicates the total number of activity classes.
The problem is how to recognize the set of testing instances $x \in \{0, 1\}^{1 \times (N+d)}$, given the set of training data $L$, where $d$ is the number of dynamically available contexts.
In the later sections, the authors describe how to learn context weights for activities, followed by the activity recognition adaptation to the newly available context.
Let $P \in \mathbb{R}^{C \times N}$ be the probability matrix with $P_{kj}$ defining the probability of the $j$th context given the $k$th activity.

5.4 Knowledge-driven method

The authors describe how to leverage an external knowledge base to create activity models and perform the activity model adaptation with dynamically available sensors in an unsupervised manner.
High-level activities are usually characterised by different kinds of contexts (e.g. "making sandwich" can be described by location context "kitchen", and object contexts "knife" and "bread").
Moreover, there exist some descriptive texts that specify the instructions of how to perform high-level activities.
Finally, with the parameters the authors are able to create activity models and perform activity model adaptation using the Bayesian framework.
Dynamically available sensors are incorporated into the activity recognition framework automatically.

5.4.1 Knowledge base

The authors describe how to mine the knowledge (i.e. context-activity conditional probabilities) from the websites, www.wikihow.com and www.ehow.com [116, 156].
Both websites describe how to perform daily activities and the contexts involved.
The authors first crawl the websites and get the descriptive documents for each target activity class, and then identify the contexts involved in each activity using the natural language processing method.
Finally, the authors calculate the context-activity conditional probability of each context with respect to different activities.
This makes it feasible to automatically crawl the textual descriptions.
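The last step, turning crawled descriptions into context-activity conditional probabilities, can be sketched as below. The exact estimator is not given in this section, so the sketch assumes a document-frequency estimate with additive (Laplace) smoothing; the input layout and the function name are hypothetical, and crawling and natural language processing are taken as already done.

```python
from collections import Counter


def context_activity_probabilities(documents, contexts, smoothing=1.0):
    """Estimate P(context | activity) from tokenised how-to documents.

    documents: dict mapping activity name -> list of token lists (one per document).
    contexts: vocabulary of context words to look for (objects, locations, ...).
    """
    vocabulary = set(contexts)
    table = {}
    for activity, docs in documents.items():
        # Count in how many documents of this activity each context is mentioned.
        doc_counts = Counter()
        for tokens in docs:
            doc_counts.update(set(tokens) & vocabulary)
        # Additive smoothing keeps unseen contexts away from probability zero.
        table[activity] = {
            c: (doc_counts[c] + smoothing) / (len(docs) + 2 * smoothing)
            for c in contexts
        }
    return table
```

Smoothing matters here because a context that is never mentioned for an activity would otherwise receive probability zero and veto that activity whenever the context is observed.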

5.4.2 Activity modelling

The authors use the Bayesian framework to formulate the activity model, and enforce the Markov smoother on the neighbouring activity instances to encourage the same activity to be continued to avoid accidental misclassifications.
By assuming independence among the different contexts, the authors have:

$$p(x_i \mid y_i) = \prod_{n=1}^{N} p(x_{i,n} \mid y_i) \qquad (5.4.5)$$

where $N$ is the total number of contexts that are currently available.
In practice, the authors found that Bernoulli Naive Bayes performs better than others such as Gaussian and Multinomial Naive Bayes.
In Section 5.4.1, the authors described how to leverage the external source to create the knowledge base that specifies those conditional probabilities, so that when they dynamically discover new contexts they can use those probabilities in the knowledge base for activity recognition.
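The factorised likelihood of Eq. (5.4.5) with Bernoulli-distributed contexts can be sketched as follows. The probability table is assumed to come from a knowledge base with all entries strictly between 0 and 1; the names `prob_matrix` and `class_priors` are illustrative.

```python
import math


def bernoulli_log_likelihood(x, context_probs):
    """log p(x | y) = sum_n log p(x_n | y) for independent binary contexts."""
    total = 0.0
    for observed, p in zip(x, context_probs):
        # A present context contributes log p, an absent one log (1 - p).
        total += math.log(p) if observed else math.log(1.0 - p)
    return total


def classify(x, prob_matrix, class_priors):
    """Return the activity with the highest log prior plus log likelihood."""
    scores = {
        activity: math.log(class_priors[activity])
        + bernoulli_log_likelihood(x, probs)
        for activity, probs in prob_matrix.items()
    }
    return max(scores, key=scores.get)
```

When a new context is discovered, its conditional probabilities are appended to each row of `prob_matrix` and the observation vector gains one entry; no other part of the model has to change, which is what makes the knowledge-driven adaptation unsupervised.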

5.4.3 Activity prediction

Now that the authors have the emission probabilities (e.g. context-activity probabilities) from the knowledge base, they still need the transition probabilities among the activity classes so that they can infer the latent activity sequence on the sequence of the context observations.
The authors manually set the transition probabilities with domain knowledge, similar to previous work [154, 147].
The basic idea is that a human usually carries out activities for a certain amount of time, and current activity is more likely to be continued in the next time slice.
Therefore, the self-transition probabilities are much higher than the probabilities of transiting one activity class to a different one.
Maximising the joint distribution is equivalent to solving $\max_j \alpha_j(m)$.
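A manually specified transition matrix of the kind described above can be sketched in a few lines. The self-transition probability of 0.9 is an assumed value for illustration; the text only states that self-transitions are much more likely than changes of activity.

```python
def sticky_transition_matrix(n_classes, self_prob=0.9):
    """Row-stochastic matrix that favours continuing the current activity.

    The remaining probability mass is spread evenly over the other classes;
    at least two classes are required.
    """
    off = (1.0 - self_prob) / (n_classes - 1)
    return [
        [self_prob if i == j else off for j in range(n_classes)]
        for i in range(n_classes)
    ]
```

Together with the context-activity probabilities as emissions, such a matrix is all a Viterbi-style decoder needs to infer the latent activity sequence.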

5.5 Data-driven method

The previous section describes how to mine knowledge from external sources for activity modelling and prediction.
The personalisation process takes the activity data of a specific user as input, and employs the data-driven machine learning method for learning the parameters of the activity models.
The rationale of introducing the weight is twofold.
Second, since margins between activity classes may change due to additional context information provided by newly discovered sensors, learning the weights from the context data will provide activity recognition adaptation to adjust the classification margins.
Therefore, the weight of sugar in Make tea needs to be increased for this person's activity recognition to indicate the important role it plays in the activity.

5.5.1 Activity recognition

In the modified activity model, each context probability is further associated with a weight.
The activity recognition is, for a given feature vector $x_i$, to calculate a score for each activity class and choose the class that has the maximum score as the prediction.
The score is $W_{y_i} \cdot (\log P_{y_i} \circ x_i)^{T}$, where $W \in \mathbb{R}^{C \times N}$ is the weight matrix with $W_{kj}$ being the weight associated with $P_{kj}$.
These two processes are illustrated as two steps in Figure.
Their method can be seen as a hybrid of the generic and discriminative model.
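The weighted score and the resulting prediction can be sketched as below, with W and P stored as lists of rows (one row per activity class). The sketch omits any bias term and assumes every probability in P is strictly positive, for instance through smoothing; the function names are illustrative.

```python
import math


def class_score(weights_k, probs_k, x):
    """W_k . (log P_k o x)^T: weighted log-probabilities of the observed contexts."""
    return sum(w * math.log(p) * xi for w, p, xi in zip(weights_k, probs_k, x))


def predict_activity(W, P, x):
    """Return the index of the class with the maximum score, and all scores."""
    scores = [class_score(W[k], P[k], x) for k in range(len(P))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With all weights equal to one, the score reduces to the log-probability of the observed contexts, so the weights can be read as a learned correction on top of the knowledge-base probabilities.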

5.5.2 Activity learning

The authors draw the idea from learning-to-rank [17] to learn the weight matrix W. Specifically, it learns the interest of each user and the functionality of each item [85].
The authors compute a score against each activity class for each instance during the learning process, and the scores can be regarded as the rankings of activity classes given the particular instance.
Solving the above inequality is equivalent to maximizing the value of the Area Under the ROC Curve (AUC) which is commonly used in classification problems.
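
To make the ranking view concrete, the sketch below computes an AUC-style statistic: the fraction of (instance, wrong class) pairs in which the true class receives a strictly higher score than the wrong class. It assumes the ranking constraint is that the true class should outscore every other class for each instance; the function name and the toy scores are hypothetical.

```python
import numpy as np

def pairwise_ranking_accuracy(scores, labels):
    """Fraction of (instance, wrong class) pairs ranked below the true class."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n, c = scores.shape
    true_scores = scores[np.arange(n), labels][:, None]  # shape (n, 1)
    wins = scores < true_scores                          # true class strictly ahead
    wins[np.arange(n), labels] = False                   # ignore the true class itself
    return wins.sum() / (n * (c - 1))

# Hypothetical scores for 3 instances over 3 activity classes.
toy_scores = [[2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0],
              [1.0, 3.0, 2.0]]
toy_labels = [0, 2, 2]
ranking_acc = pairwise_ranking_accuracy(toy_scores, toy_labels)
```

A value of 1.0 means every instance ranks its true class first; the third toy instance ranks one wrong class above the true one, which lowers the statistic.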

Regularization

Employing the learning-to-rank algorithm facilitates the addition of different kinds of constraints.
Adding the $\ell_2$-norm of the parameters W to the objective function Eq.(5.5.4) can avoid the problem of overfitting.
Moreover, the regularization term can be leveraged to perform collaborative learning (e.g. neighbouring instances are more likely to belong to the same activity class).
Human activities present strong temporal relationships: the current activity is likely to continue in the next time slice [147].
Therefore, when performing optimisation for an instance, the authors consider not only the local evidence (i.e. context observations), but also evidence from temporally adjacent instances.

Parameter learning

The authors employ the widely used stochastic gradient descent (SGD) to learn the parameters W, b by maximising the objective function Eq.(5.5.5).
In the learning process, the parameters are initially randomised and then iteratively updated based on the initial labelled data.
For illustrative purposes, the authors do not consider regularization terms.
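
A minimal sketch of the stochastic update is given below. Eq.(5.5.5) is not reproduced in this text, so the sketch assumes a pairwise logistic ranking objective, the sum of log sigmoid(r(y_i, x_i) - r(k, x_i)) over sampled wrong classes k, with r(k, x) = W_k · (log P_k ∘ x) + b_k. The objective, the learning rate, the sampling scheme and all function names are assumptions for illustration, and regularization is omitted as in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scores(W, b, log_p, x):
    """r(k, x) = W_k . (log P_k * x) + b_k for every class k."""
    return (W * log_p * x).sum(axis=1) + b

def sgd_rank(X, y, P, epochs=50, lr=0.1, seed=0):
    """Stochastic gradient ascent on an assumed pairwise ranking objective."""
    rng = np.random.default_rng(seed)
    C, N = P.shape
    log_p = np.log(np.clip(P, 1e-12, 1.0))
    W = rng.normal(scale=0.1, size=(C, N))  # random initialisation
    b = np.zeros(C)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            r = scores(W, b, log_p, X[i])
            wrong = [k for k in range(C) if k != y[i]]
            k = int(rng.choice(wrong))       # sample one competing class
            g = sigmoid(r[k] - r[y[i]])      # derivative of log(sigmoid(d)), d = r_true - r_wrong
            phi = log_p * X[i]               # per-class features log P_k * x_i
            W[y[i]] += lr * g * phi[y[i]]    # raise the true-class score
            W[k] -= lr * g * phi[k]          # lower the competing score
            b[y[i]] += lr * g
            b[k] -= lr * g
    return W, b

# Hypothetical toy data: 2 classes, 2 contexts, 4 labelled instances.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([0, 0, 1, 1])
W, b = sgd_rank(X, y, P)
pred = [int(np.argmax(scores(W, b, np.log(P), x))) for x in X]
```

Each step only touches the rows of `W` for the true class and the sampled competitor, which keeps the per-instance cost independent of the number of classes.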

Adaptation data selection

Learning the parameters requires the instances that contain the new context and their corresponding activity classes.
High-confidence instances are less informative and contribute little to parameter learning.
Then the authors select the instances with high score and their predicted classes for activity model adaptation.
Different activity classes may have different distributions over the scores.
Therefore, to maintain class balance, the authors set different thresholds for different classes, so that for each class all the instances with a score higher than the threshold of that class are selected as adaptation data.
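
The per-class thresholding can be sketched as follows. The sketch assumes that each class threshold is a percentile of the scores of the instances predicted as that class; the function name, the default percentile and the toy scores are hypothetical.

```python
import numpy as np

def select_adaptation_data(scores, predicted, percentile=30.0):
    """Return indices of instances whose score exceeds their own class threshold."""
    scores = np.asarray(scores, dtype=float)
    predicted = np.asarray(predicted)
    selected = []
    for k in np.unique(predicted):
        idx = np.flatnonzero(predicted == k)                # instances predicted as class k
        threshold = np.percentile(scores[idx], percentile)  # class-specific threshold
        selected.extend(idx[scores[idx] > threshold].tolist())
    return sorted(selected)

# Two classes whose scores live on very different scales.
toy_scores = [0.1, 0.2, 0.3, 0.4, 10.0, 20.0, 30.0, 40.0]
toy_pred = [0, 0, 0, 0, 1, 1, 1, 1]
chosen = select_adaptation_data(toy_scores, toy_pred, percentile=50.0)
```

In this toy case a single global threshold at the median would select only instances of class 1, whereas the class-specific thresholds select two instances from each class.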

Activity model adaptation

With the adaptation data, the authors perform learning to rank to obtain more accurate activity models by weighting each context.
The objective function can also be maximised with SGD in the same way as in the previous section, so the authors do not detail the process here.
In this light the activity model adaptation process is able to automatically determine the useful context.
Notice that even though the authors only query a small set of the activity instances for retraining, they can still leverage the unlabelled instances for temporal regularization (3rd term in Eq.(5.5.9)).
The authors include the temporal regularization in the equation, so the classification of each instance also considers the classification results of neighbouring ones:

$$\text{prediction} = \arg\max_{y_i} \left( r(y_i, x_i) - \frac{\beta_1}{2} \sum_{j \in N(i)} \big( r(y_i, x_i) - r(y_i, x_j) \big)^2 \right) \qquad (5.5.11)$$

where $\beta_1$ controls the tradeoff between temporal regularization and local evidence.
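
The prediction rule in Eq.(5.5.11) can be sketched directly on a matrix of scores. The neighbourhood N(i) is assumed here to be the time slices within a fixed `radius` of slice i, which is an illustrative choice; the function name and the toy scores are hypothetical.

```python
import numpy as np

def predict_with_temporal_reg(R, beta1=0.5, radius=1):
    """R: (T, C) matrix with R[t, k] = r(k, x_t). Returns one label per time slice."""
    R = np.asarray(R, dtype=float)
    T, C = R.shape
    labels = np.empty(T, dtype=int)
    for i in range(T):
        lo, hi = max(0, i - radius), min(T, i + radius + 1)
        neighbours = np.array([j for j in range(lo, hi) if j != i], dtype=int)
        # Per-class squared score difference to each neighbouring slice.
        penalty = ((R[i] - R[neighbours]) ** 2).sum(axis=0)
        labels[i] = np.argmax(R[i] - 0.5 * beta1 * penalty)
    return labels

# Hypothetical scores: class 1 spikes at the middle slice only.
R = [[1.0, 0.0],
     [1.0, 1.5],
     [1.0, 0.0]]
plain = predict_with_temporal_reg(R, beta1=0.0)
smoothed = predict_with_temporal_reg(R, beta1=0.5)
```

With `beta1 = 0` the middle slice follows its local evidence and is labelled class 1; with `beta1 = 0.5` the isolated spike is penalised and the slice is labelled class 0 like its neighbours.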

5.6.1 Public dataset

The authors validate the proposed methods using the OPPORTUNITY dataset [126].
The activities of the user in the scenario are annotated on different levels, including locomotion (e.g. standing), gesture (e.g. opening), and high-level activities (e.g. Coffee time).
For object and ambient sensors, even though there are several sensors of the same type, they are used for monitoring different contexts.
In their demonstration of the effectiveness of the proposed method for recognising high-level complex activities, the authors use data for 3 subjects, not 4.
The window length of 5 seconds is a tradeoff between delay and recognition performance, and examining the influence of the window length is out of scope for this thesis.

5.6.2 Simulation dataset

The authors also manually generate sensor data for the validation of the knowledge-driven method.
In Algorithm 5, the authors first generate the class label $Data_{i,-1}$ for the $i$th instance, and then generate context presences for that instance.
The authors use the Bernoulli distribution for generating the context presence, as it has been demonstrated in previous work [99] that real sensor events follow a Bernoulli distribution parameterised by the firing probability.
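
The generation procedure can be sketched as below. Algorithm 5 itself is not reproduced in this text, so the sketch only follows the two steps named above: draw a class label, then draw one Bernoulli sample per context using that class's firing probabilities. The uniform class prior, the function name and the toy probabilities are assumptions.

```python
import numpy as np

def generate_dataset(P, n_instances, class_prior=None, seed=0):
    """Simulate binary context presences; the class label is kept in the last column."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    C, N = P.shape
    data = np.zeros((n_instances, N + 1), dtype=int)
    for i in range(n_instances):
        label = int(rng.choice(C, p=class_prior))  # class label first (uniform if no prior)
        data[i, :N] = rng.random(N) < P[label]     # one Bernoulli draw per context
        data[i, -1] = label                        # corresponds to Data[i, -1]
    return data

# Degenerate firing probabilities make the output easy to check by eye.
P_sim = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
sim = generate_dataset(P_sim, n_instances=20)
```

Realistic firing probabilities lie strictly between 0 and 1, in which case instances of the same class differ in which contexts fire.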

5.6.3 Validation of knowledge-driven method

The authors describe the validation of the knowledge-driven method for activity recognition and adaptation.
Specifically, the authors demonstrate the possibility of incorporating dynamically discovered contexts for activity recognition.
The authors first introduce the validation method, followed by an illustrative example, and finally the experimental results.

Validation method

The authors use the OPPORTUNITY dataset for validating the data-driven method described in Section 5.5.
The reason is that the authors aim to demonstrate that the data-driven method is able to personalise and adapt the activity model to a specific user for achieving high accuracy.
Notice that in Section 5.5.3, the authors introduce an adaptation data selection method that computes a score for each classified instance and selects the instances scored higher than a threshold for the adaptation.
First, the temporal information preserved in LORO-CV can be used for regularization both in the training and testing process.

Example

The following example illustrates the process of incorporating dynamically available context for activity recognition and adaptation.
Suppose the authors have the activity classes make coffee and make tea, characterized by contexts cup, water, and sugar with different probabilities as shown in Figure 5.10.
As make coffee is also characterized by those three contexts with similar probabilities, misclassification occurs when the user is actually carrying out the activity make coffee.
Suppose now the authors dynamically discover a sensor and it provides the context leaf that is used to characterize activity make tea, then the activity model can be adapted with the parameters mined from the website as shown in Figure 5.12.
A feature vector in which the context leaf is present is classified as make tea, while a feature vector in which the context leaf is not present is classified as make coffee.
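
The example can be replayed numerically. The sketch assumes a naive-Bayes-style Bernoulli likelihood over context presence and absence, and it uses made-up probabilities; the actual values in Figures 5.10 and 5.12 are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

ACTIVITIES = ["make coffee", "make tea"]

def classify(P, x):
    """Pick the activity with the highest Bernoulli log-likelihood of x."""
    x = np.asarray(x, dtype=float)
    log_lik = (x * np.log(P) + (1 - x) * np.log(1 - P)).sum(axis=1)
    return ACTIVITIES[int(np.argmax(log_lik))]

# Hypothetical probabilities for the contexts cup, water, sugar.
P_initial = np.array([[0.9, 0.9, 0.7],   # make coffee
                      [0.9, 0.9, 0.8]])  # make tea

# Adaptation: append a column for the newly discovered context leaf.
leaf = np.array([[0.05],   # leaf rarely fires during make coffee
                 [0.90]])  # leaf usually fires during make tea
P_adapted = np.hstack([P_initial, leaf])

before = classify(P_initial, [1, 1, 1])
with_leaf = classify(P_adapted, [1, 1, 1, 1])
without_leaf = classify(P_adapted, [1, 1, 1, 0])
```

Before adaptation, any instance with cup, water and sugar present is labelled make tea, including instances of make coffee. After the leaf column is added, the presence or absence of leaf decides between the two classes.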

Experiment results

The experiment results are presented in Figure 5.15.
The percentage of the contexts in the first dataset is varied from 10% to 90%.
The amount of accuracy improvement is proportional to the number of dynamically available contexts, as more contexts provide more information for the activity models.
First, the accuracy depends on the types of activity classes the authors are to classify.
In addition, the authors only dynamically incorporate 3 contexts, some of which may not be discriminative enough to improve the recognition performance significantly.

Impact of adaptation

The authors study the f1-score gain after incorporating dynamically available context in this experiment.
For a given X, the authors randomly sample X contexts and repeat this process 200 times to avoid biases.
Therefore, any data points on the right side of the line f(x) = x indicate that there is an f1-score improvement after the adaptation in these rounds of experiments; a larger distance from the line means a greater improvement.
The underlying reason is that the f1-score improvement depends on the discriminative power of the dynamically available contexts.
Basically, the more contexts are incorporated, the greater the expected improvement in recognition performance.

Impact of adaptation data

The authors examine the impact that the size of adaptation data has on the recognition performance.
For each X ∈ {15, 18, 21, 24}, the threshold for selecting the adaptation data is varied from the 30th percentile to the 90th percentile of the scores of the classified instances.
A higher threshold means less adaptation data for retraining.
From the figures, the authors can draw the following conclusions.
Secondly, activity models with more contexts initially perform better than those with fewer contexts.

Influence of regularization weight

The temporal regularization weight β1 in Eq.(5.5.9) controls the tradeoff between the local contextual information and the information from neighbouring instances.
As shown in Table 5.5, the threshold for selecting adaptation data is the 30th percentile of the scores, and the number of initial contexts is set to 24.
The authors compare the proposed method with the conventional machine learning methods.
It seems unfair to compare their method with the conventional machine learning methods, as they make predictions for each instance independently and do not consider the temporal information of neighbouring instances.

5.7 Summary

The authors address the problem of adaptive high-level activity recognition with dynamically available sensors.
The existing research shows that additional contextual information can potentially improve the recognition accuracy, and sensor addition or replacement is very common in activity recognition systems.
In the knowledge-driven method, the authors mine external resources (e.g. websites) to specify the parameters in the activity models.
With the knowledge-driven method, the authors can perform activity model adaptation with new contexts in an unsupervised manner.
The improvements vary and depend on several factors such as the amount of adaptation data, the weight of the temporal regularization and the number of contexts in the initial training set.

Figures (59)

Figure 5.17: F1-score before and after adaptation across the datasets

Figure 4.11: F1-score corresponding to the number of iterations during inference process for BoostCRF.

Table 5.7: Comparison with hybrid classifiers.

Figure 2.6: General architecture of mobile context sensing system.

Figure 5.9: Distribution over the scores of different activities

Figure 2.1: Graphical context modelling.

Figure 3.3: Comparing the hybrid of LDA and AdaBoost with baselines.

Figure 4.2: Belief propagation between hidden variables

Figure 4.13: Percentage of weak learners that are trained on magnetometer features during the adaptation process.

Figure 4.12: Performance (f1-score) improvement by incorporating magnetometer features; we do not experiment on the UCI dataset as it does not provide magnetometer data.

Figure 2.7: Pipeline of conventional human activity recognition.

Figure 4.7: F1-score improvement by dynamically and automatically incorporating accelerometer data.

Figure 5.20: The f1-score as a function of the temporal regularization weight.

Table 5.5: Parameters setting description.

Figure 5.8: An example of the parameters learning process.

Figure 5.15: Experiment results on simulation data and OPPORTUNITY

Figure 3.1: Cumulative distribution function (CDF) of the f1-score across the datasets when performing pairwise training and testing.

Figure 3.2: Histogram of the f1-score across the datasets when performing pairwise train and test.

A Framework for Mobile Activity Recognition

Jiahui Wen

B.E. (Computer science & Technology), M.Tech(Computer science & Technology)

A thesis submitted for the degree of Doctor of Philosophy at

The University of Queensland in 2017

School of Information Technology & Electrical Engineering

Abstract

Activity recognition is being applied in an increasing number of applications. They include health monitoring of the elderly, discovery of frequent behavioural patterns, monitoring of daily life activities (e.g. eating, tooth brushing, sleeping), and analysis of exercise activities (e.g. swimming, running). Current approaches for activity recognition usually use the process of data preprocessing, feature extraction, activity model learning and activity recognition. Most of the previous research pipeline these steps and create static models for processing activity data and recognizing activities. The static models have predefined data sources that are tightly coupled with the models and never change once the models are created. However, the static models are unable to deal with sensor failures and sensor replacements that are quite common in real scenarios. Moreover, additional information provided by newly available data sources from dynamically discovered new sensors may potentially refine the activity model if this information can discriminatively characterize a specific activity class. However, the static models cannot leverage this additional information for self-refinement due to the static assumption of data sources.

The primary goal of our research is to design and develop frameworks for activity recognition with dynamically available data sources, and propose and develop algorithms for activity model adaptation with the additional information provided by those data sources. In this thesis, we first provide a critical literature review in the areas of contexts modelling, context management, sensor modelling and sensors in mobile devices, activity recognition, activity model retraining and adaptation, and sensor dynamics in activity recognition. We then present the research on our activity recognition framework that makes the following key contributions.

First, we propose a hybrid method that integrates Latent Dirichlet Allocation with conventional classifiers for learning a generic activity model with minimum annotated data. The hybrid method is able to alleviate the problem of data sparsity and requires a little amount of labelled activity data. Furthermore, it can deal with different variants of activity patterns since it is created with activity data of multiple users. The generic activity modelling serves as the starting point of our activity model adaptation with dynamically available sensor data. However, it can also serve as an independent component for other applications such as activity personalization.

Second, based on the generic model, we propose a framework for low-level activity (e.g. running, walking) recognition with dynamically available sensors. The components of the framework include a basic classifier, instance selection and smoothing. Firstly, we use AdaBoost as our basic classifier as it is flexible with feature dimensionality and it can automatically select the discriminative features during the learning process. Secondly, we propose to select the most informative instances for activity model adaptation in an unsupervised manner. The instances contain features of the new sensor data, and the information of new sensors are incorporated seamlessly through the adaptation process. Finally, we design smoothing methods by integrating the graphical models such as Hidden Markov Model and Conditional Random Field with the basic classifier AdaBoost.

Finally, we propose a framework for high-level activity (e.g. making coffee) recognition with dynamically available contexts. We propose sensor and activity models to address sensor heterogeneity and populating contextual information. Knowledge-driven and data-driven methods are proposed for incorporating the new contexts. The knowledge-driven method specifies the parameters of the new contexts with external knowledge in an unsupervised manner, and the data-driven method learns the parameters of the new contexts with the users' data using the proposed learning-to-rank technique and temporal regularization. Extensive experiments and comprehensive comparisons demonstrate the effectiveness of the proposed frameworks.

Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis.

Publications during candidature

[1] Jiahui Wen, Jadwiga Indulska, Mingyang Zhong, Adaptive Activity Learning with Dynamically Available Context, In Proc. of the IEEE International Conference on Pervasive Computing and Communications (PerCom), Sydney, Australia, March 2016.

[2] Mingyang Zhong, Jiahui Wen, Peizhao Hu, Jadwiga Indulska, Advancing Android Activity Recognition Service with Markov Smoother, In Proc. of the IEEE PerCom workshop on Context and Activity Modelling and Recognition (CoMoRea), St. Louis, USA, March 2015.

[3] Jiahui Wen, Mingyang Zhong, Jadwiga Indulska, Creating General Model for Activity Recognition with Minimum Labelled Data, In Proc. of the ACM International Symposium on Wearable Computers (ISWC), Osaka, Japan, September 2015.

[4] Jiahui Wen, Seng Loke, Jadwiga Indulska, Mingyang Zhong, Sensor-based Activity Recognition with Dynamically Added Context, In Proc. of the 12th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous), Coimbra, Portugal, July 2015.

[5] Jiahui Wen, Jadwiga Indulska, Zhiying Wang, Discovering Latent Structures for Activity Recognition in Smart Environments, In Proc. of the IEEE International Conference on Ubiquitous Intelligence and Computing (UIC), Bali, Indonesia, December 2014.
