
Showing papers on "Hidden Markov model published in 2016"


Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
Abstract: Many state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) Systems are hybrids of neural networks and Hidden Markov Models (HMMs). Recently, more direct end-to-end methods have been investigated, in which neural architectures were trained to model sequences of characters [1,2]. To our knowledge, all these approaches relied on Connectionist Temporal Classification [3] modules. We investigate an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels. We show how this setup can be applied to LVCSR by integrating the decoding RNN with an n-gram language model and by speeding up its operation by constraining selections made by the attention mechanism and by reducing the source sequence lengths by pooling information over time. Recognition accuracies similar to other HMM-free RNN-based approaches are reported for the Wall Street Journal corpus.

1,167 citations
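As a rough, hedged illustration of the attention mechanism this abstract describes (not the authors' implementation), the sketch below computes soft alignment weights between one decoder state and a sequence of encoder frames using a dot-product score and a softmax; the shapes and the dot-product scoring choice are our assumptions, and the paper's windowing constraints over the attention are not shown.

```python
import numpy as np

def attention_alignment(decoder_state, encoder_frames):
    """Soft alignment of one decoder state over T encoder frames.

    decoder_state:  (d,)   current decoder hidden state
    encoder_frames: (T, d) encoded input frames
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = encoder_frames @ decoder_state          # (T,) dot-product scores
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    context = weights @ encoder_frames               # weighted sum of frames
    return weights, context

# Toy example: 6 encoder frames of dimension 4.
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))
state = rng.normal(size=4)
w, c = attention_alignment(state, frames)
print(w.round(3), c.round(3))
```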


Journal ArticleDOI
TL;DR: A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations.
Abstract: This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time series data.

401 citations
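The hybrid NN/HMM idea used above (deep networks supplying the HMM emission probabilities) is commonly realized by dividing the network's state posteriors by the state priors to obtain scaled likelihoods before HMM decoding. Below is a minimal sketch of that conversion under assumed array shapes; it is not tied to the paper's DDNN code.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, state_priors, eps=1e-12):
    """Convert NN state posteriors p(state|frame) into scaled likelihoods
    p(frame|state) ∝ p(state|frame) / p(state), the usual hybrid NN/HMM trick.

    posteriors:   (T, S) network outputs per frame
    state_priors: (S,)   relative state frequencies from the training alignment
    Returns log scaled likelihoods, ready for Viterbi decoding.
    """
    return np.log(posteriors + eps) - np.log(state_priors + eps)

# Toy example: 3 frames, 4 hidden states.
post = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.6, 0.1, 0.1],
                 [0.1, 0.1, 0.2, 0.6]])
priors = np.array([0.4, 0.3, 0.2, 0.1])
print(posteriors_to_scaled_likelihoods(post, priors))
```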


Journal ArticleDOI
TL;DR: This paper addresses the problem of the accurate segmentation of the first and second heart sound within noisy real-world PCG recordings using an HSMM, extended with the use of logistic regression for emission probability estimation, and implements a modified Viterbi algorithm for decoding the most likely sequence of states.
Abstract: The identification of the exact positions of the first and second heart sounds within a phonocardiogram (PCG), or heart sound segmentation, is an essential step in the automatic analysis of heart sound recordings, allowing for the classification of pathological events. While threshold-based segmentation methods have shown modest success, probabilistic models, such as hidden Markov models, have recently been shown to surpass the capabilities of previous methods. Segmentation performance is further improved when a priori information about the expected duration of the states is incorporated into the model, such as in a hidden semi-Markov model (HSMM). This paper addresses the problem of the accurate segmentation of the first and second heart sound within noisy real-world PCG recordings using an HSMM, extended with the use of logistic regression for emission probability estimation. In addition, we implement a modified Viterbi algorithm for decoding the most likely sequence of states, and evaluate this method on a large dataset of 10,172 s of PCG recorded from 112 patients (including 12,181 first and 11,627 second heart sounds). The proposed method achieved an average F1 score of 95.63 ± 0.85%, while the current state of the art achieved 86.28 ± 1.55% when evaluated on unseen test recordings. The greater discrimination between states afforded by using logistic regression, as opposed to the previous Gaussian distribution-based emission probability estimation, together with the use of an extended Viterbi algorithm, allows this method to significantly outperform the current state-of-the-art method (two-sided paired t-test).

366 citations
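To make the emission model concrete: below is a minimal sketch in which per-frame logistic-regression posteriors serve as HMM emission scores and a plain Viterbi decoder recovers the state sequence. The features, labels, and transition matrix are synthetic stand-ins, and the paper's actual decoder is a duration-aware extended Viterbi over an HSMM rather than this simplified version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy frame-level features and state labels
# 0=S1, 1=systole, 2=S2, 3=diastole -- synthetic stand-ins for PCG data.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(400, 3))
y_train = rng.integers(0, 4, size=400)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def viterbi(log_emissions, log_trans, log_start):
    """Standard Viterbi over log-probabilities; shapes (T,S), (S,S), (S,)."""
    T, S = log_emissions.shape
    delta = log_start + log_emissions[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans              # (S, S) candidate scores
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_emissions[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

X_test = rng.normal(size=(50, 3))
log_em = np.log(clf.predict_proba(X_test) + 1e-12)            # logistic-regression emissions
log_trans = np.log(np.full((4, 4), 0.05) + 0.8 * np.eye(4))   # sticky transition guess
log_start = np.log(np.full(4, 0.25))
print(viterbi(log_em, log_trans, log_start))
```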


Proceedings Article
30 Sep 2016
TL;DR: In this paper, a unified algorithm is proposed to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks.
Abstract: Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.

307 citations
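The structured variational approximation mentioned above is trained by maximizing a sequential evidence lower bound; a generic form (our paraphrase, with $q_\phi$ the RNN-parameterized inference network and $p_\theta$ the generative state space model, not a formula quoted from the paper) is:

```latex
\log p_\theta(x_{1:T}) \;\ge\;
\sum_{t=1}^{T} \mathbb{E}_{q_\phi(z_t \mid x_{1:T})}\big[\log p_\theta(x_t \mid z_t)\big]
\;-\; \sum_{t=1}^{T} \mathbb{E}_{q_\phi(z_{t-1} \mid x_{1:T})}\Big[
\mathrm{KL}\big(q_\phi(z_t \mid z_{t-1}, x_{1:T}) \,\big\|\, p_\theta(z_t \mid z_{t-1})\big)\Big]
```

For $t = 1$ the conditioning on $z_{0}$ is dropped and the KL term is taken against the prior $p_\theta(z_1)$.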


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work proposes a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure and reports state-of-the-art results on three datasets.
Abstract: While current approaches to action recognition on presegmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components, a length and language model and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets.

262 citations
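Below is a heavily simplified sketch of the kind of dynamic program the abstract describes, jointly scoring a frame-level action model, a segment-length model, and a bigram "language" model over action labels while searching over segment boundaries. The toy scores, the cap on segment length, and the uniform models are our assumptions; the authors' exact formulation and optimizations differ.

```python
import numpy as np

def segment_dp(frame_scores, log_len, log_lang, max_len):
    """Jointly segment and label T frames with an action model (frame_scores),
    a length model (log_len) and a label bigram "language" model (log_lang).

    frame_scores: (T, C) per-frame log-scores for each action class
    log_len:      (max_len + 1, C) log-probability of a segment length per class
    log_lang:     (C + 1, C) log bigram over labels; row C is the start symbol
    Returns (best score, list of (start, end, class)) -- a simplified sketch.
    """
    T, C = frame_scores.shape
    cum = np.vstack([np.zeros(C), np.cumsum(frame_scores, axis=0)])  # prefix sums
    best = np.full((T + 1, C), -np.inf)
    back = {}
    for t in range(1, T + 1):
        for c in range(C):
            for l in range(1, min(max_len, t) + 1):
                s = t - l
                seg = cum[t, c] - cum[s, c] + log_len[l, c]   # segment score
                if s == 0:
                    cand, prev = seg + log_lang[C, c], (None, None)
                else:
                    p = int(np.argmax(best[s] + log_lang[:C, c]))
                    cand, prev = best[s, p] + log_lang[p, c] + seg, (s, p)
                if cand > best[t, c]:
                    best[t, c], back[(t, c)] = cand, prev
    # Backtrack segments from the best final label.
    t, c, segs = T, int(np.argmax(best[T])), []
    while t is not None and t > 0:
        s, p = back[(t, c)]
        segs.append((0 if s is None else s, t, c))
        t, c = s, p
    return float(best[T].max()), segs[::-1]

rng = np.random.default_rng(0)
scores = np.log(rng.dirichlet(np.ones(3), size=20))   # 20 frames, 3 action classes
log_len = np.log(np.full((9, 3), 1.0 / 8))            # uniform over lengths 1..8
log_lang = np.log(np.full((4, 3), 1.0 / 3))           # uniform label bigram + start row
print(segment_dp(scores, log_len, log_lang, max_len=8))
```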


Journal ArticleDOI
TL;DR: The R package moveHMM allows ecologists to process GPS tracking data into series of step lengths and turning angles, and to fit an HMM to these data, allowing, in particular, for the incorporation of environmental covariates.
Abstract: Due to the substantial progress in tracking technology, recent years have seen an explosion in the amount of movement data being collected. This has led to a huge demand for statistical tools that allow ecologists to draw meaningful inference from large tracking data sets. The class of hidden Markov models (HMMs) matches the intuitive understanding that animal movement is driven by underlying behavioural modes and has proven to be very useful for analysing movement data. For data that involve a regular sampling unit and negligible measurement error, these models usually are sufficiently flexible to capture the complex correlation structure found in movement data, yet are computationally inexpensive compared to alternative methods. The R package moveHMM allows ecologists to process GPS tracking data into series of step lengths and turning angles, and to fit an HMM to these data, allowing, in particular, for the incorporation of environmental covariates. The package includes assessment and visualization tools for the fitted model. We illustrate the use of moveHMM using (simulated) movement of the legendary wild haggis Haggis scoticus. Our findings illustrate the role our software, and movement modelling in general, can play in conservation and management by illuminating environmental constraints.

257 citations
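moveHMM itself is an R package; to keep the examples here in one language, the following is a small Python sketch of the preprocessing step it describes, turning regularly sampled planar positions into step lengths and turning angles before an HMM is fitted. The planar-coordinate assumption and the function name are ours.

```python
import numpy as np

def steps_and_turns(xy):
    """Step lengths and turning angles from a track of planar positions.

    xy: (N, 2) array of consecutive (x, y) fixes on a regular time grid.
    Returns step lengths (N-1,) and turning angles in [-pi, pi) (N-2,).
    """
    d = np.diff(xy, axis=0)                        # displacement vectors
    steps = np.hypot(d[:, 0], d[:, 1])             # step lengths
    headings = np.arctan2(d[:, 1], d[:, 0])        # heading of each step
    turns = np.diff(headings)                      # change in heading
    turns = (turns + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return steps, turns

track = np.array([[0, 0], [1, 0], [2, 1], [2, 3], [1, 3]], dtype=float)
steps, turns = steps_and_turns(track)
print(steps.round(3), turns.round(3))
```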


Journal ArticleDOI
TL;DR: This paper proposes a novel methodological architecture that combines deep learning and state-space modelling and applies it to rs-fMRI-based Mild Cognitive Impairment (MCI) diagnosis; a Deep Auto-Encoder is designed to discover hierarchical non-linear functional relations among regions, transforming the regional features into an embedding space whose bases are complex functional networks.

251 citations


Journal ArticleDOI
TL;DR: The R package pomp as mentioned in this paper provides a very flexible framework for Monte Carlo statistical investigations using nonlinear, non-Gaussian POMP models, including iterated filtering, particle Markov chain Monte Carlo, approximate Bayesian computation, maximum synthetic likelihood estimation and trajectory matching.
Abstract: Partially observed Markov process (POMP) models, also known as hidden Markov models or state space models, are ubiquitous tools for time series analysis. The R package pomp provides a very flexible framework for Monte Carlo statistical investigations using nonlinear, non-Gaussian POMP models. A range of modern statistical methods for POMP models have been implemented in this framework including sequential Monte Carlo, iterated filtering, particle Markov chain Monte Carlo, approximate Bayesian computation, maximum synthetic likelihood estimation, nonlinear forecasting, and trajectory matching. In this paper, we demonstrate the application of these methodologies using some simple toy problems. We also illustrate the specification of more complex POMP models, using a nonlinear epidemiological model with a discrete population, seasonality, and extra-demographic stochasticity. We discuss the specification of user-defined models and the development of additional methods within the programming environment provided by pomp.

242 citations
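pomp is likewise an R package; as a language-consistent sketch of one plug-and-play method it implements, here is a bootstrap particle filter (sequential Monte Carlo) applied to a toy stochastic growth model with Poisson observations. The model, noise levels, and particle count are invented for illustration, and the log-likelihood omits the constant log(y!) term.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, rng):
    """Bootstrap particle filter for a toy model:
        x_t = 0.9 * x_{t-1} + process noise,  y_t ~ Poisson(exp(x_t)).
    Returns the log-likelihood estimate and the filtered state means.
    """
    x = rng.normal(0.0, 1.0, size=n_particles)     # initial particles
    loglik, means = 0.0, []
    for t in range(len(y)):
        x = 0.9 * x + rng.normal(0.0, 0.3, size=n_particles)  # propagate
        logw = y[t] * x - np.exp(x)                # Poisson log-weights (up to const.)
        w = np.exp(logw - logw.max())
        loglik += logw.max() + np.log(w.mean())    # incremental likelihood
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        x = x[idx]
        means.append(float(np.mean(x)))
    return loglik, means

rng = np.random.default_rng(42)
y_obs = rng.poisson(3.0, size=30)                  # toy observation series
ll, xhat = bootstrap_particle_filter(y_obs, n_particles=500, rng=rng)
print(round(ll, 2), np.round(xhat[:5], 2))
```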


Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed system leads to a substantial improvement in localization accuracy when coping with turbulent wireless signals.

237 citations


Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper introduces the end-to-end embedding of a CNN into an HMM, while interpreting the outputs of the CNN in a Bayesian fashion, to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks.
Abstract: This paper introduces the end-to-end embedding of a CNN into an HMM, while interpreting the outputs of the CNN in a Bayesian fashion. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15% and 38% relative and up to 13.3% absolute.

204 citations


Journal ArticleDOI
TL;DR: A novel feature vector with depth information is computed and fed into the Hidden Conditional Neural Field (HCNF) classifier to recognize dynamic hand gestures, and experimental results show that the proposed method is suitable for certain dynamic hand gesture recognition tasks.
Abstract: Dynamic hand gesture recognition is a crucial but challenging task in the pattern recognition and computer vision communities. In this paper, we propose a novel feature vector which is suitable for representing dynamic hand gestures, and present a satisfactory solution to recognizing dynamic hand gestures with a Leap Motion controller (LMC) only. These have not been reported in other papers. The feature vector with depth information is computed and fed into the Hidden Conditional Neural Field (HCNF) classifier to recognize dynamic hand gestures. The systematic framework of the proposed method includes two main steps: feature extraction and classification with the HCNF classifier. The proposed method is evaluated on two dynamic hand gesture datasets with frames acquired with an LMC. The recognition accuracy is 89.5% for the LeapMotion-Gesture3D dataset and 95.0% for the Handicraft-Gesture dataset. Experimental results show that the proposed method is suitable for certain dynamic hand gesture recognition tasks.

Journal ArticleDOI
TL;DR: This study proposes a novel OSA detection approach based on ECG signals by considering temporal dependence within segmented signals by using a discriminative hidden Markov model (HMM) and corresponding parameter estimation algorithms.
Abstract: Obstructive sleep apnea (OSA) syndrome is a common sleep disorder suffered by an increasing number of people worldwide. As an alternative to polysomnography (PSG) for OSA diagnosis, the automatic OSA detection methods used in current practice mainly concentrate on feature extraction and classifier selection based on collected physiological signals. However, one common limitation in these methods is that the temporal dependence of signals is usually ignored, which may result in critical information loss for OSA diagnosis. In this study, we propose a novel OSA detection approach based on ECG signals by considering temporal dependence within segmented signals. A discriminative hidden Markov model (HMM) and corresponding parameter estimation algorithms are provided. In addition, subject-specific transition probabilities within the model are employed to characterize the subject-to-subject differences of potential OSA patients. To validate our approach, 70 recordings obtained from the Physionet Apnea-ECG database were used. Accuracies of 97.1% for per-recording classification and 86.2% for per-segment OSA detection with satisfactory sensitivity and specificity were achieved. Compared with other existing methods that simply ignore the temporal dependence of signals, the proposed HMM-based detection approach delivers more satisfactory detection performance and could be extended to other disease diagnosis applications.

Journal ArticleDOI
TL;DR: A dynamic fatigue detection model based on a Hidden Markov Model (HMM) provides an effective way of detecting driver fatigue, and the posterior probability of fatigue can be obtained dynamically by this HMM-based fatigue recognition method.
Abstract: Highlights: quantification and objective estimation of driver fatigue in real prolonged driving; simultaneous recording of physiological parameters in a wireless and nonintrusive way; development of a dynamic fatigue detection model using multiple features and contexts. A driver's states in successive time slices are not independent; in particular, fatigue is a cognitive state that develops over time. Driver fatigue is also influenced by contextual information at a given time. Classifying the driving state at each time slice in isolation from the preceding and following slices thus has limited meaning. Therefore, a dynamic fatigue detection model based on a Hidden Markov Model (HMM) is proposed in this paper. Driver fatigue can be estimated by this model in a probabilistic way using various physiological and contextual information. Electroencephalogram (EEG), electromyogram (EMG), and respiration signals were simultaneously recorded by wearable sensors and sent to a computer via Bluetooth during real driving. From this physiological information, the fatigue likelihood is obtained using kernel density estimation over different time sections. Contextual information given by specific environmental factors is used as a prior on fatigue. As time proceeds, the posterior probability of fatigue is updated dynamically by this HMM-based fatigue recognition method. The results show that the method provides an effective way of detecting driver fatigue.
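A small sketch of the probabilistic machinery described above: kernel density estimates act as emission likelihoods, contextual information sets the prior, and an HMM-style predict/update recursion yields the posterior probability of fatigue over time. The two-state setup, the single feature, and all numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Train one kernel density estimate per state (alert / fatigued) on a
# 1-D physiological feature, e.g. an EEG band-power ratio (toy data).
alert_kde = gaussian_kde(rng.normal(0.0, 1.0, 300))
fatigue_kde = gaussian_kde(rng.normal(2.0, 1.0, 300))

A = np.array([[0.95, 0.05],        # transition matrix: alert -> alert/fatigued
              [0.10, 0.90]])       #                    fatigued -> alert/fatigued
belief = np.array([0.8, 0.2])      # contextual prior (e.g. time of day, trip length)

for obs in [0.1, 0.5, 1.8, 2.3, 2.6]:           # stream of feature values
    belief = belief @ A                          # predict step (HMM transition)
    like = np.array([alert_kde(obs)[0], fatigue_kde(obs)[0]])  # KDE emissions
    belief = belief * like
    belief /= belief.sum()                       # posterior over {alert, fatigued}
    print(f"obs={obs:.1f}  P(fatigued)={belief[1]:.2f}")
```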

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This paper presents a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve the recognition accuracy, and demonstrates that using multiple recurrent layers in the encoder can reduce the word error rate.
Abstract: Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work on this model for large vocabulary end-to-end speech recognition. We first present a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve the recognition accuracy. We then extend the decoder with long memory by introducing another recurrent layer that performs implicit language modelling. Finally, we demonstrate that using multiple recurrent layers in the encoder can reduce the word error rate. Our experiments were carried out on the Switchboard corpus using a training set of around 300 hours of transcribed audio data, and we have achieved significantly higher recognition accuracy, thereby reducing the gap compared to the hybrid baseline.

Proceedings ArticleDOI
07 Mar 2016
TL;DR: The resulting architecture outperforms state-of-the-art approaches for larger datasets, i.e., when a sufficient amount of data is available for training structured generative models.
Abstract: We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system is evaluated for both the recognition of complex activities as well as their parsing into action units. Using a variety of video datasets ranging from human cooking activities to animal behaviors, our experiments demonstrate that the resulting architecture outperforms state-of-the-art approaches for larger datasets, i.e., when a sufficient amount of data is available for training structured generative models.

Journal ArticleDOI
TL;DR: This paper presents a novel online-capable interaction-aware intention and maneuver prediction framework for dynamic environments that achieves a significant improvement in terms of reliable prediction time and precision compared with other state-of-the-art approaches.
Abstract: This paper presents a novel online-capable interaction-aware intention and maneuver prediction framework for dynamic environments. The main contribution is the combination of model-based interaction-aware intention estimation with maneuver-based motion prediction based on supervised learning. The advantages of this framework are twofold. On one hand, expert knowledge in the form of heuristics is integrated, which simplifies the modeling of the interaction. On the other hand, the difficulties associated with the scalability and data sparsity of the algorithm due to the so-called curse of dimensionality can be reduced, as a reduced feature space is sufficient for supervised learning. The proposed algorithm can be used for highly automated driving or as a prediction module for advanced driver assistance systems without the need of intervehicle communication. At the start of the algorithm, the motion intention of each driver in a traffic scene is predicted in an iterative manner using the game-theoretic idea of stochastic multiagent simulation. This approach provides an interpretation of what other drivers intend to do and how they interact with surrounding traffic. By incorporating this information into a Bayesian network classifier, the developed framework achieves a significant improvement in terms of reliable prediction time and precision compared with other state-of-the-art approaches. By means of experimental results in real traffic on highways, the validity of the proposed concept and its online capability are demonstrated. Furthermore, its performance is quantitatively evaluated using appropriate statistical measures.

Book ChapterDOI
13 Jul 2016
TL;DR: This paper proposes a new system which super-resolves the image using a deep learning convolutional network, followed by Hidden Markov Model and Singular Value Decomposition based face recognition.
Abstract: Due to the importance of security in society, monitoring activities and recognizing specific people through surveillance video cameras play an important role. One of the main issues in such activity arises from the fact that cameras do not meet the resolution requirement for many face recognition algorithms. In order to solve this issue, in this paper we propose a new system which super-resolves the image using a deep learning convolutional network, followed by Hidden Markov Model and Singular Value Decomposition based face recognition. The proposed system has been tested on many well-known face databases such as FERET, HeadPose, and Essex University databases as well as our recently introduced iCV Face Recognition database (iCV-F). The experimental results show that the recognition rate improves considerably after applying super resolution.

Journal ArticleDOI
TL;DR: This paper presents an accurate and efficient BCR sequence annotation software package using a novel HMM “factorization” strategy that is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
Abstract: VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that indeed large modern data sets suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM “factorization” strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: A framework based on Hidden Markov Models (HMMs) that exploits the trajectories and hand-shape features of the original sign videos is proposed; it captures the spatio-temporal information well and fuses the probabilities of trajectory and hand shape.
Abstract: Sign Language Recognition (SLR) aims at translating sign language into text or speech, so as to realize communication between deaf-mute people and ordinary people. This paper proposes a framework based on Hidden Markov Models (HMMs) that exploits the trajectories and hand-shape features of the original sign videos. First, we propose a new trajectory feature (enhanced shape context), which can capture the spatio-temporal information well. Second, we extract the hand regions using Kinect mapping functions and describe each frame by HOG features (pre-processed by PCA). Moreover, in order to optimize predictions, rather than fixing the number of hidden states for each sign model, we determine it independently through the variation of the hand shapes. As for recognition, we propose a combination method to fuse the probabilities of trajectory and hand shape. Finally, we evaluate our approach on our self-built Kinect-based dataset, and the experiments demonstrate the effectiveness of our approach.

Book ChapterDOI
17 Oct 2016
TL;DR: This work applies recurrent neural networks to the task of recognizing surgical activities from robot kinematics, and is the first to apply recurrent neural networks to this task, using a single model and a single set of hyperparameters.
Abstract: We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec.

Journal ArticleDOI
TL;DR: In this paper, learning hidden unit contributions (LHUC) is used to linearly re-combine hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.
Abstract: This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC), a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data. We also extend LHUC to a speaker adaptive training (SAT) framework that leads to a more adaptable DNN acoustic model, working both in a speaker-dependent and a speaker-independent manner, without the requirement to maintain auxiliary speaker-dependent feature extractors or to introduce significant speaker-dependent changes to the DNN structure. Through a series of experiments on four different speech recognition benchmarks (TED talks, Switchboard, AMI meetings, and Aurora4) comprising 270 test speakers, we show that LHUC in both its test-only and SAT variants results in consistent word error rate reductions ranging from 5% to 23% relative, depending on the task and the degree of mismatch between training and test data. In addition, we have investigated the effect of the amount of adaptation data per speaker, the quality of unsupervised adaptation targets, the complementarity to other adaptation techniques, one-shot adaptation, and an extension to adapting DNNs trained in a sequence discriminative manner.
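The LHUC re-scaling described above has a very small functional core: each hidden unit is multiplied by a speaker-dependent amplitude, typically constrained to (0, 2) via a scaled sigmoid. Below is a sketch under assumed shapes, not the authors' code.

```python
import numpy as np

def lhuc_scale(hidden, speaker_params):
    """Re-scale hidden-layer activations with per-unit speaker amplitudes.

    hidden:         (batch, units) activations of one hidden layer
    speaker_params: (units,) unconstrained speaker-specific parameters;
                    the amplitude 2*sigmoid(a) lies in (0, 2).
    Only speaker_params are updated during adaptation; layer weights stay fixed.
    """
    amplitude = 2.0 / (1.0 + np.exp(-speaker_params))
    return hidden * amplitude

h = np.random.default_rng(0).normal(size=(4, 8))   # toy activations
a = np.zeros(8)                                    # amplitudes start at 1.0
print(np.allclose(lhuc_scale(h, a), h))            # True: no change before adaptation
```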

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new Bayesian model and algorithm for depth and reflectivity profiling using full waveforms from the time-correlated single-photon counting measurement in the limit of very low photon counts.
Abstract: This paper presents a new Bayesian model and algorithm used for depth and reflectivity profiling using full waveforms from the time-correlated single-photon counting measurement in the limit of very low photon counts. The proposed model represents each Lidar waveform as a combination of a known impulse response, weighted by the target reflectivity, and an unknown constant background, corrupted by Poisson noise. Prior knowledge about the problem is embedded through prior distributions that account for the different parameter constraints and their spatial correlation among the image pixels. In particular, a gamma Markov random field (MRF) is used to model the joint distribution of the target reflectivity, and a second MRF is used to model the distribution of the target depth, which are both expected to exhibit significant spatial correlations. An adaptive Markov chain Monte Carlo algorithm is then proposed to perform Bayesian inference. This algorithm is equipped with a stochastic optimization adaptation mechanism that automatically adjusts the parameters of the MRFs by maximum marginal likelihood estimation. Finally, the benefits of the proposed methodology are demonstrated through a series of experiments using real data.

Proceedings ArticleDOI
28 Oct 2016
TL;DR: The task of detecting insiders through a novel method of modelling a user's normal behaviour in order to detect anomalies in that behaviour which may be indicative of an attack is investigated.
Abstract: The threat that malicious insiders pose towards organisations is a significant problem. In this paper, we investigate the task of detecting such insiders through a novel method of modelling a user's normal behaviour in order to detect anomalies in that behaviour which may be indicative of an attack. Specifically, we make use of Hidden Markov Models to learn what constitutes normal behaviour, and then use them to detect significant deviations from that behaviour. Our results show that this approach is indeed successful at detecting insider threats, and in particular is able to accurately learn a user's behaviour. These initial tests improve on existing research and may provide a useful approach in addressing this part of the insider-threat challenge.
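A compact sketch of the detection idea in this abstract: model a user's normal event stream with a discrete HMM and flag sessions whose per-event log-likelihood under that model is unusually low. The forward algorithm below is generic; the event alphabet, the parameters, and the alert threshold are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def hmm_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM
    (scaled forward algorithm). start: (S,), trans: (S,S), emit: (S,V)."""
    alpha = start * emit[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Toy model of "normal" behaviour over an event alphabet
# 0=login, 1=read, 2=write, 3=bulk-copy.
start = np.array([0.9, 0.1])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit = np.array([[0.30, 0.45, 0.20, 0.05],
                 [0.10, 0.20, 0.65, 0.05]])

normal = [0, 1, 1, 2, 1, 1, 2, 1]
suspect = [0, 3, 3, 3, 3, 3, 3, 3]
for name, seq in [("normal", normal), ("suspect", suspect)]:
    score = hmm_loglik(seq, start, trans, emit) / len(seq)   # per-event score
    print(name, round(score, 3), "ALERT" if score < -1.6 else "ok")
```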

Journal ArticleDOI
TL;DR: A novel algorithm combining the hidden Markov model (HMM) and Bayesian filtering techniques to recognize a driver’s lane changing intention and can achieve a recognition accuracy of 93.5% and 90.3% which is a significant improvement compared with the HMM-only algorithm.
Abstract: Poor driving habits such as not using turn signals when changing lanes present a major challenge to advanced driver assistance systems that rely on turn signals. To address this problem, we propose a novel algorithm combining the hidden Markov model (HMM) and Bayesian filtering (BF) techniques to recognize a driver’s lane changing intention. In the HMM component, the grammar definition is inspired by speech recognition models, and the output is a preliminary behavior classification. As for the BF component, the final behavior classification is produced based on the current and preceding outputs of the HMMs. A naturalistic data set is used to train and validate the proposed algorithm. The results reveal that the proposed HMM–BF framework can achieve a recognition accuracy of 93.5% and 90.3% for right and left lane changing, respectively, which is a significant improvement compared with the HMM-only algorithm. The recognition time results show that the proposed algorithm can recognize a behavior correctly at an early stage.

Journal ArticleDOI
22 Mar 2016 - PLOS ONE
TL;DR: This work introduces the Expectation-Maximization binary Clustering (EMbC), a general purpose, unsupervised approach to multivariate data clustering, and focuses on the suitability of the EMbC algorithm for behavioural annotation of movement data.
Abstract: The growing capacity to process and store animal tracks has spurred the development of new methods to segment animal trajectories into elementary units of movement. Key challenges for movement trajectory segmentation are to (i) minimize the need of supervision, (ii) reduce computational costs, (iii) minimize the need of prior assumptions (e.g. simple parametrizations), and (iv) capture biologically meaningful semantics, useful across a broad range of species. We introduce the Expectation-Maximization binary Clustering (EMbC), a general purpose, unsupervised approach to multivariate data clustering. The EMbC is a variant of the Expectation-Maximization Clustering (EMC), a clustering algorithm based on the maximum likelihood estimation of a Gaussian mixture model. This is an iterative algorithm with a closed form step solution and hence a reasonable computational cost. The method looks for a good compromise between statistical soundness and ease and generality of use (by minimizing prior assumptions and favouring the semantic interpretation of the final clustering). Here we focus on the suitability of the EMbC algorithm for behavioural annotation of movement data. We show and discuss the EMbC outputs in both simulated trajectories and empirical movement trajectories including different species and different tracking methodologies. We use synthetic trajectories to assess the performance of EMbC compared to classic EMC and Hidden Markov Models. Empirical trajectories allow us to explore the robustness of the EMbC to data loss and data inaccuracies, and assess the relationship between EMbC output and expert label assignments. Additionally, we suggest a smoothing procedure to account for temporal correlations among labels, and a proper visualization of the output for movement trajectories. Our algorithm is available as an R-package with a set of complementary functions to ease the analysis.
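EMbC itself is distributed as an R package; the sketch below only illustrates the underlying idea it builds on (EM-fitted Gaussian mixture clustering of bivariate movement features such as speed and turning angle), using scikit-learn and synthetic data rather than the authors' binary-delimited variant.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic (speed, |turning angle|) features for two behavioural modes:
# slow/tortuous "foraging" vs. fast/straight "travelling".
foraging = np.column_stack([rng.gamma(2.0, 0.3, 200), rng.uniform(1.0, np.pi, 200)])
travel = np.column_stack([rng.gamma(9.0, 0.5, 200), rng.uniform(0.0, 0.8, 200)])
X = np.vstack([foraging, travel])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # EM under the hood
labels = gmm.predict(X)
print("cluster means (speed, |turn|):")
print(np.round(gmm.means_, 2))
print("label counts:", np.bincount(labels))
```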

Journal ArticleDOI
TL;DR: The SIDL method is introduced as an adaptive feature extraction technique, and an effective approach based on SIDL and a hidden Markov model (HMM) is presented for machinery fault diagnosis.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel temporal multimodal deep learning architecture, named Recurrent Temporal Multimodal RBM (RTMRBM), models multimodal sequences by transforming the sequence of connected MRBMs into a probabilistic series model, and clearly improves recognition accuracy compared with the standard MRBM and the temporal model based on conditional RBM.
Abstract: In view of the advantages of deep networks in producing useful representations, the generated features of different modality data (such as image and audio) can be jointly learned using Multimodal Restricted Boltzmann Machines (MRBM). Recently, audiovisual speech recognition based on the MRBM has attracted much attention, and the MRBM has shown its effectiveness in learning the joint representation across audiovisual modalities. However, the resulting networks are weak at modelling the multimodal sequences that are a natural property of speech signals. In this paper, we introduce a novel temporal multimodal deep learning architecture, named Recurrent Temporal Multimodal RBM (RTMRBM), that models multimodal sequences by transforming the sequence of connected MRBMs into a probabilistic series model. Compared with existing multimodal networks, it is simple and efficient in learning temporal joint representations. We evaluate our model on audiovisual speech datasets, two public (AVLetters and AVLetters2) and one self-built. The experimental results demonstrate that our approach can clearly improve recognition accuracy compared with the standard MRBM and the temporal model based on conditional RBM. In addition, RTMRBM still outperforms non-temporal multimodal deep networks in the presence of long-term dependencies.

Journal ArticleDOI
TL;DR: The experimental results show the superiority of the proposed method over the state-of-the-art methods using two challenging depth images datasets.
Abstract: This paper integrates spatiotemporal hybrid features, human tracking, and activity recognition into a single framework operating on video sequences captured by an RGB-D sensor. Initially, we receive a sequence of depth maps to extract human silhouettes from the noisy background and track them using temporal human motion information from each frame. Then, hybrid features, namely optical flow motion features and distance parameter features, are extracted from the depth silhouette region and used in an augmented form as spatiotemporal features. In order to represent each activity in a better way, the augmented features are clustered and symbolized by self-organizing maps. Finally, these features are processed by hidden Markov models to train and recognize human activities based on transition and emission probability values. The experimental results show the superiority of the proposed method over the state-of-the-art methods on two challenging depth image datasets.

Proceedings ArticleDOI
04 Dec 2016
TL;DR: An end-to-end deep network is trained for continuous gesture recognition (jointly learning both the feature representation and the classifier) that performs three-dimensional convolutions to extract features related to both the appearance and motion from volumes of color frames.
Abstract: In this paper, we propose using 3D Convolutional Neural Networks for large scale user-independent continuous gesture recognition. We have trained an end-to-end deep network for continuous gesture recognition (jointly learning both the feature representation and the classifier). The network performs three-dimensional (i.e. space-time) convolutions to extract features related to both the appearance and motion from volumes of color frames. Space-time invariance of the extracted features is encoded via pooling layers. The earlier stages of the network are partially initialized using the work of Tran et al. before being adapted to the task of gesture recognition. An earlier version of the proposed method, which was trained for 11,250 iterations, was submitted to ChaLearn 2016 Continuous Gesture Recognition Challenge and ranked 2nd with the Mean Jaccard Index Score of 0.269235. When the proposed method was further trained for 28,750 iterations, it achieved state-of-the-art performance on the same dataset, yielding a 0.314779 Mean Jaccard Index Score.

Journal ArticleDOI
TL;DR: The construction of a more robust system, an accelerometer glove, is presented, together with its application to the recognition of sign language gestures using a described method based on Hidden Markov Model (HMM) and parallel HMM approaches.
Abstract: The most popular systems for automatic sign language recognition are based on vision. They are user-friendly, but very sensitive to changes in regard to recording conditions. This paper presents a description of the construction of a more robust system—an accelerometer glove—as well as its application in the recognition of sign language gestures. The basic data regarding inertial motion sensors and the design of the gesture acquisition system as well as project proposals are presented. The evaluation of the solution presents the results of the gesture recognition attempt by using a selected set of sign language gestures with a described method based on Hidden Markov Model (HMM) and parallel HMM approaches. The proposed usage of parallel HMM for sensor-fusion modeling reduced the equal error rate by more than 60%, while preserving 99.75% recognition accuracy.
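As a hedged sketch of the parallel-HMM fusion idea mentioned above (one HMM per sensor channel and per gesture class, with per-channel log-likelihoods summed under a conditional-independence assumption), the example below uses the third-party hmmlearn package for the Gaussian HMMs; the classes, channels, and data are toy stand-ins, not the glove system's code.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def train_channel_models(train_seqs, n_states=3):
    """Fit one GaussianHMM per gesture class for a single sensor channel.
    train_seqs: dict mapping class label -> list of (frames, dim) arrays."""
    models = {}
    for label, seqs in train_seqs.items():
        X, lengths = np.vstack(seqs), [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(channel_models, channel_obs):
    """Parallel-HMM fusion: sum per-channel log-likelihoods for each class."""
    labels = channel_models[0].keys()
    scores = {lab: sum(models[lab].score(obs)
                       for models, obs in zip(channel_models, channel_obs))
              for lab in labels}
    return max(scores, key=scores.get), scores

# Toy data: two gesture classes, two channels (e.g. accelerometers and flex sensors).
def toy(mean, dim):
    return [rng.normal(mean, 1.0, size=(30, dim)) for _ in range(5)]

acc_models = train_channel_models({"wave": toy(0.0, 3), "point": toy(2.0, 3)})
flex_models = train_channel_models({"wave": toy(0.0, 5), "point": toy(2.0, 5)})

test = [rng.normal(2.0, 1.0, size=(30, 3)), rng.normal(2.0, 1.0, size=(30, 5))]
pred, scores = classify([acc_models, flex_models], test)
print(pred, {k: round(v, 1) for k, v in scores.items()})
```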