Showing papers on "Hidden Markov model" published in 2018


Proceedings ArticleDOI
15 Apr 2018
TL;DR: The Speech-Transformer is presented, a no-recurrence sequence-to-sequence model that relies entirely on attention mechanisms to learn positional dependencies and can be trained faster and more efficiently, together with a 2D-Attention mechanism that jointly attends to the time and frequency axes of the 2-dimensional speech inputs, providing more expressive representations for the Speech-Transformer.
Abstract: Recurrent sequence-to-sequence models using encoder-decoder architecture have made great progress in the speech recognition task. However, they suffer from the drawback of slow training speed because the internal recurrence limits the training parallelization. In this paper, we present the Speech-Transformer, a no-recurrence sequence-to-sequence model that relies entirely on attention mechanisms to learn the positional dependencies, which can be trained faster with more efficiency. We also propose a 2D-Attention mechanism, which can jointly attend to the time and frequency axes of the 2-dimensional speech inputs, thus providing more expressive representations for the Speech-Transformer. Evaluated on the Wall Street Journal (WSJ) speech recognition dataset, our best model achieves a competitive word error rate (WER) of 10.9%, while the whole training process only takes 1.2 days on 1 GPU, significantly faster than the published results of recurrent sequence-to-sequence models.

771 citations


Journal ArticleDOI
23 Mar 2018
TL;DR: This paper revises one of the most popular RNN models, namely gated recurrent units (GRUs), proposes a simplified architecture that turned out to be very effective for ASR, and replaces hyperbolic tangent with rectified linear unit activations.
Abstract: A field that has directly benefited from the recent advances in deep learning is automatic speech recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human–machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on recurrent neural networks (RNNs) that are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely, gated recurrent units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is twofold: First, we analyze the role played by the reset gate, showing that a significant redundancy with the update gate occurs. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent with rectified linear unit activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called light GRU, not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features, noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end connectionist temporal classification models.

231 citations
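
The single-gate design described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `light_gru_step`, the weight names, and the exact update rule are assumptions based on the usual GRU equations with the reset gate removed and tanh replaced by ReLU, and the batch normalization mentioned in the abstract is omitted for brevity.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def light_gru_step(x_t, h_prev, W_z, U_z, W_h, U_h):
    """One time step of a single-gate GRU with a ReLU candidate state.

    Assumed update (hypothetical notation, batch normalization left out):
      z_t    = sigmoid(W_z x_t + U_z h_prev)   # update gate, the only gate
      h_cand = relu(W_h x_t + U_h h_prev)      # no reset gate on h_prev
      h_t    = z_t * h_prev + (1 - z_t) * h_cand
    """
    a_z = [p + q for p, q in zip(matvec(W_z, x_t), matvec(U_z, h_prev))]
    a_h = [p + q for p, q in zip(matvec(W_h, x_t), matvec(U_h, h_prev))]
    z = [sigmoid(a) for a in a_z]
    h_cand = [max(0.0, a) for a in a_h]
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

# Toy 1-dimensional run: zero gate weights give z = 0.5 at every step.
W_z, U_z, W_h, U_h = [[0.0]], [[0.0]], [[1.0]], [[0.0]]
h1 = light_gru_step([2.0], [0.0], W_z, U_z, W_h, U_h)   # [1.0]
h2 = light_gru_step([-3.0], h1, W_z, U_z, W_h, U_h)     # [0.5]
```

In the toy run, the negative input at the second step is clipped to zero by the ReLU, so the state simply decays toward zero at the rate set by the update gate.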


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work presents a novel approach to human motion modeling based on convolutional neural networks (CNN) that captures both invariant and dynamic information of human motion, resulting in more accurate predictions.
Abstract: Human motion modeling is a classic problem in computer vision and graphics. Challenges in modeling human motion include high dimensional prediction as well as extremely complicated dynamics. We present a novel approach to human motion modeling based on convolutional neural networks (CNN). The hierarchical structure of CNN makes it capable of capturing both spatial and temporal correlations effectively. In our proposed approach, a convolutional long-term encoder is used to encode the whole given motion sequence into a long-term hidden variable, which is used with a decoder to predict the remainder of the sequence. The decoder itself also has an encoder-decoder structure, in which the short-term encoder encodes a shorter sequence to a short-term hidden variable, and the spatial decoder maps the long and short-term hidden variable to motion predictions. By using such a model, we are able to capture both invariant and dynamic information of human motion, which results in more accurate predictions. Experiments show that our algorithm outperforms the state-of-the-art methods on the Human3.6M and CMU Motion Capture datasets. Our code is available at the project website.

192 citations


Journal ArticleDOI
TL;DR: In this article, convolutional neural network-based hidden Markov models (CNN HMMs) are presented to classify multi-faults in mechanical systems, and the average classification accuracy ratios are 98.125% and 98% for two data series with agreeable error rate reductions.
Abstract: Vibration signals of faulty rolling element bearings usually exhibit non-linear and non-stationary characteristics caused by the complex working environment. It is difficult to develop a robust method to detect faults in bearings based on signal processing techniques. In this paper, convolutional neural network-based hidden Markov models (CNN HMMs) are presented to classify multi-faults in mechanical systems. In CNN HMMs, a CNN model is first employed to learn data features automatically from raw vibration signals. By utilizing the t-distributed stochastic neighbor embedding (t-SNE) technique, feature visualization is constructed to manifest the powerful learning ability of CNN. Then, HMMs are employed as a strong stability tool to classify faults. Both the benchmark data and experimental data are applied to the CNN HMMs. Classification results confirm the superior performance of the present combination model by comparing with CNN model alone, support vector machine (SVM) and back propagation (BP) neural network. It is shown that the average classification accuracy ratios are 98.125% and 98% for two data series with agreeable error rate reductions.

183 citations


Journal ArticleDOI
TL;DR: A Bayesian hidden Markov model (HMM) with Gaussian Mixture (GM) Clustering approach is used to model the DNA copy number change across the genome and is compared with various existing approaches such as Pruned Exact Linear Time method, binary segmentation method and segment neighborhood method.
Abstract: The change in the DNA is a form of genetic variation in the human genome. In addition, the DNA copy number change is also linked with the progression of many emerging diseases. Array-based Comparative Genomic Hybridization (CGH) is considered as a major task when measuring the DNA copy number change across the genome. Moreover, DNA copy number change is an essential measure to diagnose the cancer disease. Next generation sequencing is an important method for studying the spread of infectious disease qualitatively and quantitatively. CGH is widely used in continuous monitoring of copy number of thousands of genes throughout the genome. In recent years, the size of the DNA sequence data is very large. Hence, there is a need to use a scalable machine learning approach to overcome the various issues in DNA copy number change detection. In this paper, we use a Bayesian hidden Markov model (HMM) with Gaussian Mixture (GM) Clustering approach to model the DNA copy number change across the genome. The proposed Bayesian HMM with GM Clustering approach is compared with various existing approaches such as Pruned Exact Linear Time method, binary segmentation method and segment neighborhood method. Experimental results demonstrate the effectiveness of our proposed change detection algorithm.

182 citations


Journal ArticleDOI
TL;DR: The proposed method can generate more natural spectral parameters and $F_0$ than conventional minimum generation error training algorithm regardless of its hyperparameter settings, and it is found that a Wasserstein GAN minimizing the Earth-Mover's distance works the best in terms of improving the synthetic speech quality.
Abstract: A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network techniques can be applied to artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natural speech. One of the issues causing the quality degradation is an oversmoothing effect often observed in the generated speech parameters. A GAN introduced in this paper consists of two neural networks: a discriminator to distinguish natural and generated samples, and a generator to deceive the discriminator. In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. Since the objective of the GANs is to minimize the divergence (i.e., distribution difference) between the natural and generated speech parameters, the proposed method effectively alleviates the oversmoothing effect on the generated speech parameters. We evaluated the effectiveness for text-to-speech and voice conversion, and found that the proposed method can generate more natural spectral parameters and $F_0$ than the conventional minimum generation error training algorithm regardless of its hyperparameter settings. Furthermore, we investigated the effect of the divergence of various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works the best in terms of improving the synthetic speech quality.

178 citations
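
The "weighted sum" training objective described in the abstract above can be written schematically as follows. The symbols are assumptions introduced here for illustration: $\hat{y}$ denotes the generated speech parameters, $y$ the natural ones, and $\omega$ a non-negative weight whose value and any further scaling are not given in the abstract.

$$
\mathcal{L}_{\mathrm{G}}(y, \hat{y}) \;=\; \mathcal{L}_{\mathrm{MGE}}(y, \hat{y}) \;+\; \omega \, \mathcal{L}_{\mathrm{ADV}}(\hat{y}), \qquad \omega \ge 0
$$

With $\omega = 0$ this reduces to conventional minimum generation error training; larger $\omega$ puts more emphasis on deceiving the discriminator.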


Proceedings ArticleDOI
01 Dec 2018
TL;DR: In this paper, a deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data has been proposed, and the proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
Abstract: Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.

158 citations


Journal ArticleDOI
Kunyang Li, Weifeng Pan, Yifan Li, Qing Jiang, Guanzheng Liu
TL;DR: A method to detect OSA from a single-lead ECG signal using a deep neural network and a hidden Markov model (HMM); a sparse auto-encoder learns the features, an unsupervised step that requires only unlabeled ECG signals.

153 citations


Journal ArticleDOI
TL;DR: This paper proposes a deep learning approach to energy disaggregation—instead of learning one level of dictionary, it learns multiple layers of dictionaries for each device, used as a basis for source separation during disaggregation.
Abstract: Energy disaggregation is the task of segregating the aggregate energy of the entire building (as logged by the smart-meter) into the energy consumed by individual appliances. This is a single channel (the only channel being the smart-meter) blind source (different electrical appliances) separation problem. The traditional way to address this is via stochastic finite state machines (e.g., factorial hidden Markov model). In recent times, dictionary learning-based approaches have shown promise in addressing the disaggregation problem. The usual technique is to learn a dictionary for every device and use the learned dictionaries as basis for blind source separation during disaggregation. Prior studies in this area are shallow learning techniques, i.e., they learn a single layer of dictionary for every device. In this paper, we propose a deep learning approach: instead of learning one level of dictionary, we learn multiple layers of dictionaries for each device. These multi-level dictionaries are used as a basis for source separation during disaggregation. Results on two benchmark datasets and one actual implementation show that our method outperforms state-of-the-art techniques.

144 citations


Journal ArticleDOI
TL;DR: A NILM algorithm based on deep neural networks is proposed; it outperforms the AFAMAP algorithm in both seen and unseen conditions and exhibits significant robustness in the presence of noise.

134 citations


Journal ArticleDOI
TL;DR: This paper introduces momentuHMM, an open-source R package for modeling animal behavior from telemetry data using discrete-time hidden Markov models (HMMs).
Abstract: 1. Discrete‐time hidden Markov models (HMMs) have become an immensely popular tool for inferring latent animal behaviours from telemetry data. While movement HMMs typically rely solely on location data (e.g. step length and turning angle), auxiliary biotelemetry and environmental data are powerful and readily‐available resources for incorporating much more ecological and behavioural realism. However, complex movement or observation process models often necessitate custom and computationally demanding HMM model‐fitting techniques that are impractical for most practitioners, and there is a paucity of generalized user‐friendly software available for implementing multivariate HMMs of animal movement. 2. Here, we introduce an open‐source R package, momentuHMM, that addresses many of the deficiencies in existing HMM software. Features include: (1) data pre‐processing and visualization; (2) user‐specified probability distributions for an unlimited number of data streams and latent behaviour states; (3) biased and correlated random walk movement models, including dynamic “activity centres” associated with attractive or repulsive forces; (4) user‐specified design matrices and constraints for covariate modelling of parameters using formulas familiar to most R users; (5) multiple imputation methods that account for measurement error and temporally irregular or missing data; (6) seamless integration of spatio‐temporal covariate raster data; (7) cosinor and spline models for cyclical and other complicated patterns; (8) model checking and selection; and (9) simulation. 3. After providing an overview of the main features of the package, we demonstrate some of the capabilities of momentuHMM using real‐world examples. These include models for cyclical movement patterns of African elephants, foraging trips of northern fur seals, loggerhead turtle movements relative to ocean surface currents, and grey seal movements among three activity centres. 4. momentuHMM considerably extends the capabilities of existing HMM software while accounting for common challenges associated with telemetry data. It therefore facilitates more realistic hypothesis‐driven animal movement analyses that have hitherto been largely inaccessible to non‐statisticians. While motivated by telemetry data, the package can be used for analysing any type of data that is amenable to HMMs. Practitioners interested in additional features are encouraged to contact the authors.

Journal ArticleDOI
TL;DR: This manuscript introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian framework, and compares the hybrid modelling to a tandem approach and evaluates the gain of model combination.
Abstract: This manuscript introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian framework. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15 and 38% relative reduction in word error rate and up to 20% absolute. We analyse the effect of the CNN structure, network pretraining and number of hidden states. We compare the hybrid modelling to a tandem approach and evaluate the gain of model combination.

Journal ArticleDOI
TL;DR: Investigation on the running time of different steps in FMM reveals that after precomputation is employed, the new bottleneck is located in candidate search, and more specifically, the projection of a GPS point to the polyline of a road edge.
Abstract: Wide deployment of global positioning system (GPS) sensors has generated a large amount of data with numerous applications in transportation research. Due to the observation error, a map matching ( ...

Proceedings ArticleDOI
01 Jun 2018
TL;DR: A novel action modeling framework is proposed, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion.
Abstract: In this work, we address the task of weakly-supervised human action segmentation in long, untrimmed videos. Recent methods have relied on expensive learning models, such as Recurrent Neural Networks (RNN) and Hidden Markov Models (HMM). However, these methods suffer from expensive computational cost, thus are unable to be deployed in large scale. To overcome the limitations, the keys to our design are efficiency and scalability. We propose a novel action modeling framework, which consists of a new temporal convolutional network, named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting frame-wise action labels, and a novel training strategy for weakly-supervised sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align action sequences and update the network in an iterative fashion. The proposed framework is evaluated on two benchmark datasets, Breakfast and Hollywood Extended, with four different evaluation metrics. Extensive experimental results show that our methods achieve competitive or superior performance to state-of-the-art methods.

Proceedings ArticleDOI
04 May 2018
TL;DR: In this article, a WaveNet generative speech model is used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s.
Abstract: Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.

Journal Article
TL;DR: An overview of the design choices in pomegranate is presented, showing how they have enabled complex features to be supported by simple code while keeping the package competitive with, or faster than, other implementations of similar algorithms.
Abstract: We present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models. This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with, or able to outperform, other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code.

Journal ArticleDOI
TL;DR: A new sentiment analysis method, based on text-based hidden Markov models (TextHMMs), for text classification that uses a sequence of words in training texts instead of a predefined sentiment lexicon and has potential to classify implicit opinions.
Abstract: Proposed a new sentiment analysis method, based on text-based hidden Markov models, that uses word orders without the need of sentiment lexicons. Proposed an ensemble of text-based hidden Markov models using boosting and clusters of words produced by latent semantic analysis. Showed the method has potential to classify implicit opinions by the proposed ensemble method. Showed better performance in comparison to several previous algorithms in several datasets. Applied it to a real-life dataset to classify paper titles. With the rapid growth of social media, text mining is extensively utilized in practical fields, and opinion mining, also known as sentiment analysis, plays an important role in analyzing opinion and sentiment in texts. Methods in opinion mining generally depend on a sentiment lexicon, which is a set of predefined key words that express sentiment. Opinion mining requires proper sentiment words to be extracted in advance and has difficulty classifying sentences that imply an opinion without using any sentiment key words. This paper presents a new sentiment analysis method, based on text-based hidden Markov models (TextHMMs), for text classification that uses a sequence of words in training texts instead of a predefined sentiment lexicon. We sought to learn text patterns representing sentiment through ensemble TextHMMs. Our method defines hidden variables in TextHMMs by semantic cluster information in consideration of the co-occurrence of words, and thus calculates the sentiment orientation of sentences by fitted TextHMMs. To reflect diverse patterns, we applied an ensemble of TextHMM-based classifiers. In the experiments with a benchmark data set, we show that this method is superior to some existing methods and particularly has potential to classify implicit opinions. We also demonstrate the practicality of the proposed method in a real-life data set of online market reviews.

Journal ArticleDOI
TL;DR: This work shows how the HMM can be inferred on continuous, parcellated source-space Magnetoencephalography (MEG) task data in an unsupervised manner, without any knowledge of the task timings, and reveals task-dependent HMM states that represent whole-brain dynamic networks transiently bursting at millisecond time scales as cognition unfolds.
Abstract: Complex thought and behaviour arise through dynamic recruitment of large-scale brain networks. The signatures of this process may be observable in electrophysiological data; yet robust modelling of rapidly changing functional network structure on rapid cognitive timescales remains a considerable challenge. Here, we present one potential solution using Hidden Markov Models (HMMs), which are able to identify brain states characterised by engaging distinct functional networks that reoccur over time. We show how the HMM can be inferred on continuous, parcellated source-space Magnetoencephalography (MEG) task data in an unsupervised manner, without any knowledge of the task timings. We apply this to a freely available MEG dataset in which participants completed a face perception task, and reveal task-dependent HMM states that represent whole-brain dynamic networks transiently bursting at millisecond time scales as cognition unfolds. The analysis pipeline demonstrates a general way in which the HMM can be used to do a statistically valid whole-brain, group-level task analysis on MEG task data, which could be readily adapted to a wide range of task-based studies.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: In this article, the authors use a hybrid CTC/attention architecture for audio-visual recognition of speech in the wild, which leads to a 1.3% absolute decrease in word error rate over the audio-only model and achieves state-of-the-art performance on the LRS2 database.
Abstract: Recent works in speech recognition rely either on connectionist temporal classification (CTC) or sequence-to-sequence models for character-level recognition. CTC assumes conditional independence of individual characters, whereas attention-based models can provide nonsequential alignments. Therefore, we could use a CTC loss in combination with an attention-based model in order to force monotonic alignments and at the same time get rid of the conditional independence assumption. In this paper, we use the recently proposed hybrid CTC/attention architecture for audio-visual recognition of speech in-the-wild. To the best of our knowledge, this is the first time that such a hybrid architecture is used for audio-visual recognition of speech. We use the LRS2 database and show that the proposed audio-visual model leads to a 1.3% absolute decrease in word error rate over the audio-only model and achieves the new state-of-the-art performance on the LRS2 database (7% word error rate). We also observe that the audio-visual model significantly outperforms the audio-based model (up to 32.9% absolute improvement in word error rate) for several different types of noise as the signal-to-noise ratio decreases.
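
The combination of a CTC loss with an attention-based model described above is commonly written as an interpolation of the two training losses. The form below is a sketch of that general idea, not a formula quoted from this paper; the weight $\lambda$ is an assumed hyperparameter whose value is not stated in the abstract.

$$
\mathcal{L}_{\mathrm{hybrid}} \;=\; \lambda \, \mathcal{L}_{\mathrm{CTC}} \;+\; (1 - \lambda) \, \mathcal{L}_{\mathrm{att}}, \qquad 0 \le \lambda \le 1
$$

Setting $\lambda = 1$ recovers pure CTC training and $\lambda = 0$ pure attention-based training; intermediate values let the CTC term encourage monotonic alignments while the attention decoder avoids the conditional independence assumption.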

Journal ArticleDOI
TL;DR: In this article, a multi-spatial context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem.
Abstract: Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.50 and 96.58 percent, respectively, which are significantly better than the best result reported thus far in the literature.

Proceedings ArticleDOI
Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny
15 Apr 2018
TL;DR: In this paper, a joint word-character A2W model was proposed to learn to first spell the word and then recognize it, achieving a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder, pronunciation lexicon, or externally-trained language model.
Abstract: Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional subword based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training.

Journal ArticleDOI
TL;DR: In this paper, the filtering problem is investigated for a class of discrete-time Markov jump linear parameter varying systems with packet dropouts and channel noises in the network surroundings and the influence of monotonicity on the performance index is explored.
Abstract: In this paper, the filtering problem is investigated for a class of discrete-time Markov jump linear parameter varying systems with packet dropouts and channel noises in the network surroundings. The partial accessibility of system modes with respect to the designed filter is described by a hidden Markov model (HMM). A typical behavior characterization mechanism is proposed in the communication channel including data losses and additive noises, which occurs in a probabilistic way based on two mutually independent Bernoulli sequences. With the aid of a class of Lyapunov function subject to parameter-dependent and mode-dependent constraints, sufficient conditions ensuring the existence of HMM-based filters are obtained such that the filtering error system is stochastically stable with a guaranteed $\mathcal{H}_{\infty}$ error performance. The influence of monotonicity on the performance index is explored while changing the degree of both additive noise and mode inaccessibility. The effectiveness and applicability of the obtained results are finally verified by two numerical examples.

Journal ArticleDOI
TL;DR: Tests on publicly available data show that the HHMM and proposed algorithm can effectively handle the modeling of appliances with multiple functional modes, and can better represent a general type of appliance.
Abstract: Correctly anticipating load characteristics at the low-voltage level is attracting increasing interest from distribution network operators. Energy disaggregation could be one of the potential approaches to exploit the massive amount of smart meter data to fulfill the task. Proper individual home appliance modeling is critical to the performance of non-intrusive load monitoring (NILM). In this paper, a hierarchical hidden Markov model (HHMM) framework to model home appliances is proposed. This model aims to provide better representation for those appliances that have multiple built-in modes with distinct power consumption profiles, such as washing machines and dishwashers. The dynamic Bayesian network representation of such an appliance model is built. A forward–backward algorithm, which is based on the framework of expectation maximization, is formalized for the HHMM fitting process. Tests on publicly available data show that the HHMM and proposed algorithm can effectively handle the modeling of appliances with multiple functional modes, and can better represent a general type of appliance. A disaggregation test also demonstrates that the fitted HHMM can be easily applied to a general inference solver to outperform the conventional hidden Markov model in the estimation of energy disaggregation.
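
For readers unfamiliar with the forward–backward step mentioned above, the following is a minimal sketch for an ordinary (flat) discrete HMM, not the hierarchical variant formalized in the paper. The two-state "off/on" appliance, the probabilities, and the function name `forward_backward` are all assumptions chosen for illustration.

```python
def forward_backward(init, trans, emit, obs):
    """Return per-time-step state posteriors P(state_t | all observations).

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of observing symbol o in state s
    obs         : list of observed symbol indices
    """
    n, T = len(init), len(obs)

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alphas, scales = [], []
    for t in range(T):
        if t == 0:
            a = [init[s] * emit[s][obs[0]] for s in range(n)]
        else:
            prev = alphas[-1]
            a = [emit[s][obs[t]] * sum(prev[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        betas[t] = [
            sum(trans[s][r] * emit[r][obs[t + 1]] * betas[t + 1][r]
                for r in range(n)) / scales[t + 1]
            for s in range(n)
        ]

    # Combine and renormalize to obtain the smoothed posteriors.
    posteriors = []
    for t in range(T):
        g = [alphas[t][s] * betas[t][s] for s in range(n)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    return posteriors


if __name__ == "__main__":
    # Hypothetical appliance: state 0 = off, state 1 = on;
    # observation 0 = low power reading, 1 = high power reading.
    init = [0.8, 0.2]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.95, 0.05], [0.1, 0.9]]
    for step, post in enumerate(forward_backward(init, trans, emit, [0, 0, 1, 1, 0])):
        print(step, [round(p, 3) for p in post])
```

In an expectation-maximization fit, posteriors like these form the E-step; the M-step then re-estimates the transition and emission tables from them.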

Journal ArticleDOI
TL;DR: A turnkey method for scanpath modeling and classification based on variational hidden Markov models (HMMs) and discriminant analysis (DA), which allow bottom-up, top-down, and oculomotor influences to be integrated into a single model of gaze behavior.
Abstract: How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. Firstly, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average of 55.9% correct classification rate (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Secondly, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of original soundtrack. We achieve an average 81.2% correct classification rate (chance = 50%). HMMs allow bottom-up, top-down, and oculomotor influences to be integrated into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.

Journal ArticleDOI
TL;DR: In this article, a learning-based approach was proposed to predict unintended lane departure behaviors and chances of drivers to bring vehicles back to the lane by combining the Gaussian mixture model and the hidden Markov model.
Abstract: Misunderstanding of driver correction behaviors is the primary reason for false warnings of lane-departure-prediction systems. We propose a learning-based approach to predict unintended lane-departure behaviors and chances of drivers to bring vehicles back to the lane. First, a personalized driver model for lane-departure and lane-keeping behavior is established by combining the Gaussian mixture model and the hidden Markov model. Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will perform a lane-departure or a correction behavior. We also develop a warning strategy based on the model-based prediction algorithm that allows the lane-departure warning system to be acceptable for drivers according to the predicted trajectory. In addition, the naturalistic driving data of ten drivers were collected to train the personalized driver model and validate this approach. We compared the proposed method with a basic time-to-lane-crossing (TLC) method and a TLC-directional sequence of piecewise lateral slopes (TLC-DSPLS) method. Experimental results show that the proposed approach can reduce the false-warning rate to 3.13% on average at 1-s prediction time.

Journal ArticleDOI
TL;DR: A set of switching ARDLV models are proposed in the probabilistic framework, which extends the original single model to its multimode form and a hierarchical fault detection method is developed for process monitoring in the multimode processes.
Abstract: In most industrial processes, dynamic characteristics are very common and deserve close attention for process control and monitoring purposes. As a high-order Bayesian network model, autoregressive dynamic latent variable (ARDLV) is able to effectively extract both autocorrelations and cross-correlations in data for a dynamic process. However, operating conditions change frequently in a real production line, which indicates that the measurements cannot be described using a single steady-state model. In this paper, a set of switching ARDLV models are proposed in the probabilistic framework, which extends the original single model to its multimode form. Based on it, a hierarchical fault detection method is developed for process monitoring in the multimode processes. Finally, the proposed method is demonstrated by a numerical example and a real predecarburization unit in an ammonia synthesis process.

Journal ArticleDOI
TL;DR: Predictive routing based on the hidden Markov model (PRHMM) for VANETs, which exploits the regularity of vehicle moving behaviors to increase the transmission performance and enables seamless handoff between vehicle-to-vehicle and vehicle-to-infrastructure communications.
Abstract: It is very difficult to establish and maintain end-to-end connections in a vehicle ad hoc network (VANET) as a result of high vehicle speed, long inter-vehicle distance, and varying vehicle density. Instead, a store-and-forward strategy has been considered for vehicle communications. The success of this strategy, however, depends heavily on the cooperation among nodes. Different from existing store-and-forward solutions, we propose predictive routing based on the hidden Markov model (PRHMM) for VANETs, which exploits the regularity of vehicle moving behaviors to increase the transmission performance. As vehicle movements often exhibit a high degree of repetition, including regular visits to certain places and regular contacts during daily activities, we can predict a vehicle’s future locations based on the knowledge of past traces and the hidden Markov model. Consequently, the short-term route of a vehicle and its packet delivery probability for a specific mobile destination can be predicted. Moreover, PRHMM enables seamless handoff between vehicle-to-vehicle and vehicle-to-infrastructure communications so that the transmission performance will not be constrained by the vehicle density and moving speed. Simulation evaluation demonstrates that PRHMM performs much better in terms of delivery ratio, end-to-end delay, traffic overhead, and buffer occupancy.
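
The idea of predicting a vehicle's next location from past traces with an HMM can be sketched as a filtering step followed by a one-step look-ahead. This is a generic illustration and not the PRHMM algorithm itself; the three-location map, the probability tables, and the function name `predict_next_location` are hypothetical.

```python
def predict_next_location(init, trans, emit, obs):
    """Filter an observation trace, then predict the next hidden location.

    init[s]     : prior probability of location s
    trans[r][s] : probability of moving from location r to location s
    emit[s][o]  : probability of reporting position o while at location s
    obs         : list of reported position indices (the past trace)
    Returns the probability distribution over the next location.
    """
    n = len(init)
    belief = list(init)
    for t, o in enumerate(obs):
        if t > 0:
            # Time update: propagate the belief through the transition model.
            belief = [sum(belief[r] * trans[r][s] for r in range(n))
                      for s in range(n)]
        # Measurement update: weight by how well each location explains o.
        belief = [belief[s] * emit[s][o] for s in range(n)]
        z = sum(belief)
        belief = [b / z for b in belief]
    # One-step-ahead prediction of the next location.
    return [sum(belief[r] * trans[r][s] for r in range(n)) for s in range(n)]


if __name__ == "__main__":
    # Hypothetical commuter loop: location 0 -> 1 -> 2 -> 0, with noisy reports.
    init = [1 / 3, 1 / 3, 1 / 3]
    trans = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
    emit = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    nxt = predict_next_location(init, trans, emit, [0, 1])
    print([round(p, 3) for p in nxt])
```

A routing layer could use a distribution like this to rank candidate relay vehicles by how likely each is to reach the destination's predicted area.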

Journal ArticleDOI
TL;DR: This paper focuses on predicting a driver's intent to brake in car-following scenarios from a perception–decision–action perspective according to his/her driving history, and believes that this learning-based inference method has great potential for real-world active safety systems.
Abstract: Accurately predicting and inferring a driver's decision to brake is critical for designing warning systems and avoiding collisions. In this paper, we focus on predicting a driver's intent to brake in car-following scenarios from a perception–decision–action perspective according to his/her driving history. A learning-based inference method, using onboard data from CAN-Bus, radar, and cameras as explanatory variables, is introduced to infer drivers’ braking decisions by combining a Gaussian mixture model (GMM) with a hidden Markov model (HMM). The GMM is used to model stochastic relationships among variables, while the HMM is applied to infer drivers’ braking actions based on the GMM. Real-case driving data from 49 drivers (more than three years’ driving data per driver on average) have been collected from the University of Michigan Safety Pilot Model Deployment database. We compare the GMM-HMM method to a support vector machine (SVM) method and an SVM-Bayesian filtering method. The experimental results are evaluated by employing three performance metrics: accuracy, sensitivity, and specificity. The comparison results show that the GMM–HMM obtains the best performance, with an accuracy of 90%, sensitivity of 84%, and specificity of 97%. Thus, we believe that this method has great potential for real-world active safety systems.
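
The three metrics named above follow from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy labels (1 = brake, 0 = no brake) are illustrative assumptions and are not data from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels 0/1.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all decisions that are right
    sensitivity = tp / (tp + fn)                 # share of true braking events detected
    specificity = tn / (tn + fp)                 # share of non-braking events left alone
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters when braking events are rare, because a classifier that never predicts braking can still score a high accuracy.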

Journal ArticleDOI
TL;DR: The results show the potential application of the proposed model to track the change of students’ skills directly and provide immediate remediation as well as to evaluate the efficacy of different interventions by investigating how different types of learning interventions impact the transitions from nonmastery to mastery.
Abstract: A family of learning models that integrates a cognitive diagnostic model and a higher-order, hidden Markov model in one framework is proposed. This new framework includes covariates to model skill transition in the learning environment. A Bayesian formulation is adopted to estimate parameters from a learning model. The developed methods are applied to a computer-based assessment with a learning intervention. The results show the potential application of the proposed model to track the change of students’ skills directly and provide immediate remediation as well as to evaluate the efficacy of different interventions by investigating how different types of learning interventions impact the transitions from nonmastery to mastery.

Journal ArticleDOI
TL;DR: In this article, the authors present a methodology to estimate and predict the functional reliability of a system using system functional indicators and condition indicators of components in continuous time domain using Hidden Markov Model and Dynamic Bayesian Network.