
Showing papers on "Hidden Markov model" published in 2010


Book
01 Dec 2010
TL;DR: This book is a comprehensive treatment of inference for hidden Markov models, including both algorithms and statistical theory, and builds on recent developments to present a self-contained view.
Abstract: This book is a comprehensive treatment of inference for hidden Markov models, including both algorithms and statistical theory. Topics range from filtering and smoothing of the hidden Markov chain to parameter estimation, Bayesian methods and estimation of the number of states. In a unified way the book covers both models with finite state spaces and models with continuous state spaces (also called state-space models) requiring approximate simulation-based algorithms that are also described in detail. Many examples illustrate the algorithms and theory. This book builds on recent developments to present a self-contained view.

1,537 citations
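The filtering task named in the book entry above can be illustrated for the finite-state case with the standard normalized forward recursion. This is a minimal sketch, not code from the book: the function name `forward_filter` and the list-of-lists parameter layout (`init`, `trans`, `emit`) are assumptions made for illustration, and observations are assumed to be integer symbol indices.

```python
import math


def forward_filter(init, trans, emit, observations):
    """Normalized forward recursion for a finite-state HMM.

    init[i]      : probability of starting in state i
    trans[i][j]  : probability of moving from state i to state j
    emit[j][o]   : probability of emitting symbol o in state j
    Returns (filtered, log_likelihood), where filtered[t][j] is
    p(state_t = j | observations[0..t]).
    """
    n_states = len(init)
    filtered = []
    log_likelihood = 0.0
    prior = list(init)
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: push the previous filtered distribution
            # through the transition matrix.
            prev = filtered[-1]
            prior = [
                sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update step: weight by the emission probability, then normalize.
        unnorm = [prior[j] * emit[j][obs] for j in range(n_states)]
        norm = sum(unnorm)
        log_likelihood += math.log(norm)
        filtered.append([u / norm for u in unnorm])
    return filtered, log_likelihood
```

Normalizing at every step keeps the recursion numerically stable on long sequences, and the sum of the logged normalizers gives the log-likelihood of the whole observation sequence.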


Journal ArticleDOI
TL;DR: HMMERHEAD, a series of database filtering steps applied before the scoring algorithms implemented in the HMMER package, significantly reduces the time needed to score a profile-HMM against large sequence databases.
Abstract: Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases. We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K. Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.

890 citations
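The entry above names Viterbi and Forward as the main scoring algorithms. As a rough illustration of what Viterbi scoring computes, here is a log-space sketch for a generic finite-state HMM. It is not the HMMER implementation and does not model profile-HMM match, insert and delete states; the function name `viterbi` and the parameter layout are assumptions, and the observation sequence is assumed to be non-empty.

```python
import math


def _log(p):
    # log(0) is treated as negative infinity so impossible moves never win.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(init, trans, emit, observations):
    """Return (best_path, best_log_prob) for a finite-state HMM."""
    n = len(init)
    # Best log-probability of any path ending in each state at time 0.
    score = [_log(init[s]) + _log(emit[s][observations[0]]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at time t + 1
    for obs in observations[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + _log(trans[i][j]))
            pointers.append(best_i)
            new_score.append(
                score[best_i] + _log(trans[best_i][j]) + _log(emit[j][obs])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

Viterbi keeps only the single best state path, whereas Forward sums over all paths; both cost time proportional to the sequence length times the square of the number of states in this generic form, which is why pre-filtering a large database before scoring can pay off.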


Posted Content
TL;DR: This paper presents the viability of MFCC for feature extraction and DTW for comparing test patterns, and explains why alignment is important for better performance.
Abstract: Digital processing of the speech signal and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the amount of information contained in the signal. Therefore, digital signal processes such as feature extraction and feature matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Model (HMM) and Artificial Neural Network (ANN), are evaluated with a view to identifying a straightforward and effective method for the voice signal. The extraction and matching process is implemented right after the pre-processing or filtering of the signal is performed. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are used as the extraction technique. The non-linear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. Since the voice signal tends to have a different temporal rate, the alignment is important for better performance. This paper presents the viability of MFCC to extract features and DTW to compare the test patterns.

846 citations
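The alignment step described in the entry above can be sketched with the textbook dynamic-programming form of DTW. This is an illustration under simplifying assumptions, not the paper's implementation: it compares one-dimensional sequences with an absolute-difference cost (the paper matches MFCC feature vectors), it applies no Sakoe-Chiba band constraint, and the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping cost between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i items of a with the first j items of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Because a frame in one sequence may be matched to several frames in the other, two utterances of the same pattern spoken at different rates can still align with low cost, which is the point the abstract makes about temporal rate.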


Journal ArticleDOI
TL;DR: This overview of hidden semi-Markov models (HSMMs) covers modelling, inference, estimation, implementation and applications; HSMMs have been applied in thirty scientific and engineering areas, including speech recognition/synthesis, human activity recognition/prediction, handwriting recognition, functional MRI brain mapping, and network anomaly detection.

734 citations


Proceedings ArticleDOI
25 Oct 2010
TL;DR: This paper presents Torchvision, an open source machine vision package for Torch that provides additional functionality to manipulate and process images with standard image processing algorithms.
Abstract: This paper presents Torchvision, an open source machine vision package for Torch. Torch is a machine learning library providing a series of state-of-the-art algorithms such as Neural Networks, Support Vector Machines, Gaussian Mixture Models, Hidden Markov Models and many others. Torchvision provides additional functionality to manipulate and process images with standard image processing algorithms. Hence, the resulting images can be used directly with the Torch machine learning algorithms, as Torchvision is fully integrated with Torch. Both Torch and Torchvision are written in C++ and are publicly available under the Free-BSD License.

341 citations


Journal ArticleDOI
TL;DR: depmixS4 is a general framework for defining and estimating dependent mixture models in the R programming language, including standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models.
Abstract: depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data with distributions from the glm family, the (logistic) multinomial, or the multivariate normal distribution. Other distributions can be added easily, and an example is provided with the exgaus distribution. Parameters are estimated by the expectation-maximization (EM) algorithm or, when (linear) constraints are imposed on the parameters, by direct numerical optimization with the Rsolnp or Rdonlp2 routines.

312 citations


Journal ArticleDOI
TL;DR: The experimental validation shows the effectiveness of the proposed driver fatigue recognition model and indicates that the contact physiological features are significant factors for inferring the fatigue state of a driver.

288 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper presents two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes, employing Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models.
Abstract: We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They allow the discovery of temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second model, called DDP-HMM, jointly learns co-occurring activities and their time dependencies. To this end we employ Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models. In contrast to previous work, we build on state-of-the-art topic models that allow all parameters, such as the optimal number of HMMs necessary to explain the rules governing a scene, to be inferred automatically. The models are trained offline by Gibbs Sampling using unlabeled training data.

267 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper proposes an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers, and applies the model on both synthetic data and DBLP data sets to demonstrate importance of this concept, as well as the effectiveness and efficiency of the proposed approach.
Abstract: Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in blogsphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars for a more positive sense), and then shows that well-known baseline approaches without considering links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model on both synthetic data and DBLP data sets, and the results demonstrate importance of this concept, as well as the effectiveness and efficiency of the proposed approach.

260 citations


Journal ArticleDOI
TL;DR: A dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos is proposed.
Abstract: In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.

254 citations


Journal ArticleDOI
TL;DR: The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities, and is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners.
Abstract: Motivation: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. Results: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile–profile alignment, are incorporated to improve alignment accuracy. Assessed using the popular benchmarks: BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners. Availability: The source code of MSAProbs, written in C++, is freely and publicly available from http://msaprobs.sourceforge.net.

Proceedings Article
01 Dec 2010
TL;DR: It is shown that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models and in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.
Abstract: Recently, deep learning techniques have been successfully applied to automatic speech recognition tasks, first to phonetic recognition with context-independent deep belief network (DBN) hidden Markov models (HMMs) and later to large vocabulary continuous speech recognition using context-dependent (CD) DBN-HMMs. In this paper, we report our most recent experiments designed to understand the roles of the two main phases of DBN learning, pre-training and fine-tuning, in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer. As expected, we show that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models. However, a moderate increase in the amount of unlabeled pre-training data has an insignificant effect on the final recognition results as long as the original training size is sufficiently large to initialize the DBN weights. On the other hand, with additional labeled training data, the fine-tuning phase of DBN training can significantly improve the recognition accuracy.

Proceedings Article
21 Jun 2010
TL;DR: This work proposes a nonparametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions, and derives a local-minimum-free kernel spectral algorithm for learning these HMMs.
Abstract: Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. And, learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a nonparametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a local-minimum-free kernel spectral algorithm for learning these HMMs. We apply our method to robot vision data, slot car inertial sensor data and audio event classification data, and show that in these applications, embedded HMMs exceed the previous state-of-the-art performance.

Journal ArticleDOI
01 Nov 2010
TL;DR: A novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images, based on loopy belief propagation (LBP).
Abstract: We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. Like other recent works, we model wavelet structure using a hidden Markov tree (HMT) but, unlike other works, ours is based on loopy belief propagation (LBP). For LBP, we adopt a recently proposed “turbo” message passing schedule that alternates between exploitation of HMT structure and exploitation of compressive-measurement structure. For the latter, we leverage Donoho, Maleki, and Montanari's recently proposed approximate message passing (AMP) algorithm. Experiments with a large image database suggest that, relative to existing schemes, our turbo LBP approach yields state-of-the-art reconstruction performance with substantial reduction in complexity.

Journal ArticleDOI
01 Sep 2010
TL;DR: A wireless sensor network for unintrusive observations in the home is presented, and the potential of generative and discriminative models for recognizing activities from such observations is shown.
Abstract: An activity monitoring system allows many applications to assist in caregiving for the elderly in their homes. In this paper we present a wireless sensor network for unintrusive observations in the home and show the potential of generative and discriminative models for recognizing activities from such observations. Through a large number of experiments using four real-world datasets we show the effectiveness of the generative hidden Markov model and the discriminative conditional random fields in activity recognition.

Proceedings Article
01 Jan 2010
TL;DR: The discriminative input and output transforms for speaker adaptation in the hybrid NN/HMM systems are compared and further investigated with both structural and data-driven constraints.
Abstract: Speaker variability is one of the major error sources for ASR systems. Speaker adaptation estimates speaker specific models from the speaker independent ones to minimize the mismatch between the training and testing conditions arisen from speaker variabilities. One of the commonly adopted approaches is the transformation based method. In this paper, the discriminative input and output transforms for speaker adaptation in the hybrid NN/HMM systems are compared and further investigated with both structural and data-driven constraints. Experimental results show that the data-driven constrained discriminative transforms are much more robust for unsupervised adaptation.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: An acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space, and this style of acoustic model allows for a much more compact representation.
Abstract: We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work reports experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages.
Abstract: Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Journal ArticleDOI
TL;DR: Experimental results showed that a reinforcement-learning-based method using the vehicle dynamic parameters feature outperforms the other algorithms, and that adding the other two features could further improve the prediction accuracy.

Proceedings Article
01 Jan 2010
TL;DR: A context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues is applied, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database.
Abstract: In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally prevails over standard classification techniques such as Hidden Markov Models or Support Vector Machines, and achieves F1-measures of the order of 72%, 65%, and 55% for the discrimination of three clusters in emotional space and the distinction between three levels of valence and activation, respectively.

Journal ArticleDOI
TL;DR: The proposed DBN-based hand gesture model and the design of a gesture network model are believed to have strong potential for application to related problems such as sign language recognition, although that task is somewhat more complicated because it requires analysis of hand shapes.

Journal ArticleDOI
TL;DR: This work proposes extracting discriminative features for AED using a boosting approach; these features outperform classical speech perceptual features, such as Mel-frequency Cepstral Coefficients and log frequency filterbank parameters, and the work leverages statistical models that better fit the task.

Journal ArticleDOI
TL;DR: This paper shows how sequential probabilistic models can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze).
Abstract: During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.

Proceedings Article
06 Dec 2010
TL;DR: It is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology, and in a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built.
Abstract: Automatic speech recognition has gradually improved over the years, but the reliable recognition of unconstrained speech is still not within reach. In order to achieve a breakthrough, many research groups are now investigating new methodologies that have potential to outperform the Hidden Markov Model technology that is at the core of all present commercial systems. In this paper, it is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology. In a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built. The system already achieves a state-of-the-art performance, and there is evidence that the margin for further improvements is still significant.

Journal ArticleDOI
Arshia Cont
TL;DR: This paper proposes a design for a real-time music-to-score alignment system that is capable of following the musician in real time within the score and decoding the tempo (or pace) of its performance.
Abstract: The capacity for real-time synchronization and coordination is a common ability among trained musicians performing a music score that presents an interesting challenge for machine intelligence. Compared to speech recognition, which has influenced many music information retrieval systems, music's temporal dynamics and complexity pose challenging problems to common approximations regarding time modeling of data streams. In this paper, we propose a design for a real-time music-to-score alignment system. Given a live recording of a musician playing a music score, the system is capable of following the musician in real time within the score and decoding the tempo (or pace) of the performance. The proposed design features two coupled audio and tempo agents within a unique probabilistic inference framework that adaptively updates its parameters based on the real-time context. Online decoding is achieved through the collaboration of the coupled agents in a Hidden Hybrid Markov/semi-Markov framework, where prediction feedback of one agent affects the behavior of the other. We perform evaluations for both real-time alignment and the proposed temporal model. An implementation of the presented system has been widely used in real concert situations worldwide, and readers are encouraged to access the actual system and experiment with the results.

Journal ArticleDOI
TL;DR: This paper demonstrates how this difficult problem can be addressed through Hidden Markov models that are able to estimate unobservable health-states using observable sensor signals, and implementation of HMM based models as dynamic Bayesian networks (DBNs) facilitates compact representation as well as additional flexibility with regard to model structure.
Abstract: Failure mechanisms of electromechanical systems usually involve several degraded health-states. Tracking and forecasting the evolution of health-states and impending failures, in the form of remaining-useful-life (RUL), is a critical challenge and regarded as the Achilles' heel of condition-based-maintenance (CBM). This paper demonstrates how this difficult problem can be addressed through Hidden Markov models (HMMs) that are able to estimate unobservable health-states using observable sensor signals. In particular, implementation of HMM based models as dynamic Bayesian networks (DBNs) facilitates compact representation as well as additional flexibility with regard to model structure. Both regular HMM pools and hierarchical HMMs are employed here to estimate online the health-state of drill-bits as they deteriorate with use on a CNC drilling machine. Hierarchical HMM is composed of sub-HMMs in a pyramid structure, providing functionality beyond an HMM for modeling complex systems. In the case of regular HMMs, each HMM within the pool competes to represent a distinct health-state and adapts through competitive learning. In the case of hierarchical HMMs, health-states are represented as distinct nodes at the top of the hierarchy. Monte Carlo simulation, with state transition probabilities derived from a hierarchical HMM, is employed for RUL estimation. Detailed results on health-state and RUL estimation are very promising and are reported in this paper. Hierarchical HMMs seem to be particularly effective and efficient and outperform other HMM methods from literature.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: A gesture recognition system based primarily on a single 3-axis accelerometer that achieves almost perfect user-dependent recognition and a user-independent recognition accuracy that is competitive with statistical methods that require a significantly large number of training samples and with other accelerometer-based gesture recognition systems in the literature.
Abstract: We propose a gesture recognition system based primarily on a single 3-axis accelerometer. The system employs dynamic time warping and affinity propagation algorithms for training and utilizes the sparse nature of the gesture sequence by implementing compressive sensing for gesture recognition. A dictionary of 18 gestures is defined and a database of over 3,700 repetitions is created from 7 users. Our dictionary of gestures is the largest in published studies related to acceleration-based gesture recognition, to the best of our knowledge. The proposed system achieves almost perfect user-dependent recognition and a user-independent recognition accuracy that is competitive with statistical methods that require a significantly large number of training samples and with other accelerometer-based gesture recognition systems in the literature.

Journal ArticleDOI
TL;DR: The investigation suggests that ApEn and Kc can effectively describe the dynamic complexity of EEG, which is strongly correlated with mental fatigue.

Journal ArticleDOI
TL;DR: This paper compares the choice of conjugate and non-conjugate base distributions on a particular class of DPM models, the Dirichlet process Gaussian mixture model (DPGMM), and shows that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.
Abstract: In the Bayesian mixture modeling framework it is possible to infer the necessary number of components to model the data and therefore it is unnecessary to explicitly restrict the number of components. Nonparametric mixture models sidestep the problem of finding the "correct" number of mixture components by assuming infinitely many components. In this paper Dirichlet process mixture (DPM) models are cast as infinite mixture models and inference using Markov chain Monte Carlo is described. The specification of the priors on the model parameters is often guided by mathematical and practical convenience. The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM). We compare computational efficiency and modeling performance of DPGMM defined using a conjugate and a conditionally conjugate base distribution. We show that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.

Journal ArticleDOI
TL;DR: In this article, a Markov-switching GARCH model (MS-GARCH) is proposed, where the conditional mean and variance switch in time from one GARCH process to another.
Abstract: We develop a Markov-switching GARCH model (MS-GARCH) wherein the conditional mean and variance switch in time from one GARCH process to another. The switching is governed by a hidden Markov chain. We provide sufficient conditions for geometric ergodicity and existence of moments of the process. Because of path dependence, maximum likelihood estimation is not feasible. By enlarging the parameter space to include the state variables, Bayesian estimation using a Gibbs sampling algorithm is feasible. We illustrate the model on SP500 daily returns.