
Showing papers on "Hidden Markov model published in 2007"


Journal ArticleDOI
01 May 2007
TL;DR: A survey on gesture recognition with particular emphasis on hand gestures and facial expressions is provided, and applications involving hidden Markov models, particle filtering and condensation, finite-state machines, optical flow, skin color, and connectionist models are discussed in detail.
Abstract: Gesture recognition pertains to recognizing meaningful expressions of motion by a human, involving the hands, arms, face, head, and/or body. It is of utmost importance in designing an intelligent and efficient human-computer interface. The applications of gesture recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. In this paper, we provide a survey on gesture recognition with particular emphasis on hand gestures and facial expressions. Applications involving hidden Markov models, particle filtering and condensation, finite-state machines, optical flow, skin color, and connectionist models are discussed in detail. Existing challenges and future research possibilities are also highlighted.

1,797 citations


Journal ArticleDOI
TL;DR: A hidden Markov model, Phobius, is designed that combines transmembrane topology and signal peptide predictions, and also allows constrained and homology-enriched predictions.
Abstract: When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30–65% of all predicted signal peptides and 25–35% of all predicted transmembrane topologies overlap. This impairs predictions of 5–10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.

1,410 citations


Reference EntryDOI
TL;DR: In this paper, the concept of hidden Markov models in computational biology is introduced and described using simple biological examples, requiring as little mathematical knowledge as possible, and an overview of their current applications is presented.
Abstract: This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit also presents a brief history of hidden Markov models and an overview of their current applications before concluding with a discussion of their limitations.
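The HMM machinery this unit introduces can be illustrated with a minimal forward-algorithm sketch: a toy two-state model ("AT-rich" vs. "GC-rich" region), a standard textbook example. All probabilities below are invented for the illustration and are not from the unit itself.

```python
import itertools

# Toy two-state HMM over DNA symbols; all numbers are made up.
states = ["AT", "GC"]
start = {"AT": 0.5, "GC": 0.5}
trans = {"AT": {"AT": 0.9, "GC": 0.1},
         "GC": {"AT": 0.1, "GC": 0.9}}
emit = {"AT": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
        "GC": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}

def forward(seq):
    """Total probability of `seq` under the HMM (forward algorithm)."""
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for sym in seq[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][sym]
                 for s in states}
    return sum(alpha.values())

def brute_force(seq):
    """Same quantity by enumerating every hidden-state path (check only)."""
    total = 0.0
    for path in itertools.product(states, repeat=len(seq)):
        p = start[path[0]] * emit[path[0]][seq[0]]
        for i in range(1, len(seq)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][seq[i]]
        total += p
    return total
```

The forward recursion computes in O(len·states²) time what the brute-force enumeration computes in exponential time, which is exactly why HMMs scale to genomic sequences.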

1,305 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: A real-time, non-intrusive liveness detection approach against photograph spoofing in face recognition, based on recognizing spontaneous eyeblinks, that outperforms cascaded AdaBoost and HMM approaches in the task of eyeblink detection.
Abstract: We present a real-time liveness detection approach against photograph spoofing in face recognition, by recognizing spontaneous eyeblinks, in a non-intrusive manner. The approach requires no extra hardware except for a generic webcamera. Eyeblink sequences often have a complex underlying structure. We formulate blink detection as inference in an undirected conditional graphical framework, and are able to learn compact and efficient observation and transition potentials from data. For quick and accurate recognition of the blink behavior, eye closity, an easily computed discriminative measure derived from the adaptive boosting algorithm, is developed and then smoothly embedded into the conditional model. An extensive set of experiments is presented to show the effectiveness of our approach and how it outperforms cascaded AdaBoost and HMM in the task of eyeblink detection.

611 citations


Journal ArticleDOI
TL;DR: This article reports significant gains in recognition performance and model compactness as a result of discriminative MCE training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
Abstract: The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary (up to 100k-word) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT Jupiter telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues.
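For reference, the MCE misclassification measure and smoothed loss that this framework optimizes are commonly written as follows; this is the standard formulation from the MCE literature, not necessarily this paper's exact notation:

```latex
% g_j(x;\Lambda): discriminant (e.g., HMM log-likelihood) of class j
% M: number of classes; \eta, \gamma, \theta: smoothing constants
d_c(x) = -g_c(x;\Lambda)
       + \frac{1}{\eta}\log\!\left[\frac{1}{M-1}\sum_{j \neq c} e^{\eta\, g_j(x;\Lambda)}\right],
\qquad
\ell(d_c) = \frac{1}{1 + e^{-\gamma d_c + \theta}}
```

The sigmoid loss is a differentiable surrogate for the 0–1 error count, which is what makes gradient-based optimization of recognition accuracy possible.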

581 citations


Journal ArticleDOI
TL;DR: A discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations and a hidden-state conditional random field framework learns a set of latent variables conditioned on local features.
Abstract: We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.

578 citations


01 Jan 2007
TL;DR: This paper describes HTS version 2.0 in detail, as well as future release plans, which include a number of new features which are useful for both speech synthesis researchers and developers.
Abstract: A statistical parametric speech synthesis system based on hidden Markov models (HMMs) has grown in popularity over the last few years. This system simultaneously models spectrum, excitation, and duration of speech using context-dependent HMMs and generates speech waveforms from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named HMM-based speech synthesis system (HTS) to provide a research and development platform for the speech synthesis community. In December 2006, HTS version 2.0 was released. This version includes a number of new features which are useful for both speech synthesis researchers and developers. This paper describes HTS version 2.0 in detail, as well as future release plans.

546 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: A new framework is proposed in which actions are modeled using three-dimensional occupancy grids, built from multiple viewpoints, in an exemplar-based HMM; a 3D reconstruction is not required during the recognition phase, as learned 3D exemplars are instead used to produce 2D image information that is compared to the observations.
Abstract: In this paper, we address the problem of learning compact, view-independent, realistic 3D models of human actions recorded with multiple cameras, for the purpose of recognizing those same actions from a single or few cameras, without prior knowledge about the relative orientations between the cameras and the subjects. To this end, we propose a new framework where we model actions using three-dimensional occupancy grids, built from multiple viewpoints, in an exemplar-based HMM. The novelty is that a 3D reconstruction is not required during the recognition phase; instead, learned 3D exemplars are used to produce 2D image information that is compared to the observations. Parameters that describe image projections are added as latent variables in the recognition process. In addition, the temporal Markov dependency applied to view parameters allows them to evolve during recognition as with a smoothly moving camera. The effectiveness of the framework is demonstrated with experiments on real datasets and with challenging recognition scenarios.

509 citations


Journal ArticleDOI
TL;DR: A novel parameter generation algorithm for HMM-based speech synthesis that alleviates the over-smoothing of generated trajectories, which usually causes muffled sounds, by also maximizing a likelihood for the global variance of the generated trajectory.
Abstract: This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm yields considerable improvements in the naturalness of synthetic speech.
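The objective the abstract describes is commonly written, per utterance, as a weighted combination of the HMM likelihood and the GV likelihood. The notation below is the usual one from the GV literature and may differ from the paper's exact symbols:

```latex
% c: static-feature trajectory; W: window matrix appending dynamics (O = Wc)
% v(c): per-dimension global variance of c over the utterance
% (\mu_v, \Sigma_v): Gaussian GV model; \omega: balancing weight
\mathcal{L}(c) = \omega \log P(Wc \mid q, \lambda)
               + \log \mathcal{N}\bigl(v(c);\, \mu_v, \Sigma_v\bigr)
```

The second term penalizes trajectories whose variance collapses below what is observed in natural speech, directly counteracting the over-smoothing the abstract describes.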

469 citations


Proceedings ArticleDOI
01 Apr 2007
TL;DR: It is found that the CDP-based detector and the HMM-based classifier can detect and classify incoming signals at a range of low SNRs.
Abstract: Spectrum awareness is currently one of the most challenging problems in cognitive radio (CR) design. Detection and classification of very low SNR signals with relaxed information on the signal parameters being detected is critical for proper CR functionality as it enables the CR to react and adapt to the changes in its radio environment. In this work, the cycle frequency domain profile (CDP) is used for signal detection and preprocessing for signal classification. Signal features are extracted from CDP using a threshold-test method. For classification, a Hidden Markov Model (HMM) has been used to process extracted signal features due to its robust pattern-matching capability. We also investigate the effects of varied observation length on signal detection and classification. It is found that the CDP-based detector and the HMM-based classifier can detect and classify incoming signals at a range of low SNRs.

432 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics and incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels.
Abstract: Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence. In this paper, we develop a discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics. Our approach incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels. Each class label has a disjoint set of associated hidden states, which enables efficient training and inference in our model. We evaluated our method on the task of recognizing human gestures from unsegmented video streams and performed experiments on three different datasets of head and eye gestures. Our results demonstrate that our model compares favorably to Support Vector Machines, Hidden Markov Models, and Conditional Random Fields on visual gesture recognition tasks.

Proceedings ArticleDOI
14 May 2007
TL;DR: It is found that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM, and that CRFs remain robust when features depend on observations from many time steps.
Abstract: Activity recognition is a key component for creating intelligent, multi-agent systems. Intrinsically, activity recognition is a temporal classification problem. In this paper, we compare two models for temporal classification: hidden Markov models (HMMs), which have long been applied to the activity recognition problem, and conditional random fields (CRFs). CRFs are discriminative models for labeling sequences. They condition on the entire observation sequence, which avoids the need for independence assumptions between observations. Conditioning on the observations vastly expands the set of features that can be incorporated into the model without violating its assumptions. Using data from a simulated robot tag domain, chosen because it is multi-agent and produces complex interactions between observations, we explore the differences in performance between the discriminatively trained CRF and the generative HMM. Additionally, we examine the effect of incorporating features which violate independence assumptions between observations; such features are typically necessary for high classification accuracy. We find that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM. In cases where features depend on observations from many time steps, we confirm that CRFs are robust against any degradation in performance.

Proceedings Article
01 Jun 2007
TL;DR: This model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE.
Abstract: Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
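With symmetric Dirichlet priors over the HMM's transition and emission distributions, integrating out the parameters yields Gibbs-sampling predictive probabilities of the familiar additive-smoothing form. The sketch below uses standard notation, omits the correction terms for the trigram contexts that position i itself participates in, and is not the paper's exact equation:

```latex
% n(\cdot): counts over the current tag assignment excluding position i
% T: tag-set size; W: vocabulary size; \alpha, \beta: Dirichlet hyperparameters
P(t_i \mid \mathbf{t}_{-i}, \mathbf{w}) \propto
  \frac{n(t_{i-2}, t_{i-1}, t_i) + \alpha}{n(t_{i-2}, t_{i-1}) + T\alpha}
  \cdot
  \frac{n(t_i, w_i) + \beta}{n(t_i) + W\beta}
```

Small values of α and β favor the sparse transition and emission distributions that the abstract argues are typical of natural language.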

Journal ArticleDOI
TL;DR: A fusion model by combining the Hidden Markov Model (HMM), Artificial Neural Networks (ANN) and Genetic Algorithms (GA) to forecast financial market behaviour is proposed and implemented.
Abstract: In this paper we propose and implement a fusion model by combining the Hidden Markov Model (HMM), Artificial Neural Networks (ANN) and Genetic Algorithms (GA) to forecast financial market behaviour. The developed tool can be used for in depth analysis of the stock market. Using ANN, the daily stock prices are transformed to independent sets of values that become input to HMM. We draw on GA to optimize the initial parameters of HMM. The trained HMM is used to identify and locate similar patterns in the historical data. The price differences between the matched days and the respective next day are calculated. Finally, a weighted average of the price differences of similar patterns is obtained to prepare a forecast for the required next day. Forecasts are obtained for a number of securities in the IT sector and are compared with a conventional forecast method.

Journal ArticleDOI
TL;DR: A function-based approach to on-line signature verification using a set of time sequences and Hidden Markov Models (HMMs) is presented and compared to other state-of-the-art systems based on the results of SVC 2004.

Journal ArticleDOI
TL;DR: A statistical modelling methodology for performing both diagnosis and prognosis in a unified framework based on segmental hidden semi-Markov models (HSMMs), which can be used to predict the useful remaining life of a system.

Proceedings Article
11 Mar 2007
TL;DR: This paper proposes modeling the topics of words in the document as a Markov chain, and shows that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics.
Abstract: Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.
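The generative story described above (one topic per sentence, with successive sentences tending to retain the previous topic) can be sketched as a toy sampler; the function and parameter names are ours, not the paper's:

```python
import random

def generate_doc(n_sentences, words_per_sentence, stay_prob,
                 topic_word_weights, vocab, rng):
    """Sample a document under a sentence-level topic Markov chain:
    every word in a sentence shares one topic, and each sentence keeps
    the previous sentence's topic with probability `stay_prob`."""
    n_topics = len(topic_word_weights)
    topic = rng.randrange(n_topics)          # initial topic
    doc = []
    for _ in range(n_sentences):
        if rng.random() >= stay_prob:        # occasionally switch topic
            topic = rng.randrange(n_topics)
        words = rng.choices(vocab, weights=topic_word_weights[topic],
                            k=words_per_sentence)
        doc.append((topic, words))
    return doc
```

Because the topic sequence is a hidden Markov chain over sentences, inference over it can use the standard HMM forward-backward machinery, which is the connection the paper exploits.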

Book ChapterDOI
09 Sep 2007
TL;DR: A discriminative keyword spotting system based solely on recurrent neural networks, which uses information from long time spans to estimate word-level posterior probabilities, is presented.
Abstract: The goal of keyword spotting is to detect the presence of specific spoken words in unconstrained speech. The majority of keyword spotting systems are based on generative hidden Markov models and lack discriminative capabilities. However, discriminative keyword spotting systems are currently based on frame-level posterior probabilities of sub-word units. This paper presents a discriminative keyword spotting system based solely on recurrent neural networks, which uses information from long time spans to estimate word-level posterior probabilities. In a keyword spotting task on a large database of unconstrained speech the system achieved a keyword spotting accuracy of 84.5%.

Journal ArticleDOI
TL;DR: This paper presents novel classification algorithms for recognizing object activity using object motion trajectory, and uses hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology.
Abstract: Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature.

Proceedings Article
03 Dec 2007
TL;DR: A system capable of directly transcribing raw online handwriting data is described, consisting of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model.
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM based system in both cases.

Journal ArticleDOI
TL;DR: A discriminative model for polyphonic piano transcription is presented and a frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided.
Abstract: We present a discriminative model for polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances. The classifier outputs are temporally constrained via hidden Markov models, and the proposed system is used to transcribe both synthesized and real piano recordings. A frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided.

Journal ArticleDOI
TL;DR: Experimental results reveal that the first proposed combination of VQ and DTW (by means of score fusion) outperforms the other algorithms and achieves a minimum detection cost function (DCF) value equal to 1.37% for random forgeries and 5.42% for skilled forgeries.


Journal ArticleDOI
TL;DR: An integrated platform for multi-sensor equipment diagnosis and prognosis based on hidden semi-Markov models (HSMMs), which achieves a very promising increase in correct diagnostic rate and implements equipment prognosis within the same integrated framework.

Posted Content
01 Jan 2007
TL;DR: This research constructs and estimates a nonhomogeneous hidden Markov model to model the transitions among latent relationship states and effects on buying behavior, and uses a hierarchical Bayes approach to capture the unobserved heterogeneity across customers.
Abstract: This research models the dynamics of customer relationships using typical transaction data. It permits the evaluation of the effectiveness of customer-brand encounters on the dynamics of customer relationships and the subsequent buying behavior. Our approach to modeling relationship dynamics is structurally different from existing approaches. In the proposed model, customer-brand encounters may have an enduring impact by shifting the customer to a different (unobservable) relationship state. We constructed and estimated a hidden Markov model (HMM) to model the transitions among latent relationship states and effects on buying behavior. This model enables us to dynamically segment the firm's customer base, and to examine methods by which the firm can alter the long-term buying behavior. We use a hierarchical Bayes approach to capture the unobserved heterogeneity across customers. We calibrate the model in the context of alumni relations using a longitudinal gift-giving dataset. Using the proposed model, we are able to probabilistically classify the alumni base into three relationship states, and estimate the marginal impact of alumni-university interactions on moving the alumni between these states. The application of the model for marketing decisions is illustrated using a "what-if" analysis of a reunion marketing campaign. Additionally, we demonstrate improved prediction ability on a validation sample.

Journal ArticleDOI
TL;DR: The technical details, building processes, and performance of the basic HMM-based speech synthesis system, and new features integrated into Nitech-HTS 2005 such as STRAIGHT-based vocoding, HSMM-based acoustic modeling, and a speech parameter generation algorithm considering GV are described.
Abstract: In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe new features integrated into Nitech-HTS 2005 such as STRAIGHT-based vocoding, HSMM-based acoustic modeling, and a speech parameter generation algorithm considering GV. Constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3×RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and footprints of these voices are less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.

Journal ArticleDOI
TL;DR: Subjective listening test results show that the use of HSMMs, which can be viewed as HMMs with explicit state duration PDFs, improves the naturalness of synthesized speech.
Abstract: A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.
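The key difference the abstract describes, explicit state duration PDFs versus the geometric durations implied by HMM self-loops, can be sketched as follows. This is a toy illustration only: the paper's duration models are parametric PDFs, whereas the table PMF below is an assumption made for the sketch.

```python
import random

def hmm_duration(self_loop_prob, rng):
    """State duration implied by a plain HMM: geometric, because the
    model can only self-loop with a fixed probability each frame."""
    d = 1
    while rng.random() < self_loop_prob:
        d += 1
    return d

def hsmm_duration(duration_pmf, rng):
    """HSMM-style duration: drawn once from an explicit PMF over
    {1, 2, ...} by inverse-transform sampling."""
    u, acc = rng.random(), 0.0
    for d, p in enumerate(duration_pmf, start=1):
        acc += p
        if u < acc:
            return d
    return len(duration_pmf)  # guard against floating-point rounding
```

The geometric distribution forced by self-loops puts its mode at duration 1, which is a poor fit for phone durations; an explicit PMF or PDF can place probability mass wherever the data demands, and the paper's point is to use it consistently in both training and synthesis.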

Proceedings Article
11 Mar 2007
TL;DR: A new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems are described, and their performance is demonstrated using synthetic video sequences of two balls bouncing in a box.
Abstract: We describe a new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems. Our models have simple approximate inference and learning procedures that work well in practice. Multilevel representations of sequential data can be learned one hidden layer at a time, and adding extra hidden layers improves the resulting generative models. The models can be trained with very high-dimensional, very non-linear data such as raw pixel sequences. Their performance is demonstrated using synthetic video sequences of two balls bouncing in a box.

Proceedings Article
01 Apr 2007
TL;DR: This work presents a novel technique of training with many-to-many alignments of letters and phonemes, and applies an HMM method in conjunction with a local classification model to predict a global phoneme sequence given a word.
Abstract: Letter-to-phoneme conversion generally requires aligned training data of letters and phonemes. Typically, the alignments are limited to one-to-one alignments. We present a novel technique of training with many-to-many alignments. A letter chunking bigram prediction manages double letters and double phonemes automatically as opposed to preprocessing with fixed lists. We also apply an HMM method in conjunction with a local classification model to predict a global phoneme sequence given a word. The many-to-many alignments result in significant improvements over the traditional one-to-one approach. Our system achieves state-of-the-art performance on several languages and data sets.

Journal ArticleDOI
TL;DR: A new class of models, mixed HMMs (MHMMs), where both covariates and random effects are used to capture differences among processes, is presented, and it is shown that the model can describe the heterogeneity among patients.
Abstract: Hidden Markov models (HMMs) are a useful tool for capturing the behavior of overdispersed, autocorrelated data. These models have been applied to many different problems, including speech recognition, precipitation modeling, and gene finding and profiling. Typically, HMMs are applied to individual stochastic processes; HMMs for simultaneously modeling multiple processes—as in the longitudinal data setting—have not been widely studied. In this article I present a new class of models, mixed HMMs (MHMMs), where I use both covariates and random effects to capture differences among processes. I define the models using the framework of generalized linear mixed models and discuss their interpretation. I then provide algorithms for parameter estimation and illustrate the properties of the estimators via a simulation study. Finally, to demonstrate the practical uses of MHMMs, I provide an application to data on lesion counts in multiple sclerosis patients. I show that my model, while parsimonious, can describe the heterogeneity among patients.