
Showing papers on "Hidden Markov model published in 2000"


Journal ArticleDOI
TL;DR: This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods that are simple to implement and handle general models with non-conjugate priors more efficiently than previous approaches.
Abstract: This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters. These methods are simple to implement and are more efficient than previous ways of handling general Dirichlet process mixture models with non-conjugate priors.

2,320 citations
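To make the indicator-update idea above concrete, here is a minimal sketch (not from the paper) of a Metropolis-Hastings update of the component indicators for a toy one-dimensional Gaussian mixture: candidates are drawn from the conditional (CRP) prior and accepted with a simple likelihood ratio, so the base measure never needs to be conjugate. The data, the base measure, the fixed component variance, and the omission of the component-parameter updates are all simplifications of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 1-D mixture of two well-separated Gaussians.
y = np.concatenate([rng.normal(-2.0, 0.5, 60), rng.normal(3.0, 0.5, 40)])
n = len(y)

alpha = 1.0                           # Dirichlet process concentration
G0 = lambda: rng.uniform(-10, 10)     # non-conjugate base measure over component means
sigma = 0.5                           # known component standard deviation (a simplification)

def loglik(yi, mu):
    return -0.5 * ((yi - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

c = np.zeros(n, dtype=int)            # component indicators; start with one component
phi = {0: G0()}                       # component means, indexed by component label

for sweep in range(200):
    for i in range(n):
        old = c[i]
        counts = {k: int(np.sum(c == k)) for k in phi}
        counts[old] -= 1                                   # exclude observation i itself
        # Propose an indicator from the conditional (CRP) prior given the other indicators.
        existing = [k for k in counts if counts[k] > 0]
        probs = np.array([counts[k] for k in existing] + [alpha], dtype=float)
        probs /= probs.sum()
        pick = rng.choice(len(existing) + 1, p=probs)
        if pick == len(existing):                          # "open a new component"
            prop = max(phi) + 1
            phi[prop] = G0()                               # parameters drawn from the base measure
        else:
            prop = existing[pick]
        # Metropolis-Hastings acceptance: only a likelihood ratio, so conjugacy is never needed.
        log_r = loglik(y[i], phi[prop]) - loglik(y[i], phi[old])
        if np.log(rng.uniform()) < log_r:
            c[i] = prop
        # Remove any component left with no members (a rejected proposal or an emptied cluster).
        for k in [k for k in phi if not np.any(c == k)]:
            del phi[k]
    # A full sampler would also re-sample each phi[k] (e.g. by random-walk Metropolis),
    # and could add the paper's partial Gibbs or auxiliary-parameter steps; omitted here.

print("cluster sizes:", {k: int(np.sum(c == k)) for k in phi})
```

A complete sampler would interleave these indicator sweeps with updates of the component parameters; the sketch only illustrates why the acceptance step sidesteps the non-conjugacy problem.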


Proceedings Article
01 Jan 2000
TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions is described, and recognition results are presented for the first standard DSR feature extraction scheme, which is based on a cepstral analysis.
Abstract: This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may be used either for the evaluation of front-end feature extraction algorithms with a defined HMM recognition back-end or for the evaluation of complete recognition systems. The source speech for this database is TIdigits, a connected-digits task spoken by American English talkers, downsampled to 8 kHz. A selection of 8 different real-world noises has been added to the speech over a range of signal-to-noise ratios, and special care has been taken to control the filtering of both the speech and the noise. The framework was prepared as a contribution to the ETSI STQ-AURORA DSR Working Group [1]. Aurora is developing standards for Distributed Speech Recognition (DSR), where the speech analysis is done in the telecommunication terminal and the recognition at a central location in the telecom network. The framework is currently being used to evaluate alternative proposals for front-end feature extraction. The database has been made publicly available through ELRA so that other speech researchers can evaluate and compare the performance of noise-robust algorithms. Recognition results are presented for the first standard DSR feature extraction scheme, which is based on a cepstral analysis.

1,909 citations


Proceedings Article
29 Jun 2000
TL;DR: A new Markovian sequence model is presented that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences.
Abstract: Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations. This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We present positive experimental results on the segmentation of FAQ’s.

1,522 citations
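A minimal sketch of the idea, with one simplification we have made for brevity: a single maximum-entropy (logistic-regression) classifier takes the previous state as an extra feature, rather than one exponential model per source state as in the paper, and a small Viterbi-style search decodes the conditional state sequence. The toy features and labels are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy labelled sequence: each observation has two binary features and a state 0/1
# (think "question line" vs "answer line" when segmenting a FAQ).
obs = np.array([[1, 0], [1, 1], [0, 1], [0, 1], [1, 0], [0, 1], [0, 0], [0, 1]])
states = np.array([0, 0, 1, 1, 0, 1, 1, 1])
n_states = 2

def make_x(o, prev):
    """Feature vector for P(s_t | o_t, s_{t-1}): observation features + one-hot previous state."""
    return np.concatenate([o, np.eye(n_states)[prev]])

X = np.array([make_x(obs[t], states[t - 1] if t > 0 else 0) for t in range(len(obs))])
clf = LogisticRegression(max_iter=1000).fit(X, states)   # the maximum-entropy model

def viterbi(observations, start_state=0):
    """Most probable state sequence under the conditional model P(s_t | o_t, s_{t-1})."""
    T = len(observations)
    delta = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    for t in range(T):
        for prev in range(n_states):
            prior = (0.0 if prev == start_state else -np.inf) if t == 0 else delta[t - 1, prev]
            if prior == -np.inf:
                continue
            logp = clf.predict_log_proba([make_x(observations[t], prev)])[0]
            for s in range(n_states):
                if prior + logp[s] > delta[t, s]:
                    delta[t, s], back[t, s] = prior + logp[s], prev
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(obs))   # should largely recover the training labels on this tiny example
```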


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.
Abstract: This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where the state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.

1,071 citations
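A sketch of the core step for the case where the state sequence is already given (the paper's algorithm wraps this step in an iteration with the forward-backward algorithm): the static parameter trajectory is chosen to maximize the likelihood of the static-plus-delta means and variances of the visited states, which reduces to a weighted least-squares problem. The one-dimensional stream, the simple delta window, and the per-frame statistics below are our own simplifications.

```python
import numpy as np

# Observation per frame = [static c_t, delta (c_{t+1} - c_{t-1}) / 2]; single 1-D stream.
T = 20
# Hypothetical per-frame means/variances taken from the visited HMM states.
mu_static  = np.concatenate([np.full(10, 0.0), np.full(10, 1.0)])
mu_delta   = np.zeros(T)
var_static = np.full(T, 0.1)
var_delta  = np.full(T, 0.05)

# Window matrix W mapping the static trajectory c (length T) to [static; delta] (length 2T).
W = np.zeros((2 * T, T))
for t in range(T):
    W[t, t] = 1.0                                          # static row
    if 0 < t < T - 1:
        W[T + t, t - 1], W[T + t, t + 1] = -0.5, 0.5       # delta row
mu = np.concatenate([mu_static, mu_delta])
U_inv = np.diag(1.0 / np.concatenate([var_static, var_delta]))

# Maximum-likelihood static trajectory: c = (W' U^-1 W)^-1 W' U^-1 mu
A = W.T @ U_inv @ W
b = W.T @ U_inv @ mu
c = np.linalg.solve(A, b)
print(np.round(c, 2))   # a smooth transition between the two state means, not a hard step
```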


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Abstract: Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, we achieve relative error rate reductions of 35% or more on the multicondition Aurora noisy continuous digits task.

803 citations
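A rough sketch of the tandem recipe described above, using scikit-learn stand-ins: a neural network is trained to output subword posteriors, the log posteriors are decorrelated, and the result is modeled with conventional Gaussian mixtures. The toy data, network size, and frame-level (rather than HMM-state-level) back end are our assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for acoustic frames: 3 "subword" classes in a 10-D feature space.
X = np.vstack([rng.normal(m, 1.0, (200, 10)) for m in (-2, 0, 2)])
y = np.repeat([0, 1, 2], 200)

# 1) Discriminative front end: a neural net trained to output subword posteriors.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

# 2) Transform its posterior estimates (log + PCA decorrelation) into "tandem" features.
def tandem_features(frames):
    return pca.transform(np.log(net.predict_proba(frames) + 1e-10))

logp = np.log(net.predict_proba(X) + 1e-10)
pca = PCA(whiten=True).fit(logp)

# 3) Conventional generative back end: one Gaussian mixture per subword class.
feat = tandem_features(X)
gmms = {k: GaussianMixture(n_components=2, random_state=0).fit(feat[y == k]) for k in (0, 1, 2)}

def classify(frames):
    scores = np.column_stack([gmms[k].score_samples(tandem_features(frames)) for k in (0, 1, 2)])
    return scores.argmax(axis=1)

print("frame accuracy:", (classify(X) == y).mean())
```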


Journal ArticleDOI
TL;DR: A speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments and is demonstrated on a large multispeaker database of continuously spoken digits.
Abstract: This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is performed with an appearance-based lip model that is learned from example images. Visual speech features are represented by contour information of the lips and grey-level information of the mouth area. The acoustic module extracts noise-robust features from the audio signal. Finally, the sensor fusion module is responsible for the joint temporal modeling of the acoustic and visual feature streams and is realized using multistream hidden Markov models (HMMs). The multistream method allows the definition of different temporal topologies and levels of stream integration and hence enables the modeling of temporal dependencies more accurately than traditional approaches. We present two different methods to learn the asynchrony between the two modalities and how to incorporate them in the multistream models. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits. On a recognition task at 15 dB acoustic signal-to-noise ratio (SNR), acoustic perceptual linear prediction (PLP) features lead to a 56% error rate, noise-robust RASTA-PLP (relative spectra) acoustic features to a 7.2% error rate, and combined noise-robust acoustic features and visual features to a 2.5% error rate.

620 citations
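The sensor-fusion step rests on combining per-stream emission scores inside the multistream HMM; a tiny illustration of that combination (with made-up stream weights and log-likelihood values) follows.

```python
import numpy as np

# In a multistream HMM the state emission score for a frame is a weighted combination of
# the per-stream log-likelihoods; the weights can be tuned to the acoustic SNR.
def combined_log_likelihood(log_b_audio, log_b_video, w_audio=0.7, w_video=0.3):
    """Per-state emission score for one frame given both streams (weights are illustrative)."""
    return w_audio * np.asarray(log_b_audio) + w_video * np.asarray(log_b_video)

# Example: 3 HMM states; the audio stream favours state 1, the video stream favours state 2.
print(combined_log_likelihood([-5.0, -1.0, -4.0], [-6.0, -3.0, -1.5]))
```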


Journal ArticleDOI
TL;DR: An algorithm, referred to as a spatio-temporal Markov random field, for traffic images at intersections; it models the tracking problem in terms of the state of each pixel in an image and how such states transit along both the x-y image axes and the time axis.
Abstract: We have developed an algorithm, referred to as a spatio-temporal Markov random field, for traffic images at intersections. This algorithm models the tracking problem in terms of the state of each pixel in an image and how such states transit along both the x-y image axes and the time axis. Our algorithm is sufficiently robust to segment and track occluded vehicles at a high success rate of 93%-96%. This success has led to the development of an extendable robust event recognition system based on the hidden Markov model (HMM). The system learns various event behavior patterns of each vehicle in the HMM chains and then, using the output from the tracking system, identifies current event chains. The current system can recognize bumping, passing, and jamming. However, by including other event patterns in the training set, the system can be extended to recognize those other events, e.g., illegal U-turns or reckless driving. We have implemented this system, evaluated it using the tracking results, and demonstrated its effectiveness.

545 citations


Journal ArticleDOI
TL;DR: A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily using a new kernel function derived from a generative statistical model for a protein family, in this case a hidden Markov model.
Abstract: A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model. This general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.

530 citations


Journal ArticleDOI
TL;DR: A new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes is introduced and the results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
Abstract: We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models—hidden Markov models and linear dynamical systems—and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.

478 citations


Journal ArticleDOI
TL;DR: This work combines prosodic cues with word-based approaches, and evaluates performance on two speech corpora, Broadcast News and Switchboard, finding that the prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events.

464 citations


Proceedings Article
01 Jan 2000
TL;DR: A new variational inference algorithm is obtained by casting the SLDS model as a Dynamic Bayesian Network, and classification experiments show the superiority of SLDS over conventional HMMs for the problem domain.
Abstract: The human figure exhibits complex and rich dynamic behavior that is both nonlinear and time-varying. Effective models of human dynamics can be learned from motion capture data using switching linear dynamic system (SLDS) models. We present results for human motion synthesis, classification, and visual tracking using learned SLDS models. Since exact inference in SLDS is intractable, we present three approximate inference algorithms and compare their performance. In particular, a new variational inference algorithm is obtained by casting the SLDS model as a Dynamic Bayesian Network. Classification experiments show the superiority of SLDS over conventional HMMs for our problem domain.

Journal ArticleDOI
TL;DR: In this article, it is shown that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states, with uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior.
Abstract: Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.

Proceedings Article
01 Oct 2000
TL;DR: This paper proposes a method based on truncated vector Taylor series that approximates the performance of a system trained with that corrupted speech and compares them with the lognormal approximation in PMC.
Abstract: In this paper we address the problem of robustness of speech recognition systems in noisy environments The goal is to estimate the parameters of a HMM that is matched to a noisy environment, given a HMM trained with clean speech and knowledge of the acoustical environment We propose a method based on truncated vector Taylor series that approximates the performance of a system trained with that corrupted speech We also provide insight on the approximations used in the model of the environment and compare them with the lognormal approximation in PMC
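A small sketch of the flavor of VTS compensation, written for the log-mel-spectral domain with diagonal covariances; the actual method also handles the cepstral transform, delta parameters, and per-mixture statistics, and the numbers here are illustrative.

```python
import numpy as np

# First-order vector Taylor series (VTS) compensation of clean-speech HMM Gaussians,
# sketched in the log-mel-spectral domain with diagonal covariances.
def vts_compensate(mu_x, var_x, mu_n, var_n):
    """Approximate noisy-speech mean/variance from clean-speech and noise Gaussians."""
    mu_x, var_x, mu_n, var_n = map(np.asarray, (mu_x, var_x, mu_n, var_n))
    # Mismatch function y = x + log(1 + exp(n - x)), expanded around (mu_x, mu_n).
    mu_y = mu_x + np.log1p(np.exp(mu_n - mu_x))
    g = 1.0 / (1.0 + np.exp(mu_n - mu_x))       # d y / d x at the expansion point
    var_y = g**2 * var_x + (1.0 - g)**2 * var_n
    return mu_y, var_y

# Example: a log-mel channel where the noise mean sits 3 (log) units below the speech mean.
print(vts_compensate(mu_x=[10.0], var_x=[2.0], mu_n=[7.0], var_n=[1.0]))
```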

Journal ArticleDOI
TL;DR: A hidden Markov model for general protein sequence based on the I-sites library of sequence-structure motifs, HMMSTR, which attributes a considerably higher probability to coding sequence than does an equivalent dipeptide model, predicts secondary structure better than any previously reported method and the structural context of beta strands and turns with an accuracy that should be useful for tertiary structure prediction.

Proceedings ArticleDOI
TL;DR: Analysis of HMM-based steering behavior models developed using a moving base driving simulator showed that driver behavior modeling and recognition of different types of lane changes is possible using HMMs.
Abstract: A method for detecting drivers’ intentions is essential to facilitate operating mode transitions between driver and driver assistance systems. We propose a driver behavior recognition method using Hidden Markov Models (HMMs) to characterize and detect driving maneuvers and place it in the framework of a cognitive model of human behavior. HMM-based steering behavior models for emergency and normal lane changes as well as for lane keeping were developed using a moving base driving simulator. Analysis of these models after training and recognition tests showed that driver behavior modeling and recognition of different types of lane changes is possible using HMMs.

Proceedings Article
30 Jul 2000
TL;DR: This paper demonstrates that extraction accuracy strongly depends on the selection of structure, and presents an algorithm for automatically finding good structures by stochastic optimization, which finds HMM models that almost always out-perform a fixed model, and have superior average performance across tasks.
Abstract: Recent research has demonstrated the strong performance of hidden Markov models applied to information extraction—the task of populating database slots with corresponding phrases from text documents. A remaining problem, however, is the selection of state-transition structure for the model. This paper demonstrates that extraction accuracy strongly depends on the selection of structure, and presents an algorithm for automatically finding good structures by stochastic optimization. Our algorithm begins with a simple model and then performs hill-climbing in the space of possible structures by splitting states and gauging performance on a validation set. Experimental results show that this technique finds HMM models that almost always out-perform a fixed model, and have superior average performance across tasks.

Proceedings Article
Nong Ye
01 Jan 2000
TL;DR: The technique was implemented and tested on the audit data of a Sun Solaris system, and the results showed that it clearly distinguished intrusive activities from normal activities in the testing data.
Abstract: This paper presents an anomaly detection technique to detect intrusions into computer and network systems. In this technique, a Markov chain model is used to represent a temporal profile of normal behavior in a computer and network system. The Markov chain model of the norm profile is learned from historic data of the system’s normal behavior. The observed behavior of the system is analyzed to infer the probability that the Markov chain model of the norm profile supports the observed behavior. A low probability of support indicates an anomalous behavior that may result from intrusive activities. The technique was implemented and tested on the audit data of a Sun Solaris system. The testing results showed that the technique clearly distinguished intrusive activities from normal activities in the testing data.
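A compact sketch of the norm-profile idea, with synthetic event streams standing in for Solaris audit data: a first-order transition matrix is learned from normal activity, and observed windows are then scored by their average log-probability of support, with low support flagging a possible intrusion. The event alphabet, window length, and threshold-free comparison are our simplifications.

```python
import numpy as np

def fit_markov_chain(events, n_types, smoothing=1.0):
    """Estimate a first-order transition matrix over event types from normal activity."""
    counts = np.full((n_types, n_types), smoothing)
    for a, b in zip(events[:-1], events[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def window_log_support(window, P):
    """Average log-probability that the norm-profile chain supports the observed window."""
    return sum(np.log(P[a, b]) for a, b in zip(window[:-1], window[1:])) / (len(window) - 1)

rng = np.random.default_rng(0)
normal = rng.choice(4, size=500, p=[0.4, 0.3, 0.2, 0.1]).tolist()   # "normal" audit events
P = fit_markov_chain(normal, n_types=4)

normal_window  = normal[100:120]
unusual_window = [3, 3, 3, 3, 2, 3, 3, 3, 3, 3]   # an event pattern rarely seen in training
print("normal :", round(window_log_support(normal_window, P), 2))
print("unusual:", round(window_log_support(unusual_window, P), 2))   # noticeably lower support
```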

Journal ArticleDOI
TL;DR: An algorithm is proposed that models images by two dimensional (2-D) hidden Markov models (HMMs) that outperforms CART™, LVQ, and Bayes VQ in classification by context.
Abstract: For block-based classification, an image is divided into blocks, and a feature vector is formed for each block by grouping statistics extracted from the block. Conventional block-based classification algorithms decide the class of a block by examining only the feature vector of this block and ignoring context information. In order to improve classification by context, an algorithm is proposed that models images by two dimensional (2-D) hidden Markov models (HMMs). The HMM considers feature vectors statistically dependent through an underlying state process assumed to be a Markov mesh, which has transition probabilities conditioned on the states of neighboring blocks from both horizontal and vertical directions. Thus, the dependency in two dimensions is reflected simultaneously. The HMM parameters are estimated by the EM algorithm. To classify an image, the classes with maximum a posteriori probability are searched jointly for all the blocks. Applications of the HMM algorithm to document and aerial image segmentation show that the algorithm outperforms CART™, LVQ, and Bayes VQ.

Journal ArticleDOI
M.J.F. Gales
TL;DR: This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT), which may be viewed as a simple extension to speaker clustering: rather than selecting a single cluster, a linear interpolation of all the cluster means is used as the mean of the particular speaker.
Abstract: When performing speaker adaptation, there are two conflicting requirements. First, the speaker transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or the acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speaker-independent task CAT reduced the word error rate using very little adaptation data. In addition when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
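A sketch of the weight-estimation step under simplifying assumptions of ours (a known frame-to-state alignment, one Gaussian per state, diagonal covariances): the speaker's interpolation weights are the solution of the normal equations accumulated over the adaptation data.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, S = 4, 3, 2                # feature dim, number of clusters, number of states

# M[s] is a D x C matrix whose columns are the cluster means of state s.
M = {s: rng.normal(size=(D, C)) for s in range(S)}
inv_var = {s: np.full(D, 1.0 / 0.3) for s in range(S)}   # diagonal inverse variances

# Fake adaptation data generated from "true" interpolation weights for one speaker.
true_w = np.array([0.6, 0.3, 0.1])
align = rng.integers(0, S, size=400)                     # known frame-to-state alignment
obs = np.stack([M[s] @ true_w + rng.normal(scale=0.55, size=D) for s in align])

# Accumulate the normal equations  (sum_t M' Sigma^-1 M) w = sum_t M' Sigma^-1 o_t.
A = np.zeros((C, C))
b = np.zeros(C)
for s, o in zip(align, obs):
    A += M[s].T @ (inv_var[s][:, None] * M[s])
    b += M[s].T @ (inv_var[s] * o)
w_hat = np.linalg.solve(A, b)
print("estimated interpolation weights:", np.round(w_hat, 2))   # close to true_w
```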

Journal ArticleDOI
Koichi Shinoda, Takao Watanabe
TL;DR: A method in which state clustering is accomplished by way of phonetic decision trees and in which the minimum description length (MDL) criterion is used to optimize the number of clusters is proposed.
Abstract: Context-dependent phone units, such as triphones, have recently come to be used to model subword units in speech recognition systems that are based on the use of hidden Markov models (HMMs). While most such systems employ clustering of the HMM parameters (e.g., subword clustering and state clustering) to control the HMM size, so as to avoid poor recognition accuracy due to a lack of training data, none of them provide any effective criteria for determining the optimal number of clusters. This paper proposes a method in which state clustering is accomplished by way of phonetic decision trees and in which the minimum description length (MDL) criterion is used to optimize the number of clusters. Large-vocabulary Japanese-language recognition experiments show that this method achieves higher accuracy than the maximum-likelihood approach.
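A small sketch of how an MDL-style criterion can decide whether a decision-tree split of a state cluster is worthwhile: the split is kept only if the gain in Gaussian log-likelihood exceeds a description-length penalty for the extra parameters. The diagonal-covariance likelihood, the penalty form, and the synthetic data are our own simplifications of the paper's criterion.

```python
import numpy as np

def cluster_loglik(frames):
    """ML log-likelihood of frames under a single diagonal-covariance Gaussian."""
    n, d = frames.shape
    var = frames.var(axis=0) + 1e-6
    return -0.5 * n * (np.sum(np.log(var)) + d * (1.0 + np.log(2 * np.pi)))

def mdl_split_gain(parent, left, right):
    """Likelihood gain of the split minus a description-length penalty; split if positive."""
    d = parent.shape[1]
    gain = cluster_loglik(left) + cluster_loglik(right) - cluster_loglik(parent)
    penalty = 0.5 * (2 * d) * np.log(len(parent))   # one extra mean + variance vector
    return gain - penalty

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, (300, 10))
b = rng.normal(1.5, 1.0, (300, 10))
parent = np.vstack([a, b])
print("good split   :", round(mdl_split_gain(parent, a, b), 1))                 # positive -> split
print("useless split:", round(mdl_split_gain(parent, parent[::2], parent[1::2]), 1))  # negative -> stop
```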

Journal ArticleDOI
TL;DR: A Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences that strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability.
Abstract: We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Because phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.

Proceedings ArticleDOI
31 Jul 2000
TL;DR: A study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain, the first stage in a system that will extract event information for automatically updating biology databases.
Abstract: We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked-up by domain experts with term classes such as proteins and DNA. Using cross-validation methods we achieved an F-score of 0.73 and we examine the contribution made by each part of the interpolation model to overcoming data sparseness.

01 Jan 2000
TL;DR: This chapter contains sections titled: Introduction: Linear Methods using Kernel function, Applying Linear Methods to Structured Objects, Conditional Symmetric Independence Kernels, Pair Hidden Markov Models, Conditionally Symmetrically Independent PHMMs, Conclusion.
Abstract: This chapter contains sections titled: Introduction: Linear Methods using Kernel function, Applying Linear Methods to Structured Objects, Conditional Symmetric Independence Kernels, Pair Hidden Markov Models, Conditionally Symmetrically Independent PHMMs, Conclusion


Journal ArticleDOI
TL;DR: Comparisons with Baum's reestimation suggest that the quasi-Newton method has a superior convergence speed when the likelihood surface is poorly defined due to, for example, a low signal-to-noise ratio or the aggregation of multiple states having identical conductances.

Journal ArticleDOI
TL;DR: A novel, simple characterization of linearly dependent processes, called observable operator models, is provided, which leads to a constructive learning algorithm for the identification of linearly dependent processes.
Abstract: A widely used class of models for stochastic systems is hidden Markov models. Systems that can be modeled by hidden Markov models are a proper subclass of linearly dependent processes, a class of s...

01 Jan 2000
TL;DR: In this paper, a new probabilistic background model based on a Hidden Markov Model is presented, whose hidden states enable discrimination between foreground, background, and shadow; the model functions as a low-level process for a car tracker.
Abstract: A new probabilistic background model based on a Hidden Markov Model is presented. The hidden states of the model enable discrimination between foreground, background and shadow. This model functions as a low level process for a car tracker. A particle filter is employed as a stochastic filter for the car tracker. The use of a particle filter allows the incorporation of the information from the low level process via importance sampling. A novel observation density for the particle filter which models the statistical dependence of neighboring pixels based on a Markov random field is presented. The effectiveness of both the low level process and the observation likelihood are demonstrated.
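A per-pixel sketch of the three-state (background/shadow/foreground) idea using an ordinary HMM forward filter over grey-level intensities; the means, variances, and transition probabilities are invented for illustration, and the paper's observation and shadow models are richer than a single Gaussian per state.

```python
import numpy as np

states = ["background", "shadow", "foreground"]
mu  = np.array([200.0, 120.0, 60.0])     # expected grey level in each state (illustrative)
std = np.array([10.0, 15.0, 30.0])
A = np.array([[0.95, 0.03, 0.02],        # background tends to stay background, etc.
              [0.10, 0.80, 0.10],
              [0.05, 0.05, 0.90]])
pi = np.array([0.9, 0.05, 0.05])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def forward_filter(intensities):
    """Online forward recursion; returns the most probable state label per frame."""
    alpha = pi * gauss(intensities[0], mu, std)
    alpha /= alpha.sum()
    labels = [states[int(alpha.argmax())]]
    for x in intensities[1:]:
        alpha = (alpha @ A) * gauss(x, mu, std)
        alpha /= alpha.sum()
        labels.append(states[int(alpha.argmax())])
    return labels

# A pixel that is background, then shadowed, then covered by a dark car.
print(forward_filter([201, 198, 203, 125, 118, 122, 55, 62, 58]))
```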

Journal ArticleDOI
TL;DR: In this article, the authors proposed a nonhomogeneous hidden Markov model (NHMM) to simulate precipitation at a network of 24 rain gauge stations in Washington state over the course of 17 winters.
Abstract: Nonhomogeneous hidden Markov models (NHMMs) provide a relatively simple framework for simulating precipitation at multiple rain gauge stations conditional on synoptic atmospheric patterns. Building on existing NHMMs for precipitation occurrences, we propose an extension to include precipitation amounts. The model we describe assumes the existence of unobserved (or hidden) weather patterns, the weather states, which follow a Markov chain. The weather states depend on observable synoptic information and therefore serve as a link between the synoptic-scale atmospheric patterns and the local-scale precipitation. The presence of the hidden states simplifies the spatio-temporal structure of the precipitation process. We assume the temporal dependence of precipitation is completely accounted for by the Markov evolution of the weather state. The spatial dependence of precipitation can also be partially or completely accounted for by the existence of a common weather state. In the proposed model, occurrences are assumed to be conditionally spatially independent given the current weather state and, conditional on occurrences, precipitation amounts are modeled independently at each rain gauge as gamma deviates with gauge-specific parameters. We apply these methods to model precipitation at a network of 24 rain gauge stations in Washington state over the course of 17 winters. The first 12 yr are used for model fitting purposes, while the last 5 serve to evaluate the model performance. The analysis of the model results for the reserved years suggests that the characteristics of the data are captured fairly well and points to possible directions for future improvements.

Proceedings ArticleDOI
01 Aug 2000
TL;DR: A novel and flexible approach based on segmental semi-Markov models is proposed that provides a systematic and coherent framework for leveraging both prior knowledge and training data for automatically detecting specific patterns or shapes in time-series data.
Abstract: This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike dynamic time warping or template matching, the proposed approach provides a systematic and coherent framework for leveraging both prior knowledge and training data. The pattern of interest is modeled as a K-state segmental hidden Markov model where each state is responsible for the generation of a component of the overall shape using a state-based regression function. The distance (in time) between segments is modeled as a semi-Markov process, allowing flexible deformation of time. The model can be constructed from a single training example. Recognition of a pattern in a new time series is achieved by a recursive Viterbi-like algorithm which scales linearly in the length of the sequence. The method is successfully demonstrated on real data sets, including an application to end-point detection in semiconductor manufacturing.
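A sketch of the segmental, Viterbi-like matching idea under assumptions of ours: each state generates one segment with a fixed slope (a simple stand-in for the paper's state-based regression functions) plus a Gaussian-style duration penalty, and a dynamic program finds the best segmentation. Durations are left unbounded here for simplicity, whereas the paper bounds them to keep the search linear in the sequence length.

```python
import numpy as np

def segment_cost(y, slope):
    """Sum of squared residuals of y against a line with the given slope (free intercept)."""
    t = np.arange(len(y))
    intercept = np.mean(y - slope * t)
    return float(np.sum((y - (slope * t + intercept)) ** 2))

def match_shape(y, slopes, mean_dur, std_dur, noise_var=1.0):
    """Segmental Viterbi-style fit of a K-segment shape template to a 1-D series."""
    T, K = len(y), len(slopes)
    best = np.full((T + 1, K + 1), np.inf)   # best[t, k]: first t samples explained by k states
    best[0, 0] = 0.0
    back = np.zeros((T + 1, K + 1), dtype=int)
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            for d in range(1, t + 1):        # candidate duration of segment k
                if best[t - d, k - 1] == np.inf:
                    continue
                cost = (best[t - d, k - 1]
                        + segment_cost(y[t - d:t], slopes[k - 1]) / noise_var
                        + ((d - mean_dur[k - 1]) / std_dur[k - 1]) ** 2)
                if cost < best[t, k]:
                    best[t, k], back[t, k] = cost, d
    # Recover the segment start points by backtracking over the chosen durations.
    bounds, t = [], T
    for k in range(K, 0, -1):
        t -= back[t, k]
        bounds.append(t)
    return best[T, K], bounds[::-1][1:]      # total cost and internal change points

# Toy series: flat, then rising, then flat again (a step-like sensor pattern).
y = np.concatenate([np.full(15, 1.0), 1.0 + 0.5 * np.arange(10), np.full(15, 6.0)])
y += np.random.default_rng(0).normal(0, 0.1, len(y))
cost, change_points = match_shape(y, slopes=[0.0, 0.5, 0.0],
                                  mean_dur=[15, 10, 15], std_dur=[4, 3, 4])
print("detected change points:", change_points)
```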

Proceedings ArticleDOI
26 Mar 2000
TL;DR: The findings show that the audio and video information can be combined using a rule-based system to improve the recognition rate.
Abstract: This paper describes the use of statistical techniques and hidden Markov models (HMM) in the recognition of emotions. The method aims to classify 6 basic emotions (anger, dislike, fear, happiness, sadness and surprise) from both facial expressions (video) and emotional speech (audio). The emotions of 2 human subjects were recorded and analyzed. The findings show that the audio and video information can be combined using a rule-based system to improve the recognition rate.