
Showing papers on "Hidden Markov model published in 2008"


Journal ArticleDOI
01 Feb 2008
TL;DR: This work marginalizes out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings, which results in a nonparametric model for dynamical systems that accounts for uncertainty in the model.
Abstract: We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.

1,026 citations


Proceedings ArticleDOI
21 Sep 2008
TL;DR: This paper presents an easy to install sensor network and an accurate but inexpensive annotation method and shows how the hidden Markov model and conditional random fields perform in recognizing activities.
Abstract: A sensor system capable of automatically recognizing activities would allow many potential ubiquitous applications. In this paper, we present an easy to install sensor network and an accurate but inexpensive annotation method. A recorded dataset consisting of 28 days of sensor data and its annotation is described and made available to the community. Through a number of experiments we show how the hidden Markov model and conditional random fields perform in recognizing activities. We achieve a timeslice accuracy of 95.6% and a class accuracy of 79.4%.

873 citations


Book
21 Feb 2008
TL;DR: The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Abstract: Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs. Whereas the basic principles underlying HMM-based LVCSR are rather straightforward, the approximations and simplifying assumptions involved in a direct implementation of these principles would result in a system which has poor accuracy and unacceptable sensitivity to changes in operating environment. Thus, the practical application of HMMs in modern systems involves considerable sophistication. The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then describe the various refinements which are needed to achieve state-of-the-art performance. These refinements include feature projection, improved covariance modelling, discriminative parameter estimation, adaptation and normalisation, noise compensation and multi-pass system combination. The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described.

763 citations


Journal ArticleDOI
TL;DR: GeneMark-ES version 2 is a new ab initio algorithm that identifies protein-coding genes in fungal genomes and does not require a predetermined training set to estimate the parameters of the underlying hidden Markov model (HMM).
Abstract: We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. Upon self-training, the intron submodel switches on in several steps to reach its full complexity. We demonstrate that the algorithm accuracy, both at the exon and the whole gene level, is favorably compared to the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects.

737 citations


Journal ArticleDOI
TL;DR: This paper models the sequence of operations in credit card transaction processing using a hidden Markov model (HMM), shows how it can be used to detect fraud, and compares it with other techniques available in the literature.
Abstract: Due to a rapid advancement in the electronic commerce technology, the use of credit cards has dramatically increased. As credit card becomes the most popular mode of payment for both online as well as regular purchase, cases of fraud associated with it are also rising. In this paper, we model the sequence of operations in credit card transaction processing using a hidden Markov model (HMM) and show how it can be used for the detection of frauds. An HMM is initially trained with the normal behavior of a cardholder. If an incoming credit card transaction is not accepted by the trained HMM with sufficiently high probability, it is considered to be fraudulent. At the same time, we try to ensure that genuine transactions are not rejected. We present detailed experimental results to show the effectiveness of our approach and compare it with other techniques available in the literature.

430 citations
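
The acceptance test described in the abstract above can be sketched with the standard scaled forward algorithm. This is a minimal illustration, not the authors' implementation: the two-state model, the three spending bands, the parameter values, and the per-symbol threshold rule are all hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | model) for a discrete HMM."""
    alpha = start * emit[:, obs[0]]          # joint of first symbol and each state
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()                  # rescale to avoid numerical underflow
        log_lik += np.log(scale)
        alpha = (alpha / scale) @ trans * emit[:, symbol]
    return log_lik + np.log(alpha.sum())

# Hypothetical cardholder model: 2 hidden states, 3 spending bands
# (0 = low, 1 = medium, 2 = high). All numbers are made up for illustration.
start = np.array([0.8, 0.2])
trans = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
emit = np.array([[0.80, 0.19, 0.01],
                 [0.30, 0.60, 0.10]])

def is_suspicious(obs, threshold):
    """Flag a sequence whose average log-likelihood per symbol is below threshold."""
    return forward_log_likelihood(obs, start, trans, emit) / len(obs) < threshold

usual = [0, 0, 1, 0, 0, 1]      # mostly low spending
unusual = [0, 0, 2, 2, 2, 2]    # sudden run of high spending
print(is_suspicious(usual, -1.5), is_suspicious(unusual, -1.5))
```

In practice the threshold would be tuned on held-out genuine transactions so that few of them are rejected, which is the trade-off the abstract mentions.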


Journal ArticleDOI
TL;DR: It is shown how the marginal likelihood can be computed via Markov chain Monte Carlo methods on modified posterior distributions for each model, which then allows Bayes factors or posterior model probabilities to be calculated.
Abstract: Model choice plays an increasingly important role in statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. However, this is typically a difficult task since it amounts to integrating over all model parameters. The aim of the paper is to illustrate how this may be achieved by using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via Markov chain Monte Carlo methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings.

360 citations
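
The thermodynamic integration idea in the abstract above rests on the identity log p(y) = ∫₀¹ E[log p(y | θ)] dt, where the expectation at temperature t is taken under the "power posterior" proportional to p(y | θ)^t p(θ). The sketch below checks that identity on a toy conjugate model chosen for illustration (it is not from the paper): here the expectation has a closed form, which stands in for the MCMC average that a real problem would need. The temperature schedule t_i = (i/N)^5 is one commonly used choice and is an assumption here.

```python
import numpy as np

# Toy model (an assumption for illustration): y_i ~ N(theta, 1), theta ~ N(0, 1).
y = np.linspace(0.0, 2.0, 21)
n, s, ss = len(y), y.sum(), (y ** 2).sum()

def expected_log_lik(t):
    """E[log p(y | theta)] under the power posterior at temperature t.

    For this conjugate model the power posterior is N(mean, var), so the
    expectation is available exactly instead of by MCMC.
    """
    var = 1.0 / (1.0 + t * n)
    mean = t * s * var
    # E[sum_i (y_i - theta)^2] = ss - 2*mean*s + n*(mean^2 + var)
    expected_sq = ss - 2.0 * mean * s + n * (mean ** 2 + var)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * expected_sq

def log_marginal_ti(num_temps=200, power=5):
    """Trapezoidal estimate of the integral over temperatures t_i = (i/N)^power."""
    temps = np.linspace(0.0, 1.0, num_temps + 1) ** power
    values = np.array([expected_log_lik(t) for t in temps])
    return float(np.sum(np.diff(temps) * (values[:-1] + values[1:]) / 2.0))

def log_marginal_exact():
    """Closed-form log p(y): y is jointly Gaussian with covariance I + 11^T."""
    return float(-0.5 * (n * np.log(2.0 * np.pi) + np.log(1.0 + n)
                         + ss - s ** 2 / (1.0 + n)))

print(log_marginal_ti(), log_marginal_exact())
```

Placing most temperatures near t = 0 matters because the integrand changes fastest there, where the power posterior moves from the prior toward the data.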


Book ChapterDOI
01 Jan 2008
TL;DR: General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition.
Abstract: This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.

316 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A sampling algorithm is developed that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates and demonstrating the advantages of the sticky extension, and the utility of the HDP-HMM in real-world applications.
Abstract: The hierarchical Dirichlet process hidden Markov model (HDP-HMM) is a flexible, nonparametric model which allows state spaces of unknown size to be learned from data. We demonstrate some limitations of the original HDP-HMM formulation (Teh et al., 2006), and propose a sticky extension which allows more robust learning of smoothly varying dynamics. Using DP mixtures, this formulation also allows learning of more complex, multimodal emission distributions. We further develop a sampling algorithm that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates. Via extensive experiments with synthetic data and the NIST speaker diarization database, we demonstrate the advantages of our sticky extension, and the utility of the HDP-HMM in real-world applications.

313 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: An auto-context algorithm that learns an integrated low-level and context model, and is very general and easy to implement, and has the potential to be used for a wide variety of problems of multi-variate labeling.
Abstract: The notion of using context information for solving high-level vision problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with the image appearance, remains mostly unknown. The current literature using Markov random fields (MRFs) and conditional random fields (CRFs) often involves specific algorithm design, in which the modeling and computing stages are studied in isolation. In this paper, we propose an auto-context algorithm. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates to approach the ground truth. Auto-context learns an integrated low-level and context model, and is very general and easy to implement. Under nearly the identical parameter setting in the training, we apply the algorithm on three challenging vision applications: object segmentation, human body configuration, and scene region labeling. It typically takes about 30 ~ 70 seconds to run the algorithm in testing. Moreover, the scope of the proposed algorithm goes beyond high-level vision. It has the potential to be used for a wide variety of problems of multi-variate labeling.

310 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: This paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling, which typically outperforms the Gibbs sampler and is more robust.
Abstract: The infinite hidden Markov model is a non-parametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each time step to a finite number, with dynamic programming, which samples whole state trajectories efficiently. Our algorithm typically outperforms the Gibbs sampler and is more robust. We present applications of iHMM inference using the beam sampler on changepoint detection and text prediction problems.

269 citations


Journal ArticleDOI
TL;DR: A new methodology is proposed for the detection and matching of salient points over several views of an object; each salient point is modelled by a Hidden Markov Model, which is trained in an unsupervised way using contextual 3D neighborhood information, thus providing a robust and invariant point signature.
Abstract: This paper proposes new methodology for the detection and matching of salient points over several views of an object. The process is composed by three main phases. In the first step, detection is carried out by adopting a new perceptually-inspired 3D saliency measure. Such measure allows the detection of few sparse salient points that characterize distinctive portions of the surface. In the second step, a statistical learning approach is considered to describe salient points across different views. Each salient point is modelled by a Hidden Markov Model (HMM), which is trained in an unsupervised way by using contextual 3D neighborhood information, thus providing a robust and invariant point signature. Finally, in the third step, matching among points of different views is performed by evaluating a pairwise similarity measure among HMMs. An extensive and comparative experimental session has been carried out, considering real objects acquired by a 3D scanner from different points of view, where objects come from standard 3D databases. Results are promising, as the detection of salient points is reliable, and the matching is robust and accurate.

Book
01 Jan 2008
TL;DR: In this paper, a nonhomogeneous hidden Markov model is proposed to model the transitions among latent relationship states and effects on buying behavior, where the transitions between the states are a function of time-varying covariates such as customer-firm encounters that could have an enduring impact by shifting the customer to a different relationship state.
Abstract: This research models the dynamics of customer relationships using typical transaction data. Our proposed model permits not only capturing the dynamics of customer relationships, but also incorporating the effect of the sequence of customer-firm encounters on the dynamics of customer relationships and the subsequent buying behavior. Our approach to modeling relationship dynamics is structurally different from existing approaches. Specifically, we construct and estimate a nonhomogeneous hidden Markov model to model the transitions among latent relationship states and effects on buying behavior. In the proposed model, the transitions between the states are a function of time-varying covariates such as customer-firm encounters that could have an enduring impact by shifting the customer to a different (unobservable) relationship state. The proposed model enables marketers to dynamically segment their customer base and to examine methods by which the firm can alter long-term buying behavior. We use a hierarchical Bayes approach to capture the unobserved heterogeneity across customers. We calibrate the model in the context of alumni relations using a longitudinal gift-giving data set. Using the proposed model, we probabilistically classify the alumni base into three relationship states and estimate the effect of alumni-university interactions, such as reunions, on the movement of alumni between these states. Additionally, we demonstrate improved prediction ability on a hold-out sample.

Journal ArticleDOI
TL;DR: A novel approach is described for autonomous and incremental learning of motion pattern primitives by observation of human motion; the patterns are abstracted into a dynamic stochastic model that can be used for both motion recognition and generation, analogous to the mirror neuron hypothesis in primates.
Abstract: This paper describes a novel approach for autonomous and incremental learning of motion pattern primitives by observation of human motion. Human motion patterns are abstracted into a dynamic stochastic model, which can be used for both subsequent motion recognition and generation, analogous to the mirror neuron hypothesis in primates. The model size is adaptable based on the discrimination requirements in the associated region of the current knowledge base. A new algorithm for sequentially training the Markov chains is developed, to reduce the computation cost during model adaptation. As new motion patterns are observed, they are incrementally grouped together using hierarchical agglomerative clustering based on their relative distance in the model space. The clustering algorithm forms a tree structure, with specialized motions at the tree leaves, and generalized motions closer to the root. The generated tree structure will depend on the type of training data provided, so that the most specialized motions will be those for which the most training has been received. Tests with motion capture data for a variety of motion primitives demonstrate the efficacy of the algorithm.

Journal ArticleDOI
TL;DR: This work trained a discrete hidden Markov model (HMM) to map contextual information to a user activity and evaluated the model using data captured from almost 200 hours of detailed observation and documentation of hospital workers.
Abstract: Although researchers have developed robust approaches for estimating location and user identity, estimating user activities has proven much more challenging. Human activities are so complex and dynamic that it's often unclear what information is even relevant for modeling activities. Robust approaches to recognizing user activities require identifying the relevant information to be sensed and the appropriate sensing technologies. In our effort to develop an approach for automatically estimating hospital-staff activities, we trained a discrete hidden Markov model (HMM) to map contextual information to a user activity. We trained the model and evaluated it using data captured from almost 200 hours of detailed observation and documentation of hospital workers. In this article, we discuss our approach, the results, and how activity recognition could empower our vision of the hospital as a smart environment.

Journal ArticleDOI
TL;DR: Two different types of visual activity analysis modules based on vehicle tracking are presented, adding realtime situational awareness to highway monitoring for high-level activity and behavior analysis.
Abstract: This paper presents two different types of visual activity analysis modules based on vehicle tracking. The highway monitoring module accurately classifies vehicles into eight different types and collects traffic flow statistics by leveraging tracking information. These statistics are continuously accumulated to maintain daily highway models that are used to categorize traffic flow in real time. The path modeling block is a more general analysis tool that learns the normal motions encountered in a scene in an unsupervised fashion. The spatiotemporal motion characteristics of these motion paths are encoded by a hidden Markov model. With the path definitions, abnormal trajectories are detected and future intent is predicted. These modules add realtime situational awareness to highway monitoring for high-level activity and behavior analysis.

30 Jan 2008
TL;DR: This unit introduces the concept of hidden Markov models in computational biology by describing them using simple biological examples, requiring as little mathematical knowledge as possible.
Abstract: This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit also presents a brief history of hidden Markov models and an overview of their current applications before concluding with a discussion of their limitations.

Journal ArticleDOI
TL;DR: This work proposes a novel treatment of HMRF models, formulated on the basis of a fuzzy clustering principle, which utilizes a fuzzy objective function regularized by Kullback-Leibler divergence information, and is facilitated by application of a mean-field-like approximation of the MRF prior.
Abstract: Hidden Markov random field (HMRF) models have been widely used for image segmentation, as they appear naturally in problems where a spatially constrained clustering scheme, taking into account the mutual influences of neighboring sites, is asked for. Fuzzy c-means (FCM) clustering has also been successfully applied in several image segmentation applications. In this paper, we combine the benefits of these two approaches, by proposing a novel treatment of HMRF models, formulated on the basis of a fuzzy clustering principle. We approach the HMRF model treatment problem as an FCM-type clustering problem, effected by introducing the explicit assumptions of the HMRF model into the fuzzy clustering procedure. Our approach utilizes a fuzzy objective function regularized by Kullback-Leibler divergence information, and is facilitated by application of a mean-field-like approximation of the MRF prior. We experimentally demonstrate the superiority of the proposed approach over competing methodologies, considering a series of synthetic and real-world image segmentation applications.

Journal ArticleDOI
TL;DR: It is found that combining likelihoods of multiple models in a second classification stage degrades performance of the proposed classifiers, while improving performance with HMM and SDTW, and combining DFFM mappings of multiple SDTW models with SDTW likelihoods can provide significant improvement over SDTW.
Abstract: To recognize speech, handwriting, or sign language, many hybrid approaches have been proposed that combine dynamic time warping (DTW) or hidden Markov models (HMMs) with discriminative classifiers. However, all methods rely directly on the likelihood models of DTW/HMM. We hypothesize that time warping and classification should be separated because of conflicting likelihood modeling demands. To overcome these restrictions, we propose using statistical DTW (SDTW) only for time warping, while classifying the warped features with a different method. Two novel statistical classifiers are proposed - combined discriminative feature detectors (CDFDs) and quadratic classification on DF Fisher mapping (Q-DFFM) - both using a selection of discriminative features (DFs), and are shown to outperform HMM and SDTW. However, we have found that combining likelihoods of multiple models in a second classification stage degrades performance of the proposed classifiers, while improving performance with HMM and SDTW. A proof-of-concept experiment, combining DFFM mappings of multiple SDTW models with SDTW likelihoods, shows that, also for model-combining, hybrid classification can provide significant improvement over SDTW. Although recognition is mainly based on 3D hand motion features, these results can be expected to generalize to recognition with more detailed measurements such as hand/body pose and facial expression.

Proceedings ArticleDOI
12 Mar 2008
TL;DR: This paper proposes an approach that allows a robot to detect intentions of others based on experience acquired through its own sensory-motor capabilities, then using this experience while taking the perspective of the agent whose intent should be recognized.
Abstract: Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among agents or detection of situations that can pose a threat. In this paper, we propose an approach that allows a robot to detect intentions of others based on experience acquired through its own sensory-motor capabilities, then using this experience while taking the perspective of the agent whose intent should be recognized. Our method uses a novel formulation of Hidden Markov Models designed to model a robot's experience and interaction with the world. The robot's capability to observe and analyze the current scene employs a novel vision-based technique for target detection and tracking, using a non-parametric recursive modeling approach. We validate this architecture with a physically embedded robot, detecting the intent of several people performing various activities.

Journal ArticleDOI
TL;DR: The 1-SVM-based multiclass classification approach outperforms the conventional hidden Markov model-based system in the experiments conducted; the improvement in the error rate can reach 50%.
Abstract: This paper presents a method aimed at recognizing environmental sounds for surveillance and security applications. We propose to apply one-class support vector machines (1-SVMs) together with a sophisticated dissimilarity measure in order to address audio classification, and more specifically, sound recognition. We illustrate the performance of this method on an audio database, which consists of 1015 sounds belonging to nine classes. The database used presents high intraclass diversity in terms of signal properties and some kind of interclass similarities. A large discrepancy in the number of items in each class implies nonuniform probability of sound appearances. The method proceeds as follows: first, the use of a set of state-of-the-art audio features is studied. Then, we introduce a set of novel features obtained by combining elementary features. Experiments conducted on a nine-class classification problem show the superiority of this novel sound recognition method. The best recognition accuracy (96.89%) is obtained when combining wavelet-based features, MFCCs, and individual temporal and frequency features. Our 1-SVM-based multiclass classification approach outperforms the conventional hidden Markov model-based system in the experiments conducted; the improvement in the error rate can reach 50%. Besides, we provide empirical results showing that the single-class SVM outperforms a combination of binary SVMs. Additional experiments demonstrate our method is robust to environmental noise.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: An automatic system that recognizes both isolated and continuous gestures for Arabic numbers (0-9) in real time based on a hidden Markov model (HMM); the LRB topology in conjunction with the forward algorithm gives the best performance, with average recognition rates of 98.94% and 95.7% for isolated and continuous gestures, respectively.
Abstract: In this paper, we propose an automatic system that recognizes both isolated and continuous gestures for Arabic numbers (0-9) in real-time based on a hidden Markov model (HMM). To handle isolated gestures, HMMs using ergodic, left-right (LR) and left-right banded (LRB) topologies with different numbers of states ranging from 3 to 10 are applied. Orientation dynamic features are obtained from spatio-temporal trajectories and then quantized to generate their codewords. The continuous gestures are recognized by our novel idea of zero-codeword detection with static velocity motion. The LRB topology in conjunction with the forward algorithm presents the best performance and achieves average recognition rates of 98.94% and 95.7% for isolated and continuous gestures, respectively.
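
The three topologies named in the abstract above differ only in which state transitions are allowed. The sketch below builds the corresponding transition masks under the usual textbook definitions, which are an assumption here since the abstract does not spell them out: ergodic allows every transition, LR allows staying or moving to any later state, and LRB allows staying or advancing exactly one state. The function names are hypothetical.

```python
import numpy as np

def transition_mask(num_states, topology):
    """Return a 0/1 matrix marking which transitions a topology allows."""
    full = np.ones((num_states, num_states))
    if topology == "ergodic":
        return full                          # any state to any state
    upper = np.triu(full)                    # stay, or move to a later state
    if topology == "lr":
        return upper
    if topology == "lrb":
        return upper - np.triu(full, k=2)    # stay, or advance exactly one state
    raise ValueError(f"unknown topology: {topology}")

def uniform_transitions(num_states, topology):
    """Initial transition matrix: equal probability over allowed transitions."""
    mask = transition_mask(num_states, topology)
    return mask / mask.sum(axis=1, keepdims=True)

print(uniform_transitions(4, "lrb"))
```

Because Baum-Welch re-estimation never turns a zero transition probability into a nonzero one, a matrix initialised this way keeps its topology throughout training.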

Journal ArticleDOI
TL;DR: Experimental results show that in many cases the resulting segmentations correspond well to conventional notions of musical form, and how the constrained clustering approach can be extended to include prior musical knowledge, input from other machine approaches, or semi-supervision.
Abstract: We describe a method of segmenting musical audio into structural sections based on a hierarchical labeling of spectral features. Frames of audio are first labeled as belonging to one of a number of discrete states using a hidden Markov model trained on the features. Histograms of neighboring frames are then clustered into segment-types representing distinct distributions of states, using a clustering algorithm in which temporal continuity is expressed as a set of constraints modeled by a hidden Markov random field. We give experimental results which show that in many cases the resulting segmentations correspond well to conventional notions of musical form. We show further how the constrained clustering approach can easily be extended to include prior musical knowledge, input from other machine approaches, or semi-supervision.

Journal ArticleDOI
TL;DR: The work presented in this article explores the ability of hidden Markov models to distinguish songs from five species of antbirds that share the same territory in a rainforest environment in Mexico.
Abstract: Behavioral and ecological studies would benefit from the ability to automatically identify species from acoustic recordings. The work presented in this article explores the ability of hidden Markov models to distinguish songs from five species of antbirds that share the same territory in a rainforest environment in Mexico. When only clean recordings were used, species recognition was nearly perfect, 99.5%. With noisy recordings, performance was lower but generally exceeding 90%. Besides the quality of the recordings, performance has been found to be heavily influenced by a multitude of factors, such as the size of the training set, the feature extraction method used, and number of states in the Markov model. In general, training with noisier data also improved recognition in test recordings, because of an increased ability to generalize. Considerations for improving performance, including beamforming with sensor arrays and design of preprocessing methods particularly suited for bird songs, are discussed. Combining sensor network technology with effective event detection and species identification algorithms will enable observation of species interactions at a spatial and temporal resolution that is simply impossible with current tools. Analysis of animal behavior through real-time tracking of individuals and recording of large amounts of data with embedded devices in remote locations is thus a realistic goal.

Journal ArticleDOI
TL;DR: Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model: a deterministic half-sampling procedure, and an adaptation of the EM algorithm that takes into account randomly missing values induced by cross-validation.
Abstract: The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, in a predictive perspective, it does not require that the sampling distribution belongs to one of the models in competition. However, computing cross-validated likelihood for hidden Markov models for which only one training sample is available involves difficulties, since the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model. The first one consists of using a deterministic half-sampling procedure, and the second one consists of an adaptation of the EM algorithm for hidden Markov models, to take into account randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of cross-validated likelihood criterion and penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. Those numerical experiments highlight a promising behaviour of the deterministic half-sampling criterion.

Journal ArticleDOI
TL;DR: A technique to automatically differentiate between baseline, plan, and perimovement epochs of neural activity is developed based on a hidden Markov model (HMM), which can detect transitions in neural activity corresponding to targets not found in training data.
Abstract: Neural prosthetic interfaces use neural activity related to the planning and perimovement epochs of arm reaching to afford brain-directed control of external devices. Previous research has primarily centered on accurately decoding movement intention from either plan or perimovement activity, but has assumed that temporal boundaries between these epochs are known to the decoding system. In this work, we develop a technique to automatically differentiate between baseline, plan, and perimovement epochs of neural activity. Specifically, we use a generative model of neural activity to capture how neural activity varies between these three epochs. Our approach is based on a hidden Markov model (HMM), in which the latent variable (state) corresponds to the epoch of neural activity, coupled with a state-dependent Poisson firing model. Using an HMM, we demonstrate that the time of transition from baseline to plan epochs, a transition in neural activity that is not accompanied by any external behavior changes, can be detected using a threshold on the a posteriori HMM state probabilities. Following detection of the plan epoch, we show that the intended target of a center-out movement can be detected about as accurately as that by a maximum-likelihood estimator using a window of known plan activity. In addition, we demonstrate that our HMM can detect transitions in neural activity corresponding to targets not found in training data. Thus the HMM technique for automatically detecting transitions between epochs of neural activity enables prosthetic interfaces that can operate autonomously.
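
The detection idea in this abstract (an HMM with state-dependent Poisson firing, plus a threshold on state probabilities) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation. It uses causal forward filtering, a single neuron, a left-to-right transition matrix, and made-up firing rates and spike counts. The names `filtered_posteriors` and `first_crossing` are hypothetical.

```python
import math

def poisson_logpmf(k, rate):
    """Log-probability of count k under a Poisson distribution with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def filtered_posteriors(counts, rates, trans, init):
    """Forward filtering: P(state at t | counts up to t) for every time bin.

    counts: list of per-bin spike-count lists (one count per neuron)
    rates:  rates[s][n] is the expected count of neuron n in state s
    trans:  trans[i][j] is P(state j at t+1 | state i at t)
    init:   initial state distribution
    """
    n_states = len(init)
    belief = list(init)
    out = []
    for t, obs in enumerate(counts):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n_states))
                      for j in range(n_states)]
        # Update: weight each state by the Poisson likelihood of this bin.
        loglik = [sum(poisson_logpmf(k, rates[s][n]) for n, k in enumerate(obs))
                  for s in range(n_states)]
        peak = max(loglik)  # subtract the max for numerical stability
        unnorm = [belief[s] * math.exp(loglik[s] - peak) for s in range(n_states)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        out.append(belief)
    return out

def first_crossing(posteriors, state, threshold):
    """Index of the first bin where P(state) reaches the threshold, else None."""
    for t, p in enumerate(posteriors):
        if p[state] >= threshold:
            return t
    return None

# Toy data (illustrative values only): states are baseline, plan, perimovement.
rates = [[2.0], [10.0], [25.0]]
trans = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]
counts = [[2], [1], [3], [11], [9], [12]]

posteriors = filtered_posteriors(counts, rates, trans, init)
plan_onset = first_crossing(posteriors, state=1, threshold=0.9)
```

With these toy values the plan-state probability first exceeds 0.9 in the bin where the count jumps to 11. The threshold trades detection latency against false alarms.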

Journal ArticleDOI
TL;DR: This paper proposes a multiperson tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose, and proposes Gaussian Mixture Model and Hidden Markov Model-based VFOA-W models, which use head pose and location information.
Abstract: In this paper, we define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), determining where a person is looking when their movement is unconstrained. The VFOA-W estimation is a new and important problem with implications in behavior understanding and cognitive science and real-world applications. One such application, presented in this paper, monitors the attention passers-by pay to an outdoor advertisement by using a single video camera. In our approach to the VFOA-W problem, we propose a multiperson tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting variable-dimensional state-space, we propose a Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme and a novel global observation model, which determines the number of people in the scene and their locations. To determine if a person is looking at the advertisement or not, we propose Gaussian Mixture Model (GMM)-based and Hidden Markov Model (HMM)-based VFOA-W models, which use head pose and location information. Our models are evaluated for tracking performance and ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where up to three mobile observers pass in front of an advertisement.

Journal ArticleDOI
TL;DR: This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making.
Abstract: Advances in computer processing power and emerging algorithms are allowing new ways of envisioning human-computer interaction. Although the benefit of audio-visual fusion is expected for affect recognition from the psychological and engineering perspectives, most existing approaches to automatic human affect analysis are unimodal: the information processed by the computer system is limited to either face images or speech signals. This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our multistream fused hidden Markov model (MFHMM), we analyzed coupled audio and visual streams to detect four cognitive states (interest, boredom, frustration and puzzlement) and seven prototypical emotions (neutral, happiness, sadness, anger, disgust, fear and surprise). The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion, under clean and varying audio channel noise conditions.

Journal ArticleDOI
TL;DR: An acoustic chord transcription system is described that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results; the tonal centroid feature is shown to be robust and to outperform the conventional chroma feature.
Abstract: We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries (which must be done to provide machine learning models with ground truth) by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors which are in perfect alignment with the labels. We, therefore, generate a large set of labeled training data with a minimal amount of human labor. This allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from symbolic data. Each key model defines a unique state-transition characteristic and helps avoid confusions seen in the observation vector. Given acoustic input, we identify a musical key by choosing a key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of the corresponding key model, both of which are returned by a Viterbi decoder. This not only increases the chord recognition accuracy, but also gives key information. Experimental results show the models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
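
The decoding step described in this abstract (run a Viterbi decoder on each key-dependent HMM, keep the key whose model scores highest, and read the chords off that model's optimal state path) can be sketched as follows. This is a simplified illustration, not the authors' system. It assumes discrete observation symbols in place of tonal centroid feature vectors, and two toy "key" models in place of 24. The model names and probabilities are invented for the example.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(obs, init, trans, emit):
    """Return (best-path log-probability, best state path) for discrete observations."""
    n = len(init)
    score = [safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][j] is the best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + safe_log(trans[i][j]))
            pointers.append(best_i)
            score.append(prev[best_i] + safe_log(trans[best_i][j]) + safe_log(emit[j][o]))
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

def decode_key_and_chords(obs, key_models):
    """Score every key model with Viterbi and keep the best one."""
    results = {key: viterbi(obs, *model) for key, model in key_models.items()}
    best_key = max(results, key=lambda key: results[key][0])
    return best_key, results[best_key][1]

# Toy models (illustrative values only): each is (init, trans, emit).
emit = [[0.9, 0.1], [0.1, 0.9]]
key_models = {
    "A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit),  # chords tend to persist
    "B": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit),  # chords tend to alternate
}
obs = [0, 0, 0, 1, 1, 1]

best_key, chords = decode_key_and_chords(obs, key_models)
```

Here the observations change only once, so the persistent-chord model "A" explains them best and its state path follows the observations. Both models share one emission table, so the choice depends only on the state-transition characteristic of each key model.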

Journal ArticleDOI
TL;DR: Results have indicated the promise of the approach which can accurately interpret 85% of the elderly behaviors, and the approach was found to have 90% accuracy, with 0% false alarm for abnormal detection.

Journal ArticleDOI
TL;DR: An automatic sketch synthesis algorithm is proposed based on an embedded hidden Markov model (E-HMM) and a selective ensemble strategy; it achieves satisfactory sketch synthesis with a small set of face training samples.
Abstract: Sketch synthesis plays an important role in face sketch-photo recognition systems. In this manuscript, an automatic sketch synthesis algorithm is proposed based on an embedded hidden Markov model (E-HMM) and a selective ensemble strategy. First, the E-HMM is adopted to model the nonlinear relationship between a sketch and its corresponding photo. Then, based on several learned models, a series of pseudo-sketches is generated for a given photo. Finally, these pseudo-sketches are fused together with the selective ensemble strategy to synthesize a finer face pseudo-sketch. Experimental results illustrate that the proposed algorithm achieves satisfactory sketch synthesis with a small set of face training samples.