Showing papers on "Hidden Markov model" published in 2011


Journal Article
TL;DR: This article reviews the range of Markov models and their extensions that can be fitted to panel-observed data, and their implementation in the msm package for R, which is intended to be straightforward to use, flexible and comprehensively documented.
Abstract: Panel data are observations of a continuous-time process at arbitrary times, for example, visits to a hospital to diagnose disease status. Multi-state models for such data are generally based on the Markov assumption. This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R. Transition intensities may vary between individuals, or with piecewise-constant time-dependent covariates, giving an inhomogeneous Markov model. Hidden Markov models can be used for multi-state processes which are misclassified or observed only through a noisy marker. The package is intended to be straightforward to use, flexible and comprehensively documented. Worked examples are given of the use of msm to model chronic disease progression and screening. Assessment of model fit, and potential future developments of the software, are also discussed.
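
For a time-homogeneous continuous-time Markov model, the transition probabilities over an interval of length t follow from the transition intensity matrix Q as P(t) = exp(tQ). The sketch below illustrates that relationship in plain Python; it is not code from the msm package (which is an R package), and the three-state intensity matrix and the helper names `mat_mul` and `mat_exp` are hypothetical, chosen only for illustration.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(t * q) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    # Taylor series of exp(a): I + a + a^2/2! + ... + a^terms/terms!
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling: exp(t*q) = (exp(t*q / 2^s))^(2^s)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical intensities for states healthy, ill, dead (each row sums to zero;
# the last state is absorbing).
Q = [[-0.15, 0.10, 0.05],
     [0.00, -0.20, 0.20],
     [0.00, 0.00, 0.00]]

P = mat_exp(Q, t=2.0)  # transition probabilities over two time units
for row in P:
    print([round(p, 4) for p in row])
```

Each row of P is a probability distribution over the state occupied at the next observation time, which is the quantity a panel-data likelihood multiplies across consecutive visits.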

926 citations


Journal Article
01 Nov 2011
TL;DR: A framework for hand gesture recognition based on the information fusion of a three-axis accelerometer (ACC) and multichannel electromyography (EMG) sensors that facilitates intelligent and natural control in gesture-based interaction.
Abstract: This paper presents a framework for hand gesture recognition based on the information fusion of a three-axis accelerometer (ACC) and multichannel electromyography (EMG) sensors. In our framework, the start and end points of meaningful gesture segments are detected automatically by the intensity of the EMG signals. A decision tree and multistream hidden Markov models are utilized as decision-level fusion to get the final results. For sign language recognition (SLR), experimental results on the classification of 72 Chinese Sign Language (CSL) words demonstrate the complementary functionality of the ACC and EMG sensors and the effectiveness of our framework. Additionally, the recognition of 40 CSL sentences is implemented to evaluate our framework for continuous SLR. For gesture-based control, a real-time interactive system is built as a virtual Rubik's cube game using 18 kinds of hand gestures as control commands. While ten subjects play the game, the performance is also examined in user-specific and user-independent classification. Our proposed framework facilitates intelligent and natural control in gesture-based interaction.

544 citations


Journal Article
TL;DR: The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Abstract: This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large speech exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of noise and speech exemplars, and we derive an algorithm for recovering this sparse linear combination of exemplars from the observed noisy speech. We describe how the framework can be used for doing hybrid exemplar-based/HMM recognition by using the exemplar-activations together with the phonetic information associated with the exemplars. As an alternative to hybrid recognition, the framework also allows us to take a source separation approach which enables exemplar-based feature enhancement as well as missing data mask estimation. We evaluate the performance of these exemplar-based methods in connected digit recognition on the AURORA-2 database. Our results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB. Although not as effective as two baseline recognizers at higher SNRs, the novel approach offers a promising direction of future research on exemplar-based ASR.

388 citations


Journal Article
TL;DR: A framework for live video analysis in which the behaviors of surveillance subjects are described using a vocabulary learned from recurrent motion patterns, for real-time characterization and prediction of future activities, as well as the detection of abnormalities.
Abstract: Society is rapidly accepting the use of video cameras in many new and varied locations, but effective methods to utilize and manage the massive resulting amounts of visual data are only slowly developing. This paper presents a framework for live video analysis in which the behaviors of surveillance subjects are described using a vocabulary learned from recurrent motion patterns, for real-time characterization and prediction of future activities, as well as the detection of abnormalities. The repetitive nature of object trajectories is utilized to automatically build activity models in a 3-stage hierarchical learning process. Interesting nodes are learned through Gaussian mixture modeling, connecting routes formed through trajectory clustering, and spatio-temporal dynamics of activities probabilistically encoded using hidden Markov models. Activity models are adapted to small temporal variations in an online fashion using maximum likelihood regression and new behaviors are discovered from a periodic retraining for long-term monitoring. Extensive evaluation on various data sets, typically missing from other work, demonstrates the efficacy and generality of the proposed framework for surveillance-based activity analysis.

349 citations


Journal Article
01 Apr 2011-Proteins
TL;DR: A new approach to learning statistical models from multiple sequence alignments (MSA) of proteins, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA, which encodes both the position‐specific conservation statistics and the correlated mutation statistics between sequential and long‐range pairs of residues.
Abstract: We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSA either make strong, and often inappropriate assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method out-performs an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly out-perform Hidden Markov Models in terms of predictive accuracy.

331 citations


Proceedings Article
19 Jun 2011
TL;DR: A novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language, using graph-based label propagation for cross-lingual knowledge transfer.
Abstract: We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (Berg-Kirkpatrick et al., 2010). Across eight European languages, our approach results in an average absolute improvement of 10.4% over a state-of-the-art baseline, and 16.7% over vanilla hidden Markov models induced with the Expectation Maximization algorithm.

326 citations


Proceedings Article
22 May 2011
TL;DR: Deep Belief Networks work even better when their inputs are speaker adaptive, discriminative features, and on the standard TIMIT corpus, they give phone error rates of 19.6% using monophone HMMs and a bigram language model.
Abstract: Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of features that capture the higher-order statistical structure of the data. These features can be used to initialize the hidden units of a feed-forward neural network that is then trained to predict the HMM state for the central frame of the window. Initializing with features that are good at generating speech makes the neural network perform much better than initializing with random weights. DBNs have already been used successfully for phone recognition with input coefficients that are MFCCs or filterbank outputs [1, 2]. In this paper, we demonstrate that they work even better when their inputs are speaker adaptive, discriminative features. On the standard TIMIT corpus, they give phone error rates of 19.6% using monophone HMMs and a bigram language model and 19.4% using monophone HMMs and a trigram language model.

321 citations


Journal Article
TL;DR: The article deals with the analysis and interpretation of dynamic scenes typical of urban driving, to assess risks of collision for the ego-vehicle with the use of Hidden Markov Models and Gaussian processes.
Abstract: The article deals with the analysis and interpretation of dynamic scenes typical of urban driving. The key objective is to assess risks of collision for the ego-vehicle. We describe our concept and methods, which we have integrated and tested on our experimental platform on a Lexus car and a driving simulator. The on-board sensors deliver visual, telemetric and inertial data for environment monitoring. The sensor fusion uses our Bayesian Occupancy Filter for a spatio-temporal grid representation of the traffic scene. The underlying probabilistic approach is capable of dealing with uncertainties when modeling the environment as well as detecting and tracking dynamic objects. The collision risks are estimated as stochastic variables and are predicted for a short period ahead with the use of Hidden Markov Models and Gaussian processes. The software implementation takes advantage of our methods, which allow for parallel computation. Our tests have proven the relevance and feasibility of our approach for improving the safety of car driving.

316 citations


Book
19 Mar 2011
TL;DR: This introduction to the expectation–maximization (EM) algorithm provides an intuitive and mathematically rigorous understanding of EM.
Abstract: This introduction to the expectation–maximization (EM) algorithm provides an intuitive and mathematically rigorous understanding of EM. Two of the most popular applications of EM are described in detail: estimating Gaussian mixture models (GMMs), and estimating hidden Markov models (HMMs). EM solutions are also derived for learning an optimal mixture of fixed models, for estimating the parameters of a compound Dirichlet distribution, and for dis-entangling superimposed signals. Practical issues that arise in the use of EM are discussed, as well as variants of the algorithm that help deal with these challenges.
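
As a rough illustration of the Gaussian mixture application mentioned above, the following sketch runs EM on a one-dimensional, two-component mixture. It is a minimal textbook-style version, not the book's own derivation or code; the function names, the toy data, the initial values and the variance floor are all assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var)  # floor guards against collapse
    return means, variances, weights

# Toy data: two well-separated clusters around -2 and 3 (hypothetical values).
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
means, variances, weights = em_gmm_1d(
    data, means=[-2.2, 3.2], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(means, variances, weights)
```

The variance floor is one simple answer to a practical issue of the kind the book discusses: without it, a component that locks onto a single point can drive its variance to zero.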

314 citations


Journal Article
TL;DR: The use of hybrid Hidden Markov Model (HMM)/Artificial Neural Network (ANN) models for recognizing unconstrained offline handwritten texts and new techniques to remove slope and slant from handwritten text and to normalize the size of text images with supervised learning methods are presented.
Abstract: This paper proposes the use of hybrid Hidden Markov Model (HMM)/Artificial Neural Network (ANN) models for recognizing unconstrained offline handwritten texts. The structural part of the optical models has been modeled with Markov chains, and a Multilayer Perceptron is used to estimate the emission probabilities. This paper also presents new techniques to remove slope and slant from handwritten text and to normalize the size of text images with supervised learning methods. Slope correction and size normalization are achieved by classifying local extrema of text contours with Multilayer Perceptrons. Slant is also removed in a nonuniform way by using Artificial Neural Networks. Experiments have been conducted on offline handwritten text lines from the IAM database, and the recognition rates achieved, in comparison to the ones reported in the literature, are among the best for the same task.

304 citations


Journal Article
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

Journal Article
TL;DR: In this article, a Bayesian nonparametric approach to speaker diarization is proposed, which builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al.
Abstract: We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566–1581]. Although the basic HDP-HMM tends to over-segment the audio data—creating redundant states and rapidly switching among them—we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.

Journal Article
TL;DR: The hidden Markov model (HMM) is used to separate target vehicles from the background and track them probabilistically; the approach is robust and effective in dealing with changes in environment and illumination, and makes real-time processing possible for vehicle-borne cameras.
Abstract: This paper aims at real-time in-car video analysis to detect and track vehicles ahead for safety, autodriving, and target tracing. This paper describes a comprehensive approach to localizing target vehicles in video under various environmental conditions. The extracted geometry features from the video are continuously projected onto a 1-D profile and are constantly tracked. We rely on temporal information of features and their motion behaviors for vehicle identification, which compensates for the complexity in recognizing vehicle shapes, colors, and types. We probabilistically model the motion in the field of view according to the scene characteristic and the vehicle motion model. The hidden Markov model (HMM) is used to separate target vehicles from the background and track them probabilistically. We have investigated videos of day and night on different types of roads, showing that our approach is robust and effective in dealing with changes in environment and illumination and that real-time processing becomes possible for vehicle-borne cameras.

Journal Article
TL;DR: In this article, a Bayesian nonparametric approach uses a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes, and additionally employs automatic relevance determination to infer a sparse set of dynamic dependencies, allowing SLDS with varying state dimension or switching VAR processes with varying autoregressive order to be learned.
Abstract: Many complex dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We additionally employ automatic relevance determination to infer a sparse set of dynamic dependencies allowing us to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, the IBOVESPA stock index and a maneuvering target tracking application.

Journal Article
TL;DR: A new prognostic method is developed using adaptive neuro-fuzzy inference systems (ANFISs) and high-order particle filtering that outperforms classical condition predictors.
Abstract: Machine prognosis is a significant part of condition-based maintenance and intends to monitor and track the time evolution of a fault so that maintenance can be performed or the task can be terminated to avoid a catastrophic failure. A new prognostic method is developed in this paper using adaptive neuro-fuzzy inference systems (ANFISs) and high-order particle filtering. The ANFIS is trained via machine historical failure data. The trained ANFIS and its modeling noise constitute an mth-order hidden Markov model to describe the fault propagation process. The high-order particle filter uses this Markov model to predict the time evolution of the fault indicator in the form of a probability density function. An online update scheme is developed to adapt the Markov model to various machine dynamics quickly. The performance of the proposed method is evaluated by using the testing data from a cracked carrier plate and a faulty bearing. Results show that it outperforms classical condition predictors.

Book Chapter
23 May 2011
TL;DR: An activity recognition system on a smartphone is proposed in which the uncertain time-series acceleration signal is analyzed using hierarchical hidden Markov models, designed to address the limited memory storage and computational power of mobile devices.
Abstract: As the number of smartphone users has increased, studies using the mobile sensors on smartphones have been conducted in recent years. Activity recognition is one of the active research topics, and it can be used to provide users with adaptive services on mobile devices. In this paper, an activity recognition system on a smartphone is proposed in which the uncertain time-series acceleration signal is analyzed using hierarchical hidden Markov models. To address the limitations on the memory storage and computational power of mobile devices, the recognition models are designed hierarchically, as actions and activities. We implemented a real-time activity recognition application on a smartphone with the Google Android platform and conducted experiments. Experimental results showed the feasibility of the proposed method.

Journal Article
TL;DR: In this article, the problem of learning a latent tree graphical model where samples are available only from a subset of variables has been studied and two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes, have been proposed.
Abstract: We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set.

Proceedings Article
22 May 2011
TL;DR: A novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable is presented and initial results demonstrate phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.
Abstract: State of the art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high dimensional speech sound waves into low dimensional encodings. While these have been successfully applied in speech recognition systems, such low dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher dimensional encodings could both improve performance in recognition tasks, and also be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable and we report initial results demonstrating phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.

Journal Article
TL;DR: A hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural sounding synthetic speech; the quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.
Abstract: This paper describes a hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the framework of HMM and generated in the synthesis stage according to the text input. The glottal excitation is synthesized through interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified according to the spectrum of the desired voice source characteristics. Speech is synthesized by filtering the reconstructed source signal with the vocal tract filter. Experiments show that the proposed system is capable of generating natural sounding speech, and the quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.

Proceedings Article
22 May 2011
TL;DR: This work proposes a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task.
Abstract: The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task. Our system achieves absolute sentence accuracy improvements of 5.8% and 9.2% over GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively, which translate to relative error reductions of 16.0% and 23.2%.
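
The absolute and relative figures quoted above are linked by simple arithmetic: a relative error reduction equals the absolute accuracy gain divided by the baseline error rate. The sketch below inverts that relation to recover the baseline accuracies implied by the rounded numbers in the abstract. The function name is hypothetical and the outputs are back-of-the-envelope values, not figures reported in the abstract.

```python
def implied_baseline_accuracy(absolute_gain_pct, relative_error_reduction_pct):
    """Baseline accuracy (%) implied by an absolute gain and a relative error reduction.

    relative reduction = absolute gain / baseline error, so
    baseline error = absolute gain / relative reduction.
    """
    baseline_error_pct = absolute_gain_pct / (relative_error_reduction_pct / 100.0)
    return 100.0 - baseline_error_pct

# Inputs are the rounded percentages quoted in the abstract.
mpe_baseline = implied_baseline_accuracy(5.8, 16.0)  # GMM-HMM trained with MPE
ml_baseline = implied_baseline_accuracy(9.2, 23.2)   # GMM-HMM trained with ML

# Both baselines should lead to roughly the same accuracy for the new system.
mpe_new = mpe_baseline + 5.8
ml_new = ml_baseline + 9.2
print(round(mpe_baseline, 2), round(ml_baseline, 2), round(mpe_new, 2), round(ml_new, 2))
```

That the two routes land on nearly the same sentence accuracy for the new system is a quick consistency check on the four quoted percentages.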

Journal Article
01 May 2011
TL;DR: This paper addresses natural human-robot interaction (HRI) in a smart assisted living (SAIL) system for the elderly and the disabled with a neural network for gesture spotting and a hierarchical hidden Markov model for context-based recognition.
Abstract: In this paper, we address natural human-robot interaction (HRI) in a smart assisted living (SAIL) system for the elderly and the disabled. Two common HRI problems are studied: hand gesture recognition and daily activity recognition. For hand gesture recognition, we implemented a neural network for gesture spotting and a hierarchical hidden Markov model for context-based recognition. For daily activity recognition, a multisensor fusion scheme is developed to process motion data collected from the foot and the waist of a human subject. Experiments using a prototype wearable sensor system show the effectiveness and accuracy of our algorithms.

Proceedings Article
01 Nov 2011
TL;DR: This pedestrian-movement prediction based on MMM using tracking data will make it possible to provide so-called "adaptive mobile services" with proactive functions, and is substantially more accurate than other methods based on a Markov-chain model.
Abstract: A method for predicting pedestrian movement on the basis of a mixed Markov-chain model (MMM) is proposed. MMM takes into account a pedestrian's personality as an unobservable parameter. It also takes into account the effects of the pedestrian's previous status. A promotional experiment in a major shopping mall demonstrated that the highest prediction accuracy of the MMM method is 74.4%. In comparison with methods based on a Markov-chain model (MM) and a hidden-Markov model (HMM) (i.e., prediction rates of about 45% and 2%, respectively), the proposed MMM-based prediction method is substantially more accurate. This pedestrian-movement prediction based on MMM using tracking data will make it possible to provide so-called "adaptive mobile services" with proactive functions.
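
One common way to treat an unobservable per-person parameter in a Markov-chain setting is a finite mixture of Markov chains: each latent class has its own transition matrix, and the observed trajectory is used to infer which class a person probably belongs to. The sketch below follows that reading. It is an assumption about the general technique, not the paper's actual model, and the two classes, their matrices and the function names are hypothetical.

```python
def class_posterior(trajectory, class_weights, transitions):
    """Posterior over latent classes given an observed state trajectory."""
    scores = []
    for weight, matrix in zip(class_weights, transitions):
        likelihood = weight
        # Multiply the transition probabilities along the observed path.
        for prev, nxt in zip(trajectory, trajectory[1:]):
            likelihood *= matrix[prev][nxt]
        scores.append(likelihood)
    total = sum(scores)
    return [s / total for s in scores]

def predict_next(trajectory, class_weights, transitions):
    """Next-state distribution: class-specific predictions weighted by the posterior."""
    posterior = class_posterior(trajectory, class_weights, transitions)
    current = trajectory[-1]
    n_states = len(transitions[0])
    return [sum(p * m[current][j] for p, m in zip(posterior, transitions))
            for j in range(n_states)]

# Two hypothetical pedestrian types moving between two zones (0 and 1).
stayer = [[0.9, 0.1], [0.1, 0.9]]   # tends to remain in the current zone
mover = [[0.1, 0.9], [0.9, 0.1]]    # tends to switch zones
weights = [0.5, 0.5]

path = [0, 1, 0, 1]                 # observed zone sequence
posterior = class_posterior(path, weights, [stayer, mover])
prediction = predict_next(path, weights, [stayer, mover])
print(posterior, prediction)
```

A single pooled Markov chain fitted to both types would predict the next zone from the current zone alone, whereas the mixture uses the whole path to decide which type of pedestrian it is looking at.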

Journal Article
TL;DR: Generative embedding achieves a near-perfect balanced classification accuracy of 98% and significantly outperforms conventional activation-based and correlation-based methods; the authors envisage that future applications of generative embedding may provide crucial advances in dissecting spectrum disorders into physiologically more well-defined subgroups.
Abstract: Decoding models, such as those underlying multivariate classification algorithms, have been increasingly used to infer cognitive or clinical brain states from measures of brain activity obtained by functional magnetic resonance imaging (fMRI). The practicality of current classifiers, however, is restricted by two major challenges. First, due to the high data dimensionality and low sample size, algorithms struggle to separate informative from uninformative features, resulting in poor generalization performance. Second, popular discriminative methods such as support vector machines (SVMs) rarely afford mechanistic interpretability. In this paper, we address these issues by proposing a novel generative-embedding approach that incorporates neurobiologically interpretable generative models into discriminative classifiers. Our approach extends previous work on trial-by-trial classification for electrophysiological recordings to subject-by-subject classification for fMRI and offers two key advantages over conventional methods: it may provide more accurate predictions by exploiting discriminative information encoded in 'hidden' physiological quantities such as synaptic connection strengths; and it affords mechanistic interpretability of clinical classifications. Here, we introduce generative embedding for fMRI using a combination of dynamic causal models (DCMs) and SVMs. We propose a general procedure of DCM-based generative embedding for subject-wise classification, provide a concrete implementation, and suggest good-practice guidelines for unbiased application of generative embedding in the context of fMRI. We illustrate the utility of our approach by a clinical example in which we classify moderately aphasic patients and healthy controls using a DCM of thalamo-temporal regions during speech processing. 
Generative embedding achieves a near-perfect balanced classification accuracy of 98% and significantly outperforms conventional activation-based and correlation-based methods. This example demonstrates how disease states can be detected with very high accuracy and, at the same time, be interpreted mechanistically in terms of abnormalities in connectivity. We envisage that future applications of generative embedding may provide crucial advances in dissecting spectrum disorders into physiologically more well-defined subgroups.

Journal Article
TL;DR: In this article, the authors proposed an online parameter estimation algorithm that combines two key ideas: reparameterizing the problem using complete-data sufficient statistics and exploiting a purely recursive form of smoothing in HMMs based on an auxiliary recursion.
Abstract: Online (also called “recursive” or “adaptive”) estimation of fixed model parameters in hidden Markov models is a topic of much interest in time series modeling. In this work, we propose an online parameter estimation algorithm that combines two key ideas. The first one, which is deeply rooted in the Expectation-Maximization (EM) methodology, consists in reparameterizing the problem using complete-data sufficient statistics. The second ingredient consists in exploiting a purely recursive form of smoothing in HMMs based on an auxiliary recursion. Although the proposed online EM algorithm resembles a classical stochastic approximation (or Robbins–Monro) algorithm, it is sufficiently different to resist conventional analysis of convergence. We thus provide limited results which identify the potential limiting points of the recursion as well as the large-sample behavior of the quantities involved in the algorithm. The performance of the proposed algorithm is numerically evaluated through simulations in the ca...

Journal ArticleDOI
TL;DR: General convergence results, including exponential deviation inequalities and central limit theorems, are established and time uniform bounds on the marginal smoothing error are obtained under appropriate mixing conditions on the transition kernel of the latent chain.
Abstract: Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models. The aim of this paper is to provide a foundation of particle-based approximation of such distributions and to analyze, in a common unifying framework, different schemes producing such approximations. In this setting, general convergence results, including exponential deviation inequalities and central limit theorems, are established. In particular, time uniform bounds on the marginal smoothing error are obtained under appropriate mixing conditions on the transition kernel of the latent chain. In addition, we propose an algorithm approximating the joint smoothing distribution at a cost that grows only linearly with the number of particles.
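As an illustration of what a particle-based approximation of marginal smoothing distributions looks like, here is a minimal sketch of a bootstrap particle filter followed by the standard forward-filtering backward-smoothing reweighting. It is not the linear-cost algorithm proposed in the paper: this textbook variant costs on the order of N² per time step. The scalar linear-Gaussian model and its parameters `a`, `q`, `r` are assumptions chosen only to keep the example small.

```python
import math
import random


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def particle_filter(ys, n, a, q, r, rng):
    """Bootstrap filter for x_t = a*x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    particles, weights = [], []
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed prior x_1 ~ N(0, 1)
    for t, y in enumerate(ys):
        if t > 0:
            # Resample by the previous filter weights, then propagate.
            ancestors = rng.choices(particles[-1], weights=weights[-1], k=n)
            xs = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in ancestors]
        w = [gauss_pdf(y, x, r) for x in xs]
        total = sum(w)
        weights.append([wi / total for wi in w])
        particles.append(xs)
    return particles, weights


def backward_smooth(particles, weights, a, q):
    """Reweight the filter particles so they target p(x_t | y_1..T)."""
    T, n = len(particles), len(particles[0])
    smooth = [None] * T
    smooth[-1] = weights[-1]
    for t in range(T - 2, -1, -1):
        # trans[i][j] = transition density from particle i at t to particle j at t+1
        trans = [[gauss_pdf(xj, a * xi, q) for xj in particles[t + 1]]
                 for xi in particles[t]]
        denom = [sum(weights[t][l] * trans[l][j] for l in range(n))
                 for j in range(n)]
        smooth[t] = [
            weights[t][i]
            * sum(smooth[t + 1][j] * trans[i][j] / denom[j] for j in range(n))
            for i in range(n)
        ]
    return smooth


# Simulate a short observation record from the assumed model.
rng = random.Random(0)
a, q, r = 0.9, 0.5, 0.5
x, ys = rng.gauss(0.0, 1.0), []
for _ in range(15):
    ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    x = a * x + rng.gauss(0.0, math.sqrt(q))

particles, weights = particle_filter(ys, 100, a, q, r, rng)
smooth = backward_smooth(particles, weights, a, q)
smoothed_means = [sum(w * p for w, p in zip(ws, ps))
                  for ws, ps in zip(smooth, particles)]
print(smoothed_means[:3])
```

The double loop over particle pairs in `backward_smooth` is the source of the quadratic cost that a linear-cost scheme avoids.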

Proceedings ArticleDOI
09 May 2011
TL;DR: The proposed hidden Markov model accurately estimates the identity and configuration of clothing articles, enabling the procedure to autonomously bring a variety of articles into desired configurations that are useful for other tasks, such as folding.
Abstract: We consider the problem of autonomously bringing an article of clothing into a desired configuration using a general-purpose two-armed robot. We propose a hidden Markov model (HMM) for estimating the identity of the article and tracking the article's configuration throughout a specific sequence of manipulations and observations. At the end of this sequence, the article's configuration is known, though not necessarily desired. The estimated identity and configuration of the article are then used to plan a second sequence of manipulations that brings the article into the desired configuration. We propose a relaxation of a strain-limiting finite element model for cloth simulation that can be solved via convex optimization; this serves as the basis of the transition and observation models of the HMM. The observation model uses simple perceptual cues consisting of the height of the article when held by a single gripper and the silhouette of the article when held by two grippers. The model accurately estimates the identity and configuration of clothing articles, enabling our procedure to autonomously bring a variety of articles into desired configurations that are useful for other tasks, such as folding.
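The estimation step described above can be illustrated with a generic discrete HMM forward filter, which updates a belief over hidden states after each observation. This is a minimal sketch under strong simplifying assumptions: two hypothetical article identities, a single binary height cue, and made-up probabilities. The paper's actual transition and observation models come from cloth simulation and are not reproduced here.

```python
def forward_filter(prior, transition, emission, observations):
    """Return filtered beliefs P(state_t | obs_1..t) for a discrete HMM.

    prior[i]         : P(state_1 = i)
    transition[i][j] : P(state_{t+1} = j | state_t = i)
    emission[i][o]   : P(obs = o | state = i)
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the belief through the transition model.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight by the observation likelihood and renormalize.
        belief = [belief[i] * emission[i][obs] for i in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        history.append(belief)
    return history


# Hypothetical setup: states 0 = "shirt", 1 = "pants"; identity never changes.
prior = [0.5, 0.5]
transition = [[1.0, 0.0], [0.0, 1.0]]
# Observations: 0 = "hangs tall", 1 = "hangs short" (made-up likelihoods).
emission = [[0.2, 0.8], [0.9, 0.1]]

history = forward_filter(prior, transition, emission, [0, 0])
for step, belief in enumerate(history, start=1):
    print(step, belief)
```

Each additional "tall" observation shifts more probability mass toward the second state, which is the sense in which repeated observations sharpen the identity estimate.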

Journal ArticleDOI
TL;DR: A novel combination of vision-based features, including kurtosis position and principal component analysis (PCA), is presented to enhance the recognition of underlying signs.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work proposes a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using Latent-Dynamic Conditional Random Fields, and shows that the proposed approach outperforms CRFs for recognizing facial expressions.
Abstract: Conditional Random Fields (CRFs) can be used as a discriminative approach for simultaneous sequence segmentation and frame labeling. Latent-Dynamic Conditional Random Fields (LDCRFs) incorporate hidden state variables within CRFs which model sub-structure motion patterns and dynamics between labels. Motivated by the success of LDCRFs in gesture recognition, we propose a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using LDCRFs. We show that the proposed approach outperforms CRFs for recognizing facial expressions. Using Principal Component Analysis (PCA), we study the separability of various expression classes in lower-dimensional projected spaces. By comparing the performance of CRFs and LDCRFs against that of Support Vector Machines (SVMs), we demonstrate that temporal variations within shapes are crucial in classifying expressions, especially those with a small range of facial motion such as anger and sadness. We also show empirically that using only changes in facial appearance over time, without shape variations, is not sufficient to obtain high performance for facial expression recognition.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work presents a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities, relying on an efficient bottom-up grounding scheme to avoid combinatorial explosion.
Abstract: We present a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities. Given semantic spatio-temporal descriptions of what generally happens (i.e., rules, event descriptions, physical constraints), and based on video analysis, we determine the events that occurred. Knowledge about spatio-temporal structure is encoded in first-order logic, using an approach based on Allen's Interval Logic, and robustness to low-level observation uncertainty is provided by Markov Logic Networks (MLN). Our main contribution is that we integrate interval-based temporal reasoning with probabilistic logical inference, relying on an efficient bottom-up grounding scheme to avoid combinatorial explosion. Applied to one-on-one basketball, our framework detects and tracks players, their hands and feet, and the ball, generates event observations from the resulting trajectories, and performs probabilistic logical inference to determine the most consistent sequence of events. We demonstrate our approach on 1 hr (100,000 frames) of outdoor videos.

Book
25 May 2011
TL;DR: Covers nearest-neighbour, Bayes, hidden Markov model, decision tree, and support vector machine classifiers, along with classifier combination and clustering, and closes with an application to handwritten digit recognition.
Abstract: Introduction; Representation; Nearest Neighbour Based Classifiers; Bayes Classifier; Hidden Markov Models; Decision Trees; Support Vector Machines; Combination of Classifiers; Clustering; Summary; An Application: Handwritten Digit Recognition