Showing papers on "Hidden Markov model" published in 2010

Book•

Olivier Cappé, Eric Moulines, Tobias Rydén

01 Dec 2010

TL;DR: This book is a comprehensive treatment of inference for hidden Markov models, including both algorithms and statistical theory, and builds on recent developments to present a self-contained view.

Abstract: This book is a comprehensive treatment of inference for hidden Markov models, including both algorithms and statistical theory. Topics range from filtering and smoothing of the hidden Markov chain to parameter estimation, Bayesian methods and estimation of the number of states. In a unified way the book covers both models with finite state spaces and models with continuous state spaces (also called state-space models) requiring approximate simulation-based algorithms that are also described in detail. Many examples illustrate the algorithms and theory. This book builds on recent developments to present a self-contained view.

1,537 citations
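
The filtering problem named in the abstract above can be illustrated for the finite-state case. The following is a minimal sketch, not taken from the book: it assumes a discrete HMM given as plain Python lists, and the names `forward_filter`, `init`, `trans` and `emit` are hypothetical. It computes the filtered distributions p(x_t | y_1..y_t) with per-step normalization, which also yields the log-likelihood.

```python
import math


def forward_filter(init, trans, emit, observations):
    """Normalized forward recursion for a finite-state HMM (illustrative).

    init[i]     : P(x_1 = i)
    trans[i][j] : P(x_{t+1} = j | x_t = i)
    emit[j][y]  : P(y_t = y | x_t = j)
    Returns (filtered, log_lik), where filtered[t][j] = P(x_t = j | y_1..y_t).
    """
    n = len(init)
    filtered = []
    log_lik = 0.0
    prev = None
    for t, y in enumerate(observations):
        if t == 0:
            predicted = list(init)
        else:
            # One-step prediction: push the previous filter through the chain.
            predicted = [sum(prev[i] * trans[i][j] for i in range(n))
                         for j in range(n)]
        # Correction: weight each state by the likelihood of the observation.
        unnorm = [predicted[j] * emit[j][y] for j in range(n)]
        norm = sum(unnorm)
        if norm == 0.0:
            raise ValueError("observation has zero probability under the model")
        log_lik += math.log(norm)
        prev = [u / norm for u in unnorm]
        filtered.append(prev)
    return filtered, log_lik
```

Normalizing at every step keeps the recursion numerically stable on long sequences; the sum of the log normalizers is the log-likelihood of the observations.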

Journal Article•DOI•

Hidden Markov model speed heuristic and iterative HMM search procedure

L. Steven Johnson¹, Sean R. Eddy², Elon Portugaly³•Institutions (3)

Washington University in St. Louis¹, Howard Hughes Medical Institute², Hebrew University of Jerusalem³

18 Aug 2010-BMC Bioinformatics

TL;DR: HMMERHEAD, a series of database filtering steps applied before the scoring algorithms implemented in the HMMER package, significantly reduces the time needed to score a profile-HMM against large sequence databases.

Abstract: Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases. We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K. Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.

890 citations
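
For orientation, the Viterbi scoring mentioned in the abstract above can be sketched for a generic discrete HMM. This is an illustrative log-space implementation only: it is not a profile-HMM, it is not HMMER code, and it does not implement the HMMERHEAD filters. The names `viterbi`, `init`, `trans` and `emit` are hypothetical.

```python
import math


def viterbi(init, trans, emit, observations):
    """Most probable state path of a discrete HMM, computed in log space.

    init[i], trans[i][j] and emit[j][y] are probabilities (illustrative layout).
    Returns (path, log_prob) for the best joint state/observation sequence.
    """
    if not observations:
        return [], 0.0

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(init)
    score = [log(init[s]) + log(emit[s][observations[0]]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for y in observations[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(emit[j][y]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

The dynamic program costs O(T·N²) per sequence for T observations and N states, which is why running it unfiltered against a large database is expensive.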

Posted Content•

Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques

Lindasalwa Muda, Mumtaj Begam, Irraivan Elamvazuthi

22 Mar 2010-arXiv: Multimedia

TL;DR: This paper presents the viability of MFCC to extract features and DTW to compare the test patterns, and explains why the alignment is important for better performance.

Abstract: Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the amount of information contained in the signal. Therefore, digital signal processes such as feature extraction and feature matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Model (HMM) and Artificial Neural Network (ANN), are evaluated with a view to identifying a straightforward and effective method for the voice signal. The extraction and matching process is implemented right after pre-processing or filtering of the signal is performed. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are utilized as the extraction technique. The nonlinear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature matching technique. Since the voice signal tends to have different temporal rates, the alignment is important for better performance. This paper presents the viability of MFCC to extract features and DTW to compare the test patterns.

846 citations
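
The DTW feature-matching step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each utterance is already a list of equal-length feature vectors (for example MFCC frames), uses Euclidean frame distance, and applies no band constraint on the warping path. The name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    a, b: lists of equal-length lists of floats (e.g. one vector per frame).
    Returns the minimum accumulated Euclidean distance over all monotonic
    alignments that match the first frames and the last frames.
    """
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In a template-matching recognizer, a test utterance would be compared against each stored reference and assigned the label of the reference with the lowest cost; because frames may repeat, utterances spoken at different rates can still align closely.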

Journal Article•DOI•

Hidden semi-Markov models

Shun-Zheng Yu¹•Institutions (1)

Sun Yat-sen University¹

01 Feb 2010-Artificial Intelligence

TL;DR: An overview of hidden semi-Markov models (HSMMs) is presented, covering modelling, inference, estimation, implementation and applications. HSMMs have been applied in thirty scientific and engineering areas, including speech recognition/synthesis, human activity recognition/prediction, handwriting recognition, functional MRI brain mapping, and network anomaly detection.

734 citations

Proceedings Article•DOI•

Torchvision the machine-vision package of torch

Sébastien Marcel¹, Yann Rodriguez¹•Institutions (1)

Idiap Research Institute¹

25 Oct 2010

TL;DR: This paper presents Torchvision, an open source machine vision package for Torch that provides additional functionalities to manipulate and process images with standard image processing algorithms.

Abstract: This paper presents Torchvision, an open source machine vision package for Torch. Torch is a machine learning library providing a series of state-of-the-art algorithms such as Neural Networks, Support Vector Machines, Gaussian Mixture Models, Hidden Markov Models and many others. Torchvision provides additional functionalities to manipulate and process images with standard image processing algorithms. Hence, the resulting images can be used directly with the Torch machine learning algorithms, as Torchvision is fully integrated with Torch. Both Torch and Torchvision are written in C++ and are publicly available under the Free-BSD License.

341 citations

Journal Article•DOI•

depmixS4: An R Package for Hidden Markov Models

Ingmar Visser, Maarten Speekenbrink

05 Aug 2010-Journal of Statistical Software

TL;DR: depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language, including standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models.

Abstract: depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data with distributions from the glm family, the (logistic) multinomial, or the multivariate normal distribution. Other distributions can be added easily, and an example is provided with the exgaus distribution. Parameters are estimated by the expectation-maximization (EM) algorithm or, when (linear) constraints are imposed on the parameters, by direct numerical optimization with the Rsolnp or Rdonlp2 routines.

312 citations

Journal Article•DOI•

A driver fatigue recognition model based on information fusion and dynamic Bayesian network

Guosheng Yang¹, Yingzi Lin², Prabir Bhattacharya³•Institutions (3)

Minzu University of China¹, Northeastern University², University of Cincinnati³

01 May 2010-Information Sciences

TL;DR: The experimental validation shows the effectiveness of the proposed driver fatigue recognition model and indicates that the contact physiological features are significant factors for inferring the fatigue state of a driver.

288 citations

Proceedings Article•DOI•

What's going on? Discovering spatio-temporal dependencies in dynamic scenes

Daniel Kuettel¹, Michael D. Breitenstein¹, Luc Van Gool¹, Vittorio Ferrari¹•Institutions (1)

ETH Zurich¹

13 Jun 2010

TL;DR: Two novel methods are presented to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes; the second, DDP-HMM, employs Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models.

Abstract: We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They make it possible to discover temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second method, called DDP-HMM, jointly learns co-occurring activities and their time dependencies. To this end we employ Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models. In contrast to previous work, we build on state-of-the-art topic models that allow all parameters to be inferred automatically, such as the optimal number of HMMs necessary to explain the rules governing a scene. The models are trained offline by Gibbs Sampling using unlabeled training data.

267 citations

Proceedings Article•DOI•

On community outliers and their efficient detection in information networks

Jing Gao¹, Feng Liang¹, Wei Fan², Chi Wang¹, Yizhou Sun¹, Jiawei Han¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

25 Jul 2010

TL;DR: This paper proposes an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers, and applies the model to both synthetic data and DBLP data sets to demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.

Abstract: Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars for a more positive sense), and then shows that well-known baseline approaches without considering links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model to both synthetic data and DBLP data sets, and the results demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.

260 citations

Journal Article•DOI•

A Dynamic Texture-Based Approach to Recognition of Facial Actions and Their Temporal Models

Sander Koelstra¹, Maja Pantic², Ioannis Patras¹•Institutions (2)

Queen Mary University of London¹, Imperial College London²

01 Nov 2010-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos is proposed.

Abstract: In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.

254 citations

Journal Article•DOI•

MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.

Yongchao Liu¹, Bertil Schmidt¹, Douglas L. Maskell¹•Institutions (1)

Nanyang Technological University¹

15 Aug 2010-Bioinformatics

TL;DR: The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities, and is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners.

Abstract: Motivation: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. Results: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile–profile alignment, are incorporated to improve alignment accuracy. Assessed using the popular benchmarks: BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners. Availability: The source code of MSAProbs, written in C++, is freely and publicly available from http://msaprobs.sourceforge.net.

Proceedings Article•

Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition

Dong Yu¹, Li Deng¹, George E. Dahl²•Institutions (2)

Microsoft¹, University of Toronto²

01 Dec 2010

TL;DR: It is shown that pre-training can initialize weights to a point in the space where fine-tuning can be effective, and thus is crucial both in training deep structured models and for the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.

Abstract: Recently, deep learning techniques have been successfully applied to automatic speech recognition tasks, first to phonetic recognition with context-independent deep belief network (DBN) hidden Markov models (HMMs) and later to large vocabulary continuous speech recognition using context-dependent (CD) DBN-HMMs. In this paper, we report our most recent experiments designed to understand the roles of the two main phases of DBN learning, pre-training and fine-tuning, in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer. As expected, we show that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models. However, a moderate increase in the amount of unlabeled pre-training data has an insignificant effect on the final recognition results as long as the original training size is sufficiently large to initialize the DBN weights. On the other hand, with additional labeled training data, the fine-tuning phase of DBN training can significantly improve the recognition accuracy.

Proceedings Article•

Hilbert Space Embeddings of Hidden Markov Models

Le Song¹, Byron Boots¹, Sajid M. Siddiqi², Geoffrey J. Gordon¹, Alexander J. Smola³•Institutions (3)

Carnegie Mellon University¹, Google², Yahoo!³

21 Jun 2010

TL;DR: This work proposes a nonparametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions, and derives a local-minimum-free kernel spectral algorithm for learning these HMMs.

Abstract: Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. And, learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a nonparametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a local-minimum-free kernel spectral algorithm for learning these HMMs. We apply our method to robot vision data, slot car inertial sensor data and audio event classification data, and show that in these applications, embedded HMMs exceed the previous state-of-the-art performance.

Journal Article•DOI•

Compressive imaging using approximate message passing and a Markov-tree prior

Subhojit Som¹, Lee C. Potter¹, Philip Schniter¹•Institutions (1)

Ohio State University¹

01 Nov 2010

TL;DR: A novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images, based on loopy belief propagation (LBP).

Abstract: We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. Like other recent works, we model wavelet structure using a hidden Markov tree (HMT) but, unlike other works, ours is based on loopy belief propagation (LBP). For LBP, we adopt a recently proposed “turbo” message passing schedule that alternates between exploitation of HMT structure and exploitation of compressive-measurement structure. For the latter, we leverage Donoho, Maleki, and Montanari's recently proposed approximate message passing (AMP) algorithm. Experiments with a large image database suggest that, relative to existing schemes, our turbo LBP approach yields state-of-the-art reconstruction performance with substantial reduction in complexity.

Journal Article•DOI•

An activity monitoring system for elderly care using generative and discriminative models

T. L. Kasteren¹, Gwenn Englebienne¹, Ben Kröse¹•Institutions (1)

University of Amsterdam¹

01 Sep 2010

TL;DR: A wireless sensor network for unintrusive observations in the home is presented, and the potential of generative and discriminative models for recognizing activities from such observations is shown.

Abstract: An activity monitoring system allows many applications to assist in care giving for elderly in their homes. In this paper we present a wireless sensor network for unintrusive observations in the home and show the potential of generative and discriminative models for recognizing activities from such observations. Through a large number of experiments using four real world datasets we show the effectiveness of the generative hidden Markov model and the discriminative conditional random fields in activity recognition.

Proceedings Article•

Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.

Bo Li¹, Khe Chai Sim²•Institutions (2)

National University of Singapore¹, Institute for Infocomm Research Singapore²

01 Jan 2010

TL;DR: The discriminative input and output transforms for speaker adaptation in the hybrid NN/HMM systems are compared and further investigated with both structural and data-driven constraints.

Abstract: Speaker variability is one of the major error sources for ASR systems. Speaker adaptation estimates speaker-specific models from the speaker-independent ones to minimize the mismatch between the training and testing conditions arising from speaker variability. One of the commonly adopted approaches is the transformation-based method. In this paper, the discriminative input and output transforms for speaker adaptation in the hybrid NN/HMM systems are compared and further investigated with both structural and data-driven constraints. Experimental results show that the data-driven constrained discriminative transforms are much more robust for unsupervised adaptation.

Proceedings Article•DOI•

Subspace Gaussian Mixture Models for speech recognition

Daniel Povey¹, Lukas Burget², Mohit Agarwal³, Pinar Akyazi, Kai Feng⁴, Arnab Ghoshal⁵, Ondrej Glembek², Nagendra Kumar Goel, Martin Karafiat², Ariya Rastrow⁶, Richard Rose⁷, Petr Schwarz², Samuel Thomas⁶•Institutions (7)

Microsoft¹, Brno University of Technology², Indian Institute of Information Technology, Allahabad³, Hong Kong University of Science and Technology⁴, Saarland University⁵, Johns Hopkins University⁶, McGill University⁷

14 Mar 2010

TL;DR: An acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure and the means and mixture weights vary in a subspace of the total parameter space; this style of acoustic model allows for a much more compact representation.

Abstract: We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.

Proceedings Article•DOI•

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models

Lukas Burget¹, Petr Schwarz¹, Mohit Agarwal², Pinar Akyazi, Kai Feng³, Arnab Ghoshal⁴, Ondrej Glembek¹, Nagendra Kumar Goel, Martin Karafiat¹, Daniel Povey⁵, Ariya Rastrow⁶, Richard Rose⁷, Samuel Thomas⁶•Institutions (7)

Brno University of Technology¹, Indian Institute of Information Technology, Allahabad², Hong Kong University of Science and Technology³, Saarland University⁴, Microsoft⁵, Johns Hopkins University⁶, McGill University⁷

14 Mar 2010

TL;DR: This work reports experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages.

Abstract: Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Journal Article•DOI•

Real-time driving danger-level prediction

Jinjun Wang, Wei Xu, Yihong Gong

01 Dec 2010-Engineering Applications of Artificial Intelligence

TL;DR: Experimental results showed that a reinforcement-learning-based method using the vehicle dynamic parameters feature outperforms the other algorithms, and that adding the other two features could further improve the prediction accuracy.

Proceedings Article•

Context-Sensitive Multimodal Emotion Recognition from Speech and Facial Expression using Bidirectional LSTM Modeling

Martin Wöllmer¹, Angeliki Metallinou¹, Florian Eyben¹, Björn Schuller², Shrikanth S. Narayanan²•Institutions (2)

Technische Universität München¹, University of Southern California²

01 Jan 2010

TL;DR: A context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues is applied, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database.

Abstract: In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally prevails over standard classification techniques such as Hidden Markov Models or Support Vector Machines, and achieves F1-measures of the order of 72%, 65%, and 55% for the discrimination of three clusters in emotional space and the distinction between three levels of valence and activation, respectively.

Journal Article•DOI•

Hand gesture recognition based on dynamic Bayesian network framework

Heung-Il Suk¹, Bong-Kee Sin², Seong-Whan Lee¹•Institutions (2)

Korea University¹, Pukyong National University²

01 Sep 2010-Pattern Recognition

TL;DR: The proposed DBN-based hand gesture model and the design of a gesture network model are believed to have strong potential for successful application to other related problems such as sign language recognition, although the latter is somewhat more complicated because it requires analysis of hand shapes.

Journal Article•DOI•

Real-world acoustic event detection

Xiaodan Zhuang¹, Xi Zhou¹, Mark Hasegawa-Johnson¹, Thomas S. Huang¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Sep 2010-Pattern Recognition Letters

TL;DR: This work proposes extracting discriminative features for AED using a boosting approach, which outperform classical speech perceptual features, such as Mel-frequency Cepstral Coefficients and log frequency filterbank parameters, and leverages statistical models better fitting the task.

Journal Article•DOI•

A probabilistic multimodal approach for predicting listener backchannels

Louis-Philippe Morency¹, Iwan Kok², Jonathan Gratch¹•Institutions (2)

University of Southern California¹, University of Twente²

01 Jan 2010-Autonomous Agents and Multi-Agent Systems

TL;DR: This paper shows how sequential probabilistic models can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze).

Abstract: During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.

Proceedings Article•

Phoneme Recognition with Large Hierarchical Reservoirs

Fabian Triefenbach¹, Azarakhsh Jalalvand¹, Benjamin Schrauwen¹, Jean-Pierre Martens¹•Institutions (1)

Ghent University¹

06 Dec 2010

TL;DR: It is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology, and in a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built.

Abstract: Automatic speech recognition has gradually improved over the years, but the reliable recognition of unconstrained speech is still not within reach. In order to achieve a breakthrough, many research groups are now investigating new methodologies that have potential to outperform the Hidden Markov Model technology that is at the core of all present commercial systems. In this paper, it is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology. In a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built. The system already achieves a state-of-the-art performance, and there is evidence that the margin for further improvements is still significant.

Journal Article•DOI•

A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment

Arshia Cont¹•Institutions (1)

IRCAM¹

01 Jun 2010-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a design for a real-time music-to-score alignment system that is capable of following the musician in real time within the score and decoding the tempo (or pace) of its performance.

Abstract: The capacity for real-time synchronization and coordination is a common ability among trained musicians performing a music score that presents an interesting challenge for machine intelligence. Compared to speech recognition, which has influenced many music information retrieval systems, music's temporal dynamics and complexity pose challenging problems to common approximations regarding time modeling of data streams. In this paper, we propose a design for a real-time music-to-score alignment system. Given a live recording of a musician playing a music score, the system is capable of following the musician in real time within the score and decoding the tempo (or pace) of the performance. The proposed design features two coupled audio and tempo agents within a unique probabilistic inference framework that adaptively updates its parameters based on the real-time context. Online decoding is achieved through the collaboration of the coupled agents in a Hidden Hybrid Markov/semi-Markov framework, where prediction feedback of one agent affects the behavior of the other. We perform evaluations for both real-time alignment and the proposed temporal model. An implementation of the presented system has been widely used in real concert situations worldwide, and readers are encouraged to access the actual system and experiment with the results.

Journal Article•DOI•

Health-State Estimation and Prognostics in Machining Processes

Fatih Camci¹, Ratna Babu Chinnam²•Institutions (2)

Fatih University¹, Wayne State University²

19 Jan 2010-IEEE Transactions on Automation Science and Engineering

TL;DR: This paper demonstrates how this difficult problem can be addressed through Hidden Markov models that are able to estimate unobservable health-states using observable sensor signals, and implementation of HMM based models as dynamic Bayesian networks (DBNs) facilitates compact representation as well as additional flexibility with regard to model structure.

Abstract: Failure mechanisms of electromechanical systems usually involve several degraded health-states. Tracking and forecasting the evolution of health-states and impending failures, in the form of remaining-useful-life (RUL), is a critical challenge and regarded as the Achilles' heel of condition-based-maintenance (CBM). This paper demonstrates how this difficult problem can be addressed through Hidden Markov models (HMMs) that are able to estimate unobservable health-states using observable sensor signals. In particular, implementation of HMM-based models as dynamic Bayesian networks (DBNs) facilitates compact representation as well as additional flexibility with regard to model structure. Both regular HMM pools and hierarchical HMMs are employed here to estimate online the health-state of drill-bits as they deteriorate with use on a CNC drilling machine. A hierarchical HMM is composed of sub-HMMs in a pyramid structure, providing functionality beyond an HMM for modeling complex systems. In the case of regular HMMs, each HMM within the pool competes to represent a distinct health-state and adapts through competitive learning. In the case of hierarchical HMMs, health-states are represented as distinct nodes at the top of the hierarchy. Monte Carlo simulation, with state transition probabilities derived from a hierarchical HMM, is employed for RUL estimation. Detailed results on health-state and RUL estimation are very promising and are reported in this paper. Hierarchical HMMs seem to be particularly effective and efficient and outperform other HMM methods from the literature.
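
The Monte Carlo step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3-state transition matrix, and the choice of a single absorbing failure state are all assumptions made for illustration. The idea is to simulate many state trajectories from a transition matrix and average the number of steps until the failure state is reached.

```python
import random


def simulate_steps_to_failure(transition, start_state, failure_state, rng,
                              max_steps=10_000):
    """Simulate one health-state trajectory; return steps until failure.

    `transition[i][j]` is the (assumed known) probability of moving from
    state i to state j in one time step.
    """
    state, steps = start_state, 0
    while state != failure_state and steps < max_steps:
        # Draw the next state according to the current row of the matrix.
        state = rng.choices(range(len(transition)),
                            weights=transition[state])[0]
        steps += 1
    return steps


def estimate_rul(transition, start_state, failure_state,
                 n_runs=5000, seed=0):
    """Monte Carlo estimate of mean remaining useful life, in time steps."""
    rng = random.Random(seed)
    runs = [simulate_steps_to_failure(transition, start_state,
                                      failure_state, rng)
            for _ in range(n_runs)]
    return sum(runs) / len(runs)


# Hypothetical 3-state model: 0 = good, 1 = worn, 2 = failed (absorbing).
example_transition = [
    [0.90, 0.10, 0.00],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]
mean_rul_from_good = estimate_rul(example_transition, 0, 2)
```

For this hypothetical matrix the analytic mean is 1/0.1 + 1/0.2 = 15 steps from the good state, so the simulated estimate should land close to that value. In practice the full distribution of simulated run lengths, not only its mean, would be of interest.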

Proceedings Article•DOI•

Accelerometer-based gesture recognition via dynamic-time warping, affinity propagation, & compressive sensing

Ahmad Akl¹, Shahrokh Valaee¹•Institutions (1)

University of Toronto¹

14 Mar 2010

TL;DR: A gesture recognition system based primarily on a single 3-axis accelerometer that achieves almost perfect user-dependent recognition and a user-independent recognition accuracy that is competitive with statistical methods that require a significantly larger number of training samples and with other accelerometer-based gesture recognition systems available in the literature.

Abstract: We propose a gesture recognition system based primarily on a single 3-axis accelerometer. The system employs dynamic time warping and affinity propagation algorithms for training and utilizes the sparse nature of the gesture sequence by implementing compressive sensing for gesture recognition. A dictionary of 18 gestures is defined and a database of over 3,700 repetitions is created from 7 users. Our dictionary of gestures is the largest in published studies related to acceleration-based gesture recognition, to the best of our knowledge. The proposed system achieves almost perfect user-dependent recognition and a user-independent recognition accuracy that is competitive with statistical methods that require a significantly larger number of training samples and with other accelerometer-based gesture recognition systems available in the literature.
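
As a rough illustration of the dynamic time warping component named in this abstract, the sketch below computes the textbook DTW distance between two sequences. It is a simplification under stated assumptions: the sequences are one-dimensional (a single accelerometer axis), the local cost is the absolute difference, and the function name `dtw_distance` is hypothetical. The paper's own pipeline, including affinity propagation and compressive sensing, is not reproduced here.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences.

    Runs in O(len(a) * len(b)) time using the standard recurrence:
    each cell holds the cheapest cumulative cost of aligning the
    prefixes a[:i] and b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only,
            # or advance in both sequences at once.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# The same shape performed at a different speed aligns at zero cost.
slow = [1, 2, 2, 3]
fast = [1, 2, 3]
same_gesture_cost = dtw_distance(fast, slow)
```

Because the warping path may repeat samples, two repetitions of a gesture performed at different speeds can still be matched closely, which is why DTW is a common choice for comparing time series of unequal length.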

Journal Article•DOI•

EEG-based estimation of mental fatigue by using KPCA–HMM and complexity parameters

Jian-Ping Liu, Chong Zhang¹, Chong-Xun Zheng¹•Institutions (1)

Xi'an Jiaotong University¹

01 Apr 2010-Biomedical Signal Processing and Control

TL;DR: The investigation suggests that ApEn and Kc can effectively describe the dynamic complexity of EEG, which is strongly correlated with mental fatigue.

Journal Article•DOI•

Dirichlet process Gaussian mixture models: choice of the base distribution

Dilan Gorur¹, Carl Edward Rasmussen²•Institutions (2)

University College London¹, University of Cambridge²

01 Jul 2010-Journal of Computer Science and Technology

TL;DR: This paper compares the choice of conjugate and non-conjugate base distributions on a particular class of DPM models, the Dirichlet process Gaussian mixture model (DPGMM), and shows that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.

Abstract: In the Bayesian mixture modeling framework it is possible to infer the necessary number of components to model the data and therefore it is unnecessary to explicitly restrict the number of components. Nonparametric mixture models sidestep the problem of finding the "correct" number of mixture components by assuming infinitely many components. In this paper Dirichlet process mixture (DPM) models are cast as infinite mixture models and inference using Markov chain Monte Carlo is described. The specification of the priors on the model parameters is often guided by mathematical and practical convenience. The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM). We compare computational efficiency and modeling performance of DPGMM defined using a conjugate and a conditionally conjugate base distribution. We show that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.

Journal Article•DOI•

Theory and inference for a Markov switching GARCH model

Luc Bauwens¹, Arie Preminger²•Institutions (2)

Université catholique de Louvain¹, University of Haifa²

01 Jul 2010-Econometrics Journal

TL;DR: In this article, a Markov-switching GARCH model (MS-GARCH) is proposed, where the conditional mean and variance switch in time from one GARCH process to another.

Abstract: We develop a Markov-switching GARCH model (MS-GARCH) wherein the conditional mean and variance switch in time from one GARCH process to another. The switching is governed by a hidden Markov chain. We provide sufficient conditions for geometric ergodicity and existence of moments of the process. Because of path dependence, maximum likelihood estimation is not feasible. By enlarging the parameter space to include the state variables, Bayesian estimation using a Gibbs sampling algorithm is feasible. We illustrate the model on SP500 daily returns.
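
To make the path-dependence remark concrete, a common way to write a Markov-switching GARCH(1,1) model is sketched below. The notation is an assumption chosen for illustration and may differ from the authors' exact specification: $s_t$ is the hidden regime, $p_{ij}$ are the transition probabilities of the Markov chain, and $u_t$ is a zero-mean, unit-variance innovation.

```latex
\begin{aligned}
y_t &= \mu_{s_t} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t u_t, \\
\sigma_t^2 &= \omega_{s_t} + \alpha_{s_t}\,\varepsilon_{t-1}^2 + \beta_{s_t}\,\sigma_{t-1}^2, \\
\Pr(s_t = j \mid s_{t-1} = i) &= p_{ij}.
\end{aligned}
```

Since $\sigma_{t-1}^2$ was itself computed with the parameters of regime $s_{t-1}$, and so on back to the start of the sample, $\sigma_t^2$ depends on the whole regime path $(s_1, \dots, s_t)$. With $K$ regimes and $T$ observations, evaluating the likelihood exactly would require summing over $K^T$ paths, which is what makes treating the states as additional unknowns and sampling them attractive.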
