
Showing papers on "Hidden Markov model published in 2006"


Journal ArticleDOI
TL;DR: This paper overviews emotional speech recognition with three goals in mind, one of which is to provide an up-to-date record of the available emotional speech data collections; it also examines separately classification techniques that exploit timing information from those that ignore it.

907 citations


Journal ArticleDOI
TL;DR: Introduction to Linear Models and Statistical Inference is not meant to compete with these texts—rather, its audience is primarily those taking a statistics course within a mathematics department.
Abstract: of the simple linear regression model. Multiple linear regression for two variables is discussed in Chapter 8, and that for more than two variables is covered in Chapter 9. Chapter 10, on model building, is perhaps the book’s strongest chapter. The authors provide one of the most intuitive discussions on variable transformations that I have seen. Nice presentations of indicator variables, variable selection, and influence diagnostics are also provided. The final chapter covers a wide variety of topics, including analysis of variance models, logistic regression, and robust regression. The coverage of regression is not matrix-based, but optional linear algebra sections at the end of each chapter are useful for one wishing to use matrices. In general, the writing is clear and conceptual. A good number of exercises (about 20 on average) at the end of each chapter are provided. The exercises emphasize derivations and computations. It is difficult to name some comparison texts. Certainly, the text by Ott and Longnecker (2001) would be more suitable for a statistical methods course for an interdisciplinary audience. The regression texts of Montgomery, Peck, and Vining (2001) and Mendenhall and Sincich (2003) are more comprehensive in the regression treatment than the reviewed text. However, Introduction to Linear Models and Statistical Inference is not meant to compete with these texts—rather, its audience is primarily those taking a statistics course within a mathematics department.

802 citations


Journal ArticleDOI
TL;DR: An analysis scheme is developed that casts single-molecule time-binned FRET trajectories as hidden Markov processes, allowing one to determine, based on probability alone, the most likely FRET-value distributions of states and their interconversion rates while simultaneously determining the mostlikely time sequence of underlying states for each trajectory.
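The core of such an analysis can be illustrated with a Viterbi decode over Gaussian-emission states, which recovers the most likely state sequence of a time-binned trajectory. A minimal sketch; the state means, noise width and transition matrix below are invented toy values, not parameters from the paper:

```python
import numpy as np

def viterbi_gaussian(obs, means, sigma, trans, start):
    """Most likely hidden-state path for 1-D Gaussian emissions (log domain)."""
    n, k = len(obs), len(means)
    logB = -0.5 * ((obs[:, None] - means) / sigma) ** 2  # log-emission up to a constant
    delta = np.log(start) + logB[0]
    psi = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + np.log(trans)   # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = [int(delta.argmax())]
    for t in range(n - 1, 0, -1):                 # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Two toy FRET states (low ~0.2, high ~0.8) with sticky transitions.
obs = np.array([0.21, 0.18, 0.25, 0.79, 0.82, 0.76, 0.22, 0.19])
path = viterbi_gaussian(obs, means=np.array([0.2, 0.8]), sigma=0.05,
                        trans=np.array([[0.9, 0.1], [0.1, 0.9]]),
                        start=np.array([0.5, 0.5]))
print(path)  # → [0, 0, 0, 1, 1, 1, 0, 0]
```

With sticky self-transitions, brief noise excursions are absorbed rather than producing spurious state switches, which is why the decoded path tracks the underlying states rather than the raw samples.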

742 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper derives a discriminative sequence model with a hidden state structure, and demonstrates its utility both in a detection and in a multi-way classification formulation.
Abstract: We introduce a discriminative hidden-state approach for the recognition of human gestures. Gesture sequences often have a complex underlying structure, and models that can incorporate hidden structures have proven to be advantageous for recognition tasks. Most existing approaches to gesture recognition with hidden states employ a Hidden Markov Model or suitable variant (e.g., a factored or coupled state model) to model gesture streams; a significant limitation of these models is the requirement of conditional independence of observations. In addition, hidden states in a generative model are selected to maximize the likelihood of generating all the examples of a given gesture class, which is not necessarily optimal for discriminating the gesture class against other gestures. Previous discriminative approaches to gesture sequence recognition have shown promising results, but have not incorporated hidden states nor addressed the problem of predicting the label of an entire sequence. In this paper, we derive a discriminative sequence model with a hidden state structure, and demonstrate its utility both in a detection and in a multi-way classification formulation. We evaluate our method on the task of recognizing human arm and head gestures, and compare the performance of our method to both generative hidden state and discriminative fully-observable models.

521 citations


01 Jan 2006
TL;DR: A full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different channels and the practical limitations that will be encountered if these algorithms are implemented on very large data sets are discussed.
Abstract: We give a full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different channels, and we discuss the practical limitations that will be encountered if these algorithms are implemented on very large data sets. This article is intended as a companion to (1), where we presented a new type of likelihood ratio statistic for speaker verification which is designed principally to deal with the problem of inter-session variability, that is, the variability among recordings of a given speaker. This likelihood ratio statistic is based on a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different channels (such as one of the Switchboard II databases). Our purpose in the current article is to give detailed algorithms for carrying out such a factor analysis. Although we have only experimented with applications of this model in speaker recognition, we will also explain how it could serve as an integrated framework for progressive speaker adaptation and on-line channel adaptation of HMM-based speech recognizers operating in situations where speaker identities are known. II. OVERVIEW OF THE JOINT FACTOR ANALYSIS MODEL. The joint factor analysis model can be viewed as a Gaussian distribution on speaker- and channel-dependent (or, more accurately, session-dependent) HMM supervectors in which most (but not all) of the variance in the supervector population is assumed to be accounted for by a small number of hidden variables which we refer to as speaker and channel factors. The speaker factors and the channel factors play different roles in that, for a given speaker, the values of the speaker factors are assumed to be the same for all recordings of the speaker but the channel factors are assumed to vary from one recording to another.
For example, the Gaussian distribution on speaker-dependent supervectors used in eigenvoice MAP (2) is a special case of the factor analysis model in which there are no channel factors and all of the variance in the speaker-dependent HMM supervectors is assumed to be accounted for by the speaker factors. (The authors are with the Centre de recherche informatique de Montréal.)
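The decomposition described above can be sketched in a few lines of numpy: a session supervector is m + Vy + Ux, where the speaker term Vy is fixed across a speaker's sessions and the channel term Ux varies per recording. The diagonal residual term of the full model is omitted, and all dimensions and matrices are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
F = 12          # supervector dimension (toy)
Rv, Ru = 2, 3   # numbers of speaker and channel factors (toy)

m = rng.normal(size=F)          # speaker-independent mean supervector
V = rng.normal(size=(F, Rv))    # eigenvoice matrix (speaker subspace)
U = rng.normal(size=(F, Ru))    # eigenchannel matrix (session subspace)

y = rng.normal(size=Rv)         # speaker factors: fixed for this speaker
# Three sessions of the same speaker: same V @ y, fresh channel factors each time.
sessions = [m + V @ y + U @ rng.normal(size=Ru) for _ in range(3)]

# The speaker offset V @ y is common to all sessions; channel offsets differ.
speaker_part = m + V @ y
print([np.round(np.linalg.norm(s - speaker_part), 2) for s in sessions])
```

The eigenvoice MAP special case mentioned in the abstract corresponds to dropping the U term entirely, so every session of a speaker collapses onto the same supervector.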

440 citations


Journal ArticleDOI
TL;DR: This paper investigates the feasibility of an audio-based context recognition system; a system is developed and compared to the accuracy of human listeners in the same task, with particular emphasis on the computational complexity of the methods.
Abstract: The aim of this paper is to investigate the feasibility of an audio-based context recognition system. Here, context recognition refers to the automatic classification of the context or an environment around a device. A system is developed and compared to the accuracy of human listeners in the same task. Particular emphasis is placed on the computational complexity of the methods, since the application is of particular interest in resource-constrained portable devices. Simplistic low-dimensional feature vectors are evaluated against more standard spectral features. Using discriminative training, competitive recognition accuracies are achieved with very low-order hidden Markov models (1-3 Gaussian components). Slight improvement in recognition accuracy is observed when linear data-driven feature transformations are applied to mel-cepstral features. The recognition rate of the system as a function of the test sequence length appears to converge only after about 30 to 60 s. Some degree of accuracy can be achieved even with less than 1-s test sequence lengths. The average reaction time of the human listeners was 14 s, i.e., somewhat smaller, but of the same order as that of the system. The average recognition accuracy of the system was 58%, against 69% obtained in the listening tests, when distinguishing between 24 everyday contexts. The accuracies in recognizing six high-level classes were 82% for the system and 88% for the subjects.
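The scoring scheme can be illustrated in its simplest degenerate form: one diagonal-Gaussian model per context (the one-component limit of the paper's low-order HMMs), with a short test sequence classified by average log-likelihood. The feature values and context models below are invented toy numbers:

```python
import numpy as np

def loglik(X, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    return -0.5 * np.sum((X - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)

# Hypothetical 2-D features and one toy Gaussian "context model" per class.
contexts = {"street": (np.array([0.0, 5.0]), np.array([1.0, 1.0])),
            "office": (np.array([3.0, 0.0]), np.array([1.0, 1.0]))}

X = np.array([[2.8, 0.3], [3.2, -0.1], [2.9, 0.2]])  # short test sequence
scores = {c: loglik(X, mu, var).mean() for c, (mu, var) in contexts.items()}
print(max(scores, key=scores.get))  # → office
```

Averaging log-likelihood over frames is also why longer test sequences help, as the abstract's 30 to 60 s convergence observation suggests: the per-frame noise averages out.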

436 citations


Book ChapterDOI
07 May 2006
TL;DR: This work decomposes the high dimensional 3-D joint space into a set of feature spaces where each feature corresponds to the motion of a single joint or combination of related multiple joints.
Abstract: Our goal is to automatically segment and recognize basic human actions, such as stand, walk and wave hands, from a sequence of joint positions or pose angles. Such recognition is difficult due to high dimensionality of the data and large spatial and temporal variations in the same action. We decompose the high dimensional 3-D joint space into a set of feature spaces where each feature corresponds to the motion of a single joint or combination of related multiple joints. For each feature, the dynamics of each action class is learned with one HMM. Given a sequence, the observation probability is computed in each HMM and a weak classifier for that feature is formed based on those probabilities. The weak classifiers with strong discriminative power are then combined by the Multi-Class AdaBoost (AdaBoost.M2) algorithm. A dynamic programming algorithm is applied to segment and recognize actions simultaneously. Results of recognizing 22 actions on a large number of motion capture sequences as well as several annotated and automatically tracked sequences show the effectiveness of the proposed algorithms.
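The combination step can be sketched as a weighted vote over per-feature class probabilities; in the paper these probabilities come from per-joint HMMs and the weights from AdaBoost.M2 training rounds, whereas the numbers below are toy values:

```python
import numpy as np

# Hypothetical weak classifiers: each outputs class probabilities per sample
# (in the paper these are derived from per-joint HMM observation probabilities).
weak_probs = np.array([
    [[0.6, 0.4], [0.3, 0.7]],   # weak classifier 1: 2 samples x 2 action classes
    [[0.5, 0.5], [0.2, 0.8]],   # weak classifier 2
    [[0.7, 0.3], [0.4, 0.6]],   # weak classifier 3
])
alpha = np.array([1.2, 0.5, 0.9])   # boosting weights (toy values)

# Weighted vote: contract alpha against the classifier axis.
strong = np.tensordot(alpha, weak_probs, axes=1)
print(strong.argmax(axis=1))  # → [0 1]
```

Boosting thus lets the per-joint HMMs with strong discriminative power dominate the final decision while weak features contribute little.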

418 citations


Journal ArticleDOI
TL;DR: Algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs), are presented, which outperform HMMs in classifying diverse human activities such as walking and jumping.

335 citations


Journal ArticleDOI
01 Oct 2006-Genetics
TL;DR: A new Bayesian clustering algorithm is introduced for studying population structure using individually geo-referenced multilocus data sets, based on the concept of a hidden Markov random field, which models the spatial dependencies at the cluster membership level.
Abstract: We introduce a new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets. The algorithm is based on the concept of a hidden Markov random field, which models the spatial dependencies at the cluster membership level. We argue that (i) a Markov chain Monte Carlo procedure can implement the algorithm efficiently, (ii) it can detect significant geographical discontinuities in allele frequencies and regulate the number of clusters, (iii) it can check whether the clusters obtained without the use of spatial priors are robust to the hypothesis of discontinuous geographical variation in allele frequencies, and (iv) it can reduce the number of loci required to obtain accurate assignments. We illustrate and discuss the implementation issues with the Scandinavian brown bear and the human CEPH diversity panel data set.

302 citations


Journal ArticleDOI
TL;DR: A metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier is described, and it is found that discriminative models generally outperform generative models.
Abstract: Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.

297 citations


Journal ArticleDOI
TL;DR: This paper presents an original hidden Markov model (HMM) approach for online beat segmentation and classification of electrocardiograms, and the results obtained validate the approach for real-world application.
Abstract: This paper presents an original hidden Markov model (HMM) approach for online beat segmentation and classification of electrocardiograms. The HMM framework has been adopted because of its ability to perform beat detection, segmentation and classification, which makes it highly suitable for the electrocardiogram (ECG) problem. Our approach addresses a large panel of topics, some of them never studied before in other HMM-related works: waveform modeling, multichannel beat segmentation and classification, and unsupervised adaptation to the patient's ECG. The performance was evaluated on the two-channel QT database in terms of waveform segmentation precision, beat detection and classification. Our waveform segmentation results compare favorably to other systems in the literature. We also obtained high beat detection performance with a sensitivity of 99.79% and a positive predictivity of 99.96%, using a test set of 59 recordings. Moreover, premature ventricular contraction beats were detected using an original classification strategy. The results obtained validate our approach for real-world application.

Proceedings ArticleDOI
17 Jun 2006
TL;DR: In this article, an efficient motion vs non-motion classifier is trained to operate directly and jointly on intensity-change and contrast, and its output is then fused with colour information.
Abstract: This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and temporal priors to infer layers accurately and efficiently. Central to our algorithm is the fact that pixel velocities are not needed, thus removing the need for optical flow estimation, with its tendency to error and computational expense. Instead, an efficient motion vs. non-motion classifier is trained to operate directly and jointly on intensity-change and contrast. Its output is then fused with colour information. The prior on segmentation is represented by a second-order temporal hidden Markov model, together with a spatial MRF favouring coherence except where contrast is high. Finally, accurate layer segmentation and explicit occlusion detection are efficiently achieved by binary graph cut. The segmentation accuracy of the proposed algorithm is quantitatively evaluated with respect to existing ground-truth data and found to be comparable to the accuracy of a state-of-the-art stereo segmentation algorithm. Foreground/background segmentation is demonstrated in the application of live background substitution and shown to generate convincingly good quality composite video.

Journal ArticleDOI
TL;DR: A Dynamically Multi-Linked Hidden Markov Model (DML-HMM) is developed based on the discovery of salient dynamic interlinks among multiple temporal processes corresponding to multiple event classes resulting in its topology being intrinsically determined by the underlying causality and temporal order among events.
Abstract: In this work, we present a unified bottom-up and top-down automatic model selection based approach for modelling complex activities of multiple objects in cluttered scenes. An activity of multiple objects is represented based on discrete scene events and their behaviours are modelled by reasoning about the temporal and causal correlations among different events. This is significantly different from the majority of the existing techniques that are centred on object tracking followed by trajectory matching. In our approach, object-independent events are detected by unsupervised clustering using Expectation-Maximisation (EM) and classified using automatic model selection based on Schwarz's Bayesian Information Criterion (BIC). Dynamic Probabilistic Networks (DPNs) are formulated for modelling the temporal and causal correlations among discrete events for robust and holistic scene-level behaviour interpretation. In particular, we developed a Dynamically Multi-Linked Hidden Markov Model (DML-HMM) based on the discovery of salient dynamic interlinks among multiple temporal processes corresponding to multiple event classes. A DML-HMM is built using BIC based factorisation resulting in its topology being intrinsically determined by the underlying causality and temporal order among events. Extensive experiments are conducted on modelling activities captured in different indoor and outdoor scenes. Our experimental results demonstrate that the performance of a DML-HMM on modelling group activities in a noisy and cluttered scene is superior compared to those of other comparable dynamic probabilistic networks including a Multi-Observation Hidden Markov Model (MOHMM), a Parallel Hidden Markov Model (PaHMM) and a Coupled Hidden Markov Model (CHMM).

Journal ArticleDOI
TL;DR: The proposed multistream HMM facial expression system, which utilizes stream reliability weights, achieves a 44% relative reduction in facial expression recognition error compared to the single-stream HMM system.
Abstract: The performance of an automatic facial expression recognition system can be significantly improved by modeling the reliability of different streams of facial expression information utilizing multistream hidden Markov models (HMMs). In this paper, we present an automatic multistream HMM facial expression recognition system and analyze its performance. The proposed system utilizes facial animation parameters (FAPs), supported by the MPEG-4 standard, as features for facial expression classification. Specifically, the FAPs describing the movement of the outer-lip contours and eyebrows are used as observations. Experiments are first performed employing single-stream HMMs under several different scenarios, utilizing outer-lip and eyebrow FAPs individually and jointly. A multistream HMM approach is proposed for introducing facial expression and FAP group dependent stream reliability weights. The stream weights are determined based on the facial expression recognition results obtained when FAP streams are utilized individually. The proposed multistream HMM facial expression system, which utilizes stream reliability weights, achieves a 44% relative reduction in facial expression recognition error compared to the single-stream HMM system.
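The stream-weighting rule amounts to a weighted sum of per-stream log-likelihoods before picking the best expression. The scores and weights below are invented for illustration; in the paper the weights are set from each FAP group's stand-alone recognition results:

```python
import numpy as np

# log P(observations | expression) per stream, for 3 candidate expressions (toy values).
stream_loglik = {
    "outer_lip": np.array([-42.0, -40.5, -45.1]),
    "eyebrow":   np.array([-30.2, -33.0, -29.8]),
}
w = {"outer_lip": 0.7, "eyebrow": 0.3}   # reliability weights, summing to 1

# Weighted combination of per-stream scores, then pick the best expression.
combined = sum(w[s] * stream_loglik[s] for s in stream_loglik)
print(int(combined.argmax()))  # → 1
```

Note that the winner here differs from what either stream would choose alone (outer_lip prefers 1, eyebrow prefers 2), which is exactly the effect the reliability weights are meant to arbitrate.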

Journal ArticleDOI
TL;DR: This paper concludes that, in search of a model to supersede the HMM (say for ASR), rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.
Abstract: Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems---today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial article analyzes HMMs by exploring a definition of HMMs in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in search of a model to supersede the HMM (say for ASR), rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.
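The random-variable definition discussed in this tutorial leads directly to the forward recursion, which computes the probability of an observation sequence using only the two HMM conditional-independence assumptions. A minimal discrete example with toy parameters:

```python
import numpy as np

def forward(obs, start, trans, emit):
    """Total probability of a discrete observation sequence under an HMM."""
    alpha = start * emit[:, obs[0]]          # joint prob of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # propagate, then weight by emission
    return alpha.sum()                        # marginalize over the final state

start = np.array([0.6, 0.4])                  # initial state distribution
trans = np.array([[0.7, 0.3], [0.4, 0.6]])    # P(next state | state)
emit  = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(symbol | state)

p = forward([0, 1, 0], start, trans, emit)
print(round(p, 4))  # → 0.1089
```

Each recursion step uses exactly the two assumptions the article builds on: the observation depends only on the current state, and the next state depends only on the current state.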

Journal ArticleDOI
TL;DR: In this paper, a detailed explanation is given of methods found to work to overcome the difficulties of Bayesian inference for hidden Markov chain models. The paper also discusses model specification issues that apply particularly to structural vector autoregressions.
Abstract: The inference for hidden Markov chain models in which the structure is a multiple-equation macroeconomic model raises a number of difficulties that are not as likely to appear in smaller models. One is likely to want to allow for many states in the Markov chain without allowing the number of free parameters in the transition matrix to grow as the square of the number of states but also without losing a convenient form for the posterior distribution of the transition matrix. Calculation of marginal data densities for assessing model fit is often difficult in high-dimensional models and seems particularly difficult in these models. This paper gives a detailed explanation of methods we have found to work to overcome these difficulties. It also makes suggestions for maximizing posterior density and initiating Markov chain Monte Carlo simulations that provide some robustness against the complex shape of the likelihood in these models. These difficulties and remedies are likely to be useful generally for Bayesian inference in large time-series models. The paper includes some discussion of model specification issues that apply particularly to structural vector autoregressions with a Markov-switching structure.

Proceedings Article
04 Dec 2006
TL;DR: This work proposes a learning algorithm based on the goal of margin maximization in continuous density hidden Markov models for automatic speech recognition (ASR) using Gaussian mixture models, and obtains competitive results for phonetic recognition on the TIMIT speech corpus.
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
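The margin criterion can be illustrated with a hinge loss on per-class discriminant scores: the correct class must beat the best competitor by at least one unit. Real large-margin CD-HMM training optimizes such an objective over Gaussian mixture parameters under a positive-semidefinite constraint; the scalar version below is only a toy illustration:

```python
import numpy as np

def margin_loss(scores, y):
    """Hinge loss: zero once class y beats the best competitor by margin >= 1."""
    competing = np.delete(scores, y)
    return max(0.0, 1.0 - (scores[y] - competing.max()))

print(margin_loss(np.array([2.0, 0.4, -1.0]), y=0))  # → 0.0 (margin satisfied)
print(margin_loss(np.array([0.6, 0.4, -1.0]), y=0))  # ≈ 0.8 (margin violated)
```

Because the loss is zero beyond the margin, correctly classified examples with a comfortable margin exert no gradient, which is the property that distinguishes this objective from maximum mutual information or minimum classification error training.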

Proceedings ArticleDOI
01 Jan 2006
TL;DR: Preliminary results are presented that demonstrate the usefulness of the segmentation approach for distinguishing between a few common activities, specifically with fall detection in mind.
Abstract: A major problem among the elderly involves falling. The recognition of falls from video first requires the segmentation of the individual from the background. To ensure privacy, segmentation should result in a silhouette that is a binary map indicating only the body position of the individual in an image. We have previously demonstrated a segmentation method based on color that can recognize the silhouette and detect and remove shadows. After the silhouettes are obtained, we extract features and train hidden Markov models to recognize future performances of these known activities. In this paper, we present preliminary results that demonstrate the usefulness of this approach for distinguishing between a few common activities, specifically with fall detection in mind. High-level knowledge was fused with low-level feature-based classification to handle a time-varying background, and a decision process based on a fuzzy logic inference system was used to detach the moving objects from the human silhouette. One central task in silhouette extraction is background modeling. Once a background model is established, those image regions with significantly different characteristics from the background are considered foreground objects. In (5), a Least Median of Squares method was used to construct a background model, and in (6), a differencing function was proposed to extract moving human silhouettes. Elgammal et al. (7) demonstrated a non-parametric kernel density estimation method to model and subtract the human from a background. The adaptive background updating handles small motion in the background scene, such as moving tree branches. A slightly different approach outlined in (8) is based on processing in the YUV color space, and is capable of detecting shadows and extracting

Proceedings ArticleDOI
01 Dec 2006
TL;DR: A model of the human upper body is created to simulate the reproduction of dual-arm movements and to generate natural-looking joint configurations from tracked hand paths; it is also shown how HMMs can be used to detect temporal dependencies between both arms in dual-arm tasks.
Abstract: In this paper, we deal with imitation learning of arm movements in humanoid robots. Hidden Markov Models (HMM) are used to generalize movements demonstrated to a robot multiple times. They are trained with the characteristic features (key points) of each demonstration. Using the same HMM, key points that are common to all demonstrations are identified; only those are considered when reproducing a movement. We also show how HMM can be used to detect temporal dependencies between both arms in dual-arm tasks. We created a model of the human upper body to simulate the reproduction of dual-arm movements and generate natural-looking joint configurations from tracked hand paths. Results are presented and discussed.

Journal ArticleDOI
TL;DR: A two-layer hidden Markov model (HMM) framework is proposed that implements such a concept in a principled manner, has advantages over previous works, and is easier to interpret and improve.
Abstract: We address the problem of recognizing sequences of human interaction patterns in meetings, with the goal of structuring them in semantic terms. The investigated patterns are inherently group-based (defined by the individual activities of meeting participants, and their interplay), and multimodal (as captured by cameras and microphones). By defining a proper set of individual actions, group actions can be modeled as a two-layer process, one that models basic individual activities from low-level audio-visual (AV) features, and another one that models the interactions. We propose a two-layer hidden Markov model (HMM) framework that implements such a concept in a principled manner, and that has advantages over previous works. First, by decomposing the problem hierarchically, learning is performed on low-dimensional observation spaces, which results in simpler models. Second, our framework is easier to interpret, as both individual and group actions have a clear meaning, and thus easier to improve. Third, different HMMs can be used in each layer, to better reflect the nature of each subproblem. Our framework is general and extensible, and we illustrate it with a set of eight group actions, using a public 5-hour meeting corpus. Experiments and comparison with a single-layer HMM baseline system show its validity.

Journal ArticleDOI
TL;DR: A system for human behaviour recognition in video sequences combines Bayesian networks and belief propagation, non-parametric sampling from a previously learned database of actions, and hidden Markov models that encode scene rules and are used to smooth sequences of actions.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: The results with simulated crowds show the effectiveness of the proposed approach at detecting abnormalities in dense crowds; a local modelling approach is used to increase the detection sensitivity.
Abstract: This paper presents an event detector for emergencies in crowds. Assuming a single camera and a dense crowd we rely on optical flow instead of tracking statistics as a feature to extract information from the crowd video data. The optical flow features are encoded with Hidden Markov Models to allow for the detection of emergency or abnormal events in the crowd. In order to increase the detection sensitivity a local modelling approach is used. The results with simulated crowds show the effectiveness of the proposed approach on detecting abnormalities in dense crowds.
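A common way to turn such HMMs into an event detector is to threshold the likelihood of new observations under a model trained on normal crowd flow. The per-frame log-likelihood values and the threshold below are toy numbers, not the paper's:

```python
import numpy as np

# Hypothetical per-frame log-likelihoods of optical-flow features under an
# HMM of normal crowd motion (toy values).
normal_ll = np.array([-3.1, -2.9, -3.3, -3.0])    # a normal clip
event_ll  = np.array([-3.2, -8.7, -9.4, -10.1])   # flow statistics diverge mid-clip

def is_abnormal(ll_per_frame, threshold=-6.0):
    """Flag a clip whose average log-likelihood falls below the threshold."""
    return ll_per_frame.mean() < threshold

print(is_abnormal(normal_ll), is_abnormal(event_ll))  # → False True
```

Applying this test per local region, as the paper's local modelling approach does, lets a disturbance in one part of the scene trigger detection even when the global flow statistics look normal.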

Journal ArticleDOI
TL;DR: A new method is presented for segmenting array comparative genomic hybridization data into states with the same underlying copy number, utilizing a heterogeneous hidden Markov model that incorporates relevant biological factors in the segmentation process.
Abstract: Summary: We have developed a new method (BioHMM) for segmenting array comparative genomic hybridization data into states with the same underlying copy number. By utilizing a heterogeneous hidden Markov model, BioHMM incorporates relevant biological factors (e.g. the distance between adjacent clones) in the segmentation process. Availability: BioHMM is available as part of the R library snapCGH which can be downloaded from http://www.bioconductor.org/packages/bioc/1.8/html/snapCGH.html Contact: J.Marioni@damtp.cam.ac.uk Supplementary information: Supplementary information is available at http://www.damtp.cam.ac.uk/user/jcm68/BioHMM.html
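The heterogeneous-transition idea can be sketched by letting the self-transition probability decay with the genomic distance between adjacent clones, so distant clones are likelier to change copy-number state. The exponential form and decay rate here are illustrative, not BioHMM's exact parameterization:

```python
import numpy as np

def trans_matrix(distance_kb, n_states=3, rho=0.001):
    """Distance-dependent transition matrix: self-transition decays with distance."""
    stay = np.exp(-rho * distance_kb)        # probability of keeping the same state
    off = (1 - stay) / (n_states - 1)        # remaining mass spread over other states
    return stay * np.eye(n_states) + off * (1 - np.eye(n_states))

near, far = trans_matrix(10), trans_matrix(5000)
print(round(near[0, 0], 3), round(far[0, 0], 3))  # → 0.99 0.007
```

This is the sense in which the model is heterogeneous: unlike a standard HMM, the transition matrix differs at every position along the chromosome.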

Proceedings ArticleDOI
04 Sep 2006
TL;DR: A novel method to detect smoke in video uses a hidden Markov model mimicking the temporal behavior of smoke; boundaries of smoke regions are represented in the wavelet domain, and the high-frequency nature of these boundaries is used as a clue to model smoke flicker.
Abstract: This paper proposes a novel method to detect smoke in video. It is assumed that the camera monitoring the scene is stationary. The smoke is semi-transparent at the early stages of a fire. Therefore, edges present in image frames start losing their sharpness, and this leads to a decrease in the high frequency content of the image. The background of the scene is estimated and the decrease of high frequency energy of the scene is monitored using the spatial wavelet transforms of the current and the background images. Edges of the scene produce local extrema in the wavelet domain, and a decrease in the energy content of these edges is an important indicator of smoke in the viewing range of the camera. Moreover, the scene becomes grayish when there is smoke, and this leads to a decrease in the chrominance values of pixels. Periodic behavior in smoke boundaries is also analyzed using a hidden Markov model (HMM) mimicking the temporal behavior of the smoke. In addition, the boundaries of smoke regions are represented in the wavelet domain, and the high-frequency nature of these boundaries is also used as a clue to model the smoke flicker. All these clues are combined to reach a final decision.

Journal ArticleDOI
TL;DR: It is shown that this stylized fact, the slow decay in the autocorrelation function of the squared returns, can be described much better by means of hidden semi-Markov models.

Journal ArticleDOI
TL;DR: A flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context, and a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning.
Abstract: Key audio effects are those special effects that play critical roles in human's perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and the background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect various models to fully explore the transitions among them. Moreover, a set of new spectral features are employed to improve the representation of each audio effect and the discrimination among various effects. The framework is convenient to add or remove target audio effects in various applications. Based on the obtained key effect sequence, a Bayesian network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 h of audio data indicate that the proposed framework can achieve satisfying results, both on key audio effect detection and auditory context inference.
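The final inference step, mapping a detected key-effect sequence to a high-level auditory context, can be sketched with a naive-Bayes-style posterior. This is a much-simplified stand-in for the paper's Bayesian network; the context labels, effect names, priors, and likelihoods are all invented for illustration:

```python
def context_posterior(effects, priors, likelihoods):
    """Infer the auditory context from a set of detected key audio effects
    by combining a prior over contexts with per-effect likelihoods
    (naive-Bayes sketch of the paper's Bayesian-network approach)."""
    scores = {}
    for ctx, prior in priors.items():
        score = prior
        for effect in effects:
            # Unseen effects get a small floor probability.
            score *= likelihoods[ctx].get(effect, 0.01)
        scores[ctx] = score
    total = sum(scores.values())
    return {ctx: s / total for ctx, s in scores.items()}

priors = {"sports": 0.5, "war": 0.5}
likelihoods = {"sports": {"cheer": 0.8, "whistle": 0.6, "gunshot": 0.05},
               "war":    {"cheer": 0.1, "whistle": 0.1, "gunshot": 0.9}}
post = context_posterior(["cheer", "whistle"], priors, likelihoods)
```

Detecting cheers and whistles pushes the posterior strongly toward the "sports" context, which is the kind of prior-plus-evidence integration the abstract describes.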

Journal Article
TL;DR: This paper examines two alternative approaches to multistep-ahead prediction: independent value prediction, which builds a separate model for each prediction step using the values observed in the past, and parameter prediction, which fits a parametric function to the time series and predicts its parameters.
Abstract: Multistep-ahead prediction is the task of predicting a sequence of values in a time series. A typical approach, known as multi-stage prediction, is to apply a predictive model step-by-step and use the predicted value of the current time step to determine its value in the next time step. This paper examines two alternative approaches known as independent value prediction and parameter prediction. The first approach builds a separate model for each prediction step using the values observed in the past. The second approach fits a parametric function to the time series and builds models to predict the parameters of the function. We perform a comparative study on the three approaches using multiple linear regression, recurrent neural networks, and a hybrid of hidden Markov model with multiple linear regression. The advantages and disadvantages of each approach are analyzed in terms of their error accumulation, smoothness of prediction, and learning difficulty.
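The contrast between multi-stage and independent value prediction can be made concrete with a toy sketch. A slope-through-the-origin fit stands in for the paper's multiple linear regression; the function names and the doubling series are illustrative:

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin (toy stand-in for the
    multiple linear regression used in the paper)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def multistage_predict(series, slope, horizon):
    """Multi-stage: apply the one-step model repeatedly, feeding each
    prediction back in as the next input (errors can accumulate)."""
    value = series[-1]
    for _ in range(horizon):
        value = slope * value
    return value

def independent_predict(series, horizon):
    """Independent value prediction: fit a separate model that maps
    x_t directly to x_{t+h} for the requested horizon h."""
    xs, ys = series[:-horizon], series[horizon:]
    return fit_slope(xs, ys) * series[-1]

series = [2.0 ** i for i in range(10)]   # exactly doubling each step
one_step = fit_slope(series[:-1], series[1:])
```

On this noiseless series both approaches agree; the paper's point is that with noisy data they trade off differently in error accumulation, smoothness, and learning difficulty.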

Journal ArticleDOI
TL;DR: Experimental results show that T2 FHMMs can effectively handle noise and dialect uncertainties in speech signals and achieve better classification performance than classical HMMs.
Abstract: This paper presents an extension of hidden Markov models (HMMs) based on the type-2 (T2) fuzzy set (FS), referred to as type-2 fuzzy HMMs (T2 FHMMs). Membership functions (MFs) of T2 FSs are three-dimensional, and this new third dimension offers additional degrees of freedom to evaluate the HMM's fuzziness. Therefore, T2 FHMMs are able to handle both random and fuzzy uncertainties existing universally in sequential data. We derive the T2 fuzzy forward-backward algorithm and Viterbi algorithm using T2 FS operations. In order to investigate the effectiveness of T2 FHMMs, we apply them to phoneme classification and recognition on the TIMIT speech database. Experimental results show that T2 FHMMs can effectively handle noise and dialect uncertainties in speech signals and achieve better classification performance than the classical HMMs.
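For reference, the classical forward algorithm that the T2 fuzzy variant generalizes can be sketched in a few lines; the paper replaces these sums and products with type-2 fuzzy-set operations. The two-state toy model below is invented for illustration:

```python
def forward(obs, pi, A, B):
    """Classical HMM forward algorithm: after processing observation t,
    alpha[i] is the joint probability of the observations so far and the
    chain being in state i. Returns the total sequence likelihood."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# Two-state toy model over a binary observation alphabet.
pi = [0.6, 0.4]                      # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]         # state transition matrix
B = [[0.9, 0.1], [0.2, 0.8]]         # emission probabilities
```

A quick sanity check: the likelihoods of the two possible one-symbol sequences sum to one, as they must for a proper model.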

Journal ArticleDOI
TL;DR: This work proposes a new fuzzy version of hidden Markov chains (HMCs) that addresses fuzzy change detection with a statistical approach by simultaneously using Dirac and Lebesgue measures at the class-chain level.
Abstract: This work deals with unsupervised change detection in temporal sets of synthetic aperture radar (SAR) images. We focus on one of the most widely used change detectors in the SAR context, the so-called log-ratio. In order to deal with the classification issue, we propose to use a new fuzzy version of hidden Markov chains (HMCs), and thus to address fuzzy change detection with a statistical approach. The main characteristic of the proposed model is to simultaneously use Dirac and Lebesgue measures at the class-chain level. This allows the coexistence of hard pixels (obtained with the classical HMC segmentation) and fuzzy pixels (obtained with the fuzzy measure) in the same image. The quality assessment of the proposed method is achieved with several bi-date sets of simulated images, and comparisons with classical HMCs are also provided. Experimental results on real European Remote Sensing 2 Precision Image (ERS-2 PRI) images confirm the effectiveness of the proposed approach.
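The log-ratio change detector mentioned above is simple to state: take the pixel-wise logarithm of the ratio of the two acquisitions. A minimal sketch (the epsilon guard and the toy 2x2 images are illustrative additions):

```python
import math

def log_ratio(before, after, eps=1e-6):
    """Pixel-wise log-ratio change detector for a pair of SAR intensity
    images: values near zero mean no change, large magnitudes flag change.
    The logarithm turns SAR's multiplicative speckle into additive noise."""
    return [[math.log((a + eps) / (b + eps))
             for a, b in zip(row_after, row_before)]
            for row_after, row_before in zip(after, before)]

# Toy 2x2 intensity images: one pixel drops from 4.0 to 1.0.
before = [[1.0, 1.0], [1.0, 4.0]]
after  = [[1.0, 1.0], [1.0, 1.0]]
lr = log_ratio(before, after)
```

The unchanged pixels map near zero while the changed pixel stands out with a large negative value; the paper's contribution is the fuzzy HMC used to classify this log-ratio image, not the detector itself.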

Proceedings Article
01 Jan 2006
TL;DR: The Minimum Bayes Risk framework has been a successful strategy for training hidden Markov models for large vocabulary speech recognition; although a word-based criterion may seem the natural choice, phoneme-based criteria appear to be more successful.
Abstract: The Minimum Bayes Risk (MBR) framework has been a successful strategy for the training of hidden Markov models for large vocabulary speech recognition. Practical implementations of MBR must select an appropriate hypothesis space and loss function. The set of word sequences and a word-based Levenshtein distance may be assumed to be the optimal choice, but the use of phoneme-based criteria appears to be more successful. This paper compares the use of different hypothesis spaces and loss functions defined using the system constituents of word, phone, physical triphone, physical state and physical mixture component. For practical reasons the competing hypotheses are constrained by sampling. The impact of the sampling technique on the performance of MBR training is also examined.
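The MBR decision rule underlying the framework above picks the hypothesis with minimum expected loss under the posterior: argmin over h of the sum over h' of P(h') times L(h, h'). A sketch with a word-based Levenshtein loss and an invented three-hypothesis posterior (the paper is about MBR *training* and compares several loss granularities; this only illustrates the word-level criterion):

```python
def levenshtein(a, b):
    """Word-level edit distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def mbr_decode(hypotheses):
    """Pick the hypothesis minimizing posterior-expected Levenshtein loss."""
    def risk(h):
        return sum(p * levenshtein(h, other) for other, p in hypotheses)
    return min((h for h, _ in hypotheses), key=risk)

hyps = [("the cat sat".split(), 0.40),
        ("the cat sad".split(), 0.35),
        ("the bat sat".split(), 0.25)]
```

Swapping the token sequences from words to phones, triphones, or states changes the granularity of the loss, which is exactly the comparison the paper carries out.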