
Showing papers on "Mixture model published in 2013"


Journal ArticleDOI
TL;DR: This work proposes to use the Fisher Kernel framework as an alternative patch encoding strategy: it describes patches by their deviation from a “universal” generative Gaussian mixture model, and reports experimental results showing that the FV framework is a state-of-the-art patch encoding technique.
Abstract: A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements; this leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets--PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K--with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.

1,594 citations
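To make the encoding concrete, here is a minimal sketch of the mean-gradient part of a Fisher vector, assuming a diagonal-covariance GMM fit with scikit-learn. The full FV of the paper also includes gradients with respect to the mixture weights and variances; the power and L2 normalizations shown follow the improved-FV recipe. The variable `all_patches` is a hypothetical training set of local descriptors.

```python
# Minimal sketch: Fisher-vector encoding w.r.t. GMM means only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(patches, gmm):
    """Encode local descriptors (T, D) as the normalized gradient of their
    average log-likelihood w.r.t. the GMM component means."""
    q = gmm.predict_proba(patches)                         # (T, K) soft assignments
    T, K = q.shape
    means, sigmas = gmm.means_, np.sqrt(gmm.covariances_)  # diagonal GMM: (K, D)
    fv = []
    for k in range(K):
        diff = (patches - means[k]) / sigmas[k]            # (T, D)
        g = (q[:, k, None] * diff).sum(axis=0) / (T * np.sqrt(gmm.weights_[k]))
        fv.append(g)
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                 # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)               # L2 normalization

# gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(all_patches)
```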


Posted Content
TL;DR: Expectation Propagation (EP) as mentioned in this paper is a deterministic approximation technique in Bayesian networks that unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation.
Abstract: This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction - it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.

1,365 citations
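At the core of EP (and of assumed-density filtering) is a moment-matching projection: the product of a Gaussian "cavity" distribution and one exact factor is replaced by the Gaussian with the same zeroth, first, and second moments. A minimal numerical sketch of that single step, with a hypothetical Gaussian-mixture factor standing in for a likelihood term (this is the projection EP iterates, not Minka's full algorithm):

```python
# Minimal sketch of the EP/ADF moment-matching projection, via quadrature.
import numpy as np

def match_moments(cavity_mu, cavity_var, factor):
    """Project cavity-Gaussian * factor onto a single Gaussian."""
    s = np.sqrt(cavity_var)
    x = np.linspace(cavity_mu - 10 * s, cavity_mu + 10 * s, 4001)
    cavity = np.exp(-0.5 * (x - cavity_mu) ** 2 / cavity_var) / np.sqrt(2 * np.pi * cavity_var)
    tilted = cavity * factor(x)
    Z = np.trapz(tilted, x)                       # normalizing constant
    mu = np.trapz(x * tilted, x) / Z              # matched mean
    var = np.trapz((x - mu) ** 2 * tilted, x) / Z # matched variance
    return Z, mu, var

# Illustrative two-component Gaussian-mixture factor (hypothetical values):
lik = lambda x: 0.5 * np.exp(-0.5 * (x - 1.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 1.0) ** 2 / 9) / 3
print(match_moments(0.0, 100.0, lik))
```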


Journal ArticleDOI
TL;DR: A general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations.
Abstract: We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: 1) They efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity) with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.

888 citations


Journal ArticleDOI
TL;DR: A message passing interface version of PhyloBayes is introduced, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures, allowing faster phylogenetic reconstruction under complex mixture models.
Abstract: Modeling across site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed and have been shown to significantly improve on classical single-matrix models. Compared with their finite counterparts, infinite mixtures have a greater expressivity. However, they are computationally more challenging. This has resulted in practical compromises in the design of infinite mixture models. In particular, a fast but simplified version of a Dirichlet process model over equilibrium frequency profiles implemented in PhyloBayes has often been used in recent phylogenomics studies, while more refined model structures, more realistic and empirically more fit, have been practically out of reach. We introduce a message passing interface version of PhyloBayes, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures. The parallelization is made efficient thanks to the combination of two algorithmic strategies: a partial Gibbs sampling update of the tree topology and the use of a truncated stick-breaking representation for the Dirichlet process prior. The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models. PhyloBayes MPI is freely available from our website www.phylobayes.org.

653 citations
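The truncated stick-breaking representation mentioned above replaces the infinite Dirichlet process by a finite number of "sticks". A minimal sketch of drawing mixture weights from such a truncated prior (illustrative values, not PhyloBayes internals):

```python
# Minimal sketch: truncated stick-breaking draw from a DP(alpha) prior.
import numpy as np

def truncated_stick_breaking(alpha, K, rng=None):
    """Draw K mixture weights from a Dirichlet process prior truncated at K sticks."""
    rng = rng or np.random.default_rng(0)
    v = rng.beta(1.0, alpha, size=K)   # stick-breaking proportions
    v[-1] = 1.0                        # truncation: last stick takes the remainder
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return w                           # weights sum to one

print(truncated_stick_breaking(alpha=2.0, K=10))
```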


Journal ArticleDOI
TL;DR: A flexible model-based approach is proposed to empirically derive and summarize the class-dependent density functions of distal outcomes with categorical, continuous, or count distributions and is demonstrated empirically: latent classes of adolescent depression are used to predict smoking, grades, and delinquency.
Abstract: Although prediction of class membership from observed variables in latent class analysis is well understood, predicting an observed distal outcome from latent class membership is more complicated. A flexible model-based approach is proposed to empirically derive and summarize the class-dependent density functions of distal outcomes with categorical, continuous, or count distributions. A Monte Carlo simulation study is conducted to compare the performance of the new technique to two commonly used classify-analyze techniques: maximum-probability assignment and multiple pseudo-class draws. Simulation results show that the model-based approach produces substantially less biased estimates of the effect compared to either classify-analyze technique, particularly when the association between the latent class variable and the distal outcome is strong. In addition, we show that only the model-based approach is consistent. The approach is demonstrated empirically: latent classes of adolescent depression are used to predict smoking, grades, and delinquency. SAS syntax for implementing this approach using PROC LCA and a corresponding macro are provided.

526 citations


Journal ArticleDOI
TL;DR: An empirical-Bayesian technique is proposed that simultaneously learns the signal distribution while MMSE-recovering the signal (according to the learned distribution) using AMP; the non-zero distribution is modeled as a Gaussian mixture whose parameters are learned through expectation maximization, with AMP implementing the expectation step.
Abstract: When recovering a sparse signal from noisy compressive linear measurements, the distribution of the signal's non-zero coefficients can have a profound effect on recovery mean-squared error (MSE). If this distribution were a priori known, then one could use computationally efficient approximate message passing (AMP) techniques for nearly minimum MSE (MMSE) recovery. In practice, however, the distribution is unknown, motivating the use of robust algorithms like LASSO (which is nearly minimax optimal) at the cost of significantly larger MSE for non-least-favorable distributions. As an alternative, we propose an empirical-Bayesian technique that simultaneously learns the signal distribution while MMSE-recovering the signal, according to the learned distribution, using AMP. In particular, we model the non-zero distribution as a Gaussian mixture and learn its parameters through expectation maximization, using AMP to implement the expectation step. Numerical experiments on a wide range of signal classes confirm the state-of-the-art performance of our approach, in both reconstruction error and runtime, in the high-dimensional regime, for most (but not all) sensing operators.

375 citations
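For orientation, here is a minimal sketch of the generic soft-thresholding AMP recursion that this line of work builds on; the paper's EM-GM-AMP algorithm replaces the soft threshold with an MMSE denoiser under the learned Gaussian-mixture prior and updates that prior via EM, which is not reproduced here. The threshold schedule and problem sizes are illustrative.

```python
# Minimal sketch: AMP with soft thresholding and the Onsager correction.
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(y, A, n_iter=30, lam=0.1):
    M, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(n_iter):
        r = x + A.T @ z                         # pseudo-data
        tau = lam * np.sqrt(np.mean(z ** 2))    # heuristic threshold schedule
        x_new = soft(r, tau)
        onsager = (z / M) * np.count_nonzero(x_new)  # Onsager correction term
        z = y - A @ x_new + onsager
        x = x_new
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 400)) / np.sqrt(100)
x0 = np.zeros(400)
x0[rng.choice(400, 15, replace=False)] = rng.normal(size=15)
y = A @ x0 + 0.01 * rng.normal(size=100)
print(np.linalg.norm(amp(y, A) - x0) / np.linalg.norm(x0))  # relative error
```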


Journal ArticleDOI
TL;DR: Novel cooperative spectrum sensing (CSS) algorithms for cognitive radio (CR) networks, based on machine learning techniques for pattern classification, are shown to outperform existing state-of-the-art CSS techniques.
Abstract: We propose novel cooperative spectrum sensing (CSS) algorithms for cognitive radio (CR) networks based on machine learning techniques which are used for pattern classification. In this regard, unsupervised (e.g., K-means clustering and Gaussian mixture model (GMM)) and supervised (e.g., support vector machine (SVM) and weighted K-nearest-neighbor (KNN)) learning-based classification techniques are implemented for CSS. For a radio channel, the vector of the energy levels estimated at CR devices is treated as a feature vector and fed into a classifier to decide whether the channel is available or not. The classifier categorizes each feature vector into either of the two classes, namely, the "channel available class" and the "channel unavailable class". Prior to the online classification, the classifier needs to go through a training phase. For classification, the K-means clustering algorithm partitions the training feature vectors into K clusters, where each cluster corresponds to a combined state of primary users (PUs) and then the classifier determines the class the test energy vector belongs to. The GMM obtains a mixture of Gaussian density functions that well describes the training feature vectors. In the case of the SVM, the support vectors (i.e., a subset of training vectors which fully specify the decision function) are obtained by maximizing the margin between the separating hyperplane and the training feature vectors. Furthermore, the weighted KNN classification technique is proposed for CSS for which the weight of each feature vector is calculated by evaluating the area under the receiver operating characteristic (ROC) curve of that feature vector. The performance of each classification technique is quantified in terms of the average training time, the sample classification delay, and the ROC curve. Our comparative results clearly reveal that the proposed algorithms outperform the existing state-of-the-art CSS techniques.

353 citations
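As a concrete instance of the unsupervised variant, a minimal sketch of K-means-based CSS on synthetic energy vectors (all numbers hypothetical): the lowest-energy cluster is taken as the "channel available" class.

```python
# Minimal sketch: K-means clustering of energy vectors for spectrum sensing.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
noise_only = rng.exponential(1.0, size=(500, 3))       # hypothetical energies, PU off
pu_active = rng.exponential(1.0, size=(500, 3)) + 2.0  # hypothetical energies, PU on
X = np.vstack([noise_only, pu_active])                 # training feature vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
available = np.argmin(km.cluster_centers_.sum(axis=1)) # lowest-energy cluster

test = np.array([[0.8, 1.1, 0.9]])                     # energies at 3 CR devices
print("available" if km.predict(test)[0] == available else "unavailable")
```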


Proceedings ArticleDOI
Daniel Hsu1, Sham M. Kakade1
09 Jan 2013
TL;DR: In this article, a simple spectral decomposition technique was used to obtain consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures.
Abstract: This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures. Thus computational and information-theoretic barriers to efficient estimation in mixture models are precluded when the mixture components have means in general position and spherical covariances. Some connections are made to estimation problems related to independent component analysis.

285 citations


Journal ArticleDOI
TL;DR: This work proposes a variant of the EM algorithm that iteratively maximizes a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations.
Abstract: We consider the problem of parameter estimation in statistical models in the case where data are uncertain and represented as belief functions. The proposed method is based on the maximization of a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations. We propose a variant of the EM algorithm that iteratively maximizes this criterion. As an illustration, the method is applied to uncertain data clustering using finite mixture models, in the cases of categorical and continuous attributes.

249 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy.
Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part-based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds a correspondence between a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Faces in the Wild (LFW) and the YouTube video face database by a significant margin.

232 citations
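A minimal sketch of the spatial-appearance GMM step, assuming descriptors and normalized patch locations are already extracted; the spherical-covariance constraint that balances the appearance and location terms is the point being illustrated.

```python
# Minimal sketch: spherical GMM over location-augmented local descriptors.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_spatial_appearance_gmm(descriptors, locations, n_components=64):
    """descriptors: (T, D) local features; locations: (T, 2) normalized to [0, 1]."""
    aug = np.hstack([descriptors, locations])  # append location to appearance
    return GaussianMixture(n_components=n_components,
                           covariance_type='spherical',  # balances the two terms
                           random_state=0).fit(aug)

# Matching sketch: for a pair of faces, each Gaussian component selects the
# feature with highest responsibility in each image, and the differences of
# the selected feature pairs are concatenated and fed to an SVM (not shown).
```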


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A low-rank matrix factorization problem with Mixture of Gaussians (MoG) noise is proposed; the MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of real noise distributions.
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from high-dimensional visual data. Factorization approaches to low-rank subspace estimation minimize a loss function between the observed measurement matrix and a bilinear factorization. Most popular loss functions include the L1 and L2 losses. While L1 is optimal for Laplacian distributed noise, L2 is optimal for Gaussian noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of real noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic, structure from motion, face modeling and background subtraction experiments.
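A deliberately simplified EM-style sketch of the idea (not the authors' implementation): residuals of the current factorization are fit with a scalar Gaussian mixture, the resulting expected precisions become per-entry weights, and the factors are re-estimated by weighted alternating least squares.

```python
# Minimal sketch: low-rank factorization under Mixture-of-Gaussians noise.
import numpy as np
from sklearn.mixture import GaussianMixture

def mog_factorize(Y, rank=2, n_noise=2, n_iter=20):
    m, n = Y.shape
    rng = np.random.default_rng(0)
    U, V = rng.normal(size=(m, rank)), rng.normal(size=(n, rank))
    for _ in range(n_iter):
        R = (Y - U @ V.T).reshape(-1, 1)                     # residuals
        gmm = GaussianMixture(n_components=n_noise, random_state=0).fit(R)
        resp = gmm.predict_proba(R)                          # noise responsibilities
        # Per-entry weight = expected precision under the responsibilities.
        W = (resp / gmm.covariances_.reshape(1, -1)).sum(axis=1).reshape(m, n)
        # Weighted alternating least squares for the factors.
        for i in range(m):
            Wi = np.diag(W[i])
            U[i] = np.linalg.solve(V.T @ Wi @ V, V.T @ Wi @ Y[i])
        for j in range(n):
            Wj = np.diag(W[:, j])
            V[j] = np.linalg.solve(U.T @ Wj @ U, U.T @ Wj @ Y[:, j])
    return U, V
```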

Proceedings ArticleDOI
Thad Hughes1, Keir Banks Mierle1
26 May 2013
TL;DR: This work presents a novel recurrent neural network model for voice activity detection, in which nodes compute quadratic polynomials; it outperforms a much larger baseline system composed of Gaussian mixture models and a hand-tuned state machine for temporal smoothing.
Abstract: We present a novel recurrent neural network (RNN) model for voice activity detection. Our multi-layer RNN model, in which nodes compute quadratic polynomials, outperforms a much larger baseline system composed of Gaussian mixture models (GMMs) and a hand-tuned state machine (SM) for temporal smoothing. All parameters of our RNN model are optimized together, so that it properly weights its preference for temporal continuity against the acoustic features in each frame. Our RNN uses one tenth the parameters and outperforms the GMM+SM baseline system with a 26% reduction in false alarms, reducing overall speech recognition computation time by 17% while reducing word error rate by 1% relative.

Journal ArticleDOI
TL;DR: A systematic classification of the existing skew symmetric distributions into four types is presented, thereby clarifying their close relationships and aiding in understanding the link between some of the proposed expectation-maximization based algorithms for the computation of the maximum likelihood estimates of the parameters of the models.
Abstract: Finite mixtures of skew distributions have emerged as an effective tool in modelling heterogeneous data with asymmetric features. With various proposals appearing rapidly in the recent years, which are similar but not identical, the connection between them and their relative performance becomes rather unclear. This paper aims to provide a concise overview of these developments by presenting a systematic classification of the existing skew symmetric distributions into four types, thereby clarifying their close relationships. This also aids in understanding the link between some of the proposed expectation-maximization based algorithms for the computation of the maximum likelihood estimates of the parameters of the models. The final part of this paper presents an illustration of the performance of these mixture models in clustering a real dataset, relative to other non-elliptically contoured clustering methods and associated algorithms for their implementation.

Journal ArticleDOI
TL;DR: A new kernel-based paradigm that relies on the assumption that the mixing mechanism can be described by a linear mixture of endmember spectra, with additive nonlinear fluctuations defined in a reproducing kernel Hilbert space is formulated.
Abstract: Spectral unmixing is an important issue in analyzing remotely sensed hyperspectral data. Although the linear mixture model has obvious practical advantages, there are many situations in which it may not be appropriate and could be advantageously replaced by a nonlinear one. In this paper, we formulate a new kernel-based paradigm that relies on the assumption that the mixing mechanism can be described by a linear mixture of endmember spectra, with additive nonlinear fluctuations defined in a reproducing kernel Hilbert space. This family of models has a clear interpretation and allows complex interactions between endmembers to be taken into account. Extensive experimental results, with both synthetic and real images, illustrate the generality and effectiveness of this scheme compared with state-of-the-art methods.
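For reference, a minimal sketch of the fully constrained linear unmixing baseline that the kernel model generalizes: nonnegative abundances summing to one, with the sum-to-one constraint enforced softly via a heavily weighted row of ones (a standard trick; `delta` is an illustrative weight).

```python
# Minimal sketch: fully constrained least-squares linear unmixing.
import numpy as np
from scipy.optimize import nnls

def fcls(pixel, endmembers, delta=1e3):
    """pixel: (B,) spectrum; endmembers: (B, R) matrix of endmember spectra."""
    # Augment with a weighted row of ones to softly enforce sum-to-one;
    # nnls enforces nonnegativity exactly.
    E = np.vstack([endmembers, delta * np.ones((1, endmembers.shape[1]))])
    y = np.concatenate([pixel, [delta]])
    abundances, _ = nnls(E, y)
    return abundances
```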

Journal ArticleDOI
TL;DR: In this paper, a mixture procedure is proposed to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another.
Abstract: We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another. Observations are assumed initially to be independent standard normal random variables. After a change-point the observations in a subset of the streams of data have nonzero mean values. The subset and the post-change means are unknown. The procedure we study uses stream specific generalized likelihood ratio statistics, which are combined to form an overall detection statistic in a mixture model that hypothesizes an assumed fraction $p_{0}$ of affected data streams. An analytic expression is obtained for the average run length (ARL) when there is no change and is shown by simulations to be very accurate. Similarly, an approximation for the expected detection delay (EDD) after a change-point is also obtained. Numerical examples are given to compare the suggested procedure to other procedures for unstructured problems and in one case where the problem is assumed to have a well-defined geometric structure. Finally we discuss sensitivity of the procedure to the assumed value of $p_{0}$ and suggest a generalization.
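A minimal sketch of the mixture statistic under stated simplifications (a fixed trailing window and a one-sided mean-shift GLR per stream); the paper derives ARL and EDD approximations for statistics of this form.

```python
# Minimal sketch: mixture detection statistic over parallel data streams.
import numpy as np

def mixture_statistic(X, window, p0=0.1):
    """X: (n_streams, t) standard-normal data; returns the detection statistic."""
    S = X[:, -window:].sum(axis=1)                   # windowed sums per stream
    glr = np.maximum(S, 0.0) ** 2 / (2.0 * window)   # one-sided GLR per stream
    return np.sum(np.log(1.0 - p0 + p0 * np.exp(glr)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
X[:5, 150:] += 1.0                        # change affects 5 of 100 streams
print(mixture_statistic(X, window=50))    # compare against a calibrated threshold
```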

Journal ArticleDOI
TL;DR: An improved clustering method is integrated with an existing re-segmentation algorithm and an iterative optimization scheme is implemented that demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner.
Abstract: In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
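A minimal sketch of the clustering stage, assuming i-vectors have already been extracted: PCA projection followed by a Bayesian GMM whose Dirichlet prior can shrink away unused components, a rough stand-in for the paper's handling of the unknown speaker count.

```python
# Minimal sketch: PCA-processed i-vector clustering with a Bayesian GMM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture

def cluster_ivectors(ivectors, max_speakers=10, dim=20):
    """ivectors: (n_segments, d) matrix of segment-level i-vectors."""
    Z = PCA(n_components=dim, whiten=True).fit_transform(ivectors)
    bgmm = BayesianGaussianMixture(n_components=max_speakers,
                                   weight_concentration_prior=1e-2,  # favors few clusters
                                   covariance_type='diag',
                                   random_state=0).fit(Z)
    return bgmm.predict(Z)   # speaker-cluster assignment per segment
```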

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors developed an object detection framework using a discriminatively trained mixture model, which is mainly composed of two stages: model training and object detection, where multi-scale histogram of oriented gradients (HOG) feature pyramids of all training samples are constructed.
Abstract: Automatically detecting objects with complex appearance and arbitrary orientations in remote sensing imagery (RSI) is a big challenge. To explore a possible solution to the problem, this paper develops an object detection framework using a discriminatively trained mixture model. It is mainly composed of two stages: model training and object detection. In the model training stage, multi-scale histogram of oriented gradients (HOG) feature pyramids of all training samples are constructed. A mixture of multi-scale deformable part-based models is then trained for each object category by training a latent Support Vector Machine (SVM), where each part-based model is composed of a coarse root filter, a set of higher resolution part filters, and a set of deformation models. In the object detection stage, given a test image, its multi-scale HOG feature pyramid is first constructed. Then, object detection is performed by computing and thresholding the response of the mixture model. The quantitative comparisons with state-of-the-art approaches on two datasets demonstrate the effectiveness of the developed framework.

Posted Content
TL;DR: RNADE as mentioned in this paper calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters, and it outperforms mixture models in all but one case.
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case.
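A small numpy sketch of the RNADE factorization: a hidden state updated with shared weights produces, for each dimension, the parameters of a one-dimensional Gaussian-mixture conditional. All shapes and the tanh activation are illustrative rather than the paper's exact architecture.

```python
# Minimal sketch: RNADE-style density as a product of 1-D MoG conditionals.
import numpy as np

def rnade_logdensity(x, W, c, Vm, Vs, Vp):
    """x: (D,) input; W: (H, D), c: (H,) shared state parameters;
    Vm, Vs, Vp: (D, K, H) per-dimension output heads (means, log-stdevs, logits)."""
    logp = 0.0
    a = c.copy()                                   # running hidden pre-activation
    for d in range(len(x)):
        h = np.tanh(a)                             # hidden units for conditional d
        mu, logsig = Vm[d] @ h, Vs[d] @ h          # (K,) means and log-stdevs
        logw = Vp[d] @ h
        logw -= np.log(np.exp(logw - logw.max()).sum()) + logw.max()  # log-softmax
        comp = (logw - 0.5 * np.log(2 * np.pi) - logsig
                - 0.5 * ((x[d] - mu) / np.exp(logsig)) ** 2)
        logp += np.log(np.exp(comp - comp.max()).sum()) + comp.max()  # log-sum-exp
        a = a + x[d] * W[:, d]                     # shared-parameter state update
    return logp

rng = np.random.default_rng(0)
D, H, K = 5, 16, 3
params = (rng.normal(size=(H, D)), rng.normal(size=H),
          rng.normal(size=(D, K, H)), rng.normal(size=(D, K, H)),
          rng.normal(size=(D, K, H)))
print(rnade_logdensity(rng.normal(size=D), *params))
```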

Journal ArticleDOI
TL;DR: A new way to incorporate spatial information between neighboring pixels into the Gaussian mixture model, based on a Markov random field (MRF), is presented; experiments demonstrate its robustness, accuracy, and effectiveness compared with other mixture models.
Abstract: In this paper, a new mixture model for image segmentation is presented. We propose a new way to incorporate spatial information between neighboring pixels into the Gaussian mixture model based on Markov random field (MRF). In comparison to other mixture models that are complex and computationally expensive, the proposed method is fast and easy to implement. In mixture models based on MRF, the M-step of the expectation-maximization (EM) algorithm cannot be directly applied to the prior distribution ${\pi_{ij}}$ for maximization of the log-likelihood with respect to the corresponding parameters. Compared with these models, our proposed method directly applies the EM algorithm to optimize the parameters, which makes it much simpler. Experimental results obtained by employing the proposed method on many synthetic and real-world grayscale and colored images demonstrate its robustness, accuracy, and effectiveness, compared with other mixture models.
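A deliberately simplified sketch of the spatial idea (not the paper's MRF prior ${\pi_{ij}}$ itself): GMM responsibilities are smoothed over each pixel's neighborhood and reused as pixel-wise priors in the next E-step, so neighboring pixels tend toward the same label.

```python
# Simplified sketch: EM segmentation with neighborhood-smoothed priors.
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.stats import norm

def smooth_em_segment(img, K=3, n_iter=20):
    """img: 2-D grayscale array; returns a label image with K classes."""
    x = img.ravel()
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))
    sig = np.full(K, x.std())
    pi = np.full((x.size, K), 1.0 / K)             # pixel-wise priors
    for _ in range(n_iter):
        # E-step: responsibilities under current Gaussians and spatial priors.
        r = pi * norm.pdf(x[:, None], mu[None, :], sig[None, :]) + 1e-12
        r /= r.sum(axis=1, keepdims=True)
        # Smooth responsibilities over each pixel's neighborhood -> new priors.
        r_img = r.reshape(*img.shape, K)
        pi = np.stack([uniform_filter(r_img[..., k], size=5) for k in range(K)], -1)
        pi = (pi / pi.sum(axis=-1, keepdims=True)).reshape(-1, K)
        # M-step: update Gaussian parameters.
        Nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk) + 1e-6
    return r.argmax(axis=1).reshape(img.shape)
```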

Proceedings ArticleDOI
06 May 2013
TL;DR: A novel uncertainty measure for sparse RGB-D features, based on a Gaussian mixture model, is formulated for the filtering stage, and the registration algorithm is capable of closing small-scale loops in indoor environments online without any additional SLAM back-end techniques.
Abstract: An RGB-D camera is a sensor which outputs color and depth information about the scene it observes. In this paper, we present a real-time visual odometry and mapping system for RGB-D cameras. The system runs at frequencies of 30Hz and higher in a single thread on a desktop CPU with no GPU acceleration required. We recover the unconstrained 6-DoF trajectory of a moving camera by aligning sparse features observed in the current RGB-D image against a model of previous features. The model is persistent and dynamically updated from new observations using a Kalman Filter. We formulate a novel uncertainty measure for sparse RGB-D features based on a Gaussian mixture model for the filtering stage. Our registration algorithm is capable of closing small-scale loops in indoor environments online without any additional SLAM back-end techniques.

Journal ArticleDOI
TL;DR: An extensive study of the behavior of OCRF is proposed, including experiments on various UCI public datasets and comparisons, with statistical significance testing, to reference one-class methods, namely Gaussian density models, Parzen estimators, Gaussian mixture models, and one-class SVMs.

Journal ArticleDOI
TL;DR: In this paper, the convergence behavior of latent mixing measures that arise in finite and infinite mixture models, using transportation distances (i.e., Wasserstein metrics), is investigated in detail using various identifiability conditions.
Abstract: This paper studies convergence behavior of latent mixing measures that arise in finite and infinite mixture models, using transportation distances (i.e., Wasserstein metrics). The relationship between Wasserstein distances on the space of mixing measures and $f$-divergence functionals such as Hellinger and Kullback–Leibler distances on the space of mixture distributions is investigated in detail using various identifiability conditions. Convergence in Wasserstein metrics for discrete measures implies convergence of individual atoms that provide support for the measures, thereby providing a natural interpretation of convergence of clusters in clustering applications where mixture models are typically employed. Convergence rates of posterior distributions for latent mixing measures are established, for both finite mixtures of multivariate distributions and infinite mixtures based on the Dirichlet process.
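For reference, the transportation distance in question, stated for two discrete mixing measures $G=\sum_i p_i\delta_{\theta_i}$ and $G'=\sum_j q_j\delta_{\theta'_j}$ (the standard definition, not the paper's exact notation):

```latex
W_r(G, G') \;=\; \Bigl( \inf_{\pi \in \Pi(p,\,q)} \sum_{i,j} \pi_{ij}\, \|\theta_i - \theta'_j\|^r \Bigr)^{1/r},
```

where $\Pi(p, q)$ denotes the set of couplings (joint distributions) with marginals $p$ and $q$. Convergence of $G_n$ to $G$ in this metric forces the atoms $\theta_i$ (the mixture components) to converge, which is the sense in which clusters converge in the paper.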

Journal ArticleDOI
TL;DR: A novel patient-specific seizure prediction method based on the analysis of positive zero-crossing intervals in scalp electroencephalogram (EEG) based on a variational Bayesian Gaussian mixture model of the data is proposed.
Abstract: A novel patient-specific seizure prediction method based on the analysis of positive zero-crossing intervals in scalp electroencephalogram (EEG) is proposed. In a moving-window analysis, the histogram of these intervals for the current EEG epoch is computed, and the values corresponding to specific bins are selected as an observation. Then, the set of observations from the last 5 min is compared with two reference sets of data points (preictal and interictal) through novel measures of similarity and dissimilarity based on a variational Bayesian Gaussian mixture model of the data. A combined index is then computed and compared with a patient-specific threshold, resulting in a cumulative measure which is utilized to form an alarm sequence for each channel. Finally, this channel-based information is used to generate a seizure prediction alarm. The proposed method was evaluated using ~561 h of scalp EEG including a total of 86 seizures in 20 patients. A high sensitivity of 88.34% was achieved with a false prediction rate of 0.155 h⁻¹ and an average prediction time of 22.5 min for the test dataset. The proposed method was also tested against a Poisson-based random predictor.

Proceedings ArticleDOI
02 Sep 2013
TL;DR: Experimental results show that when the numbers of hidden layers as well as hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM.
Abstract: Deep Neural Network Hidden Markov Models, or DNN-HMMs, are recent, very promising acoustic models that achieve better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out with these two models on the eNTERFACE'05 database and the Berlin database, respectively, and results are compared with those from the GMM-HMMs, the shallow-NN-HMMs with two layers, as well as the multi-layer perceptron HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers as well as hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, the recognition accuracy improves by 12.22% over the DNN-HMMs with unsupervised pre-training, by 11.67% over the GMM-HMMs, by 10.56% over the MLP-HMMs, and even by 17.22% over the shallow-NN-HMMs.

Journal ArticleDOI
TL;DR: The factor mixture model (FMM) is explored by studying a real data example on conduct disorder; the different formulations of the FMM, the various steps in building an FMM, and how to decide between an FMM and alternative models are explained.
Abstract: The factor mixture model (FMM) uses a hybrid of both categorical and continuous latent variables. The FMM is a good model for the underlying structure of psychopathology because the use of both categorical and continuous latent variables allows the structure to be simultaneously categorical and dimensional. This is useful because both diagnostic class membership and the range of severity within and across diagnostic classes can be modeled concurrently. While the conceptualization of the FMM has been explained in the literature, the use of the FMM is still not prevalent. One reason is that there is little research about how such models should be applied in practice and, once a well-fitting model is obtained, how it should be interpreted. In this paper, the FMM will be explored by studying a real data example on conduct disorder. By exploring this example, this paper aims to explain the different formulations of the FMM, the various steps in building an FMM, as well as how to decide between an FMM and alternative models.

Journal ArticleDOI
TL;DR: This work develops a method that employs a probabilistic interpretation of the admissible region and approximates the admissible region by a Gaussian mixture to formulate an initial orbit determination solution.
Abstract: The most complete description of the state of a system at any time is given by knowledge of the probability density function, which describes the locus of possible states conditioned on any available measurement information. When employing optical data, the concept of the admissible region provides a physics-based region of the range/range-rate space that produces Earth-bound orbit solutions. This work develops a method that employs a probabilistic interpretation of the admissible region and approximates the admissible region by a Gaussian mixture to formulate an initial orbit determination solution. The Gaussian mixture representation of the probability density function is then forecast and updated with subsequent data to iteratively refine the region of uncertainty. Simulation results are presented using synthetic data over a range of orbits, in which it is shown that the new method is consistently able to initialize a probabilistic orbit solution and provide iterative refinement via follow-on tracking.

Journal ArticleDOI
TL;DR: The achieved figures of merit on the collected data validate the reliability of the proposed methods for the desired applications and point to the potential application of this study in camera-aided inertial navigation for positioning and personal assistance in future research.
Abstract: This paper presents a method for pedestrian activity classification and gait analysis based on the microelectromechanical-systems inertial measurement unit (IMU). The work targets two groups of applications, including the following: 1) human activity classification and 2) joint human activity and gait-phase classification. In the latter case, the gait phase is defined as a substate of a specific gait cycle, i.e., the states of the body between the stance and swing phases. We model the pedestrian motion with a continuous hidden Markov model (HMM) in which the output density functions are assumed to be Gaussian mixture models. For the joint activity and gait-phase classification, motivated by the cyclical nature of the IMU measurements, each individual activity is modeled by a “circular HMM.” For both of the proposed classification methods, proper feature vectors are extracted from the IMU measurements. In this paper, we report the results of experiments in which the IMU was mounted on the subjects' chests. This permits the potential application of the current study in camera-aided inertial navigation for positioning and personal assistance in future research. Five classes of activity, including walking, running, going upstairs, going downstairs, and standing, are considered in the experiments. The performance of the proposed methods is illustrated in various ways, and as an objective measure, the confusion matrix is computed and reported. The achieved figures of merit on the collected data validate the reliability of the proposed methods for the desired applications.
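A minimal sketch of the classification setup using the third-party hmmlearn package: one GMM-emission HMM per activity, with classification by maximum log-likelihood. The paper's circular transition structure and IMU feature extraction are not reproduced here; state and mixture counts are illustrative.

```python
# Minimal sketch: per-activity GMM-HMMs with max-likelihood classification.
from hmmlearn.hmm import GMMHMM

def train_activity_models(features_by_activity, n_states=5, n_mix=3):
    """features_by_activity: dict mapping activity name -> (T, D) feature matrix."""
    models = {}
    for name, X in features_by_activity.items():
        m = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type='diag', n_iter=50, random_state=0)
        models[name] = m.fit(X)
    return models

def classify(models, X):
    """Return the activity whose HMM assigns the test sequence X the highest log-likelihood."""
    return max(models, key=lambda name: models[name].score(X))
```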

Proceedings ArticleDOI
27 Aug 2013
TL;DR: This paper proposes a generalisation of the algorithm for removing background using mixtures of Gaussian distributions in which the spatial relationship between pixels is taken into account, modelling regions as mixture distributions rather than individual pixels.
Abstract: Modelling pixels using mixtures of Gaussian distributions is a popular approach for removing background in video sequences. This approach works well for static backgrounds because the pixels are assumed to be independent of each other. However, when the background is dynamic, this is not very effective. In this paper, we propose a generalisation of the algorithm where the spatial relationship between pixels is taken into account. In essence, we model regions as mixture distributions rather than individual pixels. Using experimental verification on various video sequences, we show that our method is able to model and subtract backgrounds effectively in scenes with complex dynamic textures.
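For context, the pixel-wise MoG baseline being generalized, sketched with OpenCV's built-in implementation (file name hypothetical); the region-level mixture modelling proposed in the paper is not part of OpenCV.

```python
# Minimal sketch: pixel-wise Mixture-of-Gaussians background subtraction.
import cv2

cap = cv2.VideoCapture("video.mp4")  # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # per-pixel foreground mask
cap.release()
```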

Journal ArticleDOI
TL;DR: This framework starts by formulating the minimum-mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outlines the importance of the estimation of the activities of the speakers.
Abstract: We propose a new framework for joint multichannel speech source separation and acoustic noise reduction. In this framework, we start by formulating the minimum-mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outline the importance of the estimation of the activities of the speakers. The latter is accurately achieved by introducing a latent variable that takes N+1 possible discrete states for a mixture of N speech signals plus additive noise. Each state characterizes the dominance of one of the N+1 signals. We determine the posterior probability of this latent variable, and show how it plays a twofold role in the MMSE-based speech enhancement. First, it allows the extraction of the second order statistics of the noise and each of the speech signals from the noisy data. These statistics are needed to formulate the multichannel Wiener-based filters (including the minimum variance distortionless response). Second, it weighs the outputs of these linear filters to shape the spectral contents of the signals' estimates following the associated target speakers' activities. We use the spatial and spectral cues contained in the multichannel recordings of the sound mixtures to compute the posterior probability of this latent variable. The spatial cue is acquired by using the normalized observation vector whose distribution is well approximated by a Gaussian-mixture-like model, while the spectral cue can be captured by using a pre-trained Gaussian mixture model for the log-spectra of speech. The parameters of the investigated models and the speakers' activities (posterior probabilities of the different states of the latent variable) are estimated via expectation maximization. Experimental results including comparisons with the well-known independent component analysis and masking are provided to demonstrate the efficiency of the proposed framework.

Proceedings Article
05 Dec 2013
TL;DR: This work introduces RNADE, a new model for joint density estimation of real-valued vectors that calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters.
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case.