
Showing papers on "Mixture model published in 2002"


Journal ArticleDOI
TL;DR: The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify to the good performance of our approach.
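For readers who want a concrete starting point, the following is a minimal sketch (not the paper's algorithm) of EM for a Gaussian mixture in which components whose mixing weights fall below a threshold are annihilated during fitting, giving the flavor of combining estimation with component selection in one loop. The pruning rule is a crude stand-in for the paper's MML-based criterion, and the function name, threshold, and regularization constants are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_with_pruning(X, k_init=10, min_weight=1e-2, n_iter=200, seed=0):
    """Fit a Gaussian mixture by EM, dropping components whose weight
    falls below `min_weight` (a simplified stand-in for MML-style
    component annihilation)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k_init, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k_init)
    weights = np.full(k_init, 1.0 / k_init)

    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        dens = np.column_stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)])
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: update weights, means, covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([
            ((resp[:, j, None] * (X - means[j])).T @ (X - means[j])) / nk[j]
            + 1e-6 * np.eye(d)
            for j in range(len(nk))])

        # component annihilation: drop weak components and renormalize
        keep = weights > min_weight
        if not keep.all():
            weights, means, covs = weights[keep], means[keep], covs[keep]
            weights /= weights.sum()
    return weights, means, covs
```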

2,182 citations


Book ChapterDOI
01 Jan 2002
TL;DR: This paper presents a method which improves this adaptive background mixture model by reinvestigating the update equations at different phases, which allows the system to learn faster and more accurately and to adapt effectively to changing environments.
Abstract: Real-time segmentation of moving regions in image sequences is a fundamental step in many vision systems, including automated visual surveillance, human-machine interfaces, and very low-bandwidth telecommunications. A typical method is background subtraction. Many background models have been introduced to deal with different problems. One of the successful solutions to these problems is the multi-colour background model per pixel proposed by Grimson et al. [1, 2, 3]. However, the method suffers from slow learning at the beginning, especially in busy environments. In addition, it cannot distinguish between moving shadows and moving objects. This paper presents a method which improves this adaptive background mixture model. By reinvestigating the update equations, we utilise different equations at different phases. This allows our system to learn faster and more accurately, as well as to adapt effectively to changing environments. A shadow detection scheme is also introduced in this paper. It is based on a computational colour space that makes use of our background model. A comparison has been made between the two algorithms. The results show the speed of learning and the accuracy of the model using our update algorithm over the Grimson et al. tracker. When incorporated with the shadow detection, our method results in far better segmentation than that of Grimson et al.
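As a hypothetical illustration of what a per-pixel adaptive mixture with phase-dependent learning rates might look like, here is a minimal single-channel sketch in the spirit of the Grimson et al. model. The component count, learning rates, warm-up length, and matching threshold are all assumptions for the example, not the paper's actual update equations.

```python
import numpy as np

class PixelMixture:
    """Simplified per-pixel adaptive Gaussian mixture for background
    subtraction (single grey-level channel). The fast-then-slow learning
    rate stands in for phase-dependent update equations."""

    def __init__(self, k=3, var0=15.0 ** 2, alpha_slow=0.005, warmup=100):
        self.w = np.full(k, 1.0 / k)   # mixing weights
        self.mu = np.zeros(k)          # component means
        self.var = np.full(k, var0)    # component variances
        self.alpha_slow = alpha_slow
        self.warmup = warmup
        self.n_seen = 0

    def update(self, x, match_sigma=2.5):
        """Update the mixture with pixel value x; return True if x is
        judged foreground (no component matched)."""
        self.n_seen += 1
        # phase 1: learn quickly from the first frames; phase 2: adapt slowly
        alpha = 1.0 / self.n_seen if self.n_seen <= self.warmup else self.alpha_slow
        d2 = (x - self.mu) ** 2
        matched = d2 < (match_sigma ** 2) * self.var
        if matched.any():
            j = int(np.argmin(np.where(matched, d2, np.inf)))
            self.mu[j] += alpha * (x - self.mu[j])
            self.var[j] += alpha * (d2[j] - self.var[j])
            ind = np.zeros_like(self.w)
            ind[j] = 1.0
            self.w += alpha * (ind - self.w)
        else:
            j = int(np.argmin(self.w))   # replace the weakest component
            self.mu[j], self.var[j], self.w[j] = x, 15.0 ** 2, alpha
        self.w /= self.w.sum()
        return not matched.any()
```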

1,638 citations


Journal ArticleDOI
TL;DR: To overcome the limitations of the individual models, a joint decision logic is developed, based on a maximum entropy probability model and the GLRT, that utilizes multiple decision statistics, and this approach is applied using the detection statistics derived from the three clutter models.
Abstract: We develop anomaly detectors, i.e., detectors that do not presuppose a signature model of one or more dimensions, for three clutter models: the local normal model, the global normal mixture model, and the global linear mixture model. The local normal model treats the neighborhood of a pixel as having a normal probability distribution. The normal mixture model considers the observation from each pixel as arising from one of several possible classes such that each class has a normal probability distribution. The linear mixture model considers each observation to be a linear combination of fixed spectra, known as endmembers, that are, or may be, associated with materials in the scene, and the coefficients, interpreted as fractional abundance, are constrained to be nonnegative and sum to one. We show how the generalized likelihood ratio test (GLRT) may be used to derive anomaly detectors for the local normal and global normal mixture models. The anomaly detector applied with the linear mixture approach proceeds by identifying target like endmembers based on properties of the histogram of the abundance estimates and employing a matched filter in the space of abundance estimates. To overcome the limitations of the individual models, we develop a joint decision logic, based on a maximum entropy probability model and the GLRT, that utilizes multiple decision statistics, and we apply this approach using the detection statistics derived from the three clutter models. Examples demonstrate that the joint decision logic can improve detection performance in comparison with the individual anomaly detectors. We also describe the application of linear prediction filters to repeated images of the same area to detect changes that occur within the scene over time.
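As a rough illustration of the simplest of the three clutter models, the sketch below scores each pixel by its Mahalanobis distance under a single global normal background, the classical RX-style detector; the paper's local windowing, mixture models, GLRT derivation, and joint decision logic are not shown, and the regularization constant is an assumption.

```python
import numpy as np

def rx_anomaly_scores(cube):
    """Mahalanobis-distance anomaly scores for a hyperspectral cube of
    shape (rows, cols, bands), using one global normal background model
    (a simplification of the local / normal-mixture detectors)."""
    r, c, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(b)   # regularize
    cov_inv = np.linalg.inv(cov)
    diff = X - mu
    scores = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return scores.reshape(r, c)
```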

733 citations


Journal ArticleDOI
TL;DR: It is shown how recursive computing allows the statistically efficient use of MCMC output when estimating the hidden states, and the use of log-likelihood for assessing MCMC convergence is illustrated.
Abstract: Markov chain Monte Carlo (MCMC) sampling strategies can be used to simulate hidden Markov model (HMM) parameters from their posterior distribution given observed data. Some MCMC methods used in practice (for computing likelihood, conditional probabilities of hidden states, and the most likely sequence of states) can be improved by incorporating established recursive algorithms. The most important of these is a set of forward-backward recursions calculating conditional distributions of the hidden states given observed data and model parameters. I show how to use the recursive algorithms in an MCMC context and demonstrate mathematical and empirical results showing a Gibbs sampler using the forward-backward recursions mixes more rapidly than another sampler often used for HMMs. I introduce an augmented variables technique for obtaining unique state labels in HMMs and finite mixture models. I show how recursive computing allows the statistically efficient use of MCMC output when estimating the hidden states. I...
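The core recursion can be sketched compactly. Below is a hedged example of forward filtering followed by backward sampling of the hidden path for a discrete-emission HMM with fixed parameters, the kind of building block such a Gibbs sampler would call once per sweep; the variable names and the discrete emission model are illustrative choices, not taken from the paper.

```python
import numpy as np

def forward_backward_sample(obs, pi, A, B, rng=None):
    """Forward filtering, backward sampling for a discrete HMM.
    obs: observation indices; pi: initial state probabilities (K,);
    A: transition matrix (K, K); B: emission matrix (K, n_symbols).
    Returns one draw of the hidden state path given the parameters."""
    rng = rng or np.random.default_rng()
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()          # scale to avoid underflow
    # backward sampling of the state path
    states = np.empty(T, dtype=int)
    states[T - 1] = rng.choice(K, p=alpha[T - 1])
    for t in range(T - 2, -1, -1):
        p = alpha[t] * A[:, states[t + 1]]
        states[t] = rng.choice(K, p=p / p.sum())
    return states
```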

583 citations


Journal ArticleDOI
TL;DR: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues, and relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Abstract: Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

571 citations


Proceedings Article
01 Jan 2002
TL;DR: Two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems are described.
Abstract: Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels of performance. This paper describes two GMM-based approaches to language identification that use shifted delta cepstra (SDC) feature vectors to achieve LID performance comparable to that of the best phone-based systems. The approaches include both acoustic scoring and a recently developed GMM tokenization system that is based on a variation of phonetic recognition and language modeling. System performance is evaluated on both the CallFriend and OGI corpora.
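As an illustration of how shifted delta cepstra are commonly assembled, the sketch below stacks delta vectors computed at frame offsets spaced P apart under the usual N-d-P-k parameterization; the default 7-1-3-7 configuration and the edge-clamping rule are assumptions made for the example rather than details taken from the paper.

```python
import numpy as np

def shifted_delta_cepstra(cep, N=7, d=1, P=3, k=7):
    """Build SDC features from a cepstral matrix `cep` of shape
    (n_frames, n_coeffs). Uses the usual N-d-P-k scheme: for each frame t,
    stack the delta vectors c(t + i*P + d) - c(t + i*P - d) for
    i = 0..k-1, keeping the first N cepstral coefficients."""
    c = cep[:, :N]
    T = c.shape[0]
    feats = []
    for t in range(T):
        blocks = []
        for i in range(k):
            lo, hi = t + i * P - d, t + i * P + d
            # clamp indices at the edges of the utterance
            lo, hi = max(0, min(lo, T - 1)), max(0, min(hi, T - 1))
            blocks.append(c[hi] - c[lo])
        feats.append(np.concatenate(blocks))
    return np.asarray(feats)        # shape (n_frames, N * k)
```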

459 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed speckle reduction algorithm outperforms standard wavelet denoising techniques in terms of the signal-to-noise ratio and the equivalent-number-of-looks measures in most cases and achieves better performance than the refined Lee filter.
Abstract: The granular appearance of speckle noise in synthetic aperture radar (SAR) imagery makes it very difficult to visually and automatically interpret SAR data. Therefore, speckle reduction is a prerequisite for many SAR image processing tasks. In this paper, we develop a speckle reduction algorithm by fusing the wavelet Bayesian denoising technique with Markov-random-field-based image regularization. Wavelet coefficients are modeled independently and identically by a two-state Gaussian mixture model, while their spatial dependence is characterized by a Markov random field imposed on the hidden state of Gaussian mixtures. The Expectation-Maximization algorithm is used to estimate hyperparameters and specify the mixture model, and the iterated-conditional-modes method is implemented to optimize the state configuration. The noise-free wavelet coefficients are finally estimated by a shrinkage function based on local weighted averaging of the Bayesian estimator. Experimental results show that the proposed method outperforms standard wavelet denoising techniques in terms of the signal-to-noise ratio and the equivalent-number-of-looks measures in most cases. It also achieves better performance than the refined Lee filter.

414 citations


Proceedings Article
01 Jan 2002
TL;DR: It is shown that the proposed probabilistic generative models, called parametric mixture models (PMMs), could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages.
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for the multi-class, multi-labeled text categorization problem. Conventionally, a binary classification approach has been employed, in which whether or not a text belongs to a category is judged by a binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages.

345 citations


01 Jan 2002
TL;DR: This tutorial paper describes a practical implementation of the Stauffer-Grimson algorithm and provides values for all model parameters; it also shows what approximations to the theory were made and how to improve the standard algorithm by redefining those approximations.
Abstract: The seminal video surveillance papers on moving object segmentation through adaptive Gaussian mixture models of the background image do not provide adequate information for easy replication of the work. They also do not explicitly base their algorithms on the underlying statistical theory and sometimes even suffer from errors of derivation. This tutorial paper describes a practical implementation of the Stauffer-Grimson algorithm and provides values for all model parameters. It also shows what approximations to the theory were made and how to improve the standard algorithm by redefining those approximations.

316 citations


Journal ArticleDOI
TL;DR: A clustering procedure based on the Bayesian infinite mixture model is developed and applied to clustering gene expression profiles; it allows for incorporation of uncertainties involved in model selection into the final assessment of confidence in similarities of expression profiles.
Abstract: MOTIVATION The biologic significance of results obtained through cluster analyses of gene expression data generated in microarray experiments has been demonstrated in many studies. In this article we focus on the development of a clustering procedure based on the concept of Bayesian model-averaging and a precise statistical model of expression data. RESULTS We developed a clustering procedure based on the Bayesian infinite mixture model and applied it to clustering gene expression profiles. Clusters of genes with similar expression patterns are identified from the posterior distribution of clusterings defined implicitly by the stochastic data-generation model. The posterior distribution of clusterings is estimated by a Gibbs sampler. We summarized the posterior distribution of clusterings by calculating posterior pairwise probabilities of co-expression and used the complete linkage principle to create clusters. This approach has several advantages over usual clustering procedures. The analysis allows for incorporation of a reasonable probabilistic model for generating data. The method does not require specifying the number of clusters and resulting optimal clustering is obtained by averaging over models with all possible numbers of clusters. Expression profiles that are not similar to any other profile are automatically detected, the method incorporates experimental replicates, and it can be extended to accommodate missing data. This approach represents a qualitative shift in the model-based cluster analysis of expression data because it allows for incorporation of uncertainties involved in the model selection in the final assessment of confidence in similarities of expression profiles. We also demonstrated the importance of incorporating the information on experimental variability into the clustering model. AVAILABILITY The MS Windows(TM) based program implementing the Gibbs sampler and supplemental material is available at http://homepages.uc.edu/~medvedm/BioinformaticsSupplement.htm CONTACT medvedm@email.uc.edu
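The post-processing of the Gibbs output can be illustrated as follows: given sampled cluster label vectors, compute posterior pairwise co-clustering probabilities and cut a complete-linkage tree built on one minus those probabilities. This is a sketch of the summarization step only, not the sampler itself, and the cut threshold and function names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def clusters_from_gibbs_labels(label_draws, cut=0.5):
    """label_draws: array (n_draws, n_genes) of cluster labels sampled by
    a Gibbs sampler. Returns cluster assignments obtained by complete
    linkage on 1 - posterior pairwise co-clustering probability."""
    L = np.asarray(label_draws)
    coclust = (L[:, :, None] == L[:, None, :]).mean(axis=0)  # pairwise probs
    dist = 1.0 - coclust
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method='complete')
    return fcluster(Z, t=cut, criterion='distance')
```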

312 citations


Journal ArticleDOI
01 Oct 2002
TL;DR: A new clustering algorithm is proposed, based on the expectation-maximization (EM) identification of Gaussian mixture models, which is applied to two well-known benchmark problems: the MPG prediction and a simulated second-order nonlinear process.
Abstract: The construction of interpretable Takagi-Sugeno (TS) fuzzy models by means of clustering is addressed. First, it is shown how the antecedent fuzzy sets and the corresponding consequent parameters of the TS model can be derived from clusters obtained by the Gath-Geva (GG) algorithm. To preserve the partitioning of the antecedent space, linearly transformed input variables can be used in the model. This may, however, complicate the interpretation of the rules. To form an easily interpretable model that does not use the transformed input variables, a new clustering algorithm is proposed, based on the expectation-maximization (EM) identification of Gaussian mixture models. This new technique is applied to two well-known benchmark problems: the MPG (miles per gallon) prediction and a simulated second-order nonlinear process. The obtained results are compared with results from the literature.

Journal ArticleDOI
TL;DR: In this paper, a greedy algorithm for learning a Gaussian mixture is proposed, which uses a combination of global and local search each time a new component is added to the mixture and achieves solutions superior to EM with k components in terms of the likelihood of a test set.
Abstract: Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get trapped in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture.

Journal ArticleDOI
TL;DR: In this article, a nonparametric analysis of the finite normal mixture model is obtained by working with a precise truncation approximation of the Dirichlet process, which is carried out by a simple Gibbs sampling algorithm that directly samples the non-parametric posterior.
Abstract: A rich nonparametric analysis of the finite normal mixture model is obtained by working with a precise truncation approximation of the Dirichlet process. Model fitting is carried out by a simple Gibbs sampling algorithm that directly samples the nonparametric posterior. The proposed sampler mixes well, requires no tuning parameters, and involves only draws from simple distributions, including the draw for the mass parameter that controls clustering, and the draw for the variances with the use of a nonconjugate uniform prior. Working directly with the nonparametric prior is conceptually appealing and among other things leads to graphical methods for studying the posterior mixing distribution as well as penalized MLE procedures for deriving point estimates. We discuss methods for automating selection of priors for the mean and variance components to avoid over or undersmoothing the data. We also look at the effectiveness of incorporating prior information in the form of frequentist point estimates.
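The truncation the paper relies on can be pictured via the stick-breaking construction. The sketch below draws the weights of an N-component truncated Dirichlet process and then draws component means and variances from a simple normal and inverse-gamma base measure; the base-measure hyperparameters are placeholders, and the full blocked Gibbs sampler is not shown.

```python
import numpy as np

def truncated_stick_breaking(alpha, N, rng=None):
    """Weights of an N-component truncated Dirichlet process:
    v_1..v_{N-1} ~ Beta(1, alpha), v_N = 1, w_k = v_k * prod_{j<k}(1 - v_j)."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha, size=N)
    v[-1] = 1.0                       # exact truncation: last stick takes the rest
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return w

def draw_mixture_from_prior(alpha, N, mu0=0.0, tau0=3.0, a0=2.0, b0=1.0, rng=None):
    """Draw one normal mixture from the truncated-DP prior with a simple
    normal / inverse-gamma base measure (illustrative hyperparameters)."""
    rng = rng or np.random.default_rng()
    w = truncated_stick_breaking(alpha, N, rng)
    var = 1.0 / rng.gamma(a0, 1.0 / b0, size=N)     # inverse-gamma variances
    mu = rng.normal(mu0, tau0, size=N)
    return w, mu, var
```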

Proceedings ArticleDOI
11 Aug 2002
TL;DR: Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to---or better than---the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
Abstract: The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the query and document collection on the optimal settings of retrieval parameters. As a special case, we present a two-stage smoothing method that allows us to estimate the smoothing parameters completely automatically. In the first stage, the document language model is smoothed using a Dirichlet prior with the collection language model as the reference model. In the second stage, the smoothed document language model is further interpolated with a query background language model. We propose a leave-one-out method for estimating the Dirichlet parameter of the first stage, and the use of document mixture models for estimating the interpolation parameter of the second stage. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to---or better than---the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
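The scoring formula implied by the two stages is easy to state. The sketch below computes the log query likelihood with Dirichlet-prior smoothing of the document model followed by interpolation with a query background model; here mu and lambda are fixed by hand, whereas the paper estimates them automatically (leave-one-out for the Dirichlet parameter, document mixture models for the interpolation weight).

```python
import math
from collections import Counter

def two_stage_score(query_terms, doc_terms, collection_lm, background_lm,
                    mu=1000.0, lam=0.5):
    """Log query likelihood under two-stage smoothing:
      p(w|d) = (1 - lam) * (c(w, d) + mu * p(w|C)) / (|d| + mu) + lam * p(w|U)
    where p(w|C) is the collection model and p(w|U) a query background model.
    collection_lm and background_lm are dicts mapping terms to probabilities."""
    counts = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_dir = (counts[w] + mu * collection_lm.get(w, 1e-12)) / (dlen + mu)
        p = (1.0 - lam) * p_dir + lam * background_lm.get(w, 1e-12)
        score += math.log(p)
    return score
```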

Proceedings ArticleDOI
13 May 2002
TL;DR: Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification; in this paper, the technique is generalized by using Gaussian mixture models as the basis for tokenization.
Abstract: Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification. In this paper, this technique is generalized by using Gaussian mixture models as the basis for tokenizing. Performance results are presented for a system employing a GMM tokenizer in conjunction with multiple language processing and score combination techniques. On the 1996 CallFriend LID evaluation set, a 12-way closed set error rate of 17% was obtained.

Journal ArticleDOI
01 Sep 2002-Extremes
TL;DR: A new dynamically weighted mixture model is proposed, in which one term of the mixture is the GPD and the other is a light-tailed density; the approach can be useful in unsupervised tail estimation, especially in heavy-tailed situations and for small percentiles.
Abstract: Exceedances over high thresholds are often modeled by fitting a generalized Pareto distribution (GPD) on R+. It is difficult to select the threshold above which the GPD assumption is sufficiently solid while enough data remain available for inference. We suggest a new dynamically weighted mixture model, where one term of the mixture is the GPD and the other is a light-tailed density. The weight function varies on R+ in such a way that for large values the GPD component is predominant and thus takes over the role of threshold selection. The full data set is used for inference on the parameters present in the two component distributions and in the weight function. Maximum likelihood provides estimates with approximate standard deviations. Our approach has been successfully applied to simulated data and to the (previously studied) Danish fire loss data set. We compare the new dynamic mixture method to Dupuis' robust thresholding approach in peaks-over-threshold inference. We discuss robustness with respect to the choice of the light-tailed component and the form of the weight function. We present encouraging simulation results that indicate that the new approach can be useful in unsupervised tail estimation, especially in heavy-tailed situations and for small percentiles.
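A numerical sketch of such a density is given below, with a Weibull light-tailed component, a generalized Pareto tail, and a Cauchy-cdf weight that tends to one for large values so that the GPD dominates the tail; the specific component choices and all parameter values are assumptions made for illustration, and the normalizing constant is obtained by quadrature rather than in closed form.

```python
import numpy as np
from scipy import stats, integrate

def dyn_mixture_pdf(x, weib_shape=1.5, weib_scale=1.0,
                    gpd_shape=0.4, gpd_scale=1.0,
                    w_loc=1.0, w_scale=0.5):
    """Dynamically weighted mixture density on R+:
        f(x) = [(1 - p(x)) * l(x) + p(x) * g(x)] / Z
    with l a light-tailed Weibull, g a generalized Pareto, and p a Cauchy
    cdf weight that approaches 1 for large x. Parameter values are
    placeholders."""
    x = np.asarray(x, dtype=float)
    p = stats.cauchy.cdf(x, loc=w_loc, scale=w_scale)
    light = stats.weibull_min.pdf(x, weib_shape, scale=weib_scale)
    tail = stats.genpareto.pdf(x, gpd_shape, scale=gpd_scale)
    unnorm = (1.0 - p) * light + p * tail
    # normalizing constant by quadrature over R+
    Z, _ = integrate.quad(
        lambda t: (1.0 - stats.cauchy.cdf(t, w_loc, w_scale))
                  * stats.weibull_min.pdf(t, weib_shape, scale=weib_scale)
                  + stats.cauchy.cdf(t, w_loc, w_scale)
                  * stats.genpareto.pdf(t, gpd_shape, scale=gpd_scale),
        0, np.inf)
    return unnorm / Z
```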

Journal ArticleDOI
TL;DR: A variational Bayes (VB) learning algorithm for generalized autoregressive (GAR) models is described; it reduces to the Bayesian evidence framework in the special case of Gaussian noise with uninformative priors on the noise and weight precisions, and is applied to synthetic and real data with encouraging results.
Abstract: We describe a variational Bayes (VB) learning algorithm for generalized autoregressive (GAR) models. The noise is modeled as a mixture of Gaussians rather than the usual single Gaussian. This allows different data points to be associated with different noise levels and effectively provides robust estimation of AR coefficients. The VB framework is used to prevent overfitting and provides model-order selection criteria both for AR order and noise model order. We show that for the special case of Gaussian noise and uninformative priors on the noise and weight precisions, the VB framework reduces to the Bayesian evidence framework. The algorithm is applied to synthetic and real data with encouraging results.

Journal ArticleDOI
TL;DR: This work shows that the training error surface realized by the neural network model in the feature space is useful for studying the characteristics of the distribution of the input data, and proposes a method of obtaining an error surface that matches the distribution-capturing ability of AANN models.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: Performance evaluations exhibit clear superiority of the proposed method with its improved document clustering and model selection accuracies.
Abstract: In this paper, we propose a document clustering method that strives to achieve: (1) a high accuracy of document clustering, and (2) the capability of estimating the number of clusters in the document corpus (i.e. the model selection capability). To accurately cluster the given document corpus, we employ a richer feature set to represent each document, and use the Gaussian Mixture Model (GMM) together with the Expectation-Maximization (EM) algorithm to conduct an initial document clustering. From this initial result, we identify a set of discriminative features for each cluster, and refine the initially obtained document clusters by voting on the cluster label of each document using this discriminative feature set. This self-refinement process of discriminative feature identification and cluster label voting is iteratively applied until the convergence of document clusters. On the other hand, the model selection capability is achieved by introducing randomness in the cluster initialization stage, and then discovering a value C for the number of clusters N by which running the document clustering process for a fixed number of times yields sufficiently similar results. Performance evaluations exhibit clear superiority of the proposed method with its improved document clustering and model selection accuracies. The evaluations also demonstrate how each feature as well as the cluster refinement process contribute to the document clustering accuracy.

Journal ArticleDOI
TL;DR: This paper presents a deterministic algorithm to approximately optimize the objective function by using the idea of split and merge operations previously proposed within the maximum likelihood framework, and applies the method to mixture of experts (MoE) models to show experimentally that the proposed method can find the optimal number of experts of an MoE while avoiding local maxima.

MonographDOI
16 May 2002
TL;DR: A monograph on spatial cluster modelling covering point process cluster modelling, spatial process cluster modelling, and spatio-temporal cluster modelling, including a spatio-temporal hidden process model for small area health data illustrated on Scottish birth abnormalities.
Abstract: Table of contents:
SPATIAL CLUSTER MODELLING: AN OVERVIEW (Introduction; Historical Development; Notation and Model Development)
I. POINT PROCESS CLUSTER MODELLING
SIGNIFICANCE IN SCALE-SPACE FOR CLUSTERING (Introduction; Overview; New Method; Future Directions)
STATISTICAL INFERENCE FOR COX PROCESSES (Introduction; Poisson Processes; Cox Processes; Summary Statistics; Parametric Models of Cox Processes; Estimation for Parametric Models of Cox Processes; Prediction; Discussion)
EXTRAPOLATING AND INTERPOLATING SPATIAL PATTERNS (Introduction; Formulation and Notation; Spatial Cluster Processes; Bayesian Cluster Analysis; Summary and Conclusion)
PERFECT SAMPLING FOR POINT PROCESS CLUSTER MODELLING (Introduction; Bayesian Cluster Model; Sampling from the Posterior; Specialized Examples; Leukemia Incidence in Upstate New York; Redwood Seedlings Data)
BAYESIAN ESTIMATION AND SEGMENTATION OF SPATIAL POINT PROCESSES USING VORONOI TILINGS (Introduction; Proposed Solution Framework; Intensity Estimation; Intensity Segmentation; Examples; Discussion)
II. SPATIAL PROCESS CLUSTER MODELLING
PARTITION MODELLING (Introduction; Partition Models; Piazza Road Dataset; Spatial Count Data; Discussion; Further Reading)
CLUSTER MODELLING FOR DISEASE RATE MAPPING (Introduction; Statistical Model; Posterior Calculation; Example: U.S. Cancer Mortality Atlas; Conclusions)
ANALYZING SPATIAL DATA USING SKEW-GAUSSIAN PROCESSES (Introduction; Skew-Gaussian Processes; Real Data Illustration: Spatial Potential Data; Prediction; Discussion)
ACCOUNTING FOR ABSORPTION LINES IN IMAGES OBTAINED WITH THE CHANDRA X-RAY OBSERVATORY (Statistical Challenges of the Chandra X-Ray Observatory; Modeling the Image; Absorption Lines; Spectral Models with Absorption Lines; Discussion)
SPATIAL MODELLING OF COUNT DATA: A CASE STUDY IN MODELLING BREEDING BIRD SURVEY DATA ON LARGE SPATIAL DOMAINS (Introduction; The Poisson Random Effects Model; Results; Conclusion)
III. SPATIO-TEMPORAL CLUSTER MODELLING
MODELLING STRATEGIES FOR SPATIAL-TEMPORAL DATA (Introduction; Modelling Strategy; D-D (Drift-Drift) Models; D-C (Drift-Correlation) Models; C-C (Correlation-Correlation) Models; A Unified Analysis on the Circle; Discussion)
SPATIO-TEMPORAL PARTITION MODELLING: AN EXAMPLE FROM NEUROPHYSIOLOGY (Introduction; The Neurophysiological Experiment; The Linear Inverse Solution; The Mixture Model; Classification of the Inverse Solution; Discussion)
SPATIO-TEMPORAL CLUSTER MODELLING OF SMALL AREA HEALTH DATA (Introduction; Basic Cluster Modelling Approaches; A Spatio-Temporal Hidden Process Model; Model Development; The Posterior Sampling Algorithm; Data Example: Scottish Birth Abnormalities; Discussion)
REFERENCES; INDEX; AUTHOR INDEX

Journal Article
TL;DR: In this paper, a survey of models for clumped-at-zero or zero-inflated cross-sectional data is presented, where the response for the non-zero observations is continuous and in which it is discrete.
Abstract: Applications in which data take nonnegative values but have a substantial proportion of values at zero occur in many disciplines. The modeling of such "clumped-at-zero" or "zero-inflated" data is challenging. We survey models that have been proposed. We consider cases in which the response for the non-zero observations is continuous and in which it is discrete. For the continuous and then the discrete case, we review models for analyzing cross-sectional data. We then summarize extensions for repeated measurement analyses (e.g., in longitudinal studies), for which the literature is still sparse. We also mention applications in which more than one clump can occur and we suggest problems for future research.
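As a concrete instance of one model in this family, the sketch below writes down the zero-inflated Poisson log-likelihood and fits its two parameters by direct numerical maximization; it is an illustrative example rather than anything taken from the survey, and the parameterization and optimizer choice are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def zip_negloglik(params, y):
    """Negative log-likelihood of a zero-inflated Poisson.
    params = (logit of zero-inflation prob pi, log of Poisson mean lam)."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))
    lam = np.exp(params[1])
    y = np.asarray(y)
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))          # P(Y = 0)
    ll_pos = np.log(1.0 - pi) - lam + y * np.log(lam) - gammaln(y + 1.0)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

def fit_zip(y):
    """Fit (pi, lam) of a zero-inflated Poisson by maximum likelihood."""
    res = minimize(zip_negloglik, x0=np.array([0.0, 0.0]), args=(y,),
                   method='Nelder-Mead')
    pi_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
    lam_hat = np.exp(res.x[1])
    return pi_hat, lam_hat
```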

Journal Article
TL;DR: In this paper, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution.
Abstract: The use of a finite dimensional Dirichlet prior in the finite normal mixture model has the effect of acting like a Bayesian method of sieves. Posterior consistency is directly related to the dimension of the sieve and the choice of the Dirichlet parameters in the prior. We find that naive use of the popular uniform Dirichlet prior leads to an inconsistent posterior. However, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution. The dimension of the resulting sieve can be selected easily in practice and a simple and efficient Gibbs sampler can be used to sample the posterior of the mixing distribution.

Journal ArticleDOI
TL;DR: This paper demonstrates that the unsupervised classification method, derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities, was effective in classifying complex image textures such as natural scenes and text.
Abstract: An unsupervised classification algorithm is derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities. The algorithm estimates the data density in each class by using parametric nonlinear functions that fit to the non-Gaussian structure of the data. This improves classification accuracy compared with standard Gaussian mixture models. When applied to images, the algorithm can learn efficient codes (basis functions) for images that capture the statistically significant structure intrinsic in the images. We apply this technique to the problem of unsupervised classification, segmentation, and denoising of images. We demonstrate that this method was effective in classifying complex image textures such as natural scenes and text. It was also useful for denoising and filling in missing pixels in images with complex structures. The advantage of this model is that image codes can be learned with increasing numbers of classes thus providing greater flexibility in modeling structure and in finding more image features than in either Gaussian mixture models or standard independent component analysis (ICA) algorithms.

Journal ArticleDOI
TL;DR: A method is proposed that makes it possible to infer the number of subpopulations by a mixture model, using a set of independent genetic markers and then testing the association between a genetic marker and a trait.
Abstract: Association mapping for complex diseases using unrelated individuals can be more powerful than family-based analysis in many settings. In addition, this approach has major practical advantages, including greater efficiency in sample recruitment. Association mapping may lead to false-positive findings, however, if population stratification is not properly considered. In this paper, we propose a method that makes it possible to infer the number of subpopulations by a mixture model, using a set of independent genetic markers and then testing the association between a genetic marker and a trait. The proposed method can be effectively applied in the analysis of both qualitative and quantitative traits. Extensive simulations demonstrate that the method is valid in the presence of a population structure.

Journal ArticleDOI
TL;DR: The authors developed mixture models for spatially indexed data and applied them to disease mapping in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components.
Abstract: The paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our specific focus is on Poisson-distributed data, and applications in disease mapping. We work in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components. We propose two alternative models for spatially dependent weights, based on transformations of autoregressive Gaussian processes: in one (the logistic normal model), the mixture component labels are exchangeable; in the other (the grouped continuous model), they are ordered. Reversible jump Markov chain Monte Carlo algorithms for posterior inference are developed. Finally, the performances of both of these formulations are examined on synthetic data and real data on mortality from a rare disease.

Journal ArticleDOI
TL;DR: Further advances on BYY harmony learning with modular inner representations are presented, with details on three issues, notably adaptive learning algorithms (in particular elliptic, subspace, and structural rival penalized competitive learning algorithms) in which model selection is made automatically during learning.

Journal ArticleDOI
Shy Shoham
TL;DR: New robust clustering algorithms are presented, which significantly improve upon the noise and initialization sensitivity of traditional mixture decomposition algorithms, and simplify the determination of the optimal number of clusters in the data set.

Journal ArticleDOI
TL;DR: This paper considers an application of ICA to the LSMA, referred to as ICA-based linear spectral random mixture analysis (LSRMA), which describes an image pixel as a random source resulting from a random composition of multiple spectral signatures of distinct materials in the image.
Abstract: Independent component analysis (ICA) has shown success in blind source separation and channel equalization. Its applications to remotely sensed images have been investigated in recent years. Linear spectral mixture analysis (LSMA) has been widely used for subpixel detection and mixed pixel classification. It models an image pixel as a linear mixture of materials present in an image where the material abundance fractions are assumed to be unknown and nonrandom parameters. This paper considers an application of ICA to the LSMA, referred to as ICA-based linear spectral random mixture analysis (LSRMA), which describes an image pixel as a random source resulting from a random composition of multiple spectral signatures of distinct materials in the image. It differs from the LSMA in that the abundance fractions of the material spectral signatures in the LSRMA are now considered to be unknown but random independent signal sources. Two major advantages result from the LSRMA. First, it does not require prior knowledge of the materials to be used in the linear mixture model, as required for the LSMA. Second, and most importantly, the LSRMA models the abundance fraction of each material spectral signature as an independent random signal source so that the spectral variability of materials can be described by their corresponding abundance fractions and captured more effectively in a stochastic manner.
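For contrast with LSRMA, the classical LSMA abundance estimate can be sketched as a constrained least-squares problem: nonnegative fractions that sum to one, given a known endmember matrix (exactly the prior knowledge that LSRMA avoids). The weighted-row trick used below to impose the sum-to-one constraint is a common approximation, not something prescribed by the paper, and the weight value is an assumption.

```python
import numpy as np
from scipy.optimize import nnls

def fully_constrained_unmix(pixel, E, delta=1e3):
    """Estimate abundance fractions a >= 0 with sum(a) = 1 such that
    pixel ~ E @ a, where E is (n_bands, n_endmembers). The sum-to-one
    constraint is enforced approximately by appending a heavily weighted
    row of ones; `delta` controls its weight."""
    nb, ne = E.shape
    E_aug = np.vstack([E, delta * np.ones((1, ne))])
    y_aug = np.concatenate([np.asarray(pixel, dtype=float), [delta]])
    a, _ = nnls(E_aug, y_aug)       # nonnegative least squares
    return a / a.sum()              # renormalize for an exact sum-to-one
```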

Journal ArticleDOI
TL;DR: This paper provides a theoretical underpinning to the Bayesian implementation by demonstrating consistency of the posterior distribution and demonstrates the usefulness of the mixture of linear regressions framework in Bayesian robust regression.
Abstract: Consider data (x1,y1),…,(xn,yn), where each xi may be vector valued, and the distribution of yi given xi is a mixture of linear regressions. This provides a generalization of mixture models which do not include covariates in the mixture formulation. This mixture of linear regressions formulation has appeared in the computer science literature under the name “Hierarchical Mixtures of Experts” model. This model has been considered from both frequentist and Bayesian viewpoints. We focus on the Bayesian formulation. Previously, estimation of the mixture of linear regression model has been done through straightforward Gibbs sampling with latent variables. This paper contributes to this field in three major areas. First, we provide a theoretical underpinning to the Bayesian implementation by demonstrating consistency of the posterior distribution. This demonstration is done by extending results in Barron, Schervish and Wasserman (Annals of Statistics 27: 536–561, 1999) on bracketing entropy to the regression setting. Second, we demonstrate through examples that straightforward Gibbs sampling may fail to effectively explore the posterior distribution and provide alternative algorithms that are more accurate. Third, we demonstrate the usefulness of the mixture of linear regressions framework in Bayesian robust regression. The methods described in the paper are applied to two examples.
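As a frequentist counterpart to the Gibbs-sampling approaches discussed, here is a short EM sketch for a K-component mixture of linear regressions; it is not the paper's Bayesian algorithm, and the initialization, ridge term, and iteration count are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def em_mixture_of_regressions(X, y, K=2, n_iter=200, seed=0):
    """EM for y_i | x_i ~ sum_k pi_k N(x_i @ beta_k, sigma_k^2).
    X: (n, p) design matrix (include a column of ones for an intercept)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(K, p))
    sigma = np.full(K, y.std() + 1e-6)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each observation
        dens = np.column_stack([
            pi[k] * norm.pdf(y, loc=X @ beta[k], scale=sigma[k])
            for k in range(K)]) + 1e-300
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(K):
            w = resp[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
            r = y - X @ beta[k]
            sigma[k] = np.sqrt((w * r ** 2).sum() / w.sum())
        pi = resp.mean(axis=0)
    return pi, beta, sigma
```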