
Showing papers on "Mixture model published in 1997"


Book
01 Jan 1997
TL;DR: 1. Basic Principles: The Operating Regime Approach 2. Modelling: Fuzzy Set Methods for Local Modelling and Identification 3. Modelling of Electrically Stimulated Muscle
Abstract: 1. Basic Principles: The Operating Regime Approach 2. Modelling: Fuzzy Set Methods for Local Modelling and Identification 3. Modelling of Electrically Stimulated Muscle 4. Process Modelling Using a Functional State Approach 5. Markov Mixtures of Experts 6. Active Learning With Mixture Models 7. Local Learning in Local Model Networks 8. Side Effects of Normalising Basis Functions 9. Control: Heterogeneous Control Laws 10. Local Laguerre Models 11. Multiple Model Adaptive Control 12. H∞ Control Using Multiple Linear Models 13. Synthesis of Fuzzy Control Systems Based on Linear Takagi-Sugeno Fuzzy Models

816 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach to shrinkage is proposed by placing priors on the wavelet coefficients, where the prior for each coefficient consists of a mixture of two normal distributions with different standard deviations.
Abstract: When fitting wavelet based models, shrinkage of the empirical wavelet coefficients is an effective tool for denoising the data. This article outlines a Bayesian approach to shrinkage, obtained by placing priors on the wavelet coefficients. The prior for each coefficient consists of a mixture of two normal distributions with different standard deviations. The simple and intuitive form of prior allows us to propose automatic choices of prior parameters. These parameters are chosen adaptively according to the resolution level of the coefficients, typically shrinking high resolution (frequency) coefficients more heavily. Assuming a good estimate of the background noise level, we obtain closed form expressions for the posterior means and variances of the unknown wavelet coefficients. The latter may be used to assess uncertainty in the reconstruction. Several examples are used to illustrate the method, and comparisons are made with other shrinkage methods.
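The closed-form posterior under this two-component normal prior is short enough to write out. The sketch below computes the posterior mean and variance of one wavelet coefficient, assuming a known noise level sigma; the prior parameters pi1, tau1, tau2 are illustrative placeholders, not the paper's adaptive, level-dependent choices.

```python
import numpy as np
from scipy.stats import norm

def posterior_mean_var(d, sigma, pi1, tau1, tau2):
    """Posterior mean/variance of a coefficient theta given d = theta + noise,
    under the prior theta ~ pi1 * N(0, tau1^2) + (1 - pi1) * N(0, tau2^2),
    with noise ~ N(0, sigma^2).  Everything is available in closed form."""
    pis = np.array([pi1, 1.0 - pi1])
    taus2 = np.array([tau1**2, tau2**2])
    # marginal density of d under each prior component: N(0, tau_i^2 + sigma^2)
    marg = pis * norm.pdf(d, loc=0.0, scale=np.sqrt(taus2 + sigma**2))
    w = marg / marg.sum()                    # posterior component weights
    shrink = taus2 / (taus2 + sigma**2)      # per-component linear shrinkage factor
    comp_mean = shrink * d
    comp_var = shrink * sigma**2
    mean = np.dot(w, comp_mean)
    var = np.dot(w, comp_var + comp_mean**2) - mean**2
    return mean, var

# illustrative call: a heavy-tailed/spread component and a near-zero component
print(posterior_mean_var(d=2.0, sigma=1.0, pi1=0.2, tau1=3.0, tau2=0.1))
```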

577 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a finite mixture negative binomial count model that accommodates unobserved heterogeneity in an intuitive and analytically tractable manner for six measures of medical care demand by the elderly.
Abstract: In this article we develop a finite mixture negative binomial count model that accommodates unobserved heterogeneity in an intuitive and analytically tractable manner. This model, the standard negative binomial model, and its hurdle extension are estimated for six measures of medical care demand by the elderly using a sample from the 1987 National Medical Expenditure Survey. The finite mixture model is preferred overall by statistical model selection criteria. Two points of support adequately describe the distribution of the unobserved heterogeneity, suggesting two latent populations, the ‘healthy’ and the ‘ill’, whose fitted distributions differ substantially from each other. © 1997 by John Wiley & Sons, Ltd.
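A minimal sketch of the likelihood such a two-class count mixture implies, assuming scipy's NB(n, p) parameterization and no covariates (the paper estimates a negative binomial regression with regressors; the weight w and the component parameters n1, p1, n2, p2 below are purely illustrative):

```python
import numpy as np
from scipy.stats import nbinom
from scipy.special import logsumexp

def nb2_mixture_loglik(y, params):
    """Log-likelihood of a two-component negative binomial mixture for counts y.
    params = (w, n1, p1, n2, p2): mixing weight of class 1 and NB(n, p) parameters
    of the two latent classes (think 'healthy' and 'ill')."""
    w, n1, p1, n2, p2 = params
    log_comp = np.stack([
        np.log(w) + nbinom.logpmf(y, n1, p1),
        np.log(1.0 - w) + nbinom.logpmf(y, n2, p2),
    ])
    return logsumexp(log_comp, axis=0).sum()

# evaluate at trial parameter values for a toy vector of visit counts
y = np.array([0, 0, 1, 2, 3, 5, 8, 12])
print(nb2_mixture_loglik(y, (0.6, 2.0, 0.5, 1.0, 0.1)))
```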

526 citations


Journal ArticleDOI
TL;DR: In this paper, a general finite mixture structural equation model is proposed that simultaneously treats unobserved heterogeneity and forms market segments, and that can detect unobserved moderating factors which account for heterogeneity.
Abstract: Two endemic problems face researchers in the social sciences (e.g., Marketing, Economics, Psychology, and Finance): unobserved heterogeneity and measurement error in data. Structural equation modeling is a powerful tool for dealing with these difficulties using a simultaneous equation framework with unobserved constructs and manifest indicators which are error-prone. When estimating structural equation models, however, researchers frequently treat the data as if they were collected from a single population (Muthen [Muthen, Bengt O. 1989. Latent variable modeling in heterogeneous populations. Psychometrika 54 557--585]). This assumption of homogeneity is often unrealistic. For example, in multidimensional expectancy value models, consumers from different market segments can have different belief structures (Bagozzi [Bagozzi, Richard P. 1982. A field investigation of causal relations among cognitions, affect, intentions, and behavior. J. Marketing Res. 19 562--584]). Research in satisfaction suggests that consumer decision processes vary across segments (Day [Day, Ralph L. 1977. Extending the concept of consumer satisfaction. W. D. Perreault, ed. Advances in Consumer Research, Vol. 4. Association for Consumer Research, Atlanta, 149--154]). This paper shows that aggregate analysis which ignores heterogeneity in structural equation models produces misleading results and that traditional fit statistics are not useful for detecting unobserved heterogeneity in the data. Furthermore, sequential analyses that first form groups using cluster analysis and then apply multigroup structural equation modeling are not satisfactory. We develop a general finite mixture structural equation model that simultaneously treats heterogeneity and forms market segments in the context of a specified model structure where all the observed variables are measured with error. The model is considerably more general than cluster analysis, multigroup confirmatory factor analysis, and multigroup structural equation modeling. In particular, the model subsumes several specialized models including finite mixture simultaneous equation models, finite mixture confirmatory factor analysis, and finite mixture second-order factor analysis. The finite mixture structural equation model should be of interest to academics in a wide range of disciplines (e.g., Consumer Behavior, Marketing, Economics, Finance, Psychology, and Sociology) where unobserved heterogeneity and measurement error are problematic. In addition, the model should be of interest to market researchers and product managers for two reasons. First, the model allows the manager to perform response-based segmentation using a consumer decision process model, while explicitly allowing for both measurement and structural error. Second, the model allows managers to detect unobserved moderating factors which account for heterogeneity. Once managers have identified the moderating factors, they can link segment membership to observable individual-level characteristics (e.g., socioeconomic and demographic variables) and improve marketing policy. We applied the finite mixture structural equation model to a direct marketing study of customer satisfaction and estimated a large model with 8 unobserved constructs and 23 manifest indicators. The results show that there are three consumer segments that vary considerably in terms of the importance they attach to the various dimensions of satisfaction.
In contrast, aggregate analysis is misleading because it incorrectly suggests that, except for price, all dimensions of satisfaction are significant for all consumers. Methodologically, the finite mixture model is robust; that is, the parameter estimates are stable under double cross-validation and the method can be used to test large models. Furthermore, the double cross-validation results show that the finite mixture model is superior to sequential data analysis strategies in terms of goodness-of-fit and interpretability. We performed four simulation experiments to test the robustness of the algorithm using both recursive and nonrecursive model specifications. Specifically, we examined the robustness of different model selection criteria (e.g., CAIC, BIC, and GFI) in choosing the correct number of clusters for exactly identified and overidentified models assuming that the distributional form is correctly specified. We also examined the effect of distributional misspecification (i.e., departures from multivariate normality) on model performance. The results show that when the data are heterogeneous, the standard goodness-of-fit statistics for the aggregate model are not useful for detecting heterogeneity. Furthermore, parameter recovery is poor. For the finite mixture model, however, the BIC and CAIC criteria perform well in detecting heterogeneity and in identifying the true number of segments. In particular, parameter recovery for both the measurement and structural models is highly satisfactory. The finite mixture method is robust to distributional misspecification; in addition, the method significantly outperforms aggregate and sequential data analysis methods when the form of heterogeneity is misspecified (i.e., the true model has random coefficients). Researchers and practitioners should only use the mixture methodology when substantive theory supports the structural equation model, a priori segmentation is infeasible, and theory suggests that the data are heterogeneous and belong to a finite number of unobserved groups. We expect these conditions to hold in many social science applications and, in particular, market segmentation studies. Future research should focus on large-scale simulation studies to test the structural equation mixture model using a wide range of models and statistical distributions. Theoretical research should extend the model by allowing the mixing proportions to depend on prior information and/or subject-specific variables. Finally, in order to provide a fuller treatment of heterogeneity, we need to develop a general random coefficient structural equation model. Such a model is presently unavailable in the statistical and psychometric literatures.
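The BIC/CAIC-based choice of the number of segments described above can be illustrated with a much simpler stand-in: a plain Gaussian mixture on synthetic indicator data, fitted for several candidate segment counts, with the lowest BIC selected. This is only a sketch of the selection step, not of the full mixture structural equation model, and the data and component counts are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic "indicator" data from two latent segments (stand-in for survey indicators)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 4)),
               rng.normal(2.5, 1.0, size=(150, 4))])

# choose the number of segments by an information criterion (here BIC)
bics = []
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append((k, gm.bic(X)))
best_k = min(bics, key=lambda t: t[1])[0]
print(bics, "-> selected", best_k, "segments")
```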

411 citations


Proceedings Article
01 Jan 1997
TL;DR: The shape variation displayed by a class of objects can be represented as a probability density function, allowing us to determine plausible and implausible examples of the class, and this distribution can be used in image search to locate examples of the modelled object in new images.
Abstract: The shape variation displayed by a class of objects can be represented as a probability density function, allowing us to determine plausible and implausible examples of the class. Given a training set of example shapes we can align them into a common co-ordinate frame and use kernel-based density estimation techniques to represent this distribution. Such an estimate is complex and expensive, so we generate a simpler approximation using a mixture of gaussians. We show how to calculate the distribution, and how it can be used in image search to locate examples of the modelled object in new images.
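A compact sketch of the idea, assuming the shapes have already been aligned into a common frame and flattened into vectors (the data below are random placeholders, and sklearn's GaussianMixture stands in for the paper's approximation of the kernel estimate):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# aligned training shapes: each row is a shape vector, e.g. concatenated 2D landmark
# coordinates after alignment -- synthetic stand-in data here
rng = np.random.default_rng(1)
shapes = rng.normal(size=(200, 20))

# approximate the shape density with a small Gaussian mixture
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(shapes)

# plausibility of a candidate shape during image search: its log-density under the model
candidate = rng.normal(size=(1, 20))
print("log p(shape) =", gmm.score_samples(candidate)[0])
```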

326 citations


Proceedings ArticleDOI
25 Mar 1997
TL;DR: A new image compression paradigm that combines compression efficiency with speed, and is based on an independent "infinite" mixture model which accurately captures the space-frequency characterization of the wavelet image representation, is introduced.
Abstract: We introduce a new image compression paradigm that combines compression efficiency with speed, and is based on an independent "infinite" mixture model which accurately captures the space-frequency characterization of the wavelet image representation. Specifically, we model image wavelet coefficients as being drawn from an independent generalized Gaussian distribution field, of fixed unknown shape for each subband, having zero mean and unknown slowly spatially-varying variances. Based on this model, we develop a powerful "on the fly" estimation-quantization (EQ) framework that consists of: (i) first finding the maximum-likelihood estimate of the individual spatially-varying coefficient field variances based on causal and quantized spatial neighborhood contexts; and (ii) then applying an off-line rate-distortion (R-D) optimized quantization/entropy coding strategy, implemented as a fast lookup table, that is optimally matched to the derived variance estimates. A distinctive feature of our paradigm is the dynamic switching between forward and backward adaptation modes based on the reliability of causal prediction contexts. The performance of our coder is extremely competitive with the best published results in the literature across diverse classes of images and target bitrates of interest, in both compression efficiency and processing speed. For example, our coder exceeds the objective performance of the best zerotree-based wavelet coder based on space-frequency-quantization at all bit rates for all tested images at a fraction of its complexity.
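A heavily simplified sketch of one ingredient, the causal local-variance estimate: the paper models coefficients as generalized Gaussian with slowly varying variance estimated from causal, quantized contexts and then applies R-D optimized quantization, whereas the code below only forms the zero-mean Gaussian ML variance estimate from a causal neighborhood, with a global fallback when the context is unreliable (all thresholds and the neighborhood shape are illustrative).

```python
import numpy as np

def causal_variance_estimates(coeffs, eps=1e-6, fallback=None):
    """For each wavelet coefficient, estimate a local variance from a causal
    neighborhood (left, up, up-left, up-right) of already-processed values.
    This is the ML variance estimate for a zero-mean Gaussian; the paper's coder
    uses generalized Gaussian shapes and quantized contexts, omitted here."""
    H, W = coeffs.shape
    if fallback is None:
        fallback = coeffs.var()        # "forward" mode: global subband variance
    var = np.full((H, W), fallback)
    for i in range(H):
        for j in range(W):
            nbrs = []
            if j > 0:             nbrs.append(coeffs[i, j-1])
            if i > 0:             nbrs.append(coeffs[i-1, j])
            if i > 0 and j > 0:   nbrs.append(coeffs[i-1, j-1])
            if i > 0 and j < W-1: nbrs.append(coeffs[i-1, j+1])
            if nbrs and np.any(np.abs(nbrs) > eps):   # reliable causal context
                var[i, j] = np.mean(np.square(nbrs))  # "backward" adaptation
    return var

print(causal_variance_estimates(np.random.default_rng(2).normal(size=(8, 8))).round(2))
```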

319 citations


Journal ArticleDOI
TL;DR: In this paper, various types of finite mixtures of confirmatory factor-analysis models are proposed for handling data heterogeneity, and three different sampling schemes for these mixture models are distinguished.
Abstract: In this paper, various types of finite mixtures of confirmatory factor-analysis models are proposed for handling data heterogeneity. Under the proposed mixture approach, observations are assumed to be drawn from mixtures of distinct confirmatory factor-analysis models. But each observation does not need to be identified to a particular model prior to model fitting. Several classes of mixture models are proposed. These models differ by their unique representations of data heterogeneity. Three different sampling schemes for these mixture models are distinguished. A mixed type of the these three sampling schemes is considered throughout this article. The proposed mixture approach reduces to regular multiple-group confirmatory factor-analysis under a restrictive sampling scheme, in which the structural equation model for each observation is assumed to be known. By assuming a mixture of multivariate normals for the data, maximum likelihood estimation using the EM (Expectation-Maximization) algorithm and the AS (Approximate-Scoring) method are developed, respectively. Some mixture models were fitted to a real data set for illustrating the application of the theory. Although the EM algorithm and the AS method gave similar sets of parameter estimates, the AS method was found computationally more efficient than the EM algorithm. Some comments on applying the mixture approach to structural equation modeling are made.

207 citations


Proceedings ArticleDOI
21 Apr 1997
TL;DR: An approximate maximum likelihood method for blind source separation and deconvolution of noisy signals is proposed, which is able to capture some salient features of the input signal distribution and performs generally much better than third-order or fourth-order cumulant based techniques.
Abstract: An approximate maximum likelihood method for blind source separation and deconvolution of noisy signals is proposed. This technique relies upon a data augmentation scheme, where the (unobserved) inputs are viewed as the missing data. In the technique described, the input signal distribution is modeled by a mixture of Gaussian distributions, enabling the use of explicit formulas for computing the posterior density and conditional expectation and thus avoiding Monte-Carlo integrations. Because this technique is able to capture some salient features of the input signal distribution, it performs generally much better than third-order or fourth-order cumulant based techniques.
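The explicit conditional expectation mentioned above is easy to show in the scalar, memoryless case (the paper treats the convolutive, multichannel setting; the numbers below are illustrative): with a Gaussian-mixture source and additive Gaussian noise, E[s | y] is a responsibility-weighted combination of per-component Wiener estimates.

```python
import numpy as np
from scipy.stats import norm

def posterior_source_mean(y, pis, mus, taus, sigma):
    """E[s | y] for y = s + n, where s ~ sum_k pis[k] * N(mus[k], taus[k]^2)
    and n ~ N(0, sigma^2).  Closed form, so no Monte-Carlo integration is needed."""
    pis, mus, taus = map(np.asarray, (pis, mus, taus))
    w = pis * norm.pdf(y, loc=mus, scale=np.sqrt(taus**2 + sigma**2))
    w /= w.sum()                               # posterior component responsibilities
    gain = taus**2 / (taus**2 + sigma**2)      # per-component Wiener gain
    cond_means = mus + gain * (y - mus)
    return float(np.dot(w, cond_means))

# binary-like source modeled as a two-component Gaussian mixture, observed in noise
print(posterior_source_mean(0.8, [0.5, 0.5], [-1.0, 1.0], [0.3, 0.3], 0.5))
```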

200 citations


Proceedings ArticleDOI
07 Jul 1997
TL;DR: This paper shows how PCA can be derived from a maximum-likelihood procedure, based on a specialisation of factor analysis, to develop a well-defined mixture model of principal component analyzers, and an expectation-maximisation algorithm for estimating all the model parameters is given.
Abstract: Principal component analysis (PCA) is a ubiquitous technique for data analysis but one whose effective application is restricted by its global linear character. While global nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data nonlinearity by a mixture of local PCA models. However, existing techniques are limited by the absence of a probabilistic formalism with an appropriate likelihood measure and so require an arbitrary choice of implementation strategy. This paper shows how PCA can be derived from a maximum-likelihood procedure, based on a specialisation of factor analysis. This is then extended to develop a well-defined mixture model of principal component analyzers, and an expectation-maximisation algorithm for estimating all the model parameters is given.
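For the single-model case, the maximum-likelihood solution of this factor-analysis specialisation has a closed form (the mixture extension iterates it inside EM with responsibilities). A minimal sketch, assuming more samples than dimensions and recalling that the loading matrix is only determined up to rotation:

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML probabilistic PCA: returns the loading matrix W and the
    isotropic noise variance sigma2 for a q-dimensional latent space."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(S)            # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    sigma2 = evals[q:].mean()                   # ML noise variance: mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

X = np.random.default_rng(2).normal(size=(500, 5))
W, sigma2 = ppca_ml(X, q=2)
print(W.shape, sigma2)
```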

189 citations


Journal ArticleDOI
TL;DR: In this article, three different soft classifiers (fuzzy c-means classifier, linear mixture model, and probability values from a maximum likelihood classification) were used for unmixing of coarse pixel signatures to identify four land cover classes (i.e., supervised classifications).
Abstract: Three different 'soft' classifiers (fuzzy c-means classifier, linear mixture model, and probability values from a maximum likelihood classification) were used for unmixing of coarse pixel signatures to identify four land cover classes (i.e., supervised classifications). The coarse images were generated from a 30m Thematic Mapper (TM) image; one set by mean filtering, and another using an asymmetric filter kernel to simulate Multi-Spectral Scanner (MSS) sensor sampling. These filters collapsed together windows of up to 11 × 11 pixels. The fractional maps generated by the three classifiers were compared to truth maps at the corresponding scales, and to the results of a hard maximum likelihood classification. Overall, the fuzzy c-means classifier gave the best predictions of sub-pixel landcover areas, followed by the linear mixture model. The probabilities differed little from the hard classification, suggesting that the clusters should be modelled more loosely. This paper demonstrates successful meth...
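Of the three classifiers, the linear mixture model is the simplest to sketch: each coarse pixel is modelled as a non-negative, sum-to-one combination of pure-class signatures. The endmember matrix, band count, and the sum-to-one weighting delta below are illustrative, and the sum-to-one constraint is enforced only softly via a weighted extra row.

```python
import numpy as np
from scipy.optimize import nnls

def linear_unmix(pixel, endmembers, delta=100.0):
    """Estimate sub-pixel cover fractions with the linear mixture model.
    endmembers: (n_bands, n_classes) pure-class signatures; pixel: (n_bands,).
    Non-negativity via NNLS; sum-to-one enforced by a heavily weighted row of ones."""
    E = np.vstack([endmembers, delta * np.ones(endmembers.shape[1])])
    x = np.append(pixel, delta)
    fractions, _ = nnls(E, x)
    return fractions

# toy example: 3 bands, 2 land-cover classes, pixel is a 70/30 mixture
E = np.array([[0.1, 0.6],
              [0.2, 0.5],
              [0.3, 0.4]])
x = 0.7 * E[:, 0] + 0.3 * E[:, 1]
print(linear_unmix(x, E))   # approximately [0.7, 0.3]
```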

166 citations


Journal ArticleDOI
TL;DR: This method of outlier handling combined with the classifier is applied to the well-known problem of automatic, constrained classification of chromosomes into their biological classes and it is shown that it decreases the error rate relative to the classical, normal, model by more than 50%.

Journal ArticleDOI
TL;DR: In this paper, a class of nonlinear population models with nonparametric second-stage priors is proposed for analyzing repeated nonlinear growth-curve measurements from study subjects, illustrated by a pharmacodynamic study involving longitudinal hematologic profiles.
Abstract: Population pharmacokinetic and pharmacodynamic studies require analyzing nonlinear growth curves fit to multiple measurements from study subjects. We propose a class of nonlinear population models with nonparametric second-stage priors for analyzing such data. The proposed models apply a flexible class of mixtures to implement the nonparametric second stage. The discussion is based on a pharmacodynamic study involving longitudinal data consisting of hematologic profiles (i.e., blood counts measured over time) of cancer patients undergoing chemotherapy. We describe a full posterior analysis in a Bayesian framework. This includes prediction of future observations (profiles and end points for new patients), estimation of the mean response function for observed individuals, and inference on population characteristics. The mixture model is specified and given a hyperprior distribution by means of a Dirichlet processes prior on the mixing measure. Estimation is implemented by a combination of various M...

Posted Content
TL;DR: The statistical theory is described, simulation evidence on the performance of the EM estimation algorithm is presented, and the model is applied to a psychological study on the role of emotion in goal-directed behavior.
Abstract: This paper provides a general Structural Equation finite Mixture Model and algorithm (STEMM). Substantively, the model allows the researcher to simultaneously treat heterogeneity and form groups in the context of a postulated causal (i.e., simultaneous equation regression) structure in which all the observables are measured with error. Methodologically, the model is more general than such statistical methods as cluster analysis, confirmatory multigroup factor analysis, and multigroup structural equation models. In particular the general finite mixture model includes, as special cases, finite mixtures of simultaneous equations with feedback, confirmatory factor analysis, and confirmatory second-order factor models. We describe the statistical theory, present simulation evidence on the performance of the EM estimation algorithm, and apply the model to a psychological study on the role of emotion in goal-directed behavior. Finally we discuss several avenues for future research.

Journal ArticleDOI
TL;DR: An a posteriori least squares orthogonal subspace projection (LSOSP) derived from OSP is presented on the basis of an a posteriori model so that the abundances of signatures can be estimated through observations rather than assumed to be known as in the a priori model.
Abstract: One of the primary goals of imaging spectrometry in Earth remote sensing applications is to determine identities and abundances of surface materials. In a recent study, an orthogonal subspace projection (OSP) was proposed for image classification. However, it was developed for an a priori linear spectral mixture model which did not take advantage of a posteriori knowledge of observations. In this paper, an a posteriori least squares orthogonal subspace projection (LSOSP) derived from OSP is presented on the basis of an a posteriori model so that the abundances of signatures can be estimated through observations rather than assumed to be known as in the a priori model. In order to evaluate the OSP and LSOSP approaches, a Neyman-Pearson detection theory is developed where a receiver operating characteristic (ROC) curve is used for performance analysis. In particular, a locally optimal Neyman-Pearson detector is also designed for the case where the global abundance is very small with energy close to zero, a case to which neither LSOSP nor OSP can be applied. It is shown through computer simulations that the presented LSOSP approach significantly improves the performance of OSP.
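A small numpy sketch of the projection at the core of both detectors, assuming the desired signature d and the undesired-signature matrix U (full column rank) are known; the least-squares abundance estimate shown is the usual normalization of the OSP output, with random toy signatures standing in for real spectra.

```python
import numpy as np

def osp_score(r, d, U):
    """Orthogonal subspace projection detector: project pixel r onto the complement
    of the subspace spanned by the undesired signatures (columns of U), then
    correlate with the desired signature d."""
    P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U)   # I - U (U^T U)^{-1} U^T
    return d @ P @ r

def lsosp_abundance(r, d, U):
    """Least-squares (a posteriori) abundance estimate: OSP output normalized by d^T P d."""
    P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U)
    return (d @ P @ r) / (d @ P @ d)

# toy example: 4 bands, one undesired signature, pixel with 0.4 abundance of d
rng = np.random.default_rng(5)
d, U = rng.random(4), rng.random((4, 1))
r = 0.4 * d + 0.6 * U[:, 0] + 0.01 * rng.standard_normal(4)
print(lsosp_abundance(r, d, U))   # roughly 0.4
```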

Journal ArticleDOI
TL;DR: This paper completely solves the problem of testing the size of the mixture using maximum likelihood statistics in non-identifiable models and derives the asymptotic distribution of the maximum likelihood ratio statistic, which takes an unexpected form.
Abstract: In this paper, we address the problem of testing hypotheses using maximum likelihood statistics in non-identifiable models. We derive the asymptotic distribution under very general assumptions. The key idea is a local reparameterization, depending on the underlying distribution, which is called locally conic. This method sheds light on how the general model induces the structure of the limiting distribution in terms of the dimensionality of some derivative space. We present various applications of the theory. The main application is to mixture models. Under very general assumptions, we solve completely the problem of testing the size of the mixture using maximum likelihood statistics. We derive the asymptotic distribution of the maximum likelihood ratio statistic, which takes an unexpected form.

Journal Article
TL;DR: In this paper, the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm is considered.
Abstract: In this paper we consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimation of the standard errors are considered: the standard information-based method and the computationally-intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.
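The bootstrap route can be sketched in a few lines with sklearn's GaussianMixture on synthetic univariate data (the data, the number of replicates B, and the label-alignment-by-sorting trick are all illustrative choices, not the paper's exact setup):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_se_means(x, n_components=2, B=200, seed=0):
    """Bootstrap standard errors of the component means of a univariate normal
    mixture fitted by maximum likelihood (EM).  Components are sorted by mean in
    each replicate -- a simple, imperfect fix for label switching."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x).reshape(-1, 1)
    means = []
    for _ in range(B):
        xb = x[rng.integers(0, len(x), size=len(x))]
        gm = GaussianMixture(n_components=n_components, n_init=3,
                             random_state=0).fit(xb)
        means.append(np.sort(gm.means_.ravel()))
    return np.std(means, axis=0, ddof=1)

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 150),
                    np.random.default_rng(2).normal(4, 1, 150)])
print(bootstrap_se_means(x))
```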

Journal ArticleDOI
TL;DR: In this paper, a general Structural Equation finite mixture model and algorithm (STEMM) is proposed, which allows the researcher to simultaneously treat heterogeneity and form groups in the context of a postulated causal (i.e., simultaneous equation regression) structure in which all the observables are measured with error.
Abstract: This paper provides a general Structural Equation finite Mixture Model and algorithm (STEMM). Substantively, the model allows the researcher to simultaneously treat heterogeneity and form groups in the context of a postulated causal (i.e., simultaneous equation regression) structure in which all the observables are measured with error. Methodologically, the model is more general than such statistical methods as cluster analysis, confirmatory multigroup factor analysis, and multigroup structural equation models. In particular the general finite mixture model includes, as special cases, finite mixtures of simultaneous equations with feedback, confirmatory factor analysis, and confirmatory second-order factor models. We describe the statistical theory, present simulation evidence on the performance of the EM estimation algorithm, and apply the model to a psychological study on the role of emotion in goal-directed behavior. Finally we discuss several avenues for future research.

Journal ArticleDOI
TL;DR: Viewing the error as a combination of two terms, the approximation error (measuring the adequacy of the model) and the estimation error (resulting from the finiteness of the sample size), upper bounds on the expected total error are derived, yielding bounds on the rate of convergence.

Proceedings Article
01 Jan 1997
TL;DR: A statistical model of pitch is developed that allows unbiased estimation of pitch statistics from pitch tracks subject to doubling and/or halving; a simple correlation argument and QQ plots show that "clean" pitch is distributed lognormally rather than with the often assumed normal distribution.
Abstract: Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks which are subject to doubling and/or halving. We first argue by a simple correlation model and empirically demonstrate by QQ plots that “clean” pitch is distributed with a lognormal distribution rather than the often assumed normal distribution. Second, we present a probabilistic model for estimated pitch via a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances for a total of four free parameters. We use the obtained pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (subset of Switchboard) and report results on the most difficult portion of the database: the “one-session” condition with males only for both the claimant and imposter speakers. Pitch statistics provide 22% reduction in false alarm rate at 1% miss rate and 11% reduction in false alarm rate at 10% miss rate over the cepstrum-only system.
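The tied three-component lognormal mixture described above is compact enough to write down directly: in the log domain the halved and doubled components are the clean component shifted by ∓log 2, leaving four free parameters. The parameter values in the example call are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def pitch_density(f0, mu, sigma, w_half, w_double):
    """Density of tracked pitch f0 under a three-component lognormal mixture with
    tied parameters: clean, halved, and doubled pitch.  In the log domain the
    components are N(mu, sigma^2) shifted by 0, -log 2, +log 2, so the model has
    four free parameters (mu, sigma, w_half, w_double)."""
    w_clean = 1.0 - w_half - w_double
    z = np.log(f0)
    dens = (w_clean  * norm.pdf(z, mu,             sigma) +
            w_half   * norm.pdf(z, mu - np.log(2), sigma) +
            w_double * norm.pdf(z, mu + np.log(2), sigma))
    return dens / f0          # change of variables from log(f0) back to f0

print(pitch_density(np.array([100.0, 200.0, 50.0]), np.log(100.0), 0.15, 0.05, 0.05))
```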

Proceedings ArticleDOI
07 Jul 1997
TL;DR: Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence.
Abstract: This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence. We show that these characteristics are well described by simple mixture models based on two-stage exponential distributions which can be trained using the EM algorithm. The resulting distance distributions can then be incorporated as penalizing features in an exponential language model.
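As a rough stand-in for the EM training of such distance distributions, here is EM for a plain two-component exponential mixture over inter-word distances; the paper's two-stage parameterization differs, and the synthetic distances below are invented for the example.

```python
import numpy as np

def fit_two_exponential_mixture(d, iters=200):
    """EM for p(d) = w * a * exp(-a d) + (1 - w) * b * exp(-b d), a simple mixture
    of exponentials over word-distance data d (a sketch, not the paper's exact model)."""
    d = np.asarray(d, dtype=float)
    w, a, b = 0.5, 1.0 / d.mean(), 0.1 / d.mean()      # crude initialization
    for _ in range(iters):
        p1 = w * a * np.exp(-a * d)
        p2 = (1 - w) * b * np.exp(-b * d)
        r = p1 / (p1 + p2)                  # E-step: responsibility of component 1
        w = r.mean()                        # M-step: weight and rate updates
        a = r.sum() / (r * d).sum()
        b = (1 - r).sum() / ((1 - r) * d).sum()
    return w, a, b

rng = np.random.default_rng(3)
d = np.concatenate([rng.exponential(5.0, 500), rng.exponential(80.0, 500)])
print(fit_two_exponential_mixture(d))
```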

01 Jan 1997
TL;DR: A learning based approach to speech recognition and person recognition from image sequences and it is shown that, besides speech information, the recovered model parameters also contain person dependent information and a novel method for person recognition is presented which is based on these features.
Abstract: This thesis presents a learning based approach to speech recognition and person recognition from image sequences. An appearance based model of the articulators is learned from example images and is used to locate, track, and recover visual speech features. A major difficulty in model based approaches is to develop a scheme which is general enough to account for the large appearance variability of objects but which does not lack in specificity. The method described here decomposes the lip shape and the intensities in the mouth region into weighted sums of basis shapes and basis intensities, respectively, using a Karhunen-Loeve expansion. The intensities deform with the shape model to provide shape independent intensity information. This information is used in image search, which is based on a similarity measure between the model and the image. Visual speech features can be recovered from the tracking results and represent shape and intensity information. A speechreading (lip-reading) system is presented which models these features by Gaussian distributions and their temporal dependencies by hidden Markov models. The models are trained using the EM-algorithm and speech recognition is performed based on maximum posterior probability classification. It is shown that, besides speech information, the recovered model parameters also contain person dependent information and a novel method for person recognition is presented which is based on these features. Talking persons are represented by spatio-temporal models which describe the appearance of the articulators and their temporal changes during speech production. Two different topologies for speaker models are described: Gaussian mixture models and hidden Markov models. The proposed methods were evaluated for lip localisation, lip tracking, speech recognition, and speaker recognition on an isolated digit database of 12 subjects, and on a continuous digit database of 37 subjects. The techniques were found to achieve good performance for all tasks listed above. For an isolated digit recognition task, the speechreading system outperformed previously reported systems and performed slightly better than untrained human speechreaders.

Journal ArticleDOI
TL;DR: A new learning algorithm for regression modeling based on deterministic annealing is proposed; it consistently and substantially outperformed the competing methods for training NRBF and HME regression functions over a variety of benchmark regression examples.
Abstract: We propose a new learning algorithm for regression modeling. The method is especially suitable for optimizing neural network structures that are amenable to a statistical description as mixture models. These include mixture of experts, hierarchical mixture of experts (HME), and normalized radial basis functions (NRBF). Unlike recent maximum likelihood (ML) approaches, we directly minimize the (squared) regression error. We use the probabilistic framework as means to define an optimization method that avoids many shallow local minima on the complex cost surface. Our method is based on deterministic annealing (DA), where the entropy of the system is gradually reduced, with the expected regression cost (energy) minimized at each entropy level. The corresponding Lagrangian is the system's "free-energy", and this annealing process is controlled by variation of the Lagrange multiplier, which acts as a "temperature" parameter. The new method consistently and substantially outperformed the competing methods for training NRBF and HME regression functions over a variety of benchmark regression examples.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new method to estimate the number of different populations when a large sample of a mixture of these populations is observed, defining that number as the number of support points of the mixing distribution.
Abstract: We propose a new method to estimate the number of different populations when a large sample of a mixture of these populations is observed. It is possible to define the number of different populations as the number of points in the support of the mixing distribution. For discrete distributions having a finite support, the number of support points can be characterized by Hankel matrices of the first algebraic moments, or Toeplitz matrices of the trigonometric moments. Namely, for one-dimensional distributions, the cardinality of the support may be proved to be the least integer such that the Hankel matrix (or the Toeplitz matrix) degenerates. Our estimator is based on this property. We first prove the convergence of the estimator, and then its exponential convergence under wide assumptions. The number of populations is not a priori bounded. Our method applies to a large number of models such as translation mixtures with known or unknown variance, scale mixtures, exponential families and various multivariate models. The method has an obvious computational advantage since it avoids any computation of estimates of the mixing parameters. Finally we give some numerical examples to illustrate the effectiveness of the method in the most popular cases.
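The Hankel-matrix property underlying the estimator can be illustrated with the exact algebraic moments of a known three-point mixing distribution; the actual estimator replaces these with (penalized) empirical moments, which the sketch below does not attempt.

```python
import numpy as np

def hankel_dets(moments):
    """Determinants of the Hankel matrices H_k = (m_{i+j})_{0<=i,j<=k} built from
    algebraic moments m_0, m_1, ...  For a distribution supported on p points, the
    smallest k with a (near-)zero determinant equals p."""
    K = (len(moments) - 1) // 2
    dets = []
    for k in range(1, K + 1):
        H = np.array([[moments[i + j] for j in range(k + 1)] for i in range(k + 1)])
        dets.append(np.linalg.det(H))
    return dets

# mixing distribution with 3 support points {-1, 0, 2} and weights {0.2, 0.5, 0.3}
support, weights = np.array([-1.0, 0.0, 2.0]), np.array([0.2, 0.5, 0.3])
moments = [np.sum(weights * support**j) for j in range(8)]
print(hankel_dets(moments))   # the determinant drops to ~0 at k = 3
```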

Journal ArticleDOI
TL;DR: In this article, the authors used a mixture model to investigate pro-son bias in child health outcomes in Bangladesh and found systematic differences in health outcomes between the two groups.

Journal ArticleDOI
TL;DR: The self-organizing map (SOM) algorithm for finite data is derived as an approximate maximum a posteriori estimation algorithm for a Gaussian mixture model with a Gaussian smoothing prior, which is equivalent to a generalized deformable model (GDM).
Abstract: The self-organizing map (SOM) algorithm for finite data is derived as an approximate maximum a posteriori estimation algorithm for a Gaussian mixture model with a Gaussian smoothing prior, which is equivalent to a generalized deformable model (GDM). For this model, objective criteria for selecting hyperparameters are obtained on the basis of empirical Bayesian estimation and cross-validation, which are representative model selection methods. The properties of these criteria are compared by simulation experiments. These experiments show that the cross-validation methods favor more complex structures than the expected log likelihood supports, which is a measure of compatibility between a model and data distribution. On the other hand, the empirical Bayesian methods have the opposite bias.

Book ChapterDOI
TL;DR: The concept of restricted maximum likelihood estimation (REML), robust REML estimation, and Fellner's algorithmic approach are described in the chapter, which summarizes estimation based on maximising the Gaussian likelihood and discusses estimation based on maximising a Student t likelihood and other modifications to the Gaussian likelihood.
Abstract: This chapter discusses various approaches for the robust estimation of mixed models. It summarizes estimation based on maximising the Gaussian likelihood and discusses estimation based on maximising a Student t likelihood and other modifications to the Gaussian likelihood. The concept of restricted maximum likelihood estimation (REML), robust REML estimation, and Fellner's algorithmic approach are described in the chapter. The classical mixed linear model is obtained by assuming that the error components have Gaussian distributions. In this case, estimation is relatively straightforward. However, in practice, any of the observed error components can contain outliers, which result in non-Gaussian distributions for the error components and hence the response. Outlier contamination is often usefully represented by a mixture model in which a Gaussian kernel or core model is mixed with a contaminating distribution. Within this framework, various objectives can be entertained. Depending on the context, the following objectives can be considered: (1) estimating the parameters of the distributions of the error components, (2) estimating the variances of the distributions of the error components, whatever they happen to be, and (3) estimating the parameters of the core Gaussian distributions.

Book ChapterDOI
01 Jan 1997
TL;DR: In this paper, a discrete mixture distribution model is applied to item response data, where a particular IRT model does not hold for the entire sample but that different sets of model parameters (item parameters, ability parameters, etc.) are valid for different subpopulations.
Abstract: Discrete mixture distribution models (MDM) assume that observed data do not stem from a homogeneous population of individuals but are a mixture of data from two or more latent populations (Everitt and Hand, 1981; Titterington et al., 1985). Applied to item response data this means that a particular IRT model does not hold for the entire sample but that different sets of model parameters (item parameters, ability parameters, etc.) are valid for different subpopulations.

01 Jan 1997
TL;DR: A main theme throughout is to utilize independencies between attributes, to decrease the number of free parameters, and thus to increase the generalization capability of the method.
Abstract: This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a one-layer network implementing a naive Bayesian classifier. It is based on the assumption that different attributes of the objects appear independently of each other. This work has been aimed at extending the original Bayesian neural network model, mainly focusing on three different aspects. First the model is extended to a multi-layer network, to relax the independence requirement. This is done by introducing a hidden layer of complex columns, groups of units which take input from the same set of input attributes. Two different types of complex column structures in the hidden layer are studied and compared. An information theoretic measure is used to decide which input attributes to consider together in complex columns. Also used are ideas from Bayesian statistics, as a means to estimate the probabilities from data which are required to set up the weights and biases in the neural network. The use of uncertain evidence and continuous valued attributes in the Bayesian neural network are also treated. Both things require the network to handle graded inputs, i.e. probability distributions over some discrete attributes given as input. Continuous valued attributes can then be handled by using mixture models. In effect, each mixture model converts a set of continuous valued inputs to a discrete number of probabilities for the component densities in the mixture model. Finally a query-reply system based on the Bayesian neural network is described. It constitutes a kind of expert system shell on top of the network. Rather than requiring all attributes to be given at once, the system can ask for the attributes relevant for the classification. Information theory is used to select the attributes to ask for. The system also offers an explanatory mechanism, which can give simple explanations of the state of the network, in terms of which inputs mean the most for the outputs. These extensions to the Bayesian neural network model are evaluated on a set of different databases, both realistic and synthetic, and the classification results are compared to those of various other classification methods on the same databases. The conclusion is that the Bayesian neural network model compares favorably to other methods for classification. In this work much inspiration has been taken from various branches of machine learning. The goal has been to combine the different ideas into one consistent and useful neural network model. A main theme throughout is to utilize independencies between attributes, to decrease the number of free parameters, and thus to increase the generalization capability of the method. Significant contributions are the method used to combine the outputs from mixture models over different subspaces of the domain, and the use of Bayesian estimation of parameters in the expectation maximization method during training of the mixture models.

Proceedings Article
01 Dec 1997
TL;DR: The technique of stacking, previously only used for supervised learning, is applied to unsupervised learning and used for non-parametric multivariate density estimation, to combine finite mixture model and kernel density estimators.
Abstract: In this paper, the technique of stacking, previously only used for supervised learning, is applied to unsupervised learning. Specifically, it is used for non-parametric multivariate density estimation, to combine finite mixture model and kernel density estimators. Experimental results on both simulated data and real world data sets clearly demonstrate that stacked density estimation outperforms other strategies such as choosing the single best model based on cross-validation, combining with uniform weights, and even the single best model chosen by "cheating" by looking at the data used for independent testing.
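A simplified sketch of stacking a finite Gaussian mixture with a kernel density estimator: out-of-fold densities determine a convex combination weight by a 1-D grid search before both base estimators are refit on all data. The paper estimates the stacking weights differently (e.g., via EM), and the bandwidth, component count, and synthetic data here are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import KFold

def stacked_density(X, n_components=3, bandwidth=0.5):
    """Combine a Gaussian mixture and a KDE: cross-validated out-of-fold densities
    choose a convex weight, then both base estimators are refit on all of X."""
    oof = np.zeros((len(X), 2))
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=n_components, random_state=0).fit(X[tr])
        kde = KernelDensity(bandwidth=bandwidth).fit(X[tr])
        oof[te, 0] = np.exp(gm.score_samples(X[te]))
        oof[te, 1] = np.exp(kde.score_samples(X[te]))
    grid = np.linspace(0.0, 1.0, 101)
    scores = [np.log(w * oof[:, 0] + (1 - w) * oof[:, 1] + 1e-300).sum() for w in grid]
    w = grid[int(np.argmax(scores))]
    gm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    # stacked density at x: w * exp(gm.score_samples(x)) + (1 - w) * exp(kde.score_samples(x))
    return w, gm, kde

X = np.random.default_rng(4).normal(size=(300, 2))
w, gm, kde = stacked_density(X)
print("weight on the mixture model:", w)
```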

Journal ArticleDOI
TL;DR: This paper shows how multiple shape hypotheses can be used to recognise complex line patterns using the expectation-maximisation algorithm, and illustrates the effectiveness of the recognition strategy by studying the registration of noisy radar data against a database of alternative cartographic maps for different locations.