
Showing papers on "Mixture model published in 1994"


Proceedings Article
01 Jan 1994
TL;DR: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences.
Abstract: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.

4,978 citations
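
A minimal sketch of the core idea described above (not the authors' MEME implementation): every width-W window of the sequences is treated as a draw from a two-component mixture of a position-specific motif model and a background letter distribution, fitted by EM. The toy sequences, width, and pseudocount are illustrative assumptions, and the "erase and repeat" step for finding multiple motifs is omitted.

```python
# Sketch: EM for a two-component mixture over fixed-width sequence windows.
# One component is a position-specific letter-frequency matrix (the motif),
# the other a background letter distribution. Illustrative only.
import numpy as np

ALPHABET = "ACGT"
IDX = {a: i for i, a in enumerate(ALPHABET)}

def windows(seqs, W):
    """All overlapping width-W windows, encoded as integer arrays."""
    out = []
    for s in seqs:
        for i in range(len(s) - W + 1):
            out.append([IDX[c] for c in s[i:i + W]])
    return np.array(out)

def em_motif(X, n_iter=200, pseudo=0.1, rng=np.random.default_rng(0)):
    n, W = X.shape
    A = len(ALPHABET)
    theta = rng.dirichlet(np.ones(A), size=W)   # motif model: W x A frequencies
    bg = np.full(A, 1.0 / A)                    # background letter frequencies
    lam = 0.1                                   # mixing proportion of the motif component
    one_hot = np.eye(A)[X]                      # n x W x A
    for _ in range(n_iter):
        # E-step: posterior probability that each window came from the motif component
        log_m = np.einsum("nwa,wa->n", one_hot, np.log(theta))
        log_b = one_hot.sum(axis=1) @ np.log(bg)
        num = lam * np.exp(log_m)
        z = num / (num + (1 - lam) * np.exp(log_b))
        # M-step: reestimate motif, background, and mixing proportion (with pseudocounts)
        theta = np.einsum("n,nwa->wa", z, one_hot) + pseudo
        theta /= theta.sum(axis=1, keepdims=True)
        bg = np.einsum("n,nwa->a", 1 - z, one_hot) + pseudo
        bg /= bg.sum()
        lam = z.mean()
    return theta, bg, lam, z

seqs = ["ACGTACGTTTACGT", "GGGACGTTTAAACC", "TTTACGTTTGGGAA"]
theta, bg, lam, z = em_motif(windows(seqs, W=5))
print(np.round(theta, 2), round(lam, 2))
```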


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.

2,418 citations
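
As a concrete illustration of the EM procedure described above, here is a minimal sketch of a single-level mixture of linear-Gaussian experts with a softmax gate; the full hierarchical, GLIM-based architecture of the paper is not reproduced. The toy piecewise-linear data are made up, and the gating parameters are updated by a few gradient steps (a generalized EM step) rather than the paper's inner-loop fit.

```python
# Sketch: EM for a one-level mixture-of-experts regression model on toy 1-D data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = np.where(x < 0, 2 * x + 1, -x + 2) + rng.normal(0, 0.2, 400)   # piecewise-linear data
X = np.column_stack([x, np.ones_like(x)])                          # inputs with bias term

K, D = 2, X.shape[1]
V = rng.normal(0, 0.1, size=(K, D))     # gating parameters
W = rng.normal(0, 1.0, size=(K, D))     # expert regression weights
sig2 = np.full(K, 1.0)                  # expert noise variances

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(100):
    # E-step: posterior responsibility of each expert for each data point
    g = softmax(X @ V.T)                                            # gate outputs, n x K
    lik = np.exp(-0.5 * (y[:, None] - X @ W.T) ** 2 / sig2) / np.sqrt(sig2)
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)
    # M-step for the experts: weighted least squares and weighted residual variance
    for k in range(K):
        Xw = X * h[:, k:k + 1]
        W[k] = np.linalg.solve(Xw.T @ X + 1e-6 * np.eye(D), Xw.T @ y)
        sig2[k] = max((h[:, k] * (y - X @ W[k]) ** 2).sum() / h[:, k].sum(), 1e-4)
    # (generalized) M-step for the gate: a few gradient ascent steps on the expected log-likelihood
    for _ in range(10):
        g = softmax(X @ V.T)
        V += 0.1 * (h - g).T @ X / len(y)

# expert weights should roughly recover the two linear pieces (2x+1 and -x+2),
# in some order; EM can of course stop in a local optimum
print(np.round(W, 2))
```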


Journal ArticleDOI
TL;DR: In this paper, the posterior distribution and Bayes estimators are evaluated by Gibbs sampling, relying on the missing data structure of the mixture model; the data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters.
Abstract: SUMMARY A formal Bayesian analysis of a mixture model usually leads to intractable calculations, since the posterior distribution takes into account all the partitions of the sample. We present approximation methods which evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters. The fully conditional Gibbs alternative is shown to be ergodic and geometric convergence is established in the normal case. We also consider non-informative approximations associated with improper priors, assuming that the sample corresponds exactly to a k-component mixture.

895 citations
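
A minimal sketch of the data-augmentation Gibbs sampler for a univariate k-component normal mixture under conjugate priors; the hyperparameter values and data below are illustrative assumptions, not taken from the paper.

```python
# Sketch: Gibbs sampling for a normal mixture, alternating between the missing
# component labels and the mixture parameters.
import numpy as np

def gibbs_normal_mixture(y, k=2, n_iter=2000, rng=np.random.default_rng(1)):
    n = len(y)
    mu = rng.choice(y, size=k)          # component means
    sig2 = np.full(k, y.var())          # component variances
    w = np.full(k, 1.0 / k)             # mixing weights
    a0, b0, m0, tau2 = 2.0, y.var(), y.mean(), 10.0 * y.var()  # illustrative prior hyperparameters
    draws = []
    for _ in range(n_iter):
        # 1) sample the latent allocations z_i given the current parameters
        dens = w * np.exp(-0.5 * (y[:, None] - mu) ** 2 / sig2) / np.sqrt(sig2)
        p = dens / dens.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=pi) for pi in p])
        # 2) sample the parameters given the allocations
        for j in range(k):
            yj = y[z == j]
            nj = len(yj)
            var_post = 1.0 / (nj / sig2[j] + 1.0 / tau2)
            mean_post = var_post * (yj.sum() / sig2[j] + m0 / tau2)
            mu[j] = rng.normal(mean_post, np.sqrt(var_post))
            a_post = a0 + nj / 2.0
            b_post = b0 + 0.5 * ((yj - mu[j]) ** 2).sum()
            sig2[j] = 1.0 / rng.gamma(a_post, 1.0 / b_post)   # inverse-gamma draw
        w = rng.dirichlet(1.0 + np.bincount(z, minlength=k))
        draws.append((mu.copy(), sig2.copy(), w.copy()))
    return draws

rng_data = np.random.default_rng(0)
y = np.concatenate([rng_data.normal(-2.0, 1.0, 150), rng_data.normal(3.0, 1.0, 100)])
samples = gibbs_normal_mixture(y)
```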


Journal ArticleDOI
TL;DR: In this article, a new class of pattern-mixture models for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables is described.
Abstract: SUMMARY Likelihood-based methods are developed for analyzing a random sample on two continuous variables when values of one of the variables are missing. Normal maximum likelihood estimates when values are missing completely at random were derived by Anderson (1957). They are also maximum likelihood providing the missing-data mechanism is ignorable, in Rubin's (1976) sense that the mechanism depends only on observed data. A new class of pattern-mixture models (Little, 1993) is described for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables. Maximum likelihood for models in this class is straightforward, and yields the estimates of Anderson (1957) when missingness depends solely on the completely observed variable, and the estimates of Brown (1990) when missingness depends solely on the incompletely observed variable. Another choice of linear combination yields estimates from complete-case analysis. Large-sample and Bayesian methods are described for this model. The data do not supply information about the ratio of the coefficients of the linear combination that controls missingness. If this ratio is not well determined based on prior knowledge, a prior distribution can be specified, and Bayesian inference is then readily accomplished. Alternatively, sensitivity of inferences can be displayed for a variety of choices of the ratio.

395 citations


Journal ArticleDOI
TL;DR: In this article, two approaches to estimating sub-pixel land cover composition are investigated, a linear mixture model and a regression model based on fuzzy membership functions, and significant correlation coefficients, all > 0·7, between the actual and predicted proportion of a land cover type within a pixel were obtained.
Abstract: Mixed pixels occur commonly in remotely-sensed imagery, especially those with a coarse spatial resolution. They are a problem in land-cover mapping applications since image classification routines assume ‘pure’ or homogeneous pixels. By unmixing a pixel into its component parts it is possible to enable, inter alia, more accurate estimation of the areal extent of different land cover classes. In this paper two approaches to estimating sub-pixel land cover composition are investigated. One is a linear mixture model; the other is a regression model based on fuzzy membership functions. For both approaches significant correlation coefficients, all >0·7, between the actual and predicted proportion of a land cover type within a pixel were obtained. Additionally a case study is presented in which the accuracy of the estimation of tropical forest extent is increased significantly through the use of sub-pixel estimates of land-cover composition rather than a conventional image classification.

370 citations
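
A minimal sketch of the linear mixture model approach (the fuzzy-membership regression alternative is not shown): each pixel spectrum is modelled as the endmember spectra weighted by cover fractions, solved by least squares with a sum-to-one constraint. The endmember spectra and pixel values below are made-up numbers.

```python
# Sketch: sub-pixel unmixing of a pixel spectrum into cover-type fractions.
import numpy as np

def unmix(pixel, endmembers):
    """pixel: (bands,) reflectance; endmembers: (bands, classes) pure-cover spectra."""
    bands, classes = endmembers.shape
    # append the sum-to-one constraint as an extra, heavily weighted equation
    A = np.vstack([endmembers, 100.0 * np.ones((1, classes))])
    b = np.append(pixel, 100.0)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(f, 0.0, 1.0)   # crude non-negativity clip, adequate for the sketch

endmembers = np.array([[0.10, 0.45, 0.30],    # band 1 reflectance of three cover types
                       [0.20, 0.50, 0.25],    # band 2
                       [0.60, 0.05, 0.35]])   # band 3
pixel = 0.6 * endmembers[:, 0] + 0.4 * endmembers[:, 1]   # a 60/40 mixed pixel
print(unmix(pixel, endmembers))                            # roughly [0.6, 0.4, 0.0]
```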


ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.

243 citations


Book ChapterDOI
01 Jan 1994
TL;DR: Analysis of clusters by means of mixture distribution, called mixture-model cluster analysis, has been one of the most difficult problems in statistics but theoretical work coupled with the development of new computational tools in the past ten years has made it possible to overcome some of the intractable technical and numerical issues.
Abstract: Analysis of clusters by means of mixture distributions, called mixture-model cluster analysis, has been one of the most difficult problems in statistics. But theoretical work, coupled with the development of new computational tools in the past ten years, has made it possible to overcome some of the intractable technical and numerical issues that have limited the widespread applicability of mixture-model cluster analysis to complex real-world problems. The development of new objective analysis techniques had to await the emergence of information-based model selection procedures to overcome difficulties with conventional techniques within the context of mixture-model cluster analysis. See, e.g., Bozdogan (1992), Windham and Cutler (1993) (in this volume).

204 citations


Journal ArticleDOI
TL;DR: A robust method for novelty detection is developed, which aims to minimize the number of heuristically chosen thresholds in the novelty decision process by growing a gaussian mixture model to form a representation of a training set of normal system states.
Abstract: The detection of novel or abnormal input vectors is of importance in many monitoring tasks, such as fault detection in complex systems and detection of abnormal patterns in medical diagnostics. We have developed a robust method for novelty detection, which aims to minimize the number of heuristically chosen thresholds in the novelty decision process. We achieve this by growing a gaussian mixture model to form a representation of a training set of "normal" system states. When previously unseen data are to be screened for novelty we use the same threshold as was used during training to define a novelty decision boundary. We show on a sample problem of medical signal processing that this method is capable of providing robust novelty decision boundaries and apply the technique to the detection of epileptic seizures within a data record.

195 citations
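
A minimal sketch of the general idea, assuming a fixed number of components and a simple training-set quantile threshold rather than the authors' procedure for growing the mixture: fit a Gaussian mixture to "normal" training data and flag test points whose log-likelihood falls below a threshold fixed on the training set.

```python
# Sketch: Gaussian-mixture novelty detection with a log-likelihood threshold.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))          # "normal" system states
gmm = GaussianMixture(n_components=3, random_state=0).fit(train)

# the same threshold used during training defines the novelty decision boundary
threshold = np.quantile(gmm.score_samples(train), 0.01)

test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),
                  rng.normal(6.0, 1.0, size=(5, 2))])  # last five points are novel
is_novel = gmm.score_samples(test) < threshold
print(is_novel)
```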


Posted Content
TL;DR: In finite mixture models, a sample of observations arises from a (initially specified) number of underlying classes of unknown proportions, and the purpose of the finite mixture approach is to decompose the sample into its mixture components.
Abstract: The development of mixture models can be historically traced back to the work of Newcomb (1886) and Pearson (1894). Mixture distributions have been of considerable interest in recent years, leading to a vast number of methodological and applied papers, as well as to three dedicated monographs (cf. Everitt and Hand, 1981; Titterington, Smith, and Makov, 1985; and McLachlan and Basford, 1988). In finite mixture models, it is assumed that a sample of observations arises from a (initially specified) number of underlying classes of unknown proportions. A concrete form of the density of the observations in each of the underlying classes is specified, and the purpose of the finite mixture approach is to decompose the sample into its mixture components.

170 citations
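
A short illustration of the finite mixture formulation described above, with arbitrary two-component normal parameters: the sample density is a weighted sum of class densities, and each observation can be decomposed via its posterior class probabilities.

```python
# Sketch: a two-component normal mixture density and the posterior class
# probabilities used to decompose a sample into its mixture components.
import numpy as np
from scipy.stats import norm

weights = np.array([0.3, 0.7])
means, sds = np.array([-1.0, 2.0]), np.array([1.0, 0.5])

def mixture_pdf(x):
    return np.sum(weights * norm.pdf(x[..., None], means, sds), axis=-1)

def posterior(x):           # P(class j | x), the basis for decomposing the sample
    joint = weights * norm.pdf(x[..., None], means, sds)
    return joint / joint.sum(axis=-1, keepdims=True)

x = np.array([-1.5, 0.5, 2.1])
print(mixture_pdf(x))
print(posterior(x))
```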


Journal ArticleDOI
Kathryn Roeder1
TL;DR: In this article, it was shown that a mixture of two normals divided by a normal density having the same mean and variance as the mixed density is always bimodal.
Abstract: When a population is assumed to be composed of a finite number of subpopulations, a natural model to choose is the finite mixture model. It will often be the case, however, that the number of component distributions is unknown and must be estimated. This problem can be difficult; for instance, the density of two mixed normals is not bimodal unless the means are separated by at least 2 standard deviations. Hence modality of the data per se can be an insensitive approach to component estimation. We demonstrate that a mixture of two normals divided by a normal density having the same mean and variance as the mixed density is always bimodal. This analytic result and other related results form the basis for a diagnostic and a test for the number of components in a mixture of normals. The density is estimated using a kernel density estimator. Under the null hypothesis, the proposed diagnostic can be approximated by a stationary Gaussian process. Under the alternative hypothesis, components in the mixture ...

131 citations
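
A quick numerical illustration of the analytic claim above, with arbitrary parameter values: for two equal-variance normals whose means differ by less than 2 standard deviations, the mixture density itself is unimodal, yet its ratio to a normal density with the same mean and variance shows two modes.

```python
# Sketch: counting modes of the mixture density and of its ratio to a matched normal.
import numpy as np
from scipy.stats import norm

p, mu1, mu2, s = 0.5, 0.0, 1.5, 1.0          # separation < 2 sd, so the mixture is unimodal
mean = p * mu1 + (1 - p) * mu2
var = p * (s**2 + mu1**2) + (1 - p) * (s**2 + mu2**2) - mean**2

x = np.linspace(-4, 6, 2001)
mix = p * norm.pdf(x, mu1, s) + (1 - p) * norm.pdf(x, mu2, s)
ratio = mix / norm.pdf(x, mean, np.sqrt(var))  # mixture over matched normal density

def n_modes(y):
    """Count strict interior local maxima on the grid."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

print(n_modes(mix), n_modes(ratio))   # expect 1 and 2: mixture unimodal, ratio bimodal
```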


Journal ArticleDOI
TL;DR: In this article, a framework based on mixture methods is proposed for evaluating goodness of fit in the analysis of contingency tables, where a given model H applied to a contingency table P is considered, and the two-point mixture P = (1 - π)Π1 + πΠ2, with π the mixing proportion (0 ≤ π < 1) and Π1 and Π2 the tables of probabilities for each latent class or component.
Abstract: SUMMARY A framework based on mixture methods is proposed for evaluating goodness of fit in the analysis of contingency tables. For a given model H applied to a contingency table P, we consider the two-point mixture P = (1 - π)Π1 + πΠ2, with π the mixing proportion (0 ≤ π < 1) and Π1 and Π2 the tables of probabilities for each latent class or component. In the unstructured approach recommended here, the mixture model applies H to Π1 but does not impose any restrictions on Π2. A contingency table P can generally be represented as such a two-point mixture for an interval of π-values. We define our index of lack of fit, π*, to be the smallest such π, i.e. π* is the fraction of the population that cannot be described by model H. This approach can be contrasted with the structured approach that applies model H to both Π1 and Π2 and leads to conventional latent class models when H is the hypothesis of independence. The case where H is the hypothesis of row-column independence and P is a two-way contingency table is covered in detail, but the procedure is quite general.

Journal ArticleDOI
01 Aug 1994
TL;DR: It is shown that RBF classifiers trained with error backpropagation give results almost identical to those obtained with a multilayer perceptron, and it is argued that the hidden-layer representation of such networks is much more powerful, especially if it is encoded in the form of a Gaussian mixture model.
Abstract: The paper considers a number of strategies for training radial basis function (RBF) classifiers. A benchmark problem is constructed using ten-dimensional input patterns which have to be classified into one of three classes. The RBF networks are trained using a two-phase approach (unsupervised clustering for the first layer followed by supervised learning for the second layer), error backpropagation (supervised learning for both layers) and a hybrid approach. It is shown that RBF classifiers trained with error backpropagation give results almost identical to those obtained with a multilayer perceptron. Although networks trained with the two-phase approach give slightly worse classification results, it is argued that the hidden-layer representation of such networks is much more powerful, especially if it is encoded in the form of a Gaussian mixture model. During training, the number of subclusters present within the training database can be estimated: during testing, the activities in the hidden layer of the classification network can be used to assess the novelty of input patterns and thereby help to validate network outputs.
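
A minimal sketch of the two-phase approach described above on toy data, with k-means standing in for the unsupervised first phase and a least-squares read-out for the supervised second phase; the data, cluster count, and width rule are illustrative assumptions.

```python
# Sketch: two-phase RBF classifier training -- unsupervised clustering fixes the
# centres, then the output layer is trained by supervised (least-squares) learning.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                         # ten-dimensional input patterns
y = np.argmax(X[:, :3], axis=1)                        # three arbitrary classes for the toy task
T = np.eye(3)[y]                                       # one-hot targets

km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)    # phase 1: RBF centres
width = np.mean(np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1))

def hidden(X):
    """Gaussian basis-function activations for each centre."""
    d2 = ((X[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

H = hidden(X)
W, *_ = np.linalg.lstsq(H, T, rcond=None)              # phase 2: output-layer weights
pred = np.argmax(hidden(X) @ W, axis=1)
print("training accuracy:", (pred == y).mean())
```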

Journal ArticleDOI
TL;DR: In this paper, a modification of the Laplace method is presented and applied to estimation of posterior functions in a Bayesian analysis of finite mixture distributions, which has high asymptotic accuracy for finite mixtures of certain exponential-family densities.
Abstract: An exact Bayesian analysis of finite mixture distributions is often computationally infeasible, because the number of terms in the posterior density grows exponentially with the sample size. A modification of the Laplace method is presented and applied to estimation of posterior functions in a Bayesian analysis of finite mixture distributions. The procedure, which involves computations similar to those required in maximum likelihood estimation, is shown to have high asymptotic accuracy for finite mixtures of certain exponential-family densities. For these mixture densities, the posterior density is also shown to be asymptotically normal. An approximation of the posterior density of the number of components is presented. The method is applied to Duncan's barley data and to a distribution of lake chemistry data for north-central Wisconsin.

Journal ArticleDOI
TL;DR: A mixture experiment involves varying the proportions of two or more ingredients, called components of the mixture, and studying the changes that occur in the measured properties (responses) of the components as mentioned in this paper.
Abstract: A mixture experiment involves varying the proportions of two or more ingredients, called components of the mixture, and studying the changes that occur in the measured properties (responses) of the...

Journal ArticleDOI
TL;DR: In this article, the authors provide sufficient conditions for the existence, consistency, and asymptotic normality of maximum likelihood estimators for the parameters in a useful parameterization of these models.

Journal ArticleDOI
TL;DR: In this paper, a sub-class of phase-type distributions is defined in terms of a Markov process with sequential transitions between transient states and transitions from these states to absorption, which can be fitted to data, and any structure revealed by the parameter estimates used to develop more parsimonious re-parametrizations.
Abstract: A sub-class of phase-type distributions is defined in terms of a Markov process with sequential transitions between transient states and transitions from these states to absorption. Such distributions form a very rich class; they can be fitted to data, and any structure revealed by the parameter estimates used to develop more parsimonious re-parametrizations. Several example data sets are used as illustrations.

Journal ArticleDOI
TL;DR: In this paper, a method for unmixing coarse resolution signals (of the NOAA-AVHRR type) through the use of multiple linear regression is presented, which allows the signal for each mixed coarse resolution pixel to be broken down thanks to a knowledge of land use and a linear mixture model.

Journal ArticleDOI
TL;DR: An important but difficult problem in practice is to determine the number of components in a normal mixture model with unequal variances; when the likelihood ratio test statistic -2 log λ is used, it is unbounded above and fails to satisfy standard regularity conditions.
Abstract: An important but difficult problem in practice is to determine the number of components in a normal mixture model with unequal variances. When the likelihood ratio test statistic -2 log λ is used, it is unbounded above and fails to satisfy standard regularity conditions. A restricted maximization procedure must therefore be used, which makes the procedure ad hoc. A consequence of this may explain the discrepancies among the simulation results of previous investigations.
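
Because -2 log λ lacks a standard null distribution here, one common workaround (not the restricted-maximization procedure discussed in the paper) is to calibrate the statistic by a parametric bootstrap, refitting both models to data simulated under the smaller model. A minimal sketch on synthetic data:

```python
# Sketch: bootstrap calibration of -2 log(lambda) for one versus two normal components.
import numpy as np
from sklearn.mixture import GaussianMixture

def lrt_stat(y, k0=1, k1=2, seed=0):
    y = y.reshape(-1, 1)
    g0 = GaussianMixture(k0, n_init=5, random_state=seed).fit(y)
    g1 = GaussianMixture(k1, n_init=5, random_state=seed).fit(y)
    return 2 * (g1.score(y) - g0.score(y)) * len(y)    # -2 log lambda

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
obs = lrt_stat(y)

g0 = GaussianMixture(1).fit(y.reshape(-1, 1))           # null model fitted to the data
boot = []
for b in range(99):                                      # null distribution by simulation
    yb, _ = g0.sample(len(y))
    boot.append(lrt_stat(yb.ravel(), seed=b))
p_value = (1 + sum(s >= obs for s in boot)) / 100
print(obs, p_value)
```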

Journal ArticleDOI
TL;DR: A hierarchical Maximum Likelihood Adaptive Neural System (MLANS), a new type of neural network that incorporates a model-based concept, leading to greatly increased learning efficiency compared to conventional, nonparametric neural networks is proposed for transient signal processing.

Journal ArticleDOI
01 Aug 1994
TL;DR: In this article, a partitioned mixture distribution is presented, which is essentially a set of overlapping mixture distributions, and an expectation maximisation training algorithm is derived for optimising partitioned mixture distributions according to the maximum likelihood description.
Abstract: Bayesian methods are used to analyse the problem of training a model to make predictions about the probability distribution of data that has yet to be received. Mixture distributions emerge naturally from this framework, but are not ideally matched to the density estimation problems that arise in image processing. An extension, called a partitioned mixture distribution is presented, which is essentially a set of overlapping mixture distributions. An expectation maximisation training algorithm is derived for optimising partitioned mixture distributions according to the maximum likelihood description. Finally, the results of some numerical simulations are presented, which demonstrate that lateral inhibition arises naturally in partitioned mixture distributions, and that the nodes in a partitioned mixture distribution network co-operate in such a way that each mixture distribution in the partitioned mixture distribution receives its necessary complement of computing machinery.

Journal ArticleDOI
TL;DR: In this article, a mixture model is proposed to estimate the location of the appropriate fractile directly; a formal Bayesian approach is derived, and heuristic smoothing methods are developed.

Journal ArticleDOI
TL;DR: It is shown that even the seemingly innocuous assumption of equal variances for the components of the mixture can lead to surprisingly large asymptotic biases in the maximum likelihood estimators (MLEs) of the parameters.
Abstract: A finite mixture is a distribution where a given observation can come from any of a finite set of components. That is, the density of the random variable X is of the form f(x) = π1 f1(x) + π2 f2(x) + ... + πk fk(x), where the πi are the mixing proportions and the fi are the component densities. Mixture models are common in many areas of biology; the most commonly applied is a mixture of normal densities. Many of the problems with inference in the mixture setting are well known. Not so well documented, however, are the extreme biases that can occur in the maximum likelihood estimators (MLEs) when there is model misspecification. This paper shows that even the seemingly innocuous assumption of equal variances for the components of the mixture can lead to surprisingly large asymptotic biases in the MLEs of the parameters. Assuming normality when the underlying distributions are skewed can also lead to strong biases. We explicitly calculate the asymptotic biases when maximum likelihood is carried out assuming normality for several types of true underlying distribution. If the true distribution is a mixture of skewed components, then an application of the Box-Cox power transformation can reduce the asymptotic bias substantially. The power lambda in the Box-Cox transformation is in this case treated as an additional parameter to be estimated. In many cases the bias can be reduced to acceptable levels, thus leading to meaningful inference. A modest Monte Carlo study gives an indication of the small-sample performance of inference procedures (including the power and level of likelihood ratio tests) based on a likelihood that incorporates estimation of lambda. A real data example illustrates the method.
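
A minimal sketch of the Box-Cox idea mentioned above, on synthetic skewed data: the power λ is treated as an extra parameter and chosen by profiling the mixture log-likelihood (plus the Jacobian of the transformation) over a grid, which is a crude stand-in for the paper's full maximum likelihood treatment.

```python
# Sketch: estimating the Box-Cox power jointly with a two-component normal mixture
# by a profile-likelihood grid search on the transformation parameter.
import numpy as np
from scipy.stats import boxcox
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
y = np.concatenate([rng.lognormal(0.0, 0.4, 200),      # two skewed components
                    rng.lognormal(1.2, 0.4, 200)])

def profile_loglik(lam):
    z = boxcox(y, lmbda=lam)
    gm = GaussianMixture(2, n_init=3, random_state=0).fit(z.reshape(-1, 1))
    # mixture log-likelihood on the transformed scale plus the Jacobian of the transform
    return gm.score(z.reshape(-1, 1)) * len(y) + (lam - 1.0) * np.log(y).sum()

grid = np.linspace(-1.0, 1.5, 26)
lam_hat = grid[np.argmax([profile_loglik(l) for l in grid])]
print(lam_hat)   # typically near 0 (a log transform) for lognormal components
```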


Book ChapterDOI
01 Jan 1994
TL;DR: In this paper, the problem of choosing the number of component clusters of individuals, determining the variables which are contributing to the differences between the clusters using all possible subset selection of variables, and detecting outliers or extreme observations across the clustering alternatives in one expert-system simultaneously within the context of the standard mixture of multivariate normal distributions is considered.
Abstract: This paper considers the problem of choosing the number of component clusters of individuals, determining the variables which are contributing to the differences between the clusters using all possible subset selection of variables, and detecting outliers or extreme observations across the clustering alternatives in one expert-system simultaneously within the context of the standard mixture of multivariate normal distributions. This is achieved by introducing and deriving a new informational measure of complexity (ICOMP) criterion of the estimated inverse-Fisher information matrix (IFIM) developed by Bozdogan as an alternative to Akaike's information criterion (AIC), and Bozdogan's CAIC for the mixture-model. A numerical example is shown on a real data set to illustrate the significance of these validity functionals.

Proceedings ArticleDOI
15 Jul 1994
TL;DR: This paper introduces a two stage process for MR tissue classification which addresses both the statistical characteristics of tissue classes and retraining of any Bayesian classifier by utilizing techniques from both image processing and statistics.
Abstract: Inhomogeneities in the fields of magnetic resonance (MR) systems cause the statistical characteristics of tissue classes to vary within the resulting MR images. These inhomogeneities must be taken into consideration when designing an algorithm for automated tissue classification. The traditional approach in image processing would be to apply a gain field correction technique to remove the inhomogeneities from the images. Statistical solutions would most likely focus on including spatial information in the feature space of the classifier so that it can be trained to model and adjust for the inhomogeneities. This paper will prove that neither of these general approaches offers a complete and viable solution. It will in fact show that not only do the inhomogeneities modify the local mean and variance of a tissue class, as is commonly accepted, but the inhomogeneities also induce a rotation of the covariance matrices. As a result, gain field correction techniques cannot compensate for all of the artifacts associated with inhomogeneities. Additionally, it will be demonstrated that while statistical methods can capture all of the anomalies, the across-patient and across-time variations of the inhomogeneities necessitate frequent and time-consuming retraining of any Bayesian classifier. This paper introduces a two-stage process for MR tissue classification which addresses both of these issues by utilizing techniques from both image processing and statistics. First, a band-pass mean field corrector is used to alleviate the mean and variance deformations in each image. Then, using a kernel mixture model classifier coupled to an interactive data augmentation tool, the user can selectively refine and explore the class representations for localized regions of the image and thereby capture the rotation of the covariance matrices. This approach is shown to outperform Gaussian classifiers and 4D mixture modeling techniques when both the final accuracy and user time requirements are considered.

Book ChapterDOI
TL;DR: In this paper, a mixture of normal distributions and Markov-switching models is applied to model the leptokurtosis and heteroskedasticity of exchange-rate dynamics.
Abstract: Mixtures of normal distributions and Markov-switching models are applied to model the leptokurtosis and heteroskedasticity of exchange-rate dynamics. The mixtures of normal distributions capture well the leptokurtosis of the data whereas the Markov-switching models capture both the leptokurtosis and the heteroskedasticity. There is strong evidence against Gaussian white noise in high-frequency data. Foreign-currency option prices derived under the assumption of Gaussian white noise differ systematically and significantly from prices derived under the assumptions of a mixture model or a Markov-switching model. The typical "smile effects" are caused by the implied peakedness and fat tails of the models.

Proceedings ArticleDOI
27 Jun 1994
TL;DR: Two approaches to determining the optimal number of component Gaussians to include in a Gaussian mixture model are compared, the Akaike information criterion and the Rissanen (1989) minimum description length method.
Abstract: We compare two approaches to determining the optimal number of component Gaussians to include in a Gaussian mixture model: the Akaike information criterion and the Rissanen (1989) minimum description length method.
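
A minimal sketch of such a comparison on synthetic data, with BIC used as the usual stand-in for Rissanen's MDL criterion: fit Gaussian mixtures of increasing order and take the minimizer of each criterion.

```python
# Sketch: choosing the number of mixture components by AIC versus MDL/BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(0, 1, 200),
                    rng.normal(4, 1, 200)]).reshape(-1, 1)   # three true components

aic, bic = [], []
for k in range(1, 8):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    aic.append(gm.aic(X))
    bic.append(gm.bic(X))

print("AIC picks", 1 + int(np.argmin(aic)), "components")
print("MDL/BIC picks", 1 + int(np.argmin(bic)), "components")
```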

Journal ArticleDOI
Hal S. Stern1, Doreen Arcus1, Jerome Kagan1, Donald B. Rubin1, Nancy Snidman1 
TL;DR: In this paper, a finite mixture model was applied to two sets of longitudinal observations of infants and young children, and a measure of predictive efficacy was described for comparing the mixture model with competing models, principally a linear regression analysis.
Abstract: Temperamental characteristics can be conceptualized as either continuous dimensions or qualitative categories. The distinction concerns the underlying temperamental characteristics rather than the measured variables, which can usually be recorded as either continuous or categorical variables. A finite mixture model captures the categorical view, and we apply such a model here to two sets of longitudinal observations of infants and young children. A measure of predictive efficacy is described for comparing the mixture model with competing models, principally a linear regression analysis. The mixture model performs mildly better than the linear regression model with respect to this measure of fit to the data; however, the primary advantage of the mixture model relative to competing approaches is that, because it matches our a priori theory, it can easily be used to address improvements and corrections to the theory, and to suggest extensions of the research.

Journal ArticleDOI
TL;DR: The expectation-maximization (EM) algorithm is developed to derive the model parameters by the method of maximum likelihood and an approximation of the information matrix when using the EM algorithm is given.
Abstract: The apparent conflict between the biometrician and Mendelian genetics has been recently resolved by the introduction of a genetic mixed model to analyze continuous traits measured on human families and to elucidate the mechanism of underlying major genes. The mixed model formulated by Elston and Stewart (1971, Human Heredity 21, 523-542), extended by Morton and MacLean (1974, American Journal of Human Genetics 26, 489-503), and reviewed, with further extensions, by Boyle and Elston (1979, Biometrics 35, 55-68) has become an extremely useful tool of wide applicability in the field of genetic epidemiology. This model allows for segregation at a major locus, a polygenic effect, and a sibling environmental variation. The main concern of this paper is with estimating the model parameters by the method of maximum likelihood. The expectation-maximization (EM) algorithm is developed to derive the estimates iteratively. An approximation of the information matrix when using the EM algorithm is given. We illustrate the methodology by fitting the model to the arterial blood pressure data collected by Miall and Oldham (1955, Clinical Science 14, 459-487).

Proceedings ArticleDOI
06 Sep 1994
TL;DR: Two combining methods are evaluated for several speaker recognition tasks, including speaker verification and closed set speaker identification, and the results show the both methods to yield advantages for the speaker Recognition tasks.
Abstract: A new classification system for text-independent speaker recognition is presented. This system combines the output probabilities of distortion-based classifiers and a discriminant-based classifier. The distortion-based classifiers are the vector quantization (VQ) classifier and Gaussian mixture model (GMM). The discriminant-based classifier is the neural tree network (NTN). The VQ and GMM classifiers provide output probabilities that represent the distortion between the observation and the model. Hence, these probabilities provide an intraclass measure. The NTN classifier is based on discriminant training and provides output probabilities that represent an interclass measure. Since, these two classifiers base their decision on different criteria, they can be effectively combined to yield improved performance. Two combining methods are evaluated for several speaker recognition tasks, including speaker verification and closed set speaker identification. The results show the both methods to yield advantages for the speaker recognition tasks. >