
Showing papers on "Expectation–maximization algorithm published in 1996"


Journal ArticleDOI
T.K. Moon1
TL;DR: The EM (expectation-maximization) algorithm is ideally suited to problems of parameter estimation, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observation.
Abstract: A common task in signal processing is the estimation of the parameters of a probability distribution function. Perhaps the most frequently encountered estimation problem is the estimation of the mean of a signal in noise. In many parameter estimation problems the situation is more complicated because direct access to the data necessary to estimate the parameters is impossible, or some of the data are missing. Such difficulties arise when an outcome is a result of an accumulation of simpler outcomes, or when outcomes are clumped together, for example, in a binning or histogram operation. There may also be data dropouts or clustering in such a way that the number of underlying data points is unknown (censoring and/or truncation). The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observation. The EM algorithm is presented at a level suitable for signal processing practitioners who have had some exposure to estimation theory.

2,573 citations
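To make the missing-data idea above concrete, here is a minimal EM sketch for a two-component one-dimensional Gaussian mixture, where the unobserved component labels play the role of the hidden data. This is a generic Python illustration, not code from the tutorial, and it assumes a common variance shared by both components.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """Toy EM for a two-component 1-D Gaussian mixture with a shared variance.

    The unobserved component labels are the 'missing data': each observed x
    arises from a many-to-one mapping (label, value) -> value.
    """
    pi, mu1, mu2, var = 0.5, x.min(), x.max(), x.var()   # crude initialisation
    for _ in range(n_iter):
        # E-step: responsibility of component 1 (shared-variance constants cancel)
        d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * var))
        d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * var))
        r = d1 / (d1 + d2)
        # M-step: weighted maximum-likelihood re-estimates
        pi = r.mean()
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        var = np.sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2) / x.size
    return pi, mu1, mu2, var

# example: recover the parameters of a simulated 0.3/0.7 mixture
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
print(em_two_gaussians(x))
```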


Journal ArticleDOI
TL;DR: The mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures is built up and an explicit expression for the matrix is provided.
Abstract: We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.

849 citations
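As a hedged illustration of the kind of identity the paper develops (this is only the simplest special case: spherical components with a fixed, common variance sigma^2 and fixed mixing weights, updating the means alone), the EM re-estimate of each mean is exactly a gradient step premultiplied by a positive definite matrix, which in this special case reduces to a scaled identity:

```latex
% Log-likelihood and its gradient with respect to the k-th mean, with
% responsibilities r_{nk} = \pi_k \mathcal{N}(x_n;\mu_k,\sigma^2 I) / \sum_j \pi_j \mathcal{N}(x_n;\mu_j,\sigma^2 I):
\mathcal{L} = \sum_n \log \sum_k \pi_k\, \mathcal{N}(x_n;\mu_k,\sigma^2 I),
\qquad
\frac{\partial \mathcal{L}}{\partial \mu_k} = \frac{1}{\sigma^2}\sum_n r_{nk}\,(x_n - \mu_k).

% The EM mean update is a preconditioned gradient step, with the
% projection matrix reducing here to P_k = (\sigma^2 / \sum_n r_{nk})\, I:
\mu_k^{\text{new}} = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}
 = \mu_k + \frac{\sigma^2}{\sum_n r_{nk}}\,
   \frac{\partial \mathcal{L}}{\partial \mu_k}.
```

The paper's matrix P covers the full parameter vector (mixing weights, means, and covariances); the diagonal form above is only the means-only case, shown for intuition.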


Journal ArticleDOI
Y. Vardi1
TL;DR: In this article, the problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between a pair of vertices, determined according to a known Markov chain fixed for that pair).
Abstract: The problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between each directed pair of nodes, determined according to a known Markov chain fixed for that pair). Maximum likelihood estimation and related approximations are discussed, and computational difficulties are pointed out. A detailed methodology is presented for estimates based on the method of moments. The estimates are derived algorithmically, taking advantage of the fact that the first and second moment equations give rise to a linear inverse problem with positivity restrictions that can be approached by an EM algorithm, resulting in a particularly simple solution to a hard problem. A small simulation study is carried out.

801 citations
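The abstract's observation that the problem reduces to a positivity-constrained linear inverse problem amenable to EM can be illustrated with the standard multiplicative EM update for a Poisson linear model y ~ Poisson(A lam). This is a generic sketch (known routing matrix A with no all-zero rows or columns, deterministic routing), not the paper's full method-of-moments procedure; the toy matrix and names are illustrative only.

```python
import numpy as np

def em_poisson_linear_inverse(A, y, n_iter=500):
    """Multiplicative EM update for y ~ Poisson(A @ lam), with A >= 0, lam >= 0.

    In the network-tomography reading, lam holds the unknown origin-destination
    intensities, A encodes which OD pairs traverse which links (deterministic
    routing), and y holds (averaged) link counts. Assumes A has no all-zero
    rows or columns.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.ones(A.shape[1])            # strictly positive starting point
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        pred = A @ lam                   # expected link counts
        lam = lam * (A.T @ (y / pred)) / col_sums
    return lam

# toy example: 3 links observing 2 origin-destination flows
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(em_poisson_linear_inverse(A, y=np.array([10.0, 5.0, 15.0])))
```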


Journal ArticleDOI
TL;DR: This paper fits Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered.
Abstract: Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA-our new techniques inherit this feature. We can control the within-class spread of the subclass centres relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.

791 citations
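A hedged sketch of the basic idea follows: fit one Gaussian mixture per class and classify by the largest prior-weighted class density. It leans on scikit-learn's GaussianMixture rather than the authors' fitting procedure, and it omits the low-dimensional views, the control of within-class spread, and the nonparametric blend discussed in the abstract; the class name and parameters are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class SimpleMixtureDiscriminant:
    """Minimal mixture discriminant classifier: one Gaussian mixture per class.

    Classification is by the largest (class prior) x (class mixture density);
    this captures only the core idea of the abstract, not the full method.
    """
    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.models_ = [GaussianMixture(self.n_components, random_state=0).fit(X[y == c])
                        for c in self.classes_]
        return self

    def predict(self, X):
        # log p(x | class) + log prior, maximised over classes
        scores = np.column_stack([m.score_samples(X) + np.log(p)
                                  for m, p in zip(self.models_, self.priors_)])
        return self.classes_[scores.argmax(axis=1)]
```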


21 May 1996
TL;DR: This work presents an exact Expectation-Maximization algorithm for determining the parameters of this mixture of factor analyzers, which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians.
Abstract: Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.

705 citations
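For intuition, here is a small sketch of the generative model the abstract describes: a mixture whose components are factor analyzers, each with a low-rank loading matrix plus diagonal noise. It only samples from the model and does not implement the paper's exact EM fitting algorithm; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mfa(n, weights, means, loadings, psi):
    """Draw samples from a mixture of factor analyzers (generative view only).

    Component k: x = means[k] + loadings[k] @ z + e, with z ~ N(0, I_q) and
    e ~ N(0, diag(psi[k])). Each component is therefore a Gaussian with the
    low-rank-plus-diagonal covariance loadings[k] loadings[k]^T + diag(psi[k]).
    """
    K = len(weights)
    ks = rng.choice(K, size=n, p=weights)
    out = []
    for k in ks:
        L = loadings[k]                       # (p, q) factor loading matrix
        z = rng.standard_normal(L.shape[1])   # latent factors
        e = rng.normal(0.0, np.sqrt(psi[k]))  # independent per-coordinate noise
        out.append(means[k] + L @ z + e)
    return np.array(out)
```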


Journal Article
TL;DR: An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case and fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.
Abstract: Estimation from sample data and density approximation with phase-type distributions are considered. Maximum likelihood estimation via the EM algorithm is discussed and performed for some data sets. An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case. Fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.

690 citations


Journal ArticleDOI
01 Apr 1996-Heredity
TL;DR: It is concluded that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
Abstract: We generalize an approach suggested by Hill (Heredity, 33, 229-239, 1974) for testing for significant association among alleles at two loci when only genotype and not haplotype frequencies are available. The principle is to use the Expectation-Maximization (EM) algorithm to resolve double heterozygotes into haplotypes and then apply a likelihood ratio test in order to determine whether the resolutions of haplotypes are significantly nonrandom, which is equivalent to testing whether there is statistically significant linkage disequilibrium between loci. The EM algorithm in this case relies on the assumption that genotype frequencies at each locus are in Hardy-Weinberg proportions. This method can accommodate X-linked loci and samples from haplodiploid species. We use three methods for testing significance of the likelihood ratio: the empirical distribution in a large number of randomized data sets, the X2 approximation for the distribution of likelihood ratios, and the Z2 test. The performance of each method is evaluated by applying it to simulated data sets and comparing the tail probability with the tail probability from Fisher's exact test applied to the actual haplotype data. For realistic sample sizes (50-150 individuals) all three methods perform well with two or three alleles per locus, but only the empirical distribution is adequate when there are five to eight alleles per locus, as is typical of hypervariable loci such as microsatellites. The method is applied to a data set of 32 microsatellite loci in a Finnish population and the results confirm the theoretical predictions. We conclude that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.

556 citations
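As a concrete and deliberately minimal illustration of the resolution step the abstract describes, here is EM for haplotype frequencies at two biallelic loci under Hardy-Weinberg proportions: only the double heterozygotes are phase-ambiguous, and the E-step splits them between the two possible phase configurations. The paper's method additionally handles multi-allelic and X-linked loci and adds the likelihood ratio testing machinery, none of which is reproduced in this generic sketch.

```python
import numpy as np

def em_haplotype_freqs(n, n_iter=50):
    """EM estimate of two-locus haplotype frequencies from genotype counts.

    n[i][j] = number of individuals with i copies of allele A (locus 1) and
    j copies of allele B (locus 2), i, j in {0, 1, 2}.
    Haplotype order in the output: AB, Ab, aB, ab.
    """
    n = np.asarray(n, dtype=float)
    p = np.full(4, 0.25)                  # initial haplotype frequencies
    for _ in range(n_iter):
        c = np.zeros(4)
        # Unambiguous contributions (each individual carries two haplotypes);
        # only the double-heterozygote cell n[1,1] has unknown phase.
        c[0] += 2 * n[2, 2] + n[2, 1] + n[1, 2]      # AB
        c[1] += 2 * n[2, 0] + n[2, 1] + n[1, 0]      # Ab
        c[2] += 2 * n[0, 2] + n[0, 1] + n[1, 2]      # aB
        c[3] += 2 * n[0, 0] + n[0, 1] + n[1, 0]      # ab
        # E-step: split double heterozygotes between the AB/ab and Ab/aB phases
        w = p[0] * p[3] / (p[0] * p[3] + p[1] * p[2])
        c[0] += w * n[1, 1]; c[3] += w * n[1, 1]
        c[1] += (1 - w) * n[1, 1]; c[2] += (1 - w) * n[1, 1]
        # M-step: renormalise the expected haplotype counts
        p = c / c.sum()
    return p
```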


Journal ArticleDOI
TL;DR: A new, full Bayesian approach based on the method of Gibbs sampling is developed, and it is shown that the latent variables, one for each observation, can be simulated from their joint distribution given the data and the remaining parameters.

532 citations


Journal ArticleDOI
TL;DR: This work proposes a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion, which requires approximately the same amount of computation per iteration as EM-based approaches, but the new method converges much more rapidly.
Abstract: Over the past years there has been considerable interest in statistically optimal reconstruction of cross-sectional images from tomographic data. In particular, a variety of such algorithms have been proposed for maximum a posteriori (MAP) reconstruction from emission tomographic data. While MAP estimation requires the solution of an optimization problem, most existing reconstruction algorithms take an indirect approach based on the expectation maximization (EM) algorithm. We propose a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion. The key to this direct optimization approach is greedy pixel-wise computations known as iterative coordinate descent (ICD). We propose a novel method for computing the ICD updates, which we call ICD/Newton-Raphson. We show that ICD/Newton-Raphson requires approximately the same amount of computation per iteration as EM-based approaches, but the new method converges much more rapidly (in our experiments, typically five to ten iterations). Other advantages of the ICD/Newton-Raphson method are that it is easily applied to MAP estimation of transmission tomograms, and typical convex constraints, such as positivity, are easily incorporated.

493 citations


Journal ArticleDOI
01 Oct 1996
TL;DR: The authors present a row-action maximum likelihood algorithm (RAMLA) as an alternative to the EM algorithm for maximizing the Poisson likelihood in ECT and show that their modification converges to an ML solution whereas the standard OS-EM does not.
Abstract: The maximum likelihood (ML) approach to estimating the radioactive distribution in the body cross section has become very popular among researchers in emission computed tomography (ECT) since it has been shown to provide very good images compared to those produced with the conventional filtered backprojection (FBP) algorithm. The expectation maximization (EM) algorithm is an often-used iterative approach for maximizing the Poisson likelihood in ECT because of its attractive theoretical and practical properties. Its major disadvantage is that, due to its slow rate of convergence, a large amount of computation is often required to achieve an acceptable image. Here, the authors present a row-action maximum likelihood algorithm (RAMLA) as an alternative to the EM algorithm for maximizing the Poisson likelihood in ECT. The authors deduce the convergence properties of this algorithm and demonstrate by way of computer simulations that the early iterates of RAMLA increase the Poisson likelihood in ECT an order of magnitude faster than the standard EM algorithm. Specifically, the authors show that, from the point of view of measuring total radionuclide uptake in simulated brain phantoms, iterations 1, 2, 3, and 4 of RAMLA perform at least as well as iterations 45, 60, 70, and 80, respectively, of EM. Moreover, the authors show that iterations 1, 2, 3, and 4 of RAMLA achieve likelihood values comparable to those of iterations 45, 60, 70, and 80, respectively, of EM. The authors also present a modified version of a recent fast ordered subsets EM (OS-EM) algorithm and show that RAMLA is a special case of this modified OS-EM. Furthermore, the authors show that their modification converges to an ML solution whereas the standard OS-EM does not.

434 citations
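For orientation, here is a minimal ordered-subsets-style EM sketch for the Poisson model y ~ Poisson(Ax) used in ECT. According to the abstract, RAMLA can be viewed as a special case of a modified OS-EM with row-by-row (row-action) updates and a relaxation schedule; that schedule is not reproduced here, and the system matrix and subset scheme below are illustrative assumptions only.

```python
import numpy as np

def os_em(A, y, n_subsets=8, n_iter=5):
    """Ordered-subsets EM sketch for emission data y ~ Poisson(A @ x).

    A[i, j] = probability that an emission in voxel j is detected in bin i.
    Each sub-iteration applies the multiplicative EM update using only one
    subset of the projection rows, which accelerates early convergence.
    Assumes every voxel is seen by at least one row of each subset.
    """
    x = np.ones(A.shape[1])
    subsets = np.array_split(np.arange(A.shape[0]), n_subsets)
    for _ in range(n_iter):
        for s in subsets:
            As, ys = A[s], y[s]
            x = x / As.sum(axis=0) * (As.T @ (ys / (As @ x)))
    return x
```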


Journal ArticleDOI
TL;DR: In experiments with synthetic noise-free and additive noisy projection data of dental phantoms, it is found that both simultaneous iterative algorithms produce superior image quality as compared to filtered backprojection after linearly fitting projection gaps.
Abstract: Iterative deblurring methods using the expectation maximization (EM) formulation and the algebraic reconstruction technique (ART), respectively, are adapted for metal artifact reduction in medical computed tomography (CT). In experiments with synthetic noise-free and additive noisy projection data of dental phantoms, it is found that both simultaneous iterative algorithms produce superior image quality as compared to filtered backprojection after linearly fitting projection gaps. Furthermore, the EM-type algorithm converges faster than the ART-type algorithm in terms of either the I-divergence or Euclidean distance between ideal and reprojected data in the authors' simulation. Also, for a given iteration number, the EM-type deblurring method produces better image clarity but stronger noise than the ART-type reconstruction. The computational complexity of EM- and ART-based iterative deblurring is essentially the same, dominated by reprojection and backprojection. Relevant practical and theoretical issues are discussed.

Journal ArticleDOI
TL;DR: The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR).
Abstract: Researchers often face a dilemma: Should they collect little data and emphasize quality, or much data at the expense of quality? The utility of the 3-form design coupled with maximum likelihood methods for estimation of missing values was evaluated. In 3-form design surveys, four sets of items, X, A, B, and C, are administered: Each third of the subjects receives X and one combination of two other item sets - AB, BC, or AC. Variances and covariances were estimated with pairwise deletion, mean replacement, single imputation, multiple imputation, raw data maximum likelihood, multiple-group covariance structure modeling, and Expectation-Maximization (EM) algorithm estimation. The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR). Pairwise deletion provided equally unbiased estimates but was less efficient than ML procedures. Further simulation results demonstrated that non-maximum likelihood methods break down when data are not missing completely at random. Application of these methods with empirical drug use data resulted in similar covariance matrices for pairwise and EM estimation; however, ML estimation produced better and more efficient regression estimates. Maximum likelihood estimation or multiple imputation procedures, which are now becoming more readily available, are always recommended. In order to maximize the efficiency of the ML parameter estimates, it is recommended that scale items be split across forms rather than being left intact within forms.
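A hedged sketch of the "EM algorithm estimation" compared in the abstract: EM for the mean vector and covariance matrix of a multivariate normal when some entries are missing, assuming the data are missing (completely) at random. This is a generic textbook-style implementation, not the authors' 3-form analysis code.

```python
import numpy as np

def em_mvn_missing(X, n_iter=100):
    """EM for the mean and covariance of a multivariate normal with missing
    entries (coded as np.nan), under an MCAR/MAR assumption.

    E-step: replace each missing block by its conditional expectation given
    the observed entries and accumulate the conditional covariance.
    M-step: recompute the mean vector and covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            miss = np.isnan(X[i]); obs = ~miss
            x = X[i].copy()
            C = np.zeros((p, p))
            if miss.any():
                if obs.any():
                    reg = Sigma[np.ix_(miss, obs)] @ np.linalg.inv(Sigma[np.ix_(obs, obs)])
                    x[miss] = mu[miss] + reg @ (X[i, obs] - mu[obs])
                    C[np.ix_(miss, miss)] = (Sigma[np.ix_(miss, miss)]
                                             - reg @ Sigma[np.ix_(obs, miss)])
                else:                      # row entirely missing
                    x[:] = mu
                    C = Sigma.copy()
            sum_x += x
            sum_xx += np.outer(x, x) + C
        mu = sum_x / n
        Sigma = sum_xx / n - np.outer(mu, mu)
    return mu, Sigma
```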

Journal ArticleDOI
TL;DR: It is demonstrated that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem and able to map input sequences to output sequences, using the same processing style as recurrent neural networks.
Abstract: We consider problems of sequence processing and propose a solution based on a discrete-state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call input-output hidden Markov model (IOHMM). It can be trained by the expectation-maximization (EM) or generalized EM (GEM) algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: This work shows how to add spatial constraints to the mixture formulations and presents a variant of the EM algorithm that makes use of both the form and the motion constraints and estimates the number of segments given knowledge about the level of model failure expected in the sequence.
Abstract: Describing a video sequence in terms of a small number of coherently moving segments is useful for tasks ranging from video compression to event perception. A promising approach is to view the motion segmentation problem in a mixture estimation framework. However, existing formulations generally use only the motion data and thus fail to make use of static cues when segmenting the sequence. Furthermore, the number of models is either specified in advance or estimated outside the mixture model framework. In this work we address both of these issues. We show how to add spatial constraints to the mixture formulations and present a variant of the EM algorithm that makes use of both the form and the motion constraints. Moreover, this algorithm estimates the number of segments given knowledge about the level of model failure expected in the sequence. The algorithm's performance is illustrated on synthetic and real image sequences.

Journal ArticleDOI
TL;DR: MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors, utilizing both the EM algorithm and a Fisher-scoring solution.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric maximum likelihood estimator (MLE) of F is derived for any specified population size N; the nonparametric MLE of the (N, F) pair, and thus of N, is then determined.
Abstract: We conduct nonparametric maximum likelihood estimation under two common heterogeneous closed population capture-recapture models. Our models specify mixture models (as did previous researchers' models) which have a common generating distribution, say F, for the capture probabilities. Using Lindsay and Roeder's (1992, Journal of the American Statistical Association 87, 785-794) mixture model results and the EM algorithm, a nonparametric maximum likelihood estimator (MLE) of F for any specified population size N is obtained. Then, the nonparametric MLE of the (N, F) pair and thus for N is determined. Perhaps most importantly, since our MLE pair maximizes the likelihood under the entire nonparametric probability model, it provides an excellent foundation for estimating properties of estimators, conducting a goodness-of-fit test, and performing a likelihood ratio test. These are illustrated in the paper.

Journal ArticleDOI
TL;DR: The methods are illustrated on a data set involving alternative dosage regimens for the treatment of schizophrenia using haloperidol and on a regression example, where the new methods are compared with complete-case analysis and maximum likelihood for a probit selection model.
Abstract: Pattern-mixture models stratify incomplete data by the pattern of missing values and formulate distinct models within each stratum. Pattern-mixture models are developed for analyzing a random sample on continuous variables y(1), y(2) when values of y(2) are nonrandomly missing. Methods for scalar y(1) and y(2) are here generalized to vector y(1) and y(2) with additional fixed covariates x. Parameters in these models are identified by alternative assumptions about the missing-data mechanism. Models may be underidentified (in which case additional assumptions are needed), just-identified, or overidentified. Maximum likelihood and Bayesian methods are developed for the latter two situations, using the EM and SEM algorithms, direct and interactive simulation methods. The methods are illustrated on a data set involving alternative dosage regimens for the treatment of schizophrenia using haloperidol and on a regression example. Sensitivity to alternative assumptions about the missing-data mechanism is assessed, and the new methods are compared with complete-case analysis and maximum likelihood for a probit selection model.

Journal ArticleDOI
TL;DR: A method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline using a novel elastic matching procedure based on the expectation maximization algorithm.
Abstract: We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the expectation maximization algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages: 1) the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style; 2) the generative models can perform recognition driven segmentation; 3) the method involves a relatively small number of parameters and hence training is relatively easy and fast; and 4) unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.

Journal ArticleDOI
TL;DR: This paper considers new iterative multiuser receivers based on the expectation-maximization (EM) algorithm and related, more powerful "space-alternating" algorithms that alternately update individual parameter components or treat them as probabilistic missing data.
Abstract: Maximum-likelihood detection for the multiuser code-division multiple-access (CDMA) channel is prohibitively complex. This paper considers new iterative multiuser receivers based on the expectation-maximization (EM) algorithm and related, more powerful "space-alternating" algorithms. The latter algorithms include the SAGE algorithm and a new "missing parameter" space-alternating algorithm that alternately updates individual parameter components or treats them as probabilistic missing data. Application of these EM-based algorithms to the problem of discrete parameter estimation (i.e., data detection) in the Gaussian multiple-access channel leads to a variety of convergent receiver structures that incorporate soft-decision feedback for interference cancellation and/or sequential updating of iterative bit estimates. Convergence and performance analyses are based on well-known properties of the EM algorithm and on numerical simulation.

Journal ArticleDOI
TL;DR: An EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion is presented, initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution.
Abstract: This paper presents an EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion. The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters may be sensitive to the specification of a parametric form for the mixing distribution. A listing of a GLIM4 algorithm for fitting the overdispersed binomial logit model is given in an appendix.

Journal ArticleDOI
TL;DR: It is shown that, for some particular mixture situations, the SEM algorithm is almost always preferable to the EM and “simulated annealing” versions SAEM and MCEM.
Abstract: We compare three different stochastic versions of the EM algorithm: the Stochastic EM algorithm (SEM), the "Simulated Annealing" EM algorithm (SAEM) and the Monte Carlo EM algorithm (MCEM). We focus particularly on the mixture of distributions problem. In this context, we investigate the practical behaviour of these algorithms through intensive Monte Carlo numerical simulations and a real data study. We show that, for some particular mixture situations, the SEM algorithm is almost always preferable to the EM and "simulated annealing" versions SAEM and MCEM. For some severely overlapping mixtures, however, none of these algorithms can be confidently used. Then, SEM can be used as an efficient data exploratory tool for locating significant maxima of the likelihood function. In the real data case, we show that the SEM stationary distribution provides a contrasted view of the loglikelihood by emphasizing sensible maxima.
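To show what distinguishes the Stochastic EM algorithm from plain EM, here is a minimal SEM sketch for a two-component univariate Gaussian mixture: the deterministic E-step is replaced by simulating a hard label for each observation from its posterior, and the resulting Markov chain of parameter iterates is summarised by averaging its later half. This is a generic illustration (common variance, crude summarisation), not the paper's simulation setup; SAEM and MCEM are not sketched.

```python
import numpy as np

rng = np.random.default_rng(2)

def sem_two_gaussians(x, n_iter=200):
    """Stochastic EM (SEM) sketch for a two-component 1-D Gaussian mixture
    with a shared variance. The S-step draws a hard label for each point from
    its posterior instead of averaging as plain EM does. Assumes both
    components stay non-empty during iteration (fine for well-separated toy data).
    """
    pi, mu1, mu2, var = 0.5, np.quantile(x, 0.25), np.quantile(x, 0.75), x.var()
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of component 1
        d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * var))
        d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * var))
        r = d1 / (d1 + d2)
        # S-step: simulate the missing labels
        z = rng.random(x.size) < r
        # M-step: complete-data maximum likelihood given the simulated labels
        pi = z.mean()
        mu1, mu2 = x[z].mean(), x[~z].mean()
        var = (np.sum((x[z] - mu1) ** 2) + np.sum((x[~z] - mu2) ** 2)) / x.size
        trace.append((pi, mu1, mu2, var))
    return np.mean(trace[n_iter // 2:], axis=0)   # average the second half of the chain
```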

Journal ArticleDOI
Tobias Rydén1
TL;DR: This paper presents an EM algorithm for computing maximum-likelihood estimates of the parameters of a Markov-modulated Poisson process, and compares it to the Nelder-Mead downhill simplex algorithm.

Journal ArticleDOI
TL;DR: Two ways of implementing the Monte Carlo Expectation-Maximization (EM) algorithm to fit a FIIF model are illustrated, using the Gibbs sampler to carry out the computation for the E steps; it is also shown how to use bridge sampling to simulate the likelihood ratios for monitoring the convergence of a Monte Carlo EM.
Abstract: Based on item response theory, Bock and Aitken introduced a method of item factor analysis, termed full-information item factor (FIIF) analysis by Bartholomew because it uses all distinct item response vectors as data. But a limitation of their fitting algorithm is its reliance on fixed-point Gauss-Hermite quadrature, which can produce appreciable numerical errors, especially in high-dimension problems. The first purpose of this article is to offer more reliable methods by using recent advances in statistical computation. Specifically, we illustrate two ways of implementing the Monte Carlo Expectation-Maximization (EM) algorithm to fit a FIIF model, using the Gibbs sampler to carry out the computation for the E steps. We also show how to use bridge sampling to simulate the likelihood ratios for monitoring the convergence of a Monte Carlo EM, a strategy that is useful in general. Simulations demonstrate substantial improvement over Bock and Aitken's algorithm in recovering known factor loadings in high-dimension problems.

Journal ArticleDOI
TL;DR: In this paper, a conditional model for the covariate distribution was proposed to reduce the number of nuisance parameters for the distribution of the covariates in the E-step of the EM algorithm.
Abstract: Incomplete covariate data arise in many data sets. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990). This method requires the estimation of many nuisance parameters for the distribution of the covariates. Unfortunately, in data sets when the percentage of missing data is high, and the missing covariate patterns are highly non-monotone, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modelling advantages for the E-step and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples. We present a clinical trials example with six covariates, five of which have some missing values.

Journal ArticleDOI
TL;DR: In this paper, three alternative estimation procedures based on the EM algorithm are considered: two of them make use of numerical integration techniques (Gauss-Hermite or Monte Carlo), and the third is an EM-type algorithm based on posterior modes.

Journal ArticleDOI
TL;DR: The class of mixture transition distribution (MTD) time series models is extended to general non-Gaussian time series and the stationarity and autocorrelation properties of the models are derived.
Abstract: The class of mixture transition distribution (MTD) time series models is extended to general non-Gaussian time series. In these models the conditional distribution of the current observation given the past is a mixture of conditional distributions given each one of the last p observations. They can capture non-Gaussian and nonlinear features such as flat stretches, bursts of activity, outliers, and changepoints in a single unified model class. They can also represent time series defined on arbitrary state spaces, univariate or multivariate, continuous, discrete or mixed, which need not even be Euclidean. They perform well in the usual case of Gaussian time series without obvious nonstandard behaviors. The models are simple, analytically tractable, easy to simulate, and readily estimated. The stationarity and autocorrelation properties of the models are derived. A simple EM algorithm is given and shown to work well for estimation. The models are applied to several real and simulated datasets with satisfactory results.

Posted Content
TL;DR: In this article, a two-stage maximum likelihood estimation technique is proposed for cointegrated vector autoregressive processes where Markovian shifts occur in the equilibrium mean and the drift of the system.
Abstract: This paper suggests a new methodological approach to the analysis of cointegrated linear systems subject to changes in regime. We consider cointegrated vector autoregressive processes where Markovian shifts occur in the equilibrium mean and the drift of the system. A two-stage maximum likelihood estimation technique is proposed. In the first stage, based on a finite order VAR approximation of the cointegrated VARMA representation, the Johansen cointegration analysis is invoked to determine the cointegration rank and to estimate the cointegration matrix. An EM algorithm delivers the maximum likelihood estimates of the remaining parameters. The methodology is illustrated with an investigation of international and global business cycles.

Journal ArticleDOI
TL;DR: In this article, a mixture model is used to estimate the parameters of a prospective logistic model in a case-control study with dichotomous response D that depends on a covariate X; for a portion of the sample, both the gold standard X and a surrogate covariate W are available.
Abstract: Methods are devised for estimating the parameters of a prospective logistic model in a case-control study with dichotomous response D that depends on a covariate X. For a portion of the sample, both the gold standard X and a surrogate covariate W are available; however, for the greater portion of the data, only the surrogate covariate W is available. By using a mixture model, the relationship between the true covariate and the response can be modeled appropriately for both types of data. The likelihood depends on the marginal distribution of X and the measurement error density (W|X, D). The latter is modeled parametrically based on the validation sample. The marginal distribution of the true covariate is modeled using a nonparametric mixture distribution. In this way we can improve the efficiency and reduce the bias of the parameter estimates. The results also apply when there is no validation data, provided the error distribution is known or estimated from an independent data source. Many of the ...

Journal ArticleDOI
TL;DR: In this paper, an EM algorithm for exact maximum likelihood estimation of the population parameters for nonlinear random effects models was introduced, which can account for both within- and between-individual sources of variability and serial correlation within individual observations when analyzing unbalanced repeated measures data.
Abstract: The pharmaceutical industry is currently interested in the population approach and population models, also known as mixed effects models and random effects models depending on the precise form. Population models are useful in that they can account for both within- and between-individual sources of variability and serial correlation within individual observations when analyzing unbalanced repeated measures data. The modelling of population pharmacodynamic or pharmacokinetic profiles typically involves nonlinear random effects models. Each individual's observations are modelled by identical (up to unknown parameter values) nonlinear regression models, with the distribution of the observations, or a transformation of the observations, about expected responses taken to be normal, with the degree of variability described by a variance model. Between-individual variability is modelled by a population distribution for the individual regression parameter values (random effects). In a parametric analysis the population distribution is taken to be normal, the parameters of which, along with the parameters of the variance model, are known as the population parameters. Maximum likelihood estimation of the population parameters for nonlinear random effects models was pioneered by Beal and Sheiner (1979), and since then a number of algorithms have appeared for approximate maximum likelihood, including Steimer et al. (1984), Lindstrom and Bates (1990), Beal and Sheiner (1992), and Mentre and Gomeni (1995). All of these algorithms are approximate in some way. For a summary see Beal and Sheiner (1992), Wolfinger (1993), Pinheiro and Bates (1994), and Davidian and Giltinan (1995). In this paper an EM algorithm for exact maximum likelihood estimation is introduced. An EM algorithm obtaining maximum likelihood estimates for linear random effects models was introduced by Dempster, Laird, and Rubin (1977). Laird and Ware (1982), Lindstrom and Bates (1988), Jennrich and Schluchter (1986), and Liu and Rubin (1994) all describe hybrid EM algorithms for the linear random effects model. A true EM algorithm for the linear model is described by Jamshidian and Jennrich (1993). Mentre and Gomeni (1995) describe an approximate EM algorithm for nonlinear random effects models and, from the algorithm given in this paper, it can be seen clearly how their approximations arise. The present algorithm uses Monte Carlo methods to perform the E step, a strategy previously adopted in an altogether different model by Guo and Thompson (1994). Guo and Thompson require a Gibbs sampler, that is, a Markov chain Monte Carlo method for their E step, but the present algorithm uses independent samples. In Section 2 of this paper the nonlinear random effects model is described. Section 3 gives the EM algorithm without random effect covariates, while Section 4 gives the modified algorithm in the ...

Journal ArticleDOI
TL;DR: A new feature selection procedure based on the Kullback J-divergence between two class conditional density functions approximated by a finite mixture of parameterized densities of a special type is presented, which simultaneously yields a pseudo-Bayes decision rule.
Abstract: A new feature selection procedure based on the Kullback J-divergence between two class conditional density functions approximated by a finite mixture of parameterized densities of a special type is presented. This procedure is suitable especially for multimodal data. Apart from finding a feature subset of any cardinality without involving any search procedure, it also simultaneously yields a pseudo-Bayes decision rule. Its performance is tested on real data.