
Showing papers on "Expectation–maximization algorithm published in 1994"


Journal ArticleDOI
TL;DR: Ordered subsets EM (OS-EM) provides a restoration imposing a natural positivity condition and with close links to the EM algorithm, applicable in both single photon (SPECT) and positron emission tomography (PET).
Abstract: The authors define ordered subset processing for standard algorithms (such as expectation maximization, EM) for image restoration from projections. Ordered subsets methods group projection data into an ordered sequence of subsets (or blocks). An iteration of ordered subsets EM is defined as a single pass through all the subsets, in each subset using the current estimate to initialize application of EM with that data subset. This approach is similar in concept to block-Kaczmarz methods introduced by Eggermont et al. (1981) for iterative reconstruction. Simultaneous iterative reconstruction (SIRT) and multiplicative algebraic reconstruction (MART) techniques are well known special cases. Ordered subsets EM (OS-EM) provides a restoration imposing a natural positivity condition and with close links to the EM algorithm. OS-EM is applicable in both single photon (SPECT) and positron emission tomography (PET). In simulation studies in SPECT, the OS-EM algorithm provides an order-of-magnitude acceleration over EM, with restoration quality maintained.

3,740 citations
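
As an illustration (not the authors' implementation), the following is a minimal numpy sketch of a single OS-EM pass in the spirit of the abstract above, assuming a hypothetical dense system matrix A (projection bins by image pixels), measured counts y, and a current non-negative image estimate lam.

```python
# Minimal sketch of one ordered-subsets EM (OS-EM) iteration for emission tomography.
# A, y, lam and the subset grouping are illustrative assumptions, not the paper's code.
import numpy as np

def osem_iteration(A, y, lam, n_subsets=4, eps=1e-12):
    """One OS-EM pass: a single sweep through all ordered subsets of projections."""
    n_proj = A.shape[0]
    # Group projection rows into an ordered sequence of subsets (blocks).
    subsets = np.array_split(np.arange(n_proj), n_subsets)
    for idx in subsets:
        A_s, y_s = A[idx], y[idx]
        forward = A_s @ lam                      # forward projection with current estimate
        ratio = y_s / np.maximum(forward, eps)   # measured / estimated counts
        # Multiplicative EM-style update using only this subset's data;
        # positivity of lam is preserved automatically.
        lam = lam * (A_s.T @ ratio) / np.maximum(A_s.T @ np.ones_like(y_s), eps)
    return lam

# Toy usage with random data (illustration only).
rng = np.random.default_rng(0)
A = rng.random((64, 32))
true_img = rng.random(32)
y = rng.poisson(A @ true_img).astype(float)
lam = np.ones(32)
for _ in range(10):
    lam = osem_iteration(A, y, lam)
```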


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.

2,418 citations


Journal ArticleDOI
TL;DR: This article introduces an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights, and in many applications this new procedure works very well without the need for iterations.
Abstract: For missing data problems, Tanner and Wong have described a data augmentation procedure that approximates the actual posterior distribution of the parameter vector by a mixture of complete data posteriors. Their method of constructing the complete data sets is closely related to the Gibbs sampler. Both require iterations and, as with the EM algorithm, convergence can be slow. We introduce in this article an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights. In many applications this new procedure works very well without the need for iterations. Sensitivity analysis, influence analysis, and updating with new data can be performed cheaply. Bayesian prediction and model selection can also be incorporated. Examples taken from a wide range of applications are used for illustration.

1,166 citations


Journal ArticleDOI
TL;DR: The paper describes the space-alternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hidden-data spaces defined by the algorithm designer; it proves that the sequence of estimates monotonically increases the penalized-likelihood objective, derives asymptotic convergence rates, and provides sufficient conditions for monotone convergence in norm.
Abstract: The expectation-maximization (EM) method can facilitate maximizing likelihood functions that arise in statistical estimation problems. In the classical EM paradigm, one iteratively maximizes the conditional log-likelihood of a single unobservable complete data space, rather than maximizing the intractable likelihood function for the measured or incomplete data. EM algorithms update all parameters simultaneously, which has two drawbacks: 1) slow convergence, and 2) difficult maximization steps due to coupling when smoothness penalties are used. The paper describes the space-alternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hidden-data spaces defined by the algorithm designer. The authors prove that the sequence of estimates monotonically increases the penalized-likelihood objective, derive asymptotic convergence rates, and provide sufficient conditions for monotone convergence in norm. Two signal processing applications illustrate the method: estimation of superimposed signals in Gaussian noise, and image reconstruction from Poisson measurements. In both applications, the SAGE algorithms easily accommodate smoothness penalties and converge faster than the EM algorithms.

1,083 citations


Journal ArticleDOI
TL;DR: In this paper, the posterior distribution and Bayes estimators are evaluated by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters.
Abstract: SUMMARY A formal Bayesian analysis of a mixture model usually leads to intractable calculations, since the posterior distribution takes into account all the partitions of the sample. We present approximation methods which evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters. The fully conditional Gibbs alternative is shown to be ergodic and geometric convergence is established in the normal case. We also consider non-informative approximations associated with improper priors, assuming that the sample corresponds exactly to a k-component mixture.

895 citations
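
A minimal sketch of the kind of Gibbs sampler described above, for a k-component univariate normal mixture with known common variance. The conjugate priors (normal on the component means, symmetric Dirichlet on the weights) are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of Gibbs sampling for a k-component univariate normal mixture,
# exploiting the missing-data (allocation) structure of the mixture model.
# Assumptions: known common variance sigma2, N(0, tau2) priors on the means,
# and a symmetric Dirichlet(alpha) prior on the mixture weights.
import numpy as np

def gibbs_normal_mixture(x, k=2, n_iter=500, sigma2=1.0, tau2=10.0, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=k)
    w = np.full(k, 1.0 / k)
    draws = []
    for _ in range(n_iter):
        # 1) Sample the missing allocations z_i from their full conditional.
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2 + np.log(w)[None, :]
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=pi) for pi in p])
        # 2) Sample the weights from their Dirichlet full conditional.
        counts = np.bincount(z, minlength=k)
        w = rng.dirichlet(alpha + counts)
        # 3) Sample each component mean from its normal full conditional.
        for j in range(k):
            xj = x[z == j]
            prec = len(xj) / sigma2 + 1.0 / tau2
            mean = (xj.sum() / sigma2) / prec
            mu[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        draws.append((w.copy(), mu.copy()))
    return draws

# Toy usage: two well-separated components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
samples = gibbs_normal_mixture(x)
```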


Journal ArticleDOI
TL;DR: In this article, a multidisciplinary review of empirical, statistical learning from a graphical model perspective is presented, including decomposition, differentiation, and manipulation of probability models from the exponential family.
Abstract: This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.

617 citations


Journal ArticleDOI
TL;DR: ECME is a generalisation of the ECM algorithm, which is itself an extension of the EM algorithm (Dempster, Laird & Rubin, 1977); it is obtained by replacing some CM-steps of ECM, which maximise the constrained expected complete-data log-likelihood function, with steps that maximise the correspondingly constrained actual likelihood function.
Abstract: A generalisation of the ECM algorithm (Meng & Rubin, 1993), which is itself an extension of the EM algorithm (Dempster, Laird & Rubin, 1977), can be obtained by replacing some CM-steps of ECM, which maximise the constrained expected complete-data log-likelihood function, with steps that maximise the correspondingly constrained actual likelihood function. This algorithm, which we call the ECME algorithm, for Expectation/Conditional Maximisation Either, shares with both EM and ECM their stable monotone convergence and basic simplicity of implementation relative to competing faster converging methods. Moreover, ECME can have a substantially faster convergence rate than either EM or ECM, measured using either the number of iterations or actual computer time.

604 citations


Journal ArticleDOI
TL;DR: In this article, a new class of pattern-mixture models for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables is described.
Abstract: SUMMARY Likelihood-based methods are developed for analyzing a random sample on two continuous variables when values of one of the variables are missing. Normal maximum likelihood estimates when values are missing completely at random were derived by Anderson (1957). They are also maximum likelihood providing the missing-data mechanism is ignorable, in Rubin's (1976) sense that the mechanism depends only on observed data. A new class of pattern-mixture models (Little, 1993) is described for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables. Maximum likelihood for models in this class is straightforward, and yields the estimates of Anderson (1957) when missingness depends solely on the completely observed variable, and the estimates of Brown (1990) when missingness depends solely on the incompletely observed variable. Another choice of linear combination yields estimates from complete-case analysis. Large-sample and Bayesian methods are described for this model. The data do not supply information about the ratio of the coefficients of the linear combination that controls missingness. If this ratio is not well determined based on prior knowledge, a prior distribution can be specified, and Bayesian inference is then readily accomplished. Alternatively, sensitivity of inferences can be displayed for a variety of choices of the ratio.

395 citations


Journal ArticleDOI
TL;DR: The theory of expectation-maximization can be used as a basis for calculation of objective figures of merit for image quality over a wide range of conditions in emission tomography.
Abstract: The expectation-maximization (EM) algorithm is an important tool for maximum-likelihood (ML) estimation and image reconstruction, especially in medical imaging. It is a non-linear iterative algorithm that attempts to find the ML estimate of the object that produced a data set. The convergence of the algorithm and other deterministic properties are well established, but relatively little is known about how noise in the data influences noise in the final reconstructed image. In this paper we present a detailed treatment of these statistical properties. The specific application we have in mind is image reconstruction in emission tomography, but the results are valid for any application of the EM algorithm in which the data set can be described by Poisson statistics. We show that the probability density function for the grey level at a pixel in the image is well approximated by a log-normal law. An expression is derived for the variance of the grey level and for pixel-to-pixel covariance. The variance increases rapidly with iteration number at first, but eventually saturates as the ML estimate is approached. Moreover, the variance at any iteration number has a factor proportional to the square of the mean image (though other factors may also depend on the mean image), so a map of the standard deviation resembles the object itself. Thus low-intensity regions of the image tend to have low noise. By contrast, linear reconstruction methods, such as filtered back-projection in tomography, show a much more global noise pattern, with high-intensity regions of the object contributing to noise at rather distant low-intensity regions. The theoretical results of this paper depend on two approximations, but in the second paper in this series we demonstrate through Monte Carlo simulation that the approximations are justified over a wide range of conditions in emission tomography. The theory can, therefore, be used as a basis for calculation of objective figures of merit for image quality.

388 citations


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture having a modular structure is introduced; it has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.

344 citations


Journal ArticleDOI
TL;DR: A monograph covering the logic of maximum likelihood, a general modeling framework using maximum likelihood methods, basic estimation techniques, and further empirical examples.
Abstract: Contents: Introduction; The Logic of Maximum Likelihood; A General Modeling Framework Using Maximum Likelihood Methods; An Introduction to Basic Estimation Techniques; Further Empirical Examples; Additional Likelihoods; Conclusions.

Journal ArticleDOI
TL;DR: An iterative method for joint channel parameter estimation and symbol selection via the Baum-Welch algorithm, or equivalently the Expectation-Maximization (EM) algorithm, is presented; the calculated likelihood functions easily give optimum decisions on information symbols that minimize the symbol error probability.
Abstract: We present an iterative method for joint channel parameter estimation and symbol selection via the Baum-Welch algorithm, or equivalently the Expectation-Maximization (EM) algorithm. Channel parameters, including noise variance, are estimated using a maximum likelihood criterion. The Markovian properties of the channel state sequence enable us to calculate the required likelihood using a forward-backward algorithm. The calculated likelihood functions can easily give optimum decisions on information symbols which minimize the symbol error probability. The proposed receiver can be used for both linear and nonlinear channels. It improves the system throughput by making savings in the transmission of known symbols, usually employed for channel identification. Simulation results which show fast convergence are presented.
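
A minimal sketch of the scaled forward-backward recursions the abstract relies on, for a generic discrete-state channel with additive Gaussian observation noise. The transition matrix, state means, and noise variance below are hypothetical placeholders rather than the paper's channel model.

```python
# Minimal sketch of scaled forward-backward recursions for a discrete-state channel/HMM.
# T is a hypothetical k x k transition matrix, means are per-state observation means,
# sigma2 is the observation noise variance, and pi0 is the initial state distribution.
import numpy as np

def forward_backward(y, T, means, sigma2, pi0):
    k, n = len(means), len(y)
    # Per-sample Gaussian emission likelihoods under each state.
    lik = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2 / sigma2)
    lik /= np.sqrt(2 * np.pi * sigma2)
    alpha = np.zeros((n, k)); beta = np.zeros((n, k)); c = np.zeros(n)
    # Forward pass (scaled to avoid numerical underflow).
    alpha[0] = pi0 * lik[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * lik[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # Backward pass, reusing the forward scaling constants.
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (lik[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                        # posterior state probabilities
    gamma /= gamma.sum(axis=1, keepdims=True)
    loglik = np.log(c).sum()
    return gamma, loglik

# Toy usage: 2-state channel; per-symbol MAP decisions minimise symbol error probability.
T = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-1.0, 1.0])
pi0 = np.array([0.5, 0.5])
rng = np.random.default_rng(0)
y = rng.normal(np.repeat([-1.0, 1.0, -1.0], 20), 0.5)
gamma, loglik = forward_backward(y, T, means, 0.25, pi0)
decisions = gamma.argmax(axis=1)
```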

Journal ArticleDOI
TL;DR: In this paper, the authors consider the case where branch probabilities are products of nonnegative integer powers in the parameters, 0 ≤ θ_s ≤ 1, and their complements, 1 - θ_s, and show that the EM algorithm necessarily converges to a local maximum.
Abstract: Multinomial processing tree models assume that an observed behavior category can arise from one or more processing sequences represented as branches in a tree. These models form a subclass of parametric, multinomial models, and they provide a substantively motivated alternative to loglinear models. We consider the usual case where branch probabilities are products of nonnegative integer powers in the parameters, 0 ≤ θ_s ≤ 1, and their complements, 1 - θ_s. A version of the EM algorithm is constructed that has very strong properties. First, the E-step and the M-step are both analytic and computationally easy; therefore, a fast PC program can be constructed for obtaining MLEs for large numbers of parameters. Second, a closed form expression for the observed Fisher information matrix is obtained for the entire class. Third, it is proved that the algorithm necessarily converges to a local maximum, and this is a stronger result than for the exponential family as a whole. Fourth, we show how the algorithm can handle quite general hypothesis tests concerning restrictions on the model parameters. Fifth, we extend the algorithm to handle the Read and Cressie power divergence family of goodness-of-fit statistics. The paper includes an example to illustrate some of these results.

Proceedings Article
01 Jan 1994
TL;DR: An alternative model for mixtures of experts is proposed that uses a different parametric form for the gating network, is trained by the EM algorithm, and yields faster convergence.
Abstract: We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models--trained by either EM or gradient ascent--there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers.
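
A minimal sketch in the spirit of the model above, as a hedged simplification rather than the paper's exact architecture: a one-dimensional mixture of linear experts whose gating network is a Gaussian mixture over the input, so that both the E-step and the M-step have closed-form updates and no learning stepsize is needed.

```python
# Minimal sketch of EM for a mixture of linear experts with a Gaussian gating network.
# The model and its parameterisation are illustrative assumptions.
import numpy as np

def em_mixture_of_experts(x, y, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Gating parameters: mixing weights and input means/variances per expert.
    alpha = np.full(k, 1.0 / k); m = rng.normal(size=k); s2 = np.ones(k)
    # Expert parameters: per-expert linear regression (slope a, intercept b) and noise variance.
    a = rng.normal(size=k); b = np.zeros(k); tau2 = np.ones(k)
    for _ in range(n_iter):
        # E-step: responsibility of expert j for the pair (x_i, y_i).
        gate = alpha * np.exp(-0.5 * (x[:, None] - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        pred = a * x[:, None] + b
        lik = np.exp(-0.5 * (y[:, None] - pred) ** 2 / tau2) / np.sqrt(2 * np.pi * tau2)
        r = gate * lik
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)
        # M-step: closed-form updates (weighted moments and weighted least squares).
        alpha = nk / len(x)
        m = (r * x[:, None]).sum(axis=0) / nk
        s2 = (r * (x[:, None] - m) ** 2).sum(axis=0) / nk
        for j in range(k):
            w = r[:, j]
            xb, yb = np.average(x, weights=w), np.average(y, weights=w)
            a[j] = np.sum(w * (x - xb) * (y - yb)) / np.sum(w * (x - xb) ** 2)
            b[j] = yb - a[j] * xb
            tau2[j] = np.sum(w * (y - a[j] * x - b[j]) ** 2) / nk[j]
    return alpha, m, s2, a, b, tau2

# Toy usage: two linear regimes switched by the input.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 300)
y = np.where(x < 0, 2 * x + 1, -x + 2) + rng.normal(0, 0.3, 300)
params = em_mixture_of_experts(x, y)
```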

ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.
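
As a simplified illustration of likelihood-based learning from incomplete data (a single Gaussian rather than the mixture models treated in the report), the following sketch runs EM for the mean and covariance of a multivariate normal when some entries of the data matrix are missing (NaN).

```python
# Minimal sketch of EM for a multivariate normal with missing entries (NaN).
# This is a hedged simplification of the incomplete-data setting discussed above.
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0)) + 1e-6 * np.eye(d)
    for _ in range(n_iter):
        Ex = np.zeros((n, d))
        Exx = np.zeros((d, d))
        for i in range(n):
            obs = ~np.isnan(X[i]); mis = ~obs
            xi = X[i].copy()
            C_mis = None
            if mis.any():
                if obs.any():
                    # E-step: conditional mean/covariance of the missing block given the observed block.
                    S_oo = Sigma[np.ix_(obs, obs)]
                    S_mo = Sigma[np.ix_(mis, obs)]
                    coef = np.linalg.solve(S_oo, S_mo.T).T      # regression coefficients
                    xi[mis] = mu[mis] + coef @ (X[i, obs] - mu[obs])
                    C_mis = Sigma[np.ix_(mis, mis)] - coef @ S_mo.T
                else:
                    xi[mis] = mu[mis]
                    C_mis = Sigma[np.ix_(mis, mis)]
            Ex[i] = xi
            outer = np.outer(xi, xi)
            if C_mis is not None:
                outer[np.ix_(mis, mis)] += C_mis               # add conditional covariance
            Exx += outer
        # M-step: update mean and covariance from the expected sufficient statistics.
        mu = Ex.mean(axis=0)
        Sigma = Exx / n - np.outer(mu, mu)
    return mu, Sigma

# Toy usage: bivariate normal data with ~20% of entries missing at random.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.6], [0.6, 2.0]], size=200)
X[rng.random(X.shape) < 0.2] = np.nan
mu_hat, Sigma_hat = em_gaussian_missing(X)
```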

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of probit-normal models for binary data and describe ML and REML estimation of variance components for that class as well as best prediction for the realized values of the random effects.
Abstract: We consider a class of probit-normal models for binary data and describe ML and REML estimation of variance components for that class as well as best prediction for the realized values of the random effects. ML estimates are calculated using an EM algorithm; for complicated models EM includes a Gibbs step. The computations are illustrated through two examples.

Journal ArticleDOI
TL;DR: Two solutions are proposed to solve the problem of model parameter estimation from incomplete data: a Monte Carlo scheme and a scheme related to Besag's (1986) iterated conditional mode (ICM) method, both of which make use of Markov random-field modeling assumptions.
Abstract: An unsupervised stochastic model-based approach to image segmentation is described, and some of its properties investigated. In this approach, the problem of model parameter estimation is formulated as a problem of parameter estimation from incomplete data, and the expectation-maximization (EM) algorithm is used to determine a maximum-likelihood (ML) estimate. Previously, the use of the EM algorithm in this application has encountered difficulties since an analytical expression for the conditional expectations required in the EM procedure is generally unavailable, except for the simplest models. In this paper, two solutions are proposed to solve this problem: a Monte Carlo scheme and a scheme related to Besag's (1986) iterated conditional mode (ICM) method. Both schemes make use of Markov random-field modeling assumptions. Examples are provided to illustrate the implementation of the EM algorithm for several general classes of image models. Experimental results on both synthetic and real images are provided.

Journal ArticleDOI
TL;DR: In this paper, the authors apply standard convex optimization techniques to the analysis of interval censored data and provide easily verifiable conditions for the self-consistent estimator proposed by Turnbull (1976) to be a maximum likelihood estimator and for checking whether the maximum likelihood estimate is unique.
Abstract: SUMMARY Standard convex optimization techniques are applied to the analysis of interval censored data. These methods provide easily verifiable conditions for the self-consistent estimator proposed by Turnbull (1976) to be a maximum likelihood estimator and for checking whether the maximum likelihood estimate is unique. A sufficient condition is given for the almost sure convergence of the maximum likelihood estimator to the true underlying distribution function.
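
A minimal sketch of the self-consistency (EM) iteration proposed by Turnbull (1976) for interval-censored data, with the candidate support restricted to a fixed grid for simplicity; the grid and the toy intervals are illustrative assumptions.

```python
# Minimal sketch of Turnbull's self-consistency (EM) iteration for interval-censored data:
# each observation lies in an interval [l_i, r_i], and we estimate probability masses p_j
# on a fixed grid of candidate support points s.
import numpy as np

def self_consistency(intervals, s, n_iter=200):
    intervals = np.asarray(intervals, dtype=float)
    m = len(s)
    # Indicator matrix: does support point s_j fall inside observation i's interval?
    A = (s[None, :] >= intervals[:, [0]]) & (s[None, :] <= intervals[:, [1]])
    p = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        denom = A @ p                          # probability mass covering each interval
        # Combined E/M step: expected fraction of observations attributed to each s_j.
        p = (A * p[None, :] / denom[:, None]).mean(axis=0)
    return p

# Toy usage: a few intervals on a grid over [0, 5].
s = np.linspace(0, 5, 21)
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (2.0, 5.0), (0.0, 0.5)]
p_hat = self_consistency(intervals, s)
```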

Journal ArticleDOI
TL;DR: In this article, the fundamental result on the rate of convergence of the EM algorithm is generalized to the ECM algorithm, a more flexible and applicable iterative algorithm proposed recently by Meng and Rubin, and an example shows that intuitions accurate for complete-data iterative algorithms may not be trustworthy in the presence of missing data.
Abstract: The fundamental result on the rate of convergence of the EM algorithm has proven to be theoretically valuable and practically useful. Here, this result is generalized to the ECM algorithm, a more flexible and applicable iterative algorithm proposed recently by Meng and Rubin. Results on the rate of convergence of variations of ECM are also presented. An example is given to show that intuitions accurate for complete-data iterative algorithms may not be trustworthy in the presence of missing data.

Journal ArticleDOI
TL;DR: In this article, it is shown that the EM algorithm, with the empirical distribution function F_n as a seed, converges to the unique solution F̂ of the self-consistency equations; the consistency of every iterate of the EM a...
Abstract: A ranked set sample consists entirely of independently distributed order statistics and can occur naturally in many experimental settings, including problems in reliability. When each ranked set from which an order statistic is drawn is of the same size, and when the statistic of each fixed order is sampled the same number of times, the ranked set sample is said to be balanced. Stokes and Sager have shown that the edf F_n of a balanced ranked set sample from the cdf F is an unbiased estimator of F and is more precise than the edf of a simple random sample of the same size. The nonparametric maximum likelihood estimator (MLE) F̂ of F is studied in this article. Its existence and uniqueness is demonstrated, and a general numerical procedure is presented and is shown to converge to F̂. If the ranked set sample is balanced, it is shown that the EM algorithm, with F_n as a seed, converges to the unique solution F̂ of the problem's self-consistency equations; the consistency of every iterate of the EM a...

Journal ArticleDOI
TL;DR: In this paper, a method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically based on the concept that the heterogeneity of the membership probabilities can be taken as an indicator of the confidence for the classification, such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel.
Abstract: A method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically. Based on the concept that the heterogeneity of maximum likelihood membership probabilities can be taken as an indicator of the confidence for the classification, such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel. After a brief presentation of the statistical basis of the methodology, this is applied to a conventional and two modified maximum likelihood classifications in a case study using Landsat TM scenes. The results demonstrate the efficiency of the approach and, particularly, its usefulness for operational applications.
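
A minimal sketch of the per-pixel confidence measure described above, computed as the relative entropy of the class-membership probabilities; the array of membership probabilities is a hypothetical input, and the exact normalisation used in the paper may differ.

```python
# Minimal sketch: relative probability entropy of maximum-likelihood membership probabilities.
# probs is a hypothetical (n_pixels, n_classes) array of posterior membership probabilities.
import numpy as np

def relative_probability_entropy(probs, eps=1e-12):
    p = np.clip(probs, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p)).sum(axis=1)
    # Normalise by the maximum possible entropy (uniform memberships), so that
    # 0 means a fully confident classification and 1 a completely ambiguous pixel.
    return entropy / np.log(p.shape[1])
```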

Journal ArticleDOI
TL;DR: There are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts, and the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle point solution of the statistical physics system.
Abstract: We show that there are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts. In particular, the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle point solution to the statistical physics system. An advantage of the statistical physics approach is that it naturally gives rise to a heuristic continuation method, deterministic annealing, for finding good solutions.

Journal ArticleDOI
TL;DR: Evaluations using both Monte Carlo simulations and phantom studies on the Siemens 953B scanner suggest that the expectation-maximization algorithm yields unbiased images with significantly lower variances than filtered-backprojection when the images are reconstructed to the intrinsic resolution.
Abstract: The expectation-maximization (EM) algorithm for computing maximum-likelihood estimates of transmission images in positron-emission tomography (PET) (see K. Lange and R. Carson, J. Comput. Assist. Tomogr., vol.8, no.2, p.306-16, 1984) is extended to include measurement error, accidental coincidences and Compton scatter. A method for accomplishing the maximization step using one step of Newton's method is proposed. The algorithm is regularized with the method of sieves. Evaluations using both Monte Carlo simulations and phantom studies on the Siemens 953B scanner suggest that the algorithm yields unbiased images with significantly lower variances than filtered-backprojection when the images are reconstructed to the intrinsic resolution. Large features in the images converge in under 200 iterations while the smallest features required up to 2,000 iterations. All but the smallest features in typical transmission scans converge in approximately 250 iterations. The initial implementation of the algorithm requires 50 sec per iteration on a DECStation 5000.

Journal ArticleDOI
TL;DR: In this paper, it is shown that maximum likelihood estimates of the location vector and scatter matrix for a multivariate t-distribution in p dimensions with v ≥ 1 degrees of freedom can be identified with the maximum likelihood estimates for a scatter-only estimation problem from a (p+1)-dimensional multivariate t-distribution with v − 1 > 0 degrees of freedom; existence and uniqueness results for the location-scatter problem then follow from the scatter-only formulation.
Abstract: It is shown that maximum likelihood estimates of the location vector and scatter matrix for a multivariate t-distribution in p dimensions with v ≥ 1 degrees of freedom can be identified with the maximum likelihood estimates for a scatter-only estimation problem from a (p+1)-dimensional multivariate t-distribution with v − 1 > 0 degrees of freedom. The t-distribution is the only distribution for which this dual formulation is possible. Since the existence and uniqueness properties of maximum likelihood estimates are straightforward to prove for general scatter-only problems, we are able to immediately deduce existence and uniqueness results for the trickier location-scatter problem in the special case of the t-distribution. Each of these two formulations gives rise to an EM algorithm to maximize the likelihood, though the two algorithms are slightly different. The limiting Cauchy case v = 1 requires some special treatment.
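
A minimal sketch of the location-scatter EM iteration for a p-dimensional multivariate t-distribution with fixed degrees of freedom v, corresponding to the first of the two formulations discussed above; the fixed-v assumption and the toy data are simplifications for illustration.

```python
# Minimal sketch of EM for the location vector and scatter matrix of a multivariate
# t-distribution with fixed degrees of freedom v.
import numpy as np

def em_multivariate_t(X, v=3.0, n_iter=100):
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        # E-step: expected precision weights given the current (mu, Sigma).
        d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
        w = (v + p) / (v + d2)
        # M-step: weighted location and scatter updates.
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma

# Toy usage: heavy-tailed data from a t-like scale mixture of normals.
rng = np.random.default_rng(3)
Z = rng.standard_normal((300, 2))
u = rng.chisquare(3.0, size=300) / 3.0
X = np.array([1.0, -1.0]) + Z / np.sqrt(u)[:, None]
mu_hat, Sigma_hat = em_multivariate_t(X, v=3.0)
```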

Journal ArticleDOI
TL;DR: This article presents a general description of how and when the componentwise rates of convergence of the EM algorithm differ, as well as their relationships with the global rate, and provides an example, a standard contaminated normal model, to show that such phenomena are not necessarily pathological, but can occur in useful statistical models.

Proceedings Article
01 Jan 1994
TL;DR: A deterministic annealing variant of the EM algorithm is presented for maximum likelihood parameter estimation problems, in which the EM process is reformulated as the problem of minimizing the thermodynamic free energy using the principle of maximum entropy and a statistical-mechanics analogy.
Abstract: We present a deterministic annealing variant of the EM algorithm for maximum likelihood parameter estimation problems. In our approach, the EM process is reformulated as the problem of minimizing the thermodynamic free energy, using the principle of maximum entropy and a statistical-mechanics analogy. Unlike simulated annealing approaches, this minimization is performed deterministically. Moreover, the derived algorithm, unlike the conventional EM algorithm, can obtain better estimates, largely independent of the initial parameter values.
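
A minimal sketch of a deterministic-annealing EM schedule for a univariate two-component Gaussian mixture, in the spirit of the abstract above: the E-step responsibilities are tempered by an inverse temperature that is gradually raised towards 1. The annealing schedule and the mixture model are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch of a deterministic-annealing EM (DAEM) schedule for a 2-component
# univariate Gaussian mixture; beta is the inverse temperature.
import numpy as np

def daem_gmm(x, betas=(0.1, 0.25, 0.5, 0.75, 1.0), iters_per_beta=30):
    rng = np.random.default_rng(0)
    mu = rng.normal(size=2); sigma2 = np.array([1.0, 1.0]); w = np.array([0.5, 0.5])
    for beta in betas:
        for _ in range(iters_per_beta):
            # Tempered E-step: responsibilities proportional to (w_j N(x | mu_j, sigma2_j))^beta.
            logp = (np.log(w) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
            r = np.exp(beta * (logp - logp.max(axis=1, keepdims=True)))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: standard weighted updates.
            nk = r.sum(axis=0)
            w = nk / len(x)
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, sigma2

# Toy usage.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 0.5, 150)])
w_hat, mu_hat, s2_hat = daem_gmm(x)
```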

Journal ArticleDOI
TL;DR: In this article, the authors proposed a Bayesian method that uses smoothing constants to adjust the pseudo-observed cell frequencies so that the solution is not on the boundary, which produces boundary estimates for the expected cell frequencies of the nonrespondents.
Abstract: When categorical outcomes are subject to nonignorable nonresponse, log-linear models may be used to adjust for the nonresponse. The models are fitted to the data in an augmented frequency table in which one index corresponds to whether or not the subject is a respondent. The likelihood function is maximized over pseudo-observed cell frequencies with respect to this log-linear model using an EM algorithm. Each E step of the EM algorithm determines the pseudo-observed cell frequencies, and the M step yields the maximum likelihood estimators (MLE's) of these pseudo-observed cell frequencies. This approach may produce boundary estimates for the expected cell frequencies of the nonrespondents. In these cases the estimators of the log-linear model parameters are not uniquely determined and may be unstable. Following the approach of Clogg et al., we propose a Bayesian method that uses smoothing constants to adjust the pseudo-observed cell frequencies so that the solution is not on the boundary. The role...

Journal ArticleDOI
TL;DR: Brian Williams and Chris Dye argue that the method of maximum likelihood is generally preferable to least squares, giving the best estimates of the parameters for data with any given error distribution, and that the calculations are no more difficult than for least-squares fitting.

Journal ArticleDOI
TL;DR: In this article, the asymptotic distribution of the restricted maximum likelihood estimator of the variance components in a general mixed model is explored, and central limit theorems are obtained using elementary arguments with only mild conditions on the covariates in the fixed part of the model and without having to assume that the data are either normally or spherically symmetrically distributed.
Abstract: Summary This paper explores the asymptotic distribution of the restricted maximum likelihood estimator of the variance components in a general mixed model. Restricting attention to hierarchical models, central limit theorems are obtained using elementary arguments with only mild conditions on the covariates in the fixed part of the model and without having to assume that the data are either normally or spherically symmetrically distributed. Further, the REML and maximum likelihood estimators are shown to be asymptotically equivalent in this general framework, and the asymptotic distribution of the weighted least squares estimator (based on the REML estimator) of the fixed effect parameters is derived.

Journal ArticleDOI
TL;DR: The present paper addresses the problem of finding a maximum likelihood estimator (MLE) of an origin-destination matrix and of journey time statistics when passage time data are available, in the multiple origin/destination case.
Abstract: Previous techniques for analysing partial registration plate data are firstly reviewed. These generally fall into one of two broad categories: statistically based methods for single origin, single destination problems; and simple-minded, deterministic approaches using vehicle passage time data (i.e., the times at which vehicles pass the observation points), for surveys with multiple origins and destinations. The present paper addresses the problem of finding a maximum likelihood estimator (MLE) of an origin-destination matrix and of journey time statistics when passage time data are available, in the multiple origin/destination case. These estimators possess the well-known large sample properties of asymptotic unbiasedness, normality, and efficiency. The proposed approach also has the advantage over the deterministic methods (used in most existing registration plate matching packages) of simultaneously analysing all possible matches between all origins and destinations, rather than considering them in some arbitrary, priority order. Since the MLEs cannot be obtained analytically, alternative numerical techniques for determining them are evaluated, with respect to their convergence properties and computational efficiency. The most appropriate of these (based on a general-purpose statistical algorithm for "missing data" problems) is described in greater detail, including issues relevant to its computer implementation. Selected results from a more comprehensive simulation study are used to illustrate the performance of the maximum likelihood approach. In the (limited) results reported, the MLEs are seen to have considerably smaller mean square errors than the deterministic methods mentioned above, but are only marginally superior to the estimators produced by an efficient heuristic technique proposed previously by the author. Further empirical work would, however, be required to establish that the patterns observed in these simulations are examples of more general phenomena. Finally, possible extensions to the method and future research directions are discussed.