
Showing papers on "Expectation–maximization algorithm published in 2007"


01 Jan 2007
TL;DR: A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior.
Abstract: MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. A web page with related links including license information can be found at http://www.stat.washington.edu/mclust.

494 citations
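
As a rough illustration of the same workflow (EM fitting of normal mixtures under several covariance structures, plus BIC-based model selection), here is a minimal Python sketch using scikit-learn; it is not the mclust R package itself, and the data and parameter choices are made up for the example.

```python
# Minimal sketch: EM for Gaussian mixtures over several covariance structures,
# with BIC used to pick the number of components and the parameterization
# (analogous to what MCLUST automates; scikit-learn stands in for the R package).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 0.5, (80, 2))])  # toy data

best_model, best_bic = None, np.inf
for k in range(1, 6):
    for cov in ("spherical", "diag", "full"):          # a few covariance structures
        gm = GaussianMixture(n_components=k, covariance_type=cov,
                             random_state=0).fit(X)    # EM parameter estimation
        bic = gm.bic(X)                                # lower BIC is better here
        if bic < best_bic:
            best_model, best_bic = gm, bic

labels = best_model.predict(X)                         # model-based clustering result
```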


Journal ArticleDOI
TL;DR: A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
Abstract: Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However, maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.

434 citations
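
To make the idea concrete, here is a hedged sketch (not the paper's exact prior or update) of how an inverse-Wishart-type conjugate prior regularizes the covariance update in the M-step, so that a component sitting on very few points cannot produce a singular covariance matrix; the function and parameter names are illustrative.

```python
# Sketch of a MAP-regularized M-step covariance update for one mixture component.
# prior_scale and prior_df play the role of a weak conjugate (inverse-Wishart-type)
# prior; the exact prior used in the paper may differ.
import numpy as np

def map_covariance(X, resp_k, mu_k, prior_scale, prior_df):
    n_k = resp_k.sum()                                   # effective count for component k
    diff = X - mu_k
    scatter = (resp_k[:, None] * diff).T @ diff          # responsibility-weighted scatter
    d = X.shape[1]
    # Adding the prior scale matrix keeps the estimate positive definite even when
    # the component collapses onto one or two points (the degeneracies MLE suffers from).
    return (prior_scale + scatter) / (prior_df + n_k + d + 2)
```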


Proceedings Article
22 Jul 2007
TL;DR: This paper proposes a novel transfer-learning algorithm for text classification based on an EM-based Naive Bayes classifier and shows that the algorithm outperforms the traditional supervised and semi-supervised learning algorithms when the distributions of the training and test sets are increasingly different.
Abstract: A basic assumption in traditional machine learning is that the training and test data distributions should be identical. This assumption may not hold in many situations in practice, but we may be forced to rely on data from a different distribution to learn a prediction model. For example, this may be the case when it is expensive to label the data in a domain of interest, although in a related but different domain there may be plenty of labeled data available. In this paper, we propose a novel transfer-learning algorithm for text classification based on an EM-based Naive Bayes classifier. Our solution is to first estimate the initial probabilities under a distribution Dl of one labeled data set, and then use an EM algorithm to revise the model for a different distribution Du of the test data which are unlabeled. We show that our algorithm is very effective in several different pairs of domains, where the distances between the different distributions are measured using the Kullback-Leibler (KL) divergence. Moreover, KL-divergence is used to decide the trade-off parameters in our algorithm. In the experiments, our algorithm outperforms the traditional supervised and semi-supervised learning algorithms when the distributions of the training and test sets are increasingly different.

392 citations
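
A rough Python sketch of this EM-plus-Naive-Bayes scheme is given below, assuming word-count feature matrices: it initializes on the labeled source set and then alternates soft labeling of the unlabeled target documents with refitting. It simplifies the paper's method (no KL-based trade-off weighting), and all names are illustrative.

```python
# Hedged sketch: Naive Bayes initialized on labeled source data (D_l), then revised
# by EM on unlabeled target data (D_u). X_src, X_tgt are nonnegative count matrices.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_transfer_nb(X_src, y_src, X_tgt, n_iter=10):
    clf = MultinomialNB().fit(X_src, y_src)              # initial model from D_l
    classes = clf.classes_
    for _ in range(n_iter):
        post = clf.predict_proba(X_tgt)                   # E-step: soft labels on D_u
        # M-step: refit with each target document replicated once per class and
        # weighted by its posterior probability for that class.
        X_rep = np.vstack([X_tgt] * len(classes))
        y_rep = np.repeat(classes, X_tgt.shape[0])
        clf = MultinomialNB().fit(X_rep, y_rep, sample_weight=post.T.ravel())
    return clf
```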


Journal ArticleDOI
TL;DR: In this paper, the authors present several classes of semiparametric regression models, which extend the existing models in important directions, and construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters.
Abstract: Summary. Semiparametric regression models play a central role in formulating the effects of covariates on potentially censored failure times and in the joint modelling of incomplete repeated measures and failure times in longitudinal studies. The presence of infinite dimensional parameters poses considerable theoretical and computational challenges in the statistical analysis of such models. We present several classes of semiparametric regression models, which extend the existing models in important directions. We construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters. The maximum likelihood estimators are consistent and asymptotically normal with efficient variances. We develop simple and stable numerical techniques to implement the corresponding inference procedures. Extensive simulation experiments demonstrate that the inferential and computational methods proposed perform well in practical settings. Applications to three medical studies yield important new insights. We conclude that there is no reason, theoretical or numerical, not to use maximum likelihood estimation for semiparametric regression models. We discuss several areas that need further research.

314 citations


Journal Article
TL;DR: A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.

307 citations
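
The key computational point is that, with a common diagonal covariance and an L1 penalty on the (standardized) component means, the M-step mean update reduces to a soft-thresholding operation, which is what zeroes out uninformative variables. A hedged sketch of that single update (names and the exact thresholding constant are illustrative) follows.

```python
# Sketch of the penalized M-step for one component's mean under an L1 penalty:
# the ordinary weighted mean is shrunk towards zero by soft thresholding.
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def penalized_mean_update(X, resp_k, sigma2, lam):
    n_k = resp_k.sum()
    mu_hat = (resp_k @ X) / n_k                          # standard EM mean update
    return soft_threshold(mu_hat, lam * sigma2 / n_k)    # small means are set exactly to 0
```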


Journal ArticleDOI
Coşkun Kuş
TL;DR: The EM algorithm is used to determine the maximum likelihood estimates, the asymptotic variances and covariances of these estimates are obtained, and the convergence of the proposed EM scheme is investigated.

297 citations


Journal ArticleDOI
TL;DR: A rigorous Bayesian framework is proposed for which it is proved asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting.
Abstract: Summary. The problem of estimating probabilistic deformable template models in the field of computer vision or of probabilistic atlases in the field of computational anatomy has not yet received a coherent statistical formulation and remains a challenge. We provide a careful definition and analysis of a well-defined statistical model based on dense deformable templates for grey level images of deformable objects. We propose a rigorous Bayesian framework for which we prove asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting. The model is extended to mixtures of finite numbers of such components leading to a fine description of the photometric and geometric variations of an object class. We illustrate some of the ideas with images of handwritten digits and apply the estimated models to classification through maximum likelihood.

261 citations


Book ChapterDOI
06 Sep 2007
TL;DR: A new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost, is introduced, and it is proved that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm.
Abstract: The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assigned to clusters by hill climbing, i.e., points going to the same local maximum are put into the same cluster. A disadvantage of Denclue 1.0 is that the hill climbing used may make unnecessarily small steps in the beginning and never converges exactly to the maximum, it just comes close. We introduce a new hill climbing procedure for Gaussian kernels, which adjusts the step size automatically at no extra cost. We prove that the procedure converges exactly towards a local maximum by reducing it to a special case of the expectation maximization algorithm. We show experimentally that the new procedure needs many fewer iterations and can be accelerated by sampling-based methods while sacrificing only a small amount of accuracy.

242 citations


Journal ArticleDOI
TL;DR: A novel parametric and global image histogram thresholding method based on the estimation of the statistical parameters of "object" and "background" classes by the expectation-maximization (EM) algorithm, under the assumption that these two classes follow a generalized Gaussian (GG) distribution.

238 citations


Journal Article
TL;DR: In this paper, the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives is addressed, and a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses.
Abstract: Normal mixture models provide the most popular framework for modelling heterogeneity in a population with continuous outcomes arising in a variety of subclasses. In the last two decades, the skew normal distribution has been shown beneficial in dealing with asymmetric data in various theoretic and applied problems. In this article, we address the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives, respectively. Computational techniques using EM-type algorithms are employed for iteratively computing maximum likelihood estimates. Also, a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses. Numerical results are illustrated through two examples.

205 citations


Journal ArticleDOI
TL;DR: It is shown that, when the kernel is Gaussian, mean-shift is an expectation-maximization (EM) algorithm and, when the kernel is non-Gaussian, mean-shift is a generalized EM algorithm, and that, in general, its convergence is of linear order.
Abstract: The mean-shift algorithm, based on ideas proposed by Fukunaga and Hostetler, is a hill-climbing algorithm on the density defined by a finite mixture or a kernel density estimate. Mean-shift can be used as a nonparametric clustering method and has attracted recent attention in computer vision applications such as image segmentation or tracking. We show that, when the kernel is Gaussian, mean-shift is an expectation-maximization (EM) algorithm and, when the kernel is non-Gaussian, mean-shift is a generalized EM algorithm. This implies that mean-shift converges from almost any starting point and that, in general, its convergence is of linear order. For Gaussian mean-shift, we show: 1) the rate of linear convergence approaches 0 (superlinear convergence) for very narrow or very wide kernels, but is often close to 1 (thus, extremely slow) for intermediate widths and exactly 1 (sublinear convergence) for widths at which modes merge; 2) the iterates approach the mode along the local principal component of the data points from the inside of the convex hull of the data points; and 3) the convergence domains are nonconvex, can be disconnected, and show fractal behavior. We suggest ways of accelerating mean-shift based on the EM interpretation.
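
The Gaussian mean-shift iteration itself is short enough to state as code; the sketch below (with illustrative names, not the authors' implementation) shows the fixed-point step that the paper identifies with an E-step (computing responsibilities) followed by an M-step (taking the weighted mean).

```python
# Hedged sketch of Gaussian mean-shift hill climbing on a kernel density estimate.
import numpy as np

def gaussian_mean_shift(x, data, bandwidth, n_iter=500, tol=1e-8):
    for _ in range(n_iter):
        d2 = ((data - x) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)                # "E-step": kernel weights
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()     # "M-step": weighted mean
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x                                                  # approximate mode location
```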

Journal Article
TL;DR: It is shown that, given data from a mixture of k well-separated spherical Gaussians in ℝ^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high.
Abstract: We show that, given data from a mixture of k well-separated spherical Gaussians in ℝ^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high (d >> ln k). We relate this to previous theoretical and empirical work on the EM algorithm.

Posted Content
TL;DR: Two new algorithms for solving problems with at least a thousand nodes in the Gaussian case are presented, based on Nesterov's first order method, which yields a complexity estimate with a better dependence on problem size than existing interior point methods.
Abstract: We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added l_1-norm penalty term. The problem as formulated is convex, but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive l_1-norm penalized regression. Our second algorithm, based on Nesterov's first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright & Jordan (2006)), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
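
The paper's own solvers are block coordinate descent and Nesterov's first-order method; purely as an illustration of the optimization problem being solved (l_1-penalized Gaussian maximum likelihood), scikit-learn's GraphicalLasso fits the same objective on small examples. The data and penalty weight below are arbitrary.

```python
# Illustration of the l1-penalized Gaussian maximum likelihood ("sparse inverse
# covariance") problem, solved here with scikit-learn rather than the paper's solvers.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # synthetic data with 10 variables

model = GraphicalLasso(alpha=0.1).fit(X)         # alpha is the l1 penalty weight
precision = model.precision_                     # estimated (sparse) inverse covariance
print(int((np.abs(precision) > 1e-8).sum()), "nonzero precision entries")
```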

Journal ArticleDOI
TL;DR: This article proposes a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings and presents analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates.
Abstract: A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

Proceedings Article
03 Dec 2007
TL;DR: This paper presents an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm, and shows that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.

Journal ArticleDOI
TL;DR: This work examines two natural bivariate von Mises distributions--referred to as Sine and Cosine models--which have five parameters and, for concentrated data, tend to a bivariate normal distribution, and sees that the Cosine model may be preferred.
Abstract: Summary. A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions—referred to as Sine and Cosine models—which have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distribution, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.
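
For reference, the Sine and Cosine bivariate von Mises densities are commonly written in the literature, up to normalizing constants, as below; the paper's own parameterization may differ in sign conventions, so treat this as a reminder rather than the authors' exact notation.

```latex
% Sine model (five parameters: \mu, \nu, \kappa_1, \kappa_2, \lambda)
f_{\mathrm{sine}}(\phi,\psi) \propto
  \exp\{\kappa_1\cos(\phi-\mu) + \kappa_2\cos(\psi-\nu)
        + \lambda\sin(\phi-\mu)\sin(\psi-\nu)\}

% Cosine model (five parameters: \mu, \nu, \kappa_1, \kappa_2, \kappa_3)
f_{\mathrm{cosine}}(\phi,\psi) \propto
  \exp\{\kappa_1\cos(\phi-\mu) + \kappa_2\cos(\psi-\nu)
        - \kappa_3\cos(\phi-\mu-\psi+\nu)\}
```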

Journal ArticleDOI
TL;DR: In this article, a generic online version of the Expectation-Maximization (EM) algorithm is proposed for latent variable models of independent observations, which is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete data distribution.
Abstract: In this contribution, we propose a generic online (also sometimes called adaptive or recursive) version of the Expectation-Maximisation (EM) algorithm applicable to latent variable models of independent observations. Compared to the algorithm of Titterington (1984), this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback-Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e., that of the maximum likelihood estimator. In addition, the proposed approach is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model.
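
A hedged sketch of this online EM idea for a simple univariate Gaussian mixture is given below: running averages of the complete-data sufficient statistics are updated with a decreasing step size after each observation, and the parameters are re-read from those averages. The step-size schedule and initialization are illustrative choices, not the paper's.

```python
# Sketch of online EM for a univariate K-component Gaussian mixture.
import numpy as np

def online_em_gmm(stream, K=2, rate=0.6):
    rng = np.random.default_rng(0)
    w, mu, var = np.full(K, 1.0 / K), rng.normal(size=K), np.ones(K)
    s0, s1, s2 = w.copy(), w * mu, w * (var + mu ** 2)        # running sufficient statistics
    for n, y in enumerate(stream, start=1):
        gamma = n ** (-rate)                                   # decreasing step size
        # E-step for the single new observation: posterior responsibilities
        logp = np.log(w) - 0.5 * (np.log(var) + (y - mu) ** 2 / var)
        r = np.exp(logp - logp.max()); r /= r.sum()
        # Stochastic-approximation update of the sufficient statistics
        s0 = (1 - gamma) * s0 + gamma * r
        s1 = (1 - gamma) * s1 + gamma * r * y
        s2 = (1 - gamma) * s2 + gamma * r * y ** 2
        # M-step: map the averaged statistics back to parameters
        w, mu = s0 / s0.sum(), s1 / s0
        var = np.maximum(s2 / s0 - mu ** 2, 1e-6)
    return w, mu, var
```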

01 Jan 2007
TL;DR: A systematic exploration of expectation maximization methods based both on the Lanczos algorithm and power iteration for recommenders based solely on low rank approximations of the rating matrix.
Abstract: We compare recommenders based solely on low rank approximations of the rating matrix. The key difficulty lies in the sparseness of the known ratings within the matrix, which causes expectation maximization algorithms to converge very slowly. Among the prior publicly known attempts at this problem, a gradient boosting approach proved most successful, in spite of the fact that the resulting vectors are nonorthogonal and prone to numeric errors. We systematically explore expectation maximization methods based both on the Lanczos algorithm and power iteration; novel in this paper is the efficient handling of the dense estimate matrix used as input to a next iteration. We also compare sequence transformation methods to speed up convergence.

Journal ArticleDOI
TL;DR: A maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique which exploits the multiple Coulomb scattering of muon particles to perform nondestructive inspection without the use of artificial radiation.
Abstract: Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm2 per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work, we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.

Journal ArticleDOI
TL;DR: A new family of smoothness priors for the label probabilities in spatially variant mixture models with Gauss-Markov random field-based priors is proposed, which allow all their parameters to be estimated in closed form via the maximum a posteriori (MAP) estimation using the expectation-maximization methodology.
Abstract: We propose a new approach for image segmentation based on a hierarchical and spatially variant mixture model. According to this model, the pixel labels are random variables and a smoothness prior is imposed on them. The main novelty of this work is a new family of smoothness priors for the label probabilities in spatially variant mixture models. These Gauss-Markov random field-based priors allow all their parameters to be estimated in closed form via the maximum a posteriori (MAP) estimation using the expectation-maximization methodology. Thus, it is possible to introduce priors with multiple parameters that adapt to different aspects of the data. Numerical experiments are presented where the proposed MAP algorithms were tested in various image segmentation scenarios. These experiments demonstrate that the proposed segmentation scheme compares favorably to both standard and previous spatially constrained mixture model-based segmentation methods.

Journal ArticleDOI
TL;DR: An EM-based algorithm is developed for the fitting of mixtures of t-factor analyzers and its application is demonstrated in the clustering of some microarray gene-expression data.

Journal ArticleDOI
TL;DR: A multivariate point-process model in which the observed activity of a network of neurons depends on three terms: the experimentally-controlled stimulus; the spiking history of the observed neurons; and a hidden term that corresponds, for example, to common input from an unobserved population of neurons that is presynaptic to two or more cells in the observed population.
Abstract: Recent developments in multi-electrode recordings enable the simultaneous measurement of the spiking activity of many neurons. Analysis of such multineuronal data is one of the key challenges in computational neuroscience today. In this work, we develop a multivariate point-process model in which the observed activity of a network of neurons depends on three terms: (1) the experimentally-controlled stimulus; (2) the spiking history of the observed neurons; and (3) a hidden term that corresponds, for example, to common input from an unobserved population of neurons that is presynaptic to two or more cells in the observed population. We consider two models for the network firing-rates, one of which is computationally and analytically tractable but can lead to unrealistically high firing-rates, while the other with reasonable firing-rates imposes a greater computational burden. We develop an expectation-maximization algorithm for fitting the parameters of both the models. For the analytically tractable model the expectation step is based on a continuous-time implementation of the extended Kalman smoother, and the maximization step involves two concave maximization problems which may be solved in parallel. The other model that we consider necessitates the use of Monte Carlo methods for the expectation as well as the maximization step. We discuss the trade-off involved in choosing between the two models and the associated methods. The techniques developed allow us to solve a variety of inference problems in a straightforward, computationally efficient fashion; for example, we may use the model to predict network activity given an arbitrary stimulus, infer a neuron's firing rate given the stimulus and the activity of the other observed neurons, and perform optimal stimulus decoding and prediction. We present several detailed simulation studies which explore the strengths and limitations of our approach.

Journal ArticleDOI
TL;DR: In this article, a particle filter approach for approximating the first-order moment of a joint, or probability hypothesis density (PHD), has demonstrated a feasible suboptimal method for tracking a time-varying number of targets in real-time.
Abstract: Particle filter approaches for approximating the first-order moment of a joint, or probability hypothesis density (PHD), have demonstrated a feasible suboptimal method for tracking a time-varying number of targets in real-time. We consider two techniques for estimating the target states at each iteration, namely k-means clustering and mixture modelling via the expectation-maximization (EM) algorithm. We present novel techniques for associating the targets between frames to enable track continuity.

Journal Article
TL;DR: A comparison of the two techniques for missing data imputation using datasets of an industrial power plant, an industrial winding process and HIV sero-prevalence survey data shows that the EM algorithm is more suitable and performs better in cases where there is little or no interdependency between the input variables.
Abstract: Two techniques have emerged from the recent literature as candidate solutions to the problem of missing data imputation. These are the expectation maximization (EM) algorithm and the auto-associative neural network and genetic algorithm (GA) combination. Both these techniques have been discussed individually and their merits discussed at length in the available literature. However, they have not been compared with each other. This article provides a comparison of the two techniques using datasets of an industrial power plant, an industrial winding process and HIV sero-prevalence survey data. Results show that the EM algorithm is more suitable and performs better in cases where there is little or no interdependency between the input variables, whereas the auto-associative neural network and GA combination is suitable when there are inherent nonlinear relationships between some of the given variables.
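
For context, one standard EM imputation scheme under a multivariate normal model (which may differ in detail from the implementation compared in this article) alternates between filling missing entries with their conditional means and re-estimating the mean and covariance; a minimal sketch follows.

```python
# Hedged sketch of EM-style imputation under a multivariate Gaussian model.
# X is a 2-D float array with np.nan marking missing entries.
import numpy as np

def em_impute(X, n_iter=50):
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    X_filled = np.where(miss, np.nanmean(X, axis=0), X)       # start from column means
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        cov = np.cov(X_filled, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        for i in range(X.shape[0]):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            # E-step: conditional mean of the missing entries given the observed ones
            coef = cov[np.ix_(m, o)] @ np.linalg.inv(cov[np.ix_(o, o)])
            X_filled[i, m] = mu[m] + coef @ (X_filled[i, o] - mu[o])
        # (a full EM would also carry the conditional covariances into the M-step)
    return X_filled
```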

Proceedings ArticleDOI
20 Jun 2007
TL;DR: Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model and demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.
Abstract: Cluster analysis of ranking data, which occurs in consumer questionnaires, voting forms or other inquiries of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e. different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering of heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic data and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.

Journal ArticleDOI
TL;DR: A new approach is proposed for estimating 3D head pose from a monocular image that employs general prior knowledge of face structure and the corresponding geometrical constraints provided by the location of a certain vanishing point to determine the pose of human faces.

Journal ArticleDOI
TL;DR: A novel spatially constrained generative model and an expectation-maximization (EM) algorithm for model-based image segmentation that achieves competitive segmentation results compared to other Markov-based methods and is in general faster.
Abstract: In this paper, we present a novel spatially constrained generative model and an expectation-maximization (EM) algorithm for model-based image segmentation. The generative model assumes that the unobserved class labels of neighboring pixels in the image are generated by prior distributions with similar parameters, where similarity is defined by entropic quantities relating to the neighboring priors. In order to estimate model parameters from observations, we derive a spatially constrained EM algorithm that iteratively maximizes a lower bound on the data log-likelihood, where the penalty term is data-dependent. Our algorithm is very easy to implement and is similar to the standard EM algorithm for Gaussian mixtures, with the main difference that the label posteriors are "smoothed" over pixels between each E- and M-step by a standard image filter. Experiments on synthetic and real images show that our algorithm achieves competitive segmentation results compared to other Markov-based methods, and is in general faster.
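
Below is a hedged sketch of the mechanism described above: a plain EM for a Gaussian mixture over pixel intensities in which each class's posterior map is smoothed by a standard image filter between the E- and M-steps. The filter, its width and the initialization are illustrative stand-ins, not the paper's exact choices.

```python
# Sketch: EM segmentation of a grayscale image with spatially smoothed responsibilities.
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_em_segment(img, K=3, n_iter=30, sigma_smooth=1.5):
    h, w = img.shape
    x = img.ravel().astype(float)
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))              # crude initialization
    var = np.full(K, x.var() + 1e-6)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: per-pixel class responsibilities under the current Gaussians
        logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(var)) + np.log(pi)
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Smooth each class's posterior map over the image, then renormalize
        resp = np.stack([gaussian_filter(resp[:, k].reshape(h, w), sigma_smooth).ravel()
                         for k in range(K)], axis=1)
        resp = np.clip(resp, 1e-12, None)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: usual Gaussian mixture updates from the smoothed responsibilities
        Nk = resp.sum(axis=0)
        pi, mu = Nk / Nk.sum(), (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-6
    return resp.argmax(axis=1).reshape(h, w)                   # hard segmentation labels
```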

Journal ArticleDOI
TL;DR: A novel method for Bayesian denoising of magnetic resonance (MR) images that bootstraps itself by inferring the prior, i.e., the uncorrupted-image statistics, from the corrupted input data and the knowledge of the Rician noise model is presented.
Abstract: This paper presents a novel method for Bayesian denoising of magnetic resonance (MR) images that bootstraps itself by inferring the prior, i.e., the uncorrupted-image statistics, from the corrupted input data and the knowledge of the Rician noise model. The proposed method relies on principles from empirical Bayes (EB) estimation. It models the prior in a nonparametric Markov random field (MRF) framework and estimates this prior by optimizing an information-theoretic metric using the expectation-maximization algorithm. The generality and power of nonparametric modeling, coupled with the EB approach for prior estimation, avoids imposing ill-fitting prior models for denoising. The results demonstrate that, unlike typical denoising methods, the proposed method preserves most of the important features in brain MR images. Furthermore, this paper presents a novel Bayesian-inference algorithm on MRFs, namely iterated conditional entropy reduction (ICER). This paper also extends the application of the proposed method for denoising diffusion-weighted MR images. Validation results and quantitative comparisons with the state of the art in MR-image denoising clearly depict the advantages of the proposed method.

Journal ArticleDOI
TL;DR: In this article, the authors examined finite mixtures of multivariate Poisson distributions as an alternative class of models for multivariate count data, allowing for both overdispersion in the marginal distributions and negative correlation, while they are computationally tractable using standard ideas from finite mixture modelling.

Journal ArticleDOI
TL;DR: A systematic probabilistic framework that leads to both optimal and near-optimal OFDM detection schemes in the presence of unknown PHN is presented and it is pointed out that the expectation-maximization algorithm is a special case of the variational-inference-based joint estimator.
Abstract: This paper studies the mitigation of phase noise (PHN) in orthogonal frequency-division multiplexing (OFDM) data detection. We present a systematic probabilistic framework that leads to both optimal and near-optimal OFDM detection schemes in the presence of unknown PHN. In contrast to the conventional approach that cancels the common (average) PHN, our aim is to jointly estimate the complete PHN sequence and the data symbol sequence. We derive a family of low-complexity OFDM detectors for this purpose. The theoretical foundation on which these detectors are based is called variational inference, an approximate probabilistic inference technique associated with the minimization of variational free energy. In deriving the proposed schemes, we also point out that the expectation-maximization algorithm is a special case of the variational-inference-based joint estimator. Further complexity reduction is obtained using the conjugate gradient (CG) method, and only a few CG iterations are needed to closely approach the ideal joint estimator output.