
Showing papers on "Maximum a posteriori estimation published in 2011"


Journal ArticleDOI
TL;DR: A new supervised Bayesian approach to hyperspectral image segmentation with active learning, which consists of a multinomial logistic regression model to learn the class posterior probability distributions and a new active sampling approach, called modified breaking ties, which is able to provide an unbiased sampling.
Abstract: This paper introduces a new supervised Bayesian approach to hyperspectral image segmentation with active learning, which consists of two main steps. First, we use a multinomial logistic regression (MLR) model to learn the class posterior probability distributions. This is done by using a recently introduced logistic regression via splitting and augmented Lagrangian algorithm. Second, we use the information acquired in the previous step to segment the hyperspectral image using a multilevel logistic prior that encodes the spatial information. In order to reduce the cost of acquiring large training sets, active learning is performed based on the MLR posterior probabilities. Another contribution of this paper is the introduction of a new active sampling approach, called modified breaking ties, which is able to provide an unbiased sampling. Furthermore, we have implemented our proposed method in an efficient way. For instance, in order to obtain the time-consuming maximum a posteriori segmentation, we use the α-expansion min-cut-based integer optimization algorithm. The state-of-the-art performance of the proposed approach is illustrated using both simulated and real hyperspectral data sets in a number of experimental comparisons with recently introduced hyperspectral image analysis methods.

414 citations
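
As a rough illustration of the breaking-ties idea used for active sampling in the entry above, the sketch below ranks unlabeled pixels by the gap between their two largest multinomial-logistic-regression posteriors and cycles over predicted classes when picking queries. The scikit-learn LogisticRegression stand-in, the per-class selection loop, and all variable names are assumptions; the paper's LORSAL-based learning, multilevel logistic spatial prior, and α-expansion segmentation step are not reproduced.

```python
# Sketch of a breaking-ties style active sampling rule on MLR posteriors (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def breaking_ties_scores(probs):
    """Gap between the two largest class posteriors; a small gap = most ambiguous pixel."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def modified_breaking_ties(probs, n_pick):
    """Pick the most ambiguous unlabeled sample within each predicted class in turn,
    spreading queries over classes (an unbiased-sampling heuristic)."""
    gap = breaking_ties_scores(probs)
    pred = probs.argmax(axis=1)
    picked = []
    classes = np.unique(pred)
    while len(picked) < n_pick:
        for c in classes:
            idx = np.where(pred == c)[0]
            idx = idx[~np.isin(idx, picked)]
            if idx.size and len(picked) < n_pick:
                picked.append(idx[np.argmin(gap[idx])])
    return np.array(picked)

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(60, 10)), rng.integers(0, 4, 60)   # labeled pixels
X_pool = rng.normal(size=(500, 10))                                # unlabeled pool
mlr = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
query = modified_breaking_ties(mlr.predict_proba(X_pool), n_pick=8)
print("pixels to label next:", query)
```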


Journal ArticleDOI
TL;DR: In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, nonfactorial coefficient priors.
Abstract: Many practical methods for finding maximally sparse coefficient expansions involve solving a regression problem using a particular class of concave penalty functions. From a Bayesian perspective, this process is equivalent to maximum a posteriori (MAP) estimation using a sparsity-inducing prior distribution (Type I estimation). Using variational techniques, this distribution can always be conveniently expressed as a maximization over scaled Gaussian distributions modulated by a set of latent variables. Alternative Bayesian algorithms, which operate in latent variable space leveraging this variational representation, lead to sparse estimators reflecting posterior information beyond the mode (Type II estimation). Currently, it is unclear how the underlying cost functions of Type I and Type II relate, nor what relevant theoretical properties exist, especially with regard to Type II. Herein a common set of auxiliary functions is used to conveniently express both Type I and Type II cost functions in either coefficient or latent variable space facilitating direct comparisons. In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, nonfactorial coefficient priors. One prior (at least) from this class maintains several desirable advantages over all possible Type I methods and utilizes a novel, nonconvex approximation to the l0 norm with most, and in certain quantifiable conditions all, local minima smoothed away. Importantly, the global minimum is always left unaltered unlike standard l1-norm relaxations. This ensures that any appropriate descent method is guaranteed to locate the maximally sparse solution.

299 citations
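
To make the Type I versus Type II distinction above concrete, here is a minimal sketch that contrasts a Type I MAP estimate (the Lasso, i.e., a factorial Laplacian prior) with a Type II estimate obtained from standard EM-style sparse Bayesian learning (ARD) updates in the latent variance space. The update rules shown are the generic SBL ones and the problem sizes are assumptions; the sketch does not implement the paper's nonfactorial-prior analysis.

```python
# Type I (Lasso/MAP) vs. Type II (ARD-style sparse Bayesian learning) on a toy problem.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, m, k = 40, 80, 5                           # measurements, dictionary size, true sparsity
Phi = rng.normal(size=(n, m)) / np.sqrt(n)
w_true = np.zeros(m)
w_true[rng.choice(m, k, replace=False)] = rng.normal(0, 3, k)
sigma2 = 0.01
y = Phi @ w_true + rng.normal(0, np.sqrt(sigma2), n)

# Type I: MAP under a factorial Laplacian prior (standard Lasso).
w_map = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(Phi, y).coef_

# Type II: EM updates of the latent variances gamma (automatic relevance determination).
gamma = np.ones(m)
for _ in range(200):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
    mu = Sigma @ Phi.T @ y / sigma2
    gamma = np.maximum(mu**2 + np.diag(Sigma), 1e-10)   # small gamma prunes a coefficient
w_type2 = mu

print("support recovered (Type I): ", np.flatnonzero(np.abs(w_map) > 1e-2))
print("support recovered (Type II):", np.flatnonzero(np.abs(w_type2) > 1e-2))
print("true support:               ", np.sort(np.flatnonzero(w_true)))
```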


Journal ArticleDOI
TL;DR: The present paper fruitfully exploits a priori information to improve the performance of multiuser detectors, based on a sparse symbol vector with entries drawn from a finite alphabet that is augmented by the zero symbol to capture user inactivity.
Abstract: The number of active users in code-division multiple access (CDMA) systems is often much lower than the spreading gain. The present paper fruitfully exploits this a priori information to improve the performance of multiuser detectors. A low-activity factor manifests itself in a sparse symbol vector with entries drawn from a finite alphabet that is augmented by the zero symbol to capture user inactivity. The non-equiprobable symbols of the augmented alphabet motivate a sparsity-exploiting maximum a posteriori probability (S-MAP) criterion, which is shown to yield a cost comprising the l2 least-squares error penalized by the p-th norm of the wanted symbol vector (p = 0, 1, 2). Related optimization problems appear in variable selection (shrinkage) schemes developed for linear regression, as well as in the emerging field of compressive sampling (CS). The contribution of this work to such sparse CDMA systems is a gamut of sparsity-exploiting multiuser detectors trading off performance for complexity requirements. From the vantage point of CS and the least-absolute shrinkage selection operator (Lasso) spectrum of applications, the contribution amounts to sparsity-exploiting algorithms when the entries of the wanted signal vector adhere to finite-alphabet constraints.

280 citations
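
A rough sketch of the relaxation-and-quantization flavor of such detectors is given below: a Lasso-type penalized least-squares problem stands in for the S-MAP cost with p = 1, and the relaxed estimate is then mapped to the zero-augmented alphabet {-1, 0, +1}. The spreading-matrix construction, penalty weight, and alphabet are illustrative assumptions rather than the paper's exact detectors.

```python
# Sketch: sparsity-exploiting multiuser detection via an l1-penalized LS relaxation
# followed by quantization to the zero-augmented BPSK alphabet {-1, 0, +1}.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
K, N = 32, 16                                  # users, spreading gain (K > N, low activity)
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)    # spreading code matrix
b_true = np.zeros(K)
active = rng.choice(K, size=4, replace=False)            # only a few active users
b_true[active] = rng.choice([-1.0, 1.0], size=active.size)
y = S @ b_true + 0.05 * rng.normal(size=N)

b_relaxed = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(S, y).coef_
b_hat = np.where(np.abs(b_relaxed) < 0.5, 0.0, np.sign(b_relaxed))   # quantize to {-1,0,+1}

print("active users (true):     ", np.sort(np.flatnonzero(b_true)))
print("active users (detected): ", np.sort(np.flatnonzero(b_hat)))
```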


Book ChapterDOI
08 Mar 2011
TL;DR: In this paper, the success of variational expectation maximization (vEM) in simple probabilistic time series models is investigated, and it is shown that simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.
Abstract: Introduction Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in latent variables, unlike maximum a posteriori methods, and yet generally requiring less computational time than Markov chain Monte Carlo methods. In particular the variational expectation maximisation (vEM) and variational Bayes algorithms, both involving variational optimisation of a free-energy, are widely used in time series modelling. Here, we investigate the success of vEM in simple probabilistic time series models. First we consider the inference step of vEM, and show that a consequence of the well-known compactness property of variational inference is a failure to propagate uncertainty in time, thus limiting the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations. The variational approach We begin this chapter with a brief theoretical review of the variational expectation maximisation algorithm, before illustrating the important concepts with a simple example in the next section. The vEM algorithm is an approximate version of the expectation maximisation (EM) algorithm [4]. Expectation maximisation is a standard approach to finding maximum likelihood (ML) parameters for latent variable models, including hidden Markov models and linear or non-linear state space models (SSMs) for time series.

217 citations
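
The compactness property mentioned above can be seen in a two-variable toy case: a fully factorized (mean-field) Gaussian approximation to a correlated Gaussian matches the conditional rather than the marginal variances, so it underestimates uncertainty exactly when correlation (for instance, temporal coupling) is strongest. The sketch below is a standalone illustration under that assumption, not the chapter's state-space examples.

```python
# Mean-field "compactness": the factorized approximation's variance equals the
# conditional variance, which shrinks as the true correlation grows.
import numpy as np

for rho in (0.0, 0.5, 0.9, 0.99):
    Sigma = np.array([[1.0, rho], [rho, 1.0]])   # true covariance of (x1, x2)
    true_marginal_var = Sigma[0, 0]              # always 1
    # The optimal factorized Gaussian q(x1)q(x2) (minimizing KL(q||p)) has precision
    # equal to the diagonal of the true precision matrix:
    mf_var = 1.0 / np.linalg.inv(Sigma)[0, 0]    # = 1 - rho**2
    print(f"rho={rho:4.2f}  true var = {true_marginal_var:.2f}  mean-field var = {mf_var:.3f}")
```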


Journal ArticleDOI
TL;DR: Several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli, are developed, which allow efficient maximum-likelihood model fitting and stimulus decoding.
Abstract: One of the central problems in systems neuroscience is to understand how neural spike trains convey sensory information. Decoding methods, which provide an explicit means for reading out the information contained in neural spike responses, offer a powerful set of tools for studying the neural coding problem. Here we develop several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli. These models have concave log-likelihood functions, which allow efficient maximum-likelihood model fitting and stimulus decoding. We present several applications of the encoding model framework to the problem of decoding stimulus information from population spike responses: (1) a tractable algorithm for computing the maximum a posteriori (MAP) estimate of the stimulus, the most probable stimulus to have generated an observed single- or multiple-neuron spike train response, given some prior distribution over the stimulus; (2) a gaussian approximation to the posterior stimulus distribution that can be used to quantify the fidelity with which various stimulus features are encoded; (3) an efficient method for estimating the mutual information between the stimulus and the spike trains emitted by a neural population; and (4) a framework for the detection of change-point times (the time at which the stimulus undergoes a change in mean or variance) by marginalizing over the posterior stimulus distribution. We provide several examples illustrating the performance of these estimators with simulated and real neural data.

162 citations
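
The MAP decoding step (item 1 above) can be sketched in a few lines for a Gaussian stimulus prior and a Poisson encoding model with an exponential nonlinearity: the log posterior is concave, so a generic gradient-based optimizer finds the MAP stimulus. The filters, prior, and optimizer choice below are illustrative assumptions, not the authors' implementation.

```python
# MAP stimulus decoding for a Poisson GLM encoder with a Gaussian stimulus prior.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T, n_cells = 50, 8
K = rng.normal(0, 0.3, size=(n_cells, T))        # per-neuron linear filters (assumed known)
b = -1.0                                         # common log baseline rate
x_true = np.sin(np.linspace(0, 4 * np.pi, T))    # stimulus to be decoded
spikes = rng.poisson(np.exp(K @ x_true + b))     # observed spike counts

prior_prec = 1.0                                 # isotropic Gaussian prior N(0, 1/prior_prec)

def neg_log_posterior(x):
    lin = K @ x + b
    # Poisson log-likelihood (up to constants) plus Gaussian log-prior, negated:
    return np.sum(np.exp(lin) - spikes * lin) + 0.5 * prior_prec * np.dot(x, x)

def grad(x):
    lam = np.exp(K @ x + b)
    return K.T @ (lam - spikes) + prior_prec * x

x_map = minimize(neg_log_posterior, np.zeros(T), jac=grad, method="L-BFGS-B").x
print("decoding correlation with true stimulus:", np.corrcoef(x_map, x_true)[0, 1])
```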


Posted Content
TL;DR: In this article, a Bayesian model based on automatic relevance determination is proposed, in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior.
Abstract: This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler and Itakura-Saito divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example and a stock price prediction task.

158 citations
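
For orientation, the sketch below implements the classical majorization-minimization multiplicative updates for NMF with the KL divergence (the β = 1 member of the family); the automatic-relevance-determination tying of dictionary columns and activation rows through shared scale parameters, which is the paper's actual contribution, is omitted here, and the data and rank are assumptions.

```python
# Multiplicative MM updates for KL-divergence NMF (beta = 1); ARD pruning omitted.
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps          # dictionary
    H = rng.random((K, N)) + eps          # activations
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T / (H.sum(axis=1) + eps)       # MM update for W
        WH = W @ H + eps
        H *= W.T @ (V / WH) / (W.sum(axis=0)[:, None] + eps)  # MM update for H
    return W, H

rng = np.random.default_rng(4)
V = rng.random((30, 8)) @ rng.random((8, 100))    # nonnegative data with low-rank structure
W, H = nmf_kl(V, K=12)                            # K deliberately larger than the true rank
print("relative reconstruction error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```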


Proceedings ArticleDOI
06 Dec 2011
TL;DR: This work proposes a 'learning-based' approach, WiGEM, where the received signal strength is modeled as a Gaussian Mixture Model (GMM), and Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters.
Abstract: We consider the problem of localizing a wireless client in an indoor environment based on the signal strength of its transmitted packets as received on stationary sniffers or access points. Several state-of-the-art indoor localization techniques have the drawback that they rely extensively on a labor-intensive 'training' phase that does not scale well. Use of unmodeled hardware with heterogeneous power levels further reduces the accuracy of these techniques. We propose a 'learning-based' approach, WiGEM, where the received signal strength is modeled as a Gaussian Mixture Model (GMM). Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters. This approach enables us to localize a transmitting device based on the maximum a posteriori estimate. The key insight is to use the physics of wireless propagation, and exploit the signal strength constraints that exist for different transmit power levels. The learning approach not only avoids the labor-intensive training, but also makes the location estimates considerably robust in the face of heterogeneity and various time varying phenomena. We present evaluations on two different indoor testbeds with multiple WiFi devices. We demonstrate that WiGEM's accuracy is at par with or better than state-of-the-art techniques but without requiring any training.

139 citations
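
As a toy illustration of the GMM/EM-plus-MAP idea (not the WiGEM system itself, which also encodes propagation physics and transmit-power constraints), the sketch below fits a Gaussian mixture to received-signal-strength vectors and localizes a new observation by its MAP component; all data and parameters are synthetic assumptions.

```python
# Toy GMM/EM localization: each mixture component plays the role of a candidate
# location, and a new RSSI vector is assigned to the MAP component.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
n_sniffers, n_locations = 4, 3
true_means = rng.uniform(-90, -40, size=(n_locations, n_sniffers))   # dBm per sniffer
X = np.vstack([m + rng.normal(0, 2.0, size=(200, n_sniffers)) for m in true_means])

gmm = GaussianMixture(n_components=n_locations, covariance_type="diag",
                      random_state=0).fit(X)            # EM for the ML parameters

rssi_new = true_means[1] + rng.normal(0, 2.0, n_sniffers)
posterior = gmm.predict_proba(rssi_new[None, :])[0]     # p(component | observation)
print("posterior over candidate locations:", np.round(posterior, 3))
print("MAP location index:", int(posterior.argmax()))
```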


Journal ArticleDOI
TL;DR: The experiments that were performed on a bitemporal TerraSAR-X StripMap data set from South West England during and after a large-scale flooding in 2007 confirm the effectiveness of the proposed change detection method and show an increased classification accuracy of the hybrid MRF model in comparison to the sole application of the HMAP estimation.
Abstract: The near real-time provision of precise information about flood dynamics from synthetic aperture radar (SAR) data is an essential task in disaster management. A novel tile-based parametric thresholding approach under the generalized Gaussian assumption is applied on normalized change index data to automatically solve the three-class change detection problem in large-size images with small class a priori probabilities. The thresholding result is used for the initialization of a hybrid Markov model which integrates scale-dependent and spatiocontextual information into the labeling process by combining hierarchical with noncausal Markov image modeling. Hierarchical maximum a posteriori (HMAP) estimation using the Markov chains in scale, originally developed on quadtrees, is adapted to hierarchical irregular graphs. To reduce the computational effort of the iterative optimization process that is related to noncausal Markov models, a Markov random field (MRF) approach is defined, which is applied on a restricted region of the lowest level of the graph, selected according to the HMAP labeling result. The experiments that were performed on a bitemporal TerraSAR-X StripMap data set from South West England during and after a large-scale flooding in 2007 confirm the effectiveness of the proposed change detection method and show an increased classification accuracy of the hybrid MRF model in comparison to the sole application of the HMAP estimation. Additionally, the impact of the graph structure and the chosen model parameters on the labeling result as well as on the performance is discussed.

136 citations


Journal Article
TL;DR: This paper illustrates the situations where standard EP fails to converge, reviews different modifications and alternative algorithms for improving the convergence, and demonstrates that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters.
Abstract: This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is the analytically intractable inference which is why several approximative methods have been proposed. Expectation propagation (EP) has been found to be a very accurate method in many empirical studies but the convergence of EP is known to be problematic with models containing non-log-concave site functions. In this paper we illustrate the situations where standard EP fails to converge and review different modifications and alternative algorithms for improving the convergence. We demonstrate that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters and show that standard EP may not converge in the MAP values with some difficult data sets. We present a robust implementation which relies primarily on parallel EP updates and uses a moment-matching-based double-loop algorithm with adaptively selected step size in difficult cases. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations.

127 citations


Journal ArticleDOI
TL;DR: It is shown that for any prior PX, the minimum mean-square error (MMSE) estimator is the solution of a penalized least square problem with some penalty φ, which can be interpreted as the MAP estimator with the prior C·exp(-φ(x)).
Abstract: Penalized least squares regression is often used for signal denoising and inverse problems, and is commonly interpreted in a Bayesian framework as a Maximum a posteriori (MAP) estimator, the penalty function being the negative logarithm of the prior. For example, the widely used quadratic program (with an l1 penalty) associated to the LASSO/basis pursuit denoising is very often considered as MAP estimation under a Laplacian prior in the context of additive white Gaussian noise (AWGN) reduction. This paper highlights the fact that, while this is one possible Bayesian interpretation, there can be other equally acceptable Bayesian interpretations. Therefore, solving a penalized least squares regression problem with penalty φ(x) need not be interpreted as assuming a prior C·exp(-φ(x)) and using the MAP estimator. In particular, it is shown that for any prior PX, the minimum mean-square error (MMSE) estimator is the solution of a penalized least square problem with some penalty φ(x) , which can be interpreted as the MAP estimator with the prior C·exp(-φ(x)). Vice versa, for certain penalties φ(x), the solution of the penalized least squares problem is indeed the MMSE estimator, with a certain prior PX . In general dPX(x) ≠ C·exp(-φ(x))dx.

125 citations
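
The distinction drawn above can be checked numerically in the scalar AWGN case: under a Laplacian prior, the MAP estimator is soft thresholding, while the MMSE estimator (computed by numerical integration below) is a different, smooth shrinkage rule. The grid-based integration and parameter values are illustrative assumptions.

```python
# Scalar AWGN denoising under a Laplacian prior: MAP (soft threshold) vs. MMSE.
import numpy as np

sigma, lam = 1.0, 1.0                        # noise std, Laplacian rate
y = 1.5                                      # observed value

# MAP under the Laplacian prior = soft thresholding with threshold lam * sigma**2.
x_map = np.sign(y) * max(abs(y) - lam * sigma**2, 0.0)

# MMSE = posterior mean, computed by brute-force numerical integration on a grid.
x = np.linspace(-20, 20, 200001)
log_post = -0.5 * (y - x) ** 2 / sigma**2 - lam * np.abs(x)
w = np.exp(log_post - log_post.max())
x_mmse = np.sum(x * w) / np.sum(w)

print(f"observation y = {y}:  MAP = {x_map:.3f},  MMSE = {x_mmse:.3f}")
```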


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage and uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.
Abstract: We present an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage. Our approach exploits shading information to recover the millimeter-scale surface structure, but in contrast to related approaches succeeds under general unconstrained lighting conditions. Our method starts off from a set of multi-view video frames and an initial series of reconstructed coarse 3D meshes that lack any surface detail. In a spatio-temporal maximum a posteriori probability (MAP) inference framework, our approach first estimates the incident illumination and the spatially-varying albedo map on the mesh surface for every time instant. Thereafter, albedo and illumination are used to estimate the true geometric detail visible in the images and add it to the coarse reconstructions. The MAP framework uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.

Journal ArticleDOI
Gang Xu, Mengdao Xing, Lei Zhang, Yabo Liu, Yachao Li
TL;DR: A novel algorithm of inverse synthetic aperture radar (ISAR) imaging based on Bayesian estimation is proposed, wherein the ISAR imaging joint with phase adjustment is mathematically transferred into signal reconstruction via maximum a posteriori estimation.
Abstract: In this letter, a novel algorithm for inverse synthetic aperture radar (ISAR) imaging based on Bayesian estimation is proposed, wherein ISAR imaging joint with phase adjustment is mathematically recast as signal reconstruction via maximum a posteriori estimation. In the scheme, phase errors are treated as model errors and are overcome in the sparsity-driven optimization regardless of their form, while data-driven estimation of the statistical parameters of both noise and target is developed, which guarantees high-precision image generation. Meanwhile, the fast Fourier transform is utilized to implement the image-formation step, which makes the algorithm efficient. Owing to its strong denoising capability, the proposed algorithm can produce high-quality images even under strong noise. Experimental results using simulated and measured data confirm its validity.
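
The sparsity-driven, FFT-based flavor of such reconstructions can be illustrated with a generic iterative shrinkage-thresholding (ISTA) sketch in which the unitary FFT serves as the measurement operator; the phase-error estimation and the data-driven noise/target statistics of the actual algorithm are not reproduced, and all sizes and weights are assumptions.

```python
# ISTA sketch: recover a sparse complex reflectivity profile from noisy Fourier data,
# using the unitary FFT as the forward operator (Lipschitz constant 1, so step size 1).
import numpy as np

rng = np.random.default_rng(6)
n, k = 256, 6
x_true = np.zeros(n, dtype=complex)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k) + 1j * rng.normal(size=k)

A = lambda x: np.fft.fft(x, norm="ortho")          # forward operator
At = lambda y: np.fft.ifft(y, norm="ortho")        # its adjoint (inverse, since unitary)
y = A(x_true) + 0.02 * (rng.normal(size=n) + 1j * rng.normal(size=n))

def soft_complex(z, t):
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-12)) * z, 0.0)

lam, x = 0.05, np.zeros(n, dtype=complex)
for _ in range(100):
    x = soft_complex(x - At(A(x) - y), lam)        # gradient step + complex soft threshold

print("nonzeros found:", np.flatnonzero(np.abs(x) > 1e-3).size, "of", k)
```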

Proceedings Article
12 Dec 2011
TL;DR: This work studies the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document's topic distribution is integrated out, and shows that exact inference takes polynomial time when the effective number of topics per document is small, but becomes NP-hard when a document has a large number of topics.
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document's topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question.

Journal ArticleDOI
Chris Hans
TL;DR: In this article, the elastic net procedure, a form of regularized optimization for linear regression that provides a bridge between ridge regression and the lasso, is viewed as a Bayesian posterior mode under a prior distribution implied by the form of the elastic net penalty.
Abstract: The elastic net procedure is a form of regularized optimization for linear regression that provides a bridge between ridge regression and the lasso. The estimate that it produces can be viewed as a Bayesian posterior mode under a prior distribution implied by the form of the elastic net penalty. This article broadens the scope of the Bayesian connection by providing a complete characterization of a class of prior distributions that generate the elastic net estimate as the posterior mode. The resulting model-based framework allows for methodology that moves beyond exclusive use of the posterior mode by considering inference based on the full posterior distribution. Two characterizations of the class of prior distributions are introduced: a properly normalized, direct characterization, which is shown to be conjugate for linear regression models, and an alternate representation as a scale mixture of normal distributions. Prior distributions are proposed for the regularization parameters, resulting in an infi...
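
The posterior-mode connection can be verified numerically: minimizing the negative log posterior under a prior proportional to exp(-lam1*||b||_1 - 0.5*lam2*||b||^2) with a Gaussian likelihood reproduces the elastic net estimate. The sketch below relies on an assumed mapping to scikit-learn's parameterization (alpha = (lam1+lam2)/n, l1_ratio = lam1/(lam1+lam2)) and does not implement the paper's full posterior inference.

```python
# Elastic net as a Bayesian posterior mode: the minimizer of
#   0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2
# should match scikit-learn's ElasticNet under the assumed parameter mapping.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))
b_true = np.array([2.0, 0.0, -1.0])
y = X @ b_true + rng.normal(0, 0.5, n)
lam1, lam2 = 20.0, 10.0

def neg_log_posterior(b):
    return (0.5 * np.sum((y - X @ b) ** 2)
            + lam1 * np.sum(np.abs(b)) + 0.5 * lam2 * np.sum(b ** 2))

mode = minimize(neg_log_posterior, np.zeros(p), method="Nelder-Mead",
                options={"xatol": 1e-8, "fatol": 1e-10}).x

enet = ElasticNet(alpha=(lam1 + lam2) / n, l1_ratio=lam1 / (lam1 + lam2),
                  fit_intercept=False, max_iter=100000, tol=1e-10).fit(X, y)

print("posterior mode :", np.round(mode, 4))
print("ElasticNet fit :", np.round(enet.coef_, 4))
```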

Journal ArticleDOI
TL;DR: The authors formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local online discriminative algorithm that can quickly adapt to temporal changes, and develop an implementation that can run efficiently on the highly parallel graphics processing unit (GPU).
Abstract: The authors examine the problem of segmenting foreground objects in live video when background scene textures change over time. In particular, we formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local online discriminative algorithm that can quickly adapt to temporal changes. We analyze the algorithm's convergence, discuss its robustness to nonstationarity, and provide an efficient nonlinear extension via sparse kernels. To accommodate interactions among neighboring pixels, a global algorithm is then derived that explicitly distinguishes objects versus background using maximum a posteriori inference in a Markov random field (implemented via graph-cuts). By exploiting the parallel nature of the proposed algorithms, we develop an implementation that can run efficiently on the highly parallel graphics processing unit (GPU). Empirical studies on a wide variety of datasets demonstrate that the proposed approach achieves quality that is comparable to state-of-the-art offline methods, while still being suitable for real-time video analysis (≥ 75 fps on a mid-range GPU).

Journal ArticleDOI
TL;DR: The goal of this paper is to perform a segmentation of atherosclerotic plaques in view of evaluating their burden and to provide boundaries for computing properties such as the plaque deformation and elasticity distribution (elastogram and modulogram).
Abstract: The goal of this paper is to perform a segmentation of atherosclerotic plaques in view of evaluating their burden and to provide boundaries for computing properties such as the plaque deformation and elasticity distribution (elastogram and modulogram). The echogenicity of a region of interest comprising the plaque, the vessel lumen, and the adventitia of the artery wall in an ultrasonic B-mode image was modeled by mixtures of three Nakagami distributions, which yielded the likelihood of a Bayesian segmentation model. The main contribution of this paper is the estimation of the motion field and its integration into the prior of the Bayesian model that included a local geometrical smoothness constraint, as well as an original spatiotemporal cohesion constraint. The maximum a posteriori estimate of the proposed model was computed with a variant of the exploration/selection algorithm. The starting point is a manual segmentation of the first frame. The proposed method was quantitatively compared with manual segmentations of all frames by an expert technician. Various measures were used for this evaluation, including the mean point-to-point distance and the Hausdorff distance. Results were evaluated on 94 sequences of 33 patients (for a total of 8988 images). We report a mean point-to-point distance of 0.24 ± 0.08 mm and a Hausdorff distance of 1.24 ± 0.40 mm. Our tests showed that the algorithm was not sensitive to the degree of stenosis or calcification.

Journal ArticleDOI
TL;DR: Novel fixed-lag and fixed-interval smoothing algorithms are developed that are robust to outliers simultaneously present in the measurements and in the state dynamics and that rely on coordinate descent and the alternating direction method of multipliers.
Abstract: Coping with outliers contaminating dynamical processes is of major importance in various applications because mismatches from nominal models are not uncommon in practice. In this context, the present paper develops novel fixed-lag and fixed-interval smoothing algorithms that are robust to outliers simultaneously present in the measurements and in the state dynamics. Outliers are handled through auxiliary unknown variables that are jointly estimated along with the state based on the least-squares criterion that is regularized with the l1-norm of the outliers in order to effect sparsity control. The resultant iterative estimators rely on coordinate descent and the alternating direction method of multipliers, are expressed in closed form per iteration, and are provably convergent. Additional attractive features of the novel doubly robust smoother include: i) ability to handle both types of outliers; ii) universality to unknown nominal noise and outlier distributions; iii) flexibility to encompass maximum a posteriori optimal estimators with reliable performance under nominal conditions; and iv) improved performance relative to competing alternatives at comparable complexity, as corroborated via simulated tests.
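
A stripped-down version of the outlier-as-auxiliary-variable idea is easy to write for the measurement-outlier case only: alternate a quadratic smoothing step for the state with an l1 (soft-threshold) step for the outlier variables. The difference-penalty smoother and all constants below are illustrative assumptions; the paper's full doubly robust fixed-lag/fixed-interval algorithms and ADMM variants are not reproduced.

```python
# Block-coordinate sketch: robust smoothing with sparse measurement outliers o,
#   minimize 0.5*||y - x - o||^2 + 0.5*lam*||D x||^2 + mu*||o||_1,
# alternating a linear solve for the state x and soft thresholding for o.
import numpy as np

rng = np.random.default_rng(8)
T = 200
t = np.linspace(0, 1, T)
y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=T)
outlier_idx = rng.choice(T, 8, replace=False)
y[outlier_idx] += rng.choice([-1, 1], 8) * 3.0            # gross outliers

D = np.diff(np.eye(T), axis=0)                            # first-difference operator
lam, mu = 50.0, 0.5
A = np.eye(T) + lam * D.T @ D

x, o = np.zeros(T), np.zeros(T)
for _ in range(50):
    x = np.linalg.solve(A, y - o)                         # quadratic step for the state
    r = y - x
    o = np.sign(r) * np.maximum(np.abs(r) - mu, 0.0)      # soft threshold for the outliers

print("detected outlier positions:", np.sort(np.flatnonzero(np.abs(o) > 1e-6)))
print("true outlier positions:    ", np.sort(outlier_idx))
```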

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work proposes a method for vector field learning with outliers, called vector field consensus (VFC), which can distinguish inliers from outliers and simultaneously learn a vector field that fits the inliers, and it is very robust to outliers.
Abstract: We propose a method for vector field learning with outliers, called vector field consensus (VFC). It can distinguish inliers from outliers and learn a vector field fitting the inliers simultaneously. A prior is taken to force the smoothness of the field, which is based on Tikhonov regularization in a vector-valued reproducing kernel Hilbert space. Under a Bayesian framework, we associate each sample with a latent variable which indicates whether it is an inlier, then formulate the problem as a maximum a posteriori problem and use the expectation-maximization algorithm to solve it. The proposed method possesses two characteristics: 1) it is robust to outliers, being able to tolerate 90% outliers and even more, and 2) it is computationally efficient. As an application, we apply VFC to the problem of mismatch removal. The results demonstrate that our method outperforms many state-of-the-art methods, and it is very robust.

Journal ArticleDOI
TL;DR: This paper investigates the problem of fusion of remote sensing images, e.g., multispectral image fusion, based on MRF models, incorporates the contextual constraints via MRF models into the fusion model, and develops fusion algorithms under the maximum a posteriori criterion.
Abstract: Markov random field (MRF) models are powerful tools to model image characteristics accurately and have been successfully applied to a large number of image processing applications. This paper investigates the problem of fusion of remote sensing images, e.g., multispectral image fusion, based on MRF models and incorporates the contextual constraints via MRF models into the fusion model. Fusion algorithms under the maximum a posteriori criterion are developed to search for solutions. Our algorithm is applicable to both multiscale decomposition (MD)-based image fusion and non-MD-based image fusion. Experimental results are provided to demonstrate the improvement of fusion performance by our algorithms.

Journal ArticleDOI
TL;DR: An algorithm for finding the maximum a posteriori (MAP) estimate of the Kalman smoother for a nonlinear model with Gaussian process noise and ℓ1-Laplace observation noise using the convex composite extension of the Gauss-Newton method.
Abstract: Robustness is a major problem in Kalman filtering and smoothing that can be solved using heavy tailed distributions; e.g., l1-Laplace. This paper describes an algorithm for finding the maximum a posteriori (MAP) estimate of the Kalman smoother for a nonlinear model with Gaussian process noise and l1-Laplace observation noise. The algorithm uses the convex composite extension of the Gauss-Newton method. This yields convex programming subproblems to which an interior point path-following method is applied. The number of arithmetic operations required by the algorithm grows linearly with the number of time points because the algorithm preserves the underlying block tridiagonal structure of the Kalman smoother problem. Excellent fits are obtained with and without outliers, even though the outliers are simulated from distributions that are not l1-Laplace. It is also tested on actual data with a nonlinear measurement model for an underwater tracking experiment. The l1-Laplace smoother is able to construct a smoothed fit, without data removal, from data with very large outliers.
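
The MAP objective described above, Gaussian process noise plus Laplace observation noise, can be sketched for a scalar random-walk model by directly minimizing the negative log posterior over the whole state trajectory. Below, a smoothed absolute value stands in for the exact l1 term so a generic quasi-Newton optimizer applies, which is a simplification of the paper's interior-point, block-tridiagonal approach; all model constants are assumptions.

```python
# Direct MAP smoothing for a random-walk state with Laplace (heavy-tailed)
# observation noise, using a smoothed |.| so L-BFGS can be applied.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
T = 150
x_true = np.cumsum(0.1 * rng.normal(size=T))          # random-walk state
y = x_true + rng.laplace(scale=0.2, size=T)
y[rng.choice(T, 5, replace=False)] += 8.0             # extreme outliers

q, b, eps = 0.1, 0.2, 1e-6                            # process std, Laplace scale, smoothing

def neg_log_posterior(x):
    proc = 0.5 * np.sum(np.diff(x) ** 2) / q**2       # Gaussian process-noise term
    obs = np.sum(np.sqrt((y - x) ** 2 + eps)) / b     # smoothed l1 observation term
    return proc + obs

x_map = minimize(neg_log_posterior, y.copy(), method="L-BFGS-B").x
print("RMSE of robust MAP smoother:", np.sqrt(np.mean((x_map - x_true) ** 2)))
```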

Journal ArticleDOI
TL;DR: The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted, and a family of hyper‐Lasso penalty functions are provided, which includes the quasi‐Cauchy distribution of Johnstone and Silverman as a special case.
Abstract: The Lasso has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more-variables-than-observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted. Generalizing this prior provides a family of hyper-Lasso penalty functions, which includes the quasi-Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi-modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.
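
The multi-start strategy for a multimodal penalized likelihood can be sketched as follows. A log-type nonconvex penalty is used purely as a stand-in for the hyper-Lasso family, and the restart scheme (perturbations of a minimum-norm interpolating least-squares fit) is an assumption, not the authors' construction of perfectly fitting starting values.

```python
# Multi-start MAP for a nonconvex, heavy-tailed penalty (a stand-in for the
# hyper-Lasso family): optimize from several starting values and keep the best mode.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n, p = 20, 30                                        # more variables than observations
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.3 * rng.normal(size=n)

lam, delta = 2.0, 0.1
def neg_log_posterior(b):
    # Gaussian likelihood + log-type penalty; the penalty is nonconvex, so the
    # posterior can be multimodal.
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.log(1.0 + np.abs(b) / delta))

best = None
for _ in range(5):
    # Minimum-norm interpolating fit, randomly perturbed, as a starting value.
    b0 = np.linalg.lstsq(X, y, rcond=None)[0] + 0.5 * rng.normal(size=p)
    res = minimize(neg_log_posterior, b0, method="Powell")
    if best is None or res.fun < best.fun:
        best = res

print("best mode, five largest |coefficients|:", np.round(np.sort(np.abs(best.x))[-5:], 2))
```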

Journal ArticleDOI
09 Mar 2011 - Sensors
TL;DR: A novel class of self-organizing sensing agents that adaptively learn an anisotropic, spatio-temporal Gaussian process using noisy measurements and move in order to improve the quality of the estimated covariance function is presented.
Abstract: This paper presents a novel class of self-organizing sensing agents that adaptively learn an anisotropic, spatio-temporal Gaussian process using noisy measurements and move in order to improve the quality of the estimated covariance function. This approach is based on a class of anisotropic covariance functions of Gaussian processes introduced to model a broad range of spatio-temporal physical phenomena. The covariance function is assumed to be unknown a priori. Hence, it is estimated by the maximum a posteriori probability (MAP) estimator. The prediction of the field of interest is then obtained based on the MAP estimate of the covariance function. An optimal sampling strategy is proposed to minimize the information-theoretic cost function of the Fisher Information Matrix. Simulation results demonstrate the effectiveness and the adaptability of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper proposes a superimposed training strategy that allows the destination node to separately obtain the channel information of the source→relay link and the relay→destination link, and derives the Cramér-Rao bound for the random channel parameters.
Abstract: In this paper, we consider the channel estimation for the classical three-node relay networks that employ the amplify-and-forward (AF) transmission scheme and the orthogonal frequency division multiplexing (OFDM) modulation. We propose a superimposed training strategy that allows the destination node to separately obtain the channel information of the source→relay link and the relay→destination link. Specifically, the relay superimposes its own training signal over the received one before forwarding it to the destination. The proposed training strategy can be implemented within two transmission phases and is thus compatible with the two-phase data transmission scheme, i.e., the training can be embedded into data transmission. We also derive the Cramér-Rao bound for the random channel parameters, from which we compute the optimal training sequence as well as the optimal power allocation. Since the optimal minimum mean square error (MMSE) estimator and the maximum a posteriori (MAP) estimator cannot be expressed in closed-form, we propose to first obtain the initial channel estimates from the low complexity linear estimators, e.g., linear minimum mean-square error (LMMSE) and least square (LS) estimators, and then resort to the iterative method to improve the estimation accuracy. Simulation results are provided to corroborate the proposed studies.
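
To illustrate the linear-estimation step mentioned above, here is a generic LMMSE channel estimator for a training model y = X h + n with a known channel covariance; the relay-specific superimposed-training structure, power allocation, and iterative refinement are not modeled, and the real-valued signals and dimensions are simplifying assumptions.

```python
# Generic LMMSE channel estimation for y = X h + n with known channel covariance R_h.
# Real-valued simplification; practical channels are complex baseband.
import numpy as np

rng = np.random.default_rng(11)
L, N = 4, 32                                    # channel taps, training length
sigma2 = 0.01

# Training matrix built from circular shifts of a known pilot sequence.
pilot = rng.choice([-1.0, 1.0], size=N)
X = np.column_stack([np.roll(pilot, k) for k in range(L)])

R_h = np.diag(np.exp(-0.5 * np.arange(L)))      # exponentially decaying power-delay profile
h = np.linalg.cholesky(R_h) @ rng.normal(size=L)
y = X @ h + np.sqrt(sigma2) * rng.normal(size=N)

# LMMSE estimate (equal to the MAP/MMSE estimate when h and n are Gaussian):
h_lmmse = R_h @ X.T @ np.linalg.solve(X @ R_h @ X.T + sigma2 * np.eye(N), y)
h_ls = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares baseline

print("LS    error:", np.linalg.norm(h_ls - h))
print("LMMSE error:", np.linalg.norm(h_lmmse - h))
```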

Journal ArticleDOI
TL;DR: The model properties and reliability measures are derived and studied in detail, and maximum likelihood and Bayes approaches are used for estimation, along with coverage probabilities for the parameter.

Journal ArticleDOI
TL;DR: This work compares several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters) and addresses the application of MCMC methods for extracting nonmarginal properties of the posterior distribution.
Abstract: Stimulus reconstruction or decoding methods provide an important tool for understanding how sensory and motor information is represented in neural activity. We discuss Bayesian decoding methods based on an encoding generalized linear model (GLM) that accurately describes how stimuli are transformed into the spike trains of a group of neurons. The form of the GLM likelihood ensures that the posterior distribution over the stimuli that caused an observed set of spike trains is log concave so long as the prior is. This allows the maximum a posteriori (MAP) stimulus estimate to be obtained using efficient optimization algorithms. Unfortunately, the MAP estimate can have a relatively large average error when the posterior is highly nongaussian. Here we compare several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters). An efficient version of the hybrid Monte Carlo (HMC) algorithm was significantly superior to other MCMC methods for gaussian priors. When the prior distribution has sharp edges and corners, on the other hand, the “hit-and-run” algorithm performed better than other MCMC methods. Using these algorithms, we show that for this latter class of priors, the posterior mean estimate can have a considerably lower average error than MAP, whereas for gaussian priors, the two estimators have roughly equal efficiency. We also address the application of MCMC methods for extracting nonmarginal properties of the posterior distribution. For example, by using MCMC to calculate the mutual information between the stimulus and response, we verify the validity of a computationally efficient Laplace approximation to this quantity for gaussian priors in a wide range of model parameters; this makes direct model-based computation of the mutual information tractable even in the case of large observed neural populations, where methods based on binning the spike train fail. Finally, we consider the effect of uncertainty in the GLM parameters on the posterior estimators.
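
The MAP-versus-posterior-mean contrast discussed above can be reproduced in one dimension with a sharp-edged prior: a simple random-walk Metropolis sampler estimates the posterior mean, which differs noticeably from the MAP when the posterior is squeezed against a prior boundary. This toy sketch uses a Gaussian likelihood and a uniform prior, not the paper's GLM spike-train models or the hybrid Monte Carlo / hit-and-run samplers it compares.

```python
# Toy comparison: MAP vs. posterior mean under a sharp-edged (uniform) prior,
# with the posterior mean estimated by random-walk Metropolis sampling.
import numpy as np

rng = np.random.default_rng(12)
y, sigma = 1.3, 1.0                     # observation near the prior boundary at 1
lo, hi = -1.0, 1.0                      # uniform prior support

def log_post(x):
    if x < lo or x > hi:
        return -np.inf
    return -0.5 * (y - x) ** 2 / sigma**2

x_map = np.clip(y, lo, hi)              # MAP: likelihood peak clipped to the support

x, samples = 0.0, []
for t in range(50000):
    prop = x + 0.5 * rng.normal()       # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(x):
        x = prop
    if t > 5000:                        # discard burn-in
        samples.append(x)

print("MAP estimate:          ", x_map)
print("posterior mean (MCMC): ", np.round(np.mean(samples), 3))
```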

Book ChapterDOI
05 Sep 2011
TL;DR: The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective and is competitive with other state-of-the-art algorithms for approximate MAP estimation.
Abstract: Maximum a-posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.

Journal ArticleDOI
TL;DR: A unified framework which uses a generative model of the imaging process and can address spatial super-resolution, space-time super-resolution, image deconvolution, single-image expansion, removal of noise, and image restoration is presented.
Abstract: We address the problem of super-resolution: obtaining high-resolution images and videos from multiple low-resolution inputs. The increased resolution can be in spatial or temporal dimensions, or even in both. We present a unified framework which uses a generative model of the imaging process and can address spatial super-resolution, space-time super-resolution, image deconvolution, single-image expansion, removal of noise, and image restoration. We model a high-resolution image or video as a Markov random field and use the maximum a posteriori estimate as the final solution, obtained using a graph-cut optimization technique. We derive insights into what super-resolution magnification factors are possible and the conditions necessary for super-resolution. We demonstrate spatial super-resolution reconstruction results with magnifications higher than predicted limits of magnification. We also formulate a scheme for selective super-resolution reconstruction of videos to obtain simultaneous increase of resolutions in both spatial and temporal directions. We show that it is possible to achieve space-time magnification factors beyond what has been suggested in the literature by selectively applying super-resolution constraints. We present results on both synthetic and real input sequences.

Journal ArticleDOI
TL;DR: A data-driven approach to a priori SNR estimation is presented, which reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
Abstract: The a priori signal-to-noise ratio (SNR) plays an important role in many speech enhancement algorithms. In this paper, we present a data-driven approach to a priori SNR estimation. It may be used with a wide range of speech enhancement techniques, such as the minimum mean square error (MMSE) (log) spectral amplitude estimator, the super-Gaussian joint maximum a posteriori (JMAP) estimator, or the Wiener filter. The proposed SNR estimator employs two trained artificial neural networks, one for speech presence, one for speech absence. The classical decision-directed a priori SNR estimator by Ephraim and Malah is broken down into its two additive components, which now represent the two input signals to the neural networks. Both output nodes are combined to represent the new a priori SNR estimate. As an alternative to the neural networks, simple lookup tables are also investigated. Employment of these data-driven nonlinear a priori SNR estimators reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
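
For reference, the classical decision-directed estimator that the data-driven approach builds on can be written in a few lines; the two additive components mentioned above (the previous-frame amplitude term and the maximum-likelihood term) are returned separately so they could serve as the two network inputs. The frame handling, synthetic data, and the Wiener gain used for the amplitude update are simplifying assumptions.

```python
# Decision-directed a priori SNR estimation (Ephraim-Malah style), per frequency bin,
# with the two additive components returned separately.
import numpy as np

def decision_directed_snr(Y_frames, noise_psd, alpha=0.98, xi_min=10**(-25 / 10)):
    """Y_frames: (n_frames, n_bins) complex STFT of noisy speech; noise_psd: (n_bins,)."""
    A_prev = np.zeros(Y_frames.shape[1])              # previous clean-amplitude estimate
    xi_all, comp1_all, comp2_all = [], [], []
    for Yt in Y_frames:
        gamma = np.abs(Yt) ** 2 / noise_psd                   # a posteriori SNR
        comp1 = alpha * A_prev**2 / noise_psd                 # past-amplitude component
        comp2 = (1 - alpha) * np.maximum(gamma - 1.0, 0.0)    # maximum-likelihood component
        xi = np.maximum(comp1 + comp2, xi_min)                # a priori SNR estimate
        A_prev = (xi / (1.0 + xi)) * np.abs(Yt)               # Wiener-gain amplitude update
        xi_all.append(xi); comp1_all.append(comp1); comp2_all.append(comp2)
    return np.array(xi_all), np.array(comp1_all), np.array(comp2_all)

# Example with synthetic STFT data (assumed shapes):
rng = np.random.default_rng(13)
Y = rng.normal(size=(5, 129)) + 1j * rng.normal(size=(5, 129))
xi, c1, c2 = decision_directed_snr(Y, noise_psd=np.ones(129))
print("mean a priori SNR estimate in the last frame (dB):",
      np.round(10 * np.log10(xi[-1].mean()), 2))
```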

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This investigation was unable to find evidence of a significant performance increase attributed to the introduction of spatial and consistency constraints, and found that similar levels of performance can be achieved using a much simpler design that essentially ignores these constraints.
Abstract: Many state-of-the-art segmentation algorithms rely on Markov or Conditional Random Field models designed to enforce spatial and global consistency constraints. This is often accomplished by introducing additional latent variables to the model, which can greatly increase its complexity. As a result, estimating the model parameters or computing the best maximum a posteriori (MAP) assignment becomes a computationally expensive task. In a series of experiments on the PASCAL and the MSRC datasets, we were unable to find evidence of a significant performance increase attributed to the introduction of such constraints. On the contrary, we found that similar levels of performance can be achieved using a much simpler design that essentially ignores these constraints. This more simple approach makes use of the same local and global features to leverage evidence from the image, but instead directly biases the preferences of individual pixels. While our investigation does not prove that spatial and consistency constraints are not useful in principle, it points to the conclusion that they should be validated in a larger context.