
Showing papers on "Maximum a posteriori estimation published in 2011"


Journal ArticleDOI
TL;DR: A new supervised Bayesian approach to hyperspectral image segmentation with active learning, which consists of a multinomial logistic regression model to learn the class posterior probability distributions and a new active sampling approach, called modified breaking ties, which is able to provide an unbiased sampling.
Abstract: This paper introduces a new supervised Bayesian approach to hyperspectral image segmentation with active learning, which consists of two main steps. First, we use a multinomial logistic regression (MLR) model to learn the class posterior probability distributions. This is done by using a recently introduced logistic regression via splitting and augmented Lagrangian algorithm. Second, we use the information acquired in the previous step to segment the hyperspectral image using a multilevel logistic prior that encodes the spatial information. In order to reduce the cost of acquiring large training sets, active learning is performed based on the MLR posterior probabilities. Another contribution of this paper is the introduction of a new active sampling approach, called modified breaking ties, which is able to provide an unbiased sampling. Furthermore, we have implemented our proposed method in an efficient way. For instance, in order to obtain the time-consuming maximum a posteriori segmentation, we use the α-expansion min-cut-based integer optimization algorithm. The state-of-the-art performance of the proposed approach is illustrated using both simulated and real hyperspectral data sets in a number of experimental comparisons with recently introduced hyperspectral image analysis methods.

414 citations
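
As a rough illustration of the breaking-ties idea used for active sampling in the entry above, the sketch below ranks unlabeled pixels by the gap between their two largest multinomial-logistic-regression posteriors and cycles over predicted classes when picking queries. The scikit-learn LogisticRegression stand-in, the per-class selection loop, and all variable names are assumptions; the paper's LORSAL-based learning, multilevel logistic spatial prior, and α-expansion segmentation step are not reproduced.

```python
# Sketch of a breaking-ties style active sampling rule on MLR posteriors (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def breaking_ties_scores(probs):
    """Gap between the two largest class posteriors; a small gap = most ambiguous pixel."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def modified_breaking_ties(probs, n_pick):
    """Pick the most ambiguous unlabeled sample within each predicted class in turn,
    spreading queries over classes (an unbiased-sampling heuristic)."""
    gap = breaking_ties_scores(probs)
    pred = probs.argmax(axis=1)
    picked = []
    classes = np.unique(pred)
    while len(picked) < n_pick:
        for c in classes:
            idx = np.where(pred == c)[0]
            idx = idx[~np.isin(idx, picked)]
            if idx.size and len(picked) < n_pick:
                picked.append(idx[np.argmin(gap[idx])])
    return np.array(picked)

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(60, 10)), rng.integers(0, 4, 60)   # labeled pixels
X_pool = rng.normal(size=(500, 10))                                # unlabeled pool
mlr = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
query = modified_breaking_ties(mlr.predict_proba(X_pool), n_pick=8)
print("pixels to label next:", query)
```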


Journal ArticleDOI
TL;DR: In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, nonfactorial coefficient priors.
Abstract: Many practical methods for finding maximally sparse coefficient expansions involve solving a regression problem using a particular class of concave penalty functions. From a Bayesian perspective, this process is equivalent to maximum a posteriori (MAP) estimation using a sparsity-inducing prior distribution (Type I estimation). Using variational techniques, this distribution can always be conveniently expressed as a maximization over scaled Gaussian distributions modulated by a set of latent variables. Alternative Bayesian algorithms, which operate in latent variable space leveraging this variational representation, lead to sparse estimators reflecting posterior information beyond the mode (Type II estimation). Currently, it is unclear how the underlying cost functions of Type I and Type II relate, nor what relevant theoretical properties exist, especially with regard to Type II. Herein a common set of auxiliary functions is used to conveniently express both Type I and Type II cost functions in either coefficient or latent variable space facilitating direct comparisons. In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, nonfactorial coefficient priors. One prior (at least) from this class maintains several desirable advantages over all possible Type I methods and utilizes a novel, nonconvex approximation to the l0 norm with most, and in certain quantifiable conditions all, local minima smoothed away. Importantly, the global minimum is always left unaltered unlike standard l1-norm relaxations. This ensures that any appropriate descent method is guaranteed to locate the maximally sparse solution.

299 citations
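
To make the Type I versus Type II distinction above concrete, here is a minimal sketch that contrasts a Type I MAP estimate (the Lasso, i.e., a factorial Laplacian prior) with a Type II estimate obtained from standard EM-style sparse Bayesian learning (ARD) updates in the latent variance space. The update rules shown are the generic SBL ones and the problem sizes are assumptions; the sketch does not implement the paper's nonfactorial-prior analysis.

```python
# Type I (Lasso/MAP) vs. Type II (ARD-style sparse Bayesian learning) on a toy problem.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, m, k = 40, 80, 5                           # measurements, dictionary size, true sparsity
Phi = rng.normal(size=(n, m)) / np.sqrt(n)
w_true = np.zeros(m)
w_true[rng.choice(m, k, replace=False)] = rng.normal(0, 3, k)
sigma2 = 0.01
y = Phi @ w_true + rng.normal(0, np.sqrt(sigma2), n)

# Type I: MAP under a factorial Laplacian prior (standard Lasso).
w_map = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(Phi, y).coef_

# Type II: EM updates of the latent variances gamma (automatic relevance determination).
gamma = np.ones(m)
for _ in range(200):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
    mu = Sigma @ Phi.T @ y / sigma2
    gamma = np.maximum(mu**2 + np.diag(Sigma), 1e-10)   # small gamma prunes a coefficient
w_type2 = mu

print("support recovered (Type I): ", np.flatnonzero(np.abs(w_map) > 1e-2))
print("support recovered (Type II):", np.flatnonzero(np.abs(w_type2) > 1e-2))
print("true support:               ", np.sort(np.flatnonzero(w_true)))
```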


Journal ArticleDOI
TL;DR: The present paper fruitfully exploits a priori information to improve the performance of multiuser detectors, based on a sparse symbol vector with entries drawn from a finite alphabet that is augmented by the zero symbol to capture user inactivity.
Abstract: The number of active users in code-division multiple access (CDMA) systems is often much lower than the spreading gain. The present paper fruitfully exploits this a priori information to improve the performance of multiuser detectors. A low-activity factor manifests itself in a sparse symbol vector with entries drawn from a finite alphabet that is augmented by the zero symbol to capture user inactivity. The non-equiprobable symbols of the augmented alphabet motivate a sparsity-exploiting maximum a posteriori probability (S-MAP) criterion, which is shown to yield a cost comprising the l2 least-squares error penalized by the p-th norm of the wanted symbol vector (p = 0, 1, 2). Related optimization problems appear in variable selection (shrinkage) schemes developed for linear regression, as well as in the emerging field of compressive sampling (CS). The contribution of this work to such sparse CDMA systems is a gamut of sparsity-exploiting multiuser detectors trading off performance for complexity requirements. From the vantage point of CS and the least-absolute shrinkage selection operator (Lasso) spectrum of applications, the contribution amounts to sparsity-exploiting algorithms when the entries of the wanted signal vector adhere to finite-alphabet constraints.

280 citations
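
A rough sketch of the relaxation-and-quantization flavor of such detectors is given below: a Lasso-type penalized least-squares problem stands in for the S-MAP cost with p = 1, and the relaxed estimate is then mapped to the zero-augmented alphabet {-1, 0, +1}. The spreading-matrix construction, penalty weight, and alphabet are illustrative assumptions rather than the paper's exact detectors.

```python
# Sketch: sparsity-exploiting multiuser detection via an l1-penalized LS relaxation
# followed by quantization to the zero-augmented BPSK alphabet {-1, 0, +1}.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
K, N = 32, 16                                  # users, spreading gain (K > N, low activity)
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)    # spreading code matrix
b_true = np.zeros(K)
active = rng.choice(K, size=4, replace=False)            # only a few active users
b_true[active] = rng.choice([-1.0, 1.0], size=active.size)
y = S @ b_true + 0.05 * rng.normal(size=N)

b_relaxed = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(S, y).coef_
b_hat = np.where(np.abs(b_relaxed) < 0.5, 0.0, np.sign(b_relaxed))   # quantize to {-1,0,+1}

print("active users (true):     ", np.sort(np.flatnonzero(b_true)))
print("active users (detected): ", np.sort(np.flatnonzero(b_hat)))
```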


Book ChapterDOI
08 Mar 2011
TL;DR: In this paper, the success of variational expectation maximization (vEM) in simple probabilistic time series models is investigated, and it is shown that simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.
Abstract: Introduction Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in latent variables, unlike maximum a posteriori methods, and yet generally requiring less computational time than Markov chain Monte Carlo methods. In particular the variational expectation maximisation (vEM) and variational Bayes algorithms, both involving variational optimisation of a free-energy, are widely used in time series modelling. Here, we investigate the success of vEM in simple probabilistic time series models. First we consider the inference step of vEM, and show that a consequence of the well-known compactness property of variational inference is a failure to propagate uncertainty in time, thus limiting the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations. The variational approach We begin this chapter with a brief theoretical review of the variational expectation maximisation algorithm, before illustrating the important concepts with a simple example in the next section. The vEM algorithm is an approximate version of the expectation maximisation (EM) algorithm [4]. Expectation maximisation is a standard approach to finding maximum likelihood (ML) parameters for latent variable models, including hidden Markov models and linear or non-linear state space models (SSMs) for time series.

217 citations
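
The compactness property mentioned above can be seen in a two-variable toy case: a fully factorized (mean-field) Gaussian approximation to a correlated Gaussian matches the conditional rather than the marginal variances, so it underestimates uncertainty exactly when correlation (for instance, temporal coupling) is strongest. The sketch below is a standalone illustration under that assumption, not the chapter's state-space examples.

```python
# Mean-field "compactness": the factorized approximation's variance equals the
# conditional variance, which shrinks as the true correlation grows.
import numpy as np

for rho in (0.0, 0.5, 0.9, 0.99):
    Sigma = np.array([[1.0, rho], [rho, 1.0]])   # true covariance of (x1, x2)
    true_marginal_var = Sigma[0, 0]              # always 1
    # The optimal factorized Gaussian q(x1)q(x2) (minimizing KL(q||p)) has precision
    # equal to the diagonal of the true precision matrix:
    mf_var = 1.0 / np.linalg.inv(Sigma)[0, 0]    # = 1 - rho**2
    print(f"rho={rho:4.2f}  true var = {true_marginal_var:.2f}  mean-field var = {mf_var:.3f}")
```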


Journal ArticleDOI
TL;DR: Several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli, are developed, which allow efficient maximum-likelihood model fitting and stimulus decoding.
Abstract: One of the central problems in systems neuroscience is to understand how neural spike trains convey sensory information. Decoding methods, which provide an explicit means for reading out the information contained in neural spike responses, offer a powerful set of tools for studying the neural coding problem. Here we develop several decoding methods based on point-process neural encoding models, or forward models that predict spike responses to stimuli. These models have concave log-likelihood functions, which allow efficient maximum-likelihood model fitting and stimulus decoding. We present several applications of the encoding model framework to the problem of decoding stimulus information from population spike responses: (1) a tractable algorithm for computing the maximum a posteriori (MAP) estimate of the stimulus, the most probable stimulus to have generated an observed single- or multiple-neuron spike train response, given some prior distribution over the stimulus; (2) a gaussian approximation to the posterior stimulus distribution that can be used to quantify the fidelity with which various stimulus features are encoded; (3) an efficient method for estimating the mutual information between the stimulus and the spike trains emitted by a neural population; and (4) a framework for the detection of change-point times (the time at which the stimulus undergoes a change in mean or variance) by marginalizing over the posterior stimulus distribution. We provide several examples illustrating the performance of these estimators with simulated and real neural data.

162 citations
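
The MAP decoding step (item 1 above) can be sketched in a few lines for a Gaussian stimulus prior and a Poisson encoding model with an exponential nonlinearity: the log posterior is concave, so a generic gradient-based optimizer finds the MAP stimulus. The filters, prior, and optimizer choice below are illustrative assumptions, not the authors' implementation.

```python
# MAP stimulus decoding for a Poisson GLM encoder with a Gaussian stimulus prior.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T, n_cells = 50, 8
K = rng.normal(0, 0.3, size=(n_cells, T))        # per-neuron linear filters (assumed known)
b = -1.0                                         # common log baseline rate
x_true = np.sin(np.linspace(0, 4 * np.pi, T))    # stimulus to be decoded
spikes = rng.poisson(np.exp(K @ x_true + b))     # observed spike counts

prior_prec = 1.0                                 # isotropic Gaussian prior N(0, 1/prior_prec)

def neg_log_posterior(x):
    lin = K @ x + b
    # Poisson log-likelihood (up to constants) plus Gaussian log-prior, negated:
    return np.sum(np.exp(lin) - spikes * lin) + 0.5 * prior_prec * np.dot(x, x)

def grad(x):
    lam = np.exp(K @ x + b)
    return K.T @ (lam - spikes) + prior_prec * x

x_map = minimize(neg_log_posterior, np.zeros(T), jac=grad, method="L-BFGS-B").x
print("decoding correlation with true stimulus:", np.corrcoef(x_map, x_true)[0, 1])
```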


Posted Content
TL;DR: In this article, a Bayesian model based on automatic relevance determination is proposed, in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior.
Abstract: This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler and Itakura-Saito divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example and a stock price prediction task.

158 citations
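
For orientation, the sketch below implements the classical majorization-minimization multiplicative updates for NMF with the KL divergence (the β = 1 member of the family); the automatic-relevance-determination tying of dictionary columns and activation rows through shared scale parameters, which is the paper's actual contribution, is omitted here, and the data and rank are assumptions.

```python
# Multiplicative MM updates for KL-divergence NMF (beta = 1); ARD pruning omitted.
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps          # dictionary
    H = rng.random((K, N)) + eps          # activations
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T / (H.sum(axis=1) + eps)       # MM update for W
        WH = W @ H + eps
        H *= W.T @ (V / WH) / (W.sum(axis=0)[:, None] + eps)  # MM update for H
    return W, H

rng = np.random.default_rng(4)
V = rng.random((30, 8)) @ rng.random((8, 100))    # nonnegative data with low-rank structure
W, H = nmf_kl(V, K=12)                            # K deliberately larger than the true rank
print("relative reconstruction error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```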


Proceedings ArticleDOI
06 Dec 2011
TL;DR: This work proposes a 'learning-based' approach, WiGEM, where the received signal strength is modeled as a Gaussian Mixture Model (GMM), and Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters.
Abstract: We consider the problem of localizing a wireless client in an indoor environment based on the signal strength of its transmitted packets as received on stationary sniffers or access points. Several state-of-the-art indoor localization techniques have the drawback that they rely extensively on a labor-intensive 'training' phase that does not scale well. Use of unmodeled hardware with heterogeneous power levels further reduces the accuracy of these techniques. We propose a 'learning-based' approach, WiGEM, where the received signal strength is modeled as a Gaussian Mixture Model (GMM). Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters. This approach enables us to localize a transmitting device based on the maximum a posteriori estimate. The key insight is to use the physics of wireless propagation, and exploit the signal strength constraints that exist for different transmit power levels. The learning approach not only avoids the labor-intensive training, but also makes the location estimates considerably robust in the face of heterogeneity and various time varying phenomena. We present evaluations on two different indoor testbeds with multiple WiFi devices. We demonstrate that WiGEM's accuracy is at par with or better than state-of-the-art techniques but without requiring any training.

139 citations
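
As a toy illustration of the GMM/EM-plus-MAP idea (not the WiGEM system itself, which also encodes propagation physics and transmit-power constraints), the sketch below fits a Gaussian mixture to received-signal-strength vectors and localizes a new observation by its MAP component; all data and parameters are synthetic assumptions.

```python
# Toy GMM/EM localization: each mixture component plays the role of a candidate
# location, and a new RSSI vector is assigned to the MAP component.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
n_sniffers, n_locations = 4, 3
true_means = rng.uniform(-90, -40, size=(n_locations, n_sniffers))   # dBm per sniffer
X = np.vstack([m + rng.normal(0, 2.0, size=(200, n_sniffers)) for m in true_means])

gmm = GaussianMixture(n_components=n_locations, covariance_type="diag",
                      random_state=0).fit(X)            # EM for the ML parameters

rssi_new = true_means[1] + rng.normal(0, 2.0, n_sniffers)
posterior = gmm.predict_proba(rssi_new[None, :])[0]     # p(component | observation)
print("posterior over candidate locations:", np.round(posterior, 3))
print("MAP location index:", int(posterior.argmax()))
```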


Journal ArticleDOI
TL;DR: The experiments that were performed on a bitemporal TerraSAR-X StripMap data set from South West England during and after a large-scale flooding in 2007 confirm the effectiveness of the proposed change detection method and show an increased classification accuracy of the hybrid MRF model in comparison to the sole application of the HMAP estimation.
Abstract: The near real-time provision of precise information about flood dynamics from synthetic aperture radar (SAR) data is an essential task in disaster management. A novel tile-based parametric thresholding approach under the generalized Gaussian assumption is applied on normalized change index data to automatically solve the three-class change detection problem in large-size images with small class a priori probabilities. The thresholding result is used for the initialization of a hybrid Markov model which integrates scale-dependent and spatiocontextual information into the labeling process by combining hierarchical with noncausal Markov image modeling. Hierarchical maximum a posteriori (HMAP) estimation using the Markov chains in scale, originally developed on quadtrees, is adapted to hierarchical irregular graphs. To reduce the computational effort of the iterative optimization process that is related to noncausal Markov models, a Markov random field (MRF) approach is defined, which is applied on a restricted region of the lowest level of the graph, selected according to the HMAP labeling result. The experiments that were performed on a bitemporal TerraSAR-X StripMap data set from South West England during and after a large-scale flooding in 2007 confirm the effectiveness of the proposed change detection method and show an increased classification accuracy of the hybrid MRF model in comparison to the sole application of the HMAP estimation. Additionally, the impact of the graph structure and the chosen model parameters on the labeling result as well as on the performance is discussed.

136 citations


Journal Article
TL;DR: This paper illustrates the situations where standard EP fails to converge, reviews different modifications and alternative algorithms for improving the convergence, and demonstrates that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters.
Abstract: This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is the analytically intractable inference which is why several approximative methods have been proposed. Expectation propagation (EP) has been found to be a very accurate method in many empirical studies but the convergence of EP is known to be problematic with models containing non-log-concave site functions. In this paper we illustrate the situations where standard EP fails to converge and review different modifications and alternative algorithms for improving the convergence. We demonstrate that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters and show that standard EP may not converge in the MAP values with some difficult data sets. We present a robust implementation which relies primarily on parallel EP updates and uses a moment-matching-based double-loop algorithm with adaptively selected step size in difficult cases. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations.

127 citations


Journal ArticleDOI
TL;DR: It is shown that for any prior PX, the minimum mean-square error (MMSE) estimator is the solution of a penalized least square problem with some penalty φ, which can be interpreted as the MAP estimator with the prior C·exp(-φ(x)).
Abstract: Penalized least squares regression is often used for signal denoising and inverse problems, and is commonly interpreted in a Bayesian framework as a Maximum a posteriori (MAP) estimator, the penalty function being the negative logarithm of the prior. For example, the widely used quadratic program (with an l1 penalty) associated to the LASSO/basis pursuit denoising is very often considered as MAP estimation under a Laplacian prior in the context of additive white Gaussian noise (AWGN) reduction. This paper highlights the fact that, while this is one possible Bayesian interpretation, there can be other equally acceptable Bayesian interpretations. Therefore, solving a penalized least squares regression problem with penalty φ(x) need not be interpreted as assuming a prior C·exp(-φ(x)) and using the MAP estimator. In particular, it is shown that for any prior PX, the minimum mean-square error (MMSE) estimator is the solution of a penalized least square problem with some penalty φ(x) , which can be interpreted as the MAP estimator with the prior C·exp(-φ(x)). Vice versa, for certain penalties φ(x), the solution of the penalized least squares problem is indeed the MMSE estimator, with a certain prior PX . In general dPX(x) ≠ C·exp(-φ(x))dx.

125 citations
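
The distinction drawn above can be checked numerically in the scalar AWGN case: under a Laplacian prior, the MAP estimator is soft thresholding, while the MMSE estimator (computed by numerical integration below) is a different, smooth shrinkage rule. The grid-based integration and parameter values are illustrative assumptions.

```python
# Scalar AWGN denoising under a Laplacian prior: MAP (soft threshold) vs. MMSE.
import numpy as np

sigma, lam = 1.0, 1.0                        # noise std, Laplacian rate
y = 1.5                                      # observed value

# MAP under the Laplacian prior = soft thresholding with threshold lam * sigma**2.
x_map = np.sign(y) * max(abs(y) - lam * sigma**2, 0.0)

# MMSE = posterior mean, computed by brute-force numerical integration on a grid.
x = np.linspace(-20, 20, 200001)
log_post = -0.5 * (y - x) ** 2 / sigma**2 - lam * np.abs(x)
w = np.exp(log_post - log_post.max())
x_mmse = np.sum(x * w) / np.sum(w)

print(f"observation y = {y}:  MAP = {x_map:.3f},  MMSE = {x_mmse:.3f}")
```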


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage and uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.
Abstract: We present an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage. Our approach exploits shading information to recover the millimeter-scale surface structure, but in contrast to related approaches succeeds under general unconstrained lighting conditions. Our method starts off from a set of multi-view video frames and an initial series of reconstructed coarse 3D meshes that lack any surface detail. In a spatio-temporal maximum a posteriori probability (MAP) inference framework, our approach first estimates the incident illumination and the spatially-varying albedo map on the mesh surface for every time instant. Thereafter, albedo and illumination are used to estimate the true geometric detail visible in the images and add it to the coarse reconstructions. The MAP framework uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.

Journal ArticleDOI
Gang Xu, Mengdao Xing, Lei Zhang, Yabo Liu, Yachao Li
TL;DR: A novel algorithm of inverse synthetic aperture radar (ISAR) imaging based on Bayesian estimation is proposed, wherein the ISAR imaging joint with phase adjustment is mathematically transferred into signal reconstruction via maximum a posteriori estimation.
Abstract: In this letter, a novel algorithm for inverse synthetic aperture radar (ISAR) imaging based on Bayesian estimation is proposed, wherein ISAR imaging joint with phase adjustment is mathematically recast as signal reconstruction via maximum a posteriori estimation. In the scheme, phase errors are treated as model errors and are overcome in the sparsity-driven optimization regardless of their form, while data-driven estimation of the statistical parameters of both noise and target is developed, which guarantees high-precision image generation. Meanwhile, the fast Fourier transform is utilized to implement the image-formation step, which makes the algorithm efficient. Owing to its strong denoising capability, the proposed algorithm can produce high-quality images even under strong noise. Experimental results using simulated and measured data confirm its validity.
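
The sparsity-driven, FFT-based flavor of such reconstructions can be illustrated with a generic iterative shrinkage-thresholding (ISTA) sketch in which the unitary FFT serves as the measurement operator; the phase-error estimation and the data-driven noise/target statistics of the actual algorithm are not reproduced, and all sizes and weights are assumptions.

```python
# ISTA sketch: recover a sparse complex reflectivity profile from noisy Fourier data,
# using the unitary FFT as the forward operator (Lipschitz constant 1, so step size 1).
import numpy as np

rng = np.random.default_rng(6)
n, k = 256, 6
x_true = np.zeros(n, dtype=complex)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k) + 1j * rng.normal(size=k)

A = lambda x: np.fft.fft(x, norm="ortho")          # forward operator
At = lambda y: np.fft.ifft(y, norm="ortho")        # its adjoint (inverse, since unitary)
y = A(x_true) + 0.02 * (rng.normal(size=n) + 1j * rng.normal(size=n))

def soft_complex(z, t):
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-12)) * z, 0.0)

lam, x = 0.05, np.zeros(n, dtype=complex)
for _ in range(100):
    x = soft_complex(x - At(A(x) - y), lam)        # gradient step + complex soft threshold

print("nonzeros found:", np.flatnonzero(np.abs(x) > 1e-3).size, "of", k)
```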

Proceedings Article
12 Dec 2011
TL;DR: This work studies the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document's topic distribution is integrated out, and shows that exact inference takes polynomial time when the effective number of topics per document is small, but becomes NP-hard when a document has a large number of topics.
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document's topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question.

Journal ArticleDOI
Chris Hans
TL;DR: In this article, the elastic net procedure, a form of regularized optimization for linear regression that provides a bridge between ridge regression and the lasso, is viewed as a Bayesian posterior mode under a prior distribution implied by the form of the elastic net penalty.
Abstract: The elastic net procedure is a form of regularized optimization for linear regression that provides a bridge between ridge regression and the lasso. The estimate that it produces can be viewed as a Bayesian posterior mode under a prior distribution implied by the form of the elastic net penalty. This article broadens the scope of the Bayesian connection by providing a complete characterization of a class of prior distributions that generate the elastic net estimate as the posterior mode. The resulting model-based framework allows for methodology that moves beyond exclusive use of the posterior mode by considering inference based on the full posterior distribution. Two characterizations of the class of prior distributions are introduced: a properly normalized, direct characterization, which is shown to be conjugate for linear regression models, and an alternate representation as a scale mixture of normal distributions. Prior distributions are proposed for the regularization parameters, resulting in an infi...
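
The posterior-mode connection can be verified numerically: minimizing the negative log posterior under a prior proportional to exp(-lam1*||b||_1 - 0.5*lam2*||b||^2) with a Gaussian likelihood reproduces the elastic net estimate. The sketch below relies on an assumed mapping to scikit-learn's parameterization (alpha = (lam1+lam2)/n, l1_ratio = lam1/(lam1+lam2)) and does not implement the paper's full posterior inference.

```python
# Elastic net as a Bayesian posterior mode: the minimizer of
#   0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2
# should match scikit-learn's ElasticNet under the assumed parameter mapping.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))
b_true = np.array([2.0, 0.0, -1.0])
y = X @ b_true + rng.normal(0, 0.5, n)
lam1, lam2 = 20.0, 10.0

def neg_log_posterior(b):
    return (0.5 * np.sum((y - X @ b) ** 2)
            + lam1 * np.sum(np.abs(b)) + 0.5 * lam2 * np.sum(b ** 2))

mode = minimize(neg_log_posterior, np.zeros(p), method="Nelder-Mead",
                options={"xatol": 1e-8, "fatol": 1e-10}).x

enet = ElasticNet(alpha=(lam1 + lam2) / n, l1_ratio=lam1 / (lam1 + lam2),
                  fit_intercept=False, max_iter=100000, tol=1e-10).fit(X, y)

print("posterior mode :", np.round(mode, 4))
print("ElasticNet fit :", np.round(enet.coef_, 4))
```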

Journal ArticleDOI
TL;DR: The authors formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local online discriminative algorithm that can quickly adapt to temporal changes, and develop an implementation that can run efficiently on the highly parallel graphics processing unit (GPU).
Abstract: The authors examine the problem of segmenting foreground objects in live video when background scene textures change over time. In particular, we formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local online discriminative algorithm that can quickly adapt to temporal changes. We analyze the algorithm's convergence, discuss its robustness to nonstationarity, and provide an efficient nonlinear extension via sparse kernels. To accommodate interactions among neighboring pixels, a global algorithm is then derived that explicitly distinguishes objects versus background using maximum a posteriori inference in a Markov random field (implemented via graph-cuts). By exploiting the parallel nature of the proposed algorithms, we develop an implementation that can run efficiently on the highly parallel graphics processing unit (GPU). Empirical studies on a wide variety of datasets demonstrate that the proposed approach achieves quality that is comparable to state-of-the-art offline methods, while still being suitable for real-time video analysis (≥ 75 fps on a mid-range GPU).

Journal ArticleDOI
TL;DR: The goal of this paper is to perform a segmentation of atherosclerotic plaques in view of evaluating their burden and to provide boundaries for computing properties such as the plaque deformation and elasticity distribution (elastogram and modulogram).
Abstract: The goal of this paper is to perform a segmentation of atherosclerotic plaques in view of evaluating their burden and to provide boundaries for computing properties such as the plaque deformation and elasticity distribution (elastogram and modulogram). The echogenicity of a region of interest comprising the plaque, the vessel lumen, and the adventitia of the artery wall in an ultrasonic B-mode image was modeled by mixtures of three Nakagami distributions, which yielded the likelihood of a Bayesian segmentation model. The main contribution of this paper is the estimation of the motion field and its integration into the prior of the Bayesian model that included a local geometrical smoothness constraint, as well as an original spatiotemporal cohesion constraint. The maximum a posteriori estimate of the proposed model was computed with a variant of the exploration/selection algorithm. The starting point is a manual segmentation of the first frame. The proposed method was quantitatively compared with manual segmentations of all frames by an expert technician. Various measures were used for this evaluation, including the mean point-to-point distance and the Hausdorff distance. Results were evaluated on 94 sequences of 33 patients (for a total of 8988 images). We report a mean point-to-point distance of 0.24 ± 0.08 mm and a Hausdorff distance of 1.24 ± 0.40 mm. Our tests showed that the algorithm was not sensitive to the degree of stenosis or calcification.

Journal ArticleDOI
TL;DR: Novel fixed-lag and fixed-interval smoothing algorithms are developed that are robust to outliers simultaneously present in the measurements and in the state dynamics and that rely on coordinate descent and the alternating direction method of multipliers.
Abstract: Coping with outliers contaminating dynamical processes is of major importance in various applications because mismatches from nominal models are not uncommon in practice. In this context, the present paper develops novel fixed-lag and fixed-interval smoothing algorithms that are robust to outliers simultaneously present in the measurements and in the state dynamics. Outliers are handled through auxiliary unknown variables that are jointly estimated along with the state based on the least-squares criterion that is regularized with the l1-norm of the outliers in order to effect sparsity control. The resultant iterative estimators rely on coordinate descent and the alternating direction method of multipliers, are expressed in closed form per iteration, and are provably convergent. Additional attractive features of the novel doubly robust smoother include: i) ability to handle both types of outliers; ii) universality to unknown nominal noise and outlier distributions; iii) flexibility to encompass maximum a posteriori optimal estimators with reliable performance under nominal conditions; and iv) improved performance relative to competing alternatives at comparable complexity, as corroborated via simulated tests.
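
A stripped-down version of the outlier-as-auxiliary-variable idea is easy to write for the measurement-outlier case only: alternate a quadratic smoothing step for the state with an l1 (soft-threshold) step for the outlier variables. The difference-penalty smoother and all constants below are illustrative assumptions; the paper's full doubly robust fixed-lag/fixed-interval algorithms and ADMM variants are not reproduced.

```python
# Block-coordinate sketch: robust smoothing with sparse measurement outliers o,
#   minimize 0.5*||y - x - o||^2 + 0.5*lam*||D x||^2 + mu*||o||_1,
# alternating a linear solve for the state x and soft thresholding for o.
import numpy as np

rng = np.random.default_rng(8)
T = 200
t = np.linspace(0, 1, T)
y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=T)
outlier_idx = rng.choice(T, 8, replace=False)
y[outlier_idx] += rng.choice([-1, 1], 8) * 3.0            # gross outliers

D = np.diff(np.eye(T), axis=0)                            # first-difference operator
lam, mu = 50.0, 0.5
A = np.eye(T) + lam * D.T @ D

x, o = np.zeros(T), np.zeros(T)
for _ in range(50):
    x = np.linalg.solve(A, y - o)                         # quadratic step for the state
    r = y - x
    o = np.sign(r) * np.maximum(np.abs(r) - mu, 0.0)      # soft threshold for the outliers

print("detected outlier positions:", np.sort(np.flatnonzero(np.abs(o) > 1e-6)))
print("true outlier positions:    ", np.sort(outlier_idx))
```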

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work proposes a method for vector field learning with outliers, called vector field consensus (VFC), which can distinguish inliers from outliers and simultaneously learn a vector field that fits the inliers, and it is very robust to outliers.
Abstract: We propose a method for vector field learning with outliers, called vector field consensus (VFC). It can distinguish inliers from outliers and learn a vector field fitting the inliers simultaneously. A prior is taken to force the smoothness of the field, which is based on Tikhonov regularization in a vector-valued reproducing kernel Hilbert space. Under a Bayesian framework, we associate each sample with a latent variable which indicates whether it is an inlier, then formulate the problem as a maximum a posteriori problem and use the expectation-maximization algorithm to solve it. The proposed method possesses two characteristics: 1) it is robust to outliers, being able to tolerate 90% outliers and even more, and 2) it is computationally efficient. As an application, we apply VFC to the problem of mismatch removal. The results demonstrate that our method outperforms many state-of-the-art methods, and it is very robust.

Journal ArticleDOI
TL;DR: This paper investigates the problem of fusion of remote sensing images, e.g., multispectral image fusion, based on MRF models, incorporates the contextual constraints via MRF models into the fusion model, and develops fusion algorithms under the maximum a posteriori criterion.
Abstract: Markov random field (MRF) models are powerful tools to model image characteristics accurately and have been successfully applied to a large number of image processing applications. This paper investigates the problem of fusion of remote sensing images, e.g., multispectral image fusion, based on MRF models and incorporates the contextual constraints via MRF models into the fusion model. Fusion algorithms under the maximum a posteriori criterion are developed to search for solutions. Our algorithm is applicable to both multiscale decomposition (MD)-based image fusion and non-MD-based image fusion. Experimental results are provided to demonstrate the improvement of fusion performance by our algorithms.

Journal ArticleDOI
TL;DR: An algorithm for finding the maximum a posteriori (MAP) estimate of the Kalman smoother for a nonlinear model with Gaussian process noise and ℓ1-Laplace observation noise using the convex composite extension of the Gauss-Newton method.
Abstract: Robustness is a major problem in Kalman filtering and smoothing that can be solved using heavy tailed distributions; e.g., l1-Laplace. This paper describes an algorithm for finding the maximum a posteriori (MAP) estimate of the Kalman smoother for a nonlinear model with Gaussian process noise and l1-Laplace observation noise. The algorithm uses the convex composite extension of the Gauss-Newton method. This yields convex programming subproblems to which an interior point path-following method is applied. The number of arithmetic operations required by the algorithm grows linearly with the number of time points because the algorithm preserves the underlying block tridiagonal structure of the Kalman smoother problem. Excellent fits are obtained with and without outliers, even though the outliers are simulated from distributions that are not l1-Laplace. It is also tested on actual data with a nonlinear measurement model for an underwater tracking experiment. The l1-Laplace smoother is able to construct a smoothed fit, without data removal, from data with very large outliers.
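
The MAP objective described above, Gaussian process noise plus Laplace observation noise, can be sketched for a scalar random-walk model by directly minimizing the negative log posterior over the whole state trajectory. Below, a smoothed absolute value stands in for the exact l1 term so a generic quasi-Newton optimizer applies, which is a simplification of the paper's interior-point, block-tridiagonal approach; all model constants are assumptions.

```python
# Direct MAP smoothing for a random-walk state with Laplace (heavy-tailed)
# observation noise, using a smoothed |.| so L-BFGS can be applied.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
T = 150
x_true = np.cumsum(0.1 * rng.normal(size=T))          # random-walk state
y = x_true + rng.laplace(scale=0.2, size=T)
y[rng.choice(T, 5, replace=False)] += 8.0             # extreme outliers

q, b, eps = 0.1, 0.2, 1e-6                            # process std, Laplace scale, smoothing

def neg_log_posterior(x):
    proc = 0.5 * np.sum(np.diff(x) ** 2) / q**2       # Gaussian process-noise term
    obs = np.sum(np.sqrt((y - x) ** 2 + eps)) / b     # smoothed l1 observation term
    return proc + obs

x_map = minimize(neg_log_posterior, y.copy(), method="L-BFGS-B").x
print("RMSE of robust MAP smoother:", np.sqrt(np.mean((x_map - x_true) ** 2)))
```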

Journal ArticleDOI
TL;DR: The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted, and a family of hyper‐Lasso penalty functions are provided, which includes the quasi‐Cauchy distribution of Johnstone and Silverman as a special case.
Abstract: The Lasso has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more-variables-than-observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted. Generalizing this prior provides a family of hyper-Lasso penalty functions, which includes the quasi-Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi-modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.
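
The multi-start strategy for a multimodal penalized likelihood can be sketched as follows. A log-type nonconvex penalty is used purely as a stand-in for the hyper-Lasso family, and the restart scheme (perturbations of a minimum-norm interpolating least-squares fit) is an assumption, not the authors' construction of perfectly fitting starting values.

```python
# Multi-start MAP for a nonconvex, heavy-tailed penalty (a stand-in for the
# hyper-Lasso family): optimize from several starting values and keep the best mode.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n, p = 20, 30                                        # more variables than observations
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.3 * rng.normal(size=n)

lam, delta = 2.0, 0.1
def neg_log_posterior(b):
    # Gaussian likelihood + log-type penalty; the penalty is nonconvex, so the
    # posterior can be multimodal.
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.log(1.0 + np.abs(b) / delta))

best = None
for _ in range(5):
    # Minimum-norm interpolating fit, randomly perturbed, as a starting value.
    b0 = np.linalg.lstsq(X, y, rcond=None)[0] + 0.5 * rng.normal(size=p)
    res = minimize(neg_log_posterior, b0, method="Powell")
    if best is None or res.fun < best.fun:
        best = res

print("best mode, five largest |coefficients|:", np.round(np.sort(np.abs(best.x))[-5:], 2))
```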

Journal ArticleDOI
09 Mar 2011 - Sensors
TL;DR: A novel class of self-organizing sensing agents that adaptively learn an anisotropic, spatio-temporal Gaussian process using noisy measurements and move in order to improve the quality of the estimated covariance function is presented.
Abstract: This paper presents a novel class of self-organizing sensing agents that adaptively learn an anisotropic, spatio-temporal Gaussian process using noisy measurements and move in order to improve the quality of the estimated covariance function. This approach is based on a class of anisotropic covariance functions of Gaussian processes introduced to model a broad range of spatio-temporal physical phenomena. The covariance function is assumed to be unknown a priori. Hence, it is estimated by the maximum a posteriori probability (MAP) estimator. The prediction of the field of interest is then obtained based on the MAP estimate of the covariance function. An optimal sampling strategy is proposed to minimize the information-theoretic cost function of the Fisher Information Matrix. Simulation results demonstrate the effectiveness and the adaptability of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper proposes a superimposed training strategy that allows the destination node to separately obtain the channel information of the source→relay link and the relay→destination link, and derives the Cramér-Rao bound for the random channel parameters.
Abstract: In this paper, we consider the channel estimation for the classical three-node relay networks that employ the amplify-and-forward (AF) transmission scheme and the orthogonal frequency division multiplexing (OFDM) modulation. We propose a superimposed training strategy that allows the destination node to separately obtain the channel information of the source→relay link and the relay→destination link. Specifically, the relay superimposes its own training signal over the received one before forwarding it to the destination. The proposed training strategy can be implemented within two transmission phases and is thus compatible with the two-phase data transmission scheme, i.e., the training can be embedded into data transmission. We also derive the Cramér-Rao bound for the random channel parameters, from which we compute the optimal training sequence as well as the optimal power allocation. Since the optimal minimum mean square error (MMSE) estimator and the maximum a posteriori (MAP) estimator cannot be expressed in closed-form, we propose to first obtain the initial channel estimates from the low complexity linear estimators, e.g., linear minimum mean-square error (LMMSE) and least square (LS) estimators, and then resort to the iterative method to improve the estimation accuracy. Simulation results are provided to corroborate the proposed studies.
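
To illustrate the linear-estimation step mentioned above, here is a generic LMMSE channel estimator for a training model y = X h + n with a known channel covariance; the relay-specific superimposed-training structure, power allocation, and iterative refinement are not modeled, and the real-valued signals and dimensions are simplifying assumptions.

```python
# Generic LMMSE channel estimation for y = X h + n with known channel covariance R_h.
# Real-valued simplification; practical channels are complex baseband.
import numpy as np

rng = np.random.default_rng(11)
L, N = 4, 32                                    # channel taps, training length
sigma2 = 0.01

# Training matrix built from circular shifts of a known pilot sequence.
pilot = rng.choice([-1.0, 1.0], size=N)
X = np.column_stack([np.roll(pilot, k) for k in range(L)])

R_h = np.diag(np.exp(-0.5 * np.arange(L)))      # exponentially decaying power-delay profile
h = np.linalg.cholesky(R_h) @ rng.normal(size=L)
y = X @ h + np.sqrt(sigma2) * rng.normal(size=N)

# LMMSE estimate (equal to the MAP/MMSE estimate when h and n are Gaussian):
h_lmmse = R_h @ X.T @ np.linalg.solve(X @ R_h @ X.T + sigma2 * np.eye(N), y)
h_ls = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares baseline

print("LS    error:", np.linalg.norm(h_ls - h))
print("LMMSE error:", np.linalg.norm(h_lmmse - h))
```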

Journal ArticleDOI
TL;DR: The model properties and reliability measures are derived and studied in detail, and maximum likelihood and Bayes approaches are used for estimation, along with coverage probabilities for the parameter.

Journal ArticleDOI
TL;DR: This work compares several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters) and addresses the application of MCMC methods for extracting nonmarginal properties of the posterior distribution.
Abstract: Stimulus reconstruction or decoding methods provide an important tool for understanding how sensory and motor information is represented in neural activity. We discuss Bayesian decoding methods based on an encoding generalized linear model (GLM) that accurately describes how stimuli are transformed into the spike trains of a group of neurons. The form of the GLM likelihood ensures that the posterior distribution over the stimuli that caused an observed set of spike trains is log concave so long as the prior is. This allows the maximum a posteriori (MAP) stimulus estimate to be obtained using efficient optimization algorithms. Unfortunately, the MAP estimate can have a relatively large average error when the posterior is highly nongaussian. Here we compare several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters). An efficient version of the hybrid Monte Carlo (HMC) algorithm was significantly superior to other MCMC methods for gaussian priors. When the prior distribution has sharp edges and corners, on the other hand, the “hit-and-run” algorithm performed better than other MCMC methods. Using these algorithms, we show that for this latter class of priors, the posterior mean estimate can have a considerably lower average error than MAP, whereas for gaussian priors, the two estimators have roughly equal efficiency. We also address the application of MCMC methods for extracting nonmarginal properties of the posterior distribution. For example, by using MCMC to calculate the mutual information between the stimulus and response, we verify the validity of a computationally efficient Laplace approximation to this quantity for gaussian priors in a wide range of model parameters; this makes direct model-based computation of the mutual information tractable even in the case of large observed neural populations, where methods based on binning the spike train fail. Finally, we consider the effect of uncertainty in the GLM parameters on the posterior estimators.
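
The MAP-versus-posterior-mean contrast discussed above can be reproduced in one dimension with a sharp-edged prior: a simple random-walk Metropolis sampler estimates the posterior mean, which differs noticeably from the MAP when the posterior is squeezed against a prior boundary. This toy sketch uses a Gaussian likelihood and a uniform prior, not the paper's GLM spike-train models or the hybrid Monte Carlo / hit-and-run samplers it compares.

```python
# Toy comparison: MAP vs. posterior mean under a sharp-edged (uniform) prior,
# with the posterior mean estimated by random-walk Metropolis sampling.
import numpy as np

rng = np.random.default_rng(12)
y, sigma = 1.3, 1.0                     # observation near the prior boundary at 1
lo, hi = -1.0, 1.0                      # uniform prior support

def log_post(x):
    if x < lo or x > hi:
        return -np.inf
    return -0.5 * (y - x) ** 2 / sigma**2

x_map = np.clip(y, lo, hi)              # MAP: likelihood peak clipped to the support

x, samples = 0.0, []
for t in range(50000):
    prop = x + 0.5 * rng.normal()       # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(x):
        x = prop
    if t > 5000:                        # discard burn-in
        samples.append(x)

print("MAP estimate:          ", x_map)
print("posterior mean (MCMC): ", np.round(np.mean(samples), 3))
```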

Book ChapterDOI
05 Sep 2011
TL;DR: The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective and is competitive with other state-of-the-art algorithms for approximate MAP estimation.
Abstract: Maximum a-posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.

Journal ArticleDOI
TL;DR: A unified framework which uses a generative model of the imaging process and can address spatial super-resolution, space-time super-resolution, image deconvolution, single-image expansion, removal of noise, and image restoration is presented.
Abstract: We address the problem of super-resolution: obtaining high-resolution images and videos from multiple low-resolution inputs. The increased resolution can be in spatial or temporal dimensions, or even in both. We present a unified framework which uses a generative model of the imaging process and can address spatial super-resolution, space-time super-resolution, image deconvolution, single-image expansion, removal of noise, and image restoration. We model a high-resolution image or video as a Markov random field and use the maximum a posteriori estimate as the final solution, obtained using a graph-cut optimization technique. We derive insights into what super-resolution magnification factors are possible and the conditions necessary for super-resolution. We demonstrate spatial super-resolution reconstruction results with magnifications higher than predicted limits of magnification. We also formulate a scheme for selective super-resolution reconstruction of videos to obtain simultaneous increase of resolutions in both spatial and temporal directions. We show that it is possible to achieve space-time magnification factors beyond what has been suggested in the literature by selectively applying super-resolution constraints. We present results on both synthetic and real input sequences.

Journal ArticleDOI
TL;DR: A data-driven approach to a priori SNR estimation is presented, which reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
Abstract: The a priori signal-to-noise ratio (SNR) plays an important role in many speech enhancement algorithms. In this paper, we present a data-driven approach to a priori SNR estimation. It may be used with a wide range of speech enhancement techniques, such as the minimum mean square error (MMSE) (log) spectral amplitude estimator, the super-Gaussian joint maximum a posteriori (JMAP) estimator, or the Wiener filter. The proposed SNR estimator employs two trained artificial neural networks, one for speech presence, one for speech absence. The classical decision-directed a priori SNR estimator by Ephraim and Malah is broken down into its two additive components, which now represent the two input signals to the neural networks. Both output nodes are combined to represent the new a priori SNR estimate. As an alternative to the neural networks, simple lookup tables are also investigated. Employment of these data-driven nonlinear a priori SNR estimators reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
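
For reference, the classical decision-directed estimator that the data-driven approach builds on can be written in a few lines; the two additive components mentioned above (the previous-frame amplitude term and the maximum-likelihood term) are returned separately so they could serve as the two network inputs. The frame handling, synthetic data, and the Wiener gain used for the amplitude update are simplifying assumptions.

```python
# Decision-directed a priori SNR estimation (Ephraim-Malah style), per frequency bin,
# with the two additive components returned separately.
import numpy as np

def decision_directed_snr(Y_frames, noise_psd, alpha=0.98, xi_min=10**(-25 / 10)):
    """Y_frames: (n_frames, n_bins) complex STFT of noisy speech; noise_psd: (n_bins,)."""
    A_prev = np.zeros(Y_frames.shape[1])              # previous clean-amplitude estimate
    xi_all, comp1_all, comp2_all = [], [], []
    for Yt in Y_frames:
        gamma = np.abs(Yt) ** 2 / noise_psd                   # a posteriori SNR
        comp1 = alpha * A_prev**2 / noise_psd                 # past-amplitude component
        comp2 = (1 - alpha) * np.maximum(gamma - 1.0, 0.0)    # maximum-likelihood component
        xi = np.maximum(comp1 + comp2, xi_min)                # a priori SNR estimate
        A_prev = (xi / (1.0 + xi)) * np.abs(Yt)               # Wiener-gain amplitude update
        xi_all.append(xi); comp1_all.append(comp1); comp2_all.append(comp2)
    return np.array(xi_all), np.array(comp1_all), np.array(comp2_all)

# Example with synthetic STFT data (assumed shapes):
rng = np.random.default_rng(13)
Y = rng.normal(size=(5, 129)) + 1j * rng.normal(size=(5, 129))
xi, c1, c2 = decision_directed_snr(Y, noise_psd=np.ones(129))
print("mean a priori SNR estimate in the last frame (dB):",
      np.round(10 * np.log10(xi[-1].mean()), 2))
```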

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This investigation was unable to find evidence of a significant performance increase attributed to the introduction of spatial and consistency constraints, and found that similar levels of performance can be achieved using a much simpler design that essentially ignores these constraints.
Abstract: Many state-of-the-art segmentation algorithms rely on Markov or Conditional Random Field models designed to enforce spatial and global consistency constraints. This is often accomplished by introducing additional latent variables to the model, which can greatly increase its complexity. As a result, estimating the model parameters or computing the best maximum a posteriori (MAP) assignment becomes a computationally expensive task. In a series of experiments on the PASCAL and the MSRC datasets, we were unable to find evidence of a significant performance increase attributed to the introduction of such constraints. On the contrary, we found that similar levels of performance can be achieved using a much simpler design that essentially ignores these constraints. This more simple approach makes use of the same local and global features to leverage evidence from the image, but instead directly biases the preferences of individual pixels. While our investigation does not prove that spatial and consistency constraints are not useful in principle, it points to the conclusion that they should be validated in a larger context.