
Showing papers on "Gaussian process published in 2010"


Journal ArticleDOI
TL;DR: A probabilistic method, called the Coherent Point Drift (CPD) algorithm, is introduced for both rigid and nonrigid point set registration, together with a fast algorithm that reduces the method's computational complexity to linear.
Abstract: Point set registration is a key component in many computer vision tasks. The goal of point set registration is to assign correspondences between two sets of points and to recover the transformation that maps one point set to the other. Multiple factors, including an unknown nonrigid spatial transformation, the large dimensionality of the point sets, noise, and outliers, make point set registration a challenging problem. We introduce a probabilistic method, called the Coherent Point Drift (CPD) algorithm, for both rigid and nonrigid point set registration. We consider the alignment of two point sets as a probability density estimation problem. We fit the Gaussian mixture model (GMM) centroids (representing the first point set) to the data (the second point set) by maximizing the likelihood. We force the GMM centroids to move coherently as a group to preserve the topological structure of the point sets. In the rigid case, we impose the coherence constraint by reparameterization of GMM centroid locations with rigid parameters and derive a closed form solution of the maximization step of the EM algorithm in arbitrary dimensions. In the nonrigid case, we impose the coherence constraint by regularizing the displacement field and using the variational calculus to derive the optimal transformation. We also introduce a fast algorithm that reduces the method's computational complexity to linear. We test the CPD algorithm for both rigid and nonrigid transformations in the presence of noise, outliers, and missing points, where CPD shows accurate results and outperforms current state-of-the-art methods.
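
To make the rigid case concrete, here is a minimal NumPy sketch of the EM loop described above: the E-step computes GMM posteriors with a uniform outlier component, and the M-step solves a weighted Procrustes problem in closed form via an SVD. This is an illustrative reading of the method, not the authors' reference implementation; the outlier weight w, iteration count, and initialization are assumptions, and scaling, the nonrigid case, and the fast linear-time variant are omitted.

```python
import numpy as np

def cpd_rigid(X, Y, w=0.1, n_iter=50):
    """Align point set Y (GMM centroids) to X (data). Returns R, t."""
    N, D = X.shape
    M, _ = Y.shape
    R, t = np.eye(D), np.zeros(D)
    # Initialize sigma^2 from the spread of the two sets.
    sigma2 = np.sum((X[None, :, :] - Y[:, None, :]) ** 2) / (D * M * N)
    for _ in range(n_iter):
        # E-step: posterior probability that centroid m generated point n,
        # with a uniform outlier component weighted by w.
        TY = Y @ R.T + t
        d2 = np.sum((X[None, :, :] - TY[:, None, :]) ** 2, axis=2)   # (M, N)
        num = np.exp(-d2 / (2 * sigma2))
        c = (2 * np.pi * sigma2) ** (D / 2) * w / (1 - w) * M / N
        P = num / (num.sum(axis=0, keepdims=True) + c)
        # M-step: closed-form weighted Procrustes (rotation + translation).
        Np = P.sum()
        mu_x = X.T @ P.sum(axis=0) / Np
        mu_y = Y.T @ P.sum(axis=1) / Np
        Xh, Yh = X - mu_x, Y - mu_y
        A = Xh.T @ P.T @ Yh
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(D); C[-1, -1] = np.linalg.det(U @ Vt)   # avoid reflections
        R = U @ C @ Vt
        t = mu_x - R @ mu_y
        TY = Y @ R.T + t
        sigma2 = (P * np.sum((X[None] - TY[:, None]) ** 2, axis=2)).sum() / (Np * D)
        sigma2 = max(sigma2, 1e-10)
    return R, t

# Toy usage: recover a known rotation and translation.
rng = np.random.default_rng(0)
Y = rng.normal(size=(60, 2))
ang = 0.4
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
X = Y @ R_true.T + np.array([1.0, -0.5]) + 0.01 * rng.normal(size=(60, 2))
R_est, t_est = cpd_rigid(X, Y)
```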

2,429 citations


Proceedings Article
21 Jun 2010
TL;DR: This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Abstract: Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
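
As an illustration of the selection rule, a minimal sketch of GP-UCB on a 1-D grid, assuming an RBF kernel and a simplified beta_t schedule (the paper derives the principled beta_t choices behind the regret bounds; the objective and constants here are invented for the demo):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-2):
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    mu = Ks.T @ np.linalg.solve(K, ytr)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)        # prior k(x, x) = 1
    return mu, np.maximum(var, 1e-12)

f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # unknown objective
grid = np.linspace(-1, 2, 300)
X, y = [0.5], [f(0.5) + 0.01 * np.random.randn()]
for t in range(1, 31):
    # Illustrative beta_t of the form 2 log(|D| t^2 pi^2 / (6 delta)).
    beta = 2.0 * np.log(len(grid) * t ** 2 * np.pi ** 2 / 0.6)
    mu, var = gp_posterior(np.array(X), np.array(y), grid)
    x_next = grid[np.argmax(mu + np.sqrt(beta * var))]   # UCB rule
    X.append(x_next); y.append(f(x_next) + 0.01 * np.random.randn())
mu, var = gp_posterior(np.array(X), np.array(y), grid)
print("arg max of posterior mean:", grid[np.argmax(mu)])
```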

1,876 citations


Journal ArticleDOI
TL;DR: The GPML toolbox provides a wide range of functionality for Gaussian process (GP) inference and prediction, including exact and variational inference, Expectation Propagation, and Laplace's method for dealing with non-Gaussian likelihoods, as well as FITC for large regression tasks.
Abstract: The GPML toolbox provides a wide range of functionality for Gaussian process (GP) inference and prediction. GPs are specified by mean and covariance functions; we offer a library of simple mean and covariance functions and mechanisms to compose more complex ones. Several likelihood functions are supported, including Gaussian and heavy-tailed for regression as well as others suitable for classification. Finally, a range of inference methods is provided, including exact and variational inference, Expectation Propagation, and Laplace's method for dealing with non-Gaussian likelihoods, as well as FITC for dealing with large regression tasks.
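
The toolbox itself targets MATLAB/Octave, so its API is not reproduced here; instead, a NumPy sketch of the standard Cholesky-based computation that exact GP regression performs (predictive mean and variance plus the log marginal likelihood used for hyperparameter learning):

```python
import numpy as np

def gp_exact(X, y, Xs, k, sn2):
    """Exact GP regression with covariance function k and noise variance sn2."""
    K = k(X, X) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = k(X, Xs)
    mu = Ks.T @ alpha                                    # predictive mean
    v = np.linalg.solve(L, Ks)
    var = np.diag(k(Xs, Xs)) - np.sum(v * v, axis=0)     # predictive variance
    lml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
           - 0.5 * len(X) * np.log(2 * np.pi))           # log marginal likelihood
    return mu, var, lml

# Usage with a squared-exponential covariance (illustrative data):
se = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
X = np.linspace(0, 5, 20); y = np.sin(X) + 0.1 * np.random.randn(20)
mu, var, lml = gp_exact(X, y, np.linspace(0, 5, 100), se, 0.01)
```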

924 citations


Journal ArticleDOI
TL;DR: A new kernel-based approach for linear system identification of stable systems that models the impulse response as the realization of a Gaussian process whose statistics include information not only on smoothness but also on BIBO stability.

469 citations


Journal ArticleDOI
TL;DR: In this paper, a two-state mixture Gaussian model is used to perform asymptotically optimal Bayesian inference using belief propagation decoding, which represents the CS encoding matrix as a graphical model.
Abstract: Compressive sensing (CS) is an emerging field based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable, sub-Nyquist signal acquisition. When a statistical characterization of the signal is available, Bayesian inference can complement conventional CS methods based on linear programming or greedy algorithms. We perform asymptotically optimal Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model. Fast computation is obtained by reducing the size of the graphical model with sparse encoding matrices. To decode a length-N signal containing K large coefficients, our CS-BP decoding algorithm uses O(K log(N)) measurements and O(N log^2(N)) computation. Finally, although we focus on a two-state mixture Gaussian model, CS-BP is easily adapted to other signal models.

468 citations


Journal ArticleDOI
TL;DR: The achievable trade-offs between predictive accuracy and computational requirements are compared, and it is shown that these are typically superior to existing state-of-the-art sparse approximations.
Abstract: We present a new sparse Gaussian Process (GP) model for regression. The key novel idea is to sparsify the spectral representation of the GP. This leads to a simple, practical algorithm for regression tasks. We compare the achievable trade-offs between predictive accuracy and computational requirements, and show that these are typically superior to existing state-of-the-art sparse approximations. We discuss both the weight space and function space representations, and note that the new construction implies priors over functions which are always stationary, and can approximate any covariance function in this class.
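
A minimal sketch of the idea for an RBF-type kernel, assuming fixed rather than optimized spectral points (the paper also learns them as hyperparameters): sample frequencies from the kernel's spectral density, build trigonometric features, and do Bayesian linear regression in that feature space at O(n m^2) cost for m spectral points.

```python
import numpy as np

def ssgp_fit(X, y, m=50, ls=1.0, sf2=1.0, sn2=0.01, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 1.0 / ls, size=m)            # spectral frequencies (RBF density)
    def phi(x):                                    # (len(x), 2m) trigonometric features
        a = np.outer(x, W)
        return np.hstack([np.cos(a), np.sin(a)]) * np.sqrt(sf2 / m)
    P = phi(X)
    A = P.T @ P + sn2 * np.eye(2 * m)              # posterior precision (up to 1/sn2)
    w_mean = np.linalg.solve(A, P.T @ y)
    def predict(xs):
        Ps = phi(xs)
        mu = Ps @ w_mean
        # Predictive variance, including observation noise.
        var = sn2 * np.sum(Ps * np.linalg.solve(A, Ps.T).T, axis=1) + sn2
        return mu, var
    return predict

# Illustrative usage on synthetic data:
X = np.linspace(0, 10, 200); y = np.sin(X) + 0.1 * np.random.randn(200)
predict = ssgp_fit(X, y)
mu, var = predict(np.linspace(0, 10, 50))
```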

463 citations


Journal ArticleDOI
TL;DR: NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal and can be implemented in a computationally efficient manner in the time-frequency domain.
Abstract: This paper proposes a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal captured by distant microphones without prior knowledge of the room impulse responses. With this approach, the generative model of the captured signal is composed of a source process, which is assumed to be a Gaussian process with a time-varying variance, and an observation process modeled by a delayed linear prediction (DLP). The optimization objective for the dereverberation problem is derived to be the sum of the squared prediction errors normalized by the source variances; hence, this approach is referred to as variance-normalized delayed linear prediction (NDLP). Inheriting the characteristic of DLP, NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal. In addition, owing to the use of variance normalization, NDLP allows us to improve the dereverberation result especially with relatively short (of the order of a few seconds) observations. Furthermore, NDLP can be implemented in a computationally efficient manner in the time-frequency domain. Experimental results demonstrate the effectiveness and efficiency of the proposed approach in comparison with two existing approaches.
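
To make the iteration concrete, a minimal sketch for one frequency bin and one microphone: alternate between estimating the time-varying source variances from the current dereverberated signal and solving the variance-normalized normal equations for the delayed prediction filter. The delay D, filter order K, and iteration count are illustrative assumptions, and the diagonal loading is only a numerical safeguard.

```python
import numpy as np

def ndlp_bin(x, D=3, K=10, n_iter=5, eps=1e-8):
    """x: complex STFT sequence of one frequency bin. Returns dereverberated d."""
    T = len(x)
    # Delayed observation matrix: row t holds x[t-D], ..., x[t-D-K+1].
    Xbar = np.zeros((T, K), dtype=complex)
    for k in range(K):
        Xbar[D + k:, k] = x[:T - D - k]
    d = x.copy()                                  # current dereverberated estimate
    for _ in range(n_iter):
        lam = np.maximum(np.abs(d) ** 2, eps)     # source variance estimates
        R = (Xbar.conj().T / lam) @ Xbar          # variance-normalized correlations
        r = (Xbar.conj().T / lam) @ x
        g = np.linalg.solve(R + eps * np.eye(K), r)   # prediction coefficients
        d = x - Xbar @ g                          # subtract predicted late reverberation
    return d
```

In practice the same update would be run independently in every frequency bin of the STFT, which is what makes the time-frequency implementation efficient.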

371 citations


Book ChapterDOI
01 Jan 2010
TL;DR: This work investigates a multi-points optimization criterion, the multi-points expected improvement (q-EI), aimed at choosing several points at the same time, proposes two classes of heuristic strategies meant to approximately optimize the q-EI, and applies them to the classical Branin-Hoo test-case function.
Abstract: The optimization of expensive-to-evaluate functions generally relies on metamodel-based exploration strategies. Many deterministic global optimization algorithms used in the field of computer experiments are based on Kriging (Gaussian process regression). Starting with a spatial predictor including a measure of uncertainty, they proceed by iteratively choosing the point maximizing a criterion which is a compromise between predicted performance and uncertainty. Distributing the evaluation of such numerically expensive objective functions on many processors is an appealing idea. Here we investigate a multi-points optimization criterion, the multi-points expected improvement (q-EI), aimed at choosing several points at the same time. An analytical expression of the q-EI is given when q = 2, and a consistent statistical estimate is given for the general case. We then propose two classes of heuristic strategies meant to approximately optimize the q-EI, and apply them to the classical Branin-Hoo test-case function. Finally, we demonstrate on this example that the latter strategies perform as well as the best Latin Hypercube and uniform designs ever found by simulation (2000 designs drawn at random for every q ∈ [1,10]).
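
Beyond the analytical q = 2 case, a consistent estimate of the criterion is easy to sketch by Monte Carlo: draw joint posterior samples of the GP at the q candidate points and average the improvement over the current best value (minimization convention; the candidate posterior below is invented for the demo):

```python
import numpy as np

def q_ei(mu, Sigma, f_min, n_samples=10000, seed=0):
    """mu, Sigma: GP posterior mean/covariance at the q candidate points."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mu, Sigma, size=n_samples)   # (n, q)
    improvement = np.maximum(f_min - samples.min(axis=1), 0.0)
    return improvement.mean()

# Two candidate points with correlated predictions (illustrative values):
mu = np.array([0.8, 1.0])
Sigma = np.array([[0.25, 0.15], [0.15, 0.25]])
print(q_ei(mu, Sigma, f_min=1.2))
```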

364 citations


Proceedings Article
31 Mar 2010
TL;DR: In this article, a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction is introduced, which can automatically select the dimensionality of the nonlinear latent space.
Abstract: We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.

338 citations


Journal ArticleDOI
TL;DR: Twin Gaussian processes (TGP), a generic structured prediction method that uses Gaussian process priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GPs modeled as normal distributions over finite index sets of training and testing examples, is described.
Abstract: We describe twin Gaussian processes (TGP), a generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GPs modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and that this should hold, on average, between their marginal distributions. TGP captures not only the interdependencies between covariates, as in a typical GP, but also those between responses, so correlations among both inputs and outputs are accounted for. TGP is exemplified, with promising results, for the reconstruction of 3d human poses from monocular and multicamera video sequences in the recently introduced HumanEva benchmark, where we achieve 5 cm error on average per 3d marker for models trained jointly, using data from multiple people and multiple activities. The method is fast and automatic: it requires no hand-crafting of the initial pose, camera calibration parameters, or the availability of a 3d body model associated with human subjects used for training or testing.

303 citations


Journal ArticleDOI
TL;DR: In this article, a probabilistic approach for statistical modeling of the loads in distribution networks is presented, where the probability density functions (pdfs) of loads at different buses show a number of variations and cannot be represented by any specific distribution.
Abstract: This paper presents a probabilistic approach for statistical modeling of the loads in distribution networks. In a distribution network, the probability density functions (pdfs) of loads at different buses show a number of variations and cannot be represented by any specific distribution. The approach presented in this paper represents all the load pdfs through Gaussian mixture model (GMM). The expectation maximization (EM) algorithm is used to obtain the parameters of the mixture components. The performance of the method is demonstrated on a 95-bus generic distribution network model.
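
A minimal sketch of the modeling step using scikit-learn's EM-based GaussianMixture; the synthetic bimodal load history and the component count stand in for real bus data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for one bus's load history: a bimodal (day/night) pattern.
load = np.concatenate([rng.normal(40, 5, 600), rng.normal(90, 12, 400)])

# EM fits the mixture weights, means, and variances of the load pdf.
gmm = GaussianMixture(n_components=3, random_state=0).fit(load.reshape(-1, 1))
print("weights:", gmm.weights_)
print("means:", gmm.means_.ravel())
print("std devs:", np.sqrt(gmm.covariances_).ravel())
```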

Journal ArticleDOI
TL;DR: Both the subject-to-subject heterogeneity and covariate information can be incorporated into the model in a natural way and the bootstrap is used to assess the variability of the maximum likelihood estimators.
Abstract: This paper studies the maximum likelihood estimation of a class of inverse Gaussian process models for degradation data. Both the subject-to-subject heterogeneity and covariate information can be incorporated into the model in a natural way. The EM algorithm is used to obtain the maximum likelihood estimators of the unknown parameters, and the bootstrap is used to assess the variability of the maximum likelihood estimators. Simulations are used to validate the method. The model is fitted to laser data and corresponding goodness-of-fit tests are carried out. Failure time distributions in terms of degradation level passages are calculated and illustrated. The supplemental materials for this article are available online.

Journal ArticleDOI
TL;DR: This work describes an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models, and illustrates the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.
Abstract: Continuous quantities are ubiquitous in models of real-world phenomena, but are surprisingly difficult to reason about automatically. Probabilistic graphical models such as Bayesian networks and Markov random fields, and algorithms for approximate inference such as belief propagation (BP), have proven to be powerful tools in a wide range of applications in statistics and artificial intelligence. However, applying these methods to models with continuous variables remains a challenging task. In this work we describe an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models. We illustrate the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.

Proceedings ArticleDOI
07 Oct 2010
TL;DR: This paper shows how temporal Gaussian process regression models in machine learning can be reformulated as linear-Gaussian state space models, which can be solved exactly with classical Kalman filtering theory, and produces an efficient non-parametric learning algorithm.
Abstract: In this paper, we show how temporal (i.e., time-series) Gaussian process regression models in machine learning can be reformulated as linear-Gaussian state space models, which can be solved exactly with classical Kalman filtering theory. The result is an efficient non-parametric learning algorithm, whose computational complexity grows linearly with respect to the number of observations. We show how the reformulation can be done analytically for the Matérn family of covariance functions, and for the squared exponential covariance function by applying a spectral Taylor series approximation. Advantages of the proposed approach are illustrated with two numerical experiments.
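
For the Matérn-3/2 member of that family, the reformulation can be written down directly: the GP with k(t,t') = s2 (1 + lam|t-t'|) exp(-lam|t-t'|), lam = sqrt(3)/lengthscale, is the first component of a two-dimensional linear-Gaussian state space model, so regression reduces to an O(T) Kalman filter over the sorted inputs. A sketch with illustrative hyperparameters:

```python
import numpy as np
from scipy.linalg import expm

def matern32_kf(t, y, lengthscale=1.0, s2=1.0, noise=0.1):
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam ** 2, -2 * lam]])   # SDE drift matrix
    Pinf = np.diag([s2, s2 * lam ** 2])                 # stationary state covariance
    H = np.array([[1.0, 0.0]])                          # observe the first component
    m, P = np.zeros(2), Pinf.copy()
    ms = []
    for k in range(len(t)):
        if k > 0:                                       # predict across the time gap
            A = expm(F * (t[k] - t[k - 1]))
            m = A @ m
            P = A @ P @ A.T + Pinf - A @ Pinf @ A.T     # exact discrete process noise
        S = H @ P @ H.T + noise                         # update with observation y[k]
        K = P @ H.T / S
        m = m + (K * (y[k] - H @ m)).ravel()
        P = P - K @ H @ P
        ms.append(m[0])
    return np.array(ms)                                 # filtered mean of f at t

# Illustrative usage:
t = np.sort(np.random.rand(200) * 10)
y = np.sin(t) + 0.3 * np.random.randn(200)
fhat = matern32_kf(t, y)
```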

Journal ArticleDOI
TL;DR: A generalization of the mercury/waterfilling algorithm, previously proposed for parallel noninterfering channels, is put forth, in which the mercury level accounts not only for the non-Gaussian input distributions, but also for the interference among inputs.
Abstract: In this paper, we investigate the linear precoding and power allocation policies that maximize the mutual information for general multiple-input-multiple-output (MIMO) Gaussian channels with arbitrary input distributions, by capitalizing on the relationship between mutual information and minimum mean-square error (MMSE). The optimal linear precoder satisfies a fixed-point equation as a function of the channel and the input constellation. For non-Gaussian inputs, a nondiagonal precoding matrix in general increases the information transmission rate, even for parallel noninteracting channels. Whenever precoding is precluded, the optimal power allocation policy also satisfies a fixed-point equation; we put forth a generalization of the mercury/waterfilling algorithm, previously proposed for parallel noninterfering channels, in which the mercury level accounts not only for the non-Gaussian input distributions, but also for the interference among inputs.
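
For orientation, a sketch of classical waterfilling, the Gaussian-input special case that mercury/waterfilling generalizes; the constellation-dependent "mercury" levels are not implemented here:

```python
import numpy as np

def waterfill(noise, P, tol=1e-10):
    """Allocate total power P over parallel channels with noise levels `noise`."""
    lo, hi = 0.0, max(noise) + P
    while hi - lo > tol:                  # bisect on the common water level
        level = 0.5 * (lo + hi)
        used = np.sum(np.maximum(level - noise, 0.0))
        lo, hi = (level, hi) if used < P else (lo, level)
    return np.maximum(level - noise, 0.0)

noise = np.array([0.5, 1.0, 2.0, 4.0])
p = waterfill(noise, P=3.0)
print(p, p.sum())    # per-channel power allocation, sums to ~3.0
```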

Journal ArticleDOI
TL;DR: In this article, the authors studied the number of measurements required to recover a sparse signal in C^M with L nonzero coefficients from compressed samples in the presence of noise, and proved that O(L) measurements are necessary and sufficient for signal recovery whenever L grows linearly as a function of M. In contrast, the implementation of their proof method would have a higher complexity.
Abstract: In this paper, we study the number of measurements required to recover a sparse signal in C^M with L nonzero coefficients from compressed samples in the presence of noise. We consider a number of different recovery criteria, including the exact recovery of the support of the signal, which was previously considered in the literature, as well as new criteria for the recovery of a large fraction of the support of the signal, and the recovery of a large fraction of the energy of the signal. For these recovery criteria, we prove that O(L) (an asymptotically linear multiple of L) measurements are necessary and sufficient for signal recovery, whenever L grows linearly as a function of M. This improves on the existing literature that is mostly focused on variants of a specific recovery algorithm based on convex programming, for which O(L log(M - L)) measurements are required. In contrast, the implementation of our proof method would have a higher complexity. We also show that O(L log(M - L)) measurements are required in the sublinear regime (L = o(M)). For our sufficiency proofs, we introduce a Shannon-theoretic decoder based on joint typicality, which allows error events to be defined in terms of a single random variable, in contrast to previous information-theoretic work, where comparisons of random variables are required. We also prove concentration results for our error bounds, implying that a randomly selected Gaussian matrix will suffice with high probability. For our necessity proofs, we rely on results from channel coding and rate-distortion theory.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes a novel approach to age estimation by formulating the problem as multi-task learning, using a multi-task warped Gaussian process (MTWGP), and shows that MTWGP compares favorably with state-of-the-art age estimation methods.
Abstract: Automatic age estimation from facial images has attracted growing research interest in recent years due to its promising potential for computer vision applications. Among the methods proposed to date, personalized age estimation methods generally outperform global age estimation methods by learning a separate age estimator for each person in the training data set. However, since typical age databases only contain very limited training data for each person, training a separate age estimator using only training data for that person runs a high risk of overfitting the data, and hence the prediction performance is limited. In this paper, we propose a novel approach to age estimation by formulating the problem as a multi-task learning problem. Based on a variant of the Gaussian process (GP) called warped Gaussian process (WGP), we propose a multi-task extension called multi-task warped Gaussian process (MTWGP). Age estimation is formulated as a multi-task regression problem in which each learning task refers to estimation of the age function for each person. While MTWGP models common features shared by different tasks (persons), it also allows task-specific (person-specific) features to be learned automatically. Moreover, unlike previous age estimation methods which need to specify the form of the regression functions or determine many parameters in the functions using inefficient methods such as cross validation, the form of the regression functions in MTWGP is implicitly defined by the kernel function and all its model parameters can be learned from data automatically. We have conducted experiments on two publicly available age databases, FG-NET and MORPH. The experimental results are very promising in showing that MTWGP compares favorably with state-of-the-art age estimation methods.

01 Oct 2010
TL;DR: In this article, the authors consider the problem of fitting a parametric model to time-series data that are afflicted by correlated noise, represented by a sum of two stationary Gaussian processes: one that is uncorrelated in time and another that has a power spectral density varying as 1/f^γ.
Abstract: We consider the problem of fitting a parametric model to time-series data that are afflicted by correlated noise. The noise is represented by a sum of two stationary Gaussian processes: one that is uncorrelated in time, and another that has a power spectral density varying as 1/f^γ. We present an accurate and fast [O(N)] algorithm for parameter estimation based on computing the likelihood in a wavelet basis. The method is illustrated and tested using simulated time-series photometry of exoplanetary transits, with particular attention to estimating the mid-transit time. We compare our method to two other methods that have been used in the literature, the time-averaging method and the residual-permutation method. For noise processes that obey our assumptions, the algorithm presented here gives more accurate results for mid-transit times and truer estimates of their uncertainties.

Journal ArticleDOI
Emmanuel Vazquez, Julien Bect
TL;DR: The first result is that under some mild hypotheses on the covariance function k of the Gaussian process, the expected improvement algorithm produces a dense sequence of evaluation points in the search domain, when the function to be optimized is in the reproducing kernel Hilbert space generated by k.

Proceedings Article
06 Dec 2010
TL;DR: A slice sampling approach is presented for Gaussian process models whose covariance structure is specified using unknown hyperparameters; it requires little tuning while mixing well in both strong- and weak-data regimes.
Abstract: The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.
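
For reference, a minimal sketch of univariate slice sampling with stepping-out (Neal, 2003), the kind of move applied to a GP hyperparameter such as a log-lengthscale; the paper's contribution is a specific variant that stays efficient when hyperparameters and latent function values are strongly coupled. The toy target below is a standard normal.

```python
import numpy as np

def slice_sample(log_post, x0, n=1000, w=1.0, seed=0):
    rng = np.random.default_rng(seed)
    xs, x = [], x0
    for _ in range(n):
        logy = log_post(x) + np.log(rng.uniform())   # slice height under the curve
        lo = x - w * rng.uniform()                   # randomly positioned window
        hi = lo + w
        while log_post(lo) > logy: lo -= w           # step out left
        while log_post(hi) > logy: hi += w           # step out right
        while True:                                  # shrink until a point is accepted
            xp = rng.uniform(lo, hi)
            if log_post(xp) > logy:
                x = xp; break
            lo, hi = (xp, hi) if xp < x else (lo, xp)
        xs.append(x)
    return np.array(xs)

samples = slice_sample(lambda v: -0.5 * v ** 2, x0=0.0)   # toy: standard normal
print(samples.mean(), samples.std())
```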

Journal ArticleDOI
TL;DR: This work shows that with an appropriate combination of kernels a significant boost in classification performance is possible, and indicates the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.

Book
22 Nov 2010
TL;DR: First, PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available, is introduced; second, principled algorithms for robust filtering and smoothing in GP dynamic systems are proposed.
Abstract: This book examines Gaussian processes in both model-based reinforcement learning (RL) and inference in nonlinear dynamic systems. First, we introduce PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available. PILCO takes model uncertainties consistently into account during long-term planning to reduce model bias. Second, we propose principled algorithms for robust filtering and smoothing in GP dynamic systems.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: An acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure while the means and mixture weights vary in a subspace of the total parameter space; this style of acoustic model allows for a much more compact representation.
Abstract: We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work reports experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages.
Abstract: Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Journal ArticleDOI
29 Apr 2010
TL;DR: In this article, the authors compare and contrast from a geometric perspective a number of low-dimensional signal models that support stable information-preserving dimensionality reduction, including sparse and compressible signal models for deterministic and random signals.
Abstract: We compare and contrast from a geometric perspective a number of low-dimensional signal models that support stable information-preserving dimensionality reduction. We consider sparse and compressible signal models for deterministic and random signals, structured sparse and compressible signal models, point clouds, and manifold signal models. Each model has a particular geometrical structure that enables signal information to be stably preserved via a simple linear and nonadaptive projection to a much lower dimensional space; in each case the projection dimension is independent of the signal's ambient dimension at best or grows logarithmically with it at worst. As a bonus, we point out a common misconception related to probabilistic compressible signal models, namely, by showing that the oft-used generalized Gaussian and Laplacian models do not support stable linear dimensionality reduction.
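
The stable-embedding property is easy to probe numerically: a random Gaussian projection to m << N dimensions approximately preserves distances between K-sparse signals. A small demo with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, m = 1000, 10, 120                       # ambient dim, sparsity, projections
Phi = rng.normal(0, 1 / np.sqrt(m), (m, N))   # random Gaussian projection matrix

def sparse_signal():
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
    return x

ratios = []
for _ in range(500):
    d = sparse_signal() - sparse_signal()     # difference of two sparse signals
    ratios.append(np.linalg.norm(Phi @ d) / np.linalg.norm(d))
print(min(ratios), max(ratios))               # ratios concentrate near 1
```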

Proceedings ArticleDOI
26 Jul 2010
TL;DR: The main contribution of this paper is the implementation of a Probability Hypothesis Density filter for tracking of multiple extended targets, together with a method to easily partition the measurements into a number of subsets, each of which contains measurements that all stem from the same source.
Abstract: In extended target tracking, targets potentially produce more than one measurement per time step. Multiple extended targets are therefore usually hard to track, due to the resulting complex data association. The main contribution of this paper is the implementation of a Probability Hypothesis Density (PHD) filter for tracking of multiple extended targets. A general modification of the PHD filter to handle extended targets has been presented recently by Mahler, and the novelty in this work lies in the realisation of a Gaussian mixture PHD filter for extended targets. Furthermore, we propose a method to easily partition the measurements into a number of subsets, each of which is supposed to contain measurements that all stem from the same source. The method is illustrated in simulation examples, and the advantage of the implemented extended target PHD filter is shown in a comparison with a standard PHD filter.
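
As an illustrative reading of the partitioning step, assuming a distance-based criterion: group the measurements into connected components of the graph that links any two measurements closer than a threshold, and sweep the threshold to generate the alternative partitions the filter weighs. The function below and its threshold are a sketch, not necessarily the paper's exact procedure.

```python
import numpy as np

def distance_partition(Z, d_max):
    """Z: (n, dim) measurements. Returns a list of index arrays (cells)."""
    n = len(Z)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], current
        while stack:                      # flood-fill one connected component
            j = stack.pop()
            d = np.linalg.norm(Z - Z[j], axis=1)
            for k in np.where((d < d_max) & (labels < 0))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return [np.where(labels == c)[0] for c in range(current)]

Z = np.array([[0, 0], [0.4, 0.1], [5, 5], [5.3, 4.8], [9, 0]])
print(distance_partition(Z, d_max=1.0))   # three cells: {0,1}, {2,3}, {4}
```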

Journal ArticleDOI
TL;DR: The load probability density functions (pdfs) in a distribution network show a number of variations at different nodes and cannot be represented by any specific distribution, so this article models them with a Gaussian mixture and utilises them as pseudo-measurements for distribution system state estimation (DSSE).
Abstract: This study presents an approach to utilise the loads as pseudo-measurements for the purpose of distribution system state estimation (DSSE). The load probability density function (pdf) in the distribution network shows a number of variations at different nodes and cannot be represented by any specific distribution. The approach presented in this study represents all the load pdfs through the Gaussian mixture model (GMM). The expectation maximisation (EM) algorithm is used to obtain the parameters of the mixture components. The standard weighted least squares (WLS) algorithm utilises these load models as pseudo-measurements. The effectiveness of WLS is assessed through some statistical measures such as bias, consistency and quality of the estimates in a 95-bus generic distribution network model.

Journal ArticleDOI
TL;DR: This paper considers recursive tracking of one mobile emitter using a sequence of time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurement pairs obtained by one pair of sensors, which results in a better approximation of the track state probability density function by a Gaussian mixture, and tracking results near the Cramer-Rao lower bound.
Abstract: This paper considers recursive tracking of one mobile emitter using a sequence of time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurement pairs obtained by one pair of sensors. We consider only a single emitter without data association issues (no missed detections or false measurements). Each TDOA measurement defines a region of possible emitter locations around a unique hyperbola. This likelihood function is approximated by a Gaussian mixture, which leads to a dynamic bank of Kalman filters tracking algorithm. The FDOA measurements update the relative probabilities and estimates of the individual Kalman filters. This approach results in a better approximation of the track state probability density function by a Gaussian mixture, and in tracking results near the Cramer-Rao lower bound. The proposed algorithm is also applicable in other cases of nonlinear information fusion. The performance of the proposed Gaussian mixture approach is evaluated in a simulation study and compared with a bank of extended Kalman filters (EKFs) and the Cramer-Rao lower bound.

Journal ArticleDOI
TL;DR: This result shows that the celebrated Schalkwijk-Kailath coding achieves the feedback capacity for the first-order autoregressive moving-average Gaussian channel, positively answering a long-standing open problem studied by Butman, Tiernan-Schalkwijk, Wolfowitz, Ozarow, Ordentlich, Yang-Kavčić-Tatikonda, and others.
Abstract: The feedback capacity of additive stationary Gaussian noise channels is characterized as the solution to a variational problem in the noise power spectral density. When specialized to the first-order autoregressive moving-average noise spectrum, this variational characterization yields a closed-form expression for the feedback capacity. In particular, this result shows that the celebrated Schalkwijk-Kailath coding achieves the feedback capacity for the first-order autoregressive moving-average Gaussian channel, positively answering a long-standing open problem studied by Butman, Tiernan-Schalkwijk, Wolfowitz, Ozarow, Ordentlich, Yang-Kavčić-Tatikonda, and others. More generally, it is shown that a k-dimensional generalization of the Schalkwijk-Kailath coding achieves the feedback capacity for any autoregressive moving-average noise spectrum of order k. Simply put, the optimal transmitter iteratively refines the receiver's knowledge of the intended message. This development reveals intriguing connections between estimation, control, and feedback communication.

Proceedings ArticleDOI
12 Apr 2010
TL;DR: A natural metric is introduced between sets of sensors that can be used to construct covariance functions over sets, and thereby perform Gaussian process inference over a function whose domain is a power set.
Abstract: We consider the problem of selecting an optimal set of sensors, as determined, for example, by the predictive accuracy of the resulting sensor network. Given an underlying metric between pairs of set elements, we introduce a natural metric between sets of sensors for this task. Using this metric, we can construct covariance functions over sets, and thereby perform Gaussian process inference over a function whose domain is a power set. If the function has additional inputs, our covariances can be readily extended to incorporate them---allowing us to consider, for example, functions over both sets and time. These functions can then be optimized using Gaussian process global optimization (GPGO). We use the root mean squared error (RMSE) of the predictions made using a set of sensors at a particular time as an example of such a function to be optimized; the optimal point specifies the best choice of sensor locations. We demonstrate the resulting method by dynamically selecting the best subset of a given set of weather sensors for the prediction of the air temperature across the United Kingdom.