scispace - formally typeset
Author

Pierre Alquier

Bio: Pierre Alquier is an academic researcher from Université Paris-Saclay. The author has contributed to research topics including Estimator & Bayesian probability. The author has an h-index of 23 and has co-authored 97 publications receiving 1,597 citations. Previous affiliations of Pierre Alquier include ENSAE ParisTech & University College Dublin.


Papers
Journal ArticleDOI
TL;DR: In this article, the authors explore a variety of situations where it is possible to quantify how close the chain generated by an approximate transition kernel is to the chain generated by the exact kernel.
Abstract: Monte Carlo algorithms often aim to draw from a distribution $\pi$ by simulating a Markov chain with transition kernel $P$ such that $\pi$ is invariant under $P$. However, there are many situations for which it is impractical or impossible to draw from the transition kernel $P$. For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace $P$ by an approximation $\hat{P}$. Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how `close' the chain given by the transition kernel $\hat{P}$ is to the chain given by $P$. We apply these results to several examples from spatial statistics and network analysis.
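The idea of replacing an intractable kernel $P$ by an approximation $\hat{P}$ can be sketched as follows. This is a minimal illustration, not the paper's construction: a hypothetical Gaussian location model where the log-likelihood over a large dataset is estimated from a random subsample, so each Metropolis-Hastings transition is a draw from a perturbed kernel $\hat{P}$ rather than $P$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: posterior of a Gaussian location parameter given a
# large dataset, with a flat prior. The exact kernel P would use the full
# log-likelihood; the approximate kernel P-hat estimates it on a subsample.
data = rng.normal(loc=2.0, scale=1.0, size=10_000)

def log_lik_estimate(theta, batch_size=1_000):
    """Unbiased subsample estimate of the full log-likelihood."""
    batch = rng.choice(data, size=batch_size, replace=False)
    return (len(data) / batch_size) * np.sum(-0.5 * (batch - theta) ** 2)

def approx_mh(n_iters=2_000, step=0.05):
    """Metropolis-Hastings driven by the noisy log-likelihood: each
    transition is a draw from P-hat rather than P, so the chain only
    approximately targets the true posterior."""
    theta, chain = 0.0, []
    for _ in range(n_iters):
        prop = theta + step * rng.normal()
        if np.log(rng.uniform()) < log_lik_estimate(prop) - log_lik_estimate(theta):
            theta = prop
        chain.append(theta)
    return np.array(chain)

chain = approx_mh()
```

The gap between the long-run behaviour of this perturbed chain and the exact one is precisely the kind of discrepancy that perturbation bounds for Markov chains are designed to control.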

155 citations

Journal Article
TL;DR: In this paper, the authors consider variational approximations of the Gibbs posterior, which are fast to compute and have the same rate of convergence as the original PAC-Bayesian procedure.
Abstract: The PAC-Bayesian approach is a powerful set of techniques to derive nonasymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Monte Carlo, but this is usually too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. In addition, we show that, when the risk function is convex, a variational approximation can be obtained in polynomial time using a convex solver. We give finite-sample oracle inequalities for the corresponding estimator. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.
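As a rough sketch of the variational idea (not the paper's exact algorithm), one can minimise a PAC-Bayes-type objective of the form $\lambda\,\mathbb{E}_{\theta\sim q}[r(\theta)] + \mathrm{KL}(q\|\pi)$ over a Gaussian family $q$. The example below uses a convex hinge risk for classification, an isotropic Gaussian prior, and reparameterised stochastic subgradient steps; the data, temperature $\lambda$, and all tuning choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy classification data (hypothetical), labels in {-1, +1}.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

lam = 10.0          # inverse temperature of the Gibbs posterior
mu = np.zeros(d)    # variational mean of q = N(mu, sigma^2 I)
sigma = 0.1         # variational scale, kept fixed for simplicity
K, lr = 32, 0.01    # Monte Carlo draws per step, and step size

for _ in range(500):
    eps = rng.normal(size=(K, d))
    thetas = mu + sigma * eps                  # reparameterised draws from q
    margins = y[None, :] * (thetas @ X.T)      # shape (K, n)
    active = (margins < 1.0).astype(float)     # where the hinge is active
    # Stochastic subgradient of lam * E_q[hinge risk] + 0.5 * |mu|^2,
    # the mu-dependent part of KL(q || N(0, I)).
    grad_risk = -np.einsum("kn,nd->d", active * y[None, :], X) / (K * n)
    mu -= lr * (lam * grad_risk + mu)

accuracy = np.mean(np.sign(X @ mu) == y)
```

Because the hinge risk is convex in $\theta$, the objective is convex in the variational mean, which is what makes the polynomial-time guarantee via convex solvers plausible in this setting.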

97 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose a two-step procedure for predicting the next value of a stationary time series, where the first step follows the machine learning theory paradigm and consists in determining a set of possible predictors as randomized estimators in (possibly numerous) different predictive models.
Abstract: Observing a stationary time series, we propose a two-step procedure for the prediction of its next value. The first step follows the machine learning theory paradigm and consists in determining a set of possible predictors as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm and consists in choosing one predictor with good properties among all the predictors of the first step. We study our procedure for two different types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases, we give oracle inequalities: the risk of the chosen predictor is close to the best prediction risk over all predictive models that we consider. We apply our procedure to predictive models such as linear predictors, neural network predictors and nonparametric autoregressive predictors.
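The two steps can be illustrated on a toy autoregressive example (a hypothetical stand-in for the paper's randomized estimators): step one fits several candidate predictors, step two selects the one with the smallest empirical one-step-ahead risk on held-out observations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stationary series: an AR(2) process.
T = 600
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.5)

def fit_ar(series, p):
    """Step 1 (one candidate per order p): least-squares AR(p) predictor."""
    Z = np.column_stack([series[p - k - 1 : len(series) - k - 1]
                         for k in range(p)])
    coef, *_ = np.linalg.lstsq(Z, series[p:], rcond=None)
    return coef

train_end, orders = 400, [1, 2, 3, 5, 8]
models = {p: fit_ar(x[:train_end], p) for p in orders}

def empirical_risk(p, coef):
    """Step 2 criterion: mean squared one-step-ahead error on held-out data."""
    preds = [coef @ x[t - p : t][::-1] for t in range(train_end, T)]
    return np.mean((x[train_end:] - np.array(preds)) ** 2)

risks = {p: empirical_risk(p, c) for p, c in models.items()}
best_order = min(risks, key=risks.get)
```

An oracle inequality in this spirit would guarantee that the selected predictor's risk is close to the best risk among all candidates, without knowing in advance which model is correct.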

76 citations

Journal ArticleDOI
TL;DR: A general approach to prove the concentration of variational approximations of fractional posteriors is proposed, with applications including Gaussian VB and matrix completion.
Abstract: While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. The classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian methods aim at approximating the posterior by a distribution in a tractable family $\mathcal{F}$. Thus, MCMC is replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing, and NLP, to name a few. However, despite nice results in practice, the theoretical properties of these approximations are not known. We propose a general oracle inequality that relates the quality of the VB approximation to the prior $\pi$ and to the structure of $\mathcal{F}$. We provide a simple condition that allows one to derive rates of convergence from this oracle inequality. We apply our theory to various examples. First, we show that for parametric models with log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example (matrix completion) and a nonparametric example (density estimation).
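To see why VB replaces sampling with optimization, consider the textbook mean-field example of a Gaussian with unknown mean and precision (this is a standard coordinate-ascent illustration, not one of the paper's high-dimensional applications; the priors below are hypothetical choices). Each update is in closed form, so the whole "posterior approximation" is just a short fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=2.0, size=5_000)   # data: unknown mean, precision
n, xbar = len(x), x.mean()

# Hypothetical conjugate priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                                       # initialise E_q[tau]
for _ in range(50):
    # Coordinate ascent: update q(mu) = N(mu_n, 1/lam_n) given E_q[tau]
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # ...then update q(tau) = Gamma(a_n, b_n) given the current q(mu)
    a_n = a0 + 0.5 * (n + 1)
    e_sq = np.sum((x - mu_n) ** 2) + n / lam_n    # E_q[sum_i (x_i - mu)^2]
    b_n = b0 + 0.5 * (e_sq + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
    E_tau = a_n / b_n
```

The iteration converges almost immediately here; the theoretical question the abstract raises is whether the resulting $q$ concentrates around the truth at the right rate, which is what the oracle inequality addresses.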

66 citations

Posted Content
TL;DR: This work considers the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach and offers a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery.
Abstract: Let $(\mathbf{X}, Y)$ be a random pair taking values in $\mathbb{R}^p \times \mathbb{R}$. In the so-called single-index model, one has $Y=f^{\star}(\theta^{\star T}\mathbf{X})+W$, where $f^{\star}$ is an unknown univariate measurable function, $\theta^{\star}$ is an unknown vector in $\mathbb{R}^p$, and $W$ denotes a random noise satisfying $\mathbb{E}[W|\mathbf{X}]=0$. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes larger than the number of observations ("$p$ larger than $n$" paradigm). To circumvent this difficulty, we consider the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach. On the theoretical side, we offer a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery. The proposed method is implemented by means of the reversible jump Markov chain Monte Carlo technique and its performance is compared with that of standard procedures.
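A quick illustration of the model itself (not of the paper's PAC-Bayesian estimator): for a Gaussian design, Stein's lemma implies that the ordinary least-squares coefficient is proportional to $\theta^{\star}$, giving a crude baseline estimate of the index direction. The sparse index, link function, and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

p, n = 20, 2_000
theta_star = np.zeros(p)
theta_star[:3] = [1.0, -1.0, 0.5]        # sparse index vector (hypothetical)
theta_star /= np.linalg.norm(theta_star)

X = rng.normal(size=(n, p))              # Gaussian design
# Single-index data: Y = f*(theta*^T X) + W, with f* = tanh for illustration
y = np.tanh(X @ theta_star) + 0.1 * rng.normal(size=n)

# Stein's lemma for Gaussian X: E[Y X] = E[f*'(theta*^T X)] * theta*, so the
# OLS coefficient points (up to noise) along the index direction.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
cosine = abs(beta @ theta_star) / np.linalg.norm(beta)
```

This baseline degrades badly once $p$ exceeds $n$, which is exactly the regime where the sparsity-exploiting PAC-Bayesian procedure of the paper is intended to help.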

62 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book ChapterDOI
31 Oct 2006

1,424 citations