scispace - formally typeset
Author

Pierre Alquier

Bio: Pierre Alquier is an academic researcher from Université Paris-Saclay. The author has contributed to research topics including Estimator & Bayesian probability. The author has an h-index of 23 and has co-authored 97 publications receiving 1,597 citations. Previous affiliations of Pierre Alquier include ENSAE ParisTech & University College Dublin.


Papers
Journal ArticleDOI
TL;DR: In this article, the authors explore a variety of situations where it is possible to quantify how close the chain generated by an approximate transition kernel is to the chain generated by the exact kernel.
Abstract: Monte Carlo algorithms often aim to draw from a distribution $\pi$ by simulating a Markov chain with transition kernel $P$ such that $\pi$ is invariant under $P$. However, there are many situations for which it is impractical or impossible to draw from the transition kernel $P$. For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace $P$ by an approximation $\hat{P}$. Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how `close' the chain given by the transition kernel $\hat{P}$ is to the chain given by $P$. We apply these results to several examples from spatial statistics and network analysis.
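The idea of replacing an intractable kernel $P$ by an approximation $\hat{P}$ can be sketched as follows. This is a minimal illustration, not the paper's construction: a hypothetical Gaussian location model where the log-likelihood over a large dataset is estimated from a random subsample, so each Metropolis-Hastings transition is a draw from a perturbed kernel $\hat{P}$ rather than $P$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: posterior of a Gaussian location parameter given a
# large dataset, with a flat prior. The exact kernel P would use the full
# log-likelihood; the approximate kernel P-hat estimates it on a subsample.
data = rng.normal(loc=2.0, scale=1.0, size=10_000)

def log_lik_estimate(theta, batch_size=1_000):
    """Unbiased subsample estimate of the full log-likelihood."""
    batch = rng.choice(data, size=batch_size, replace=False)
    return (len(data) / batch_size) * np.sum(-0.5 * (batch - theta) ** 2)

def approx_mh(n_iters=2_000, step=0.05):
    """Metropolis-Hastings driven by the noisy log-likelihood: each
    transition is a draw from P-hat rather than P, so the chain only
    approximately targets the true posterior."""
    theta, chain = 0.0, []
    for _ in range(n_iters):
        prop = theta + step * rng.normal()
        if np.log(rng.uniform()) < log_lik_estimate(prop) - log_lik_estimate(theta):
            theta = prop
        chain.append(theta)
    return np.array(chain)

chain = approx_mh()
```

The gap between the long-run behaviour of this perturbed chain and the exact one is precisely the kind of discrepancy that perturbation bounds for Markov chains are designed to control.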

155 citations

Journal Article
TL;DR: In this paper, the authors consider variational approximations of the Gibbs posterior, which are fast to compute and have the same rate of convergence as the original PAC-Bayesian procedure.
Abstract: The PAC-Bayesian approach is a powerful set of techniques to derive nonasymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Monte Carlo, but this is usually too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. In addition, we show that, when the risk function is convex, a variational approximation can be obtained in polynomial time using a convex solver. We give finite-sample oracle inequalities for the corresponding estimator. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.
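As a rough sketch of the variational idea (not the paper's exact algorithm), one can minimise a PAC-Bayes-type objective of the form $\lambda\,\mathbb{E}_{\theta\sim q}[r(\theta)] + \mathrm{KL}(q\|\pi)$ over a Gaussian family $q$. The example below uses a convex hinge risk for classification, an isotropic Gaussian prior, and reparameterised stochastic subgradient steps; the data, temperature $\lambda$, and all tuning choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy classification data (hypothetical), labels in {-1, +1}.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

lam = 10.0          # inverse temperature of the Gibbs posterior
mu = np.zeros(d)    # variational mean of q = N(mu, sigma^2 I)
sigma = 0.1         # variational scale, kept fixed for simplicity
K, lr = 32, 0.01    # Monte Carlo draws per step, and step size

for _ in range(500):
    eps = rng.normal(size=(K, d))
    thetas = mu + sigma * eps                  # reparameterised draws from q
    margins = y[None, :] * (thetas @ X.T)      # shape (K, n)
    active = (margins < 1.0).astype(float)     # where the hinge is active
    # Stochastic subgradient of lam * E_q[hinge risk] + 0.5 * |mu|^2,
    # the mu-dependent part of KL(q || N(0, I)).
    grad_risk = -np.einsum("kn,nd->d", active * y[None, :], X) / (K * n)
    mu -= lr * (lam * grad_risk + mu)

accuracy = np.mean(np.sign(X @ mu) == y)
```

Because the hinge risk is convex in $\theta$, the objective is convex in the variational mean, which is what makes the polynomial-time guarantee via convex solvers plausible in this setting.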

97 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose a two-step procedure for predicting the next value of a stationary time series, where the first step follows the machine learning theory paradigm and consists in determining a set of possible predictors as randomized estimators in (possibly numerous) different predictive models.
Abstract: Observing a stationary time series, we propose a two-step procedure for the prediction of its next value. The first step follows the machine learning theory paradigm and consists in determining a set of possible predictors as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm and consists in choosing one predictor with good properties among all the predictors of the first step. We study our procedure for two different types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases, we give oracle inequalities: the risk of the chosen predictor is close to the best prediction risk over all predictive models that we consider. We apply our procedure to predictive models such as linear predictors, neural network predictors and nonparametric autoregressive predictors.
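The two steps can be illustrated on a toy autoregressive example (a hypothetical stand-in for the paper's randomized estimators): step one fits several candidate predictors, step two selects the one with the smallest empirical one-step-ahead risk on held-out observations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stationary series: an AR(2) process.
T = 600
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.5)

def fit_ar(series, p):
    """Step 1 (one candidate per order p): least-squares AR(p) predictor."""
    Z = np.column_stack([series[p - k - 1 : len(series) - k - 1]
                         for k in range(p)])
    coef, *_ = np.linalg.lstsq(Z, series[p:], rcond=None)
    return coef

train_end, orders = 400, [1, 2, 3, 5, 8]
models = {p: fit_ar(x[:train_end], p) for p in orders}

def empirical_risk(p, coef):
    """Step 2 criterion: mean squared one-step-ahead error on held-out data."""
    preds = [coef @ x[t - p : t][::-1] for t in range(train_end, T)]
    return np.mean((x[train_end:] - np.array(preds)) ** 2)

risks = {p: empirical_risk(p, c) for p, c in models.items()}
best_order = min(risks, key=risks.get)
```

An oracle inequality in this spirit would guarantee that the selected predictor's risk is close to the best risk among all candidates, without knowing in advance which model is correct.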

76 citations

Journal ArticleDOI
TL;DR: A general approach to prove the concentration of variational approximations of fractional posteriors is proposed, with applications including Gaussian VB and matrix completion.
Abstract: While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. The classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian methods aim at approximating the posterior by a distribution in a tractable family $\mathcal{F}$. Thus, MCMC is replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing, and NLP, to name a few. However, despite nice results in practice, the theoretical properties of these approximations are not known. We propose a general oracle inequality that relates the quality of the VB approximation to the prior $\pi$ and to the structure of $\mathcal{F}$. We provide a simple condition that allows one to derive rates of convergence from this oracle inequality. We apply our theory to various examples. First, we show that for parametric models with log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example (matrix completion) and a nonparametric example (density estimation).
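To see why VB replaces sampling with optimization, consider the textbook mean-field example of a Gaussian with unknown mean and precision (this is a standard coordinate-ascent illustration, not one of the paper's high-dimensional applications; the priors below are hypothetical choices). Each update is in closed form, so the whole "posterior approximation" is just a short fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=2.0, size=5_000)   # data: unknown mean, precision
n, xbar = len(x), x.mean()

# Hypothetical conjugate priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                                       # initialise E_q[tau]
for _ in range(50):
    # Coordinate ascent: update q(mu) = N(mu_n, 1/lam_n) given E_q[tau]
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # ...then update q(tau) = Gamma(a_n, b_n) given the current q(mu)
    a_n = a0 + 0.5 * (n + 1)
    e_sq = np.sum((x - mu_n) ** 2) + n / lam_n    # E_q[sum_i (x_i - mu)^2]
    b_n = b0 + 0.5 * (e_sq + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
    E_tau = a_n / b_n
```

The iteration converges almost immediately here; the theoretical question the abstract raises is whether the resulting $q$ concentrates around the truth at the right rate, which is what the oracle inequality addresses.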

66 citations

Posted Content
TL;DR: This work considers the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach and offers a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery.
Abstract: Let $(\mathbf{X}, Y)$ be a random pair taking values in $\mathbb{R}^p \times \mathbb{R}$. In the so-called single-index model, one has $Y=f^{\star}(\theta^{\star T}\mathbf{X})+W$, where $f^{\star}$ is an unknown univariate measurable function, $\theta^{\star}$ is an unknown vector in $\mathbb{R}^p$, and $W$ denotes a random noise satisfying $\mathbb{E}[W|\mathbf{X}]=0$. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes larger than the number of observations ("$p$ larger than $n$" paradigm). To circumvent this difficulty, we consider the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach. On the theoretical side, we offer a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery. The proposed method is implemented by means of the reversible jump Markov chain Monte Carlo technique and its performance is compared with that of standard procedures.
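A quick illustration of the model itself (not of the paper's PAC-Bayesian estimator): for a Gaussian design, Stein's lemma implies that the ordinary least-squares coefficient is proportional to $\theta^{\star}$, giving a crude baseline estimate of the index direction. The sparse index, link function, and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

p, n = 20, 2_000
theta_star = np.zeros(p)
theta_star[:3] = [1.0, -1.0, 0.5]        # sparse index vector (hypothetical)
theta_star /= np.linalg.norm(theta_star)

X = rng.normal(size=(n, p))              # Gaussian design
# Single-index data: Y = f*(theta*^T X) + W, with f* = tanh for illustration
y = np.tanh(X @ theta_star) + 0.1 * rng.normal(size=n)

# Stein's lemma for Gaussian X: E[Y X] = E[f*'(theta*^T X)] * theta*, so the
# OLS coefficient points (up to noise) along the index direction.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
cosine = abs(beta @ theta_star) / np.linalg.norm(beta)
```

This baseline degrades badly once $p$ exceeds $n$, which is exactly the regime where the sparsity-exploiting PAC-Bayesian procedure of the paper is intended to help.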

62 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book ChapterDOI
31 Oct 2006

1,424 citations