Open accessJournal Article

# In search of lost mixing time: adaptive Markov chain Monte Carlo schemes for Bayesian variable selection with very large p

02 Mar 2021-Biometrika (Oxford Academic)-Vol. 108, Iss: 1, pp 53-69
Abstract: The availability of datasets with large numbers of variables is rapidly increasing. The effective application of Bayesian variable selection methods for regression with these datasets has proved difficult since available Markov chain Monte Carlo methods do not perform well in typical problem sizes of interest. We propose new adaptive Markov chain Monte Carlo algorithms to address this shortcoming. The adaptive design of these algorithms exploits the observation that in large-p⁠, small-n settings, the majority of the p variables will be approximately uncorrelated a posteriori. The algorithms adaptively build suitable nonlocal proposals that result in moves with squared jumping distance significantly larger than standard methods. Their performance is studied empirically in high-dimensional problems and speed-ups of up to four orders of magnitude are observed.

Topics:
##### Citations
More

9 results found

Open accessJournal Article
Kolyan Ray1, Botond Szabo2Institutions (2)
Abstract: We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the desi...

Topics: Prior probability (58%), Bayes' theorem (56%), Bayesian inference (56%) ... show more

17 Citations

Open accessJournal Article
Kitty Wan1, Jim E. Griffin2Institutions (2)
Abstract: Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has been shown to lead to performance improvement over traditionally used algorithms in linear regression models. This paper looks at applying one of these algorithms (the adaptively scaled independence sampler) to logistic regression and accelerated failure time models. We investigate the use of this algorithm with data augmentation, Laplace approximation and the correlated pseudo-marginal method. The performance of the algorithms is compared on several genomic data sets.

3 Citations

Open accessPosted Content
14 Dec 2020-arXiv: Computation
Abstract: We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and equips strong model selection properties. However, when the sample size is large or one considers many models the cost of the required optimizations becomes impractical. ALA reduces the cost to that of solving a least-squares problem for each model. Further, it enables efficient computation across models such as sharing pre-computed sufficient statistics and certain operations in matrix decompositions. We prove that in generalized (possibly non-linear) models ALA achieves a strong form of model selection consistency for a suitably-defined optimal model, at the same functional rates as exact computation. We consider fixed- and high-dimensional problems, group and hierarchical constraints, and the possibility that all models are misspecified. We also obtain ALA rates for Gaussian regression under non-local priors, an important example where the LA can be costly and does not consistently estimate the integrated likelihood. Our examples include non-linear regression, logistic, Poisson and survival models. We implement the methodology in the R package mombf.

Topics: Model selection (58%), Laplace's method (56%), Marginal likelihood (55%) ... show more

2 Citations

Open accessPosted Content
12 May 2021-arXiv: Methodology
Abstract: Yang et al. (2016) proved that the symmetric random walk Metropolis--Hastings algorithm for Bayesian variable selection is rapidly mixing under mild high-dimensional assumptions. We propose a novel MCMC sampler using an informed proposal scheme, which we prove achieves a much faster mixing time that is independent of the number of covariates, under the same assumptions. To the best of our knowledge, this is the first high-dimensional result which rigorously shows that the mixing rate of informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation. Motivated by the theoretical analysis of our sampler, we further propose a new approach called "two-stage drift condition" to studying convergence rates of Markov chains on general state spaces, which can be useful for obtaining tight complexity bounds in high-dimensional settings. The practical advantages of our algorithm are illustrated by both simulation studies and real data analysis.

2 Citations

Open accessJournal Article
Abstract: We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and equips strong model selection properties. However, when the sample size is large or one considers many models the cost of the required optimizations becomes impractical. ALA reduces the cost to that of solving a least-squares problem for each model. Further, it enables efficient computation across models such as sharing pre-computed sufficient statistics and certain operations in matrix decompositions. We prove that in generalized (possibly non-linear) models ALA achieves a strong form of model selection consistency for a suitably-defined optimal model, at the same functional rates as exact computation. We consider fixed- and high-dimensional problems, group and hierarchical constraints, and the possibility that all models are misspecified. We also obtain ALA rates for Gaussian regression under non-local priors, an important example where the LA can be costly and does not consistently estimate the integrated likelihood. Our examples include non-linear regression, logistic, Poisson and survival models. We implement the methodology in the R package mombf.

Topics: Model selection (58%), Laplace's method (56%), Marginal likelihood (55%) ... show more

1 Citations

##### References
More

46 results found

Open accessJournal ArticleDOI: 10.2307/3318737
01 Apr 2001-Bernoulli
Abstract: A proper choice of a proposal distribution for Markov chain Monte Carlo methods, for example for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that the AM algorithm is easy to use in practical computation.

2,212 Citations

BookDOI: 10.1201/B18401
07 May 2015-
Abstract: Discover New Methods for Dealing with High-Dimensional Data A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of 1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.

1,822 Citations

Open accessJournal Article
Abstract: This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the efficiency of the algorithm. The main result is a weak convergence result as the dimension of a sequence of target densities, n, converges to $\infty$. When the proposal variance is appropriately scaled according to n, the sequence of stochastic processes formed by the first component of each Markov chain converges to the appropriate limiting Langevin diffusion process. The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions. The main result is proved in the case where the target density has a symmetric product form. Extensions of the result are discussed.

1,639 Citations

Open accessJournal Article
01 Apr 1997-Statistica Sinica
Abstract: This paper describes and compares various hierarchical mixture prior formulations of variable selection uncertainty in normal linear regression models. These include the nonconjugate SSVS formulation of George and McCulloch (1993), as well as conjugate formulations which allow for analytical simplification. Hyperpa- rameter settings which base selection on practical significance, and the implications of using mixtures with point priors are discussed. Computational methods for pos- terior evaluation and exploration are considered. Rapid updating methods are seen to provide feasible methods for exhaustive evaluation using Gray Code sequencing in moderately sized problems, and fast Markov Chain Monte Carlo exploration in large problems. Estimation of normalization constants is seen to provide improved posterior estimates of individual model probabilities and the total visited probabil- ity. Various procedures are illustrated on simulated sample problems and on a real problem concerning the construction of financial index tracking portfolios.

1,206 Citations

Journal Article
Abstract: We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters during a run. Examples include the Adaptive Metropolis (AM) multivariate algorithm of Haario, Saksman, and Tamminen (2001), Metropolis-within-Gibbs algorithms for nonconjugate hierarchical models, regionally adjusted Metropolis algorithms, and logarithmic scalings. Computer simulations indicate that the algorithms perform very well compared to nonadaptive algorithms, even in high dimension.