Open access · Journal Article · DOI: 10.1093/BIOMET/ASAA055

In search of lost mixing time: adaptive Markov chain Monte Carlo schemes for Bayesian variable selection with very large p

02 Mar 2021-Biometrika (Oxford Academic)-Vol. 108, Iss: 1, pp 53-69
Abstract: The availability of datasets with large numbers of variables is rapidly increasing. The effective application of Bayesian variable selection methods for regression with these datasets has proved difficult since available Markov chain Monte Carlo methods do not perform well in typical problem sizes of interest. We propose new adaptive Markov chain Monte Carlo algorithms to address this shortcoming. The adaptive design of these algorithms exploits the observation that in large-p, small-n settings, the majority of the p variables will be approximately uncorrelated a posteriori. The algorithms adaptively build suitable nonlocal proposals that result in moves with squared jumping distances significantly larger than those of standard methods. Their performance is studied empirically in high-dimensional problems, and speed-ups of up to four orders of magnitude are observed.
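
Below is a minimal sketch, in Python, of the kind of adaptive proposal the abstract describes: running estimates of the posterior inclusion probabilities drive per-variable add/delete proposal probabilities, with a full Metropolis-Hastings correction and diminishing adaptation. The function log_post (a user-supplied log posterior over inclusion vectors), the scaling zeta and the adaptation schedule are illustrative assumptions; this conveys the flavour of individually adaptive samplers, not the authors' exact algorithm.

```python
import numpy as np

def adaptive_flip_sampler(log_post, p, n_iter=10000, zeta=0.5, seed=0):
    """Adaptive independent-flip proposal for variable selection.
    log_post(gamma) returns the log posterior of an inclusion vector."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(p, dtype=bool)
    lp = log_post(gamma)
    pip = np.full(p, 0.5)              # running posterior inclusion estimates
    eps = 0.01                         # keep proposal probabilities off 0 and 1
    samples = []
    for t in range(1, n_iter + 1):
        # propose adding variables believed useful and deleting the rest
        a = np.clip(zeta * pip, eps, 1 - eps)        # P(flip 0 -> 1)
        d = np.clip(zeta * (1 - pip), eps, 1 - eps)  # P(flip 1 -> 0)
        q_fwd = np.where(gamma, d, a)
        flips = rng.random(p) < q_fwd
        prop = gamma ^ flips
        q_rev = np.where(prop, d, a)
        log_q_fwd = np.sum(np.where(flips, np.log(q_fwd), np.log1p(-q_fwd)))
        log_q_rev = np.sum(np.where(flips, np.log(q_rev), np.log1p(-q_rev)))
        lp_prop = log_post(prop)
        # Metropolis-Hastings correction for the asymmetric proposal
        if np.log(rng.random()) < lp_prop - lp + log_q_rev - log_q_fwd:
            gamma, lp = prop, lp_prop
        # diminishing adaptation of the inclusion probability estimates
        pip += (gamma - pip) / (t + 10)
        samples.append(gamma.copy())
    return np.array(samples)
```

Because many coordinates can flip in a single move, proposals of this kind can achieve much larger squared jumping distances than one-variable-at-a-time samplers.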


Topics: Markov chain Monte Carlo (72%)
Citations

9 results found


Open access · Journal Article · DOI: 10.1080/01621459.2020.1847121
Kolyan Ray, Botond Szabo · Institutions (2)
Abstract: We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the desi...
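
A minimal sketch of a mean-field spike-and-slab VB scheme of this kind, using coordinate-ascent updates in the style of Carbonetto and Stephens (2012): each q(beta_j) is a mixture of a point mass at zero and a Gaussian slab. The known noise variance sigma2, slab variance and prior inclusion probability are simplifying assumptions made here for brevity, not requirements of the paper.

```python
import numpy as np

def spike_slab_cavi(X, y, sigma2=1.0, slab_var=1.0, prior_incl=0.1, n_sweeps=100):
    """Coordinate-ascent VB with q(beta_j) = alpha_j N(mu_j, s2_j) + (1 - alpha_j) delta_0."""
    n, p = X.shape
    xtx = np.einsum('ij,ij->j', X, X)             # x_j' x_j for every column
    s2 = sigma2 / (xtx + sigma2 / slab_var)       # slab posterior variances
    alpha = np.full(p, prior_incl)                # inclusion probabilities
    mu = np.zeros(p)                              # slab means
    fitted = X @ (alpha * mu)
    logit_pi = np.log(prior_incl / (1 - prior_incl))
    for _ in range(n_sweeps):
        for j in range(p):
            fitted -= X[:, j] * (alpha[j] * mu[j])          # drop j's contribution
            mu[j] = s2[j] / sigma2 * (X[:, j] @ (y - fitted))
            logit_a = (logit_pi + 0.5 * np.log(s2[j] / slab_var)
                       + mu[j] ** 2 / (2 * s2[j]))
            alpha[j] = 0.5 * (1 + np.tanh(0.5 * logit_a))   # stable sigmoid
            fitted += X[:, j] * (alpha[j] * mu[j])          # restore it
    return alpha, mu
```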


Topics: Prior probability (58%), Bayes' theorem (56%), Bayesian inference (56%)

17 Citations


Open access · Journal Article · DOI: 10.1007/S11222-020-09974-2
Kitty Wan, Jim E. Griffin · Institutions (2)
Abstract: Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has been shown to lead to performance improvement over traditionally used algorithms in linear regression models. This paper looks at applying one of these algorithms (the adaptively scaled independence sampler) to logistic regression and accelerated failure time models. We investigate the use of this algorithm with data augmentation, Laplace approximation and the correlated pseudo-marginal method. The performance of the algorithms is compared on several genomic data sets.
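
To illustrate the Laplace-approximation ingredient, here is a hedged sketch of the log integrated likelihood of a logistic regression model with a N(0, tau2 I) prior on the included coefficients, the kind of quantity a model-search sampler would evaluate for each proposed model. The prior, Newton iteration cap and tolerance are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def log_marginal_laplace(X, y, tau2=1.0, n_newton=50, tol=1e-8):
    """Laplace approximation to the log integrated likelihood of a logistic
    regression with prior beta ~ N(0, tau2 I); X holds the included columns."""
    k = X.shape[1]

    def neg_hess(mu):
        w = mu * (1 - mu)
        return X.T @ (X * w[:, None]) + np.eye(k) / tau2

    beta = np.zeros(k)
    for _ in range(n_newton):                      # Newton ascent to the mode
        mu = 1 / (1 + np.exp(-(X @ beta)))
        step = np.linalg.solve(neg_hess(mu), X.T @ (y - mu) - beta / tau2)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    log_joint = (np.sum(y * eta - np.logaddexp(0, eta))      # log-likelihood
                 - 0.5 * (beta @ beta / tau2 + k * np.log(2 * np.pi * tau2)))
    _, logdet = np.linalg.slogdet(neg_hess(1 / (1 + np.exp(-eta))))
    # LA: log p(y) ~= log p(y, beta_hat) + (k/2) log(2 pi) - (1/2) log|H|
    return log_joint + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdet
```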


Topics: Regression analysis (57%), Markov chain Monte Carlo (56%), Logistic regression (55%)

3 Citations


Open access · Posted Content
14 Dec 2020-arXiv: Computation
Abstract: We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and enjoys strong model selection properties. However, when the sample size is large or one considers many models, the cost of the required optimizations becomes impractical. ALA reduces the cost to that of solving a least-squares problem for each model. Further, it enables efficient computation across models, such as sharing pre-computed sufficient statistics and certain operations in matrix decompositions. We prove that in generalized (possibly non-linear) models ALA achieves a strong form of model selection consistency for a suitably defined optimal model, at the same functional rates as exact computation. We consider fixed- and high-dimensional problems, group and hierarchical constraints, and the possibility that all models are misspecified. We also obtain ALA rates for Gaussian regression under non-local priors, an important example where the LA can be costly and does not consistently estimate the integrated likelihood. Our examples include non-linear regression, logistic, Poisson and survival models. We implement the methodology in the R package mombf.
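
A minimal reading of the ALA construction for the logistic case, under illustrative assumptions (a N(0, tau2 I) prior; not the estimator implemented in mombf): Taylor-expand the log-likelihood to second order at beta = 0 and integrate the resulting quadratic in closed form, so each model costs a single linear solve; the curvature at zero involves only X'X, which can be precomputed once and shared across models.

```python
import numpy as np

def log_marginal_ala(X, y, tau2=1.0):
    """ALA-style closed form for logistic regression with beta ~ N(0, tau2 I):
    expand the log-likelihood to second order at beta = 0, then integrate
    the quadratic exactly. X'X can be precomputed and reused across models."""
    n, k = X.shape
    l0 = -n * np.log(2.0)                     # logistic log-likelihood at beta = 0
    g = X.T @ (y - 0.5)                       # score at beta = 0
    A = 0.25 * (X.T @ X) + np.eye(k) / tau2   # curvature at 0 plus prior precision
    m = np.linalg.solve(A, g)                 # the single least-squares-type solve
    _, logdet = np.linalg.slogdet(A)
    # Gaussian integral: l0 + (1/2) g'A^{-1}g - (1/2) log|A| - (k/2) log(tau2)
    return l0 + 0.5 * (g @ m) - 0.5 * logdet - 0.5 * k * np.log(tau2)
```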


Topics: Model selection (58%), Laplace's method (56%), Marginal likelihood (55%)

2 Citations


Open access · Posted Content
12 May 2021-arXiv: Methodology
Abstract: Yang et al. (2016) proved that the symmetric random walk Metropolis--Hastings algorithm for Bayesian variable selection is rapidly mixing under mild high-dimensional assumptions. We propose a novel MCMC sampler using an informed proposal scheme, which we prove achieves a much faster mixing time that is independent of the number of covariates, under the same assumptions. To the best of our knowledge, this is the first high-dimensional result which rigorously shows that the mixing rate of informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation. Motivated by the theoretical analysis of our sampler, we further propose a new approach called "two-stage drift condition" to studying convergence rates of Markov chains on general state spaces, which can be useful for obtaining tight complexity bounds in high-dimensional settings. The practical advantages of our algorithm are illustrated by both simulation studies and real data analysis.
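
To fix ideas, here is a hedged sketch of an informed single-flip sampler in the locally balanced spirit (cf. Zanella, 2020): the coordinate to flip is drawn with probability proportional to the square root of the posterior ratio, with an exact Metropolis-Hastings correction. This simplified version is not the paper's sampler, but it makes visible the cost the abstract refers to: each step requires p local posterior evaluations, which fast mixing must offset.

```python
import numpy as np

def informed_flip_sampler(log_post, p, n_iter=2000, seed=0):
    """Informed single-flip sampler: the flipped coordinate is drawn with
    probability proportional to sqrt(posterior ratio), with an exact
    Metropolis-Hastings correction. Costs p evaluations of log_post per step."""
    rng = np.random.default_rng(seed)

    def flip_distribution(gamma, lp):
        lps = np.empty(p)
        for j in range(p):
            nbr = gamma.copy()
            nbr[j] = ~nbr[j]
            lps[j] = log_post(nbr)
        z = 0.5 * (lps - lp)
        w = np.exp(z - z.max())          # stabilized weights
        return w / w.sum(), lps

    gamma = np.zeros(p, dtype=bool)
    lp = log_post(gamma)
    probs, lps = flip_distribution(gamma, lp)
    samples = []
    for _ in range(n_iter):
        j = rng.choice(len(probs), p=probs)
        prop = gamma.copy()
        prop[j] = ~prop[j]
        probs_rev, lps_rev = flip_distribution(prop, lps[j])
        log_acc = lps[j] - lp + np.log(probs_rev[j]) - np.log(probs[j])
        if np.log(rng.random()) < log_acc:
            gamma, lp, probs, lps = prop, lps[j], probs_rev, lps_rev
        samples.append(gamma.copy())
    return np.array(samples)
```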


Topics: Markov chain (55%), Mixing (physics) (54%), Markov chain Monte Carlo (53%)

2 Citations


Open access · Journal Article · DOI: 10.1111/RSSB.12466
Abstract: We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and enjoys strong model selection properties. However, when the sample size is large or one considers many models, the cost of the required optimizations becomes impractical. ALA reduces the cost to that of solving a least-squares problem for each model. Further, it enables efficient computation across models, such as sharing pre-computed sufficient statistics and certain operations in matrix decompositions. We prove that in generalized (possibly non-linear) models ALA achieves a strong form of model selection consistency for a suitably defined optimal model, at the same functional rates as exact computation. We consider fixed- and high-dimensional problems, group and hierarchical constraints, and the possibility that all models are misspecified. We also obtain ALA rates for Gaussian regression under non-local priors, an important example where the LA can be costly and does not consistently estimate the integrated likelihood. Our examples include non-linear regression, logistic, Poisson and survival models. We implement the methodology in the R package mombf.


Topics: Model selection (58%), Laplace's method (56%), Marginal likelihood (55%)

1 Citation


References

46 results found


Open access · Journal Article · DOI: 10.2307/3318737
01 Apr 2001-Bernoulli
Abstract: A proper choice of a proposal distribution for Markov chain Monte Carlo methods, for example for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that the AM algorithm is easy to use in practical computation.
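
A minimal sketch of the AM algorithm as described above: a Gaussian random-walk proposal whose covariance is the scaled empirical covariance of the entire chain history, with a fixed covariance during an initial period. The burn-in length, regularization eps and Welford-style running updates are implementation choices, not prescribed by the paper.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=50000, t0=1000, eps=1e-8, seed=0):
    """Adaptive Metropolis: Gaussian random-walk proposal whose covariance is
    the scaled empirical covariance of the full chain history."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_post(x)
    sd = 2.4 ** 2 / d                          # scaling suggested by Haario et al.
    mean = np.zeros(d)
    scatter = np.zeros((d, d))                 # running sum of outer products
    count = 0
    chain = np.empty((n_iter, d))
    for t in range(n_iter):
        if count <= t0:
            cov = (0.1 ** 2 / d) * np.eye(d)   # fixed proposal while history is short
        else:
            cov = sd * (scatter / (count - 1) + eps * np.eye(d))
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        count += 1                             # Welford update of history moments
        delta = x - mean
        mean += delta / count
        scatter += np.outer(delta, x - mean)
        chain[t] = x
    return chain
```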


Topics: Multiple-try Metropolis (64%), Metropolis light transport (62%), Rejection sampling (62%)

2,212 Citations


Book · DOI: 10.1201/B18401
07 May 2015
Abstract: Discover New Methods for Dealing with High-Dimensional Data A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of ℓ1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
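
The coordinate descent algorithm for the lasso mentioned above is short enough to sketch: each update is a soft-thresholded univariate regression on the partial residual. The (1/2n) scaling of the objective and the standardized-columns assumption are common conventions rather than the book's exact pseudocode.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-8):
    """Cyclic coordinate descent for
    (1/2n)||y - X beta||^2 + lam ||beta||_1, assuming standardized columns."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                           # y - X beta
    col_ss = np.einsum('ij,ij->j', X, X) / n   # x_j'x_j / n per column
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            # correlation of x_j with the partial residual (j's fit added back)
            rho = X[:, j] @ resid / n + col_ss[j] * old
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_step = max(max_step, abs(beta[j] - old))
        if max_step < tol:
            break
    return beta
```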


Topics: Elastic net regularization (66%), Lasso (statistics) (66%), Statistical model (57%)

1,822 Citations


Open access · Journal Article · DOI: 10.1214/AOAP/1034625254
Abstract: This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the efficiency of the algorithm. The main result is a weak convergence result as the dimension of a sequence of target densities, n, converges to ∞. When the proposal variance is appropriately scaled according to n, the sequence of stochastic processes formed by the first component of each Markov chain converges to the appropriate limiting Langevin diffusion process. The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions. The main result is proved in the case where the target density has a symmetric product form. Extensions of the result are discussed.
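
In practice the 0.234 result is used as a tuning target. A hedged sketch: adapt the log step size of a random-walk Metropolis sampler by stochastic approximation so the running acceptance rate approaches 0.234. The Robbins-Monro schedule is an illustrative choice, and the target is only asymptotically optimal under conditions like the product form assumed in the paper.

```python
import numpy as np

def tuned_rwm(log_post, x0, n_iter=20000, target=0.234, seed=0):
    """Random-walk Metropolis with the log step size adapted by stochastic
    approximation toward the asymptotically optimal 0.234 acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    log_scale = 0.0
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + np.exp(log_scale) * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        acc = np.exp(min(0.0, lp_prop - lp))   # acceptance probability
        if rng.random() < acc:
            x, lp = prop, lp_prop
        # Robbins-Monro: widen the proposal when accepting too often
        log_scale += (acc - target) / (t + 1) ** 0.6
        chain[t] = x
    return chain
```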


Topics: Multiple-try Metropolis (59%), Weak convergence (59%), Random walk (58%)

1,639 Citations


Open access · Journal Article
01 Apr 1997-Statistica Sinica
Abstract: This paper describes and compares various hierarchical mixture prior formulations of variable selection uncertainty in normal linear regression models. These include the nonconjugate SSVS formulation of George and McCulloch (1993), as well as conjugate formulations which allow for analytical simplification. Hyperparameter settings which base selection on practical significance, and the implications of using mixtures with point priors are discussed. Computational methods for posterior evaluation and exploration are considered. Rapid updating methods are seen to provide feasible methods for exhaustive evaluation using Gray Code sequencing in moderately sized problems, and fast Markov Chain Monte Carlo exploration in large problems. Estimation of normalization constants is seen to provide improved posterior estimates of individual model probabilities and the total visited probability. Various procedures are illustrated on simulated sample problems and on a real problem concerning the construction of financial index tracking portfolios.
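
The Gray code sequencing mentioned above works because consecutive models in the sequence differ in exactly one variable, so posterior quantities can be updated rather than recomputed. A minimal sketch (which, for brevity, re-evaluates a user-supplied log marginal likelihood at each step instead of performing the rapid low-rank updates the paper exploits):

```python
import numpy as np

def gray_code_scan(log_marginal, p):
    """Visit all 2^p models so that consecutive models differ in exactly one
    variable (reflected Gray code). Real implementations exploit this to update
    posterior quantities cheaply; here we simply re-evaluate each model."""
    gamma = np.zeros(p, dtype=bool)
    scores = {tuple(gamma): log_marginal(gamma)}
    for i in range(1, 2 ** p):
        j = (i & -i).bit_length() - 1     # index of the single bit that flips
        gamma[j] = ~gamma[j]
        scores[tuple(gamma)] = log_marginal(gamma)
    return scores
```

Visiting all 2^p models this way is feasible only for the moderately sized problems the abstract mentions.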


Topics: Markov chain Monte Carlo (56%), Conjugate prior (54%), Prior probability (54%)

1,206 Citations


Journal Article · DOI: 10.1198/JCGS.2009.06134
Abstract: We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters during a run. Examples include the Adaptive Metropolis (AM) multivariate algorithm of Haario, Saksman, and Tamminen (2001), Metropolis-within-Gibbs algorithms for nonconjugate hierarchical models, regionally adjusted Metropolis algorithms, and logarithmic scalings. Computer simulations indicate that the algorithms perform very well compared to nonadaptive algorithms, even in high dimension.
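
A hedged sketch of one scheme of this kind, an adaptive Metropolis-within-Gibbs sampler with logarithmic scalings: each coordinate keeps its own log proposal scale, adjusted after every batch of iterations toward a one-dimensional acceptance target of roughly 0.44, with a diminishing adjustment. The batch size and schedule follow common choices and are assumptions of this sketch.

```python
import numpy as np

def adaptive_mwg(log_post, x0, n_iter=20000, batch=50, target=0.44, seed=0):
    """Adaptive Metropolis-within-Gibbs: a separate log proposal scale per
    coordinate, nudged after each batch toward a 0.44 acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_post(x)
    log_scale = np.zeros(d)
    accepted = np.zeros(d)
    chain = np.empty((n_iter, d))
    for t in range(n_iter):
        for j in range(d):                 # one Metropolis step per coordinate
            prop = x.copy()
            prop[j] += np.exp(log_scale[j]) * rng.standard_normal()
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop
                accepted[j] += 1
        if (t + 1) % batch == 0:
            # diminishing adaptation: the adjustment shrinks with the batch count
            delta = min(0.01, ((t + 1) / batch) ** -0.5)
            log_scale += np.where(accepted / batch > target, delta, -delta)
            accepted[:] = 0.0
        chain[t] = x
    return chain
```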


948 Citations


Performance Metrics
No. of citations received by the paper in previous years

Year    Citations
2021    8
2020    1