
Showing papers in "Statistics and Computing in 2022"


Journal ArticleDOI
TL;DR: In this paper, a new methodology for constrained parsimonious model-based clustering is introduced, in which a tuning parameter controls the strength of the constraints; the constraints yield mathematically well-defined problems and help prevent spurious solutions.
Abstract: A new methodology for constrained parsimonious model-based clustering is introduced, in which a tuning parameter controls the strength of these constraints. The methodology includes, as limit cases, the 14 parsimonious models that are often applied in model-based clustering with normal components. This is done in a natural way by filling the gap among models and providing a smooth transition among them. The methodology yields mathematically well-defined problems and also helps prevent spurious solutions. Novel information criteria are proposed to guide the user in choosing the tuning parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application to COVID data.

6 citations
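Constraints of this kind are commonly expressed as a bound on the ratio between the largest and smallest eigenvalues of the component covariance matrices, with the tuning parameter controlling the bound. A minimal sketch of such a restriction follows; the function name `constrain_covariance` and the simple one-sided clipping rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def constrain_covariance(cov, c=10.0):
    """Enforce an eigenvalue-ratio constraint max(eig)/min(eig) <= c
    by clipping small eigenvalues upward (a simplified stand-in for
    the truncation used in constrained model-based clustering)."""
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    lo = vals.max() / c                # smallest eigenvalue allowed
    vals = np.clip(vals, lo, None)
    # Reconstruct the covariance from the adjusted spectrum.
    return (vecs * vals) @ vecs.T
```

As c grows the constraint vanishes (fully heteroscedastic components); as c shrinks toward 1 the covariances are forced toward sphericity, which is the sense in which such a parameter interpolates among parsimonious models.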


Journal ArticleDOI
TL;DR: In this article, a split point procedure based on the explicit likelihood was proposed to save time when searching for the best split for a given splitting variable, and a simulation study was performed to assess the computational gain when building GLM trees.
Abstract: Classification and regression trees (CART) prove to be a true alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from a biased variable selection issue, they are commonly applied to various topics and used for tree ensembles and random forests because of their simplicity and computation speed. Conditional inference tree and model-based tree algorithms, in which variable selection is tackled via fluctuation tests, are known to give more accurate and interpretable results than CART, but yield longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also propose a benchmark on simulated and empirical datasets of GLM trees against CART, conditional inference trees and LM trees in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees possible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random-forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.

3 citations
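A split point search driven by an explicit likelihood can be sketched as follows for an intercept-only Poisson model fitted on each side of a candidate split (the intercept-only simplification, where the MLE is just the sample mean, and the function names are assumptions for illustration, not the paper's actual procedure):

```python
import numpy as np

def poisson_loglik(y):
    """Log-likelihood of an intercept-only Poisson model at its MLE
    (the sample mean), dropping the constant log(y!) terms."""
    mu = y.mean()
    if mu == 0:
        return 0.0
    return np.sum(y * np.log(mu) - mu)

def best_split(x, y):
    """Scan candidate thresholds of one splitting variable and return
    the one maximising the summed left/right Poisson log-likelihood."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best, best_ll = None, -np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                     # no valid threshold between ties
        ll = poisson_loglik(y[:i]) + poisson_loglik(y[i:])
        if ll > best_ll:
            best, best_ll = (x[i - 1] + x[i]) / 2, ll
    return best, best_ll
```

Because the per-side MLE is available in closed form, each candidate threshold costs only two mean computations rather than an iteratively reweighted least-squares fit, which is the source of the computational gain the abstract describes.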


Journal ArticleDOI
TL;DR: In this article, a prior probability model for temporal Poisson process intensities is developed through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters.
Abstract: We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term, resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and is illustrated with synthetic and real data examples.

1 citation
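A structured Erlang mixture with a common scale parameter, as described above, can be evaluated in a few lines. In this sketch the weights are passed in directly, whereas the paper constructs them from increments of a gamma-process cumulative intensity; since they weight an intensity rather than a density, they need not sum to one:

```python
import math

def erlang_mixture_intensity(t, weights, scale):
    """Evaluate lambda(t) = sum_j w_j * Erlang(t; shape=j, scale)
    for integer shapes j = 1..J sharing a single scale parameter."""
    total = 0.0
    for j, w in enumerate(weights, start=1):
        # Erlang density: t^(j-1) e^(-t/scale) / (scale^j (j-1)!)
        dens = (t ** (j - 1) * math.exp(-t / scale)
                / (scale ** j * math.factorial(j - 1)))
        total += w * dens
    return total
```

With a single weight the mixture reduces to an Exponential(scale) intensity; adding higher integer shapes lets the mixture place mass later in time, which is how the construction supports general intensity shapes.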




DOI
TL;DR: An order-free, scalable multivariate Bayesian modelling approach is proposed to smooth the mortality (or incidence) risks of several diseases simultaneously; it permits the analysis of big data sets and provides better results than a single multivariate model.

1 citation


Journal ArticleDOI
TL;DR: In this article, a generalized version of the Metropolis-adjusted Langevin algorithm (MALA) is proposed for Bayesian logistic regression, and the authors theoretically study the efficiency of the sampler by making use of the local and global balance concepts introduced in Zanella.
Abstract: We introduce a generalized version of the Metropolis-adjusted Langevin algorithm (MALA). The informed proposal distribution of this new sampler features two tuning parameters: the usual step size parameter σ² and an interpolation parameter γ that may be adjusted to accommodate the dimension of the target distribution. We theoretically study the efficiency of the sampler by making use of the local- and global-balance concepts introduced in Zanella (JASA 115:852–865, 2020) and provide efficient tuning guidelines that work well with a variety of target distributions. Although the usual MALA (γ = 1) is shown to be optimal for infinite-dimensional targets, in practice, the generalized MALA (1 < γ ≤ 2) remains the most appealing option, even in high-dimensional contexts. Simulation studies and numerical experiments are presented to illustrate our findings. We apply the new sampler to a Bayesian logistic regression context and show that its efficiency compares favourably to competing algorithms.
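The standard MALA that this paper generalizes (its γ = 1 case) can be sketched as follows; the generalized proposal itself is not reproduced here, and the function signature is an illustrative assumption:

```python
import numpy as np

def mala(grad_log_p, log_p, x0, sigma2=0.5, n_iter=5000, rng=None):
    """Standard Metropolis-adjusted Langevin algorithm (the gamma = 1
    case of the generalized sampler).  The proposal drifts along the
    gradient of the log target before adding Gaussian noise."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        mean_x = x + 0.5 * sigma2 * grad_log_p(x)
        y = mean_x + np.sqrt(sigma2) * rng.standard_normal(x.shape)
        mean_y = y + 0.5 * sigma2 * grad_log_p(y)
        # Metropolis-Hastings correction for the asymmetric proposal
        # (Gaussian normalising constants cancel, so they are dropped).
        log_q_xy = -np.sum((y - mean_x) ** 2) / (2 * sigma2)
        log_q_yx = -np.sum((x - mean_y) ** 2) / (2 * sigma2)
        if np.log(rng.uniform()) < log_p(y) - log_p(x) + log_q_yx - log_q_xy:
            x = y
        samples.append(x.copy())
    return np.array(samples)
```

The abstract's point is that the extra interpolation parameter γ modifies this informed proposal so that the drift can be tempered to suit the target's dimension, with tuning guided by the local- and global-balance framework.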


Journal ArticleDOI
TL;DR: In this article, the authors propose to accelerate Hamiltonian and Lagrangian Monte Carlo algorithms by coupling them with Gaussian processes for emulation of the log unnormalised posterior distribution. They provide proofs of detailed balance with respect to the exact posterior distribution for these algorithms and validate the correctness of the samplers' implementation by Geweke consistency tests.
Abstract: We propose to accelerate Hamiltonian and Lagrangian Monte Carlo algorithms by coupling them with Gaussian processes for emulation of the log unnormalised posterior distribution. We provide proofs of detailed balance with respect to the exact posterior distribution for these algorithms, and validate the correctness of the samplers’ implementation by Geweke consistency tests. We implement these algorithms in a delayed acceptance (DA) framework, and investigate whether the DA scheme can offer computational gains over the standard algorithms. A comparative evaluation study is carried out to assess the performance of the methods on a series of models described by differential equations, including a real-world application of a 1D fluid-dynamics model of the pulmonary blood circulation. The aim is to identify the algorithm which gives the best trade-off between accuracy and computational efficiency, to be used in nonlinear DE models, which are computationally onerous due to repeated numerical integrations in a Bayesian analysis. Results showed no advantage of the DA scheme over the standard algorithms with respect to several efficiency measures based on the effective sample size for most methods and DE models considered. These gradient-driven algorithms register a high acceptance rate, thus the number of expensive forward model evaluations is not significantly reduced by the first emulator-based stage of DA. Additionally, the Lagrangian Dynamical Monte Carlo and Riemann Manifold Hamiltonian Monte Carlo tended to register the highest efficiency (in terms of effective sample size normalised by the number of forward model evaluations), followed by the Hamiltonian Monte Carlo, and the No U-turn sampler tended to be the least efficient.
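The delayed acceptance scheme evaluated here follows the usual two-stage pattern: a cheap surrogate screens each candidate before the expensive exact posterior is evaluated. A minimal sketch with a symmetric proposal, where `log_p_cheap` stands in for the GP emulator (names and signature are illustrative assumptions), is:

```python
import numpy as np

def da_step(x, log_p, log_p_cheap, proposal, rng):
    """One delayed-acceptance Metropolis step with a symmetric proposal."""
    y = proposal(x, rng)
    # Stage 1: screen the candidate with the cheap surrogate.
    a1 = min(1.0, np.exp(log_p_cheap(y) - log_p_cheap(x)))
    if rng.uniform() >= a1:
        return x, False          # rejected without an expensive evaluation
    # Stage 2: correct with the exact target; the surrogate ratio is
    # divided back out so detailed balance holds for the true posterior.
    a2 = min(1.0, np.exp((log_p(y) - log_p(x))
                         - (log_p_cheap(y) - log_p_cheap(x))))
    return (y, True) if rng.uniform() < a2 else (x, False)
```

The abstract's negative finding is visible in this structure: stage 1 only saves work on candidates it rejects, so when the gradient-driven samplers already accept most proposals, nearly every step still pays for the expensive stage-2 evaluation.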

Journal ArticleDOI
TL;DR: In this paper, the authors proposed the use of recursive algorithms to update a sequence of VB posterior approximations in an online, time series setting, with the computation of each posterior update requiring only the data observed since the previous update.
Abstract: Variational Bayesian (VB) methods produce posterior inference in a time frame considerably smaller than traditional Markov chain Monte Carlo approaches. Although the VB posterior is an approximation, it has been shown to produce good parameter estimates and predicted values when a rich class of approximating distributions is considered. In this paper, we propose the use of recursive algorithms to update a sequence of VB posterior approximations in an online, time series setting, with the computation of each posterior update requiring only the data observed since the previous update. We show how importance sampling can be incorporated into online variational inference, allowing the user to trade accuracy for a substantial increase in computational speed. The proposed methods and their properties are detailed in two separate simulation studies. Additionally, two empirical illustrations are provided, including one where a Dirichlet process mixture model with a novel posterior dependence structure is repeatedly updated in the context of predicting the future behaviour of vehicles on a stretch of US Highway 101.
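The recursive idea, in which each posterior update touches only the data observed since the previous update, can be illustrated in the simplest conjugate setting (a Gaussian mean with known observation variance). The paper applies the same recursion to VB approximations rather than exact posteriors; the function below is an illustrative assumption:

```python
def online_update(mu, tau2, batch, sigma2=1.0):
    """One recursive posterior update for a Gaussian mean with known
    observation variance sigma2: the previous posterior N(mu, tau2)
    acts as the prior, and only the new batch of data is used."""
    n = len(batch)
    prec = 1.0 / tau2 + n / sigma2                  # posterior precision
    mu_new = (mu / tau2 + sum(batch) / sigma2) / prec
    return mu_new, 1.0 / prec
```

In this conjugate case the recursion is exact: updating with two batches in sequence gives the same posterior as a single update with all the data, which is the property the VB recursion approximates.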