
Showing papers in "Statistics and Computing in 2022"


Journal ArticleDOI
TL;DR: In this paper, a new methodology for constrained parsimonious model-based clustering is introduced, in which a tuning parameter controls the strength of the constraints; the constraints yield mathematically well-defined problems and help prevent spurious solutions.
Abstract: A new methodology for constrained parsimonious model-based clustering is introduced, in which a tuning parameter controls the strength of these constraints. The methodology includes, as limit cases, the 14 parsimonious models that are often applied in model-based clustering with normal components. This is done in a natural way by filling the gap among models and providing a smooth transition among them. The methodology yields mathematically well-defined problems and also helps prevent spurious solutions. Novel information criteria are proposed to guide the user in choosing the tuning parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application to COVID data.

6 citations
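Constraints of this kind are commonly expressed as a bound on the ratio between the largest and smallest eigenvalues of the component covariance matrices, with the tuning parameter controlling the bound. A minimal sketch of such a restriction follows; the function name `constrain_covariance` and the simple one-sided clipping rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def constrain_covariance(cov, c=10.0):
    """Enforce an eigenvalue-ratio constraint max(eig)/min(eig) <= c
    by clipping small eigenvalues upward (a simplified stand-in for
    the truncation used in constrained model-based clustering)."""
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    lo = vals.max() / c                # smallest eigenvalue allowed
    vals = np.clip(vals, lo, None)
    # Reconstruct the covariance from the adjusted spectrum.
    return (vecs * vals) @ vecs.T
```

As c grows the constraint vanishes (fully heteroscedastic components); as c shrinks toward 1 the covariances are forced toward sphericity, which is the sense in which such a parameter interpolates among parsimonious models.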


Journal ArticleDOI
TL;DR: In this article, a split point procedure based on the explicit likelihood was proposed to save time when searching for the best split for a given splitting variable, and a simulation study was performed to assess the computational gain when building GLM trees.
Abstract: Classification and regression trees (CART) prove to be a true alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from a biased variable selection issue, they are commonly applied to various topics and used for tree ensembles and random forests because of their simplicity and computation speed. Conditional inference tree and model-based tree algorithms, in which variable selection is tackled via fluctuation tests, are known to give more accurate and interpretable results than CART, but yield longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also propose a benchmark on simulated and empirical datasets of GLM trees against CART, conditional inference trees and LM trees in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees possible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random-forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.

3 citations
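A split point search driven by an explicit likelihood can be sketched as follows for an intercept-only Poisson model fitted on each side of a candidate split (the intercept-only simplification, where the MLE is just the sample mean, and the function names are assumptions for illustration, not the paper's actual procedure):

```python
import numpy as np

def poisson_loglik(y):
    """Log-likelihood of an intercept-only Poisson model at its MLE
    (the sample mean), dropping the constant log(y!) terms."""
    mu = y.mean()
    if mu == 0:
        return 0.0
    return np.sum(y * np.log(mu) - mu)

def best_split(x, y):
    """Scan candidate thresholds of one splitting variable and return
    the one maximising the summed left/right Poisson log-likelihood."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best, best_ll = None, -np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                     # no valid threshold between ties
        ll = poisson_loglik(y[:i]) + poisson_loglik(y[i:])
        if ll > best_ll:
            best, best_ll = (x[i - 1] + x[i]) / 2, ll
    return best, best_ll
```

Because the per-side MLE is available in closed form, each candidate threshold costs only two mean computations rather than an iteratively reweighted least-squares fit, which is the source of the computational gain the abstract describes.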


Journal ArticleDOI
TL;DR: In this article, a prior probability model for temporal Poisson process intensities is developed through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters.
Abstract: We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term, resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and is illustrated with synthetic and real data examples.

1 citation
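A structured Erlang mixture with a common scale parameter, as described above, can be evaluated in a few lines. In this sketch the weights are passed in directly, whereas the paper constructs them from increments of a gamma-process cumulative intensity; since they weight an intensity rather than a density, they need not sum to one:

```python
import math

def erlang_mixture_intensity(t, weights, scale):
    """Evaluate lambda(t) = sum_j w_j * Erlang(t; shape=j, scale)
    for integer shapes j = 1..J sharing a single scale parameter."""
    total = 0.0
    for j, w in enumerate(weights, start=1):
        # Erlang density: t^(j-1) e^(-t/scale) / (scale^j (j-1)!)
        dens = (t ** (j - 1) * math.exp(-t / scale)
                / (scale ** j * math.factorial(j - 1)))
        total += w * dens
    return total
```

With a single weight the mixture reduces to an Exponential(scale) intensity; adding higher integer shapes lets the mixture place mass later in time, which is how the construction supports general intensity shapes.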




DOI
TL;DR: An order-free, scalable multivariate Bayesian modelling approach is proposed to smooth the mortality (or incidence) risks of several diseases simultaneously; it permits the analysis of big data sets and provides better results than a single multivariate model.

1 citation


Journal ArticleDOI
TL;DR: In this article, a generalized version of the Metropolis-adjusted Langevin algorithm (MALA) is proposed for Bayesian logistic regression, and the authors theoretically study the efficiency of the sampler by making use of the local and global balance concepts introduced in Zanella.
Abstract: We introduce a generalized version of the Metropolis-adjusted Langevin algorithm (MALA). The informed proposal distribution of this new sampler features two tuning parameters: the usual step size parameter σ² and an interpolation parameter γ that may be adjusted to accommodate the dimension of the target distribution. We theoretically study the efficiency of the sampler by making use of the local- and global-balance concepts introduced in Zanella (JASA 115:852–865, 2020) and provide efficient tuning guidelines that work well with a variety of target distributions. Although the usual MALA (γ = 1) is shown to be optimal for infinite-dimensional targets, in practice, the generalized MALA (1 < γ ≤ 2) remains the most appealing option, even in high-dimensional contexts. Simulation studies and numerical experiments are presented to illustrate our findings. We apply the new sampler to a Bayesian logistic regression context and show that its efficiency compares favourably to competing algorithms.
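The standard MALA that this paper generalizes (its γ = 1 case) can be sketched as follows; the generalized proposal itself is not reproduced here, and the function signature is an illustrative assumption:

```python
import numpy as np

def mala(grad_log_p, log_p, x0, sigma2=0.5, n_iter=5000, rng=None):
    """Standard Metropolis-adjusted Langevin algorithm (the gamma = 1
    case of the generalized sampler).  The proposal drifts along the
    gradient of the log target before adding Gaussian noise."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        mean_x = x + 0.5 * sigma2 * grad_log_p(x)
        y = mean_x + np.sqrt(sigma2) * rng.standard_normal(x.shape)
        mean_y = y + 0.5 * sigma2 * grad_log_p(y)
        # Metropolis-Hastings correction for the asymmetric proposal
        # (Gaussian normalising constants cancel, so they are dropped).
        log_q_xy = -np.sum((y - mean_x) ** 2) / (2 * sigma2)
        log_q_yx = -np.sum((x - mean_y) ** 2) / (2 * sigma2)
        if np.log(rng.uniform()) < log_p(y) - log_p(x) + log_q_yx - log_q_xy:
            x = y
        samples.append(x.copy())
    return np.array(samples)
```

The abstract's point is that the extra interpolation parameter γ modifies this informed proposal so that the drift can be tempered to suit the target's dimension, with tuning guided by the local- and global-balance framework.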


Journal ArticleDOI
TL;DR: In this article, the authors propose to accelerate Hamiltonian and Lagrangian Monte Carlo algorithms by coupling them with Gaussian processes for emulation of the log unnormalised posterior distribution. They provide proofs of detailed balance with respect to the exact posterior distribution for these algorithms and validate the correctness of the samplers' implementation by Geweke consistency tests.
Abstract: We propose to accelerate Hamiltonian and Lagrangian Monte Carlo algorithms by coupling them with Gaussian processes for emulation of the log unnormalised posterior distribution. We provide proofs of detailed balance with respect to the exact posterior distribution for these algorithms, and validate the correctness of the samplers’ implementation by Geweke consistency tests. We implement these algorithms in a delayed acceptance (DA) framework, and investigate whether the DA scheme can offer computational gains over the standard algorithms. A comparative evaluation study is carried out to assess the performance of the methods on a series of models described by differential equations, including a real-world application of a 1D fluid-dynamics model of the pulmonary blood circulation. The aim is to identify the algorithm which gives the best trade-off between accuracy and computational efficiency, to be used in nonlinear DE models, which are computationally onerous due to repeated numerical integrations in a Bayesian analysis. Results showed no advantage of the DA scheme over the standard algorithms with respect to several efficiency measures based on the effective sample size for most methods and DE models considered. These gradient-driven algorithms register a high acceptance rate, thus the number of expensive forward model evaluations is not significantly reduced by the first emulator-based stage of DA. Additionally, the Lagrangian Dynamical Monte Carlo and Riemann Manifold Hamiltonian Monte Carlo tended to register the highest efficiency (in terms of effective sample size normalised by the number of forward model evaluations), followed by the Hamiltonian Monte Carlo, and the No U-turn sampler tended to be the least efficient.
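The delayed acceptance scheme evaluated here follows the usual two-stage pattern: a cheap surrogate screens each candidate before the expensive exact posterior is evaluated. A minimal sketch with a symmetric proposal, where `log_p_cheap` stands in for the GP emulator (names and signature are illustrative assumptions), is:

```python
import numpy as np

def da_step(x, log_p, log_p_cheap, proposal, rng):
    """One delayed-acceptance Metropolis step with a symmetric proposal."""
    y = proposal(x, rng)
    # Stage 1: screen the candidate with the cheap surrogate.
    a1 = min(1.0, np.exp(log_p_cheap(y) - log_p_cheap(x)))
    if rng.uniform() >= a1:
        return x, False          # rejected without an expensive evaluation
    # Stage 2: correct with the exact target; the surrogate ratio is
    # divided back out so detailed balance holds for the true posterior.
    a2 = min(1.0, np.exp((log_p(y) - log_p(x))
                         - (log_p_cheap(y) - log_p_cheap(x))))
    return (y, True) if rng.uniform() < a2 else (x, False)
```

The abstract's negative finding is visible in this structure: stage 1 only saves work on candidates it rejects, so when the gradient-driven samplers already accept most proposals, nearly every step still pays for the expensive stage-2 evaluation.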

Journal ArticleDOI
TL;DR: In this paper, the authors proposed the use of recursive algorithms to update a sequence of VB posterior approximations in an online, time series setting, with the computation of each posterior update requiring only the data observed since the previous update.
Abstract: Variational Bayesian (VB) methods produce posterior inference in a time frame considerably smaller than traditional Markov chain Monte Carlo approaches. Although the VB posterior is an approximation, it has been shown to produce good parameter estimates and predicted values when a rich class of approximating distributions is considered. In this paper, we propose the use of recursive algorithms to update a sequence of VB posterior approximations in an online, time series setting, with the computation of each posterior update requiring only the data observed since the previous update. We show how importance sampling can be incorporated into online variational inference, allowing the user to trade accuracy for a substantial increase in computational speed. The proposed methods and their properties are detailed in two separate simulation studies. Additionally, two empirical illustrations are provided, including one where a Dirichlet process mixture model with a novel posterior dependence structure is repeatedly updated in the context of predicting the future behaviour of vehicles on a stretch of US Highway 101.
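The recursive idea, in which each posterior update touches only the data observed since the previous update, can be illustrated in the simplest conjugate setting (a Gaussian mean with known observation variance). The paper applies the same recursion to VB approximations rather than exact posteriors; the function below is an illustrative assumption:

```python
def online_update(mu, tau2, batch, sigma2=1.0):
    """One recursive posterior update for a Gaussian mean with known
    observation variance sigma2: the previous posterior N(mu, tau2)
    acts as the prior, and only the new batch of data is used."""
    n = len(batch)
    prec = 1.0 / tau2 + n / sigma2                  # posterior precision
    mu_new = (mu / tau2 + sum(batch) / sigma2) / prec
    return mu_new, 1.0 / prec
```

In this conjugate case the recursion is exact: updating with two batches in sequence gives the same posterior as a single update with all the data, which is the property the VB recursion approximates.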