
Showing papers in "Statistics and Computing in 2019"


Journal ArticleDOI
TL;DR: The authors showed that using predictive instead of in-sample goodness-of-fit scores improves the speed of Bayesian network structure learning and improves the accuracy of network reconstruction as well, as previously observed by Chickering and Heckerman.
Abstract: Learning the structure of Bayesian networks from data is known to be a computationally challenging, NP-hard problem. The literature has long investigated how to perform structure learning from data containing large numbers of variables, following a general interest in high-dimensional applications (“small n, large p”) in systems biology and genetics. More recently, data sets with large numbers of observations (the so-called “big data”) have become increasingly common; and these data sets are not necessarily high-dimensional, sometimes having only a few tens of variables depending on the application. We revisit the computational complexity of Bayesian network structure learning in this setting, showing that the common choice of measuring it with the number of estimated local distributions leads to unrealistic time complexity estimates for the most common class of score-based algorithms, greedy search. We then derive more accurate expressions under common distributional assumptions. These expressions suggest that the speed of Bayesian network learning can be improved by taking advantage of the availability of closed-form estimators for local distributions with few parents. Furthermore, we find that using predictive instead of in-sample goodness-of-fit scores improves speed; and we confirm that it improves the accuracy of network reconstruction as well, as previously observed by Chickering and Heckerman (Stat Comput 10: 55–62, 2000). We demonstrate these results on large real-world environmental and epidemiological data; and on reference data sets available from public repositories.

73 citations


Journal ArticleDOI
TL;DR: In empirical tests the new dynamic nested sampling method significantly improves calculation accuracy compared to standard nested sampling with the same number of samples; this increase in accuracy is equivalent to speeding up the computation by factors of up to 72 for parameter estimation and 7 for evidence calculations.
Abstract: We introduce dynamic nested sampling: a generalisation of the nested sampling algorithm in which the number of “live points” varies to allocate samples more efficiently. In empirical tests the new method significantly improves calculation accuracy compared to standard nested sampling with the same number of samples; this increase in accuracy is equivalent to speeding up the computation by factors of up to $$\sim 72$$ for parameter estimation and $$\sim 7$$ for evidence calculations. We also show that the accuracy of both parameter estimation and evidence calculations can be improved simultaneously. In addition, unlike in standard nested sampling, more accurate results can be obtained by continuing the calculation for longer. Popular standard nested sampling implementations can be easily adapted to perform dynamic nested sampling, and several dynamic nested sampling software packages are now publicly available.

73 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative log-posterior gradient estimate for stochastic gradient MCMC is proposed, which uses control variates to reduce the variance of the gradient estimate; a separate zero-variance control-variate postprocessing step further reduces the variance of the MCMC output.
Abstract: It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. A popular class of methods for solving this issue is stochastic gradient MCMC (SGMCMC). These methods use a noisy estimate of the gradient of the log-posterior, which reduces the per iteration computational cost of the algorithm. Despite this, there are a number of results suggesting that stochastic gradient Langevin dynamics (SGLD), probably the most popular of these methods, still has computational cost proportional to the dataset size. We suggest an alternative log-posterior gradient estimate for stochastic gradient MCMC which uses control variates to reduce the variance. We analyse SGLD using this gradient estimate, and show that, under log-concavity assumptions on the target distribution, the computational cost required for a given level of accuracy is independent of the dataset size. Next, we show that a different control-variate technique, known as zero variance control variates, can be applied to SGMCMC algorithms for free. This postprocessing step improves the inference of the algorithm by reducing the variance of the MCMC output. Zero variance control variates rely on the gradient of the log-posterior; we explore how the variance reduction is affected by replacing this with the noisy gradient estimate calculated by SGMCMC.
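
As a rough illustration of the control-variate gradient estimate described above, the sketch below runs SGLD-CV on a toy Gaussian mean model. The model, step size and reference point theta_hat are illustrative choices rather than the paper's setup; for this conjugate model the control variate happens to remove the subsampling noise entirely, which keeps the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x_i ~ N(2, 1); the target is the posterior of the mean theta
# under a N(0, 10^2) prior, so the posterior is essentially N(mean(x), 1/N).
N = 100_000
x = rng.normal(2.0, 1.0, size=N)

def grad_log_prior(theta):
    return -theta / 100.0

def grad_log_lik(theta, xi):          # x_i | theta ~ N(theta, 1)
    return xi - theta

# Fixed reference point (e.g. a rough posterior mode) and its full-data
# gradient, both computed once before sampling starts.
theta_hat = x.mean()
full_grad_hat = grad_log_lik(theta_hat, x).sum()

def sgld_cv(theta0, n_iter=5000, batch=100, step=2e-6):
    """SGLD with a control-variate gradient estimate (illustrative sketch)."""
    theta, out = theta0, []
    for _ in range(n_iter):
        idx = rng.integers(0, N, size=batch)
        # Control variate: full gradient at theta_hat plus a subsampled correction.
        correction = (N / batch) * (grad_log_lik(theta, x[idx])
                                    - grad_log_lik(theta_hat, x[idx])).sum()
        grad = grad_log_prior(theta) + full_grad_hat + correction
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.normal()
        out.append(theta)
    return np.array(out)

samples = sgld_cv(theta_hat)
print(samples.mean(), samples.std())   # compare with x.mean() and 1/sqrt(N)
```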

72 citations


Journal ArticleDOI
TL;DR: In this work, deep Gaussian mixture models (DGMM) are introduced and discussed, a network of multiple layers of latent variables where, at each layer, the variables follow a mixture of Gaussian distributions.
Abstract: Deep learning is a hierarchical inference method formed by subsequent multiple layers of learning able to more efficiently describe complex relationships. In this work, deep Gaussian mixture models (DGMM) are introduced and discussed. A DGMM is a network of multiple layers of latent variables, where, at each layer, the variables follow a mixture of Gaussian distributions. Thus, the deep mixture model consists of a set of nested mixtures of linear models, which globally provide a nonlinear model able to describe the data in a very flexible way. In order to avoid overparameterized solutions, dimension reduction by factor models can be applied at each layer of the architecture, thus resulting in deep mixtures of factor analyzers.
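
To make the nested-mixture structure concrete, here is a minimal generative sketch of a two-layer deep Gaussian mixture: at each layer a component is drawn and the latent variable from the layer below is pushed through that component's linear-Gaussian map. Dimensions, weights and loadings are invented for illustration, and a shared isotropic noise level is used for brevity instead of component-specific covariances.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dgmm(n, layers):
    """Draw n samples from a deep Gaussian mixture model (illustrative sketch).

    `layers` lists, from the deepest layer to the observed one, tuples of
    (weights, means, loadings, noise_sd); each layer applies a randomly chosen
    linear-Gaussian transformation to the latent variable below it.
    """
    d_latent = layers[0][2].shape[2]          # dimension of the deepest latent z
    z = rng.normal(size=(n, d_latent))        # z ~ N(0, I) at the bottom
    for weights, means, loadings, noise_sd in layers:
        comp = rng.choice(len(weights), size=n, p=weights)
        z = (means[comp]
             + np.einsum('nij,nj->ni', loadings[comp], z)
             + noise_sd * rng.normal(size=(n, means.shape[1])))
    return z

# Two layers, latent dim 2 -> 2 -> observed dim 2, two components per layer.
layer = (np.array([0.5, 0.5]),                       # mixture weights
         np.array([[-3.0, 0.0], [3.0, 0.0]]),        # component means
         np.stack([np.eye(2), 0.5 * np.eye(2)]),     # component loading matrices
         0.3)                                        # isotropic noise sd
y = sample_dgmm(5000, [layer, layer])
print(y.mean(axis=0), y.std(axis=0))
```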

70 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics.
Abstract: We study connections between ordinary differential equation (ODE) solvers and probabilistic regression methods in statistics. We provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics. Adding to this picture, we show that several multistep methods of Nordsieck form can be recast as Kalman filtering on q-times integrated Wiener processes. Doing so provides a family of IVP solvers that return a Gaussian posterior measure, rather than a point estimate. We show that some such methods have low computational overhead, nontrivial convergence order, and that the posterior has a calibrated concentration rate. Additionally, we suggest a step size adaptation algorithm which completes the proposed method to a practically useful implementation, which we experimentally evaluate using a representative set of standard codes in the DETEST benchmark set.
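
For readers who want to see what "Kalman filtering on q-times integrated Wiener processes" amounts to in practice, the sketch below builds the standard discrete-time transition and process-noise matrices of that prior for a step size h. This is the textbook construction of the prior, not the paper's solver code.

```python
import numpy as np
from math import factorial

def iwp_transition(q, h, sigma2=1.0):
    """Transition matrix A and process-noise covariance Q of a q-times
    integrated Wiener process prior over (x, x', ..., x^(q)), for step h."""
    d = q + 1
    A = np.zeros((d, d))
    Q = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            if j >= i:
                A[i, j] = h ** (j - i) / factorial(j - i)
            power = 2 * q + 1 - i - j
            Q[i, j] = sigma2 * h ** power / (power * factorial(q - i) * factorial(q - j))
    return A, Q

A, Q = iwp_transition(q=2, h=0.1)
print(A)   # upper-triangular, Taylor-like propagation of x, x', x''
print(Q)   # process noise injected mainly into the highest derivative
```

A probabilistic IVP solver then alternates the usual Kalman prediction step with these matrices and an update on the pseudo-observation that the modelled derivative matches the ODE vector field.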

60 citations


Journal ArticleDOI
TL;DR: This article attempts to place the emergence of probabilistic numerics as a mathematical-statistical research field within its historical context and to explore how its gradual development can be related both to applications and to a modern formal treatment.
Abstract: This article attempts to place the emergence of probabilistic numerics as a mathematical–statistical research field within its historical context and to explore how its gradual development can be related both to applications and to a modern formal treatment. We highlight in particular the parallel contributions of Sul’din and Larkin in the 1960s and how their pioneering early ideas have reached a degree of maturity in the intervening period, mediated by paradigms such as average-case analysis and information-based complexity. We provide a subjective assessment of the state of research in probabilistic numerics and highlight some difficulties to be addressed by future works.

42 citations


Journal ArticleDOI
TL;DR: In this paper, the authors formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process regression with nonlinear measurement functions.
Abstract: We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with nonlinear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP—which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a nonlinear Bayesian filtering problem and all widely used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers that are formulated in terms of generating synthetic measurements of the gradient field come out as specific approximations. Based on the nonlinear Bayesian filtering problem posed in this paper, we develop novel Gaussian solvers for which we establish favourable stability properties. Additionally, non-Gaussian approximations to the filtering problem are derived by the particle filter approach. The resulting solvers are compared with other probabilistic solvers in illustrative experiments.
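
The sketch below illustrates the nonlinear-measurement formulation on the logistic ODE: a once-integrated Wiener process prior over (x, x'), with an extended-Kalman-style update on the residual z = x' − f(x), which is zero along the true solution. The ODE, step size and linearised update are illustrative choices and correspond to only one of the Gaussian approximations the paper considers.

```python
import numpy as np

def f(x):            # logistic ODE: x' = x (1 - x)
    return x * (1.0 - x)

def df(x):           # its derivative (scalar Jacobian)
    return 1.0 - 2.0 * x

h, T, sigma2, R = 0.05, 10.0, 1.0, 0.0
A = np.array([[1.0, h], [0.0, 1.0]])                         # IWP(1) transition
Q = sigma2 * np.array([[h**3 / 3, h**2 / 2], [h**2 / 2, h]]) # IWP(1) process noise

m = np.array([0.1, f(0.1)])      # state = (x, x'), initialised at the IVP
P = np.zeros((2, 2))
traj = [m[0]]
for _ in range(int(T / h)):
    # Predict under the Gauss-Markov prior.
    m, P = A @ m, A @ P @ A.T + Q
    # Measurement z = x' - f(x), observed as 0; linearise f for the EKF update.
    z = m[1] - f(m[0])
    H = np.array([[-df(m[0]), 1.0]])
    S = H @ P @ H.T + R
    K = P @ H.T / S
    m = m - (K * z).ravel()
    P = P - K @ H @ P
    traj.append(m[0])

print(traj[-1])      # should approach the logistic fixed point x = 1
```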

38 citations


Journal ArticleDOI
TL;DR: A new way to find clusters in large vectors of time series using a measure of similarity between two time series, the generalized cross correlation; the procedure can be applied to large data sets and is useful for finding groups in dynamic factor models.
Abstract: We present a new way to find clusters in large vectors of time series by using a measure of similarity between two time series, the generalized cross correlation. This measure compares the determinant of the correlation matrix until some lag k of the bivariate vector with those of the two univariate time series. A matrix of similarities among the series based on this measure is used as input of a clustering algorithm. The procedure is automatic, can be applied to large data sets and it is useful to find groups in dynamic factor models. The cluster method is illustrated with some Monte Carlo experiments and a real data example.
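
A hedged sketch of the determinant comparison behind the similarity measure: the code computes the determinant of the lagged correlation matrix of the bivariate vector and of each univariate series, and turns their ratio into a similarity in [0, 1]. The exact normalisation used in the paper may differ, so treat this purely as an illustration of the idea.

```python
import numpy as np

def lagged_corr_det(series, k):
    """Determinant of the correlation matrix of (x_t, x_{t-1}, ..., x_{t-k})
    stacked over all series passed in `series` (each a 1-D array)."""
    lagged = [s[k - j: len(s) - j] for s in series for j in range(k + 1)]
    return np.linalg.det(np.corrcoef(np.vstack(lagged)))

def gcc_similarity(x, y, k=3):
    """Determinant-ratio similarity between two series up to lag k.

    Values near 1 indicate strong linear dependence at lags 0..k, values near 0
    little cross-dependence. The normalisation is illustrative and may differ
    from the definition in the paper."""
    num = lagged_corr_det([x, y], k)
    den = lagged_corr_det([x], k) * lagged_corr_det([y], k)
    return 1.0 - (num / den) ** (1.0 / (k + 1))

rng = np.random.default_rng(2)
e = rng.normal(size=500)
x = np.convolve(e, [1.0, 0.8], mode='same')        # two series driven by a common shock
y = np.convolve(e, [0.5, 1.0], mode='same') + 0.1 * rng.normal(size=500)
z = rng.normal(size=500)                           # an independent series
print(gcc_similarity(x, y), gcc_similarity(x, z))  # the first should be larger
```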

32 citations


Journal ArticleDOI
TL;DR: This work proposes to apply an additional smoothing operation to the Voronoi estimator, based on resampling the point pattern by independent random thinning, and shows that the resample-smoothing technique improves the estimation substantially.
Abstract: Voronoi estimators are non-parametric and adaptive estimators of the intensity of a point process. The intensity estimate at a given location is equal to the reciprocal of the size of the Voronoi/Dirichlet cell containing that location. Their major drawback is that they tend to paradoxically under-smooth the data in regions where the point density of the observed point pattern is high, and over-smooth where the point density is low. To remedy this behaviour, we propose to apply an additional smoothing operation to the Voronoi estimator, based on resampling the point pattern by independent random thinning. Through a simulation study we show that our resample-smoothing technique improves the estimation substantially. In addition, we study statistical properties such as unbiasedness and variance, and propose a rule-of-thumb and a data-driven cross-validation approach to choose the amount of smoothing to apply. Finally we apply our proposed intensity estimation scheme to two datasets: locations of pine saplings (planar point pattern) and motor vehicle traffic accidents (linear network point pattern).
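
A one-dimensional toy version of the resample-smoothing idea is easy to write down, since in 1D a point's Voronoi cell is the interval between the midpoints to its neighbours: thin the pattern with retention probability p, compute the Voronoi intensity estimate of the thinned pattern, rescale by 1/p, and average over independent thinnings. The paper works with planar and linear-network patterns, so this is only an illustration of the mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)

def voronoi_intensity_1d(points, grid, lo, hi):
    """Voronoi intensity estimate on [lo, hi]: 1 / (length of the cell containing x)."""
    pts = np.sort(points)
    mids = (pts[:-1] + pts[1:]) / 2
    edges = np.concatenate(([lo], mids, [hi]))      # cell boundaries
    lengths = np.diff(edges)
    cell = np.searchsorted(edges, grid, side='right') - 1
    cell = np.clip(cell, 0, len(lengths) - 1)
    return 1.0 / lengths[cell]

def resample_smoothed(points, grid, lo, hi, p=0.2, n_thin=200):
    """Average Voronoi estimates over independent p-thinnings, rescaled by 1/p."""
    est = np.zeros_like(grid)
    for _ in range(n_thin):
        keep = points[rng.random(len(points)) < p]
        if len(keep) > 1:
            est += voronoi_intensity_1d(keep, grid, lo, hi) / p
    return est / n_thin

# Inhomogeneous Poisson process on [0, 1] with intensity 200 * x, by thinning.
cand = rng.uniform(0, 1, rng.poisson(200))
points = cand[rng.random(len(cand)) < cand]         # retain with probability x
grid = np.linspace(0.05, 0.95, 19)
print(np.round(resample_smoothed(points, grid, 0, 1), 1))  # roughly follows 200 * x
```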

32 citations


Journal ArticleDOI
TL;DR: In this paper, a penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge.
Abstract: Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality. The general methodology for model-based clustering with sparse covariance matrices is implemented in the R package mixggm, available on CRAN.

30 citations


Journal ArticleDOI
TL;DR: In this article, a trimmed k-barycenter is proposed to trim the most discrepant distributions, which results in a gain in stability and robustness, highly convenient in this setting.
Abstract: A robust clustering method for probabilities in Wasserstein space is introduced. This new ‘trimmed k-barycenters’ approach relies on recent results on barycenters in Wasserstein space that allow intensive computation, as required by clustering algorithms to be feasible. The possibility of trimming the most discrepant distributions results in a gain in stability and robustness, highly convenient in this setting. As a remarkable application, we consider a parallelized clustering setup in which each of m units processes a portion of the data, producing a clustering report, encoded as k probabilities. We prove that the trimmed k-barycenter of the $$m\times k$$ reports produces a consistent aggregation which we consider the result of a ‘wide consensus’. We also prove that a weighted version of trimmed k-means algorithms based on k-barycenters in Wasserstein space keeps the descending character of the concentration step, guaranteeing convergence to local minima. We illustrate the methodology with simulated and real data examples. These include clustering populations by age distributions and analysis of cytometric data.

Journal ArticleDOI
TL;DR: A maximum penalized likelihood method for learning Bayesian networks from discrete or categorical data, which models the conditional distribution of a node given its parents by multi-logit regression instead of the commonly used multinomial distribution.
Abstract: Bayesian networks, with structure given by a directed acyclic graph (DAG), are a popular class of graphical models. However, learning Bayesian networks from discrete or categorical data is particularly challenging, due to the large parameter space and the difficulty in searching for a sparse structure. In this article, we develop a maximum penalized likelihood method to tackle this problem. Instead of the commonly used multinomial distribution, we model the conditional distribution of a node given its parents by multi-logit regression, in which an edge is parameterized by a set of coefficient vectors with dummy variables encoding the levels of a node. To obtain a sparse DAG, a group norm penalty is employed, and a blockwise coordinate descent algorithm is developed to maximize the penalized likelihood subject to the acyclicity constraint of a DAG. When interventional data are available, our method constructs a causal network, in which a directed edge represents a causal relation. We apply our method to various simulated and real data sets. The results show that our method is very competitive, compared to many existing methods, in DAG estimation from both interventional and high-dimensional observational data.

Journal ArticleDOI
TL;DR: In this paper, the authors extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. (2017), to establish mean-square convergence in the uniform norm on discrete-or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows.
Abstract: Probabilistic integration of a continuous dynamical system is a way of systematically introducing discretisation error, at scales no larger than errors introduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. (Stat Comput 27(4):1065–1082, 2017. https://doi.org/10.1007/s11222-016-9671-0), to establish mean-square convergence in the uniform norm on discrete- or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows. Specifically, we show that randomised high-order integrators for globally Lipschitz flows and randomised Euler integrators for dissipative vector fields with polynomially bounded local Lipschitz constants all have the same mean-square convergence rate as their deterministic counterparts, provided that the variance of the integration noise is not of higher order than the corresponding deterministic integrator. These and similar results are proven for probabilistic integrators where the random perturbations may be state-dependent, non-Gaussian, or non-centred random variables.

Journal ArticleDOI
TL;DR: A very efficient approach to the Monte Carlo estimation of the expected value of partial perfect information (EVPPI) that measures the average benefit of knowing the value of a subset of uncertain parameters involved in a decision model.
Abstract: In this paper, we develop a very efficient approach to the Monte Carlo estimation of the expected value of partial perfect information (EVPPI) that measures the average benefit of knowing the value of a subset of uncertain parameters involved in a decision model. The calculation of EVPPI is inherently a nested expectation problem, with an outer expectation with respect to one random variable X and an inner conditional expectation with respect to the other random variable Y. We tackle this problem by using a multilevel Monte Carlo (MLMC) method (Giles in Oper Res 56(3): 607–617, 2008) in which the number of inner samples for Y increases geometrically with level, so that the accuracy of estimating the inner conditional expectation improves and the cost also increases with level. We construct an antithetic MLMC estimator and provide sufficient assumptions on a decision model under which the antithetic property of the estimator is well exploited, and consequently a root-mean-square accuracy of $${\varepsilon }$$ can be achieved at a cost of $$O({\varepsilon }^{-2})$$ . Numerical results confirm the considerable computational savings compared to the standard, nested Monte Carlo method for some simple test cases and a more realistic medical application.
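
To make the nested-expectation structure explicit, the sketch below computes the plain nested Monte Carlo estimate of EVPPI for an invented two-decision toy model (net benefits X + Y and 0, with X and Y independent standard normals, so the true EVPPI is E[max(X, 0)] ≈ 0.399). The paper's contribution is the antithetic MLMC estimator that reaches a given accuracy far more cheaply; that construction is not shown here.

```python
import numpy as np

rng = np.random.default_rng(4)

def net_benefit(d, x, y):
    """Toy decision model: two options with net benefits x + y and 0."""
    return x + y if d == 1 else np.zeros_like(x)

def evppi_nested_mc(n_outer=2000, n_inner=500):
    """Standard nested Monte Carlo estimate of EVPPI for parameter X."""
    x = rng.normal(size=n_outer)
    y = rng.normal(size=(n_outer, n_inner))
    # Inner expectation over Y for each outer sample of X, for each decision.
    cond = np.stack([net_benefit(d, x[:, None], y).mean(axis=1) for d in (0, 1)])
    value_with_info = np.maximum(cond[0], cond[1]).mean()
    # Baseline: best decision without learning X.
    value_without = max(net_benefit(d, x, y[:, 0]).mean() for d in (0, 1))
    return value_with_info - value_without

# True value for this toy model is E[max(X, 0)] = 1/sqrt(2*pi), about 0.399.
print(evppi_nested_mc())
```

The cost of this plain estimator grows quickly with the target accuracy precisely because of the inner loop, which is what the antithetic MLMC construction in the paper avoids.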

Journal ArticleDOI
TL;DR: This paper shows how the previously proposed MALA method can be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler and explores how irreversibility can increase the efficiency of the samplers in different situations.
Abstract: In this paper, we propose irreversible versions of the Metropolis–Hastings (MH) and Metropolis-adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but with the demonstrated benefits of irreversibility. We then show how the previously proposed MALA method can also be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler. Our experiments explore how irreversibility can increase the efficiency of the samplers in different situations.

Journal ArticleDOI
TL;DR: A case study showing how to do Gibbs sampling in a fully data-parallel manner on a graphics processing unit, for a large class of exchangeable models that admit latent variable representations; the implementation is demonstrated on a Horseshoe Probit regression model and scales to thousands of predictors and millions of data points.
Abstract: Gibbs sampling is a widely used Markov chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. Many implementations of MCMC methods do not extend easily to parallel computing environments, as their inherently sequential nature incurs a large synchronization cost. In the case study illustrated by this paper, we show how to do Gibbs sampling in a fully data-parallel manner on a graphics processing unit, for a large class of exchangeable models that admit latent variable representations. Our approach takes a systems perspective, with emphasis placed on efficient use of compute hardware. We demonstrate our method on a Horseshoe Probit regression model and find that our implementation scales effectively to thousands of predictors and millions of data points simultaneously.
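
The data-parallel step can be sketched with the familiar latent-variable Gibbs sampler for probit regression: each iteration draws one truncated-normal latent variable per observation, and these draws are independent given the coefficients, so they vectorise across the whole dataset (the part that maps onto a GPU). NumPy stands in for GPU code here, and a plain Gaussian prior replaces the Horseshoe prior of the paper to keep the sketch short.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

# Simulated probit data.
n, p = 5000, 10
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

tau2 = 10.0                                        # prior: beta ~ N(0, tau2 * I)
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)      # fixed conditional covariance
L = np.linalg.cholesky(V)

beta = np.zeros(p)
for it in range(500):
    # Data-parallel step: one truncated-normal latent variable per observation,
    # drawn for all n observations at once (the GPU-friendly part).
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)            # z > 0 if y = 1
    hi = np.where(y == 1, np.inf, -mu)             # z < 0 if y = 0
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # Conjugate update of the regression coefficients.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)

print(np.round(beta, 2))
print(np.round(beta_true, 2))
```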

Journal ArticleDOI
TL;DR: The proposed algorithm divides the big dataset into smaller subsets and provides a simple method to aggregate the subset posteriors to approximate the full-data posterior; it is coined “Double-Parallel Monte Carlo.”
Abstract: This paper proposes a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The proposed algorithm divides the big dataset into smaller subsets and provides a simple method to aggregate the subset posteriors to approximate the full-data posterior. To further speed up computation, the proposed algorithm employs the population stochastic approximation Monte Carlo algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Since this algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is coined "Double-Parallel Monte Carlo." The validity of the proposed algorithm is justified mathematically and numerically.
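
The divide-and-aggregate structure can be illustrated on a conjugate Gaussian-mean model, where each subset posterior can be sampled exactly (standing in for a parallel MCMC run such as pop-SAMC) and a simple recentre-and-rescale step plays the role of the aggregation. The aggregation rule below is an illustrative stand-in that relies on the conjugacy of this toy model; it is not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(6)

# Full data: x_i ~ N(theta, 1) with a flat prior on theta, so the full
# posterior of theta is N(mean(x), 1/N).
N, K = 100_000, 10
x = rng.normal(1.5, 1.0, size=N)
subsets = np.array_split(x, K)

def subset_posterior_samples(xk, n_samp=2000):
    """Exact draws from the posterior given one subset (stand-in for a
    parallel MCMC run on that subset)."""
    return rng.normal(xk.mean(), 1.0 / np.sqrt(len(xk)), size=n_samp)

draws = np.array([subset_posterior_samples(s) for s in subsets])   # (K, n_samp)

# Aggregation sketch: recentre each subset chain at the average of the subset
# means and shrink the spread by sqrt(K), mimicking the full posterior here.
centre = draws.mean(axis=1, keepdims=True)
approx_full = (centre.mean() + (draws - centre) / np.sqrt(K)).ravel()

print(approx_full.mean(), approx_full.std())   # compare with:
print(x.mean(), 1.0 / np.sqrt(N))
```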

Journal ArticleDOI
TL;DR: A new method to estimate large-scale multivariate normal probabilities that combines a hierarchical representation with processing of the covariance matrix to decompose the n-dimensional problem into a sequence of smaller m-dimensional ones, together with an inexpensive block reordering strategy that provides improved accuracy in the overall probability computation.
Abstract: This paper presents a new method to estimate large-scale multivariate normal probabilities. The approach combines a hierarchical representation with processing of the covariance matrix that decomposes the n-dimensional problem into a sequence of smaller m-dimensional ones. It also includes a d-dimensional conditioning method that further decomposes the m-dimensional problems into smaller d-dimensional problems. The resulting two-level hierarchical-block conditioning method requires Monte Carlo simulations to be performed only in d dimensions, with $$d \ll n$$ , and allows the complexity of the algorithm’s major cost to be $$O(n \log n)$$ . The run-time cost of the method depends on two parameters, m and d, where m represents the diagonal block size and controls the sizes of the blocks of the covariance matrix that are replaced by low-rank approximations, and d allows a trade-off of accuracy for expensive computations in the evaluation of the probabilities of m-dimensional blocks. We also introduce an inexpensive block reordering strategy to provide improved accuracy in the overall probability computation. The downside of this method, as with other such conditioning approximations, is the absence of an internal estimate of its error to use in tuning the approximation. Numerical simulations on problems from 2D spatial statistics with dimensions up to 16,384 indicate that the algorithm achieves a $$1\%$$ error level and improves the run time over a one-level hierarchical Quasi-Monte Carlo method by a factor between 10 and 15.

Journal ArticleDOI
TL;DR: In this article, it is shown that the worst-case integration error in reproducing kernel Hilbert spaces of standard Monte Carlo methods with n random points decays as $$n^{-1/2}$$, and that re-weighting the random points, as in the Bayesian Monte Carlo method, can yield optimal approximation rates up to a logarithmic factor for Sobolev spaces on closed Riemannian manifolds.
Abstract: The worst case integration error in reproducing kernel Hilbert spaces of standard Monte Carlo methods with n random points decays as $$n^{-1/2}$$. However, the re-weighting of random points, as exemplified in the Bayesian Monte Carlo method, can sometimes be used to improve the convergence order. This paper contributes general theoretical results for Sobolev spaces on closed Riemannian manifolds, where we verify that such re-weighting yields optimal approximation rates up to a logarithmic factor. We also provide numerical experiments matching the theoretical results for some Sobolev spaces on the sphere $${\mathbb {S}}^2$$ and on the Grassmannian manifold $${\mathcal {G}}_{2,4}$$. Our theoretical findings also cover function spaces on more general sets such as the unit ball, the cube, and the simplex.

Journal ArticleDOI
TL;DR: In this article, Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics are considered as toroidal analogues of the Ornstein-Uhlenbeck process and their likelihood function is a product of transition densities with no analytical expression.
Abstract: We introduce stochastic models for continuous-time evolution of angles and develop their estimation. We focus on studying Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein–Uhlenbeck process. Their likelihood function is a product of transition densities with no analytical expression, but that can be calculated by solving the Fokker–Planck equation numerically through adequate schemes. We propose three approximate likelihoods that are computationally tractable: (i) a likelihood based on the stationary distribution; (ii) toroidal adaptations of the Euler and Shoji–Ozaki pseudo-likelihoods; (iii) a likelihood based on a specific approximation to the transition density of the wrapped normal process. A simulation study compares, in dimensions one and two, the approximate transition densities to the exact ones, and investigates the empirical performance of the approximate likelihoods. Finally, two diffusions are used to model the evolution of the backbone angles of the protein G (PDB identifier 1GB1) during a molecular dynamics simulation. The software package sdetorus implements the estimation methods and applications presented in the paper.
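
The kind of diffusion discussed above is easy to simulate directly: the sketch below runs Euler–Maruyama for the Langevin diffusion whose stationary law is a von Mises distribution, wrapping the angle back to [−π, π) after every step. Parameter values are illustrative, and this is not the sdetorus implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

mu, kappa, sigma = 0.0, 2.0, 1.0        # stationary law: von Mises(mu, kappa)

def drift(theta):
    # Langevin drift (sigma^2 / 2) * d/dtheta log p(theta) for the von Mises density.
    return -0.5 * sigma**2 * kappa * np.sin(theta - mu)

def wrap(theta):
    """Wrap angles to [-pi, pi)."""
    return np.mod(theta + np.pi, 2 * np.pi) - np.pi

def simulate(theta0=np.pi / 2, dt=0.01, n_steps=100_000):
    theta = np.empty(n_steps)
    theta[0] = theta0
    for t in range(1, n_steps):
        step = drift(theta[t - 1]) * dt + sigma * np.sqrt(dt) * rng.normal()
        theta[t] = wrap(theta[t - 1] + step)
    return theta

path = simulate()
# The circular mean of the path should be close to mu = 0.
print(np.angle(np.mean(np.exp(1j * path))))
```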

Journal ArticleDOI
TL;DR: A Bayesian nonparametric prior based on a mixture of B-spline distributions is specified and provides more accurate Monte Carlo estimates in terms of $$L_1$$-error and uniform coverage probabilities than the Bernstein polynomial prior.
Abstract: We present a new Bayesian nonparametric approach to estimating the spectral density of a stationary time series. A nonparametric prior based on a mixture of B-spline distributions is specified and can be regarded as a generalization of the Bernstein polynomial prior of Petrone (Scand J Stat 26:373–393, 1999a; Can J Stat 27:105–126, 1999b) and Choudhuri et al. (J Am Stat Assoc 99(468):1050–1059, 2004). Whittle’s likelihood approximation is used to obtain the pseudo-posterior distribution. This method allows for a data-driven choice of the number of mixture components and the location of knots. Posterior samples are obtained using a Metropolis-within-Gibbs Markov chain Monte Carlo algorithm, and mixing is improved using parallel tempering. We conduct a simulation study to demonstrate that for complicated spectral densities, the B-spline prior provides more accurate Monte Carlo estimates in terms of $$L_1$$ -error and uniform coverage probabilities than the Bernstein polynomial prior. We apply the algorithm to annual mean sunspot data to estimate the solar cycle. Finally, we demonstrate the algorithm’s ability to estimate a spectral density with sharp features, using real gravitational wave detector data from LIGO’s sixth science run, recoloured to match the Advanced LIGO target sensitivity.
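
The Whittle approximation that the pseudo-posterior is built on is short enough to show directly: compute the periodogram at the Fourier frequencies and score a candidate spectral density against it. The sketch below evaluates it for an AR(1) series whose true spectral density is known; the B-spline mixture prior and the Metropolis-within-Gibbs sampler are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(8)

def periodogram(x):
    """Periodogram at the positive Fourier frequencies 2*pi*j/n."""
    n = len(x)
    dft = np.fft.fft(x - x.mean())
    j = np.arange(1, (n - 1) // 2 + 1)
    return 2 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2 * np.pi * n)

def whittle_loglik(spec_dens, freqs, pgram):
    """Whittle log-likelihood of a candidate spectral density."""
    f = spec_dens(freqs)
    return -np.sum(np.log(f) + pgram / f)

def ar1_spec(phi, sigma2=1.0):
    return lambda w: sigma2 / (2 * np.pi * (1 - 2 * phi * np.cos(w) + phi**2))

# Simulate an AR(1) with phi = 0.7 and check that the Whittle likelihood
# prefers candidate spectral densities near the truth.
n, phi = 2048, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
freqs, pgram = periodogram(x)
for cand in (0.3, 0.5, 0.7, 0.9):
    print(cand, round(whittle_loglik(ar1_spec(cand), freqs, pgram), 1))
```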

Journal ArticleDOI
TL;DR: In this paper, the integrand is modeled as an instance of a Gaussian stochastic process parameterized by a constant mean and a covariance kernel defined by a scale parameter times a parameterized function specifying how the integrand values at two different points in the domain are related.
Abstract: Automatic cubatures approximate integrals to user-specified error tolerances. For high-dimensional problems, it is difficult to adaptively change the sampling pattern, but one can automatically determine the sample size, n, given a reasonable, fixed sampling pattern. We take this approach here using a Bayesian perspective. We postulate that the integrand is an instance of a Gaussian stochastic process parameterized by a constant mean and a covariance kernel defined by a scale parameter times a parameterized function specifying how the integrand values at two different points in the domain are related. These hyperparameters are inferred or integrated out using integrand values via one of three techniques: empirical Bayes, full Bayes, or generalized cross-validation. The sample size, n, is increased until the half-width of the credible interval for the Bayesian posterior mean is no greater than the error tolerance. The process outlined above typically requires a computational cost of $$O(N_{\text {opt}}n^3)$$, where $$N_{\text {opt}}$$ is the number of optimization steps required to identify the hyperparameters. Our innovation is to pair low discrepancy nodes with matching covariance kernels to lower the computational cost to $$O(N_{\text {opt}} n \log n)$$. This approach is demonstrated explicitly with rank-1 lattice sequences and shift-invariant kernels. Our algorithm is implemented in the Guaranteed Automatic Integration Library (GAIL).

Journal ArticleDOI
TL;DR: In this article, a structure selection method for high-dimensional (€ d > 100€ ) sparse vine copulas is proposed, which uses a connection between the vine and structural equation models.
Abstract: We propose a novel structure selection method for high-dimensional ($$d > 100$$) sparse vine copulas. Current sequential greedy approaches for structure selection require calculating spanning trees in hundreds of dimensions and fitting the pair copulas and their parameters iteratively throughout the structure selection process. Our method uses a connection between the vine and structural equation models. The latter can be estimated very fast using the Lasso, also in very high dimensions, to obtain sparse models. Thus, we obtain a structure estimate independently of the chosen pair copulas and parameters. Additionally, we define the novel concept of regularization paths for R-vine matrices. It relates sparsity of the vine copula model in terms of independence copulas to a penalization coefficient in the structural equation models. We illustrate our approach and provide many numerical examples. These include simulations and data applications in high dimensions, showing the superiority of our approach to other existing methods.

Journal ArticleDOI
TL;DR: In this paper, an unrestricted skew-normal generalized hyperbolic (SUNGH) distribution is proposed for finite mixture modeling or clustering problems, which has several desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters.
Abstract: In this paper, we introduce an unrestricted skew-normal generalized hyperbolic (SUNGH) distribution for use in finite mixture modeling or clustering problems. The SUNGH is a broad class of flexible distributions that includes various other well-known asymmetric and symmetric families such as the scale mixtures of skew-normal, the skew-normal generalized hyperbolic and its corresponding symmetric versions. The class of distributions provides a much needed unified framework where the choice of the best fitting distribution can proceed quite naturally through either parameter estimation or by placing constraints on specific parameters and assessing through model choice criteria. The class has several desirable properties, including an analytically tractable density and ease of computation for simulation and estimation of parameters. We illustrate the flexibility of the proposed class of distributions in a mixture modeling context using a Bayesian framework and assess the performance using simulated and real data.

Journal ArticleDOI
TL;DR: In this work, surprisingly general conditions for equivalence of these disparate probabilistic linear solvers are presented, and connections between probabilistic linear solvers and projection methods for linear systems are described, providing a probabilistic interpretation of a far more general class of iterative methods.
Abstract: Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behaviour of the conjugate gradient method as a prototypical iterative method. In this work, surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers and provide foundational connections to the literature on iterative solvers for linear systems.
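
The basic Bayesian view of a linear solver can be written in a few lines: place a Gaussian prior on the solution x of Ax = b, observe the projections SᵀAx = Sᵀb for some search directions S, and condition. The sketch below uses random search directions and a generic Gaussian conditioning step; the particular priors and direction choices under which this reproduces conjugate gradients or GMRES are what the paper characterises.

```python
import numpy as np

rng = np.random.default_rng(9)

def probabilistic_linear_solve(A, b, S, x0=None, Sigma0=None):
    """Gaussian inference for x in A x = b from projections S^T A x = S^T b.

    Returns the posterior mean and covariance under a N(x0, Sigma0) prior."""
    n = A.shape[0]
    x0 = np.zeros(n) if x0 is None else x0
    Sigma0 = np.eye(n) if Sigma0 is None else Sigma0
    M = S.T @ A                                   # observation operator
    gram = M @ Sigma0 @ M.T
    gain = np.linalg.solve(gram, M @ Sigma0).T    # Sigma0 M^T gram^{-1}
    mean = x0 + gain @ (S.T @ b - M @ x0)
    cov = Sigma0 - gain @ M @ Sigma0
    return mean, cov

n = 50
A = rng.normal(size=(n, n))
A = A @ A.T + n * np.eye(n)                       # symmetric positive definite
b = rng.normal(size=n)

for m in (10, 30, 50):
    S = rng.normal(size=(n, m))                   # m random search directions
    mean, cov = probabilistic_linear_solve(A, b, S)
    err = np.linalg.norm(mean - np.linalg.solve(A, b))
    print(m, round(err, 4), round(np.trace(cov), 2))  # error and uncertainty shrink
```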

Journal ArticleDOI
TL;DR: Two nonparametric Bayesian methods to cluster big data are proposed and applied to cluster genes by patterns of gene–gene interaction; they compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm.
Abstract: We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool "Zodiac."

Journal ArticleDOI
TL;DR: A probabilistic model to cluster the nodes of a dynamic graph, accounting for the content of textual edges as well as their frequency, is developed, and an application to the Enron dataset is illustrated.
Abstract: The present paper develops a probabilistic model to cluster the nodes of a dynamic graph, accounting for the content of textual edges as well as their frequency. Vertices are clustered in groups which are homogeneous both in terms of interaction frequency and discussed topics. The dynamic graph is considered stationary on a latent time interval if the proportions of topics discussed between each pair of node groups do not change in time during that interval. A classification variational expectation-maximization (C-VEM) algorithm is adopted to perform inference. A model selection criterion is also derived to select the number of node groups, time clusters and topics. Experiments on simulated data are carried out to assess the proposed methodology. We finally illustrate an application to the Enron dataset.

Journal ArticleDOI
TL;DR: A natural gradient adaptation for the estimation process is presented, which relies on the property that the student-$$t$$ model naturally has an orthogonal parametrization and makes the Laplace approximation significantly more robust than the traditional approach using Newton’s method.
Abstract: We propose the Laplace method to derive approximate inference for Gaussian process (GP) regression in the location and scale parameters of the student-$$t$$ probabilistic model. This allows both mean and variance of data to vary as a function of covariates, with the attractive feature that the student-$$t$$ model has been widely used as a useful tool for robustifying data analysis. The challenge in the approximate inference for the model lies in the analytical intractability of the posterior distribution and the lack of concavity of the log-likelihood function. We present the natural gradient adaptation for the estimation process, which primarily relies on the property that the student-$$t$$ model naturally has an orthogonal parametrization. Due to this particular property of the model, the Laplace approximation becomes significantly more robust than the traditional approach using Newton’s method. We also introduce an alternative Laplace approximation by using the model’s Fisher information matrix. According to experiments, this alternative approximation provides very similar posterior approximations and predictive performance to the traditional Laplace approximation with the model’s Hessian matrix. However, the proposed Laplace–Fisher approximation is faster and more stable to calculate compared to the traditional Laplace approximation. We also compare both of these Laplace approximations with the Markov chain Monte Carlo (MCMC) method. We discuss how our approach can, in general, improve the inference algorithm in cases where the probabilistic model assumed for the data is not log-concave.

Journal ArticleDOI
TL;DR: A novel way to extend kernel methods for complete rankings to partial rankings is presented, via consistent Monte Carlo estimators for Gram matrices (matrices of kernel values between pairs of observations).
Abstract: In the modern age, rankings data are ubiquitous and they are useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world are incomplete, which prevent the direct application of existing modelling tools for complete rankings. Our contribution is a novel way to extend kernel methods for complete rankings to partial rankings, via consistent Monte Carlo estimators for Gram matrices: matrices of kernel values between pairs of observations. We also present a novel variance-reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel. The corresponding antithetic kernel estimator has lower variance, and we demonstrate empirically that it has a better performance in a variety of machine learning tasks. Both kernel estimators are based on extending kernel mean embeddings to the embedding of a set of full rankings consistent with an observed partial ranking. They form a computationally tractable alternative to previous approaches for partial rankings data. An overview of the existing kernels and metrics for permutations is also provided.
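
The Monte Carlo Gram-matrix idea can be illustrated for the Mallows kernel and top-k partial rankings: sample full rankings uniformly among those consistent with each observed partial ranking and average the kernel over the sampled pairs. The completion scheme and the kernel parameter below are illustrative, and the paper's antithetic variance-reduction construction is omitted.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)

def kendall_distance(a, b):
    """Number of discordant item pairs between two full rankings.

    Rankings are given as tuples of items ordered from best to worst."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(pos_b[x] > pos_b[y] for x, y in combinations(a, 2))

def mallows_kernel(a, b, lam=0.5):
    return np.exp(-lam * kendall_distance(a, b))

def random_completion(top, items):
    """Uniform full ranking consistent with an observed top-k ranking."""
    rest = [i for i in items if i not in top]
    return tuple(top) + tuple(rng.permutation(rest))

def mc_kernel(top_a, top_b, items, n_mc=200):
    """Monte Carlo estimate of the kernel between two top-k partial rankings."""
    vals = [mallows_kernel(random_completion(top_a, items),
                           random_completion(top_b, items))
            for _ in range(n_mc)]
    return float(np.mean(vals))

items = list(range(6))
print(mc_kernel((0, 1, 2), (0, 1, 3), items))   # similar top-3 lists
print(mc_kernel((0, 1, 2), (5, 4, 3), items))   # reversed preferences, smaller value
```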

Journal ArticleDOI
TL;DR: A method is introduced which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements, and relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors.
Abstract: The need for efficient tools is pressing in the era of big data, particularly in streaming data applications. As data streams are ubiquitous, the ability to accurately detect multiple changepoints, without affecting the continuous flow of data, is an important issue. Change detection for categorical data streams is understudied, and existing work commonly introduces fixed control parameters while providing little insight into how they may be chosen. This is ill-suited to the streaming paradigm, motivating the need for an approach that introduces few parameters which may be set without requiring any prior knowledge of the stream. This paper introduces such a method, which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements. The detector relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors. A novel adaptive threshold is also developed which can be computed given a desired false positive rate. This method is then compared to sequential and nonsequential change detectors in a large simulation study which verifies the usefulness of our approach. A real data set consisting of nearly 40 million events from a computer network is also investigated.
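
A simplified sketch of the adaptive monitoring idea: keep exponentially forgotten category counts at two different forgetting rates, turn them into probability estimates, and raise an alarm when the fast and slow estimates disagree by more than a threshold. The two-timescale comparison and the fixed threshold are placeholder choices, not the adaptive threshold derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(11)

def monitor_stream(stream, n_cat, lam_fast=0.98, lam_slow=0.999, threshold=0.3):
    """Flag time points where fast- and slow-forgetting estimates of the
    category probabilities disagree by more than `threshold` (sketch)."""
    fast = np.ones(n_cat)          # exponentially forgotten pseudo-counts
    slow = np.ones(n_cat)
    alarms = []
    for t, xt in enumerate(stream):
        onehot = np.eye(n_cat)[xt]
        fast = lam_fast * fast + onehot
        slow = lam_slow * slow + onehot
        p_fast, p_slow = fast / fast.sum(), slow / slow.sum()
        if t > 100 and np.abs(p_fast - p_slow).max() > threshold:
            alarms.append(t)
    return alarms

# Categorical stream with a change in category probabilities at t = 1000.
before = rng.choice(3, size=1000, p=[0.6, 0.3, 0.1])
after = rng.choice(3, size=1000, p=[0.1, 0.3, 0.6])
alarms = monitor_stream(np.concatenate([before, after]), n_cat=3)
print(alarms[:5] if alarms else "no alarm")   # alarms should begin soon after t = 1000
```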