
Showing papers in "Statistics and Computing in 2020"


Journal ArticleDOI
TL;DR: In this article, a reduced-rank Gaussian process regression scheme is proposed, based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator on a compact subset of $$\mathbb {R}^d$$.
Abstract: This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $$\mathbb {R}^d$$. On this approximate eigenbasis, the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as $$\mathcal {O}(nm^2)$$ (initial) and $$\mathcal {O}(m^3)$$ (hyperparameter learning) with m basis functions and n data points. Furthermore, the basis functions are independent of the parameters of the covariance function, which allows for very fast hyperparameter learning. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. We also show that the convergence rate of the truncation error is independent of the input dimensionality provided that the differentiability order of the covariance function increases appropriately, and for the squared exponential covariance function it is always bounded by $${\sim }1/m$$ regardless of the input dimensionality. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.
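
As a rough illustration of the construction (not the paper's reference implementation), the sketch below applies the approximation in one dimension on a domain [-L, L] with the squared exponential covariance; the domain size, number of basis functions and hyperparameter values are arbitrary choices made for the example.

```python
import numpy as np

def hilbert_gp_predict(x, y, x_star, m=32, L=5.0,
                       lengthscale=1.0, sigma_f=1.0, sigma_n=0.1):
    """Reduced-rank GP regression on [-L, L] via the Laplace eigenbasis (1-D case)."""
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2.0 * L)                  # square roots of Laplacian eigenvalues
    # Spectral density of the 1-D squared exponential covariance evaluated at sqrt_lam
    S = sigma_f**2 * np.sqrt(2.0 * np.pi) * lengthscale \
        * np.exp(-0.5 * (lengthscale * sqrt_lam)**2)
    phi = lambda t: np.sin(np.outer(t + L, sqrt_lam)) / np.sqrt(L)   # eigenfunctions
    Phi, Phi_star = phi(x), phi(x_star)
    # Weight-space posterior mean: O(n m^2) to form Phi'Phi, O(m^3) to solve
    A = Phi.T @ Phi + sigma_n**2 * np.diag(1.0 / S)
    w = np.linalg.solve(A, Phi.T @ y)
    return Phi_star @ w

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
print(hilbert_gp_predict(x, y, np.array([-1.0, 0.0, 1.0])))
```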

138 citations


Journal ArticleDOI
TL;DR: In this paper, an integrated framework for estimation and inference from generalized linear models using adjusted score equations that result in mean and median bias reduction is presented, including a mixed adjustment strategy for the case where a dispersion parameter must be estimated.
Abstract: This paper presents an integrated framework for estimation and inference from generalized linear models using adjusted score equations that result in mean and median bias reduction. The framework unifies theoretical and methodological aspects of past research on mean bias reduction and accommodates, in a natural way, new advances on median bias reduction. General expressions for the adjusted score functions are derived in terms of quantities that are readily available in standard software for fitting generalized linear models. The resulting estimating equations are solved using a unifying quasi-Fisher scoring algorithm that is shown to be equivalent to iteratively reweighted least squares with appropriately adjusted working variates. Formal links between the iterations for mean and median bias reduction are established. Core model invariance properties are used to develop a novel mixed adjustment strategy when the estimation of a dispersion parameter is necessary. It is also shown how median bias reduction in multinomial logistic regression can be done using the equivalent Poisson log-linear model. The estimates coming out from mean and median bias reduction are found to overcome practical issues related to infinite estimates that can occur with positive probability in generalized linear models with multinomial or discrete responses, and can result in valid inferences even in the presence of a high-dimensional nuisance parameter.

55 citations


Journal ArticleDOI
TL;DR: A semi-parametric approach is developed that relaxes the parametric assumption implicit in BSL to an extent while maintaining the computational advantages of BSL without any additional tuning, and it can be significantly more robust than BSL and another approach in the literature.
Abstract: Bayesian synthetic likelihood (BSL) is now a well-established method for performing approximate Bayesian parameter estimation for simulation-based models that do not possess a tractable likelihood function. BSL approximates an intractable likelihood function of a carefully chosen summary statistic at a parameter value with a multivariate normal distribution. The mean and covariance matrix of this normal distribution are estimated from independent simulations of the model. Due to the parametric assumption implicit in BSL, it can be preferred to its nonparametric competitor, approximate Bayesian computation, in certain applications where a high-dimensional summary statistic is of interest. However, despite several successful applications of BSL, its widespread use in scientific fields may be hindered by the strong normality assumption. In this paper, we develop a semi-parametric approach to relax this assumption to an extent and maintain the computational advantages of BSL without any additional tuning. We test our new method, semiBSL, on several challenging examples involving simulated and real data and demonstrate that semiBSL can be significantly more robust than BSL and another approach in the literature.
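
The following sketch shows the standard BSL ingredients that semiBSL builds on: a Gaussian synthetic log-likelihood estimated from model simulations, wrapped in a random-walk Metropolis–Hastings sampler. The toy model, flat prior, simulation counts and step size are assumptions for illustration; the paper's semi-parametric replacement of the Gaussian assumption is not shown.

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate_summaries, n_sims=200):
    """Gaussian synthetic log-likelihood of the observed summary s_obs at theta."""
    S = np.array([simulate_summaries(theta) for _ in range(n_sims)])
    mu = S.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(S, rowvar=False))
    d = s_obs - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (logdet + d @ np.linalg.solve(Sigma, d) + len(d) * np.log(2 * np.pi))

def bsl_mcmc(theta0, s_obs, simulate_summaries, n_iter=500, step=0.2, rng=None):
    """Random-walk Metropolis-Hastings targeting the synthetic likelihood (flat prior assumed)."""
    rng = rng or np.random.default_rng(0)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    ll = synthetic_loglik(theta, s_obs, simulate_summaries)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        ll_prop = synthetic_loglik(prop, s_obs, simulate_summaries)
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
        chain.append(theta.copy())
    return np.array(chain)

# toy model: 50 draws from N(theta, 1), summaries = (sample mean, sample variance)
rng = np.random.default_rng(1)
simulate = lambda th: (lambda x: np.array([x.mean(), x.var()]))(rng.normal(th[0], 1.0, 50))
chain = bsl_mcmc([0.0], s_obs=np.array([1.0, 1.0]), simulate_summaries=simulate)
print(chain[-100:].mean(axis=0))     # posterior mean of theta, should be near 1
```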

41 citations


Journal ArticleDOI
TL;DR: In this article, a sampler for arbitrary continuous multivariate distributions is proposed that constructs a tensor train approximation to the target probability density function from a small number of function evaluations, building on low-rank surrogates in the tensor-train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry.
Abstract: General multivariate distributions are notoriously expensive to sample from, particularly the high-dimensional posterior distributions in PDE-constrained inverse problems. This paper develops a sampler for arbitrary continuous multivariate distributions that is based on low-rank surrogates in the tensor train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry. We build upon recent developments of the cross approximation algorithms in linear algebra to construct a tensor train approximation to the target probability density function using a small number of function evaluations. For sufficiently smooth distributions, the storage required for accurate tensor train approximations is moderate, scaling linearly with dimension. In turn, the structure of the tensor train surrogate allows sampling by an efficient conditional distribution method since marginal distributions are computable with linear complexity in dimension. Expected values of non-smooth quantities of interest, with respect to the surrogate distribution, can be estimated using transformed independent uniformly-random seeds that provide Monte Carlo quadrature or transformed points from a quasi-Monte Carlo lattice to give more efficient quasi-Monte Carlo quadrature. Unbiased estimates may be calculated by correcting the transformed random seeds using a Metropolis–Hastings accept/reject step, while the quasi-Monte Carlo quadrature may be corrected either by a control-variate strategy or by importance weighting. We show that the error in the tensor train approximation propagates linearly into the Metropolis–Hastings rejection rate and the integrated autocorrelation time of the resulting Markov chain; thus, the integrated autocorrelation time may be made arbitrarily close to 1, implying that, asymptotic in sample size, the cost per effectively independent sample is one target density evaluation plus the cheap tensor train surrogate proposal that has linear cost with dimension. These methods are demonstrated in three computed examples: fitting failure time of shock absorbers; a PDE-constrained inverse diffusion problem; and sampling from the Rosenbrock distribution. The delayed rejection adaptive Metropolis (DRAM) algorithm is used as a benchmark. In all computed examples, the importance weight-corrected quasi-Monte Carlo quadrature performs best and is more efficient than DRAM by orders of magnitude across a wide range of approximation accuracies and sample sizes. Indeed, all the methods developed here significantly outperform DRAM in all computed examples.

38 citations


Journal ArticleDOI
TL;DR: The purpose of this paper is to present a simple, novel and substantially more efficient approach to the most expensive computation in model estimation, the formation of the matrix cross product $$\mathbf{X}^{\mathsf{T}}{\mathbf{WX}}$$.
Abstract: Wood et al. (J Am Stat Assoc 112(519):1199–1210, 2017) developed methods for fitting penalized regression spline based generalized additive models, with of the order of $$10^4$$ coefficients, to up to $$10^8$$ data. The methods offered two to three orders of magnitude reduction in computational cost relative to the most efficient previous methods. Part of the gain resulted from the development of a set of methods for efficiently computing model matrix products when model covariates each take only a discrete set of values substantially smaller than the sample size [generalizing an idea first appearing in Lang et al. (Stat Comput 24(2):223–238, 2014)]. Covariates can always be rounded to achieve such discretization, and it should be noted that the covariate discretization is marginal. That is, we do not rely on discretizing covariates jointly, which would typically require the use of very coarse discretization. The most expensive computation in model estimation is the formation of the matrix cross product $$\mathbf{X}^{\mathsf{T}}{\mathbf{WX}}$$ where $$\mathbf{X}$$ is a model matrix and $${\mathbf{W}}$$ a diagonal or tri-diagonal matrix. The purpose of this paper is to present a simple, novel and substantially more efficient approach to the computation of this cross product. The new method offers, for example, a 30-fold reduction in cross product computation time for the Black Smoke model dataset motivating Wood et al. (2017). Given this reduction in computational cost, the subsequent Cholesky decomposition of $$\mathbf{X}^{\mathsf{T}}{\mathbf{WX}}$$ and follow-on computation of $$(\mathbf{X}^{\mathsf{T}}{\mathbf{WX}})^{-1}$$ become a more significant part of the computational burden, and we also discuss the choice of methods for improving their speed.
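
A minimal sketch of the discretization idea that the paper accelerates further: when a covariate takes only m distinct values, the weighted cross product for a single term reduces to accumulating the weights over those values. This covers the same-covariate block with diagonal W only; cross-covariate blocks require a two-dimensional accumulation, and the paper's new method is more refined than this. Array sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 100_000, 200, 10
k = rng.integers(0, m, size=n)        # index of each observation's discretized covariate value
Xbar = rng.standard_normal((m, p))    # basis functions evaluated at the m unique values
w = rng.uniform(0.5, 1.5, size=n)     # diagonal of W

# Naive: materialise the full n x p model matrix, then form the cross product
XtWX_naive = (Xbar[k] * w[:, None]).T @ Xbar[k]

# Discretized: accumulate the weights over the m unique rows first
wbar = np.bincount(k, weights=w, minlength=m)
XtWX_fast = (Xbar * wbar[:, None]).T @ Xbar

print(np.allclose(XtWX_naive, XtWX_fast))   # True, at a fraction of the flops and memory
```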

38 citations


Journal ArticleDOI
TL;DR: In this article, worst-case local and global convergence rates are established for a class of Gaussian ODE filters, which model the true solution x and its first q derivatives a priori as a Gauss–Markov process that is then iteratively conditioned on information about the derivative of x.
Abstract: A recently introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution x and its first q derivatives a priori as a Gauss–Markov process $${\varvec{X}}$$ , which is then iteratively conditioned on information about $${\dot{x}}$$ . This article establishes worst-case local convergence rates of order $$q+1$$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order q in the case of $$q=1$$ and an integrated Brownian motion prior, and analyses how inaccurate information on $${\dot{x}}$$ coming from approximate evaluations of f affects these rates. Moreover, we show that, in the globally convergent case, the posterior credible intervals are well calibrated in the sense that they globally contract at the same rate as the truncation error. We illustrate these theoretical results by numerical experiments which might indicate their generalizability to $$q \in \{2,3,\ldots \}$$ .
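
As a sketch of one member of the family analysed (q = 1, an integrated Brownian motion prior, zero measurement noise and an EK0-type linearisation), the following Kalman filter conditions the predicted derivative on the vector-field evaluation at the predicted mean; the step size and diffusion scale are assumed values.

```python
import numpy as np

def ek0_ode_filter(f, x0, t0, t1, h, sigma=1.0):
    """Gaussian ODE filter with q = 1 (once-integrated Brownian motion prior).
    The state is (x, x'); at each step the predicted derivative is conditioned on the
    vector-field value f(t, predicted mean of x) with zero measurement noise."""
    A = np.array([[1.0, h], [0.0, 1.0]])                       # IBM transition matrix
    Q = sigma**2 * np.array([[h**3 / 3, h**2 / 2],
                             [h**2 / 2, h]])                   # IBM process noise
    H = np.array([[0.0, 1.0]])                                 # we "observe" the derivative
    m = np.array([x0, f(t0, x0)])
    P = np.zeros((2, 2))
    n_steps = int(round((t1 - t0) / h))
    t, out = t0, [(t0, x0)]
    for _ in range(n_steps):
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        z = f(t + h, m_pred[0]) - (H @ m_pred)[0]              # innovation
        S = (H @ P_pred @ H.T)[0, 0]
        K = (P_pred @ H.T)[:, 0] / S
        m = m_pred + K * z
        P = P_pred - np.outer(K, (H @ P_pred).ravel())
        t += h
        out.append((t, m[0]))
    return np.array(out)

# logistic growth x' = x (1 - x) as a scalar toy problem
sol = ek0_ode_filter(lambda t, x: x * (1 - x), x0=0.1, t0=0.0, t1=5.0, h=0.05)
print(sol[-1])    # final time and posterior mean, close to the true value of about 0.943
```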

31 citations


Journal ArticleDOI
TL;DR: A set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature, and two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
Abstract: A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
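
The sketch below illustrates the general recipe with two toy indexes (within-cluster homogeneity and separation) and a permutation-based calibration against random clusterings; the paper's actual index set, random-clustering scheme and aggregation differ in detail.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def within_homogeneity(D, labels):
    """Average within-cluster pairwise distance (smaller is better)."""
    vals = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) > 1:
            vals.append(D[np.ix_(idx, idx)][np.triu_indices(len(idx), 1)].mean())
    return float(np.mean(vals))

def separation(D, labels):
    """Average distance of each point to its nearest point in another cluster (larger is better)."""
    other = labels[None, :] != labels[:, None]
    return float(np.mean([D[i, other[i]].min() for i in range(len(labels))]))

def calibrate(D, labels, index_fn, n_random=100, rng=None):
    """Z-score of an index relative to random clusterings with the same cluster sizes."""
    rng = rng or np.random.default_rng(0)
    value = index_fn(D, labels)
    ref = np.array([index_fn(D, rng.permutation(labels)) for _ in range(n_random)])
    return (value - ref.mean()) / ref.std()

X = np.random.default_rng(3).standard_normal((60, 2))
labels = (X[:, 0] > 0).astype(int)
D = squareform(pdist(X))
print("calibrated homogeneity:", calibrate(D, labels, within_homogeneity))
print("calibrated separation: ", calibrate(D, labels, separation))
```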

30 citations


Journal ArticleDOI
TL;DR: In this paper, a probabilistic numerical method for quantifying the uncertainty induced by the time integration of ODEs is introduced, which allows for the conservation of geometric properties of the underlying deterministic integrator such as mass conservation, symplecticity or conservation of first integrals.
Abstract: A novel probabilistic numerical method for quantifying the uncertainty induced by the time integration of ordinary differential equations (ODEs) is introduced. Departing from the classical strategy to randomise ODE solvers by adding a random forcing term, we show that a probability measure over the numerical solution of ODEs can be obtained by introducing suitable random time steps in a classical time integrator. This intrinsic randomisation allows for the conservation of geometric properties of the underlying deterministic integrator such as mass conservation, symplecticity or conservation of first integrals. Weak and mean square convergence analysis is derived. We also analyse the convergence of the Monte Carlo estimator for the proposed random time step method and show that the measure obtained with repeated sampling converges in the mean square sense independently of the number of samples. Numerical examples including chaotic Hamiltonian systems, chemical reactions and Bayesian inferential problems illustrate the accuracy, robustness and versatility of our probabilistic numerical method.
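
A minimal sketch of the random time-stepping idea, using explicit Euler and an assumed perturbation scaling; the paper covers general (including geometric) integrators and gives precise conditions on the random steps.

```python
import numpy as np

def euler_random_steps(f, y0, t0, t1, h, rng, p=1.5):
    """One sample path of explicit Euler with i.i.d. random time steps of mean h.
    The perturbation is scaled as h**p (an assumed scaling) so it vanishes faster than h."""
    t, y = t0, float(y0)
    while t < t1:
        hk = min(h + h**p * rng.uniform(-0.5, 0.5), t1 - t)   # random positive step
        y = y + hk * f(t, y)
        t += hk
    return y

# Monte Carlo over step sequences gives a measure over numerical solutions of y' = -y
f = lambda t, y: -y
rng = np.random.default_rng(9)
samples = np.array([euler_random_steps(f, 1.0, 0.0, 1.0, 0.05, rng) for _ in range(200)])
print(samples.mean(), samples.std())   # spread reflects the discretisation uncertainty
```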

28 citations


Journal ArticleDOI
TL;DR: An empirical investigation is carried out to assess the impact of imposing non-negativity constraints on forecast reconciliation over the unconstrained method, and it is observed that slight gains in forecast accuracy have occurred at the most disaggregated level.
Abstract: The sum of forecasts of disaggregated time series is often required to equal the forecast of the aggregate, giving a set of coherent forecasts. The least squares solution for finding coherent forecasts uses a reconciliation approach known as MinT, proposed by Wickramasuriya, Athanasopoulos, and Hyndman (2019). The MinT approach and its variants do not guarantee that the coherent forecasts are non-negative, even when all of the original forecasts are non-negative in nature. This has become a serious issue in applications that are inherently non-negative such as with sales data or tourism numbers. To overcome this difficulty, we reconsider the least squares minimization problem with non-negativity constraints to ensure that the coherent forecasts are strictly non-negative. The constrained quadratic programming problem is solved using three algorithms. They are the block principal pivoting (BPV) algorithm, projected conjugate gradient (PCG) algorithm, and scaled gradient projection algorithm. A Monte Carlo simulation is performed to evaluate the computational performances of these algorithms as the number of time series increases. The results demonstrate that the BPV algorithm clearly outperforms the rest, and PCG is the second best. The superior performance of the BPV algorithm can be partially attributed to the alternative representation of the weight matrix in the MinT approach. An empirical investigation is carried out to assess the impact of imposing non-negativity constraints on forecast reconciliation over the unconstrained method. It is observed that slight gains in forecast accuracy have occurred at the most disaggregated level. At the aggregated level, slight losses are also observed. Although the gains or losses are negligible, the procedure plays an important role in decision and policy implementation processes.
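
For illustration, the toy example below reconciles base forecasts for a two-series hierarchy with a non-negativity constraint on the bottom level, using an identity weight matrix (OLS reconciliation) and a generic bounded least squares solver rather than the BPV, PCG or scaled gradient projection algorithms studied in the paper.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Two bottom-level series A and B and their total; S maps bottom-level values to all series
S = np.array([[1.0, 1.0],    # total
              [1.0, 0.0],    # series A
              [0.0, 1.0]])   # series B
y_hat = np.array([100.0, 90.0, -5.0])   # incoherent base forecasts; one is negative

# Unconstrained OLS reconciliation (identity weight matrix)
b_ols = np.linalg.lstsq(S, y_hat, rcond=None)[0]

# Non-negativity-constrained version: min ||y_hat - S b||^2 subject to b >= 0
b_nn = lsq_linear(S, y_hat, bounds=(0.0, np.inf)).x

print("unconstrained bottom level:     ", b_ols)
print("constrained bottom level:       ", b_nn)
print("coherent non-negative forecasts:", S @ b_nn)
```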

25 citations


Journal ArticleDOI
TL;DR: This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data: a model-based estimation strategy that models the missing-mechanism distribution, and a computationally efficient surrogate estimation that implicitly takes into account the joint distribution of the data and the missing mechanism.
Abstract: Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion based on low-rank assumptions is a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is to suggest a model-based estimation strategy by modelling the missing mechanism distribution. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is to suggest a computationally efficient surrogate estimation by implicitly taking into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values; a low-rank structure for exponential family is assumed on this new matrix, in order to encode links between variables and missing mechanisms. The methodology, which has the great advantage of handling different missing value mechanisms, is robust to model specification errors.

24 citations


Journal ArticleDOI
TL;DR: In this paper, the authors derive a non-reversible, continuous-time Markov chain Monte Carlo sampler, called Coordinate Sampler, based on a piecewise deterministic Markov process, which is a variant of the Zigzag sampler.
Abstract: We derive a novel non-reversible, continuous-time Markov chain Monte Carlo sampler, called Coordinate Sampler, based on a piecewise deterministic Markov process, which is a variant of the Zigzag sampler of Bierkens et al. (Ann Stat 47(3):1288–1320, 2019). In addition to providing a theoretical validation for this new simulation algorithm, we show that the Markov chain it induces is geometrically ergodic for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our coordinate sampler is more efficient than the Zigzag sampler, in terms of effective sample size.

Journal ArticleDOI
TL;DR: In this article, a variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates is proposed.
Abstract: In this paper, we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample variance. This feature is theoretically demonstrated by means of a deep non-asymptotic analysis of a variance reduced functional as well as by a thorough simulation study. In particular, we apply our method to various MCMC Bayesian estimation problems where it favorably compares to the existing variance reduction approaches.
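
The sketch below illustrates the control-variate idea with a score-based (zero-variance) basis and coefficients fitted by ordinary least squares on a toy random-walk Metropolis chain; note that the paper instead minimises an estimate of the asymptotic variance, which accounts for the chain's autocorrelation.

```python
import numpy as np

def cv_estimate(f_vals, grads):
    """Control-variate estimate of E[f(X)]: the score g(x) = grad log pi(x) has mean zero
    under pi, so we regress f on g and average only the residual part."""
    G = np.column_stack([np.ones(len(f_vals)), grads])
    coef, *_ = np.linalg.lstsq(G, f_vals, rcond=None)
    return float(np.mean(f_vals - grads @ coef[1:]))

# Toy chain targeting N(0, 1): Gaussian random-walk Metropolis
rng = np.random.default_rng(4)
x, chain = 0.0, []
for _ in range(20_000):
    prop = x + rng.standard_normal()
    if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
        x = prop
    chain.append(x)
chain = np.array(chain)

f_vals = chain                    # estimate E[X] = 0; first-order score CVs are exact here
grads = (-chain)[:, None]         # score of N(0, 1): grad log pi(x) = -x
print("plain MCMC average:     ", f_vals.mean())
print("control-variate average:", cv_estimate(f_vals, grads))
```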

Journal ArticleDOI
TL;DR: This paper develops numerically stable and efficient parameter estimation and model selection algorithms for a class of multivariate log Gaussian Cox processes, with an application to tropical rain forest ecology.
Abstract: Statistical inference for highly multivariate point pattern data is challenging due to complex models with large numbers of parameters. In this paper, we develop numerically stable and efficient parameter estimation and model selection algorithms for a class of multivariate log Gaussian Cox processes. The methodology is applied to a highly multivariate point pattern data set from tropical rain forest ecology.

Journal ArticleDOI
TL;DR: Boosting algorithms capable of estimating conditional transformation models of arbitrary complexity are proposed, starting from simple shift transformation models featuring linear predictors to essentially unstructured conditional transformation models allowing complex nonlinear interaction functions.
Abstract: The broad class of conditional transformation models includes interpretable and simple as well as potentially very complex models for conditional distributions. This makes conditional transformation models attractive for predictive distribution modelling, especially because models featuring interpretable parameters and black-box machines can be understood as extremes in a whole cascade of models. So far, algorithms and corresponding theory were developed for special forms of conditional transformation models only: maximum likelihood inference is available for rather simple models, there exists a tailored boosting algorithm for the estimation of additive conditional transformation models, and a special form of random forests targets the estimation of interaction models. Here, I propose boosting algorithms capable of estimating conditional transformation models of arbitrary complexity, starting from simple shift transformation models featuring linear predictors to essentially unstructured conditional transformation models allowing complex nonlinear interaction functions. A generic form of the likelihood is maximized. Thus, the novel boosting algorithms for conditional transformation models are applicable to all types of univariate response variables, including randomly censored or truncated observations.

Journal ArticleDOI
TL;DR: A large-scale comparison of penalized regression methods is presented, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods.
Abstract: Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
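
A miniature version of one such comparison, restricted to the penalized methods available in scikit-learn and a single synthetic scenario (sizes, sparsity and signal strength are arbitrary choices); the paper's study covers far more methods, metrics and scenarios.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.metrics import mean_squared_error

# One synthetic scenario: n = 100, p = 200, 10 nonzero coefficients
rng = np.random.default_rng(10)
n, p, k = 100, 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = rng.uniform(1.0, 2.0, k) * rng.choice([-1.0, 1.0], k)
y = X @ beta + rng.standard_normal(n)
X_test = rng.standard_normal((1000, p))
y_test = X_test @ beta + rng.standard_normal(1000)

for name, model in [("Lasso", LassoCV(cv=5)),
                    ("Ridge", RidgeCV()),
                    ("Elastic Net", ElasticNetCV(cv=5, l1_ratio=0.5))]:
    model.fit(X, y)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:12s} test MSE = {mse:7.2f}   nonzero coefficients = {np.sum(model.coef_ != 0)}")
```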

Journal ArticleDOI
TL;DR: The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference.
Abstract: Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise: First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretisation) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.

Journal ArticleDOI
TL;DR: It is demonstrated that if the input series is Gaussian, then the mappings preserve the Gaussianity of the data, and that this approach outperforms the current state-of-the-art multivariate changepoint methods in terms of accuracy of detected changepoints and computational efficiency.
Abstract: High-dimensional changepoint analysis is a growing area of research and has applications in a wide range of fields. The aim is to accurately and efficiently detect changepoints in time series data when both the number of time points and dimensions grow large. Existing methods typically aggregate or project the data to a smaller number of dimensions, usually one. We present a high-dimensional changepoint detection method that takes inspiration from geometry to map a high-dimensional time series to two dimensions. We show theoretically and through simulation that if the input series is Gaussian, then the mappings preserve the Gaussianity of the data. Applying univariate changepoint detection methods to both mapped series allows the detection of changepoints that correspond to changes in the mean and variance of the original time series. We demonstrate that this approach outperforms the current state-of-the-art multivariate changepoint methods in terms of accuracy of detected changepoints and computational efficiency. We conclude with applications from genetics and finance.
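
One plausible reading of the geometric mapping, sketched below: each high-dimensional observation is mapped to a distance from a reference point and an angle with a reference direction (here the origin and the normalised all-ones vector, both assumptions made for the example), and a simple CUSUM statistic is then applied to each mapped series.

```python
import numpy as np

def geometric_mapping(X, ref=None):
    """Map a p-dimensional series to two univariate series: the distance of each point
    from the origin and its angle with a reference direction (the normalised all-ones
    vector by default -- an assumption made for this illustration)."""
    n, p = X.shape
    ref = np.ones(p) / np.sqrt(p) if ref is None else ref
    dist = np.linalg.norm(X, axis=1)
    cosang = np.clip((X @ ref) / np.maximum(dist, 1e-12), -1.0, 1.0)
    return dist, np.arccos(cosang)

def cusum_changepoint(y):
    """Most likely single mean-change location according to the standard CUSUM statistic."""
    n = len(y)
    stats = [np.sqrt(tau * (n - tau) / n) * abs(y[:tau].mean() - y[tau:].mean())
             for tau in range(2, n - 1)]
    return 2 + int(np.argmax(stats))

rng = np.random.default_rng(8)
X = rng.standard_normal((200, 50))
X[120:] += 0.5                                  # mean shift in every coordinate at t = 120
dist, ang = geometric_mapping(X)
print("changepoint from the distance series:", cusum_changepoint(dist))
print("changepoint from the angle series:   ", cusum_changepoint(ang))
```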

Journal ArticleDOI
TL;DR: It is shown that the performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into the chain; the resulting method is called Mix & Match Hamiltonian Monte Carlo (MMHMC).
Abstract: The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into a chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians and a complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and further benefits from (i) the efficient algorithms for computation of modified Hamiltonians; (ii) the implicit momentum update procedure and (iii) the multistage splitting integrators specially derived for the methods sampling with modified Hamiltonians. MMHMC has been implemented, tested on the popular statistical models and compared in sampling efficiency with HMC, Riemann Manifold Hamiltonian Monte Carlo, Generalized Hybrid Monte Carlo, Generalized Shadow Hybrid Monte Carlo, Metropolis Adjusted Langevin Algorithm and Random Walk Metropolis–Hastings. To make a fair comparison, we propose a metric that accounts for correlations among samples and weights and can be readily used for all methods which generate such samples. The experiments reveal the superiority of MMHMC over popular sampling techniques, especially in solving high-dimensional problems.

Journal ArticleDOI
TL;DR: In this article, the authors propose a corrected two-stage approach that first estimates a multivariate mixed model for the longitudinal outcomes and then uses the output from this model to fit the survival submodel.
Abstract: Joint models for longitudinal and survival data have gained a lot of attention in recent years, with the development of myriad extensions to the basic model, including those which allow for multivariate longitudinal data, competing risks and recurrent events. Several software packages are now also available for their implementation. Although mathematically straightforward, the inclusion of multiple longitudinal outcomes in the joint model remains computationally difficult due to the large number of random effects required, which hampers the practical application of this extension. We present a novel approach that enables the fitting of such models with more realistic computational times. The idea behind the approach is to split the estimation of the joint model in two steps: estimating a multivariate mixed model for the longitudinal outcomes and then using the output from this model to fit the survival submodel. So-called two-stage approaches have previously been proposed and shown to be biased. Our approach differs from the standard version, in that we additionally propose the application of a correction factor, adjusting the estimates obtained such that they more closely resemble those we would expect to find with the multivariate joint model. This correction is based on importance sampling ideas. Simulation studies show that this corrected two-stage approach works satisfactorily, eliminating the bias while maintaining substantial improvement in computational time, even in more difficult settings.

Journal ArticleDOI
TL;DR: The minimum regularized covariance determinant (MRCD) approach is proposed, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset.
Abstract: The minimum covariance determinant (MCD) approach estimates the location and scatter matrix using the subset of given size with lowest sample covariance determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size. We propose the minimum regularized covariance determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure sets the weight of the target matrix, so that the regularization is only used when needed. The MRCD estimator is defined in any dimension, is well-conditioned by construction and preserves the good robustness properties of the MCD. We prove that so-called concentration steps can be performed to reduce the MRCD objective function, and we exploit this fact to construct a fast algorithm. We verify the accuracy and robustness of the MRCD estimator in a simulation study and illustrate its practical use for outlier detection and regression analysis on real-life high-dimensional data sets in chemistry and criminology.
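
A bare-bones sketch of the MRCD concentration steps with an identity target matrix and a fixed regularization weight rho; the paper chooses the target and rho in a data-driven way and uses a carefully designed fast algorithm.

```python
import numpy as np

def mrcd(X, h, rho=0.25, n_iter=100, rng=None):
    """MRCD-style estimate via concentration steps; identity target matrix, fixed rho."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    H = rng.choice(n, size=h, replace=False)        # random initial h-subset
    for _ in range(n_iter):
        mu = X[H].mean(axis=0)
        K = rho * np.eye(p) + (1 - rho) * np.cov(X[H], rowvar=False)   # regularized scatter
        diff = X - mu
        d = np.einsum('ij,ij->i', diff @ np.linalg.inv(K), diff)       # squared Mahalanobis
        H_new = np.argsort(d)[:h]                   # concentrate on the h closest points
        if set(H_new) == set(H):
            break
        H = H_new
    mu = X[H].mean(axis=0)
    K = rho * np.eye(p) + (1 - rho) * np.cov(X[H], rowvar=False)
    return mu, K, H

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 150))                 # p > n: the plain MCD is not defined here
X[:10] += 5.0                                       # ten gross outliers
mu, K, H = mrcd(X, h=75)
print("outliers retained in the subset:", int(np.sum(H < 10)))   # ideally 0
```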

Journal ArticleDOI
TL;DR: In this paper, the authors interpret the SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function.
Abstract: Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data comprised of computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the ridge directions’ span; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since SIR and SAVE may mistakenly suggest that truly important directions are unimportant.
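
For reference, a compact numpy version of SIR applied to a noise-free ridge function, mimicking the deterministic computer-model setting discussed in the paper; the number of slices and the test function are arbitrary, and SAVE (which uses second moments) is not shown.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression: eigenvectors of the covariance of slice means of the
    standardized predictors estimate (a subspace of) the ridge directions."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Z = (X - mu) @ L                                 # standardized predictors, Cov(Z) = I
    slices = np.array_split(np.argsort(y), n_slices) # slice on the response
    M = np.zeros((p, p))
    for s in slices:
        zbar = Z[s].mean(axis=0)
        M += (len(s) / n) * np.outer(zbar, zbar)
    _, vecs = np.linalg.eigh(M)
    dirs = L @ vecs[:, ::-1][:, :n_dirs]             # back-transform to the original scale
    return dirs / np.linalg.norm(dirs, axis=0)

# deterministic "computer model": a noise-free ridge function of a single direction
rng = np.random.default_rng(6)
X = rng.standard_normal((2000, 5))
a = np.array([1.0, 2.0, 0.0, 0.0, -1.0]); a /= np.linalg.norm(a)
y = np.tanh(X @ a)
d = sir_directions(X, y)[:, 0]
print("alignment with the true ridge direction:", abs(d @ a))
```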

Journal ArticleDOI
TL;DR: It is proved analytically that nudged particle filters can still attain asymptotic convergence with the same error rates as conventional particle methods.
Abstract: We investigate a new sampling scheme aimed at improving the performance of particle filters whenever (a) there is a significant mismatch between the assumed model dynamics and the actual system, or (b) the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme pushes some particles toward specific regions where the likelihood is expected to be high, an operation known as nudging in the geophysics literature. We reinterpret nudging in a form applicable to any particle filtering scheme, as it does not involve any changes in the rest of the algorithm. Since the particles are modified, but the importance weights do not account for this modification, the use of nudging leads to additional bias in the resulting estimators. However, we prove analytically that nudged particle filters can still attain asymptotic convergence with the same error rates as conventional particle methods. Simple analysis also yields an alternative interpretation of the nudging operation that explains its robustness to model errors. Finally, we show numerical results that illustrate the improvements that can be attained using the proposed scheme. In particular, we present nonlinear tracking examples with synthetic data and a model inference example using real-world financial data.

Journal ArticleDOI
TL;DR: The notions of pseudostationarity and intensity reweighted moment pseudostationarity for point processes on linear networks are introduced, and the proposed nonparametric estimators manage to capture clustering, regularity as well as Poisson process independence.
Abstract: As a workaround for the lack of transitive transformations on linear network structures, which are required to consider different notions of distributional invariance, including stationarity, we introduce the notions of pseudostationarity and intensity reweighted moment pseudostationarity for point processes on linear networks. Moreover, using arbitrary so-called regular linear network distances, e.g. the Euclidean and the shortest-path distance, we further propose geometrically corrected versions of different higher-order summary statistics, including the inhomogeneous empty space function, the inhomogeneous nearest neighbour distance distribution function and the inhomogeneous J-function. Such summary statistics detect interactions of order higher than two. We also discuss their nonparametric estimators and through a simulation study, considering models with different types of spatial interaction and different networks, we study the performance of our proposed summary statistics by means of envelopes. Our summary statistic estimators manage to capture clustering, regularity as well as Poisson process independence. Finally, we make use of our new summary statistics to analyse two different datasets: motor vehicle traffic accidents and spiderwebs.

Journal ArticleDOI
TL;DR: In this article, a mini-batch algorithm framework for maximum likelihood estimation of mixtures of exponential family distributions is proposed, with emphasis on mixtures of normal distributions.
Abstract: Mini-batch algorithms have become increasingly popular due to the requirement for solving optimization problems, based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. Theoretical results regarding the convergence of the mini-batch EM algorithms are presented. We then demonstrate how the mini-batch framework may be applied to conduct maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the famous MNIST data set.
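
A sketch of one way to realise a mini-batch EM for mixtures of normal distributions, updating running sufficient statistics with a decreasing step size; the step-size schedule, initialization and stabilization details here are assumptions and do not reproduce the paper's exact algorithm.

```python
import numpy as np

def gauss_logpdf(X, mu, Sigma):
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)
    return -0.5 * (d * np.log(2 * np.pi) + 2 * np.log(np.diag(L)).sum() + (z**2).sum(axis=0))

def minibatch_em_gmm(X, K, batch_size=256, n_steps=400, rng=None):
    """Mini-batch EM for a Gaussian mixture: running sufficient statistics are blended
    with each batch's E-step using a decreasing step size, then the M-step is applied."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]
    Sigma = np.stack([np.cov(X, rowvar=False)] * K)
    pi = np.full(K, 1.0 / K)
    s0 = pi.copy()
    s1 = pi[:, None] * mu
    s2 = pi[:, None, None] * (Sigma + np.einsum('ki,kj->kij', mu, mu))
    for t in range(1, n_steps + 1):
        B = X[rng.choice(n, batch_size, replace=False)]
        # E-step on the mini-batch: responsibilities
        logr = np.stack([np.log(pi[k]) + gauss_logpdf(B, mu[k], Sigma[k]) for k in range(K)], axis=1)
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # stochastic-approximation update of the running sufficient statistics
        g = (t + 2.0) ** -0.6
        s0 = (1 - g) * s0 + g * r.mean(axis=0)
        s1 = (1 - g) * s1 + g * (r.T @ B) / batch_size
        s2 = (1 - g) * s2 + g * np.einsum('nk,ni,nj->kij', r, B, B) / batch_size
        # M-step from the running statistics
        pi = s0
        mu = s1 / s0[:, None]
        Sigma = s2 / s0[:, None, None] - np.einsum('ki,kj->kij', mu, mu) + 1e-6 * np.eye(d)
    return pi, mu, Sigma

X = np.vstack([np.random.default_rng(7).normal(m, 1.0, size=(2000, 2)) for m in (-3.0, 3.0)])
pi, mu, Sigma = minibatch_em_gmm(X, K=2)
print(np.round(pi, 2), np.round(mu, 2))
```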

Journal ArticleDOI
TL;DR: In this paper, the authors proposed NC-Impute, an EM-flavored framework for computing a family of nonconvex penalized matrix completion problems with warm starts, which leads to low-rank solutions with better predictive performance when compared to those obtained from nuclear norm problems.
Abstract: In this paper, we study the popularly dubbed matrix completion problem, where the task is to “fill in” the unobserved entries of a matrix from a small subset of observed entries, under the assumption that the underlying matrix is of low rank. Our contributions herein enhance our prior work on nuclear norm regularized problems for matrix completion (Mazumder et al. in J Mach Learn Res 1532(11):2287–2322, 2010) by incorporating a continuum of nonconvex penalty functions between the convex nuclear norm and nonconvex rank functions. Inspired by Soft-Impute (Mazumder et al. 2010; Hastie et al. in J Mach Learn Res, 2016), we propose NC-Impute—an EM-flavored algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm starts. We present a systematic study of the associated spectral thresholding operators, which play an important role in the overall algorithm. We study convergence properties of the algorithm. Using structured low-rank SVD computations, we demonstrate the computational scalability of our proposal for problems up to the Netflix size (approximately, a 500,000 $$\times $$ 20,000 matrix with $$10^8$$ observed entries). We demonstrate that on a wide range of synthetic and real data instances, our proposed nonconvex regularization framework leads to low-rank solutions with better predictive performance when compared to those obtained from nuclear norm problems. Implementations of algorithms proposed herein, written in the R language, are made available on github.
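
The sketch below shows Soft-Impute-style iterations with either the convex soft threshold or an MCP-style nonconvex threshold on the singular values, in the spirit of NC-Impute; the exact thresholding operators, penalty parameters and warm-start strategy of the actual algorithm differ.

```python
import numpy as np

def impute_lowrank(X_obs, mask, lam=1.0, gamma=None, n_iter=200):
    """Fill the missing entries with the current estimate, take an SVD, and threshold the
    singular values. gamma=None gives the convex nuclear-norm (soft) threshold; a finite
    gamma > 1 applies an MCP-style threshold that shrinks large singular values less."""
    Z = np.where(mask, X_obs, 0.0)
    for _ in range(n_iter):
        filled = np.where(mask, X_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        if gamma is None:
            s_t = np.maximum(s - lam, 0.0)                       # soft threshold
        else:
            s_t = np.where(s <= gamma * lam,
                           np.maximum(s - lam, 0.0) / (1.0 - 1.0 / gamma),
                           s)                                    # MCP threshold
        Z = (U * s_t) @ Vt
    return Z

rng = np.random.default_rng(8)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))    # rank-5 truth
mask = rng.uniform(size=A.shape) < 0.4                              # 40% observed
X_obs = np.where(mask, A + 0.1 * rng.standard_normal(A.shape), 0.0)
Z = impute_lowrank(X_obs, mask, lam=2.0, gamma=4.0)
print("RMSE on missing entries:", np.sqrt(np.mean((Z[~mask] - A[~mask])**2)))
```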

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method for inference on moderately high-dimensional, nonlinear, non-Gaussian, partially observed Markov process models for which the transition density is not analytically tractable.
Abstract: We propose a method for inference on moderately high-dimensional, nonlinear, non-Gaussian, partially observed Markov process models for which the transition density is not analytically tractable. Markov processes with intractable transition densities arise in models defined implicitly by simulation algorithms. Widely used particle filter methods are applicable to nonlinear, non-Gaussian models but suffer from the curse of dimensionality. Improved scalability is provided by ensemble Kalman filter methods, but these are inappropriate for highly nonlinear and non-Gaussian models. We propose a particle filter method having improved practical and theoretical scalability with respect to the model dimension. This method is applicable to implicitly defined models having analytically intractable transition densities. Our method is developed based on the assumption that the latent process is defined in continuous time and that a simulator of this latent process is available. In this method, particles are propagated at intermediate time intervals between observations and are resampled based on a forecast likelihood of future observations. We combine this particle filter with parameter estimation methodology to enable likelihood-based inference for highly nonlinear spatiotemporal systems. We demonstrate our methodology on a stochastic Lorenz 96 model and a model for the population dynamics of infectious diseases in a network of linked regions.

Journal ArticleDOI
TL;DR: This work introduces a likelihood-free approximate Gibbs sampler that naturally circumvents the dimensionality issue by focusing on lower-dimensional conditional distributions and is able to fit substantially more challenging statistical models than would otherwise be possible.
Abstract: Likelihood-free methods such as approximate Bayesian computation (ABC) have extended the reach of statistical inference to problems with computationally intractable likelihoods. Such approaches perform well for small-to-moderate dimensional problems, but suffer a curse of dimensionality in the number of model parameters. We introduce a likelihood-free approximate Gibbs sampler that naturally circumvents the dimensionality issue by focusing on lower-dimensional conditional distributions. These distributions are estimated by flexible regression models either before the sampler is run, or adaptively during sampler implementation. As a result, and in comparison to Metropolis-Hastings-based approaches, we are able to fit substantially more challenging statistical models than would otherwise be possible. We demonstrate the sampler’s performance via two simulated examples, and a real analysis of Airbnb rental prices using an intractable high-dimensional multivariate nonlinear state-space model with a 36-dimensional latent state observed on 365 time points, which presents a real challenge to standard ABC techniques.

Journal ArticleDOI
TL;DR: In this article, a Bayesian model for simultaneous and automatic selection of the appropriate dimension of the latent space and the number of blocks is proposed, which is tested on simulated and real world network data.
Abstract: Spectral embedding of adjacency or Laplacian matrices of undirected graphs is a common technique for representing a network in a lower dimensional latent space, with optimal theoretical guarantees. The embedding can be used to estimate the community structure of the network, with strong consistency results in the stochastic blockmodel framework. One of the main practical limitations of standard algorithms for community detection from spectral embeddings is that the number of communities and the latent dimension of the embedding must be specified in advance. In this article, a novel Bayesian model for simultaneous and automatic selection of the appropriate dimension of the latent space and the number of blocks is proposed. Extensions to directed and bipartite graphs are discussed. The model is tested on simulated and real world network data, showing promising performance for recovering latent community structure.
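
A small end-to-end illustration of the pipeline the paper builds on: adjacency spectral embedding of a two-block stochastic blockmodel followed by Gaussian mixture clustering. Here the latent dimension d and the number of blocks K are fixed by hand, which is exactly the choice the proposed Bayesian model makes automatically.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def adjacency_spectral_embedding(A, d):
    """Embed an undirected graph into R^d using the top-|eigenvalue| scaled eigenvectors."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Two-block stochastic blockmodel
rng = np.random.default_rng(9)
n, z = 200, np.repeat([0, 1], 100)
P = np.array([[0.30, 0.05], [0.05, 0.25]])[np.ix_(z, z)]
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric adjacency, empty diagonal

Xhat = adjacency_spectral_embedding(A, d=2)
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(Xhat)
print("block recovery agreement:", max(np.mean(labels == z), np.mean(labels != z)))
```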

Journal ArticleDOI
TL;DR: A functional single-index quantile regression model is proposed to address issues of skewness, and the maximal value of PM10 concentrations is predicted based on the intraday concentrations of the previous day.
Abstract: It is known that functional single-index regression models can achieve better prediction accuracy than functional linear models or fully nonparametric models, when the target is to predict a scalar response using a function-valued covariate. However, the performance of these models may be adversely affected by extremely large values or skewness in the response. In addition, they are not able to offer a full picture of the conditional distribution of the response. Motivated by using trajectories of $$\hbox {PM}_{{10}}$$ concentrations of the previous day to predict the maximum $$\hbox {PM}_{{10}}$$ concentration of the current day, a functional single-index quantile regression model is proposed to address those issues. A generalized profiling method is employed to estimate the model. Simulation studies are conducted to investigate the finite sample performance of the proposed estimator. We apply the proposed framework to predict the maximal value of $$\hbox {PM}_{{10}}$$ concentrations based on the intraday $$\hbox {PM}_{{10}}$$ concentrations of the previous day.

Journal ArticleDOI
TL;DR: In this paper, two EM-based classifiers are proposed: the first jointly exploits the training and test sets (transductive approach), and the second expands the parameter estimation using the test set, to complete the group structure learned from the training set (inductive approach).
Abstract: Three important issues are often encountered in Supervised and Semi-Supervised Classification: class memberships are unreliable for some training units (label noise), a proportion of observations might depart from the main structure of the data (outliers) and new groups in the test set may have not been encountered earlier in the learning phase (unobserved classes). The present work introduces a robust and adaptive Discriminant Analysis rule, capable of handling situations in which one or more of the aforementioned problems occur. Two EM-based classifiers are proposed: the first one that jointly exploits the training and test sets (transductive approach), and the second one that expands the parameter estimation using the test set, to complete the group structure learned from the training set (inductive approach). Experiments on synthetic and real data, artificially adulterated, are provided to underline the benefits of the proposed method.