Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2011"


Journal ArticleDOI
Simon N. Wood1
TL;DR: In this article, a Laplace approximation is used to obtain an approximate restricted maximum likelihood (REML) or marginal likelihood (ML) for smoothing parameter selection in semiparametric regression.
Abstract: Summary. Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable prediction error criteria smoothing parameter selection methods are available, based on direct optimization of GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton–Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for prediction error criteria based smoothness selection. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.

4,846 citations
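For orientation, the criterion being optimized has roughly the following Laplace-approximate form; the notation below (total penalty matrix S_lambda = sum_j lambda_j S_j, Newton weight matrix W, generalized determinant |.|_+) is assumed for this sketch rather than quoted from the paper.

```latex
% Sketch of a Laplace-approximate REML/ML criterion for a penalized GLM
% (notation assumed: l = log-likelihood, S_lambda the total penalty matrix,
%  W the Newton weight matrix, |.|_+ a generalized determinant over the
%  penalty range space).
\[
  \mathcal{V}(\boldsymbol{\lambda}) \;\approx\;
  l(\hat{\boldsymbol\beta})
  \;-\; \tfrac{1}{2}\,\hat{\boldsymbol\beta}^{\mathsf T} S_{\boldsymbol\lambda}\,\hat{\boldsymbol\beta}
  \;+\; \tfrac{1}{2}\,\log \lvert S_{\boldsymbol\lambda}\rvert_{+}
  \;-\; \tfrac{1}{2}\,\log \lvert X^{\mathsf T} W X + S_{\boldsymbol\lambda}\rvert
  \;+\; \text{const},
\]
% where beta-hat maximizes the penalized log-likelihood for the current lambda;
% the criterion is then optimized directly over lambda by a Newton method.
```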


Journal ArticleDOI
TL;DR: In this article, the authors give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Abstract: Summary. In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.

3,054 citations
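As a reminder of the estimator the review is about, here is a minimal numerical illustration of the lasso; scikit-learn is an assumed tool for this sketch, not something referenced by the paper.

```python
# Minimal lasso illustration: minimize (1/2n)||y - X b||^2 + alpha * ||b||_1.
# scikit-learn is an assumed tool here, not one referenced by the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]           # sparse truth
y = X @ beta_true + rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)           # larger alpha -> more coefficients shrunk to zero
print("non-zero coefficients:", np.flatnonzero(fit.coef_))
```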


Journal ArticleDOI
TL;DR: It is shown that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, an explicit link can be provided, for some Gaussian fields in the Matérn class and any triangulation of R^d, between GFs and GMRFs, formulated as a basis function representation.
Abstract: Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered by the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this still seems to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in R^2, only use the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of R^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.

2,212 citations
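The link rests on the standard fact that Gaussian fields with Matérn covariance arise as stationary solutions of a fractional SPDE; a schematic statement, with notation assumed here:

```latex
% SPDE whose stationary solutions on R^d are Gaussian fields with Matern
% covariance (W denotes Gaussian white noise); standard notation assumed.
\[
  (\kappa^2 - \Delta)^{\alpha/2}\, x(s) \;=\; \mathcal{W}(s), \qquad s \in \mathbb{R}^d,
\]
% with Matern smoothness nu = alpha - d/2. A finite element expansion
% x(s) = sum_k psi_k(s) w_k over a triangulation then yields a GMRF for the
% weights w_k with a sparse precision matrix.
```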


Journal ArticleDOI
TL;DR: In this article, the authors proposed Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations.
Abstract: The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis–Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from http://www.ucl.ac.uk/statistics/research/rmhmc allows replication of all the results reported.

1,279 citations
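A condensed view of the Langevin variant: with G(theta) a chosen metric (e.g. the Fisher information), a position-dependent proposal is used inside a Metropolis-Hastings step. The sketch below drops the metric-derivative terms of the full algorithm, so it is a simplified form rather than the complete method.

```latex
% Simplified manifold MALA proposal (metric-derivative terms omitted):
\[
  \theta^{\ast} \;=\; \theta
  \;+\; \tfrac{\varepsilon^{2}}{2}\, G(\theta)^{-1}\,\nabla_{\theta}\log \pi(\theta)
  \;+\; \varepsilon\, G(\theta)^{-1/2} z,
  \qquad z \sim \mathcal{N}(0, I),
\]
% followed by a Metropolis-Hastings accept/reject step with proposal density
% N(theta^*; mu(theta), eps^2 G(theta)^{-1}), so that scaling adapts to the
% local geometry of the target.
```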


Journal ArticleDOI
TL;DR: This work proposes penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability, and uses a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors.
Abstract: We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.

405 citations
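Schematically, and with notation assumed here rather than taken from the paper, the kth penalized discriminant vector solves a problem of the following form, where the between-class covariance estimate, a positive-definite (e.g. diagonal) within-class estimate and a convex penalty such as the L1 or fused lasso penalty appear:

```latex
% Penalized Fisher discriminant problem (schematic form; notation assumed):
\[
  \hat{\beta}_k \;=\; \arg\max_{\beta}\;
  \Bigl\{ \beta^{\mathsf T}\hat\Sigma_b\,\beta \;-\; P(\beta) \Bigr\}
  \quad \text{subject to} \quad \beta^{\mathsf T}\tilde\Sigma_w\,\beta \le 1,
\]
% a non-convex (biconvex) problem, optimized by minorization-maximization
% when P is convex (e.g. an L1 or fused lasso penalty).
```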


Journal ArticleDOI
TL;DR: In this article, the posterior distribution of a mixture model is studied in the presence of overfitting, where the number of components in the mixture is larger than the true number of components, a situation referred to as an overfitted mixture.
Abstract: Summary. We study the asymptotic behaviour of the posterior distribution in a mixture model when the number of components in the mixture is larger than the true number of components: a situation which is commonly referred to as an overfitted mixture. We prove in particular that quite generally the posterior distribution has a stable and interesting behaviour, since it tends to empty the extra components. This stability is achieved under some restriction on the prior, which can be used as a guideline for choosing the prior. Some simulations are presented to illustrate this behaviour.

298 citations


Journal ArticleDOI
TL;DR: A data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty, is proposed, and a strong oracle property is established showing that the proposed method has both model selection consistency and estimation efficiency for the true non-zero coefficients.
Abstract: In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust method of composite L1-L2, and optimal composite quantile method and evaluate their performance in both simulated and real data examples.

183 citations
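The estimator described above has, schematically, the following form (notation assumed for this sketch): a weighted combination of convex losses with data-driven weights, plus a weighted L1 penalty.

```latex
% Schematic form of the estimator (notation assumed): convex losses rho_k,
% data-driven weights w_k, and a weighted L1 penalty.
\[
  \hat{\beta} \;=\; \arg\min_{\beta}\;
  \sum_{i=1}^{n}\sum_{k=1}^{K} w_k\, \rho_k\!\bigl(y_i - x_i^{\mathsf T}\beta\bigr)
  \;+\; \sum_{j=1}^{p} \lambda_j\, \lvert \beta_j \rvert ,
\]
% e.g. rho_1(u) = |u| and rho_2(u) = u^2 for the composite L1-L2 case, or
% several quantile check losses for the composite quantile case.
```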


Journal ArticleDOI
TL;DR: The results show that the detection boundary changes dramatically when the proportion of non‐null component shifts from the sparse regime to the dense regime and it is shown that the higher criticism test is optimally adaptive to the unknown degrees of heterogeneity and heteroscedasticity in both the sparse and the dense cases.
Abstract: The problem of detecting heterogeneous and heteroscedastic Gaussian mixtures is considered. The focus is on how the parameters of heterogeneity, heteroscedasticity, and proportion of non-null component influence the difficulty of the problem. We establish an explicit detection boundary which separates the detectable region where the likelihood ratio test is shown to reliably detect the presence of non-null effect, from the undetectable region where no method can do so. In particular, the results show that the detection boundary changes dramatically when the proportion of non-null component shifts from the sparse regime to the dense regime. Furthermore, it is shown that the higher criticism test, which does not require the specific information of model parameters, is optimally adaptive to the unknown degrees of heterogeneity and heteroscedasticity in both the sparse and dense cases.

143 citations
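For concreteness, a small sketch of one standard form of the higher criticism statistic computed from p-values; details such as the truncation fraction vary between papers, so this is a generic illustration rather than the exact statistic analysed above.

```python
# One standard form of the higher criticism statistic from p-values:
# HC = max_{i <= alpha0*n} sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
# where p_(1) <= ... <= p_(n) are the sorted p-values. The truncation fraction
# alpha0 is one common convention; this is a generic sketch only.
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))            # restrict attention to the smallest p-values
    return hc[:k].max()

# toy usage: p-values under the global null (uniform)
rng = np.random.default_rng(1)
print(higher_criticism(rng.uniform(size=1000)))
```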


Journal ArticleDOI
TL;DR: In this paper, a Cox multiplicative intensity model using covariates that depend on the history of the process is introduced for treating directed interactions as a multivariate point process, and the resulting inferential framework is then employed to model message sending behavior in a corporate e-mail network.
Abstract: Network data often take the form of repeated interactions between senders and receivers tabulated over time. A primary question to ask of such data is which traits and behaviors are predictive of interaction. To answer this question, a model is introduced for treating directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process. Consistency and asymptotic normality are proved for the resulting partial-likelihood-based estimators under suitable regularity conditions, and an efficient fitting procedure is described. Multicast interactions--those involving a single sender but multiple receivers--are treated explicitly. The resulting inferential framework is then employed to model message sending behavior in a corporate e-mail network. The analysis gives a precise quantification of which static shared traits and dynamic network effects are predictive of message recipient selection.

138 citations
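Schematically, a Cox multiplicative intensity for a directed interaction takes the following form; the notation (generic baseline, history-dependent covariates) is assumed for this sketch rather than quoted from the paper.

```latex
% Schematic Cox multiplicative intensity for a directed interaction from
% sender i to receiver j (notation assumed): a baseline rate times a
% log-linear term in history-dependent covariates x_{ij}(t).
\[
  \lambda_{i \to j}(t) \;=\; \lambda_{0}(t)\,
  \exp\!\bigl\{ \beta^{\mathsf T} x_{ij}(t) \bigr\},
\]
% with beta estimated by maximizing a partial likelihood, so that the
% baseline need not be specified.
```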


Journal ArticleDOI
TL;DR: The development of Bayesian nonparametric methods for product partition models, such as Hidden Markov Models and change point models, is considered, and a novel MCMC methodology that combines recent retrospective sampling methods with the use of slice sampler variables is applied.
Abstract: We consider the development of Bayesian Nonparametric methods for product partition models such as Hidden Markov Models and change point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) for the observations arising in each state and a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology which combines recent retrospective sampling methods with the use of slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties, and robustness to the length of the time series being investigated. Moreover, the method is easy to implement requiring little or no user-interaction. We apply our methodology to the analysis of genomic copy number variation.

117 citations


Journal ArticleDOI
TL;DR: In this article, a projection estimator of the density in a deconvolution model with unknown error distribution is presented, with bounds for its pointwise and integrated L2-risk, together with an adaptive estimator obtained by penalization of the projection contrast, which provides model selection.
Abstract: We study the following model of deconvolution $Y=X+\varepsilon$ with i.i.d. observations $Y_1,\dots, Y_n$ and $\varepsilon_{-1},\dots,\varepsilon_{-M}$. The $(X_j)_{1\leq j\leq n}$ are i.i.d. with density $f$, independent of the $\varepsilon_j$. The aim of the paper is to estimate $f$ without knowing the density $f_{\varepsilon}$ of the $\varepsilon_j$. We first define a projection estimator, for which we provide bounds for the pointwise and the integrated $L^2$-risk. We consider ordinary smooth and supersmooth noise $\varepsilon$ with regard to ordinary smooth and supersmooth densities $f$. Then we present an adaptive estimator of the density of $f$. This estimator is obtained by penalization of the projection contrast, which provides model selection. Lastly, we present simulation experiments to illustrate the good performances of our estimator and study from the empirical point of view the importance of theoretical constraints.

Journal ArticleDOI
TL;DR: In this article, the mean direction of the Kent distribution is modeled as a function of a vector of covariates and the estimators can be regarded as asymptotic maximum likelihood estimators.
Abstract: Summary. Compositional data can be transformed to directional data by the square-root transformation and then modelled by using distributions defined on the hypersphere. One advantage of this approach is that zero components are catered for naturally in the models. The Kent distribution for directional data is a good candidate model because it has a sufficiently general covariance structure. We propose a new regression model which models the mean direction of the Kent distribution as a function of a vector of covariates. Our estimators can be regarded as asymptotic maximum likelihood estimators. We show that these estimators perform well and are suitable for typical compositional data sets, including those with some zero components.
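A minimal sketch of the square-root transformation mentioned above: a composition (non-negative parts summing to one) maps to a point on the unit hypersphere, and zero components are carried along naturally.

```python
# The square-root transformation sends a composition (non-negative parts that
# sum to 1) to a point on the unit hypersphere, since sum(sqrt(p)^2) = 1.
# Zero components map to zero coordinates, so they are handled naturally.
import numpy as np

def sqrt_transform(composition):
    p = np.asarray(composition, dtype=float)
    p = p / p.sum()                     # ensure the parts sum to one
    return np.sqrt(p)                   # a unit vector: ||sqrt(p)||_2 = 1

u = sqrt_transform([0.6, 0.3, 0.1, 0.0])
print(u, np.linalg.norm(u))             # the norm is 1.0
```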

Journal ArticleDOI
TL;DR: It is shown that the bootstrap and also the more popular but less accurate t-distribution and normal approximations are more effective in the tails than towards the middle of the distribution, and it is established that robustness properties of the t-ratio are preserved under applications of the bootstrap.
Abstract: Summary. Student's t-statistic is finding applications today that were never envisaged when it was introduced more than a century ago. Many of these applications rely on properties, e.g. robustness against heavy-tailed sampling distributions, that were not explicitly considered until relatively recently. We explore these features of the t-statistic in the context of its application to very high dimensional problems, including feature selection and ranking, the simultaneous testing of many different hypotheses and sparse, high dimensional signal detection. Robustness properties of the t-ratio are highlighted, and it is established that those properties are preserved under applications of the bootstrap. In particular, bootstrap methods correct for skewness and therefore lead to second-order accuracy, even in the extreme tails. Indeed, it is shown that the bootstrap and also the more popular but less accurate t-distribution and normal approximations are more effective in the tails than towards the middle of the distribution. These properties motivate new methods, e.g. bootstrap-based techniques for signal detection, that confine attention to the significant tail of a statistic.
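A minimal sketch of the Studentized ("bootstrap-t") idea for a single mean, assuming nothing beyond standard resampling; the high dimensional applications discussed above build on this basic construction.

```python
# Minimal Studentized ("bootstrap-t") sketch for a single mean: resample,
# recompute the t-ratio centred at the observed mean, and use the bootstrap
# distribution of t* in place of the Student t / normal reference.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(size=30) - 1.0      # a skewed sample with true mean 0
n = x.size
t_obs = np.sqrt(n) * x.mean() / x.std(ddof=1)

t_boot = []
for _ in range(2000):
    xb = rng.choice(x, size=n, replace=True)
    t_boot.append(np.sqrt(n) * (xb.mean() - x.mean()) / xb.std(ddof=1))

# two-sided bootstrap p-value for H0: mean = 0 (skewness-corrected reference)
p_boot = np.mean(np.abs(t_boot) >= abs(t_obs))
print(t_obs, p_boot)
```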

Journal ArticleDOI
TL;DR: In this article, a multiscale adaptive regression model (MARM) is proposed to integrate the propagation-separation (PS) approach with statistical modeling at each voxel for spatial and adaptive analysis of neuroimaging data from multiple subjects.
Abstract: Neuroimaging studies aim to analyze imaging data with complex spatial patterns in a large number of locations (called voxels) on a two-dimensional (2D) surface or in a 3D volume. Conventional analyses of imaging data include two sequential steps: spatially smoothing imaging data and then independently fitting a statistical model at each voxel. However, conventional analyses suffer from the same amount of smoothing throughout the whole image, the arbitrary choice of smoothing extent, and low statistical power in detecting spatial patterns. We propose a multiscale adaptive regression model (MARM) to integrate the propagation–separation (PS) approach (Polzehl and Spokoiny, 2000, 2006) with statistical modeling at each voxel for spatial and adaptive analysis of neuroimaging data from multiple subjects. MARM has three features: being spatial, being hierarchical, and being adaptive. We use a multiscale adaptive estimation and testing procedure (MAET) to utilize imaging observations from the neighboring voxels of the current voxel to adaptively calculate parameter estimates and test statistics. Theoretically, we establish consistency and asymptotic normality of the adaptive parameter estimates and the asymptotic distribution of the adaptive test statistics. Our simulation studies and real data analysis confirm that MARM significantly outperforms conventional analyses of imaging data.

Journal ArticleDOI
TL;DR: In this article, the authors extend the definition of the controlled direct effect of a point exposure on a survival outcome, other than through some given, time-fixed intermediate variable, to the additive hazard scale.
Abstract: Summary. We extend the definition of the controlled direct effect of a point exposure on a survival outcome, other than through some given, time-fixed intermediate variable, to the additive hazard scale. We propose two-stage estimators for this effect when the exposure is dichotomous and randomly assigned and when the association between the intermediate variable and the survival outcome is confounded only by measured factors, which may themselves be affected by the exposure. The first stage of the estimation procedure involves assessing the effect of the intermediate variable on the survival outcome via Aalen's additive regression for the event time, given exposure, intermediate variable and confounders. The second stage involves applying Aalen's additive model, given the exposure alone, to a modified stochastic process (i.e. a modification of the observed counting process based on the first-stage estimates). We give the large sample properties of the estimator proposed and investigate its small sample properties by Monte Carlo simulation. A real data example is provided for illustration.

Journal ArticleDOI
TL;DR: In this article, the optimality is achieved by estimating the forecast distribution non-parametrically over a given broad model class and proving asymptotic (non-parametric) efficiency in that setting.
Abstract: Efficient probabilistic forecasts of integer-valued random variables are derived. The optimality is achieved by estimating the forecast distribution non-parametrically over a given broad model class and proving asymptotic (non-parametric) efficiency in that setting. The method is developed within the context of the integer auto-regressive class of models, which is a suitable class for any count data that can be interpreted as a queue, stock, birth-and-death process or branching process. The theoretical proofs of asymptotic efficiency are supplemented by simulation results that demonstrate the overall superiority of the non-parametric estimator relative to a misspecified parametric alternative, in large but finite samples. The method is applied to counts of stock market iceberg orders. A subsampling method is used to assess sampling variation in the full estimated forecast distribution and a proof of its validity is given.
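For reference, the simplest member of the integer autoregressive class is the INAR(1) model, written with the binomial thinning operator (standard notation, assumed here):

```latex
% INAR(1) with binomial thinning (standard notation assumed):
\[
  X_t \;=\; \alpha \circ X_{t-1} + \varepsilon_t,
  \qquad
  \alpha \circ X \;=\; \sum_{i=1}^{X} B_i,\quad B_i \stackrel{\text{iid}}{\sim} \mathrm{Bernoulli}(\alpha),
\]
% with epsilon_t an i.i.d. non-negative integer-valued innovation sequence;
% the forecast distribution above is estimated non-parametrically over a
% broad class containing such models.
```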

Journal ArticleDOI
TL;DR: The functional singular value decomposition, as developed in this paper, is an extension of the functional principal component analysis for single processes to the case of paired processes, and it can be used to measure functional correlation.
Abstract: Summary. Aiming at quantifying the dependence of pairs of functional data (X,Y), we develop the concept of functional singular value decomposition for covariance and functional singular component analysis, building on the concept of ‘canonical expansion’ of compact operators in functional analysis. We demonstrate the estimation of the resulting singular values, functions and components for the practically relevant case of sparse and noise-contaminated longitudinal data and provide asymptotic consistency results. Expanding bivariate functional data into singular functions emerges as a natural extension of the popular functional principal component analysis for single processes to the case of paired processes. A natural application of the functional singular value decomposition is a measure of functional correlation. Owing to the involvement of an inverse operation, most previously considered functional correlation measures are plagued by numerical instabilities and strong sensitivity to the choice of smoothing parameters. These problems are exacerbated for the case of sparse longitudinal data, on which we focus. The functional correlation measure that is derived from the functional singular value decomposition behaves well with respect to numerical stability and statistical error, as we demonstrate in a simulation study. Practical feasibility for applications to longitudinal data is illustrated with examples from a study on aging and on-line auctions.

Journal ArticleDOI
TL;DR: Using the banded autocovariance matrix enables us to fit a much longer auto‐regressive AR(p) model to the observed data than typically suggested by the Akaike information criterion, while controlling how many parameters are to be estimated precisely and the level of accuracy.
Abstract: Summary. The paper addresses a ‘large p–small n’ problem in a time series framework and considers properties of banded regularization of an empirical autocovariance matrix of a time series process. Utilizing the banded autocovariance matrix enables us to fit a much longer auto-regressive AR(p) model to the observed data than typically suggested by the Akaike information criterion, while controlling how many parameters are to be estimated precisely and the level of accuracy. We present results on asymptotic consistency of banded autocovariance matrices under the Frobenius norm and provide a theoretical justification on optimal band selection by using cross-validation. Remarkably, the cross-validation loss function for banded prediction is related to the conditional mean-square prediction error and, thus, may be viewed as an alternative model selection criterion. The procedure proposed is illustrated by simulations and application to predicting the sea surface temperature index in the Nino 3.4 region.
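A generic sketch of the banding operation on a sample autocovariance matrix (not the paper's code): estimate the autocovariances, form the Toeplitz matrix and zero all entries beyond a chosen band.

```python
# Banding a sample autocovariance matrix: estimate gamma(h), form the Toeplitz
# matrix and set entries with |i - j| > k to zero. Generic sketch only; the
# band k would be chosen by cross-validation as described above.
import numpy as np
from scipy.linalg import toeplitz

def banded_autocov(x, k):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    gamma = np.array([x[:n - h] @ x[h:] / n for h in range(n)])  # gamma_hat(h)
    sigma = toeplitz(gamma)                                      # full Toeplitz estimate
    i, j = np.indices(sigma.shape)
    sigma[np.abs(i - j) > k] = 0.0                               # band at lag k
    return sigma

rng = np.random.default_rng(3)
x = np.convolve(rng.standard_normal(500), [1.0, 0.6, 0.3], mode="valid")
print(banded_autocov(x, k=5)[:4, :4])
```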

Journal ArticleDOI
TL;DR: In this article, a principal stratification approach is adopted to disentangle direct and indirect effects by investigating new augmented experimental designs, where the treatment is randomized, and the mediating variable is not forced, but only randomly encouraged.
Abstract: Summary. Many studies involving causal questions are often concerned with understanding the causal pathways by which a treatment affects an outcome. Thus, the concept of ‘direct’ versus ‘indirect’ effects comes into play. We tackle the problem of disentangling direct and indirect effects by investigating new augmented experimental designs, where the treatment is randomized, and the mediating variable is not forced, but only randomly encouraged. There are two key features of our framework: we adopt a principal stratification approach, and we mainly focus on principal strata effects, avoiding involving a priori counterfactual outcomes. Using non-parametric identification strategies, we provide a set of assumptions, which allow us to identify partially the causal estimands of interest: the principal strata direct effects. Some examples are shown to illustrate our design and causal estimands of interest. Large sample bounds for the principal strata average direct effects are provided, and a simple hypothetical example is used to show how our augmented design can be implemented and how the bounds can be calculated. Finally our augmented design is compared and contrasted with a standard randomized design.

Journal ArticleDOI
TL;DR: In this paper, a prior distribution is constructed on an infinite-dimensional model for this measure, the model being at the same time dense and computationally manageable.
Abstract: The tail of a bivariate distribution function in the domain of attraction of a bivariate extreme-value distribution may be approximated by that of its extreme-value attractor. The extreme-value attractor has margins that belong to a three-parameter family and a dependence structure which is characterised by a probability measure on the unit interval with mean equal to one half, called the spectral measure. Inference is done in a Bayesian framework using a censored-likelihood approach. A prior distribution is constructed on an infinite-dimensional model for this measure, the model being at the same time dense and computationally manageable. A trans-dimensional Markov chain Monte Carlo algorithm is developed and convergence to the posterior distribution is established. In simulations, the Bayes estimator for the spectral measure is shown to compare favourably with frequentist nonparametric estimators. An application to a data set of Danish fire insurance claims is provided.

Journal ArticleDOI
TL;DR: In this article, the mean function of a Gaussian process is estimated from a sample of independent trajectories of the process, observed at random time points and corrupted by additive random error.
Abstract: Summary. We propose and analyse fully data-driven methods for inference about the mean function of a Gaussian process from a sample of independent trajectories of the process, observed at random time points and corrupted by additive random error. Our methods are based on thresholded least squares estimators relative to an approximating function basis. The variable threshold levels are determined from the data and the resulting estimates adapt to the unknown sparsity of the mean function relative to the approximating basis. These results are obtained via novel oracle inequalities, which are further used to derive the rates of convergence of our mean estimates. In addition, we construct confidence balls that adapt to the unknown regularity of the mean and covariance function of the stochastic process. They are easy to compute since they do not require explicit estimation of the covariance operator of the process. A simulation study shows that the new method performs very well in practice and is robust against large variations that may be introduced by the random-error terms.
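A generic sketch of thresholded least squares relative to a function basis, assuming a cosine basis and a simple plug-in threshold purely for illustration; the paper's contribution is the data-driven choice of the variable threshold levels and the adaptive confidence balls, which are not reproduced here.

```python
# Generic thresholded least squares for a mean function: regress the pooled
# observations on a cosine basis, then hard-threshold small coefficients.
# The threshold below is an illustrative plug-in choice only; the paper
# derives data-driven threshold levels and adaptive confidence balls.
import numpy as np

rng = np.random.default_rng(4)
t = rng.uniform(size=400)                                  # pooled random time points
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)

K = 25
B = np.column_stack([np.ones_like(t)] +
                    [np.sqrt(2) * np.cos(np.pi * k * t) for k in range(1, K)])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)               # least squares coefficients
resid_sd = np.std(y - B @ coef)
thresh = resid_sd * np.sqrt(2 * np.log(K) / t.size)        # illustrative threshold
coef_thresh = np.where(np.abs(coef) > thresh, coef, 0.0)   # hard thresholding

grid = np.linspace(0, 1, 5)
Bg = np.column_stack([np.ones_like(grid)] +
                     [np.sqrt(2) * np.cos(np.pi * k * grid) for k in range(1, K)])
print(Bg @ coef_thresh)                                    # estimated mean on a grid
```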

Journal ArticleDOI
TL;DR: The self-consistent estimate is defined as a prior candidate density that precisely reproduces itself; applied to artificial data generated from various distributions, it reaches the theoretical limit for the scaling of the square error with the size of the data set.
Abstract: The estimation of a density profile from experimental data points is a challenging problem, usually tackled by plotting a histogram. Prior assumptions on the nature of the density, from its smoothness to the specification of its form, allow the design of more accurate estimation procedures, such as maximum likelihood. Our aim is to construct a procedure that makes no explicit assumptions, but still provides an accurate estimate of the density. We introduce the self-consistent estimate: the power spectrum of a candidate density is given, and an estimation procedure is constructed on the assumption, to be released a posteriori, that the candidate is correct. The self-consistent estimate is defined as a prior candidate density that precisely reproduces itself. Our main result is to derive the exact expression of the self-consistent estimate for any given data set, and to study its properties. Applications of the method require neither priors on the form of the density nor the subjective choice of parameters. A cut-off frequency, akin to a bin size or a kernel bandwidth, emerges naturally from the derivation. We apply the self-consistent estimate to artificial data generated from various distributions and show that it reaches the theoretical limit for the scaling of the square error with the size of the data set.

Journal ArticleDOI
TL;DR: These tests have optimality properties and computational advantages that are similar to those of the classical score tests in the parametric model framework and are applicable to several semiparametric extensions of measurement error models, including when the measurement error distribution is estimated non-parametrically as well as for generalized partially linear models.
Abstract: We consider functional measurement error models, i.e. models where covariates are measured with error and yet no distributional assumptions are made about the mismeasured variable. We propose and study a score-type local test and an orthogonal series-based, omnibus goodness-of-fit test in this context, where no likelihood function is available or calculated—i.e. all the tests are proposed in the semiparametric model framework. We demonstrate that our tests have optimality properties and computational advantages that are similar to those of the classical score tests in the parametric model framework. The test procedures are applicable to several semiparametric extensions of measurement error models, including when the measurement error distribution is estimated non-parametrically as well as for generalized partially linear models. The performance of the local score-type and omnibus goodness-of-fit tests is demonstrated through simulation studies and analysis of a nutrition data set.

Journal ArticleDOI
TL;DR: In this paper, a new class of dynamic multiscale models for spatiotemporal processes arising from Gaussian areal data is introduced, where nested geographical structures are used to decompose the original process into multi-scale coefficients which evolve through time following state space equations.
Abstract: Summary. We introduce a new class of dynamic multiscale models for spatiotemporal processes arising from Gaussian areal data. Specifically, we use nested geographical structures to decompose the original process into multiscale coefficients which evolve through time following state space equations. Our approach naturally accommodates data that are observed on irregular grids as well as heteroscedasticity. Moreover, we propose a multiscale spatiotemporal clustering algorithm that facilitates estimation of the nested geographical multiscale structure. In addition, we present a singular forward filter backward sampler for efficient Bayesian estimation. Our multiscale spatiotemporal methodology decomposes large data analysis problems into many smaller components and thus leads to scalable and highly efficient computational procedures. Finally, we illustrate the utility and flexibility of our dynamic multiscale framework through two spatiotemporal applications. The first example considers mortality ratios in the state of Missouri whereas the second example examines agricultural production in Espirito Santo State, Brazil.

Journal ArticleDOI
TL;DR: In this article, experimental designs for dose-response studies were constructed to minimize the maximum mean-squared error of the estimated dose required to attain a response in 100p% of the target population.
Abstract: Summary. We construct experimental designs for dose–response studies. The designs are robust against possibly misspecified link functions; for this they minimize the maximum mean-squared error of the estimated dose required to attain a response in 100p% of the target population. Here p might be one particular value (p = 0.5 corresponds to ED50-estimation) or it might range over an interval of values of interest. The maximum of the mean-squared error is evaluated over a Kolmogorov neighbourhood of the fitted link. Both the maximum and the minimum must be evaluated numerically; the former is carried out by quadratic programming and the latter by simulated annealing.
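To illustrate the target quantity, suppose (purely as an assumed example, not the paper's fitted model) that the link were logistic with linear predictor beta_0 + beta_1 x; then the dose giving a response in 100p% of the population is

```latex
% Dose attaining a response probability p under an assumed logistic link:
\[
  x_p \;=\; \frac{\operatorname{logit}(p) - \beta_0}{\beta_1},
  \qquad
  x_{0.5} \;=\; -\,\beta_0/\beta_1 \;\;(\text{the ED50}),
\]
% and the designs above minimize the maximum mean-squared error of the
% estimate of x_p over a Kolmogorov neighbourhood of the fitted link.
```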

Journal ArticleDOI
TL;DR: In this article, the authors propose the Thick-Pen Transform (TPT), a multiscale visualisation technique for time series, and show how it can be used to measure cross-dependence in multivariate time series, to classify time series and to test for stationarity.
Abstract: Traditional visualisation of time series data often consists of plotting the time series values against time and “connecting the dots”. We propose an alternative, multiscale visualisation technique, motivated by the scale-space approach in computer vision. In brief, our method also “connects the dots”, but uses a range of pens of varying thicknesses for this purpose. The resulting multiscale map, termed the Thick-Pen Transform (TPT), corresponds to viewing the time series from a range of distances. We formally prove that the TPT is a discriminatory statistic for two Gaussian time series with distinct correlation structures. Further, we show interesting possible applications of the TPT to measuring cross-dependence in multivariate time series, classifying time series, and testing for stationarity. In particular, we derive the asymptotic distribution of our test statistic, and argue that the test is applicable to both linear and nonlinear processes under low moment assumptions. Various other aspects of the methodology, including other possible applications, are also discussed.

Journal ArticleDOI
TL;DR: This work couples the strategy of sufficient dimension reduction with a flexible semiparametric model, and employs sufficient dimension reduction to bring down the dimension of the regression effectively.
Abstract: Summary. As high dimensional data become routinely available in applied sciences, sufficient dimension reduction has been widely employed and its research has received considerable attention. However, with the majority of sufficient dimension reduction methodology focusing on the dimension reduction step, complete analysis and inference after dimension reduction have yet to receive much attention. We couple the strategy of sufficient dimension reduction with a flexible semiparametric model. We concentrate on inference with respect to the primary variables of interest, and we employ sufficient dimension reduction to bring down the dimension of the regression effectively. Extensive simulations demonstrate the efficacy of the method proposed, and a real data analysis is presented for illustration.

Journal ArticleDOI
TL;DR: It is shown that the classical equivalence theorem and the famous geometric characterization of Elfving from the case of uncorrelated data can be adapted to the problem of selecting optimal sets of observations for the n individual patients.
Abstract: We consider the problem of optimal design of experiments for random-effects models, especially population models, where a small number of correlated observations can be taken on each individual, whereas the observations corresponding to different individuals are assumed to be uncorrelated. We focus on c-optimal design problems and show that the classical equivalence theorem and the famous geometric characterization of Elfving from the case of uncorrelated data can be adapted to the problem of selecting optimal sets of observations for the n individual patients. The theory is demonstrated by finding optimal designs for a linear model with correlated observations and a non-linear random-effects population model, which is commonly used in pharmaco-kinetics.
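For reference, with M(xi) the information matrix of a design xi and standard notation assumed, c-optimality and Elfving's characterization take the following schematic form:

```latex
% c-optimality (notation assumed): a c-optimal design minimizes the asymptotic
% variance of the linear combination c^T theta-hat,
\[
  \xi^{\ast} \;=\; \arg\min_{\xi}\; c^{\mathsf T} M(\xi)^{-1} c ,
\]
% and Elfving's classical geometric characterization locates xi^* via the
% intersection of the ray through c with the boundary of the convex hull of
% {+/- f(x)}; the paper adapts both tools to correlated within-individual
% observations.
```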

Journal ArticleDOI
TL;DR: A novel prior using the so-called ‘Chinese restaurant process’ to create structures in the form of equal intensities of some neighbouring pixels is proposed, which outperforms most existing methods in terms of image processing quality, speed and the ability to select smoothing parameters automatically.
Abstract: Summary. We consider a multiscale model for intensities in photon-limited images using a Bayesian framework. A typical Dirichlet prior on relative intensities is not efficient in picking up structures owing to the continuity of intensities. We propose a novel prior using the so-called ‘Chinese restaurant process’ to create structures in the form of equal intensities of some neighbouring pixels. Simulations are conducted using several photon-limited images, which are common in X-ray astronomy and other high energy photon-based images. Applications to astronomical images from the Chandra X-ray Observatory satellite are shown. The new methodology outperforms most existing methods in terms of image processing quality, speed and the ability to select smoothing parameters automatically.
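For readers unfamiliar with the prior, a generic Chinese restaurant process sampler is sketched below (pixels seated at the same table would share an intensity level); this is the standard seating scheme, not the paper's multiscale implementation.

```python
# Chinese restaurant process: customer n+1 joins an existing table k with
# probability n_k / (n + alpha) and opens a new table with probability
# alpha / (n + alpha). Generic seating sketch only.
import numpy as np

def crp_partition(n_items, alpha, rng):
    tables = []                                  # counts of items per table
    labels = []
    for n in range(n_items):
        probs = np.array(tables + [alpha], dtype=float) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)                     # open a new table
        else:
            tables[k] += 1
        labels.append(int(k))
    return labels

rng = np.random.default_rng(5)
print(crp_partition(20, alpha=1.0, rng=rng))
```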

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a model for the analysis of non-stationary point processes with almost periodic rate of occurrence, which deals with the arrivals of events which are unequally spaced and show a pattern of periodicity or almost periodicity, such as stock transactions and earthquakes.
Abstract: Summary. We propose a model for the analysis of non-stationary point processes with almost periodic rate of occurrence. The model deals with the arrivals of events which are unequally spaced and show a pattern of periodicity or almost periodicity, such as stock transactions and earthquakes. We model the rate of occurrence of a non-homogeneous Poisson process as the sum of sinusoidal functions plus a baseline. Consistent estimates of frequencies, phases and amplitudes which form the sinusoidal functions are constructed mainly by the Bartlett periodogram. The estimates are shown to be asymptotically normally distributed. Computational issues are discussed and it is shown that the frequency estimates must be resolved with order o(T^{-1}) to guarantee the asymptotic unbiasedness and consistency of the estimates of phases and amplitudes, where T is the length of the observation period. The prediction of the next occurrence is carried out and the mean-squared prediction error is calculated by Monte Carlo integration. Simulation and real data examples are used to illustrate the theoretical results and the utility of the model.
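Schematically, the modelled rate has the following form (notation assumed for this sketch): a baseline plus a sum of sinusoids with unknown frequencies, amplitudes and phases.

```latex
% Schematic form of the almost periodic rate (notation assumed):
\[
  \lambda(t) \;=\; \lambda_0 \;+\; \sum_{k=1}^{K} A_k \cos\!\bigl(\omega_k t + \phi_k\bigr),
  \qquad \lambda(t) \ge 0,
\]
% with (omega_k, A_k, phi_k) estimated mainly from the Bartlett periodogram of
% the observed event times.
```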