
Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2009"


Journal ArticleDOI
TL;DR: This work considers approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, in which the latent field is Gaussian, controlled by a few hyperparameters, and combined with non‐Gaussian response variables; it shows that very accurate approximations to the posterior marginals can be computed directly.
Abstract: Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.

4,164 citations
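The nested Laplace construction at the heart of this approach is compact enough to state. The following is the standard textbook form of the approximation (notation paraphrased, not quoted from the paper):

```latex
% Latent Gaussian model: y | x, \theta ~ \prod_i \pi(y_i | x_i, \theta),
%                        x | \theta    ~ N(0, Q(\theta)^{-1}),
%                        \theta        ~ \pi(\theta).
\tilde{\pi}(\theta \mid y) \;\propto\;
   \left.\frac{\pi(y \mid x, \theta)\,\pi(x \mid \theta)\,\pi(\theta)}
              {\tilde{\pi}_{G}(x \mid \theta, y)}\right|_{x = x^{*}(\theta)},
\qquad
\tilde{\pi}(x_i \mid y) \;=\; \sum_{k} \tilde{\pi}(x_i \mid \theta_k, y)\,
                                       \tilde{\pi}(\theta_k \mid y)\,\Delta_k .
```

Here π̃_G is a Gaussian approximation to the full conditional of the latent field, x*(θ) is its mode, and the sum runs over a small grid of hyperparameter values {θ_k} with area weights Δ_k.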


Journal ArticleDOI
TL;DR: Sparse additive models combine ideas from sparse linear modelling and additive non-parametric regression; the authors derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size.
Abstract: Summary. We present a new class of methods for high dimensional non-parametric regression and classification called sparse additive models. Our methods combine ideas from sparse linear modelling and additive non-parametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. Sparse additive models are essentially a functional version of the grouped lasso of Yuan and Lin. They are also closely related to the COSSO model of Lin and Zhang but decouple smoothing and sparsity, enabling the use of arbitrary non-parametric smoothers. We give an analysis of the theoretical properties of sparse additive models and present empirical results on synthetic and real data, showing that they can be effective in fitting sparse non-parametric models in high dimensional data.

542 citations
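To make the "decoupled smoothing and sparsity" point concrete, here is a minimal illustrative Python sketch of soft-thresholded backfitting in the spirit of sparse additive models. The Nadaraya-Watson smoother, bandwidth, penalty level and helper names (`kernel_smoother`, `spam_backfit`) are ad hoc choices for illustration, not the authors' implementation.

```python
import numpy as np

def kernel_smoother(x, r, bandwidth=0.3):
    """Nadaraya-Watson smoother of residuals r on a single covariate x."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def spam_backfit(X, y, lam, n_iter=50, bandwidth=0.3):
    """Soft-thresholded backfitting in the spirit of sparse additive models."""
    n, p = X.shape
    f = np.zeros((n, p))                      # fitted component functions
    y_c = y - y.mean()                        # work with a centred response
    for _ in range(n_iter):
        for j in range(p):
            r_j = y_c - f.sum(axis=1) + f[:, j]          # partial residual
            p_j = kernel_smoother(X[:, j], r_j, bandwidth)
            s_j = np.sqrt(np.mean(p_j ** 2))             # component norm
            shrink = max(0.0, 1.0 - lam / s_j) if s_j > 0 else 0.0
            f[:, j] = shrink * p_j
            f[:, j] -= f[:, j].mean()                    # keep components centred
    return f

# toy example: only the first two covariates matter
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 10))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
f_hat = spam_backfit(X, y, lam=0.05)
# component norms; the irrelevant components should be heavily shrunk
print(np.sqrt((f_hat ** 2).mean(axis=0)).round(3))
```

Because the smoother is arbitrary, swapping the kernel step for splines or any other non-parametric smoother leaves the thresholding logic unchanged, which is the decoupling the abstract refers to.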


Journal ArticleDOI
TL;DR: A generic on‐line version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations that is suitable for conditional models, as illustrated in the case of the mixture of linear regressions model.
Abstract: In this contribution, we propose a generic online (also sometimes called adaptive or recursive) version of the Expectation-Maximisation (EM) algorithm applicable to latent variable models of independent observations. Compared to the algorithm of Titterington (1984), this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback-Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e., that of the maximum likelihood estimator. In addition, the proposed approach is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model.

495 citations
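A hedged sketch of the idea for a univariate Gaussian mixture: running averages of the complete-data sufficient statistics are updated by stochastic approximation and then mapped back to parameters by the usual M-step formulas. The step-size schedule, burn-in and the helper name `online_em_gmm` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def online_em_gmm(y, K=2, gamma_exponent=0.6, n_burn=20, seed=0):
    """Online EM for a univariate K-component Gaussian mixture.

    Running averages of the complete-data sufficient statistics are updated
    with step sizes gamma_n = n**(-gamma_exponent), then mapped back to
    (weights, means, variances).
    """
    rng = np.random.default_rng(seed)
    # crude initial parameters
    w = np.full(K, 1.0 / K)
    mu = rng.choice(y, size=K, replace=False).astype(float)
    var = np.full(K, np.var(y))
    # sufficient statistics: s0 (weights), s1 (first moments), s2 (second moments)
    s0, s1, s2 = w.copy(), w * mu, w * (var + mu ** 2)
    for n, yn in enumerate(y, start=1):
        gamma = n ** (-gamma_exponent)
        # E-step for the single new observation: posterior responsibilities
        dens = w * np.exp(-0.5 * (yn - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum()
        # stochastic-approximation update of the sufficient statistics
        s0 += gamma * (r - s0)
        s1 += gamma * (r * yn - s1)
        s2 += gamma * (r * yn ** 2 - s2)
        if n > n_burn:                       # M-step: map statistics to parameters
            w = s0 / s0.sum()
            mu = s1 / s0
            var = np.maximum(s2 / s0 - mu ** 2, 1e-6)
    return w, mu, var

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(3, 0.5, 5000)])
rng.shuffle(data)
print(online_em_gmm(data))
```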


Journal ArticleDOI
TL;DR: In this article, the authors further enlarge the scope of applicability of the traditional Bayesian information criterion type criteria to the situation with a diverging number of parameters for both unpenalized and penalized estimators.
Abstract: Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For those problems, various shrinkage methods (e.g. the lasso and smoothly clipped absolute deviation) are found to be particularly useful for variable selection. Nevertheless, the desirable performances of those shrinkage methods heavily hinge on an appropriate selection of the tuning parameters. With a fixed predictor dimension, Wang and co-workers have demonstrated that the tuning parameters selected by a Bayesian information criterion type criterion can identify the true model consistently. In this work, similar results are further extended to the situation with a diverging number of parameters for both unpenalized and penalized estimators. Consequently, our theoretical results further enlarge not only the scope of applicability of the traditional Bayesian information criterion type criteria but also that of those shrinkage estimation methods.

421 citations
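As a rough illustration of tuning-parameter selection by a BIC-type criterion over a lasso path (using scikit-learn's `Lasso`): the exact form of the criterion and the C_n factor below are simplifications for illustration, not the paper's definition.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bic_select_lasso(X, y, alphas, C_n=1.0):
    """Select the lasso tuning parameter by a BIC-type criterion.

    BIC(alpha) = n * log(RSS/n) + C_n * log(n) * df, with df taken as the
    number of non-zero coefficients; C_n > 1 mimics the modified criteria
    studied when the number of parameters diverges.
    """
    n = len(y)
    best = None
    for alpha in alphas:
        fit = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
        rss = np.sum((y - fit.predict(X)) ** 2)
        df = np.count_nonzero(fit.coef_)
        bic = n * np.log(rss / n) + C_n * np.log(n) * df
        if best is None or bic < best[0]:
            best = (bic, alpha, fit.coef_.copy())
    return best

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)
# C_n = log(log(p)) here is purely illustrative
bic, alpha, coef = bic_select_lasso(X, y, np.logspace(-3, 0, 30), C_n=np.log(np.log(p)))
print(alpha, np.nonzero(coef)[0])
```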


Journal ArticleDOI
TL;DR: In this paper, the problem of multiple testing under dependence in a compound decision theoretic framework is considered, where the observed data are assumed to be generated from an underlying two-state hidden Markov model.
Abstract: Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate FNR subject to a constraint on the false discovery rate FDR. It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Both theoretical properties and numerical performances of the procedures proposed are investigated. It is shown that the procedures proposed control FDR at the desired level, enjoy certain optimality properties and are especially powerful in identifying clustered non-null cases. The new procedure is applied to an influenza-like illness surveillance study for detecting the timing of epidemic periods.

268 citations
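The flavour of the procedure can be conveyed with a small sketch of the oracle case in which the two-state hidden Markov model parameters are known: forward-backward smoothing yields the posterior probability that each hypothesis is null, and the hypotheses with the smallest such probabilities are rejected while the running mean of the sorted probabilities stays below the target level. This is only a schematic reading of that recipe, with made-up parameter values; the function name `lis_procedure` and all tuning choices are illustrative.

```python
import numpy as np
from scipy.stats import norm

def lis_procedure(x, init_dist, A, f0, f1, alpha=0.10):
    """Oracle-style multiple testing under a two-state hidden Markov model.

    States: 0 = null, 1 = non-null. Forward-backward smoothing gives the
    posterior probability of the null state for each hypothesis; reject the
    smallest ones while their running mean stays below alpha.
    """
    m = len(x)
    e = np.column_stack([f0(x), f1(x)])          # emission densities
    # forward pass (scaled)
    fwd = np.zeros((m, 2)); c = np.zeros(m)
    fwd[0] = init_dist * e[0]; c[0] = fwd[0].sum(); fwd[0] /= c[0]
    for t in range(1, m):
        fwd[t] = (fwd[t - 1] @ A) * e[t]
        c[t] = fwd[t].sum(); fwd[t] /= c[t]
    # backward pass (scaled)
    bwd = np.ones((m, 2))
    for t in range(m - 2, -1, -1):
        bwd[t] = (A @ (e[t + 1] * bwd[t + 1])) / c[t + 1]
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    lis = post[:, 0]                              # P(null | all data)
    order = np.argsort(lis)
    running_mean = np.cumsum(lis[order]) / np.arange(1, m + 1)
    ok = np.nonzero(running_mean <= alpha)[0]
    k = ok.max() + 1 if ok.size else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject, lis

# toy example: clustered signals generated by a sticky Markov chain
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.05], [0.20, 0.80]])
states = [0]
for _ in range(1999):
    states.append(rng.choice(2, p=A[states[-1]]))
states = np.array(states)
x = rng.standard_normal(2000) + 2.5 * states
rej, _ = lis_procedure(x, np.array([0.5, 0.5]), A,
                       f0=lambda z: norm.pdf(z), f1=lambda z: norm.pdf(z, loc=2.5))
print(rej.sum(), "rejections")
```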


Journal ArticleDOI
TL;DR: It is shown that ridge regression, the lasso and the elastic net are special cases of covariance‐regularized regression, and it is demonstrated that certain previously unexplored forms of covariance‐regularized regression can outperform existing methods in a range of situations.
Abstract: In recent years, many methods have been developed for regression in high-dimensional settings. We propose covariance-regularized regression, a family of methods that use a shrunken estimate of the inverse covariance matrix of the features in order to achieve superior prediction. An estimate of the inverse covariance matrix is obtained by maximizing its log likelihood, under a multivariate normal model, subject to a constraint on its elements; this estimate is then used to estimate coefficients for the regression of the response onto the features. We show that ridge regression, the lasso, and the elastic net are special cases of covariance-regularized regression, and we demonstrate that certain previously unexplored forms of covariance-regularized regression can outperform existing methods in a range of situations. The covariance-regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyze gene expression data sets with multiple class and survival outcomes.

247 citations
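A very rough sketch of the general idea: replace the sample inverse covariance of the features by a penalized estimate (here scikit-learn's graphical lasso) before forming regression coefficients. This is not the authors' exact procedure; in particular the final rescaling step below is a placeholder for what would normally be tuned by cross-validation, and the helper name `shrunken_precision_regression` is made up.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def shrunken_precision_regression(X, y, alpha=0.1):
    """Regression via a shrunken estimate of the feature precision matrix.

    A penalized estimate of the inverse covariance of the (standardized)
    features replaces (X'X/n)^(-1) in the usual least squares formula.
    """
    Xs = (X - X.mean(0)) / X.std(0)
    yc = y - y.mean()
    theta = GraphicalLasso(alpha=alpha).fit(Xs).precision_   # shrunken inverse covariance
    beta = theta @ (Xs.T @ yc) / len(yc)
    # crude rescaling so that fitted values have roughly the right magnitude
    scale = (yc @ (Xs @ beta)) / np.sum((Xs @ beta) ** 2)
    return scale * beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(100)
print(shrunken_precision_regression(X, y).round(2))
```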


Journal ArticleDOI
TL;DR: The equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, are shown to share a similar form that allows several results common to both to be proved and a condition to be derived under which they yield identical values.
Abstract: Summary. Spline-based approaches to non-parametric and semiparametric regression, as well as to regression of scalar outcomes on functional predictors, entail choosing a parameter controlling the extent to which roughness of the fitted function is penalized. We demonstrate that the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form that allows us to prove several results which are common to both, and to derive a condition under which they yield identical values. These ideas are illustrated by application of functional principal component regression, a method for regressing scalars on functions, to two chemometric data sets.

201 citations
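For concreteness, the generalized cross-validation criterion mentioned above can be computed directly from the smoother matrix of any penalized basis fit. The truncated-power basis, knot placement and penalty below are stand-ins chosen for brevity, not the paper's setup.

```python
import numpy as np

def gcv_smoothing(x, y, lams):
    """Choose a ridge-type roughness penalty by generalized cross-validation.

    GCV(lam) = n * ||y - A_lam y||^2 / (n - tr A_lam)^2, where A_lam is the
    smoother ("hat") matrix of a penalized truncated-power spline basis.
    """
    n = len(x)
    knots = np.quantile(x, np.linspace(0.05, 0.95, 12))
    B = np.column_stack([np.ones(n), x, x ** 2, x ** 3] +
                        [np.maximum(x - k, 0) ** 3 for k in knots])
    pen = np.zeros(B.shape[1]); pen[4:] = 1.0        # penalize only the knot terms
    P = np.diag(pen)
    scores = []
    for lam in lams:
        A = B @ np.linalg.solve(B.T @ B + lam * P, B.T)   # smoother matrix
        resid = y - A @ y
        scores.append(n * (resid @ resid) / (n - np.trace(A)) ** 2)
    return lams[int(np.argmin(scores))], scores

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(150)
best, _ = gcv_smoothing(x, y, np.logspace(-4, 2, 30))
print(best)
```

The REML criterion compared in the paper can be evaluated from the same ingredients (the smoother matrix and the penalty), which is what makes the side-by-side analysis possible.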


Journal ArticleDOI
TL;DR: A new algorithm, DASSO, is proposed for fitting the entire coefficient path of the Dantzig selector with a similar computational cost to the least angle regression algorithm that is used to compute the lasso.
Abstract: Summary. We propose a new algorithm, DASSO, for fitting the entire coefficient path of the Dantzig selector with a similar computational cost to the least angle regression algorithm that is used to compute the lasso. DASSO efficiently constructs a piecewise linear path through a sequential simplex-like algorithm, which is remarkably similar to the least angle regression algorithm. Comparison of the two algorithms sheds new light on the question of how the lasso and Dantzig selector are related. In addition, we provide theoretical conditions on the design matrix X under which the lasso and Dantzig selector coefficient estimates will be identical for certain tuning parameters. As a consequence, in many instances, we can extend the powerful non-asymptotic bounds that have been developed for the Dantzig selector to the lasso. Finally, through empirical studies of simulated and real world data sets we show that in practice, when the bounds hold for the Dantzig selector, they almost always also hold for the lasso.

195 citations
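The Dantzig selector itself (though not the DASSO path algorithm) reduces to a small linear program, which makes experimenting with the lasso comparison straightforward. A sketch using scipy's `linprog`, with an ad hoc choice of tuning parameter:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector as a linear program.

    minimize ||beta||_1  subject to  ||X'(y - X beta)||_inf <= lam,
    solved with the split beta = u - v, u, v >= 0.
    """
    n, p = X.shape
    G = X.T @ X
    g = X.T @ y
    c = np.ones(2 * p)                               # objective: sum(u) + sum(v)
    A_ub = np.vstack([np.hstack([G, -G]),            #  G(u - v) <= g + lam
                      np.hstack([-G, G])])           # -G(u - v) <= -g + lam
    b_ub = np.concatenate([g + lam, -g + lam])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    beta = res.x[:p] - res.x[p:]
    return np.where(np.abs(beta) > 1e-8, beta, 0.0)

rng = np.random.default_rng(0)
n, p = 100, 40
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
lam = 0.5 * np.sqrt(2 * n * np.log(p))               # roughly sigma * sqrt(2 n log p)
print(np.nonzero(dantzig_selector(X, y, lam))[0])
```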


Journal ArticleDOI
TL;DR: The paper studies parameter estimation for inhomogeneous spatial point processes with a regression model for the intensity function and tractable second-order properties (K-function), combining a Poisson likelihood score estimating function with minimum contrast estimation.
Abstract: Summary. The paper is concerned with parameter estimation for inhomogeneous spatial point processes with a regression model for the intensity function and tractable second-order properties (K-function). Regression parameters are estimated by using a Poisson likelihood score estimating function and in the second step minimum contrast estimation is applied for the residual clustering parameters. Asymptotic normality of parameter estimates is established under certain mixing conditions and we exemplify how the results may be applied in ecological studies of rainforests.

177 citations


Journal ArticleDOI
TL;DR: The forward search is used to provide robust Mahalanobis distances for detecting outliers in a sample of multivariate normal data, with the distribution of the test statistic obtained from theoretical results on order statistics and on estimation in truncated samples.
Abstract: We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples.

169 citations
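A minimal sketch of the outlier-detection recipe, with two caveats: it uses the minimum covariance determinant estimator from scikit-learn in place of the forward search studied in the paper, and a plain chi-squared cut-off instead of the corrected distributional results derived there.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def robust_outlier_flags(X, level=0.01):
    """Flag multivariate outliers with robust Mahalanobis distances.

    Location and scatter come from the minimum covariance determinant
    estimator; squared distances are compared with a chi-squared quantile.
    """
    mcd = MinCovDet(random_state=0).fit(X)
    d2 = mcd.mahalanobis(X)                    # squared robust distances
    cutoff = chi2.ppf(1 - level, df=X.shape[1])
    return d2 > cutoff, d2

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200)
X[:5] += 6.0                                   # plant a few gross outliers
flags, _ = robust_outlier_flags(X)
print(np.nonzero(flags)[0][:10])
```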


Journal ArticleDOI
TL;DR: A test for a common functional mean is developed whose statistic is asymptotically pivotal with a well-known asymptotic distribution; the asymptotic test has excellent finite sample performance and is illustrated on temperature data from England.
Abstract: Principal component analysis has become a fundamental tool of functional data analysis. It represents the functional data as X_i(t) = μ(t) + Σ_{1 ≤ l < ∞} η_{i,l} v_l(t), where μ is the common mean, v_l are the eigenfunctions of the covariance operator and the η_{i,l} are the scores. Inferential procedures assume that the mean function μ(t) is the same for all values of i. If, in fact, the observations do not come from one population, but rather their mean changes at some point(s), the results of principal component analysis are confounded by the change(s). It is therefore important to develop a methodology to test the assumption of a common functional mean. We develop such a test using quantities which can be readily computed in the R package fda. The null distribution of the test statistic is asymptotically pivotal with a well-known asymptotic distribution. The asymptotic test has excellent finite sample performance. Its application is illustrated on temperature data from England.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss asymptotic properties of penalized spline smoothing if the spline basis increases with the sample size and show that the posterior distribution of spline coefficients is approximately normal.
Abstract: The paper discusses asymptotic properties of penalized spline smoothing if the spline basis increases with the sample size. The proof is provided in a generalized smoothing model allowing for non-normal responses. The results are extended in two ways. First, assuming the spline coefficients to be a priori normally distributed links the smoothing framework to generalized linear mixed models. We consider the asymptotic rates such that the Laplace approximation is justified and the resulting fits in the mixed model correspond to penalized spline estimates. Secondly, we make use of a fully Bayesian viewpoint by imposing an a priori distribution on all parameters and coefficients. We argue that with the postulated rates at which the spline basis dimension increases with the sample size the posterior distribution of the spline coefficients is approximately normal. The validity of this result is investigated in finite samples by comparing Markov chain Monte Carlo results with their asymptotic approximation in a simulation study.
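Computationally, the penalized spline estimate discussed above is just a ridge-type solve, which is what links it to the mixed-model and Bayesian readings in the abstract. A short sketch with a B-spline basis and a second-order difference penalty; the basis size, penalty order and helper name `pspline_fit` are illustrative choices.

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_basis=25, lam=1.0, degree=3):
    """Penalized B-spline fit: minimize ||y - B a||^2 + lam * ||D a||^2.

    D is a second-order difference matrix; reading lam as a ratio of variance
    components gives the linear mixed model view (spline coefficients as
    random effects).
    """
    knots = np.linspace(x.min() - 1e-6, x.max() + 1e-6, n_basis - degree + 1)
    t = np.concatenate([[knots[0]] * degree, knots, [knots[-1]] * degree])
    B = BSpline.design_matrix(x, t, degree).toarray()
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)          # second differences
    coef = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ coef

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(3 * np.pi * x) + 0.2 * rng.standard_normal(200)
fit = pspline_fit(x, y, lam=0.1)
print(np.round(np.mean((fit - np.sin(3 * np.pi * x)) ** 2), 4))
```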

Journal ArticleDOI
TL;DR: This paper proposed a new computational approach for fitting such models that is based on the Laplace method for integrals that makes the consideration of high dimensional random-effects structures feasible, which requires much fewer repeated measurements per individual to produce reliable results.
Abstract: A common objective in longitudinal studies is the joint modelling of a longitudinal response with a time-to-event outcome. Random effects are typically used in the joint modelling framework to explain the interrelationships between these two processes. However, estimation in the presence of random effects involves intractable integrals requiring numerical integration. We propose a new computational approach for fitting such models that is based on the Laplace method for integrals that makes the consideration of high dimensional random-effects structures feasible. Contrary to the standard Laplace approximation, our method requires much fewer repeated measurements per individual to produce reliable results.

Journal ArticleDOI
TL;DR: In this article, a general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented, based on the eigenvalue-eigenvector decomposition of one scatter matrix relative to another.
Abstract: Summary. A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based on the eigenvalue–eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant co-ordinate system for the multivariate data. Consequently, we view this method as a method for invariant co-ordinate selection. By plotting the data with respect to this new invariant co-ordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant co-ordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant co-ordinates corresponds to Fisher's linear discriminant subspace, even though the class identifications of the data points are unknown. Some illustrative examples are given.
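A compact sketch of the construction, using the ordinary covariance and a standard fourth-moment based scatter as the two scatter matrices (one common pairing, not the only one the method allows); the function name `invariant_coordinates` is made up for the example.

```python
import numpy as np
from scipy.linalg import eigh

def invariant_coordinates(X):
    """Invariant co-ordinate selection from two scatter matrices.

    Scatter 1 is the ordinary covariance; scatter 2 reweights observations by
    their squared Mahalanobis distance (a kurtosis-based scatter). The
    co-ordinates come from the generalized eigenvectors of the pair.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(Xc, rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S1), Xc)   # squared Mahalanobis
    S2 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))             # kurtosis-weighted scatter
    evals, evecs = eigh(S2, S1)           # generalized eigenproblem S2 v = lambda S1 v
    Z = Xc @ evecs                        # invariant co-ordinates
    return evals, Z

# mixture of two elliptical clusters: an extreme co-ordinate should separate them
rng = np.random.default_rng(0)
X = np.vstack([rng.multivariate_normal([0, 0, 0], np.eye(3), 300),
               rng.multivariate_normal([4, 0, 0], np.eye(3), 100)])
evals, Z = invariant_coordinates(X)
print(evals.round(2))
```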

Journal ArticleDOI
TL;DR: In this paper, two Bayesian approaches to non-parametric monotone function estimation are proposed, one based on a hierarchical Bayes framework and a characterization of smooth monotone functions given by Ramsay that allows unconstrained estimation, and the other using a Bayesian regression spline model of Smith and Kohn.
Abstract: The paper proposes two Bayesian approaches to non-parametric monotone function estimation. The first approach uses a hierarchical Bayes framework and a characterization of smooth monotone functions given by Ramsay that allows unconstrained estimation. The second approach uses a Bayesian regression spline model of Smith and Kohn with a mixture distribution of constrained normal distributions as the prior for the regression coefficients to ensure the monotonicity of the resulting function estimate. The small sample properties of the two function estimators across a range of functions are provided via simulation and compared with existing methods. Asymptotic results are also given that show that Bayesian methods provide consistent function estimators for a large class of smooth functions. An example is provided involving economic demand functions that illustrates the application of the constrained regression spline estimator in the context of a multiple-regression model where two functions are constrained to be monotone.

Journal ArticleDOI
TL;DR: In this article, the authors propose a Bayesian mixture model for dimension reduction by representing the sample of n curves through a smaller set of canonical curves, and propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors.
Abstract: Summary. In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. ‘damaged’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to ∞ and their connection with Dirichlet process mixtures.

Journal ArticleDOI
TL;DR: In this paper, the concept of scale was introduced as a continuous quantity rather than dyadic levels, and the wavelet transform was adapted for function estimation both on graphs and for irregular spatial data in more than one dimension.
Abstract: For regularly spaced one-dimensional data, wavelet shrinkage has proven to be a compelling method for non-parametric function estimation. We create three new multiscale methods that provide wavelet-like transforms both for data arising on graphs and for irregularly spaced spatial data in more than one dimension. The concept of scale still exists within these transforms, but as a continuous quantity rather than dyadic levels. Further, we adapt recent empirical Bayesian shrinkage techniques to enable us to perform multiscale shrinkage for function estimation both on graphs and for irregular spatial data. We demonstrate that our methods perform very well when compared with several other methods for spatial regression for both real and simulated data. Although we concentrate on multiscale shrinkage (regression) we present our new 'wavelet transforms' as generic tools intended to be the basis of methods that might benefit from a multiscale representation of data either on graphs or for irregular spatial data.

Journal ArticleDOI
TL;DR: In this article, the authors extend the classical pseudopolar treatment of multivariate extremes to develop an asymptotically motivated representation of extremal dependence that also encompasses asymmetry, and provide significant extensions of both the theoretical and practical tools that are available for joint tail modelling.
Abstract: Summary. A fundamental issue in applied multivariate extreme value analysis is modelling dependence within joint tail regions. The primary focus of this work is to extend the classical pseudopolar treatment of multivariate extremes to develop an asymptotically motivated representation of extremal dependence that also encompasses asymptotic independence. Starting with the usual mild bivariate regular variation assumptions that underpin the coefficient of tail dependence as a measure of extremal dependence, our main result is a characterization of the limiting structure of the joint survivor function in terms of an essentially arbitrary non-negative measure that must satisfy some mild constraints. We then construct parametric models from this new class and study in detail one example that accommodates asymptotic dependence, asymptotic independence and asymmetry within a straightforward parsimonious parameterization. We provide a fast simulation algorithm for this example and detail likelihood-based inference including tests for asymptotic dependence and symmetry which are useful for submodel selection. We illustrate this model by application to both simulated and real data. In contrast with the classical multivariate extreme value approach, which concentrates on the limiting distribution of normalized componentwise maxima, our framework focuses directly on the structure of the limiting joint survivor function and provides significant extensions of both the theoretical and the practical tools that are available for joint tail modelling.

Journal ArticleDOI
TL;DR: A Bayesian non-parametric methodology has been proposed to deal with the issue of prediction within species sampling problems, which concern the evaluation, conditional on a sample of size n, of the species variety featured by an additional sample of size m; the paper focuses on the two-parameter Poisson–Dirichlet model.
Abstract: Summary. A Bayesian non-parametric methodology has been recently proposed to deal with the issue of prediction within species sampling problems. Such problems concern the evaluation, conditional on a sample of size n, of the species variety featured by an additional sample of size m. Genomic applications pose the additional challenge of having to deal with large values of both n and m. In such a case the computation of the Bayesian non-parametric estimators is cumbersome and prevents their implementation. We focus on the two-parameter Poisson–Dirichlet model and provide completely explicit expressions for the corresponding estimators, which can be easily evaluated for any sizes of n and m. We also study the asymptotic behaviour of the number of new species conditionally on the observed sample: such an asymptotic result, combined with a suitable simulation scheme, allows us to derive asymptotic highest posterior density intervals for the estimates of interest. Finally, we illustrate the implementation of the proposed methodology by the analysis of five expressed sequence tags data sets.

Journal ArticleDOI
TL;DR: A Bayesian information criterion type model selection procedure based on the quadratic inference function is proposed, which does not require the full likelihood or quasi-likelihood.
Abstract: Summary. Model selection for marginal regression analysis of longitudinal data is challenging owing to the presence of correlation and the difficulty of specifying the full likelihood, particularly for correlated categorical data. The paper introduces a novel Bayesian information criterion type model selection procedure based on the quadratic inference function, which does not require the full likelihood or quasi-likelihood. With probability approaching 1, the criterion selects the most parsimonious correct model. Although a working correlation matrix is assumed, there is no need to estimate the nuisance parameters in the working correlation matrix; moreover, the model selection procedure is robust against the misspecification of the working correlation matrix. The criterion proposed can also be used to construct a data-driven Neyman smooth test for checking the goodness of fit of a postulated model. This test is especially useful and often yields much higher power in situations where the classical directional test behaves poorly. The finite sample performance of the model selection and model checking procedures is demonstrated through Monte Carlo studies and analysis of a clinical trial data set.

Journal ArticleDOI
TL;DR: The paper investigates the problem of controlling the familywise error rate FWER when estimators of the proportion of true null hypotheses are used as plug-in estimators in single-step or step-down multiple-test procedures.
Abstract: Summary. Estimation of the number or proportion of true null hypotheses in multiple-testing problems has become an interesting area of research. The first important work in this field was performed by Schweder and Spjotvoll. Among others, they proposed to use plug-in estimates for the proportion of true null hypotheses in multiple-test procedures to improve the power. We investigate the problem of controlling the familywise error rate FWER when such estimators are used as plug-in estimators in single-step or step-down multiple-test procedures. First we investigate the case of independent p-values under the null hypotheses and show that a suitable choice of plug-in estimates leads to control of FWER in single-step procedures. We also investigate the power and study the asymptotic behaviour of the number of false rejections. Although step-down procedures are more difficult to handle we briefly consider a possible solution to this problem. Anyhow, plug-in step-down procedures are not recommended here. For dependent p-values we derive a condition for asymptotic control of FWER and provide some simulations with respect to FWER and power for various models and hypotheses.
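A minimal sketch of the single-step plug-in idea: a Schweder-Spjotvoll/Storey-type estimate of the proportion of true nulls relaxes the Bonferroni threshold. The tuning constant lam = 0.5 and the "+1" correction in the estimator are illustrative choices, and `plugin_bonferroni` is a made-up name.

```python
import numpy as np
from scipy.stats import norm

def plugin_bonferroni(pvals, lam=0.5, alpha=0.05):
    """Single-step Bonferroni test with a plug-in estimate of pi0.

    pi0 is estimated from the p-values exceeding lam; the usual threshold
    alpha/m is then relaxed to alpha / (m * pi0_hat).
    """
    m = len(pvals)
    pi0_hat = min(1.0, (np.sum(pvals > lam) + 1) / ((1 - lam) * m))
    return pvals <= alpha / (m * pi0_hat), pi0_hat

# toy example: 100 true nulls, 20 shifted alternatives
rng = np.random.default_rng(0)
z = np.concatenate([rng.standard_normal(100), rng.standard_normal(20) + 4])
pvals = 1 - norm.cdf(z)
rejected, pi0_hat = plugin_bonferroni(pvals)
print(round(pi0_hat, 2), np.nonzero(rejected)[0])
```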

Journal ArticleDOI
TL;DR: In this article, the authors propose a frequency domain approach for irregularly spaced data on R^d and define non-parametric and parametric spectral density estimators in a way similar to the classical approach.
Abstract: Summary. The purpose of the paper is to propose a frequency domain approach for irregularly spaced data on R^d. We extend the original definition of a periodogram for time series to that for irregularly spaced data and define non-parametric and parametric spectral density estimators in a way that is similar to the classical approach. Introduction of the mixed asymptotics, which are one of the asymptotics for irregularly spaced data, makes it possible to provide asymptotic theories to the spectral estimators. The asymptotic result for the parametric estimator is regarded as a natural extension of the classical result for regularly spaced data to that for irregularly spaced data. Empirical studies are also included to illustrate the frequency domain approach in comparisons with the existing spatial and frequency domain approaches.
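A bare-bones sketch of a periodogram for irregularly spaced locations, extending the usual time series definition by summing complex exponentials over the observed sites. Normalization constants, tapering and edge corrections, which matter for the asymptotics developed in the paper, are deliberately glossed over here.

```python
import numpy as np

def periodogram_irregular(sites, values, freqs):
    """Periodogram of an irregularly sampled spatial field.

    I(omega) = |sum_j Z(s_j) exp(-i omega . s_j)|^2 / n, evaluated on a grid
    of frequency vectors; normalization is a crude illustrative choice.
    """
    z = values - values.mean()
    phase = np.exp(-1j * (sites @ freqs.T))       # shape (n_sites, n_freqs)
    return np.abs(z @ phase) ** 2 / len(z)

# toy field on random locations in the unit square with a wave along x
rng = np.random.default_rng(0)
sites = rng.uniform(0, 1, size=(500, 2))
values = np.cos(2 * np.pi * 3 * sites[:, 0]) + 0.2 * rng.standard_normal(500)
w = np.linspace(0, 40, 41)
freqs = np.column_stack([w, np.zeros_like(w)])    # frequencies along the x-axis
I = periodogram_irregular(sites, values, freqs)
print(w[np.argmax(I)])                            # peak expected near 2*pi*3 ≈ 18.8
```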

Journal ArticleDOI
TL;DR: In this paper, the authors consider a more realistic semiparametric INAR(p) model in which there are essentially no restrictions on the innovation distribution, and provide a (semiparametrically) efficient estimator of both the auto-regression parameters and the innovation distribution.
Abstract: Summary. Integer-valued auto-regressive (INAR) processes have been introduced to model non-negative integer-valued phenomena that evolve over time. The distribution of an INAR(p) process is essentially described by two parameters: a vector of auto-regression coefficients and a probability distribution on the non-negative integers, called an immigration or innovation distribution. Traditionally, parametric models are considered where the innovation distribution is assumed to belong to a parametric family. The paper instead considers a more realistic semiparametric INAR(p) model where there are essentially no restrictions on the innovation distribution. We provide a (semiparametrically) efficient estimator of both the auto-regression parameters and the innovation distribution.
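To fix ideas, the INAR(1) special case can be simulated with binomial thinning, and the thinning parameter recovered from the lag-one autocorrelation. The innovation distribution is left completely free, which is the semiparametric setting of the paper; the moment estimator below is only an illustration, not the efficient estimator constructed there.

```python
import numpy as np

def simulate_inar1(n, alpha, innov_sampler, seed=0):
    """Simulate an INAR(1) process X_t = alpha o X_{t-1} + eps_t,

    where 'o' is binomial thinning: each of the X_{t-1} counts survives
    independently with probability alpha, and eps_t is drawn from an
    arbitrary non-negative integer innovation distribution.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + innov_sampler(rng)
    return x

x = simulate_inar1(5000, alpha=0.5, innov_sampler=lambda r: r.poisson(1.0))

# lag-one autocorrelation of an INAR(1) process equals alpha
alpha_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(alpha_hat, 3))
```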

Journal ArticleDOI
TL;DR: It is shown that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative.
Abstract: We discuss a Bayesian discovery procedure for multiple comparison problems. We show that under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule based on a threshold of the posterior probability of the alternative. Under a semi-parametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure (ODP), recently introduced by Storey (2007a). Improving the approximation leads us to a Bayesian discovery procedure (BDP), which exploits the multiple shrinkage in clusters implied by the assumed nonparametric model. We compare the BDP and the ODP estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumor samples. We extend the setting of the ODP by discussing modifications of the loss function that lead to different single thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.

Journal ArticleDOI
TL;DR: A general dynamical model as a framework for causal interpretation is developed and a definition of causal influence using the concepts of a ‘physical system’ is proposed, which makes it possible to link descriptive and explicative statistical models, and encompasses quantitative processes and events.
Abstract: Summary. We develop a general dynamical model as a framework for causal interpretation. We first state a criterion of local independence in terms of measurability of processes that are involved in the Doob–Meyer decomposition of stochastic processes; then we define direct and indirect influence. We propose a definition of causal influence using the concepts of a ‘physical system’. This framework makes it possible to link descriptive and explicative statistical models, and encompasses quantitative processes and events. One of the features of the paper is the clear distinction between the model for the system and the model for the observation. We give a dynamical representation of a conventional joint model for human immunodeficiency virus load and CD4 cell counts. We show its inadequacy to capture causal influences whereas in contrast known mechanisms of infection by the human immunodeficiency virus can be expressed directly through a system of differential equations.

Journal ArticleDOI
TL;DR: This article proposed a general shrinkage estimation strategy for the entire inverse regression estimation family that is capable of simultaneous dimension reduction and variable selection without requiring any traditional model, meanwhile retaining the root n estimation consistency of the dimension reduction basis.
Abstract: The family of inverse regression estimators that was recently proposed by Cook and Ni has proven effective in dimension reduction by transforming the high dimensional predictor vector to its low dimensional projections. We propose a general shrinkage estimation strategy for the entire inverse regression estimation family that is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimators achieve consistency in variable selection without requiring any traditional model, meanwhile retaining the root n estimation consistency of the dimension reduction basis. We also show the effectiveness of the new estimators through both simulation and real data analysis.

Journal ArticleDOI
TL;DR: A deterministic scan Gibbs sampler is used for parameter estimation in partially observed hypoelliptic diffusions, alternating between the missing data in the unobserved solution components and the parameters.
Abstract: Hypoelliptic diffusion processes can be used to model a variety of phenomena in applications ranging from molecular dynamics to audio signal analysis. We study parameter estimation for such processes in situations where we observe some components of the solution at discrete times. Since exact likelihoods for the transition densities are typically not known, approximations are used that are expected to work well in the limit of small intersample times Δt and large total observation times N Δt. Hypoellipticity together with partial observation leads to ill conditioning requiring a judicious combination of approximate likelihoods for the various parameters to be estimated. We combine these in a deterministic scan Gibbs sampler alternating between missing data in the unobserved solution components, and parameters. Numerical experiments illustrate asymptotic consistency of the method when applied to simulated data. The paper concludes with an application of the Gibbs sampler to molecular dynamics data.

Journal ArticleDOI
TL;DR: This work imposes neighbourhood structures on each regression response and determines the members of these neighbourhoods which are least favourable in the sense of minimizing the Kullback–Leibler divergence.
Abstract: We study the construction of experimental designs, the purpose of which is to aid in the discrimination between two regression models, each of which might be only approximately specified. A rough description of our approach is that we impose neighbourhood structures on each regression response, and determine the members of these neighbourhoods which are least favourable in the sense of minimizing the Kullback-Leibler divergence. Designs are obtained which maximize this minimum divergence. Both static and sequential approaches are studied. We then consider sequential designs whose purpose is initially to discriminate, but which move their emphasis towards efficient estimation or prediction as one model becomes favoured over the other.

Journal ArticleDOI
TL;DR: Tilting methods are introduced that can be implemented very rapidly, and it is shown how to use bootstrap methods to assess the accuracy of the variable ranking and variable elimination procedures.
Abstract: Summary. Many contemporary classifiers are constructed to provide good performance for very high dimensional data. However, an issue that is at least as important as good classification is determining which of the many potential variables provide key information for good decisions. Responding to this issue can help us to determine which aspects of the data-generating mechanism (e.g. which genes in a genomic study) are of greatest importance in terms of distinguishing between populations. We introduce tilting methods for addressing this problem. We apply weights to the components of data vectors, rather than to the data vectors themselves (as is commonly the case in related work). In addition we tilt in a way that is governed by L2-distance between weight vectors, rather than by the more commonly used Kullback–Leibler distance. It is shown that this approach, together with the added constraint that the weights should be non-negative, produces an algorithm which eliminates vector components that have little influence on the classification decision. In particular, use of the L2-distance in this problem produces properties that are reminiscent of those that arise when L1-penalties are employed to eliminate explanatory variables in very high dimensional prediction problems, e.g. those involving the lasso. We introduce techniques that can be implemented very rapidly, and we show how to use bootstrap methods to assess the accuracy of our variable ranking and variable elimination procedures.

Journal ArticleDOI
TL;DR: In this paper, a hierarchical model and estimation procedure for pooling principal axes across several populations is developed, based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of center and spread for a population of orthogonal matrices.
Abstract: Summary. Although the covariance matrices corresponding to different populations are unlikely to be exactly equal, they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the correlation between other pairs may be consistently negative. In such cases much of the similarity across covariance matrices can be described by similarities in their principal axes, which are the axes that are defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. For example, eigenvector matrices can be pooled to form a central set of principal axes and, to the extent that the axes are similar, covariance estimates for populations having small sample sizes can be stabilized by shrinking their principal axes towards the across-population centre. To this end, the paper develops a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of ‘centre’ and ‘spread’ for a population of orthogonal matrices.