Showing papers in "Journal of the Royal Statistical Society: Series B (Statistical Methodology)" in 2008


Journal ArticleDOI
TL;DR: In this article, the authors introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size.
Abstract: Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log (p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log (p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
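The screening step itself is simple enough to sketch. Below is a minimal Python illustration (an assumed implementation, not the authors' code): predictors are ranked by the absolute value of their marginal correlation with the response, and only the top d < n are retained for a subsequent penalized fit such as the lasso, SCAD or the Dantzig selector.

```python
import numpy as np

def sis(X, y, d):
    """Sure independence screening: a minimal sketch.

    Ranks the p predictors by the absolute value of their marginal
    correlation with the response and keeps the d top-ranked ones
    (d is typically chosen below the sample size n).  Returns the
    indices of the retained predictors.
    """
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each predictor
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / len(y)          # componentwise correlations
    return np.argsort(omega)[::-1][:d]          # indices of the d largest

# usage sketch: keep roughly n / log(n) predictors, then run a penalized
# method (lasso, SCAD, ...) on the reduced design X[:, idx]
rng = np.random.default_rng(0)
n, p = 100, 5000
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.8 * X[:, 1] + rng.standard_normal(n)
idx = sis(X, y, d=int(n / np.log(n)))
```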

2,204 citations


Journal ArticleDOI
TL;DR: An efficient algorithm is presented that is especially suitable for high dimensional problems and can also be applied to generalized linear models to solve the corresponding convex optimization problem.
Abstract: Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
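A hedged sketch of the main computational ingredient may help: the block-wise soft-thresholding (proximal) step that proximal gradient or block coordinate descent algorithms for the group lasso are built around. The fixed step size and the sqrt(group size) penalty weights below are common choices, but the code is an illustrative sketch rather than the authors' algorithm.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_2 applied to one coefficient block."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z

def group_lasso_logistic(X, y, groups, lam, step=0.01, n_iter=2000):
    """Proximal gradient descent for group-lasso-penalized logistic regression.

    `groups` is a list of index arrays, one per (predefined) group.
    Minimal sketch: no intercept, fixed step size, no line search.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # fitted probabilities
        grad = X.T @ (p - y) / len(y)           # gradient of the logistic loss
        beta = beta - step * grad
        for g in groups:                        # block-wise proximal step
            # the sqrt(|g|) factor rescales the penalty by group size
            beta[g] = group_soft_threshold(beta[g], step * lam * np.sqrt(len(g)))
    return beta
```

Either all coefficients in a group are shrunk to exactly zero or none are, which is what makes the penalty select whole groups of variables.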

1,709 citations


Journal ArticleDOI
TL;DR: This work achieves the flexibility to accommodate non‐stationary, non‐Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets in the form of a computational template encompassing these diverse settings.
Abstract: With scientific data available at geocoded locations, investigators are increasingly turning to spatial process models for carrying out statistical inference. Over the last decade, hierarchical models implemented through Markov chain Monte Carlo methods have become especially popular for spatial modelling, given their flexibility and power to fit models that would be infeasible with classical methods as well as their avoidance of possibly inappropriate asymptotics. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational complexity increases in cubic order with the number of spatial locations, rendering such models infeasible for large spatial data sets. This computational burden is exacerbated in multivariate settings with several spatially dependent response variables. It is also aggravated when data are collected at frequent time points and spatiotemporal process models are used. With regard to this challenge, our contribution is to work with what we call predictive process models for spatial and spatiotemporal data. Every spatial (or spatiotemporal) process induces a predictive process model (in fact, arbitrarily many of them). The latter models project process realizations of the former to a lower dimensional subspace, thereby reducing the computational burden. Hence, we achieve the flexibility to accommodate non-stationary, non-Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets. We discuss attractive theoretical properties of these predictive processes. We also provide a computational template encompassing these diverse settings. Finally, we illustrate the approach with simulated and real data sets.
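The dimension reduction admits a compact sketch taken directly from the definition: the predictive process replaces the parent process w(s) by its kriging interpolant onto m knots, so its covariance is c(s)' C*^{-1} c(s') and the expensive linear algebra involves only m x m matrices. The exponential parent covariance and the random knot locations below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def exp_cov(A, B, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 2-d locations (assumed parent model)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def predictive_process_cov(S, knots, cov=exp_cov):
    """Covariance matrix of the predictive process at the locations S.

    The parent process is projected onto the m knots, so
    Cov(w_tilde(s), w_tilde(s')) = c(s)' C*^{-1} c(s'),
    and only m x m systems ever need to be solved.
    """
    C_star = cov(knots, knots)            # m x m covariance among the knots
    c = cov(S, knots)                     # n x m cross-covariance
    return c @ np.linalg.solve(C_star, c.T)

# usage sketch: n can be large while m stays small
rng = np.random.default_rng(1)
S = rng.uniform(size=(2000, 2))           # 2000 observation locations
knots = rng.uniform(size=(64, 2))         # 64 knots
C_pp = predictive_process_cov(S, knots)   # rank at most 64
```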

1,083 citations


Journal ArticleDOI
TL;DR: In this article, a flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that is called fixed rank kriging.
Abstract: Summary. Spatial statistics for very large spatial data sets is challenging. The size of the data set, n, causes problems in computing optimal spatial predictors such as kriging, since its computational cost is of order n³. In addition, a large data set is often defined on a large spatial domain, so the spatial process of interest typically exhibits non-stationary behaviour over that domain. A flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that we call fixed rank kriging. Specifically, fixed rank kriging is kriging within this class of non-stationary covariance functions. It relies on computational simplifications when n is very large, for obtaining the spatial best linear unbiased predictor and its mean-squared prediction error for a hidden spatial process. A method based on minimizing a weighted Frobenius norm yields best estimators of the covariance function parameters, which are then substituted into the fixed rank kriging equations. The new methodology is applied to a very large data set of total column ozone data, observed over the entire globe, where n is of the order of hundreds of thousands.
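The source of the computational simplification can be sketched. With a covariance of the fixed rank form B K B' + sigma^2 I, where B holds the r basis functions evaluated at the n data locations, the n x n inverse needed for kriging follows from an r x r system via the Sherman-Morrison-Woodbury identity; the snippet below illustrates that identity and is not the fixed rank kriging software itself.

```python
import numpy as np

def frk_inverse_apply(B, K, sigma2, rhs):
    """Apply (B K B' + sigma2 * I)^{-1} to rhs via Sherman-Morrison-Woodbury.

    B      : n x r matrix of basis functions at the data locations
    K      : r x r covariance matrix of the basis-function coefficients
    sigma2 : measurement-error variance
    Only r x r systems are solved, so the cost is linear in n.
    """
    inner = np.linalg.inv(K) + (B.T @ B) / sigma2              # r x r matrix
    correction = B @ np.linalg.solve(inner, B.T @ rhs) / sigma2**2
    return rhs / sigma2 - correction
```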

980 citations


Journal ArticleDOI
Simon N. Wood
TL;DR: The paper develops the first computationally efficient method for direct generalized additive model smoothness selection, which is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes that are based on working model smoothness selection.
Abstract: Summary. Existing computationally efficient methods for penalized likelihood generalized additive model fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing 'whole model' criteria these problems disappear, but until now attempts to do this have employed finite-difference-based optimization schemes which are computationally inefficient and can suffer from false convergence. The paper develops the first computationally efficient method for direct generalized additive model smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes that are based on working model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.

633 citations


Journal ArticleDOI
TL;DR: It is shown how the marginal likelihood can be computed via Markov chain Monte Carlo methods on modified posterior distributions for each model, which then allows Bayes factors or posterior model probabilities to be calculated.
Abstract: Model choice plays an increasingly important role in statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. However, this is typically a difficult task since it amounts to integrating over all model parameters. The aim of the paper is to illustrate how this may be achieved by using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via Markov chain Monte Carlo methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings.
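The idea can be made concrete with a toy model in which the modified (power) posteriors p_t(theta | y), proportional to p(y | theta)^t p(theta), are conjugate, so the expected log-likelihood at each temperature has a closed form; in a real application these expectations would be Monte Carlo averages over MCMC draws at each temperature. The normal model, the prior variance and the temperature ladder below are assumptions made purely for illustration.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import multivariate_normal

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, tau2).  The power
# posterior at temperature t is normal, so E_t[log p(y | theta)] is exact.
rng = np.random.default_rng(2)
n, tau2 = 20, 4.0
y = rng.normal(1.0, 1.0, size=n)

ts = np.linspace(0.0, 1.0, 101) ** 5         # ladder concentrated near t = 0
expected_loglik = []
for t in ts:
    prec = t * n + 1.0 / tau2                # power-posterior precision
    m_t = t * y.sum() / prec                 # power-posterior mean
    v_t = 1.0 / prec                         # power-posterior variance
    ell = -0.5 * n * np.log(2 * np.pi) - 0.5 * (np.sum((y - m_t) ** 2) + n * v_t)
    expected_loglik.append(ell)

# thermodynamic integration: log m(y) = integral_0^1 E_t[log p(y | theta)] dt
log_marginal_ti = trapezoid(expected_loglik, ts)

# exact log marginal likelihood for comparison: y ~ N(0, I + tau2 * 11')
exact = multivariate_normal(np.zeros(n), np.eye(n) + tau2).logpdf(y)
print(log_marginal_ti, exact)                # the two values agree closely
```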

360 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the empirical distinction between MAR and MNAR is not possible, in the sense that each MNAR model fit to a set of observed data can be reproduced exactly by an MAR counterpart.
Abstract: Over the last decade a variety of models to analyse incomplete multivariate and longitudinal data have been proposed, many of which allow for the missingness to be not at random, in the sense that the unobserved measurements influence the process governing missingness, in addition to influences coming from observed measurements and/or covariates. The fundamental problems that are implied by such models, to which we refer as sensitivity to unverifiable modelling assumptions, have, in turn, sparked off various strands of research in what is now termed sensitivity analysis. The nature of sensitivity originates from the fact that a missingness not at random (MNAR) model is not fully verifiable from the data, rendering the empirical distinction between MNAR and missingness at random (MAR), where only covariates and observed outcomes influence missingness, difficult or even impossible, unless we are willing to accept the posited MNAR model in an unquestioning way. We show that the empirical distinction between MAR and MNAR is not possible, in the sense that each MNAR model fit to a set of observed data can be reproduced exactly by an MAR counterpart. Of course, such a pair of models will produce different predictions of the unobserved outcomes, given the observed outcomes. Theoretical considerations are supplemented with an illustration that is based on the Slovenian public opinion survey, which has been analysed before in the context of sensitivity analysis.

214 citations


Journal ArticleDOI
TL;DR: In this paper, a small area estimation approach that combines small area random effects with a smooth, non-parametrically specified trend is proposed, where penalized splines are used as the representation for the nonparametric trend and the resulting model is readily fitted by using existing model fitting approaches such as restricted maximum likelihood.
Abstract: The paper proposes a small area estimation approach that combines small area random effects with a smooth, non-parametrically specified trend. By using penalized splines as the representation for the non-parametric trend, it is possible to express the non-parametric small area estimation problem as a mixed effect model regression. The resulting model is readily fitted by using existing model fitting approaches such as restricted maximum likelihood. We present theoretical results on the prediction mean-squared error of the estimator proposed and on likelihood ratio tests for random effects, and we propose a simple non-parametric bootstrap approach for model inference and estimation of the small area prediction mean-squared error. The applicability of the method is demonstrated on a survey of lakes in north-eastern USA.

179 citations


Journal ArticleDOI
TL;DR: A class of smoothers is proposed that is appropriate for smoothing over difficult regions of ℝ², can be represented in terms of a low rank basis and one or two quadratic penalties, and whose low rank means that its use is computationally efficient.
Abstract: Summary. Conventional smoothing methods sometimes perform badly when used to smooth data over complex domains, by smoothing inappropriately across boundary features, such as peninsulas. Solutions to this smoothing problem tend to be computationally complex, and not to provide model smooth functions which are appropriate for incorporating as components of other models, such as generalized additive models or mixed additive models. We propose a class of smoothers that are appropriate for smoothing over difficult regions of ℝ² which can be represented in terms of a low rank basis and one or two quadratic penalties. The key features of these smoothers are that they do not ‘smooth across’ boundary features, that their representation in terms of a basis and penalties allows straightforward incorporation as components of generalized additive models, mixed models and other non-standard models, that smoothness selection for these model components is straightforward to accomplish in a computationally efficient manner via generalized cross-validation, Akaike's information criterion or restricted maximum likelihood, for example, and that their low rank means that their use is computationally efficient.

147 citations


Journal ArticleDOI
TL;DR: This work develops functional principal components analysis for this situation and demonstrates the prediction of individual trajectories from sparse observations; the method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model.
Abstract: Summary. In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a connection to functional data analysis. The functional methods proposed are non-parametric and computationally straightforward as they do not involve a likelihood. We develop functional principal components analysis for this situation and demonstrate the prediction of individual trajectories from sparse observations. This method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model. These scores can then be used for further statistical analysis, such as inference, regression, discriminant analysis or clustering. We illustrate these non-parametric methods with longitudinal data on primary biliary cirrhosis and show in simulations that they are competitive in comparisons with generalized estimating equations and generalized linear mixed models.

141 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce particle filters for a class of partially-observed continuous-time dynamic models where the signal is given by a multivariate diffusion process, and they build on recent methodology for exact simulation of the diffusion process and the unbiased estimation of the transition density as described in Beskos et al. (2006).
Abstract: In this paper we introduce novel particle filters for a class of partially-observed continuous-time dynamic models where the signal is given by a multivariate diffusion process. We consider a variety of observation schemes, including diffusion observed with error, observation of a subset of the components of the multivariate diffusion and arrival times of a Poisson process whose intensity is a known function of the diffusion (Cox process). Unlike currently available methods, our particle filters do not require approximations of the transition and/or the observation density using time-discretisations. Instead, they build on recent methodology for the exact simulation of the diffusion process and the unbiased estimation of the transition density as described in Beskos et al. (2006). In particular, we introduce the Generalised Poisson Estimator, which generalises the Poisson Estimator of Beskos et al. (2006). Thus, our filters avoid the systematic biases caused by time-discretisations and they have significant computational advantages over alternative continuous-time filters. These advantages are supported theoretically by a central limit theorem.

Journal ArticleDOI
TL;DR: In this article, it is shown that α-investing is an adaptive sequential methodology that encompasses a large family of procedures for testing multiple hypotheses, and α-investing is shown to control mFDR, the ratio of the expected number of false rejections to the expected number of rejections.
Abstract: Summary. α-investing is an adaptive sequential methodology that encompasses a large family of procedures for testing multiple hypotheses. All control mFDR, which is the ratio of the expected number of false rejections to the expected number of rejections. mFDR is a weaker criterion than the false discovery rate, which is the expected value of the ratio. We compensate for this weakness by showing that α-investing controls mFDR at every rejected hypothesis. α-investing resembles α-spending that is used in sequential trials, but it has a key difference. When a test rejects a null hypothesis, α-investing earns additional probability towards subsequent tests. α-investing hence allows us to incorporate domain knowledge into the testing procedure and to improve the power of the tests. In this way, α-investing enables the statistician to design a testing procedure for a specific problem while guaranteeing control of mFDR.
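The wealth-update rule is short enough to sketch in code. The spending rule below (invest half of the current alpha-wealth at each test) is one assumed member of the family of procedures, not a prescription from the paper.

```python
def alpha_investing(p_values, wealth=0.05, payout=0.05):
    """Alpha-investing over a stream of p-values: a minimal sketch.

    A rejection earns back `payout`; a non-rejection costs
    alpha_j / (1 - alpha_j), so the alpha-wealth never becomes negative
    under the spending rule used here.  Returns indices of rejections.
    """
    rejected = []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break                                   # no alpha-wealth left
        alpha_j = wealth / (2.0 * (1.0 + wealth))   # assumed spending rule
        if p <= alpha_j:
            rejected.append(j)
            wealth += payout                        # earn wealth on rejection
        else:
            wealth -= alpha_j / (1.0 - alpha_j)     # pay for non-rejection
    return rejected

# usage sketch: hypotheses arrive (or are ordered) sequentially
print(alpha_investing([0.0001, 0.3, 0.002, 0.8, 0.01]))
```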

Journal ArticleDOI
TL;DR: In this article, a nonparametric estimate of conditional quantiles is proposed, which avoids the problem of crossing quantile curves by using an initial estimate of the conditional distribution function in a flrst step.
Abstract: In this paper a new nonparametric estimate of conditional quantiles is proposed that avoids the problem of crossing quantile curves (calculated for various p ∈ (0,1)). The method uses an initial estimate of the conditional distribution function in a first step and solves the problem of inversion and monotonization with respect to p ∈ (0,1) simultaneously. It is demonstrated that the new estimates are asymptotically normally distributed and asymptotically first-order equivalent to quantile estimates obtained by local constant or local linear smoothing of the conditional distribution function. The performance of the new procedure is illustrated by means of a simulation study, and some comparisons with the currently available procedures which are similar in spirit to the proposed method are presented.
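A simplified sketch in the same spirit (not the authors' exact estimator): estimate the conditional distribution function at a point x0 by kernel weighting, monotonize it, and read every quantile level off the same monotone curve, which is why the resulting quantile estimates cannot cross. The Gaussian kernel, the bandwidth and the evaluation grid are illustrative assumptions.

```python
import numpy as np

def conditional_quantiles(x_obs, y_obs, x0, ps, h=0.3, grid_size=200):
    """Non-crossing conditional quantile estimates at a point x0.

    Nadaraya-Watson weights give an estimate of F(y | x0) on a grid of y
    values, a running maximum enforces monotonicity in y, and each level
    p is obtained by inverting that single monotone curve.
    """
    w = np.exp(-0.5 * ((x_obs - x0) / h) ** 2)      # Gaussian kernel weights
    w = w / w.sum()
    y_grid = np.linspace(y_obs.min(), y_obs.max(), grid_size)
    F = np.array([(w * (y_obs <= y)).sum() for y in y_grid])
    F = np.maximum.accumulate(F)                    # monotonize in y
    idx = np.minimum(np.searchsorted(F, ps), grid_size - 1)
    return y_grid[idx]

# usage sketch: quartile estimates at x0 = 0.5 for a simulated sample
rng = np.random.default_rng(3)
x = rng.uniform(size=500)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(500)
print(conditional_quantiles(x, y, 0.5, ps=np.array([0.25, 0.5, 0.75])))
```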

Journal ArticleDOI
TL;DR: This paper develops a simulation‐based approach to sequential parameter learning and filtering in general state space models based on approximating the target posterior by a mixture of fixed lag smoothing distributions that avoids reweighting particles and hence sample degeneracy problems that plague particle filters that use sequential importance sampling.
Abstract: Summary. The paper develops a simulation-based approach to sequential parameter learning and filtering in general state space models. Our approach is based on approximating the target posterior by a mixture of fixed lag smoothing distributions. Parameter inference exploits a sufficient statistic structure and the methodology can be easily implemented by modifying state space smoothing algorithms. We avoid reweighting particles and hence sample degeneracy problems that plague particle filters that use sequential importance sampling. The method is illustrated by using two examples: a benchmark auto-regressive model with observation error and a high dimensional dynamic spatiotemporal model. We show that the method provides accurate inference in the presence of outliers, model misspecification and high dimensionality.

Journal ArticleDOI
TL;DR: A stochastic search algorithm is proposed that is driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms: one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters.
Abstract: Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms—one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.

Journal ArticleDOI
TL;DR: In this paper, a linear structural nested direct effect model is presented, with direct effect parameters that can be estimated by using inverse probability weighting by a conditional distribution of the mediator. The resulting estimators are consistent but inefficient and can be extremely unstable when the mediator is absolutely continuous.
Abstract: When regression models adjust for mediators on the causal path from exposure to outcome, the regression coefficient of exposure is commonly viewed as a measure of the direct exposure effect. This interpretation can be misleading, even with a randomly assigned exposure. This is because adjustment for post-exposure measurements introduces bias whenever their association with the outcome is confounded by more than just the exposure. By the same token, adjustment for such confounders stays problematic when these are themselves affected by the exposure. Robins accommodated this by introducing linear structural nested direct effect models with direct effect parameters that can be estimated by using inverse probability weighting by a conditional distribution of the mediator. The resulting estimators are consistent, but inefficient, and can be extremely unstable when the mediator is absolutely continuous. We develop direct effect estimators which are not only more efficient but also consistent under a less demanding model for a conditional expectation of the outcome. We find that the one estimator which avoids inverse probability weighting altogether performs best. This estimator is intuitive, computationally straightforward and, as demonstrated by simulation, competes extremely well with ordinary least squares estimators in settings where standard regression is valid.

Journal ArticleDOI
TL;DR: In this paper, a new class of graphical models capturing the dependence structure of events that occur in time is proposed, where the graphs represent so-called local independences, meaning that the intensities of certain types of events are independent of some (but not necessarily all) events in the past.
Abstract: Summary. A new class of graphical models capturing the dependence structure of events that occur in time is proposed. The graphs represent so-called local independences, meaning that the intensities of certain types of events are independent of some (but not necessarily all) events in the past. This dynamic concept of independence is asymmetric, similar to Granger non-causality, so the corresponding local independence graphs differ considerably from classical graphical models. Hence a new notion of graph separation, which is called δ-separation, is introduced and implications for the underlying model as well as for likelihood inference are explored. Benefits regarding facilitation of reasoning about and understanding of dynamic dependences as well as computational simplifications are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors present a model class that provides a framework for modelling marginal independences in contingency tables, drawing on analogies with multivariate Gaussian models for marginal independence.
Abstract: Summary. Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modelling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. Focusing on binary variables, we present a model class that provides a framework for modelling marginal independences in contingency tables. The approach that is taken is graphical and draws on analogies with multivariate Gaussian models for marginal independence. For the graphical model representation we use bidirected graphs, which are in the tradition of path diagrams. We show how the models can be parameterized in a simple fashion, and how maximum likelihood estimation can be performed by using a version of the iterated conditional fitting algorithm. Finally we consider combining these models with symmetry restrictions.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce new types of graphical Gaussian models by placing symmetry restrictions on the concentration or correlation matrix, where parameters that are associated with edges or vertices of the same colour are restricted to being identical.
Abstract: Summary. We introduce new types of graphical Gaussian models by placing symmetry restrictions on the concentration or correlation matrix. The models can be represented by coloured graphs, where parameters that are associated with edges or vertices of the same colour are restricted to being identical. We study the properties of such models and derive the necessary algorithms for calculating maximum likelihood estimates. We identify conditions for restrictions on the concentration and correlation matrices being equivalent. This is for example the case when symmetries are generated by permutation of variable labels. For such models a particularly simple maximization of the likelihood function is available.

Journal ArticleDOI
TL;DR: A technique is suggested, related to the concept of ‘detection boundary’ that was developed by Ingster and by Donoho and Jin, for comparing the theoretical performance of classifiers constructed from small training samples of very large vectors, and results are obtained for a variety of distance‐based methods.
Abstract: Summary. We suggest a technique, related to the concept of ‘detection boundary’ that was developed by Ingster and by Donoho and Jin, for comparing the theoretical performance of classifiers constructed from small training samples of very large vectors. The resulting ‘classification boundaries’ are obtained for a variety of distance-based methods, including the support vector machine, distance-weighted discrimination and kth-nearest-neighbour classifiers, for thresholded forms of those methods, and for techniques based on Donoho and Jin's higher criticism approach to signal detection. Assessed in these terms, standard distance-based methods are shown to be capable only of detecting differences between populations when those differences can be estimated consistently. However, the thresholded forms of distance-based classifiers can do better, and in particular can correctly classify data even when differences between distributions are only detectable, not estimable. Other methods, including higher criticism classifiers, can on occasion perform better still, but they tend to be more limited in scope, requiring substantially more information about the marginal distributions. Moreover, as tail weight becomes heavier the classification boundaries of methods designed for particular distribution types can converge to, and achieve, the boundary for thresholded nearest neighbour approaches. For example, although higher criticism has a lower classification boundary, and in this sense performs better, in the case of normal data, the boundaries are identical for exponentially distributed data when both sample sizes equal 1.
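For reference, the higher criticism statistic of Donoho and Jin that underlies the higher criticism classifiers can be computed in a few lines. The sketch below works from a vector of p-values and illustrates the statistic only; it is not an implementation of the paper's classification boundaries.

```python
import numpy as np

def higher_criticism(p_values, alpha0=0.1):
    """Donoho-Jin higher criticism statistic computed from p-values.

    The sorted p-values p_(i) are compared with the uniform quantiles i/n,
    standardised by the binomial standard deviation, and the statistic is
    the maximum over the smallest fraction alpha0 of the p-values.
    """
    p = np.sort(np.clip(np.asarray(p_values, dtype=float), 1e-12, 1 - 1e-12))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))
    return hc[:k].max()
```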

Journal ArticleDOI
TL;DR: In this paper, conditionally uncorrelated components (CUCs) are proposed to represent matrix-valued processes in a parsimonious way, and a bootstrap method is proposed for testing the existence of CUCs.
Abstract: Summary. We propose to model multivariate volatility processes on the basis of the newly defined conditionally uncorrelated components (CUCs). This model represents a parsimonious representation for matrix-valued processes. It is flexible in the sense that each CUC may be fitted separately with any appropriate univariate volatility model. Computationally it splits one high dimensional optimization problem into several lower dimensional subproblems. Consistency for the estimated CUCs has been established. A bootstrap method is proposed for testing the existence of CUCs. The methodology proposed is illustrated with both simulated and real data sets.

Journal ArticleDOI
TL;DR: This paper provides a two-step algorithm that produces parameter estimates using only unconstrained estimation and gives an application to demographic hazard modelling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity.
Abstract: Summary. In many situations information from a sample of individuals can be supplemented by population level information on the relationship between a dependent variable and explanatory variables. Inclusion of the population level information can reduce bias and increase the efficiency of the parameter estimates. Population level information can be incorporated via constraints on functions of the model parameters. In general the constraints are non-linear, making the task of maximum likelihood estimation more difficult. We develop an alternative approach exploiting the notion of an empirical likelihood. It is shown that, within the framework of generalized linear models, the population level information corresponds to linear constraints, which are comparatively easy to handle. We provide a two-step algorithm that produces parameter estimates by using only unconstrained estimation. We also provide computable expressions for the standard errors. We give an application to demographic hazard modelling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity.

Journal ArticleDOI
Jiashun Jin
TL;DR: In this paper, the universal oracle equivalence of the proportion is constructed based on the underlying characteristic function, which reduces the problem of estimating the proportion to estimating the oracle, which is relatively easier to handle.
Abstract: We propose a general approach to construct the universal oracle equivalence of the proportion. The construction is based on the underlying characteristic function. The oracle equivalence reduces the problem of estimating the proportion to the problem of estimating the oracle, which is relatively easier to handle. In fact, the oracle equivalence naturally yields a family of estimators for the proportion, which are consistent under mild conditions, uniformly across a wide class of parameters. The approach compares favourably with recent works by Meinshausen and Rice, and Genovese and Wasserman. In particular, the consistency is proved for an unprecedentedly broad class of situations; the class is almost the largest that can be hoped for without further constraints on the model. We also discuss various extensions of the approach, report results on simulation experiments and make connections between the approach and several recent procedures in large-scale multiple testing, including the false discovery rate approach and the local false discovery rate approach.

Journal ArticleDOI
TL;DR: In this article, an estimator that combines likelihood approaches for mixed effects models with kernel methods is proposed, and the methodology is illustrated with an application whose objective is to estimate forest coverage in Galicia, Spain.
Abstract: Summary. The paper presents a study of the generalized partially linear model including random effects in its linear part. We propose an estimator that combines likelihood approaches for mixed effects models with kernel methods. Following the methodology of Härdle and co-workers, we introduce a test for the hypothesis of a parametric mixed effects model against the alternative of a semiparametric mixed effects model. The critical values are estimated by using a bootstrap procedure. The asymptotic theory for the methods is provided, as are the results of a simulation study. These verify the feasibility and the excellent behaviour of the methods for samples of even moderate size. The usefulness of the methodology is illustrated with an application in which the objective is to estimate forest coverage in Galicia, Spain.

Journal ArticleDOI
TL;DR: In this paper, two procedures for selecting variables in the semiparametric linear regression model for censored data are described, one procedure penalizes a vector of estimating equations and simultaneously estimates regression coefficients and selects submodels.
Abstract: Summary. We describe two procedures for selecting variables in the semiparametric linear regression model for censored data. One procedure penalizes a vector of estimating equations and simultaneously estimates regression coefficients and selects submodels. A second procedure controls systematically the proportion of unimportant variables through forward selection and the addition of pseudorandom variables. We explore both rank-based statistics and Buckley–James statistics in the setting proposed and evaluate the performance of all methods through extensive simulation studies and one real data set.

Journal ArticleDOI
TL;DR: In this article, an estimator that is robust against the choice of the conditional expectation via an empirical likelihood is proposed, which achieves a gain in efficiency whether the conditional score is correctly specified or not.
Abstract: Summary. The paper considers estimating a parameter β that defines an estimating function U(y, x, β) for an outcome variable y and its covariate x when the outcome is missing in some of the observations. We assume that, in addition to the outcome and the covariate, a surrogate outcome is available in every observation. The efficiency of existing estimators for β depends critically on correctly specifying the conditional expectation of U given the surrogate and the covariate. When the conditional expectation is not correctly specified, which is the most likely scenario in practice, the efficiency of estimation can be severely compromised even if the propensity function (of missingness) is correctly specified. We propose an estimator that is robust against the choice of the conditional expectation via an empirical likelihood. We demonstrate that the estimator proposed achieves a gain in efficiency whether the conditional score is correctly specified or not. When the conditional score is correctly specified, the estimator reaches the semiparametric variance bound within the class of estimating functions that are generated by U. The practical performance of the estimator is evaluated by using simulation and a data set that is based on the 1996 US presidential election.

Journal ArticleDOI
TL;DR: In this paper, a profile pseudo-partial-likelihood estimation method is proposed to estimate the parameters of the linear part of a partially linear hazard regression model with varying coefficients for multivariate survival data, and asymptotic normality is obtained for estimators of the finite parameters and varying-coefficient functions.
Abstract: Summary. The paper studies estimation of partially linear hazard regression models with varying coefficients for multivariate survival data. A profile pseudo-partial-likelihood estimation method is proposed. The estimation of the parameters of the linear part is accomplished via maximization of the profile pseudo-partial-likelihood, whereas the varying-coefficient functions are considered as nuisance parameters that are profiled out of the likelihood. It is shown that the estimators of the parameters are root-n consistent and the estimators of the non-parametric coefficient functions achieve optimal convergence rates. Asymptotic normality is obtained for the estimators of the finite parameters and varying-coefficient functions. Consistent estimators of the asymptotic variances are derived and empirically tested, which facilitate inference for the model. We prove that the varying-coefficient functions can be estimated as well as if the parametric components were known and the failure times within each subject were independent. Simulations are conducted to demonstrate the performance of the estimators proposed. A real data set is analysed to illustrate the methodology proposed.

Journal ArticleDOI
TL;DR: In this paper, a Markov chain Monte Carlo (MCMC) algorithm is proposed that can sample from the resulting posterior distribution for binary trait data; the model is based on using a birth–death process for the evolution of the elements of sets of traits.
Abstract: Summary. Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait that is visible at the leaves is biased towards the leaves. We propose a model-based analysis of such data and present a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth–death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data that are simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whereas posterior probabilities for topology are not reliable.

Journal ArticleDOI
TL;DR: In this paper, a robust estimation procedure is introduced that is based on the choice of a representative trimmed subsample through an initial robust clustering procedure and on subsequent improvements based on maximum likelihood; data-driven restrictions on the parameters, requiring that every distribution in the mixture be sufficiently represented in the initial clustered region, allow singularities to be avoided and guarantee the existence of the estimator.
Abstract: Summary. We introduce a robust estimation procedure that is based on the choice of a representative trimmed subsample through an initial robust clustering procedure, and subsequent improvements based on maximum likelihood. To obtain the initial trimming we resort to the trimmed k-means, a simple procedure designed for finding the core of the clusters under appropriate configurations. By handling the trimmed data as censored, maximum likelihood estimation provides in each step the location and shape of the next trimming. Data-driven restrictions on the parameters, requiring that every distribution in the mixture must be sufficiently represented in the initial clustered region, allow singularities to be avoided and guarantee the existence of the estimator. Our analysis includes robustness properties and asymptotic results as well as worked examples.
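The initial trimming step can be sketched as follows (a minimal illustration, not the authors' implementation): trimmed k-means alternates the usual assignment and mean-update steps of k-means but, on each pass, discards the proportion alpha of observations that lie furthest from their nearest centre, so the centres are driven by the cores of the clusters.

```python
import numpy as np

def trimmed_kmeans(X, k, alpha=0.1, n_iter=50, seed=0):
    """Trimmed k-means: a minimal sketch of the initial clustering step.

    Each iteration trims the proportion alpha of points furthest from
    their nearest centre and recomputes centres from the untrimmed points
    only.  Returns the centres and a boolean mask of retained points.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    n_keep = int(np.ceil((1 - alpha) * len(X)))
    keep = np.ones(len(X), dtype=bool)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
        nearest = d.argmin(axis=1)                        # closest centre per point
        keep = np.zeros(len(X), dtype=bool)
        keep[np.argsort(d.min(axis=1))[:n_keep]] = True   # trim the alpha tail
        for j in range(k):
            members = keep & (nearest == j)
            if members.any():
                centres[j] = X[members].mean(axis=0)
    return centres, keep
```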

Journal ArticleDOI
TL;DR: In this paper, the authors derived a novel expression for the likelihood for mark-recapture-recovery data, which is equivalent to the traditional likelihood in the case where no covariate data are missing.
Abstract: Summary. Regular censusing of wild animal populations produces data for estimating their annual survival. However, there can be missing covariate data; for instance time varying covariates that are measured on individual animals often contain missing values. By considering the transitions that occur from each occasion to the next, we derive a novel expression for the likelihood for mark–recapture–recovery data, which is equivalent to the traditional likelihood in the case where no covariate data are missing, and which provides a natural way of dealing with covariate data that are missing, for whatever reason. Unlike complete-case analysis, this approach does not exclude incompletely observed life histories, uses all available data and produces consistent estimators. In a simulation study it performs better overall than alternative methods when there are missing covariate data.