
Showing papers in "Biometrika in 1992"


Journal ArticleDOI
TL;DR: In this paper, a model is proposed for the analysis of censored data which combines a logistic formulation for the probability of occurrence of an event with a proportional hazards specification for the time of occurrence.
Abstract: SUMMARY A model is proposed for the analysis of censored data which combines a logistic formulation for the probability of occurrence of an event with a proportional hazards specification for the time of occurrence of the event. The proposed model is a semiparametric generalization of a parametric model due to Farewell (1982). Estimates of the regression parameters are obtained by maximizing a Monte Carlo approximation of a marginal likelihood and the EM algorithm is used to estimate the baseline survivor function. We present some simulation results to verify the validity of the suggested estimation procedure. It appears that the semiparametric estimates are reasonably efficient with acceptable bias whereas the parametric estimates can be highly dependent on the parametric assumptions.

483 citations
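The parametric special case this model generalizes (Farewell, 1982) is easy to simulate, which makes the two components concrete: a logistic model decides whether the event will ever occur, and a proportional hazards model (here exponential) generates the failure time for susceptible subjects; censoring then hides which event-free subjects are cured. The sketch below is only an illustration with made-up parameter values, not the paper's estimation procedure.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                      # a single covariate

# Incidence: probability that the event eventually occurs (logistic part).
alpha0, alpha1 = 0.5, 1.0                   # made-up incidence coefficients
p_event = 1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * x)))
susceptible = rng.uniform(size=n) < p_event

# Latency: exponential proportional hazards for susceptible subjects.
beta1 = 0.7                                 # made-up log hazard ratio
base_rate = 0.2
t_event = rng.exponential(1.0 / (base_rate * np.exp(beta1 * x)))
t_event[~susceptible] = np.inf              # 'cured' subjects never fail

# Independent censoring; cured subjects are always observed as censored.
c = rng.exponential(10.0, size=n)
time = np.minimum(t_event, c)
status = (t_event <= c).astype(int)

print("observed event rate:", status.mean(), "true cured fraction:", 1 - susceptible.mean())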


Journal ArticleDOI
TL;DR: In this paper, a complete-data log likelihood ratio based procedure was proposed to obtain significance levels from multiply-imputed data, which requires only the completed-data point estimates and evaluations of the complete-data log likelihood ratio statistic, not the variance-covariance matrices.
Abstract: SUMMARY Existing procedures for obtaining significance levels from multiply-imputed data either (i) require access to the completed-data point estimates and variance-covariance matrices, which may not be available in practice when the dimensionality of the estimand is high, or (ii) directly combine p-values with less satisfactory results. Taking advantage of the well-known relationship between the Wald and log likelihood ratio test statistics, we propose a complete-data log likelihood ratio based procedure. It is shown that, for any number of multiple imputations, the proposed procedure is equivalent in large samples to the existing procedure based on the point estimates and the variance-covariance matrices, yet it only requires the point estimates and evaluations of the complete-data log likelihood ratio statistic as a function of these estimates and the completed data. The proposed procedure, therefore, is especially attractive with highly multiparameter incomplete-data problems since it does not involve the computation of any matrices.

323 citations
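As a rough sketch of how a combining rule of this kind can be coded: average the complete-data likelihood ratio statistics evaluated at their own completed-data estimates and re-evaluated at the estimates averaged across imputations, then form an F-type statistic from the two averages. The constants below reflect my recollection of the rule rather than the paper itself, so treat them as an assumption to check against the source; the user supplies the complete-data likelihood ratio evaluations.

import numpy as np

def combine_lr(d_own, d_avg, k):
    # d_own: complete-data LR statistics, one per imputation, each evaluated
    #        at its own completed-data parameter estimates
    # d_avg: the same statistics re-evaluated at the estimates averaged over imputations
    # k:     number of parameters being tested
    d_own = np.asarray(d_own, float)
    d_avg = np.asarray(d_avg, float)
    m = len(d_own)
    d_bar = d_own.mean()
    d_tilde = d_avg.mean()
    r = (m + 1) / (k * (m - 1)) * (d_bar - d_tilde)   # estimated relative increase in variance
    D = d_tilde / (k * (1 + r))                        # refer to an F distribution with k numerator df
    return D, r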


Journal ArticleDOI
TL;DR: This article proposes a jackknife variance estimator for stratified multistage surveys that use hot deck imputation for item nonresponse, since treating the imputed values as true values can lead to serious underestimation of the true variance when the proportion of missing values for an item is appreciable.
Abstract: SUMMARY Hot deck imputation is commonly employed for item nonresponse in sample surveys. It is also a common practice to treat the imputed values as if they are true values, and then compute the variance estimates using standard formulae. This procedure, however, could lead to serious underestimation of the true variance, when the proportion of missing values for an item is appreciable. We propose a jackknife variance estimator for stratified multistage surveys which is obtained by first adjusting the imputed values for each pseudo-replicate and then applying the standard jackknife formula. The proposed jackknife variance estimator is shown to be consistent as the sample size increases, assuming equal response probabilities within imputation classes and using a particular hot deck imputation.

270 citations
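To make the adjustment step concrete, the sketch below applies the idea in a deliberately simplified setting: simple random sampling, one imputation class, and the mean as the parameter. When a respondent is deleted, the imputed values are shifted by the resulting change in the respondent mean before the usual delete-one jackknife formula is applied. This is my reading of the general idea in a toy setting; the stratified multistage and weighting details of the paper are omitted.

import numpy as np

def adjusted_jackknife_var(y, respondent):
    # y:          data vector with hot-deck-imputed values already filled in
    # respondent: boolean mask indicating actual respondents
    y = np.asarray(y, float)
    respondent = np.asarray(respondent, bool)
    n = len(y)
    resp = np.flatnonzero(respondent)
    nonresp = np.flatnonzero(~respondent)
    ybar_r = y[resp].mean()                          # respondent mean on the full sample

    theta_full = y.mean()
    theta_del = np.empty(n)
    for j in range(n):
        yj = y.copy()
        if respondent[j]:
            # Re-centre the imputed values on the respondent mean computed without unit j.
            ybar_rj = (y[resp].sum() - y[j]) / (len(resp) - 1)
            yj[nonresp] += ybar_rj - ybar_r
        keep = np.arange(n) != j
        theta_del[j] = yj[keep].mean()

    return (n - 1) / n * np.sum((theta_del - theta_full) ** 2)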


Journal ArticleDOI
TL;DR: In this article, the covariance between counting process martingales is used to characterize the dependence between two failure time variates, and a representation of the bivariate survivor function is obtained in terms of the marginal survivor functions and this covariance function.
Abstract: SUMMARY The covariance between counting process martingales is used to characterize the dependence between two failure time variates. A representation of the bivariate survivor function is obtained in terms of the marginal survivor functions and this covariance function. A closely related representation expresses the bivariate survivor function in terms of marginal survivor functions and a conditional covariance function, leading to a new nonparametric survivor function estimator. Generalizations to higher dimensional failure time variates are also given. Simulation evaluations of the survivor function estimator are presented, and generalizations to regression problems are outlined.

254 citations


Journal ArticleDOI
TL;DR: The efficient rounding method is a multiplier method of apportionment which otherwise is known as the method of John Quincy Adams or the method of smallest divisors.
Abstract: SUMMARY Discretization methods to round an approximate design into an exact design for a given sample size n are discussed. There is a unique method, called efficient rounding, which has the smallest loss of efficiency under a wide family of optimality criteria. The efficient rounding method is a multiplier method of apportionment which otherwise is known as the method of John Quincy Adams or the method of smallest divisors.

217 citations
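The method of smallest divisors has a compact highest-averages implementation: every support point first receives one trial, and each remaining trial goes to the point with the largest ratio of design weight to current allocation. The sketch below states that generic apportionment rule under the assumption that all weights are strictly positive and n is at least the number of support points; it is not taken from the paper.

import heapq

def adams_round(weights, n):
    # Round design weights (summing to 1) to integer trial numbers summing to n
    # using the Adams / smallest-divisors highest-averages rule.
    k = len(weights)
    alloc = [1] * k                        # every support point gets at least one trial
    heap = [(-w / 1, i) for i, w in enumerate(weights)]   # max-heap via negated priorities
    heapq.heapify(heap)
    for _ in range(n - k):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-weights[i] / alloc[i], i))
    return alloc

print(adams_round([0.5, 0.3, 0.2], 7))     # gives [3, 2, 2]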


Journal ArticleDOI
TL;DR: In this article, the authors proposed a weighted estimator and derived a design-based estimator for the variance of the estimated parameters using methods not previously used in the context of analysis of survey data.
Abstract: SUMMARY The proportional hazards model makes a number of assumptions which may not always be completely satisfied. However, fitting such models can have both descriptive and analytical value. We discuss the implications of the survey design for large-scale studies. We propose a weighted estimator and derive a design-based estimator for the variance of the estimated parameters. The derivation requires using methods not previously used in the context of analysis of survey data. We conduct a small simulation study comparing the proposed method with others in the literature.

204 citations


Journal ArticleDOI
TL;DR: The authors examined the performance of mixed-effects logistic regression analysis when a main component of the model, the mixture distribution, is misspecified, and showed that estimates of model parameters, including the effects of covariates, typically are asymptotically biased, i.e. inconsistent.
Abstract: SUMMARY Mixed-effects logistic models are often used to analyze binary response data which have been gathered in clusters or groups. Responses are assumed to follow a logistic model within clusters, with an intercept which varies across clusters according to a specified probability distribution G. In this paper we examine the performance of mixed-effects logistic regression analysis when a main component of the model, the mixture distribution, is misspecified. We show that, when the mixture distribution is misspecified, estimates of model parameters, including the effects of covariates, typically are asymptotically biased, i.e. inconsistent. However, we present some approximations which suggest that the magnitude of the bias in the estimated covariate effects is typically small. These findings are corroborated by a set of simulations which also suggest that valid variance estimates of estimated covariate effects can be obtained when the mixture distribution is misspecified.

202 citations


Journal ArticleDOI
TL;DR: In this paper, noninformative priors are developed using the reference prior approach for multiparameter problems in which there may be parameters of interest and nuisance parameters.
Abstract: SUMMARY Noninformative priors are developed, using the reference prior approach, for multiparameter problems in which there may be parameters of interest and nuisance parameters. For a given grouping of parameters and ordering of the groups, intuitively, according to inferential importance, an algorithm for determining the associated reference prior is presented. The algorithm is illustrated on the multinomial problem, with discussion of the variety and success of various groupings and ordering strategies.

198 citations


Journal ArticleDOI
TL;DR: This paper showed that matching on estimated rather than population propensity scores can lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible.
Abstract: SUMMARY Matched sampling is a standard technique for controlling bias in observational studies due to specific covariates. Since Rosenbaum & Rubin (1983), multivariate matching methods based on estimated propensity scores have been used with increasing frequency in medical, educational, and sociological applications. We obtain analytic expressions for the effect of matching using linear propensity score methods with normal distributions. These expressions cover cases where the propensity score is either known, or estimated using either discriminant analysis or logistic regression, as is typically done in current practice. The results show that matching using estimated propensity scores not only reduces bias along the population propensity score, but also controls variation of components orthogonal to it. Matching on estimated rather than population propensity scores can therefore lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible. Approximations are given for the magnitude of this variance reduction, which can be computed using estimates obtained from the matching pools. Related expressions for bias reduction are also presented which suggest that, in difficult matching situations, the use of population scores leads to greater bias reduction than the use of estimated scores.

197 citations
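As a generic illustration of matching on estimated propensity scores (not the paper's analytic calculations), the sketch below estimates the score by logistic regression and performs greedy 1:1 nearest-neighbour matching on the linear (logit) scale. It assumes scikit-learn is available; all names are invented.

import numpy as np
from sklearn.linear_model import LogisticRegression

def match_on_propensity(X, treated):
    # Greedy 1:1 matching of treated units to controls, without replacement,
    # on the estimated linear propensity score.
    treated = np.asarray(treated, bool)
    model = LogisticRegression(max_iter=1000).fit(X, treated)
    logit = model.decision_function(X)            # estimated linear propensity score

    t_idx = np.flatnonzero(treated)
    c_idx = np.flatnonzero(~treated)
    available = set(c_idx.tolist())
    pairs = []
    for i in sorted(t_idx, key=lambda u: -logit[u]):
        if not available:
            break
        j = min(available, key=lambda c: abs(logit[c] - logit[i]))
        pairs.append((i, j))
        available.remove(j)
    return pairs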


Journal ArticleDOI
TL;DR: The purpose of this paper is to provide conditions under which it would be possible to estimate the causal effect of a time-dependent treatment or exposure on time to an event of interest in the presence of time-dependent confounding covariates and to construct estimators for the treatment effect under these conditions.
Abstract: SUMMARY Cox & Oakes (1984, p. 66) introduced the 'strong version' of the accelerated failure time model with time-dependent exposures. We provide conditions under which this model could be used to estimate, from observational data, the causal effect of a time-varying exposure or treatment on time to an event of interest in the presence of time-dependent confounding variables. We propose a class of semiparametric tests and estimators for the model parameters. This class contains an estimator that is semiparametric efficient in the sense of Begun et al. (1983). The purpose of this paper is first to provide conditions under which it would be possible to estimate, from observational data, the causal effect of a time-dependent treatment or exposure on time to an event of interest in the presence of time-dependent confounding covariates and, then, to construct estimators for the treatment effect under these conditions. Our approach will be based on the semiparametric estimation of the parameters of the 'strong version' of the accelerated failure time model with time-dependent exposures or treatments introduced by Cox & Oakes (1984, p. 66). The usual approach to the estimation of the effect of a time-varying treatment on survival is to model the hazard of failure at t as a function of past treatment history using a time-dependent proportional hazards model. We show that the usual approach may be biased, whether or not one further adjusts for past confounder history in the analysis, when (a) there exists a time-dependent risk factor for, or predictor of, the event of interest that also predicts subsequent treatment, and (b) past treatment history predicts subsequent risk factor level. The following four examples demonstrate that conditions (a) and (b) will often be true in an observational study in which there is 'treatment by indication'. The drug zidovudine, formerly AZT, used in the treatment of the acquired immunodeficiency syndrome, is a direct red-blood cell toxin that is contra-indicated in anaemic subjects, i.e. subjects with depressed red-cell counts, since the toxic effects of zidovudine can worsen the anaemia. Further, anaemic patients are at increased risk of death. Thus in an observational study of the effect of zidovudine on survival of patients with the acquired immunodeficiency syndrome, anaemia is both a risk factor for death

190 citations


Journal ArticleDOI
TL;DR: In this article, a large-sample distribution theory for maximum estimated likelihood estimates is developed; the approach is nonparametric with respect to P(S | Y, X) in the context of estimating β from the regression model P(Y | X), relating response Y to covariates X, assuming that only a surrogate response S is available for most study subjects.
Abstract: SUMMARY In the context of estimating β from the regression model P(Y | X), relating response Y to covariates X, suppose that only a surrogate response S is available for most study subjects. Suppose that for a random subsample of the study cohort, termed the validation sample, the true outcome Y is available in addition to S. We consider maximum likelihood estimation of β from such data and show that it is nonrobust to misspecification of the distribution relating the surrogate to the true outcome, P(S | Y, X). An alternative semiparametric method is also considered, which is nonparametric with respect to P(S | Y, X). Large-sample distribution theory for maximum estimated likelihood estimates is developed. An illustrative example is presented.

Journal ArticleDOI
TL;DR: In this article, a number of special representations for the joint distribution of qualitative, mostly binary, and quantitative variables are considered, including conditional Gaussian models and conditional Gaussian regression chain models, and the possibilities for choosing between the models empirically are examined, as well as the testing of independence and conditional independence and the estimation of parameters.
Abstract: SUMMARY A number of special representations are considered for the joint distribution of qualitative, mostly binary, and quantitative variables. In addition to the conditional Gaussian models and to conditional Gaussian regression chain models some emphasis is placed on models derived from an underlying multivariate normal distribution and on models in which discrete probabilities are specified linearly in terms of unknown parameters. The possibilities for choosing between the models empirically are examined, as well as the testing of independence and conditional independence and the estimation of parameters. Often the testing of independence is exactly or nearly the same for a number of different models.

Journal ArticleDOI
TL;DR: In this paper, a class of semiparametric accelerated failure time models is introduced that are useful for modelling the relationship of survival distributions to time dependent covariates, and the estimators within this class achieve the semi-parametric efficiency bound.
Abstract: SUMMARY A class of semiparametric accelerated failure time models is introduced that are useful for modelling the relationship of survival distributions to time-dependent covariates. A class of semiparametric rank estimators is derived for the parameters in this model when the survival data are right censored. These estimates are shown to be consistent, and asymptotically normal with variances that can be consistently estimated. It is also shown that estimators within this class achieve the semiparametric efficiency bound. We consider an accelerated failure time model for time-to-event data in which the survival distribution is scaled by a factor which may be a function of time-dependent covariates. These models directly model the effect of covariates on the length of survival. This is in contrast to the more commonly used proportional hazards model, which models the hazard rate itself. In some applications it may be easier to visualize the concept that a treatment intervention or exposure to an environmental contaminant increases or decreases the length of survival by a certain proportion, as compared to the concept that the hazard rate is changed. For time-independent covariates, say Z = (Z_1, ..., Z_p)', the class of accelerated failure time models assumes that the survival distribution at time t, given a set of covariates Z = z, is given by F(t | z) = F_0{h(z)t}. The proportionality constant h(z) can be interpreted as the scale factor by which lifetime is decreased as a function of the covariates z. Often h(z) is taken to be log linear, h(z) = exp(β'z). In such cases, if z = 0, we can interpret F_0(t) as the 'baseline' survival distribution, or the survival distribution for individuals with covariates all equal to zero. A useful way of envisioning this model is to consider a hypothetical random variable, U, that corresponds to an individual's survival time if that individual had covariate values all equal to zero. This baseline failure time, U, would be modified for values of the covariate different from zero. For example, if we denote the actual survival time by T, then for an individual with covariate value Z we have T = exp(-β'Z)U. The parameters β in this model have a direct interpretation in terms of the increase or decrease in lifespan as a function of the covariates. The assumption that U is independent of Z in this hypothetical construct induces the probabilistic model
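The construction T = exp(-β'Z)U described above is easy to reproduce: draw a baseline failure time U independently of the covariates and rescale it. The snippet below does this for a single binary covariate with made-up values of β and of the baseline distribution; it illustrates the model, not the rank estimation developed in the paper.

import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.binomial(1, 0.5, size=n)          # e.g. a treatment indicator
beta = 0.5                                # made-up acceleration parameter

u = rng.weibull(1.5, size=n)              # baseline failure time U, independent of Z
t = np.exp(-beta * z) * u                 # actual failure time: T = exp(-beta * z) * U

# A positive beta shortens survival for z = 1 by the factor exp(-beta).
print(t[z == 1].mean() / t[z == 0].mean())    # roughly exp(-0.5), about 0.61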

Journal ArticleDOI
TL;DR: In this article, Gibbs sampling is applied to calculate Bayes estimates for a hierarchical capture-recapture model in a real example, and the results show that Gibbs sampling can be used for a variety of such applications.
Abstract: Capture-recapture models are widely used in the estimation of population sizes. Based on data augmentation considerations, we show how Gibbs sampling can be applied to calculate Bayes estimates in this setting. As a result, formulations which were previously avoided because of analytical and numerical intractability can now be easily considered for practical application. We illustrate this potential by using Gibbs sampling to calculate Bayes estimates for a hierarchical capture-recapture model in a real example.
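A minimal version of the data-augmentation idea can be written for the classical model with occasion-specific capture probabilities (model M_t), using a flat prior on the population size N and Beta(1, 1) priors on the capture probabilities: the sampler alternates between the capture probabilities given N and the number of never-captured animals given the probabilities. This is a sketch with my own prior choices and a toy data set, not the hierarchical model analysed in the paper.

import numpy as np

def gibbs_mt(n_per_occasion, r, iters=5000, seed=0):
    # n_per_occasion: number of captures on each occasion (length J)
    # r:              number of distinct animals ever captured
    rng = np.random.default_rng(seed)
    n_j = np.asarray(n_per_occasion)
    N = r                                   # starting value
    draws = []
    for _ in range(iters):
        # Capture probabilities given N: Beta(1 + n_j, 1 + N - n_j).
        p = rng.beta(1 + n_j, 1 + N - n_j)
        # Never-captured count given p: with a flat prior on N this is negative
        # binomial with r + 1 'successes' and success probability 1 - prod(1 - p_j).
        q = np.prod(1 - p)
        u = rng.negative_binomial(r + 1, 1 - q)
        N = r + u
        draws.append(N)
    return np.array(draws)

# Hypothetical data: 5 occasions, 60 distinct animals seen in total.
draws = gibbs_mt([30, 25, 28, 22, 26], r=60)
print("posterior mean of N:", draws[1000:].mean())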

Journal ArticleDOI
TL;DR: In this article, an estimator for the proportion of "immunes" in a population is proposed, when a sample of censored failure times is available, and the estimator is shown to be consistent and asymptotically normal, under modest conditions on the censoring mechanism.
Abstract: SUMMARY An estimator for the proportion of 'immunes' in a population is proposed, when a sample of censored failure times is available. Our suggestion is to use one minus the maximum observed value of the Kaplan-Meier empirical distribution function. This estimator is shown to be consistent and asymptotically normal, under modest conditions on the censoring mechanism. Simulations suggest that the estimator is approximately normal for quite small sample sizes, provided the immune proportion is not too close to zero. A simple nonparametric statistic is proposed to test whether the assumptions of the analysis are likely to be valid.
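The estimator itself is a single number once the Kaplan-Meier curve is available: the estimated immune proportion is the value of the Kaplan-Meier survivor curve at the largest observed time, censored or not. The sketch below computes the product-limit estimator by hand on invented data rather than relying on a survival package.

import numpy as np

def immune_proportion(time, event):
    # One minus the maximum of the Kaplan-Meier CDF, i.e. the KM survivor
    # estimate evaluated at the largest observed (censored or uncensored) time.
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]

    n = len(time)
    surv = 1.0
    for i in range(n):
        if event[i] == 1:
            surv *= 1.0 - 1.0 / (n - i)     # product-limit step; n - i subjects still at risk
    return surv

# Hypothetical sample in which roughly 30% of subjects never fail.
rng = np.random.default_rng(2)
t_fail = np.where(rng.uniform(size=200) < 0.7, rng.exponential(1.0, 200), np.inf)
censor = rng.uniform(0, 5, size=200)
print(immune_proportion(np.minimum(t_fail, censor), (t_fail <= censor).astype(int)))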

Journal ArticleDOI
TL;DR: In this article, the validity of posterior probability statements follows from probability calculus when the likelihood is the density of the observations, and a more intuitive definition of validity is introduced, based on coverage of posterior sets.
Abstract: SUMMARY The validity of posterior probability statements follows from probability calculus when the likelihood is the density of the observations. To investigate other cases, a second, more intuitive definition of validity is introduced, based on coverage of posterior sets. This notion of validity suggests that the likelihood must be the density of a statistic, not necessarily sufficient, for posterior probability statements to be valid. A convenient numerical method is proposed to invalidate the use of certain likelihoods for Bayesian analysis. Integrated, marginal, and conditional likelihoods, derived to avoid nuisance parameters, are also discussed.

Journal ArticleDOI
TL;DR: In this paper, a modified version of an existing plug-in bandwidth selector is proposed, which is generalized to stationary error variables by estimating a functional of the residual covariance function.
Abstract: SUMMARY Standard techniques for selecting the bandwidth of a kernel estimator from the data in a nonparametric regression model perform badly when the errors are correlated. In this paper we propose a modified version of an existing plug-in bandwidth selector. The method is generalized to stationary error variables by estimating a functional of the residual covariance function. The proposed bandwidth selector shows good properties in asymptotic theory and in simulations without assuming a parametric model for the error process.

Journal ArticleDOI
TL;DR: One-sided group sequential tests for normal responses which minimize expected sample size are derived in this paper, where minimization is at a single value of the normal mean or integrated with respect to a normal density.
Abstract: SUMMARY We derive one-sided group sequential tests for normal responses which minimize expected sample size; minimization is at a single value of the normal mean or integrated with respect to a normal density. The methods employed are much faster and also numerically more stable than those of Jennison (1987). They provide solutions for cases with as many as 200 groups and, for small numbers of groups, facilitate optimization over the choice of group sizes. We present a new method for constructing near-optimal tests when group sizes are unpredictable, based on interpolating the error spending function of an optimal test with a large number of groups.

Journal ArticleDOI
TL;DR: In this paper, the authors developed an efficient and dependable algorithm for calculating highly accurate approximate intervals on a routine basis, for parameters θ defined in the framework of a multiparameter exponential family.
Abstract: SUMMARY Fisher's theory of maximum likelihood estimation routinely provides approximate confidence intervals for a parameter of interest θ, the standard intervals θ̂ ± z_α σ̂, where θ̂ is the maximum likelihood estimator, σ̂ is an estimate of standard error based on differentiation of the log likelihood function, and z_α is a normal percentile point. Recent work has produced systems of better approximate confidence intervals, which look more like exact intervals when exact intervals exist, and in general have coverage probabilities an order of magnitude more accurate than the standard intervals. This paper develops an efficient and dependable algorithm for calculating highly accurate approximate intervals on a routine basis, for parameters θ defined in the framework of a multiparameter exponential family. The better intervals require only a few times as much computational effort as the standard intervals. A variety of numerical and theoretical arguments are used to show that the algorithm works well, and that the improvement over the standard intervals can be striking in realistic situations.
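For concreteness, the standard interval θ̂ ± z_α σ̂ takes only a couple of lines; the example below computes it for the rate of an exponential sample, with σ̂ obtained from the observed information (the negative second derivative of the log likelihood). The improved intervals of the paper require its algorithm and are not reproduced here.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=50)      # data with true rate 0.5

# Exponential log likelihood: l(theta) = n*log(theta) - theta*sum(x),
# so the MLE is 1/mean(x) and the observed information is n / theta_hat**2.
theta_hat = 1.0 / x.mean()
se_hat = theta_hat / np.sqrt(len(x))

z = norm.ppf(0.975)
print("standard 95% interval:", theta_hat - z * se_hat, theta_hat + z * se_hat)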

Journal ArticleDOI
TL;DR: In this paper, Koopman and Shephard present an exact score for time series models in state space form.
Abstract: The full text of this article is not currently available in ORA. Citation: Koopman, S. J. & Shephard, N. (1992). 'Exact score for time series models in state space form', Biometrika, 79(4), 823-826. [Available at http://biomet.oxfordjournals.org/].

Journal ArticleDOI
TL;DR: In this paper, it was shown that the use of a normal approximation to the modified signed likelihood ratio statistic r* is equivalent to using a saddlepoint approximation, even in a large deviation region where the signed likelihood ratio statistic r is of order √n.
Abstract: SUMMARY For a number of tests in exponential families we show that the use of a normal approximation to the modified signed likelihood ratio statistic r* is equivalent to the use of a saddlepoint approximation. This is also true in a large deviation region where the signed likelihood ratio statistic r is of order √n.
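For readers unfamiliar with the notation, the modified signed likelihood ratio statistic is usually written in the Barndorff-Nielsen form (stated here from general references, not taken from this paper):

r^* = r + \frac{1}{r} \log\!\left(\frac{u}{r}\right),
\qquad
r = \operatorname{sign}(\hat\theta - \theta)\,\sqrt{2\{\ell(\hat\theta) - \ell(\theta)\}},

where u is a Wald-type adjustment whose exact form depends on the model; the point of the paper is that treating r* as standard normal matches a saddlepoint approximation for these exponential family tests.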

Journal ArticleDOI
TL;DR: A unified approach for constructing large-sample optimal designs when the optimality criterion is of the minimax type is presented and an equivalence theorem is formulated and computer algorithms for generating minimax optimal designs are proposed.
Abstract: SUMMARY We present a unified approach for constructing large-sample optimal designs when the optimality criterion is of the minimax type. The assumed model is a general linear regression model with a known efficiency function defined on a closed and bounded space. An equivalence theorem is formulated and computer algorithms for generating minimax optimal designs are proposed. It is shown that this methodology is simple and rather general. It can be used to construct, for example, minimax variance optimal designs, minimax with respect to the single parameters designs and E-optimal designs. An application of this procedure to find an optimal design for minimizing the maximum predictive variance over a compact region in a heteroscedastic design problem is included.

Journal ArticleDOI
TL;DR: In this article, a method of obtaining an M-estimator in a linear model when the responses are subject to right censoring is proposed, and the central limit theorem for the estimator using squared error loss, i.e. least squares, is derived using counting process martingale techniques.
Abstract: SUMMARY We propose a method of obtaining an M-estimator in a linear model when the responses are subject to right censoring. The central limit theorem for the estimator using squared error loss, i.e. least squares, is derived using counting process martingale techniques. The estimation method is applied to the Stanford heart transplant data for illustration.

Journal ArticleDOI
TL;DR: In this paper, two types of nonparametric estimators based on kernel smoothing methods are considered, one is obtained by convolving a kernel with a cumulative hazard estimator and the second one is in the form of a ratio of two statistics.
Abstract: SUMMARY Left truncation and right censoring arise frequently in practice for life data. This paper is concerned with the estimation of the hazard rate function for such data. Two types of nonparametric estimators based on kernel smoothing methods are considered. The first one is obtained by convolving a kernel with a cumulative hazard estimator. The second one is in the form of a ratio of two statistics. Local properties including consistency, asymptotic normality and mean squared error expressions are presented for both estimators. These properties facilitate locally adaptive bandwidth choice. The two types of estimators are then compared based on their theoretical and empirical performances. The effect of overlooking the truncation factor is demonstrated through the Channing House data.
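The first (convolution) estimator described above can be sketched directly: form the Nelson-Aalen increments d_i / Y(t_i), where under left truncation and right censoring the risk set at time t contains subjects whose entry time is below t and whose observed time is at least t, then smooth the increments with a kernel. The code below is a bare-bones version with an Epanechnikov kernel and a fixed bandwidth; boundary corrections and the bandwidth selection discussed in the paper are omitted.

import numpy as np

def hazard_kernel(t_grid, entry, time, event, bw):
    # Kernel hazard estimate for left-truncated, right-censored data:
    # convolve an Epanechnikov kernel with the Nelson-Aalen increments.
    entry, time, event = (np.asarray(a, float) for a in (entry, time, event))
    event_times = np.unique(time[event == 1])

    increments = []
    for s in event_times:
        at_risk = np.sum((entry < s) & (time >= s))      # risk set under truncation and censoring
        d = np.sum((time == s) & (event == 1))
        increments.append(d / at_risk)
    increments = np.array(increments)

    def kern(u):                                         # Epanechnikov kernel
        return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

    t_grid = np.asarray(t_grid, float)
    return np.array([np.sum(kern((t - event_times) / bw) * increments) / bw for t in t_grid])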

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the estimators currently used for this type of experiment, and suggest some new ones, and discuss their properties, particularly their bias and variance, with the aim of choosing one or two estimators which are the 'best' to use, especially in handling a heterogeneous population.
Abstract: SUMMARY One way in which a capture-recapture experiment can be designed is to count each capture occasion as a separate sample, and tally the number of individuals caught once, twice, and so on. This paper discusses the estimators currently used for this type of experiment, and suggests some new ones. Each of the estimators is categorized by its derivation. Following this, their properties are discussed, particularly their bias and variance, with the aim of choosing one or two of the estimators which are the 'best' to use, especially in handling a heterogeneous population. It is found that the bias adjusted estimator of Chao (1989) is the best to use when the number of captures is relatively small, and that the estimator of Darroch & Ratcliff (1980) should be used otherwise.
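For reference, the bias-adjusted estimator attributed above to Chao (1989) uses only the numbers of individuals captured exactly once and exactly twice; as I recall it (worth checking against the original), it has the form coded below, where S is the number of distinct individuals observed.

def chao_bias_adjusted(capture_counts):
    # capture_counts: number of times each observed individual was captured (all >= 1).
    s = len(capture_counts)
    f1 = sum(1 for c in capture_counts if c == 1)    # captured exactly once
    f2 = sum(1 for c in capture_counts if c == 2)    # captured exactly twice
    return s + f1 * (f1 - 1) / (2 * (f2 + 1))        # S + f1(f1 - 1) / (2(f2 + 1))

print(chao_bias_adjusted([1, 1, 1, 2, 2, 3, 1, 4, 2, 1]))   # small made-up tally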

Journal ArticleDOI
TL;DR: This paper explored the impact of variable selection on statistical inferences in linear regression models and found that the size of the nominal confidence sets tends to be inflated if they are derived based on the selected model.
Abstract: SUMMARY We explore the impact of variable selection on statistical inferences in linear regression models. In particular, the generalized final prediction error criterion of Shibata (1984) is considered and it is found, among other things, that inferences on the regression coefficients are impaired by the variable selection procedure. Most notably, the size of the nominal confidence sets tends to be inflated if they are derived based on the selected model. On the other hand, variable selection does not seem to have much impact on the inferences for the error variance. Our results complement those obtained by Pötscher (1991), in which testing procedures are used for variable selection.

Journal ArticleDOI
TL;DR: In this article, an approximate pivot is constructed for estimating a normal mean θ following a truncated sequential probability ratio test and shown to provide a useful method for constructing confidence bounds and intervals.
Abstract: An approximate pivot is constructed for the problem of estimating a normal mean θ following a truncated sequential probability ratio test and shown to provide a useful method for constructing confidence bounds and intervals. Letting t denote the sample size and S_t the sum of observations, the approximate pivot is constructed by standardizing S_t, i.e. S_t* = t^{-1/2}(S_t - tθ), the mean and variance of which are no longer 0 and 1, due to the optional stopping.

Journal ArticleDOI
TL;DR: General aspects of nonlinearity in the context of component of variance models are discussed, and an approximate likelihood is proposed and its accurate performance is examined numerically using examples of exponential regression and the analysis of several related 2 x 2 tables.
Abstract: SUMMARY General aspects of nonlinearity in the context of component of variance models are discussed, and two special topics are examined in detail. Firstly, simple procedures, both formal and informal, are proposed for describing departures from normal-theory linear models. Transformation models are shown to be a special case of a more general formulation, and data on blood pressure are analyzed in illustration. Secondly, an approximate likelihood is proposed and its accurate performance is examined numerically using examples of exponential regression and the analysis of several related 2 x 2 tables. In the latter example, the approximate score test has improved power over the Mantel-Haenszel test.

Journal ArticleDOI
TL;DR: In this article, the exact joint distribution theory for the size and shape of planar Gaussian configurations is used to estimate the shape of mouse vertebrae, which is an extension of earlier work which considered marginal shape analysis.
Abstract: SUMMARY The paper deals with statistical analysis of landmark data using the exact joint distribution theory for the size and shape of planar Gaussian configurations. This is an extension of earlier work which considered marginal shape analysis. Special cases of the size and shape distribution are examined and the isotropic Gaussian model is investigated in particular detail. Various properties are studied, including conditional distributions and a curve of regression. Finally we consider some possible approaches to inference, including exact maximum likelihood estimation of size and shape. A practical biological application is given which investigates the size and shape of mouse vertebrae.