
Showing papers in "Biometrics in 1995"




Journal ArticleDOI
TL;DR: Two methods are given for the joint estimation of parameters in models for competing risks in survival analysis; in both, Cox's proportional hazards regression model is fitted using a data duplication method.
Abstract: Two methods are given for the joint estimation of parameters in models for competing risks in survival analysis. In both cases Cox's proportional hazards regression model is fitted using a data duplication method. In principle either method can be used for any number of different failure types, assuming independent risks. Advantages of the augmented data approach are that it limits over-parametrisation and it runs immediately on existing software. The methods are used to reanalyse data from two well-known published studies, providing new insights.
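As an illustration of the data-duplication idea, here is a minimal sketch in Python using the lifelines package, on simulated data with two failure types (all variable names and parameters are hypothetical, not the paper's examples). Each subject's record is duplicated once per failure type, with the event indicator switched on only for the type actually observed; stratifying on failure type gives each cause its own baseline hazard, and the interaction term carries the cause-specific covariate effect.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                        # a single covariate
t = rng.exponential(1.0, size=n)              # observed times
cause = rng.choice([0, 1, 2], size=n, p=[0.2, 0.4, 0.4])  # 0 = censored

# duplicate every subject once per failure type; the event indicator is
# switched on only for the failure type actually observed
rows = []
for i in range(n):
    for k in (1, 2):
        rows.append({"time": t[i], "event": int(cause[i] == k),
                     "ftype": k, "x": x[i],
                     "x_type2": x[i] * (k == 2)})   # cause-2 interaction
dup = pd.DataFrame(rows)

# stratifying on failure type gives each cause its own baseline hazard;
# 'x' is the cause-1 effect, 'x_type2' the cause-2 difference
cph = CoxPHFitter()
cph.fit(dup, duration_col="time", event_col="event", strata=["ftype"])
cph.print_summary()
```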

911 citations


Journal ArticleDOI
TL;DR: In this article, a strategy of using an average information matrix is shown to be computationally convenient and efficient for estimating variance components by restricted maximum likelihood (REML) in the mixed linear model.
Abstract: A strategy of using an average information matrix is shown to be computationally convenient and efficient for estimating variance components by restricted maximum likelihood (REML) in the mixed linear model. Three applications are described. The motivation for the algorithm was the estimation of variance components in the analysis of wheat variety means from 1,071 experiments representing 10 years and 60 locations in New South Wales. We also apply the algorithm to the analysis of designed experiments by incomplete block analysis and spatial analysis of field experiments.
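A minimal numpy sketch of one average-information REML update for the simplest case, a single random effect with V = s_u ZZ' + s_e I. The score and average-information expressions follow standard REML theory; safeguards such as step-halving and convergence checks are omitted, and the data and starting values are hypothetical.

```python
import numpy as np

def ai_reml_step(y, X, Z, sig):
    """One average-information REML update for V = s_u*ZZ' + s_e*I.
    No step-halving or convergence safeguards, for brevity."""
    s_u, s_e = sig
    n = len(y)
    V = s_u * (Z @ Z.T) + s_e * np.eye(n)
    Vinv = np.linalg.inv(V)
    P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
    Py = P @ y
    dV = [Z @ Z.T, np.eye(n)]                    # dV/ds_u and dV/ds_e
    # REML score: -0.5 * (tr(P dV) - y'P dV P y)
    score = np.array([-0.5 * (np.trace(P @ D) - Py @ D @ Py) for D in dV])
    # average-information matrix: 0.5 * y'P dV_i P dV_j P y
    AI = 0.5 * np.array([[Py @ Di @ P @ Dj @ Py for Dj in dV] for Di in dV])
    return np.maximum(sig + np.linalg.solve(AI, score), 1e-6)

rng = np.random.default_rng(1)
q, n = 8, 120
Z = np.kron(np.eye(q), np.ones((n // q, 1)))     # group-indicator design
y = 1.0 + Z @ rng.normal(0, np.sqrt(0.5), q) + rng.normal(0, 1.0, n)
X = np.ones((n, 1))

sig = np.array([0.3, 0.8])                       # hypothetical start values
for _ in range(20):
    sig = ai_reml_step(y, X, Z, sig)
print(sig)                                       # (sigma_u^2, sigma_e^2)
```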

868 citations


Journal ArticleDOI
TL;DR: Both parametric and semi-parametric estimators of the association parameter are efficient at independence, and the parameter estimates in the margins have high efficiency and are robust to misspecification of dependency structures.
Abstract: We investigate two-stage parametric and two-stage semi-parametric estimation procedures for the association parameter in copula models for bivariate survival data where censoring in either or both components is allowed. We derive asymptotic properties of the estimators and compare their performance by simulations. Both parametric and semi-parametric estimators of the association parameter are efficient at independence, and the parameter estimates in the margins have high efficiency and are robust to misspecification of dependency structures. In addition, we propose a consistent variance estimator for the semi-parametric estimator of the association parameter. We apply the proposed methods to an AIDS data set for illustration.
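A sketch of the two-stage idea for a Clayton copula, assuming for clarity that there is no censoring (the paper's estimators allow censoring in either or both components): stage 1 estimates the margins parametrically, stage 2 maximizes the copula likelihood in the association parameter alone. Simulation and parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def clayton_loglik(theta, u, v):
    """Clayton copula density log-likelihood (theta > 0)."""
    s = u**-theta + v**-theta - 1.0
    return np.sum(np.log1p(theta)
                  - (1 + theta) * (np.log(u) + np.log(v))
                  - (2 + 1 / theta) * np.log(s))

# simulate a Clayton-dependent pair via the conditional-distribution method
rng = np.random.default_rng(2)
n, theta_true = 500, 2.0
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = ((w**(-theta_true / (1 + theta_true)) - 1) * u**-theta_true
     + 1)**(-1 / theta_true)
t1, t2 = -np.log(1 - u), -np.log(1 - v) / 0.5    # exponential margins

# stage 1: exponential MLE for each margin, then transform to uniforms
u_hat = 1 - np.exp(-t1 / t1.mean())
v_hat = 1 - np.exp(-t2 / t2.mean())

# stage 2: maximize the copula likelihood in the association parameter only
res = minimize_scalar(lambda th: -clayton_loglik(th, u_hat, v_hat),
                      bounds=(1e-3, 20), method="bounded")
print(res.x)                                     # should be near theta_true
```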

648 citations



Journal ArticleDOI
TL;DR: In this paper, a flexible method of extending a study based on conditional power is proposed, where the significance of the treatment difference at the planned end is used to determine the number of additional observations needed and the critical value necessary for use after accruing those additional observations.
Abstract: We propose a flexible method of extending a study based on conditional power. The possibility for extension when the p value at the planned end is small but not statistically significant is built in to the design of the study. The significance of the treatment difference at the planned end is used to determine the number of additional observations needed and the critical value necessary for use after accruing those additional observations. It may therefore be thought of as a two-stage procedure. Even though the observed treatment difference at stage 1 is used to make decisions, the Type I error rate is protected.
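A sketch of the conditional power computation underlying such extensions, assuming a normally distributed test statistic accrued in two stages; by default the observed stage-1 trend is plugged in as the effect size. The numbers are hypothetical, and the adjusted stage-2 critical value described in the paper is not reproduced here.

```python
from scipy.stats import norm

def conditional_power(z1, n1, n_total, theta=None, alpha=0.025):
    """P(reject at the planned end | interim z-statistic z1).

    theta is the standardized effect per observation; by default the
    current trend theta = z1 / sqrt(n1) is plugged in.
    """
    if theta is None:
        theta = z1 / n1**0.5              # "current trend" assumption
    n2 = n_total - n1
    z_crit = norm.ppf(1 - alpha)
    # final statistic: (z1*sqrt(n1) + Z2*sqrt(n2)) / sqrt(n_total)
    thresh = (z_crit * n_total**0.5 - z1 * n1**0.5) / n2**0.5
    return 1 - norm.cdf(thresh - theta * n2**0.5)

print(conditional_power(z1=1.5, n1=100, n_total=200))
```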

441 citations


Journal ArticleDOI
TL;DR: This manual describes how GLIM 4 may be used for statistical analysis in its most general sense, including data manipulation and display, model fitting, and prediction.
Abstract: This manual describes how GLIM 4 may be used for statistical analysis in its most general sense, including data manipulation and display, model fitting, and prediction. This is a thorough re-working of the previous GLIM manual to take account of updates to the software - essential reading for all research statisticians everywhere.

423 citations


Journal ArticleDOI
TL;DR: A formal modelling framework is provided for the analysis of capture-recapture data obtained under Pollock's robust design, and likelihood functions for the complete data structure are developed and examined under a variety of models.
Abstract: The Jolly-Seber method has been the traditional approach to the estimation of demographic parameters in long-term capture-recapture studies of wildlife and fish species. This method involves restrictive assumptions about capture probabilities that can lead to biased estimates, especially of population size and recruitment. Pollock (1982, Journal of Wildlife Management 46, 752-757) proposed a sampling scheme in which a series of closely spaced samples were separated by longer intervals such as a year. For this "robust design," Pollock suggested a flexible ad hoc approach that combines the Jolly-Seber estimators with closed population estimators, to reduce bias caused by unequal catchability, and to provide estimates for parameters that are unidentifiable by the Jolly-Seber method alone. In this paper we provide a formal modelling framework for analysis of data obtained using the robust design. We develop likelihood functions for the complete data structure under a variety of models and examine the relationship among the models. We compute maximum likelihood estimates for the parameters by applying a conditional argument, and compare their performance against those of ad hoc and Jolly-Seber approaches using simulation.
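The robust design pairs open-population modelling across primary periods with closed-population estimation within them. As a sketch of the within-period ingredient, here is the closed-population M0 likelihood (constant capture probability) profiled over population size N, with hypothetical capture summaries; the paper's full likelihood combines such components across periods.

```python
import numpy as np
from scipy.special import gammaln

def m0_profile_loglik(N, r, ndot, t):
    """Closed-population M0 log-likelihood at population size N, with the
    capture probability p profiled out; r distinct animals were caught,
    ndot total captures over t secondary occasions."""
    p = ndot / (t * N)                      # profile MLE of p given N
    return (gammaln(N + 1) - gammaln(N - r + 1)
            + ndot * np.log(p) + (t * N - ndot) * np.log1p(-p))

r, ndot, t = 60, 110, 5                     # hypothetical capture summary
Ns = np.arange(r, 400)
ll = [m0_profile_loglik(N, r, ndot, t) for N in Ns]
print("N_hat =", Ns[int(np.argmax(ll))])
```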

376 citations



Journal ArticleDOI
TL;DR: A small sample criterion (AICc) for the selection of extended quasi-likelihood models provides a more nearly unbiased estimator for the expected Kullback-Leibler information and often selects better models than AIC in small samples.
Abstract: We develop a small sample criterion (AICc) for the selection of extended quasi-likelihood models. In contrast to the Akaike information criterion (AIC), AICc provides a more nearly unbiased estimator for the expected Kullback-Leibler information. Consequently, it often selects better models than AIC in small samples. For the logistic regression model, Monte Carlo results show that AICc outperforms AIC, Pregibon's (1979, Data Analytic Methods for Generalized Linear Models. Ph.D. thesis, University of Toronto) Cp*, and the Cp selection criteria of Hosmer et al. (1989, Biometrics 45, 1265-1270). Two examples are presented.
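A sketch of the correction in its familiar form, AICc = AIC + 2k(k+1)/(n-k-1); in the paper the log-likelihood is replaced by the extended quasi-likelihood, which this sketch leaves abstract. The numbers in the usage line are hypothetical.

```python
def aicc(loglik, k, n):
    """Small-sample corrected AIC; loglik may be an extended
    quasi-likelihood, k the parameter count, n the sample size."""
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# hypothetical comparison of two candidate models on n = 40 observations:
# the larger model pays a much heavier small-sample penalty
print(aicc(loglik=-52.1, k=3, n=40), aicc(loglik=-50.8, k=6, n=40))
```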

310 citations


Journal ArticleDOI
TL;DR: This paper develops a class of models for missing data in longitudinal studies that allow the primary response, conditional on a shared random parameter, to follow a generalized linear model, and approximates this model by conditioning on the data that describe missingness.
Abstract: This paper develops a class of models to deal with missing data from longitudinal studies. We assume that separate models for the primary response and missingness (e.g., number of missed visits) are linked by a common random parameter. Such models have been developed in the econometrics (Heckman, 1979, Econometrica 47, 153-161) and biostatistics (Wu and Carroll, 1988, Biometrics 44, 175-188) literature for a Gaussian primary response. We allow the primary response, conditional on the random parameter, to follow a generalized linear model and approximate the generalized linear model by conditioning on the data that describe missingness. The resultant approximation is a mixed generalized linear model with possibly heterogeneous random effects. An example is given to illustrate the approximate approach, and simulations are performed to critique the adequacy of the approximation for repeated binary data.
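A short simulation of the shared random parameter structure the abstract describes, with hypothetical coefficients: one latent value per subject drives both the repeated binary response and the number of missed visits, so missingness is informative about the response.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj, n_visits = 300, 6
b = rng.normal(0, 1.0, n_subj)                # shared random parameter

# primary response: repeated binary outcome, conditional on b
t = np.arange(n_visits)
logit = -0.5 + 0.3 * t + b[:, None]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# missingness: number of missed visits, linked through the same b
missed = rng.poisson(np.exp(-1.0 + 0.8 * b))
# joint fitting would integrate over b; the paper instead conditions the
# response model on the observed missingness data as an approximation
```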

Journal ArticleDOI
TL;DR: In this paper, a method is described for estimating the relative incidence of clinical events in defined time intervals after vaccination compared to a control period using only data on cases, derived from a Poisson cohort model by conditioning on the occurrence of an event and on vaccination histories.
Abstract: A method is described for estimating the relative incidence of clinical events in defined time intervals after vaccination compared to a control period using only data on cases. The method is derived from a Poisson cohort model by conditioning on the occurrence of an event and on vaccination histories. Methods of analysis for event-dependent vaccination histories and survival data are discussed. Asymptotic arguments suggest that the method retains high efficiency relative to the full cohort analysis under conditions which commonly apply to studies of vaccine safety.
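A sketch of the case-only conditional likelihood for a single post-vaccination risk window: conditional on a case having one event, the probability that it falls in the risk window of length r rather than the control period of length c is rho*r/(rho*r + c), where rho is the relative incidence. Data and window lengths are hypothetical, and event-dependent vaccination histories are not handled.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sccs_loglik(log_rho, in_risk, r_len, c_len):
    """Conditional (case-only) log-likelihood for one risk window."""
    rho = np.exp(log_rho)
    p = rho * r_len / (rho * r_len + c_len)   # P(event in risk window)
    return np.sum(in_risk * np.log(p) + (1 - in_risk) * np.log1p(-p))

# hypothetical data: 40 of 100 single-event cases fell inside a 30-day
# risk window within a 365-day observation period
in_risk = np.r_[np.ones(40), np.zeros(60)]
r_len, c_len = 30.0, 335.0
res = minimize_scalar(lambda lr: -sccs_loglik(lr, in_risk, r_len, c_len),
                      bounds=(-5, 5), method="bounded")
print("relative incidence =", np.exp(res.x))
```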


Journal ArticleDOI
TL;DR: A score test is presented for determining whether the number of zeros observed in Poisson count data is too large for an ordinary Poisson distribution to fit the data well.
Abstract: When analyzing Poisson count data, an excess of zeros is sometimes observed. When there are too many zeros, a zero-inflated Poisson distribution can be used. A score test is presented to test whether the number of zeros is too large for a Poisson distribution to fit the data well.
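A sketch of the score test in one published form: with lambda estimated by the sample mean under the Poisson null and p0 = exp(-lambda_hat), the statistic compares the observed number of zeros with the expected count and is referred to chi-square with one degree of freedom. The simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def zip_score_test(y):
    """Score test for zero inflation relative to a Poisson fit:
    (n0 - n*p0)^2 / (n*p0*(1-p0) - n*ybar*p0^2), p0 = exp(-ybar)."""
    y = np.asarray(y)
    n, ybar = len(y), y.mean()
    n0 = np.sum(y == 0)
    p0 = np.exp(-ybar)
    stat = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * ybar * p0**2)
    return stat, chi2.sf(stat, df=1)

rng = np.random.default_rng(4)
y = np.where(rng.uniform(size=500) < 0.2, 0, rng.poisson(2.0, 500))
print(zip_score_test(y))          # should reject the plain Poisson
```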

Journal ArticleDOI
TL;DR: Phase II study designs are proposed that evaluate both clinical response and toxicity, and that are similar in structure to Simon's two-stage designs.
Abstract: Phase II study designs are proposed that evaluate both clinical response and toxicity, and that are similar in structure to Simon's two-stage designs. Sample sizes and decision criteria are chosen to minimize the maximum expected accrual, given that the treatment is unacceptable either in terms of clinical response or toxicity. This is achieved subject to control of error rates, either uniformly over all possible correlation structures linking response and toxicity, or alternatively, under an assumption of independence between response and toxicity. In the latter case, bounds on the error rates show that effective control is still uniformly achieved even if the independence assumption is relaxed.
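A sketch of the operating-characteristic calculation that such designs are tuned on, under the independence assumption the abstract mentions: the probability that a treatment with given response and toxicity rates passes both stages. All sample sizes and decision thresholds here are hypothetical, not optimized values; evaluating this probability at unacceptable and acceptable (p_resp, p_tox) pairs gives the error rates to be controlled.

```python
import numpy as np
from scipy.stats import binom

def accept_prob(p_resp, p_tox, n1, n2, r1_min, t1_max, r_min, t_max):
    """P(treatment declared acceptable) for a two-stage design that
    screens on response AND toxicity, assuming the two are independent."""
    total = 0.0
    for r1 in range(r1_min, n1 + 1):            # stage-1 responses
        for t1 in range(0, t1_max + 1):         # stage-1 toxicities
            w = binom.pmf(r1, n1, p_resp) * binom.pmf(t1, n1, p_tox)
            # stage 2: need r_min total responses, at most t_max toxicities
            pr2 = binom.sf(r_min - r1 - 1, n2, p_resp)
            pt2 = binom.cdf(t_max - t1, n2, p_tox)
            total += w * pr2 * pt2
    return total

# hypothetical thresholds; the paper chooses them to control error rates
print(accept_prob(p_resp=0.35, p_tox=0.20, n1=20, n2=25,
                  r1_min=5, t1_max=5, r_min=16, t_max=10))
```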

Journal ArticleDOI
TL;DR: The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982): a logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part.
Abstract: A mixture model is an attractive approach for analyzing failure time data in which there are thought to be two groups of subjects, those who could eventually develop the endpoint and those who could not develop the endpoint. The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982). A logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model. The estimator arises naturally out of the EM algorithm approach for fitting failure time mixture models as described by Larson and Dinse (1985). The procedure is applied to some experimental data from radiation biology and is evaluated in a Monte Carlo simulation study. The simulation study suggests the semi-parametric procedure is almost as efficient as the correct fully parametric procedure for estimating the regression coefficient in the incidence, but less efficient for estimating the latency distribution.
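A compact EM sketch of the mixture idea, simplified to an intercept-only incidence (a single susceptible fraction rather than the paper's logistic regression) and a latency distribution estimated by a weighted Kaplan-Meier step: in the E-step, censored subjects split their mass between "cured" and "susceptible but censored". The weighting scheme and data are illustrative simplifications, not the paper's estimator.

```python
import numpy as np

def em_cure(t, d, n_iter=50):
    """EM for a cured/susceptible mixture with intercept-only incidence
    (fraction pi) and latency estimated by a weighted Kaplan-Meier."""
    order = np.argsort(t)
    t, d = t[order], d[order]
    n = len(t)
    w = np.where(d == 1, 1.0, 0.5)          # P(susceptible | data), initial
    for _ in range(n_iter):
        pi = w.mean()                       # M-step: incidence
        S = np.ones(n)                      # M-step: weighted KM latency
        surv = 1.0
        for i in range(n):
            at_risk = w[i:].sum()
            if d[i] == 1 and at_risk > 0:
                surv *= 1 - w[i] / at_risk
            S[i] = surv
        # E-step: censored subjects may be cured or susceptible-but-censored
        w = np.where(d == 1, 1.0, pi * S / (1 - pi + pi * S))
    return pi, t, S

rng = np.random.default_rng(5)
n = 400
cured = rng.uniform(size=n) < 0.3           # true cured fraction 0.3
event_t = rng.exponential(1.0, n)
cens_t = rng.uniform(0, 3, n)
t = np.where(cured, cens_t, np.minimum(event_t, cens_t))
d = (~cured & (event_t <= cens_t)).astype(int)
pi_hat, tt, S = em_cure(np.asarray(t), d)
print("estimated susceptible fraction:", pi_hat)   # truth: 0.7
```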


Journal ArticleDOI
TL;DR: This paper considers logistic regression models for multivariate binary responses, where the association between the responses is largely regarded as a nuisance characteristic of the data, and considers the estimator based on independence estimating equations (IEE), which assumes that the responses are independent.
Abstract: Clustered binary data occur commonly in both the biomedical and health sciences. In this paper, we consider logistic regression models for multivariate binary responses, where the association between the responses is largely regarded as a nuisance characteristic of the data. In particular, we consider the estimator based on independence estimating equations (IEE), which assumes that the responses are independent. This estimator has been shown to be nearly efficient when compared with maximum likelihood (ML) and generalized estimating equations (GEE) in a variety of settings. The purpose of this paper is to highlight a circumstance where assuming independence can lead to quite substantial losses of efficiency. In particular, when the covariate design includes within-cluster covariates, assuming independence can lead to a considerable loss of efficiency in estimating the regression parameters associated with those covariates.
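A sketch of the comparison using statsmodels GEE on simulated clustered binary data with a within-cluster covariate (all parameters hypothetical): fitting under the independence working correlation and under an exchangeable structure shows the kind of standard-error difference at issue.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_clust, m = 150, 4
b = rng.normal(0, 1.0, n_clust).repeat(m)      # induces within-cluster corr.
x_within = rng.normal(size=n_clust * m)        # within-cluster covariate
groups = np.arange(n_clust).repeat(m)
p = 1 / (1 + np.exp(-(-0.3 + 0.8 * x_within + b)))
y = rng.binomial(1, p)
X = sm.add_constant(x_within)

for cs in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    fit = sm.GEE(y, X, groups=groups, family=sm.families.Binomial(),
                 cov_struct=cs).fit()
    # compare slope estimates and their (robust) standard errors
    print(type(cs).__name__, fit.params[1], fit.bse[1])
```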

Journal ArticleDOI
TL;DR: This paper evaluates information-theoretic approaches to selection of a parsimonious model, compares them to the use of likelihood ratio tests at four α levels, and finds that two information-theoretic criteria strike a balance between underfitting and overfitting when compared to models where the average minimum RSS was known.
Abstract: Analysis of capture-recapture data is critically dependent upon selection of a proper model for inference. Model selection is particularly important in the analysis of multiple, interrelated data sets. This paper evaluates information theoretic approaches to selection of a parsimonious model and compares them to the use of likelihood ratio tests using four α levels. The purpose of the evaluation is to compare model selection strategies based on the quality of the inference, rather than on the degree to which differing selection strategies select the "true model." A measure of squared bias and variance (termed RSS) is used as a basis for comparing different data-based selection strategies, assuming that a minimum RSS value is a reasonable target. In general, the information theoretic approaches consistently selected models with a smaller RSS than did the likelihood ratio testing approach. Two information theoretic criteria have a balance between underfitting and overfitting when compared to models where the average minimum RSS was known. Other findings are presented along with a discussion of the concept of a "true model" and dimension consistency in model selection.

Journal ArticleDOI
TL;DR: In this article, the classical F-test of the one-way ANOVA is extended to the case of unequal error variances, and an exact test for comparing variances of a number of populations is also developed.
Abstract: By taking a generalized approach to finding p values, the classical F-test of the one-way ANOVA is extended to the case of unequal error variances. The relationship of this result to other solutions in the literature is discussed. An exact test for comparing variances of a number of populations is also developed. Scheffé's procedure of multiple comparison is extended to the case of unequal variances. The possibility and the approach that one can take to extend the results to simple designs involving more than one factor are briefly discussed.
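The generalized p value test itself requires a purpose-built simulation, so as a plainly labelled stand-in here is Welch's heteroscedastic one-way ANOVA, a standard alternative for the same unequal-variances problem; it is not the paper's procedure. The demo data are hypothetical.

```python
import numpy as np
from scipy.stats import f

def welch_anova(*groups):
    """Welch's heteroscedastic one-way ANOVA (a standard alternative;
    the paper's generalized-p-value test is a different construction)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # precision weights
    mw = np.sum(w * m) / np.sum(w)              # weighted grand mean
    lam = np.sum((1 - w / w.sum()) ** 2 / (n - 1))
    stat = (np.sum(w * (m - mw) ** 2) / (k - 1)) \
        / (1 + 2 * (k - 2) / (k**2 - 1) * lam)
    df2 = (k**2 - 1) / (3 * lam)
    return stat, f.sf(stat, k - 1, df2)

rng = np.random.default_rng(13)
print(welch_anova(rng.normal(0, 1, 20), rng.normal(0.5, 3, 15),
                  rng.normal(0, 10, 25)))
```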

Journal ArticleDOI
TL;DR: This chapter discusses experimental design, statistical inference, analysis of variance and post-hoc analysis, and selection of statistical tests for research design problems.
Abstract: Introduction. Section 1, Experimental Design: 1. The anatomy of an experiment; 2. The anatomy of a scientific paper; 3. Evaluation of a scientific article; 4. Experimental design. Section 2, Statistical Inference, Analysis of Variance and Post-hoc Analysis: 5. Statistical inference; 6. Analysis of variance (ANOVA); 7. Post-hoc analysis. Section 3, Statistical Tests: 8. Parametric tests; 9. Special ANOVA designs and analyses; 10. Popular post-hoc multiple comparison tests; 11. Nonparametric tests; 12. Selection of statistical tests. Section 4, Research Design Problems and Their Critiques: sample topic; index of research design problems. Postscript. Appendix. References.

Journal ArticleDOI
TL;DR: In this paper, the relationship between concentration addition and independent action has been analyzed for the Weibull, logistic, and normal distribution functions and the response probability due to concentration addition exceeds that due to independent action and vice versa.
Abstract: The assessment of the combined effects of substances is usually based on one of two different concepts: concentration addition or independent action. Both concepts are founded on different pharmacological assumptions about sites and modes of actions of substances, but in toxicology and ecotoxicology such knowledge is rare for most chemicals. In order to validate experimental results and to allow for precautionary assessments, the quantitative relationships between concentration addition and independent action are therefore of interest. In this paper, we derive for the Weibull, the logistic, and the normal distribution functions the concentrations where the response probability due to concentration addition exceeds that due to independent action and vice versa. This is done (a) by analytically comparing both models for low and high mixture concentrations and (b) by numerically calculating the response probabilities when concentration addition and independent action agree. It is shown that the relationships between the models for joint action depend on the distribution functions, the corresponding slope parameters, and on the mixture concentrations administered.
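A sketch of the two mixture concepts for two substances with log-logistic concentration-response curves (all parameters hypothetical): independent action multiplies the probabilities of non-response, while concentration addition finds the effect level at which the toxic units sum to one.

```python
import numpy as np
from scipy.optimize import brentq

EC50 = np.array([1.0, 3.0])    # hypothetical substances
beta = np.array([2.0, 1.2])    # log-logistic slope parameters

def effect(c, i):
    """Response probability of substance i alone at concentration c."""
    return 1 / (1 + (EC50[i] / c) ** beta[i])

def ec(E, i):
    """Concentration of substance i alone giving effect E (inverse curve)."""
    return EC50[i] * (E / (1 - E)) ** (1 / beta[i])

def independent_action(conc):
    return 1 - np.prod([1 - effect(conc[i], i) for i in range(2)])

def concentration_addition(conc):
    # the effect E at which the toxic units sum to one
    g = lambda E: sum(conc[i] / ec(E, i) for i in range(2)) - 1
    return brentq(g, 1e-9, 1 - 1e-9)

mix = np.array([0.5, 1.5])
print("IA:", independent_action(mix), "CA:", concentration_addition(mix))
```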

Journal ArticleDOI
TL;DR: A flexible parametric procedure is given to model the hazard function as a linear combination of cubic B-splines and to obtain maximum likelihood estimates from censored survival data, which yields smooth estimates of the hazard and survivorship functions that are intermediate in structure between strongly parametric and non-parametric models.
Abstract: A flexible parametric procedure is given to model the hazard function as a linear combination of cubic B-splines and to obtain maximum likelihood estimates from censored survival data. The approach yields smooth estimates of the hazard and survivorship functions that are intermediate in structure between strongly parametric and non-parametric models. A simple method is described for selecting the number and location of knots. Simulation results show favorable root mean square error compared to non-parametric estimates for both the hazard and survivorship functions. Three methods are given to calculate confidence intervals, based on the delta method, profile likelihood, and bootstrap, respectively. The procedure is applied to estimate hazard rates for acquired immunodeficiency syndrome (AIDS) following infection with human immunodeficiency virus (HIV). Spline methods can accommodate complex censoring mechanisms such as those that arise in the AIDS setting. To illustrate, HIV infection incidence is estimated for a cohort of hemophiliacs in which the dates of HIV infection are interval-censored and some subjects were born after the onset of the HIV epidemic. This report develops flexible parametric methods based on splines to estimate the hazard function h(t) that describes the rate of disease onset among persons still susceptible at time t. The observed data derive from a random sample Y_i, i = 1, ..., n, of disease onset times with survivorship function S(t) = P{Y > t}. The Y_i are right-censored by independent random variables C_i drawn from H(c) = P{C > c}; one observes X_i = min(Y_i, C_i) and delta_i = I(Y_i > C_i), the indicator for the event that Y_i is censored.
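A sketch of the estimation idea: the hazard is a linear combination of cubic B-splines with positive coefficients, the cumulative hazard is obtained numerically, and the censored-data log-likelihood sum(delta_i * log h(x_i)) - sum(H(x_i)) is maximized. Knot selection, confidence intervals, and the interval-censoring extensions are omitted; the data, knot placement, and positivity parameterization are illustrative choices.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 300
y = 2.0 * rng.weibull(1.5, n)                # event times, rising hazard
c = rng.uniform(0, 4, n)
x, d = np.minimum(y, c), (y <= c).astype(float)

# cubic B-spline basis on [0, max(x)] with two interior knots at quantiles
interior = np.quantile(x, [0.33, 0.66])
knots = np.concatenate((np.zeros(4), interior, np.full(4, x.max())))
K = len(knots) - 4                           # number of basis functions

def basis(tv):
    eye = np.eye(K)
    return np.column_stack([BSpline(knots, eye[k], 3)(tv) for k in range(K)])

grid = np.linspace(0, x.max(), 400)          # grid for cumulative hazard
Bx, Bg = basis(x), basis(grid)

def negloglik(gamma):
    theta = np.exp(gamma)                    # keeps coefficients positive
    h_x, h_g = Bx @ theta, Bg @ theta        # hazard at data / grid points
    H_g = np.concatenate(([0.0], np.cumsum(
        (h_g[1:] + h_g[:-1]) / 2 * np.diff(grid))))   # trapezoid rule
    H_x = np.interp(x, grid, H_g)
    return -(np.sum(d * np.log(h_x)) - np.sum(H_x))

res = minimize(negloglik, np.zeros(K), method="L-BFGS-B")
print(Bg[::100] @ np.exp(res.x))             # fitted hazard along the grid
```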

Journal ArticleDOI
TL;DR: A mixture model consisting of a censored lognormal distribution and a point distribution located below the detection limit is proposed for antibody concentration values as determined by quantitative assays; interference from maternal antibodies, for example, could result in a high proportion of nonresponders to vaccine.
Abstract: Antibody concentration values as determined by quantitative assays often are left-censored due to detection limits or limits established for purpose of specificity. Standard analyses which assume the data arise from a single lognormal response distribution may not be appropriate, when more observations are censored than would be expected under such a model. Interference from maternal antibodies due to vaccination at an early age, for example, could result in a high proportion of nonresponders to vaccine. A mixture model consisting of a censored lognormal distribution and a point distribution located below the detection limit is proposed for such situations. Antibody data from a study of measles vaccine are used to illustrate the utility of this approach and the interpretation of the model parameters.
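A sketch of the likelihood for the proposed mixture: with probability p an observation is a nonresponder below the detection limit L, and otherwise it is lognormal, itself left-censored at L; observations under L contribute p + (1-p)*Phi((log L - mu)/sigma). Simulated data and parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def negloglik(par, y, L):
    """Mixture likelihood: nonresponder point mass below the detection
    limit L (probability p) plus a lognormal, left-censored at L."""
    mu, log_sig, logit_p = par
    sig, p = np.exp(log_sig), 1 / (1 + np.exp(-logit_p))
    below = y < L                            # only "< L" is observed here
    ll_below = np.log(p + (1 - p) * norm.cdf((np.log(L) - mu) / sig))
    yo = y[~below]                           # fully observed responders
    ll_above = (np.log1p(-p) + norm.logpdf((np.log(yo) - mu) / sig)
                - np.log(sig) - np.log(yo))
    return -(below.sum() * ll_below + ll_above.sum())

rng = np.random.default_rng(8)
n, L = 500, 0.5
nonresp = rng.uniform(size=n) < 0.25         # true nonresponder fraction
y = np.where(nonresp, 0.0, rng.lognormal(0.5, 0.8, n))

res = minimize(negloglik, np.zeros(3), args=(y, L))
mu, sig, p = res.x[0], np.exp(res.x[1]), 1 / (1 + np.exp(-res.x[2]))
print(mu, sig, p)                            # compare to 0.5, 0.8, ~0.25
```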

Journal ArticleDOI
TL;DR: It is shown how plots based on the residuals from a proportional hazards model may be used to reveal the correct functional form for covariates in the model.
Abstract: We show how plots based on the residuals from a proportional hazards model may be used to reveal the correct functional form for covariates in the model. A smoothed plot of the martingale residuals was suggested for this purpose by Therneau, Grambsch, and Fleming (1990, Biometrika 77, 147-160); however, its consistency required that the covariates be independent. They also noted that the plot could be biased for large covariate effects. We introduce two refinements which overcome these difficulties. The first is based on a ratio of scatter plot smooths, where the numerator is the smooth of the observed count plotted against the covariate, and the denominator is a smooth of the expected count. This is related to the Arjas goodness-of-fit plot (1988, Journal of the American Statistical Association 83, 204-212). The second technique smooths the martingale residuals divided by the expected count, using expected count as a weight. This latter approach is related to a GLM partial residual plot, as well as to the iterative methods of Hastie and Tibshirani (1990, Biometrics 46, 1005-1016) and Gentleman and Crowley (1991, Biometrics 47, 1283-1296). Applications to survival data sets are given.
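A sketch of the first refinement using lifelines and a lowess smoother: since the martingale residual is the observed count minus the expected count, the expected count is recoverable, and smoothing each against the covariate and taking the ratio approximates the functional form. Data are simulated with a deliberately quadratic effect; the smoother and bandwidth are placeholders for the paper's scatter-plot smooths.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(9)
n = 500
z = rng.uniform(-2, 2, n)
t = rng.exponential(1 / np.exp(0.5 * z**2))  # true log-hazard is quadratic
c = rng.uniform(0, 3, n)
df = pd.DataFrame({"T": np.minimum(t, c), "E": (t <= c).astype(int), "z": z})

cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")  # linear fit
mres = (cph.compute_residuals(df, kind="martingale")["martingale"]
        .reindex(df.index).to_numpy())

# martingale residual = observed count - expected count, so the expected
# count is recoverable; smooth each against z and take the ratio
obs = df["E"].to_numpy()
expected = obs - mres
zv = df["z"].to_numpy()
sm_obs = lowess(obs, zv, frac=0.5, return_sorted=False)
sm_exp = lowess(expected, zv, frac=0.5, return_sorted=False)
ratio = sm_obs / sm_exp                      # should trace the quadratic
print(np.c_[zv, ratio][np.argsort(zv)][::100])
```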

Journal ArticleDOI
TL;DR: The development and nature of palaeopathology; the nature of samples; the question of diagnosis; measures of disease frequency; comparing prevalences; analytical epidemiology; a guide to best practice; a question of occupation.
Abstract: The development and nature of palaeopathology; the nature of samples; the question of diagnosis; measures of disease frequency; comparing prevalences; analytical epidemiology; a guide to best practice; a question of occupation.

Journal ArticleDOI
TL;DR: A simulation study demonstrates the importance of accurate modeling of the spatial correlation structure in data with large numbers of spatially correlated observations such as those found in neuroimaging studies.
Abstract: This paper proposes a generalized estimating equations approach for the analysis of spatially correlated binary data when there are large numbers of spatially correlated observations on a moderate number of subjects. This approach is useful when the scientific focus is on modeling the marginal mean structure. Proper modeling of the spatial correlation structure is shown to provide large efficiency gains along with precise standard error estimates for inference on mean structure parameters. Generalized estimating equations for estimating the parameters of both the mean and spatial correlation structure are proposed. The use of semivariogram models for parameterizing the correlation structure is discussed, and estimation of the sample semivariogram is proposed as a technique for choosing parametric models and starting values for generalized estimating equations estimation. The methodology is illustrated with neuroimaging data collected as part of the National Institute of Neurological Disorders and Stroke (NINDS) Stroke Data Bank. A simulation study demonstrates the importance of accurate modeling of the spatial correlation structure in data with large numbers of spatially correlated observations such as those found in neuroimaging studies.
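A sketch of the classical empirical semivariogram estimator that the paper proposes for choosing parametric correlation models and for starting values; the coordinates and residuals here are simulated placeholders for model residuals from real spatial data.

```python
import numpy as np

def empirical_semivariogram(coords, resid, bins):
    """Classical estimator: gamma(h) = mean of (r_i - r_j)^2 / 2 over
    pairs whose separation distance falls in each bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (resid[:, None] - resid[None, :]) ** 2 / 2
    iu = np.triu_indices(len(resid), k=1)       # each pair counted once
    d, sq = d[iu], sq[iu]
    which = np.digitize(d, bins)
    return np.array([sq[which == b].mean() for b in range(1, len(bins))])

rng = np.random.default_rng(10)
coords = rng.uniform(0, 10, size=(400, 2))
resid = rng.normal(size=400)                    # stand-in for residuals
print(empirical_semivariogram(coords, resid, bins=np.linspace(0, 5, 11)))
```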

Journal ArticleDOI
TL;DR: In this article, the authors analyze errors-in-variables models in full generality under a Bayesian formulation, and compute the necessary posterior distributions, utilizing various computational techniques.
Abstract: SUMMARY Use of errors-in-variables models is appropriate in many practical experimental problems. However, inference based on such models is by no means straightforward. In previous analyses, simplifying assumptions have been made in order to ease this intractability, but assumptions of this nature are unfortunate and restrictive. In this paper, we analyse errors-in-variables models in full generality under a Bayesian formulation. In order to compute the necessary posterior distributions, we utilize various computational techniques. Two specific non-linear errors-in-variables regression examples are considered; the first is a re-analysed Berkson-type model, and the second is a classical errors-in-variables model. Our analyses are compared and contrasted with those presented elsewhere in the literature.
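A sketch of a fully Bayesian treatment of a linear classical errors-in-variables model with known measurement error variance, using a random-walk Metropolis sampler and flat priors; the latent true covariates are integrated out analytically, so the likelihood is bivariate normal. This is a toy version of the computational approach, not the paper's specific non-linear examples.

```python
import numpy as np

rng = np.random.default_rng(11)
n, tau = 200, 0.5                           # known measurement-error SD
x = rng.normal(2.0, 1.0, n)                 # latent true covariate
w = x + rng.normal(0, tau, n)               # error-prone observation of x
y = 1.0 + 0.8 * x + rng.normal(0, 0.3, n)

def logpost(par):
    """Flat-prior log posterior; x integrated out analytically, so the
    likelihood of each (w_i, y_i) pair is bivariate normal."""
    a, b, mx, lsx, lse = par
    sx2, se2 = np.exp(2 * lsx), np.exp(2 * lse)
    S = np.array([[sx2 + tau**2, b * sx2],
                  [b * sx2, b * b * sx2 + se2]])
    r = np.column_stack([w - mx, y - a - b * mx])
    Sinv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (n * logdet + np.einsum('ij,jk,ik->', r, Sinv, r))

par = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # (a, b, mu_x, log sx, log se)
samples, lp = [], logpost(par)
for it in range(20000):                      # short chain, illustration only
    prop = par + rng.normal(0, 0.05, 5)      # random-walk Metropolis
    lp_prop = logpost(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        par, lp = prop, lp_prop
    if it >= 10000:
        samples.append(par.copy())
print(np.mean(samples, axis=0)[:2])          # posterior means of (a, b)
```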

Journal ArticleDOI
TL;DR: In this paper, an analysis of covariance model where the covariate effect is assumed only to be smooth is considered, and tests of equality and of parallelism across groups are constructed.
Abstract: An analysis of covariance model where the covariate effect is assumed only to be smooth is considered. The possibility of different shapes of covariate effect in different groups is also allowed, and tests of equality and of parallelism across groups are constructed. These are implemented using Gasser-Müller smoothing, whose properties enable problems of bias to be avoided. Accurate moment-based approximations are available for the distribution of each test statistic. Some data on Spanish onions are used to contrast the non-parametric approach with that of a nonlinear, but parametric, model. A simulation study is also used to explore the properties of the non-parametric tests.
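A sketch of testing equality of smooth covariate effects across groups, with lowess standing in for the Gasser-Müller smoother and a permutation test standing in for the paper's moment-based approximations: the statistic is the drop in residual sum of squares when each group receives its own curve. Data and bandwidth are hypothetical.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def rss_split(x, y, g):
    """Gain in fit from separate smooth curves per group versus one
    common curve (lowess stands in for Gasser-Muller smoothing)."""
    common = lowess(y, x, frac=0.5, return_sorted=False)
    rss_c = np.sum((y - common) ** 2)
    rss_g = sum(np.sum((y[g == k]
                        - lowess(y[g == k], x[g == k], frac=0.5,
                                 return_sorted=False)) ** 2)
                for k in np.unique(g))
    return rss_c - rss_g

rng = np.random.default_rng(12)
n = 120
x = rng.uniform(0, 1, n)
g = rng.integers(0, 2, n)
y = np.sin(2 * np.pi * x) + 0.4 * g + rng.normal(0, 0.3, n)  # shifted groups

obs = rss_split(x, y, g)
perm = [rss_split(x, y, rng.permutation(g)) for _ in range(200)]
print("permutation p-value:", np.mean([p >= obs for p in perm]))
```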