
Showing papers in "Biometrics in 2000"


Journal ArticleDOI
TL;DR: In this paper, a rank-based data augmentation technique is proposed for estimating the number of missing studies that might exist in a meta-analysis and the effect that these studies might have had on its outcome.
Abstract: We study recently developed nonparametric methods for estimating the number of missing studies that might exist in a meta-analysis and the effect that these studies might have had on its outcome. These are simple rank-based data augmentation techniques, which formalize the use of funnel plots. We show that they provide effective and relatively powerful tests for evaluating the existence of such publication bias. After adjusting for missing studies, we find that the point estimate of the overall effect size is approximately correct and coverage of the effect size confidence intervals is substantially improved, in many cases recovering the nominal confidence levels entirely. We illustrate the trim and fill method on existing meta-analyses of studies in clinical trials and psychometrics.

9,163 citations


Journal ArticleDOI
TL;DR: This brief note presents a general proof that the estimator is unbiased for cluster-correlated data regardless of the setting.
Abstract: There is a simple robust variance estimator for cluster-correlated data. While this estimator is well known, it is poorly documented, and its wide range of applicability is often not understood. The estimator is widely used in sample survey research, but the results in the sample survey literature are not easily applied because of complications due to unequal probability sampling. This brief note presents a general proof that the estimator is unbiased for cluster-correlated data regardless of the setting. The result is not new, but a simple and general reference is not readily available. The use of the method will benefit from a general explanation of its wide applicability.
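The estimator in question is the familiar sandwich form, with score vectors summed within clusters before the outer products are taken. Below is a minimal sketch for ordinary least squares; the linear-model setting, the simulated data, and all variable names are illustrative assumptions, while the note's point is that the same construction is unbiased for cluster-correlated data in far more general settings.

```python
# A minimal sketch of the cluster-robust "sandwich" variance for OLS.
# The linear-model setting and all variable names are illustrative
# assumptions, not taken from the paper.
import numpy as np

def cluster_robust_cov(X, y, cluster):
    """OLS coefficients with a cluster-correlated sandwich covariance estimate."""
    X, y, cluster = np.asarray(X, float), np.asarray(y, float), np.asarray(cluster)
    bread = np.linalg.inv(X.T @ X)                 # (X'X)^{-1}
    beta = bread @ X.T @ y                         # OLS fit
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):                   # sum score vectors within each cluster
        score_g = X[cluster == g].T @ resid[cluster == g]
        meat += np.outer(score_g, score_g)
    return beta, bread @ meat @ bread              # sandwich estimator

# Simulated clustered data: 30 clusters of size 5 sharing a random effect.
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(30), 5)
X = np.column_stack([np.ones(150), rng.normal(size=150)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=30)[cluster] + rng.normal(size=150)
beta, cov = cluster_robust_cov(X, y, cluster)
print(beta, np.sqrt(np.diag(cov)))                 # coefficients and robust standard errors
```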

2,506 citations


Journal ArticleDOI
TL;DR: This work proposes summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, denoted ROC(t), and presents an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer.
Abstract: ROC curves are a popular method for displaying sensitivity and specificity of a continuous diagnostic marker, X, for a binary disease variable, D. However, many disease outcomes are time dependent, D(t), and ROC curves that vary as a function of time may be more appropriate. A common example of a time-dependent variable is vital status, where D(t) = 1 if a patient has died prior to time t and zero otherwise. We propose summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, which we denote as ROC(t). A typical complexity with survival data is that observations may be censored. Two ROC curve estimators are proposed that can accommodate censored data. A simple estimator is based on using the Kaplan-Meier estimator for each possible subset X > c. However, this estimator does not guarantee the necessary condition that sensitivity and specificity are monotone in X. An alternative estimator that does guarantee monotonicity is based on a nearest neighbor estimator for the bivariate distribution function of (X, T), where T represents survival time (Akritas, M. J., 1994, Annals of Statistics 22, 1299-1327). We present an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer and an example where the ROC(t) curve displays the impact of modifying eligibility criteria for sample size and power in HIV prevention trials.
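A minimal sketch of the simple Kaplan-Meier-based estimator follows, written from my reading of the Bayes-theorem identity P(X > c | T <= t) = [1 - S(t | X > c)] P(X > c) / [1 - S(t)]; variable names and data handling are illustrative assumptions. As the abstract notes, this simple estimator need not produce sensitivity and specificity that are monotone in X.

```python
# A sketch of the "simple" Kaplan-Meier-based ROC(t) estimator, using
#   P(X > c | T <= t) = [1 - S(t | X > c)] * P(X > c) / [1 - S(t)].
# Variable names and data handling are illustrative assumptions.
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from right-censored data."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    s = 1.0
    for u in np.sort(np.unique(time[event == 1])):
        if u > t:
            break
        s *= 1.0 - np.sum((time == u) & (event == 1)) / np.sum(time >= u)
    return s

def roc_t(marker, time, event, c, t):
    """Sensitivity and specificity of the rule 'marker > c' for incidence by time t."""
    marker, time, event = np.asarray(marker), np.asarray(time), np.asarray(event)
    high = marker > c
    s_all = km_survival(time, event, t)
    s_high = km_survival(time[high], event[high], t)
    s_low = km_survival(time[~high], event[~high], t)
    p_high = np.mean(high)
    sens = (1.0 - s_high) * p_high / (1.0 - s_all)     # P(X > c | T <= t)
    spec = s_low * (1.0 - p_high) / s_all              # P(X <= c | T > t)
    return sens, spec
```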

2,177 citations


Journal ArticleDOI
TL;DR: In this article, the authors adapt Lambert's methodology to an upper bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model, and add to the flexibility of these fixed effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated.
Abstract: Summary. In a 1992 Technometrics paper, Lambert (1992, 34, 1–14) described zero-inflated Poisson (ZIP) regression, a class of models for count data with excess zeros. In a ZIP model, a count response variable is assumed to be distributed as a mixture of a Poisson(λ) distribution and a distribution with point mass of one at zero, with mixing probability p. Both p and λ are allowed to depend on covariates through canonical link generalized linear models. In this paper, we adapt Lambert's methodology to an upper bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model. In addition, we add to the flexibility of these fixed effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated. We motivate, develop, and illustrate the methods described here with an example from horticulture, where both upper bounded count (binomial-type) and unbounded count (Poisson-type) data with excess zeros were collected in a repeated measures designed experiment.
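As a rough illustration of the fixed-effects part of such a model, the sketch below writes a zero-inflated binomial (ZIB) log-likelihood with logit links for both the mixing probability and the binomial success probability and maximizes it numerically; the simulated data and variable names are assumptions, and the random-effects extension described in the abstract is omitted.

```python
# A sketch of a fixed-effects zero-inflated binomial (ZIB) fit by maximum
# likelihood.  Simulated data; no random effects.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def zib_negloglik(theta, y, m, Xp, Xpi):
    kp = Xp.shape[1]
    p = expit(Xp @ theta[:kp])            # P(structural zero)
    pi = expit(Xpi @ theta[kp:])          # binomial success probability
    logbin = (gammaln(m + 1) - gammaln(y + 1) - gammaln(m - y + 1)
              + y * np.log(pi) + (m - y) * np.log1p(-pi))
    ll = np.where(y == 0,
                  np.log(p + (1 - p) * (1 - pi) ** m),   # zero from either component
                  np.log1p(-p) + logbin)                  # positive count
    return -np.sum(ll)

# Simulate and fit by quasi-Newton optimization.
rng = np.random.default_rng(1)
n, m = 300, 10
x = rng.normal(size=n)
Xp = Xpi = np.column_stack([np.ones(n), x])
p_true, pi_true = expit(-1 + 0.5 * x), expit(0.2 + 0.8 * x)
zero = rng.random(n) < p_true
y = np.where(zero, 0, rng.binomial(m, pi_true))
fit = minimize(zib_negloglik, np.zeros(4), args=(y, m, Xp, Xpi), method="BFGS")
print(fit.x)                              # estimated logit coefficients for p and pi
```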

829 citations


Journal ArticleDOI
TL;DR: Evidence is presented that bactrim therapy improves survival but that the standard intent‐to‐treat comparison failed to detect this survival advantage because a large fraction of the subjects either crossed over to the other therapy or stopped therapy altogether.
Abstract: AIDS Clinical Trial Group (ACTG) randomized trial 021 compared the effect of bactrim versus aerosolized pentamidine (AP) as prophylaxis therapy for pneumocystis pneumonia (PCP) in AIDS patients. Although patients randomized to the bactrim arm experienced a significant delay in time to PCP, the survival experience in the two arms was not significantly different (p = .32). In this paper, we present evidence that bactrim therapy improves survival but that the standard intent-to-treat comparison failed to detect this survival advantage because a large fraction of the subjects either crossed over to the other therapy or stopped therapy altogether. We obtain our evidence of a beneficial bactrim effect on survival by artificially regarding the subjects as dependently censored at the first time the subject either stops or switches therapy; we then analyze the data with the inverse probability of censoring weighted Kaplan-Meier and Cox partial likelihood estimators of Robins (1993, Proceedings of the Biopharmaceutical Section, American Statistical Association, pp. 24-33) that adjust for dependent censoring by utilizing data collected on time-dependent prognostic factors.
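The sketch below shows only the basic inverse-probability-of-censoring weighting mechanics, with weights taken from a marginal Kaplan-Meier fit to the censoring times; the paper's estimator instead models censoring with time-dependent prognostic factors, which is what permits adjustment for dependent censoring, so this is a simplified stand-in with assumed variable names.

```python
# A simplified sketch of an IPCW estimate of the survival function.
# Weights here depend on time only; the paper uses covariate-dependent
# censoring models.  Names and inputs are illustrative.
import numpy as np

def km(time, event, t):
    """Kaplan-Meier S(t)."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    s = 1.0
    for u in np.sort(np.unique(time[event == 1])):
        if u > t:
            break
        s *= 1.0 - np.sum((time == u) & (event == 1)) / np.sum(time >= u)
    return s

def ipcw_survival(time, event, t):
    """S(t) = 1 - (1/n) * sum_i I(T_i <= t, death_i) / K(T_i),
    with K the Kaplan-Meier estimate of the censoring distribution."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    cens = 1 - event                       # treat censoring as the 'event' for K
    contrib = 0.0
    for ti, di in zip(time, event):
        if di == 1 and ti <= t:
            # strictly K(T_i-) (left limit); K(T_i) is used here for brevity
            contrib += 1.0 / km(time, cens, ti)
    return 1.0 - contrib / len(time)
```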

679 citations


Journal ArticleDOI
TL;DR: Maximum likelihood techniques for the joint estimation of the incidence and latency regression parameters in this model are developed using the nonparametric form of the likelihood and an EM algorithm and are applied to a data set of tonsil cancer patients treated with radiation therapy.
Abstract: Summary. Some failure time data come from a population that consists of some subjects who are susceptible to and others who are nonsusceptible to the event of interest. The data typically have heavy censoring at the end of the follow-up period, and a standard survival analysis would not always be appropriate. In such situations where there is good scientific or empirical evidence of a nonsusceptible population, the mixture or cure model can be used (Farewell, 1982, Biometrics 38, 1041–1046). It assumes a binary distribution to model the incidence probability and a parametric failure time distribution to model the latency. Kuk and Chen (1992, Biometrika 79, 531–541) extended the model by using Cox's proportional hazards regression for the latency. We develop maximum likelihood techniques for the joint estimation of the incidence and latency regression parameters in this model using the nonparametric form of the likelihood and an EM algorithm. A zero-tail constraint is used to reduce the near nonidentifiability of the problem. The inverse of the observed information matrix is used to compute the standard errors. A simulation study shows that the methods are competitive to the parametric methods under ideal conditions and are generally better when censoring from loss to follow-up is heavy. The methods are applied to a data set of tonsil cancer patients treated with radiation therapy.

549 citations


Journal ArticleDOI
TL;DR: Finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood.
Abstract: Summary. Agresti (1994, Biometrics 50, 494–500) and Norris and Pollock (1996a, Biometrics 52, 639–649) suggested using methods of finite mixtures to partition the animals in a closed capture-recapture experiment into two or more groups with relatively homogeneous capture probabilities. This enabled them to fit the models Mh, Mbh (Norris and Pollock), and Mth (Agresti) of Otis et al. (1978, Wildlife Monographs 62, 1–135). In this article, finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood. Likelihood ratio tests are available for model comparisons. For many data sets, a simple dichotomy of animals is enough to substantially correct for heterogeneity-induced bias in the estimation of population size, although there is the option of fitting more than two groups if the data warrant it.

516 citations


Journal ArticleDOI
TL;DR: A simulation study shows the method's accuracy and safety are comparable with the CRM's while requiring a much shorter trial duration: a trial that would take up to 12 years to complete by the CRM could be reduced to 2–4 years by the TITE‐CRM.
Abstract: Traditional designs for phase I clinical trials require each patient (or small group of patients) to be completely followed before the next patient or group is assigned. In situations such as when evaluating late-onset effects of radiation or toxicities from chemopreventive agents, this may result in trials of impractically long duration. We propose a new method, called the time-to-event continual reassessment method (TITE-CRM), that allows patients to be entered in a staggered fashion. It is an extension of the continual reassessment method (CRM; O'Quigley, Pepe, and Fisher, 1990, Biometrics 46, 33-48). We also note that this time-to-toxicity approach can be applied to extend other designs for studies of short-term toxicities. We prove that the recommended dose given by the TITE-CRM converges to the correct level under certain conditions. A simulation study shows our method's accuracy and safety are comparable with CRM's while the former takes a much shorter trial duration: a trial that would take up to 12 years to complete by the CRM could be reduced to 2-4 years by our method.
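A minimal sketch of the TITE-CRM updating step follows, using a one-parameter power model, a normal prior, and a weighted binomial likelihood in which each patient still under follow-up contributes a weight equal to the fraction of the observation window completed. The skeleton probabilities, prior variance, and target toxicity rate below are illustrative assumptions, not values from the paper.

```python
# Sketch of a TITE-CRM dose recommendation: power model p_d(a) = skeleton_d ** exp(a),
# weighted likelihood prod (w*p)^y (1 - w*p)^(1-y), posterior computed on a grid.
import numpy as np

skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])   # assumed prior toxicity skeleton
target = 0.25                                           # assumed target toxicity rate

def recommend_dose(dose_idx, tox, weight):
    """Return the dose whose posterior-mean toxicity probability is closest to target."""
    grid = np.linspace(-4, 4, 801)
    curves = skeleton[None, :] ** np.exp(grid)[:, None]          # p_d(a) over the grid
    p_i = curves[:, dose_idx]                                    # each patient's p at own dose
    lik = np.prod(np.where(tox == 1, weight * p_i, 1 - weight * p_i), axis=1)
    prior = np.exp(-0.5 * grid**2 / 1.34)                        # N(0, 1.34), unnormalized
    post = prior * lik
    post /= np.trapz(post, grid)
    p_mean = np.trapz(curves * post[:, None], grid, axis=0)      # posterior mean curve
    return int(np.argmin(np.abs(p_mean - target))), p_mean

# Example: 4 patients; the last one is still under follow-up (weight < 1).
dose_idx = np.array([0, 1, 1, 2])
tox = np.array([0, 0, 1, 0])
weight = np.array([1.0, 1.0, 1.0, 0.4])   # fraction of the observation window completed
print(recommend_dose(dose_idx, tox, weight))
```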

464 citations


Journal ArticleDOI
TL;DR: This paper examines in detail the interpretation of both fixed effects and random effects parameters in logistic regression with random effects, discusses different alternative measures of heterogeneity, and suggests using a median odds ratio measure that is a function of the original random effects parameters.
Abstract: Logistic regression with random effects is used to study the relationship between explanatory variables and a binary outcome in cases with nonindependent outcomes. In this paper, we examine in detail the interpretation of both fixed effects and random effects parameters. As heterogeneity measures, the random effects parameters included in the model are not easily interpreted. We discuss different alternative measures of heterogeneity and suggest using a median odds ratio measure that is a function of the original random effects parameters. The measure allows a simple interpretation, in terms of well-known odds ratios, that greatly facilitates communication between the data analyst and the subject-matter researcher. Three examples from different subject areas, mainly taken from our own experience, serve to motivate and illustrate different aspects of parameter interpretation in these models.
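For a model with a normally distributed cluster-level random intercept of variance sigma^2 on the log-odds scale, the median odds ratio is commonly written in the closed form MOR = exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)). The tiny helper below computes it; the example variance is made up.

```python
# Median odds ratio for a random-intercept logistic model:
#   MOR = exp( sqrt(2 * sigma^2) * Phi^{-1}(0.75) ),
# the median of the odds ratio between two randomly chosen clusters.
from math import exp, sqrt
from scipy.stats import norm

def median_odds_ratio(sigma2):
    """sigma2: variance of the cluster-level random intercept on the log-odds scale."""
    return exp(sqrt(2.0 * sigma2) * norm.ppf(0.75))

print(median_odds_ratio(0.5))   # e.g. sigma^2 = 0.5 gives MOR of about 1.96
```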

417 citations


Journal ArticleDOI
TL;DR: A general nonparametric mixture model that extends models and improves estimation methods proposed by other researchers and extends Cox's proportional hazards regression model by allowing a proportion of event-free patients and investigating covariate effects on that proportion.
Abstract: Nonparametric methods have attracted less attention than their parametric counterparts for cure rate analysis. In this paper, we study a general nonparametric mixture model. The proportional hazards assumption is employed in modeling the effect of covariates on the failure time of patients who are not cured. The EM algorithm, the marginal likelihood approach, and multiple imputations are employed to estimate parameters of interest in the model. This model extends models and improves estimation methods proposed by other researchers. It also extends Cox's proportional hazards regression model by allowing a proportion of event-free patients and investigating covariate effects on that proportion. The model and its estimation method are investigated by simulations. An application to breast cancer data, including comparisons with previous analyses using a parametric model and an existing nonparametric model by other researchers, confirms the conclusions from the parametric model but not those from the existing nonparametric model.

404 citations


Journal ArticleDOI
TL;DR: This work proposes a model based on multiplicative frailties with a multivariate log-normal joint distribution, a generalization of the one presented by McGilchrist (1993, Biometrics 49, 221-225), with estimation based on Laplace approximation of the likelihood function.
Abstract: Summary. There exists a growing literature on the estimation of gamma distributed multiplicative shared frailty models. There is, however, often a need to model more complicated frailty structures, but attempts to extend gamma frailties run into complications. Motivated by hip replacement data with a more complicated dependence structure, we propose a model based on multiplicative frailties with a multivariate log-normal joint distribution. We give a justification and an estimation procedure for this generally structured frailty model, which is a generalization of the one presented by McGilchrist (1993, Biometrics 49, 221-225). The estimation is based on Laplace approximation of the likelihood function. This leads to estimating equations based on a penalized fixed effects partial likelihood, where the marginal distribution of the frailty terms determines the penalty term. The tuning parameters of the penalty function, i.e., the frailty variances, are estimated by maximizing an approximate profile likelihood. The performance of the approximation is evaluated by simulation, and the frailty model is fitted to the hip replacement data.

Journal ArticleDOI
TL;DR: The proposed statistic is a score statistic derived from a marginal regression model that bears some relation to McNemar's statistic and can be considered an analog of McNemar's test for the problem of comparing predictive values, parameters that condition on test outcome.
Abstract: Positive and negative predictive values of a diagnostic test are key clinically relevant measures of test accuracy. Surprisingly, statistical methods for comparing tests with regard to these parameters have not been available for the most common study design in which each test is applied to each study individual. In this paper, we propose a statistic for comparing the predictive values of two diagnostic tests using this paired study design. The proposed statistic is a score statistic derived from a marginal regression model and bears some relation to McNemar's statistic. As McNemar's statistic can be used to compare sensitivities and specificities of diagnostic tests, parameters that condition on disease status, our statistic can be considered as an analog of McNemar's test for the problem of comparing predictive values, parameters that condition on test outcome. We report on the results of a simulation study designed to examine the properties of this test under a variety of conditions. The method is illustrated with data from a study of methods for diagnosis of coronary artery disease.

Journal ArticleDOI
TL;DR: This work proposes a nonparametric Bayesian approach for the detection of clusters of elevated (or lowered) risk based on Green's (1995, Biometrika 82, 711-732) reversible jump MCMC methodology.
Abstract: Summary. An interesting epidemiological problem is the analysis of geographical variation in rates of disease incidence or mortality. One goal of such an analysis is to detect clusters of elevated (or lowered) risk in order to identify unknown risk factors regarding the disease. We propose a nonparametric Bayesian approach for the detection of such clusters based on Green's (1995, Biometrika 82, 711–732) reversible jump MCMC methodology. The prior model assumes that geographical regions can be combined in clusters with constant relative risk within a cluster. The number of clusters, the location of the clusters, and the risk within each cluster are unknown. This specification can be seen as a change-point problem of variable dimension in irregular, discrete space. We illustrate our method through an analysis of oral cavity cancer mortality rates in Germany and compare the results with those obtained by the commonly used Bayesian disease mapping method of Besag, York, and Mollie (1991, Annals of the Institute of Statistical Mathematics, 43, 1–59).

Journal ArticleDOI
TL;DR: A revision of the penalty term in BIC is proposed so that it is defined in terms of the number of uncensored events instead of the number of observations, which corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
Abstract: We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995, Journal of the American Statistical Association 90, 928-934) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For a simple censored data model, this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
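The proposed change amounts to replacing log(n) in the BIC penalty with log(d), where d is the number of uncensored events, and applying the criterion to the maximized Cox partial likelihood. The helper below sketches this; the fitted values in the example are hypothetical.

```python
# Revised BIC for censored survival models:
#   BIC = -2 * log(maximized partial likelihood) + n_params * log(number of events).
from math import log

def bic_events(log_partial_lik, n_params, n_events):
    """BIC with the penalty based on the number of uncensored events."""
    return -2.0 * log_partial_lik + n_params * log(n_events)

# Comparing two candidate Cox models (hypothetical fitted values):
print(bic_events(-480.2, 3, 150))   # model with 3 covariates
print(bic_events(-478.9, 5, 150))   # model with 5 covariates
```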

Journal Article
TL;DR: Because the studies available for review are biased in favour of those with positive outcomes, a sensitivity analysis based on fitting a model to the funnel plot is suggested.
Abstract: Publication bias is a major problem, perhaps the major problem, in meta-analysis (or systematic reviews). Small studies are more likely to be published if their results are 'significant' than if their results are negative or inconclusive, and so the studies available for review are biased in favour of those with positive outcomes. Correcting for this bias is not possible without making untestable assumptions. In this paper, a sensitivity analysis is suggested which is based on fitting a model to the funnel plot. Some examples are discussed.

Journal ArticleDOI
TL;DR: Simulations and a psychiatric example are presented to demonstrate the effective use of Markov chain Monte Carlo procedures for assessing convergence, diagnosing models, and selecting the number of categories for the latent variable based on evidence in the data.
Abstract: In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the intention of hierarchical modeling. Problems arise when it is not clear how many disease classes are appropriate, creating a need for model selection and diagnostic techniques. Previous work has shown that the Pearson chi-square statistic and the log-likelihood ratio G2 statistic are not valid test statistics for evaluating latent class models. Other methods, such as information criteria, provide decision rules without providing explicit information about where discrepancies occur between a model and the data. Identifiability issues further complicate these problems. This paper develops procedures for assessing Markov chain Monte Carlo convergence and model diagnosis and for selecting the number of categories for the latent variable based on evidence in the data using Markov chain Monte Carlo techniques. Simulations and a psychiatric example are presented to demonstrate the effective use of these methods.

Journal ArticleDOI
TL;DR: Nonparametric statistics for comparing two mean frequency functions and for combining data on recurrent events and death, together with consistent variance estimators, are developed and an application to a cancer clinical trial is provided.
Abstract: This article is concerned with the analysis of recurrent events in the presence of a terminal event such as death. We consider the mean frequency function, defined as the marginal mean of the cumulative number of recurrent events over time. A simple nonparametric estimator for this quantity is presented. It is shown that the estimator, properly normalized, converges weakly to a zero-mean Gaussian process with an easily estimable covariance function. Nonparametric statistics for comparing two mean frequency functions and for combining data on recurrent events and death are also developed. The asymptotic null distributions of these statistics, together with consistent variance estimators, are derived. The small-sample properties of the proposed estimators and test statistics are examined through simulation studies. An application to a cancer clinical trial is provided.

Journal ArticleDOI
TL;DR: It is shown that inference can be achieved with binary regression techniques applied to indicator variables constructed from pairs of test results, one component of the pair being from a diseased subject and the other from a nondiseased subject.
Abstract: The accuracy of a medical diagnostic test is often summarized in a receiver operating characteristic (ROC) curve. This paper puts forth an interpretation for each point on the ROC curve as being a conditional probability of a test result from a random diseased subject exceeding that from a random nondiseased subject. This interpretation gives rise to new methods for making inference about ROC curves. It is shown that inference can be achieved with binary regression techniques applied to indicator variables constructed from pairs of test results, one component of the pair being from a diseased subject and the other from a nondiseased subject. Within the generalized linear model (GLM) binary regression framework, ROC curves can be estimated, and we highlight a new semiparametric estimator. Covariate effects can also be evaluated with the GLM models. The methodology is applied to a pancreatic cancer dataset where we use the regression framework to compare two different serum biomarkers. Asymptotic distribution theory is developed to facilitate inference and to provide insight into factors influencing variability of estimated model parameters.
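In its simplest form, the pair-indicator construction recovers the familiar Mann-Whitney estimate of the area under the ROC curve: averaging I(X_diseased >= X_nondiseased) over all diseased/nondiseased pairs estimates the probability that a diseased subject's marker exceeds a nondiseased subject's. The sketch below shows only this special case on simulated markers; the paper's GLM framework goes further by regressing such indicators on covariates and estimating the whole curve.

```python
# Pair indicators and the Mann-Whitney (empirical AUC) estimate.
# Simulated data, not from the pancreatic cancer study in the paper.
import numpy as np

rng = np.random.default_rng(2)
x_dis = rng.normal(1.0, 1.0, size=60)      # marker in diseased subjects
x_non = rng.normal(0.0, 1.0, size=80)      # marker in nondiseased subjects

# Indicator for every (diseased, nondiseased) pair, then its mean.
pairs = (x_dis[:, None] >= x_non[None, :]).astype(float)
auc_hat = pairs.mean()
print(auc_hat)                              # empirical AUC; about 0.76 is expected here
```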

Journal ArticleDOI
TL;DR: It is shown that the likely amount of relative or absolute predictive accuracy is often low even if there are highly significant and relatively strong prognostic factors.
Abstract: Summary. We suggest a new measure of the proportion of the variation of possibly censored survival times explained by a given proportional hazards model. The proposed measure, termed V, shares several favorable properties with an earlier V1 but also improves the handling of censoring. The statistic contrasts distance measures between individual 1/0 survival processes and fitted survival curves with and without covariate information. These distance measures, Dx and D, respectively, are themselves informative as summaries of absolute rather than relative predictive accuracy. We recommend graphical comparisons of survival curves for prognostic index groups to improve the understanding of obtained values for V, Dx, and D. Their use and interpretation is exemplified for a Yorkshire lung cancer study on survival. From this and an overview for several well-known clinical data sets, we show that the likely amount of relative or absolute predictive accuracy is often low even if there are highly significant and relatively strong prognostic factors.

Journal ArticleDOI
TL;DR: In observational studies that match several controls to each treated subject, substantially greater bias reduction is possible if the number of controls is not fixed but rather is allowed to vary from one matched set to another.
Abstract: In observational studies that match several controls to each treated subject, substantially greater bias reduction is possible if the number of controls is not fixed but rather is allowed to vary from one matched set to another. In certain cases, matching with a fixed number of controls may remove only 50% of the bias in a covariate, whereas matching with a variable number of controls may remove 90% of the bias, even though both control groups have the same number of controls in total. An example of matching in a study of surgical mortality is discussed in detail.

Journal ArticleDOI
TL;DR: In this paper, the authors review the major developments in wildlife population assessment in the past century, including mark-recapture, distance sampling, and harvest models, and speculate on how these fields will develop in the next century.
Abstract: We review the major developments in wildlife population assessment in the past century. Three major areas are considered: mark-recapture, distance sampling, and harvest models. We speculate on how these fields will develop in the next century. Topics for which we expect to see methodological advances include integration of modeling with Geographic Information Systems, automated survey design algorithms, advances in model-based inference from sample survey data, a common inferential framework for wildlife population assessment methods, improved methods for estimating population trends, the embedding of biological process models into inference, substantially improved models for conservation management, advanced spatiotemporal models of ecosystems, and greater emphasis on incorporating model selection uncertainty into inference. We discuss the kind of developments that might be anticipated in these topics.

Journal ArticleDOI
TL;DR: It is argued that regression models with random coefficients offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in epidemiology.
Abstract: Regression models with random coefficients arise naturally in both frequentist and Bayesian approaches to estimation problems. They are becoming widely available in standard computer packages under the headings of generalized linear mixed models, hierarchical models, and multilevel models. I here argue that such models offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in epidemiology. The argument invokes an antiparsimony principle attributed to L. J. Savage, which is that models should be rich enough to reflect the complexity of the relations under study. It also invokes the countervailing principle that you cannot estimate anything if you try to estimate everything (often used to justify parsimony). Regression with random coefficients offers a rational compromise between these principles as well as an alternative to analyses based on standard variable-selection algorithms and their attendant distortion of uncertainty assessments. These points are illustrated with an analysis of data on diet, nutrition, and breast cancer.

Journal ArticleDOI
Wei Pan
TL;DR: Through simulation, it is demonstrated that the resulting estimate of the regression coefficient and its associated standard error provide a promising alternative to the nonparametric maximum likelihood estimate.
Abstract: Summary. We propose a general semiparametric method based on multiple imputation for Cox regression with interval-censored data. The method consists of iterating the following two steps. First, from finite-interval-censored (but not right-censored) data, exact failure times are imputed using Tanner and Wei's poor man's or asymptotic normal data augmentation scheme based on the current estimates of the regression coefficient and the baseline survival curve. Second, a standard statistical procedure for right-censored data, such as the Cox partial likelihood method, is applied to imputed data to update the estimates. Through simulation, we demonstrate that the resulting estimate of the regression coefficient and its associated standard error provide a promising alternative to the nonparametric maximum likelihood estimate. Our proposal is easily implemented by taking advantage of existing computer programs for right-censored data.
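A heavily simplified sketch of the impute-refit-combine mechanics is shown below. It draws failure times uniformly within each finite censoring interval, which is only a crude stand-in for the poor man's and asymptotic normal data augmentation schemes that the paper iterates with current parameter estimates; the use of lifelines, the single covariate, and all names are assumptions.

```python
# Simplified multiple-imputation sketch for interval-censored Cox regression:
# impute exact times within finite intervals, refit a standard Cox model,
# combine across imputations with Rubin's rules.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def mi_cox_interval(left, right, x, n_imp=20, rng=None):
    """left/right: interval bounds (right = inf for right-censored rows)."""
    if rng is None:
        rng = np.random.default_rng()
    left, right, x = np.asarray(left, float), np.asarray(right, float), np.asarray(x, float)
    finite = np.isfinite(right)
    betas, variances = [], []
    for _ in range(n_imp):
        high = np.where(finite, right, left + 1.0)          # dummy bound for censored rows
        t = np.where(finite, rng.uniform(left, high), left)  # crude uniform imputation
        df = pd.DataFrame({"T": t, "E": finite.astype(int), "x": x})
        fit = CoxPHFitter().fit(df, duration_col="T", event_col="E")
        betas.append(fit.params_["x"])
        variances.append(fit.standard_errors_["x"] ** 2)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    beta = np.mean(betas)
    total_var = np.mean(variances) + (1 + 1 / n_imp) * np.var(betas, ddof=1)
    return beta, np.sqrt(total_var)

# Hypothetical example: 100 subjects, about half right-censored.
rng = np.random.default_rng(4)
left = rng.uniform(0, 5, 100)
right = np.where(rng.random(100) < 0.5, left + rng.uniform(0.5, 2, 100), np.inf)
print(mi_cox_interval(left, right, rng.normal(size=100), rng=rng))
```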

Journal ArticleDOI
TL;DR: A permutation test for comparing receiver operating characteristic curves based on continuous unpaired data is developed and studied through simulations, extending an earlier test for continuous paired data.
Abstract: We developed a permutation test in our earlier paper (Venkatraman and Begg, 1996, Biometrika 83, 835-848) to test the equality of receiver operating characteristic curves based on continuous paired data. Here we extend the underlying concepts to develop a permutation test for continuous unpaired data, and we study its properties through simulations.
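As a rough illustration of the unpaired setting, the sketch below permutes subjects between the two marker groups, separately within the diseased and the nondiseased samples, using the absolute difference in empirical AUCs as the statistic. This is a simplified stand-in for the paper's test, whose statistic measures discrepancy over the entire ROC curve and whose permutation scheme differs in detail; all data are simulated.

```python
# A simplified permutation comparison of two ROC curves from unpaired data,
# using the difference in empirical AUCs as the test statistic.
import numpy as np

def auc(x_dis, x_non):
    """Empirical AUC (Mann-Whitney) for one marker."""
    return np.mean(x_dis[:, None] >= x_non[None, :])

def perm_test(d1, n1, d2, n2, n_perm=2000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    obs = abs(auc(d1, n1) - auc(d2, n2))
    d_pool, n_pool = np.concatenate([d1, d2]), np.concatenate([n1, n2])
    count = 0
    for _ in range(n_perm):
        dp = rng.permutation(d_pool)          # reshuffle diseased subjects across markers
        nm = rng.permutation(n_pool)          # reshuffle nondiseased subjects across markers
        stat = abs(auc(dp[:len(d1)], nm[:len(n1)]) - auc(dp[len(d1):], nm[len(n1):]))
        count += stat >= obs
    return count / n_perm                     # permutation p-value

# Example with simulated markers measured on different subjects:
rng = np.random.default_rng(3)
p = perm_test(rng.normal(1.2, 1, 50), rng.normal(0, 1, 50),
              rng.normal(0.8, 1, 50), rng.normal(0, 1, 50), rng=rng)
print(p)
```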

Journal ArticleDOI
TL;DR: An alternative curve-free method in which the probabilities of toxicity are modeled directly as an unknown multidimensional parameter is presented, and a product-of-beta prior (PBP) is introduced and shown to bring about logical improvements.
Abstract: Summary. Consider the problem of finding the dose that is as high as possible subject to having a controlled rate of toxicity. The problem is commonplace in oncology Phase I clinical trials. Such a dose is often called the maximum tolerated dose (MTD) since it represents a necessary trade-off between efficacy and toxicity. The continual reassessment method (CRM) is an improvement over traditional up-and-down schemes for estimating the MTD. It is based on a Bayesian approach and on the assumption that the dose-toxicity relationship follows a specific response curve, e.g., the logistic or power curve. The purpose of this paper is to illustrate how the assumption of a specific curve used in the CRM is not necessary and can actually hinder the efficient use of prior inputs. An alternative curve-free method in which the probabilities of toxicity are modeled directly as an unknown multidimensional parameter is presented. To that purpose, a product-of-beta prior (PBP) is introduced and shown to bring about logical improvements. Practical improvements are illustrated by simulation results.

Journal ArticleDOI
TL;DR: Significant progress has been achieved and further developments are expected in many other areas, including the accelerated failure time model, multivariate failure time data, interval‐censored data, dependent censoring, dynamic treatment regimes and causal inference, joint modeling of failure time and longitudinal data, and Bayesian methods.
Abstract: The field of survival analysis emerged in the 20th century and experienced tremendous growth during the latter half of the century. The developments in this field that have had the most profound impact on clinical trials are the Kaplan-Meier (1958, Journal of the American Statistical Association 53, 457-481) method for estimating the survival function, the log-rank statistic (Mantel, 1966, Cancer Chemotherapy Report 50, 163-170) for comparing two survival distributions, and the Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-220) proportional hazards model for quantifying the effects of covariates on the survival time. The counting-process martingale theory pioneered by Aalen (1975, Statistical inference for a family of counting processes, Ph.D. dissertation, University of California, Berkeley) provides a unified framework for studying the small- and large-sample properties of survival analysis statistics. Significant progress has been achieved and further developments are expected in many other areas, including the accelerated failure time model, multivariate failure time data, interval-censored data, dependent censoring, dynamic treatment regimes and causal inference, joint modeling of failure time and longitudinal data, and Bayesian methods.

Journal ArticleDOI
TL;DR: A latent variable model is proposed for the situation where repeated measures over time are obtained on each outcome and these outcomes are assumed to measure an underlying quantity of main interest from different perspectives.
Abstract: Multiple outcomes are often used to properly characterize an effect of interest. This paper proposes a latent variable model for the situation where repeated measures over time are obtained on each outcome. These outcomes are assumed to measure an underlying quantity of main interest from different perspectives. We relate the observed outcomes using regression models to a latent variable, which is then modeled as a function of covariates by a separate regression model. Random effects are used to model the correlation due to repeated measures of the observed outcomes and the latent variable. An EM algorithm is developed to obtain maximum likelihood estimates of model parameters. Unit-specific predictions of the latent variables are also calculated. This method is illustrated using data from a national panel study on changes in methadone treatment practices.

Journal ArticleDOI
TL;DR: A semiparametric approach to the proportional hazards regression analysis of interval-censored data is proposed, and multiple imputation is found to yield an easily computed variance estimate that appears to be more reliable than asymptotic methods with small to moderately sized data sets.
Abstract: We propose a semiparametric approach to the proportional hazards regression analysis of interval-censored data. An EM algorithm based on an approximate likelihood leads to an M-step that involves maximizing a standard Cox partial likelihood to estimate regression coefficients and then using the Breslow estimator for the unknown baseline hazards. The E-step takes a particularly simple form because all incomplete data appear as linear terms in the complete-data log likelihood. The algorithm of Turnbull (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) is used to determine times at which the hazard can take positive mass. We found multiple imputation to yield an easily computed variance estimate that appears to be more reliable than asymptotic methods with small to moderately sized data sets. In the right-censored survival setting, the approach reduces to the standard Cox proportional hazards analysis, while the algorithm reduces to the one suggested by Clayton and Cuzick (1985, Applied Statistics 34, 148-156). The method is illustrated on data from the breast cancer cosmetics trial, previously analyzed by Finkelstein (1986, Biometrics 42, 845-854) and several subsequent authors.

Journal ArticleDOI
TL;DR: This work reviews five nonstationary models regarded as most useful and concludes that antedependence models should be given much greater consideration than they have historically received.
Abstract: An important theme of longitudinal data analysis in the past two decades has been the development and use of explicit parametric models for the data's variance-covariance structure. A variety of these models have been proposed, of which most are second-order stationary. A few are flexible enough to accommodate nonstationarity, i.e., nonconstant variances and/or correlations that are not a function solely of elapsed time between measurements. We review five nonstationary models that we regard as most useful: (1) the unstructured covariance model, (2) unstructured antedependence models, (3) structured antedependence models, (4) autoregressive integrated moving average and similar models, and (5) random coefficients models. We evaluate the relative strengths and limitations of each model, emphasizing when it is inappropriate or unlikely to be useful. We present three examples to illustrate the fitting and comparison of the models and to demonstrate that nonstationary longitudinal data can be modeled effectively and, in some cases, quite parsimoniously. In these examples, the antedependence models generally prove to be superior and the random coefficients models prove to be inferior. We conclude that antedependence models should be given much greater consideration than they have historically received.

Journal ArticleDOI
TL;DR: A method for applying the proportional hazards model to the analysis of multivariate interval‐censored failure time data from a study of CMV in HIV‐infected patients is outlined.
Abstract: Summary. This paper focuses on the methodology developed for analyzing a multivariate interval-censored data set from an AIDS observational study. A purpose of the study was to determine the natural history of the opportunistic infection cytomegalovirus (CMV) in an HIV-infected individual. For this observational study, laboratory tests were performed at scheduled clinic visits to test for the presence of the CMV virus in the blood and in the urine (called CMV shedding in the blood and urine). The study investigators were interested in determining whether the stage of HIV disease at study entry was predictive of an increased risk for CMV shedding in either the blood or the urine. If all patients had made each clinic visit, the data would be multivariate grouped failure time data and published methods could be used. However, many patients missed several visits, and when they returned, their lab tests indicated a change in their blood and/or urine CMV shedding status, resulting in interval-censored failure time data. This paper outlines a method for applying the proportional hazards model to the analysis of multivariate interval-censored failure time data from a study of CMV in HIV-infected patients.