Journal ArticleDOI

A joint model for longitudinal continuous and time-to-event outcomes with direct marginal interpretation.

01 Jul 2013-Biometrical Journal (Wiley)-Vol. 55, Iss: 4, pp 572-588
TL;DR: This paper proposes a so-called marginalized joint model for longitudinal continuous and repeated time-to-event outcomes on the one hand, and a marginalized joint model for bivariate repeated time-to-event outcomes on the other; both can be fitted relatively easily using standard statistical software.
Abstract: Joint modeling of various longitudinal sequences has received quite a bit of attention in recent times. This paper proposes a so-called marginalized joint model for longitudinal continuous and repeated time-to-event outcomes on the one hand and a marginalized joint model for bivariate repeated time-to-event outcomes on the other. The model has several appealing features. It flexibly allows for association among measurements of the same outcome at different occasions as well as among measurements on different outcomes recorded at the same time. The model also accommodates overdispersion. The time-to-event outcomes are allowed to be censored. While the model builds upon the generalized linear mixed model framework, it is such that model parameters enjoy a direct marginal interpretation. All of these features have been considered before, but here we bring them together in a unified, flexible framework. The model framework's properties are scrutinized using a simulation study. The models are applied to data from a chronic heart failure study and to a so-called comet assay, encountered in preclinical research. Perhaps surprisingly, the models can be fitted relatively easily using standard statistical software.
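As a concrete illustration of the data structure such a joint model targets, the sketch below simulates a continuous longitudinal outcome and a censored event time linked through a shared random intercept. All parameter values, and the exponential hazard itself, are hypothetical choices for illustration; this is not the Weibull-gamma-normal specification of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_joint(n_subjects=200, n_visits=5):
    """Simulate from a simple shared random-intercept joint model:
    a continuous longitudinal outcome plus one (possibly censored)
    exponential event time per subject, linked through b_i.
    Parameter values are hypothetical, for illustration only."""
    beta0, beta1 = 2.0, -0.3      # longitudinal fixed effects
    sigma_b, sigma_e = 1.0, 0.5   # random-intercept and residual SDs
    lam0, gamma = 0.1, 0.4        # baseline hazard rate and association
    records = []
    for _ in range(n_subjects):
        b_i = rng.normal(0.0, sigma_b)          # shared random intercept
        # longitudinal trajectory at visit times 0, 1, ..., n_visits - 1
        t = np.arange(n_visits)
        y = beta0 + beta1 * t + b_i + rng.normal(0.0, sigma_e, n_visits)
        # event time: exponential hazard scaled by exp(gamma * b_i)
        T = rng.exponential(1.0 / (lam0 * np.exp(gamma * b_i)))
        C = rng.uniform(0.0, 20.0)              # administrative censoring
        records.append((y, min(T, C), T <= C))  # (trajectory, time, event?)
    return records

data = simulate_joint()
event_rate = np.mean([d[2] for d in data])
```

The association parameter gamma ties the subject-specific level of the longitudinal outcome to the hazard, which is the mechanism any shared random-effects joint model exploits.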
Citations
Journal ArticleDOI
TL;DR: In this review, biostatistical developments from about the year 2000 onward are discussed, structured mainly around repeated-dose studies, mutagenicity, carcinogenicity, reproductive, and ecotoxicological assays.
Abstract: The basic conclusions in almost all reports on new drug applications and in all publications in toxicology are based on statistical methods. However, serious contradictions exist in practice: designs with small sample sizes but use of asymptotic methods (i.e. constructed for larger sample sizes), statistically significant findings without biological relevance (and vice versa), proof of hazard vs. proof of safety, testing (e.g. no observed effect level) vs. estimation (e.g. benchmark dose), available statistical theory vs. related user-friendly software. In this review, biostatistical developments from about the year 2000 onward are discussed, structured mainly around repeated-dose studies, mutagenicity, carcinogenicity, reproductive, and ecotoxicological assays. A critical discussion is included on the unnecessarily conservative evaluation proposed in guidelines, the inadequate but almost always used proof-of-hazard approach, and the limitation of data-dependent decision-tree approaches.

38 citations

Journal ArticleDOI
TL;DR: A joint model for a simultaneous analysis of three types of data: a longitudinal marker, recurrent events, and a terminal event is proposed and applied to a randomized phase III clinical trial of metastatic colorectal cancer, showing that the proposed trivariate model is appropriate for practical use.
Abstract: In oncology, the international WHO and RECIST criteria have allowed the standardization of tumor response evaluation in order to identify the time of disease progression. These semi-quantitative measurements are often used as endpoints in phase II and phase III trials to study the efficacy of new therapies. However, through categorization of the continuous tumor size, information can be lost, and such criteria can be challenged by recently developed methods for modeling biomarkers longitudinally. Thus, it is of interest to compare the predictive ability for overall survival of cancer progressions based on categorical criteria versus quantitative measures of tumor size (left-censored due to detection-limit problems) and/or the appearance of new lesions. We propose a joint model for a simultaneous analysis of three types of data: a longitudinal marker, recurrent events, and a terminal event. The model makes it possible to determine, in a randomized clinical trial, on which particular component the treatment acts most. A simulation study is performed and shows that the proposed trivariate model is appropriate for practical use. We propose statistical tools that evaluate predictive accuracy for joint models, in order to compare our model to models based on categorical criteria and their components. We apply the model to a randomized phase III clinical trial of metastatic colorectal cancer, conducted by the Federation Francophone de Cancerologie Digestive (FFCD 2000-05 trial), which assigned 410 patients to two therapeutic strategies with multiple successive chemotherapy regimens.

35 citations


Additional excerpts

  • ...An example of joint analysis of recurrent events and longitudinal data is a marginal model considering overdispersion (Efendi et al., 2013)....


Journal ArticleDOI
TL;DR: This paper comprehensively reviews the literature on joint models involving more than a single event time per subject, covering the association structure, estimation approaches, software implementations, and clinical applications.
Abstract: Methodological development and clinical application of joint models of longitudinal and time-to-event outcomes have grown substantially over the past two decades. However, much of this research has concentrated on a single longitudinal outcome and a single event time outcome. In clinical and public health research, patients who are followed up over time may often experience multiple, recurrent, or a succession of clinical events. Models that utilise such multivariate event time outcomes are quite valuable in clinical decision-making. We comprehensively review the literature for implementation of joint models involving more than a single event time per subject. We consider the distributional and modelling assumptions, including the association structure, estimation approaches, software implementations, and clinical applications. Research into this area is proving highly promising, but to date remains in its infancy.

21 citations


Cites background or methods from "A joint model for longitudinal cont..."

  • ...(2013) [54] No Continuous LMM Normal MVN...


  • ...[54] then exploited the ideas of Heagerty and Zeger [69] to establish marginal effects....


  • ...…al. (2012) [58] X ✓ X Transformation models Normal Zhu et al. (2012) [43] ✓ X X Proportional hazards with piecewise constant baseline hazard functions N/A Efendi et al. (2013) [54] X ✓ X Weibull-gamma-normal model Gamma§ Njagi et al. (2013) [14] X ✓ X Weibull-gamma-normal model Gamma§ Tang et al.…...


  • ...…Normal MVN Liu et al. (2008) [56] No Continuous LMM Normal Normal Liu & Huang (2009) [57] No Continuous LMM Normal Normal Kim et al. (2012) [58] No Continuous LMM Normal MVN Efendi et al. (2013) [54] No Continuous LMM Normal MVN Njagi et al. (2013) [14] No Continuous, binary or count –…...


  • ...(2013) [54] Random effects parametrization MLE: via partial marginalization [76]; i....


Journal ArticleDOI
TL;DR: This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.
Abstract: Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years, as they are a helpful technique in clinical studies where longitudinal outcomes are recorded alongside event times. Those two processes are often linked, and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by independent modeling. Commonly, joint models are estimated in likelihood-based expectation maximization or Bayesian approaches using frameworks where variable selection is problematic and that do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm tackling these challenges by being able to simultaneously estimate predictors for joint models and automatically select the most influential variables even in high-dimensional data situations. We analyze the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.
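The componentwise selection idea behind boosting can be sketched outside the joint-model setting. The toy below applies generic componentwise L2 boosting to a plain linear model: at each step, every single-covariate base learner is fitted to the current residuals and only the best one is (shrunkenly) updated. The step length `nu` and all data are illustrative assumptions; this is not the authors' joint-model algorithm.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Generic componentwise L2 boosting for a linear model.
    Each step fits one least-squares base learner per covariate to the
    current residuals and updates only the best-fitting component,
    scaled by the step length nu. This is the variable-selection
    mechanism boosting contributes; a plain-regression sketch only."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        # per-covariate least-squares coefficients against the residuals
        denom = (X ** 2).sum(axis=0)
        b = X.T @ resid / denom
        # residual sum of squares of each candidate base learner
        sse = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = int(np.argmin(sse))
        coef[j] += nu * b[j]                 # shrunken update of the winner
        resid = resid - nu * b[j] * X[:, j]
    return intercept, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
icpt, coef = componentwise_l2_boost(X, y)
```

With only covariates 0 and 3 truly informative, the boosting path concentrates its updates on those two components, which is the automatic-selection behavior the abstract describes.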

20 citations


Cites methods from "A joint model for longitudinal cont..."

  • ...Interpretation of the separate parts of the model was simplified by the work of Efendi et al. (2013)....


Journal ArticleDOI
TL;DR: It is shown that combining random effects to capture association with a further correction for overdispersion can improve the model's fit considerably, and that the resulting models allow one to answer research questions that could not be addressed otherwise.
Abstract: In many biomedical studies, one jointly collects longitudinal continuous, binary, and survival outcomes, possibly with some observations missing. Random-effects models, sometimes called shared-parameter models or frailty models, have received a lot of attention. In such models, the corresponding variance components can be employed to capture the association between the various sequences. In some cases, random effects are considered common to various sequences, perhaps up to a scaling factor; in others, there are different but correlated random effects. Even though a variety of data types has been considered in the literature, less attention has been devoted to ordinal data. For univariate longitudinal or hierarchical data, the proportional odds mixed model (POMM) is an instance of the generalized linear mixed model (GLMM; Breslow and Clayton, 1993). Ordinal data are conveniently replaced by a parsimonious set of dummies, which in the longitudinal setting leads to a repeated set of dummies. When ordinal...
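The proportional odds mixed model mentioned above can be written down directly. In standard GLMM notation (cut-points \(\alpha_k\), fixed effects \(\beta\), subject-specific random effects \(b_i\); sign conventions vary across texts), for an ordinal response \(Y_{ij}\) with \(K\) categories:

\[
\operatorname{logit}\, P(Y_{ij} \le k \mid b_i) \;=\; \alpha_k - x_{ij}^{\top}\beta - z_{ij}^{\top} b_i,
\qquad k = 1, \dots, K-1, \qquad b_i \sim N(0, D),
\]

with ordered cut-points \(\alpha_1 < \alpha_2 < \dots < \alpha_{K-1}\), so that a single \(\beta\) acts proportionally on all cumulative logits.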

17 citations

References
Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis, Montgomery, Peck, and Vining (2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated.
Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS.
The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link.
The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data-transformation experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992).
That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations


"A joint model for longitudinal cont..." refers background in this paper

  • ...The additional random effects are conveniently chosen to be conjugate to the outcome distribution, in the sense of McCullagh and Nelder (1989), which facilitates analytical and numerical calculations, because marginalizing over them can be done in closed form....


Journal ArticleDOI
TL;DR: In this article, a unified approach to fitting two-stage random-effects models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed.
Abstract: Models for the analysis of longitudinal data must recognize the relationship between serial observations on the same unit. Multivariate models with general covariance structure are often difficult to apply to highly unbalanced data, whereas two-stage random-effects models can be used easily. In two-stage models, the probability distributions for the response vectors of different individuals belong to a single family, but some random-effects parameters vary across individuals, with a distribution specified at the second stage. A general family of models is discussed, which includes both growth models and repeated-measures models as special cases. A unified approach to fitting these models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed. Two examples are taken from a current epidemiological study of the health effects of air pollution.
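The two-stage random-effects family described in this abstract is the classical linear mixed model. In the usual notation, for subject \(i\) with \(n_i\) observations,

\[
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \sigma^2 I_{n_i}),
\]

so that marginally \(y_i \sim N\!\left(X_i \beta,\; Z_i D Z_i^{\top} + \sigma^2 I_{n_i}\right)\). The EM algorithm mentioned in the abstract treats the unobserved \(b_i\) as missing data, alternating between their empirical Bayes estimates and maximum likelihood updates of \(\beta\), \(D\), and \(\sigma^2\).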

8,410 citations

Book
30 Sep 2005
TL;DR: This book presents the main model families for non-Gaussian longitudinal data, from marginal models such as generalized estimating equations to conditional and random-effects models such as the generalized linear mixed model.
Abstract: Introduction.- Motivating Studies.- Generalized Linear Models.- Linear Mixed Models for Gaussian Longitudinal Data.- Model Families.- The Strength of Marginal Models.- Likelihood-based Models.- Generalized Estimating Equations.- Pseudo-likelihood.- Fitting Marginal Models with SAS.- Conditional Models.- Pseudo-likelihood.- From Subject-Specific to Random-Effects Models.- Generalized Linear Mixed Models (GLMM).- Fitting Generalized Linear Mixed Models with SAS.- Marginal Versus Random-Effects Models.- Ordinal Data.- The Epilepsy Data.- Non-linear Models.- Pseudo-likelihood for a Hierarchical Model.- Random-effects Models with Serial Correlation.- Non-Gaussian Random Effects.- Joint Continuous and Discrete Responses.- High-dimensional Multivariate Repeated Measurements.- Missing Data Concepts.- Simple Methods, Direct Likelihood and WGEE.- Multiple Imputation and the Expectation-Maximization Algorithm.- Selection Models.- Pattern-mixture Models.- Sensitivity Analysis.- Incomplete Data and SAS.

1,352 citations


Additional excerpts

  • ...This phenomenon is similar to the one occurring when comparing, for example, the fit of generalized estimating equations with the one from a conventional generalized linear mixed model in, for example, binary repeated measurements (Molenberghs and Verbeke, 2005)....

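The phenomenon this excerpt refers to, marginal (GEE-type) effects being attenuated relative to random-effects (GLMM) effects for binary data, can be checked numerically. The sketch below integrates a random-intercept logistic model over the random effect by Gauss-Hermite quadrature and compares the implied marginal slope with the well-known approximation beta_marginal ≈ beta / sqrt(1 + c^2 sigma^2), c = 16*sqrt(3)/(15*pi); the parameter values are arbitrary illustrations.

```python
import numpy as np

def marginal_logit_slope(beta, sigma_b, x_grid):
    """Integrate a random-intercept logistic model over b ~ N(0, sigma_b^2)
    via Gauss-Hermite quadrature, then recover the implied marginal slope
    by a least-squares fit of the marginal logits against x."""
    nodes, weights = np.polynomial.hermite.hermgauss(40)
    b = np.sqrt(2.0) * sigma_b * nodes          # quadrature points for N(0, sigma_b^2)
    w = weights / np.sqrt(np.pi)                # normalized quadrature weights
    # marginal success probability E_b[expit(beta * x + b)] at each x
    p = np.array([(w / (1.0 + np.exp(-(beta * x + b)))).sum() for x in x_grid])
    logit = np.log(p / (1.0 - p))
    return np.polyfit(x_grid, logit, 1)[0]      # slope of the marginal logit

beta, sigma_b = 1.0, 2.0
slope = marginal_logit_slope(beta, sigma_b, np.linspace(-0.5, 0.5, 11))
c = 16.0 * np.sqrt(3.0) / (15.0 * np.pi)
approx = beta / np.sqrt(1.0 + c ** 2 * sigma_b ** 2)
```

The numerically obtained marginal slope is smaller than the conditional slope beta and agrees closely with the attenuation approximation, which is exactly why GEE and GLMM fits to the same binary data report different-looking coefficients.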

Book
23 Oct 2007
TL;DR: In this book different methods based on the frailty model are described and it is demonstrated how they can be used to analyze clustered survival data.
Abstract: Readers will find in the pages of this book a treatment of the statistical analysis of clustered survival data. Such data are encountered in many scientific disciplines including human and veterinary medicine, biology, epidemiology, public health and demography. A typical example is the time to death in cancer patients, with patients clustered in hospitals. Frailty models provide a powerful tool to analyze clustered survival data. In this book different methods based on the frailty model are described and it is demonstrated how they can be used to analyze clustered survival data. All programs used for these examples are available on the Springer website.

461 citations


"A joint model for longitudinal cont..." refers methods in this paper

  • ...To ensure identifiability, the gamma frailty parameters are restricted by α2 = 1/α1 (Duchateau and Janssen, 2008)....


Journal ArticleDOI
TL;DR: This paper presents the R package JM, which fits joint models for longitudinal and time-to-event data, and describes its use in longitudinal studies.
Abstract: In longitudinal studies measurements are often collected on different types of outcomes for each subject. These may include several longitudinally measured responses (such as blood values relevant to the medical condition under study) and the time at which an event of particular interest occurs (e.g., death, development of a disease or dropout from the study). These outcomes are often separately analyzed; however, in many instances, a joint modeling approach is either required or may produce a better insight into the mechanisms that underlie the phenomenon under study. In this paper we present the R package JM that fits joint models for longitudinal and time-to-event data.

457 citations