Journal Article

A combined overdispersed and marginalized multilevel model

01 Jun 2012-Computational Statistics & Data Analysis (Elsevier Science Publishers B. V.)-Vol. 56, Iss: 6, pp 1944-1951

TL;DR: The proposed model is applied to two clinical studies and compared with the existing approach; explicitly allowing for an overdispersion random effect significantly improves the model.

Abstract: Overdispersion and correlation are two features often encountered when modeling non-Gaussian dependent data, usually as a function of known covariates. Methods that ignore the presence of these phenomena are often in jeopardy of leading to biased assessment of covariate effects. The beta-binomial and negative binomial models are well known in dealing with overdispersed data for binary and count data, respectively. Similarly, generalized estimating equations (GEE) and the generalized linear mixed models (GLMM) are popular choices when analyzing correlated data. A so-called combined model simultaneously acknowledges the presence of dependency and overdispersion by way of two separate sets of random effects. A marginally specified logistic-normal model for longitudinal binary data, which combines the strength of the marginal and hierarchical models, has been previously proposed. These two are brought together to produce a marginalized longitudinal model which brings together the comfort of marginally meaningful parameters and the ease of allowing for overdispersion and correlation. Apart from model formulation, estimation methods are discussed. The proposed model is applied to two clinical studies and compared to the existing approach. It turns out that by explicitly allowing for an overdispersion random effect, the model significantly improves.

Topics: Quasi-likelihood (68%), Overdispersion (65%), Generalized linear mixed model (59%), Count data (58%), Random effects model (56%)

Made available by Hasselt University Library in https://documentserver.uhasselt.be
A combined overdispersed and marginalized multilevel model
Peer-reviewed author version
Iddi, Samuel & MOLENBERGHS, Geert (2012) A combined overdispersed and
marginalized multilevel model. In: COMPUTATIONAL STATISTICS & DATA
ANALYSIS, 56 (6), p. 1944-1951.
DOI: 10.1016/j.csda.2011.11.021
Handle: http://hdl.handle.net/1942/13622

Leuven Statistics Day, June 2012
A Combined Overdispersed and Marginalized Multilevel Model
Samuel Iddi² and Geert Molenberghs¹,²
Interuniversity Institute for Biostatistics and statistical Bioinformatics
¹ Universiteit Hasselt, Agoralaan 1, 3590 Diepenbeek, Belgium
² KU Leuven, Kapucijnenvoer 35, 3000 Leuven, Belgium
Abstract
Overdispersion and correlation are two features often encountered when modeling non-Gaussian dependent
data, usually as a function of known covariates. Methods that ignore the presence of these phenomena
are often in jeopardy of leading to biased assessment of covariate effects. The beta-binomial and negative
binomial models are well known in dealing with overdispersed data for binary and count data, respectively.
Similarly, generalized estimating equations (GEE) and the generalized linear mixed models (GLMM) are
popular choices when analyzing correlated data. A so-called combined model simultaneously acknowledges
the presence of dependency and overdispersion by way of two separate sets of random effects. A marginally
specified logistic-normal model for longitudinal binary data, which combines the strength of the marginal and
hierarchical models, has been previously proposed. These two are brought together to produce a marginalized
longitudinal model which brings together the comfort of marginally meaningful parameters and the
ease of allowing for overdispersion and correlation. Apart from model formulation, estimation methods are
discussed. The proposed model is applied to two clinical studies and compared to the existing approach. It
turns out that by explicitly allowing for an overdispersion random effect, the model significantly improves.
Keywords: Combined model; Correlation; Overdispersion; Partial marginalization
Acknowledgements: The authors gratefully acknowledge the financial support from the IAP research
Network P6/03 of the Belgian Government (Belgian Science Policy).
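As an illustration of the two sets of random effects described in the abstract, the following minimal sketch simulates repeated counts with a subject-level normal random intercept (inducing correlation) and an observation-level gamma random effect (inducing overdispersion). It is not the authors' estimation code. The function names, the intercept-only linear predictor, and all parameter values are assumptions chosen for illustration, and the Poisson case is used because its marginal mean has a simple closed form when the gamma effect has mean 1.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used in this illustration.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_combined_counts(n_subjects, n_visits, intercept, sigma_b, alpha, seed=1):
    """Simulate counts with normal (correlation) and gamma (overdispersion) random effects.

    All names and the intercept-only predictor are hypothetical choices.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        # Subject-level normal random intercept, shared by all visits of a subject.
        b = rng.gauss(0.0, sigma_b)
        for _ in range(n_visits):
            # Observation-level gamma effect with mean 1 and variance 1/alpha.
            theta = rng.gammavariate(alpha, 1.0 / alpha)
            counts.append(poisson_draw(rng, theta * math.exp(intercept + b)))
    return counts


intercept, sigma_b, alpha = 0.5, 0.5, 2.0
counts = simulate_combined_counts(2000, 5, intercept, sigma_b, alpha)

sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)
# Marginal mean when E(theta) = 1: exp(intercept + sigma_b^2 / 2).
marginal_mean = math.exp(intercept + sigma_b ** 2 / 2)

print("sample mean:", sample_mean)
print("sample variance:", sample_var)
print("closed-form marginal mean:", marginal_mean)
```

In this setup the sample variance clearly exceeds the sample mean, which a plain Poisson model cannot reproduce, and the population-averaged mean differs from exp(intercept). The second point is why a marginalized formulation, which places the regression parameters directly on the marginal mean, is attractive when population-averaged effects are of interest.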
References
Heagerty, P.J. (1999) Marginally specified logistic-normal models for longitudinal binary data. Biometrics,
55, 688–698.
Heagerty, P.J. and Zeger, S.L. (2000) Marginalized multilevel models and likelihood inference (with
discussion). Statistical Science, 15, 1–26.
Griswold, M.E. and Zeger, S.L. (2004) On marginalized multilevel models and their computation (November
2004). Johns Hopkins University, Department of Biostatistics Working Paper #99.
Hinde, J. and Demétrio, C.G.B. (1998a) Overdispersion: Models and estimation. Computational Statistics
and Data Analysis, 27, 151–170.
Molenberghs, G. and Verbeke, G. (2005) Models for Discrete Longitudinal Data. New York: Springer.
Molenberghs, G., Verbeke, G., and Demétrio, C. (2007) An extended random-effects approach to modeling
repeated, overdispersed count data. Lifetime Data Analysis, 13, 513–531.
Molenberghs, G., Verbeke, G., Demétrio, C., and Vieira, A. (2010) A family of generalized linear models
for repeated measures with normal and conjugate random effects. Statistical Science, 25, 325–347.
Vangeneugden, T., Molenberghs, G., Verbeke, G., and Demétrio, C. (2011) Marginal correlation from an
extended random-effects model for repeated and overdispersed counts. Journal of Applied Statistics,
38, 215–232.
Citations

Journal Article
TL;DR: A marginalized ZIP model approach for independent responses to model the population mean count directly is developed, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios.
Abstract: The zero-inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non-susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing-based safer sex counseling intervention, designed to reduce unprotected sexual act counts.

53 citations


Cites methods from "A combined overdispersed and margin..."

  • ...Iddi S, Molenberghs G....

    [...]

  • ...By combining overdispersion, random effects, and marginalized model methods, Iddi and Molenberghs [8] obtain population-averaged interpretations for discrete outcomes....

    [...]


Journal Article
TL;DR: It is shown that the proposed extension of the Poisson-normal GLMM strongly outperforms the classical GLMM, and means, variances, and joint probabilities can be expressed in closed form, allowing for exact intra-sequence correlation expressions.
Abstract: Vangeneugden et al. [15] derived approximate correlation functions for longitudinal sequences of general data type, Gaussian and non-Gaussian, based on generalized linear mixed-effects models (GLMM). Their focus was on binary sequences, as well as on a combination of binary and Gaussian sequences. Here, we focus on the specific case of repeated count data, important in two respects. First, we employ the model proposed by Molenberghs et al. [13], which generalizes at the same time the Poisson-normal GLMM and the conventional overdispersion models, in particular the negative-binomial model. The model flexibly accommodates data hierarchies, intra-sequence correlation, and overdispersion. Second, means, variances, and joint probabilities can be expressed in closed form, allowing for exact intra-sequence correlation expressions. Next to the general situation, some important special cases such as exchangeable clustered outcomes are considered, producing insightful expressions. The closed-form expressions are co...

28 citations


Journal Article
TL;DR: Analysis of two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects.
Abstract: Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject-specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, and hence, the theoretical mean-variance relationship is violated, leading to overdispersion. This is usually allowed for based on a hierarchical approach, combining a Poisson model with gamma distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero-inflated model. A zero-inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model separately handles zero observations and positive counts, where then a truncated-at-zero count distribution is used for the non-zero state. In practice, however, all these three features can appear simultaneously. Hence, a modeling framework that incorporates all three is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, will naturally have a subject-specific interpretation. However, adopting their purposefully modified marginalized versions leads to a direct marginal or population-averaged interpretation for parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero-inflated model for correlated and overdispersed count data with excess zero observations and then illustrate these further with two case studies. 
The first dataset focuses on the Anopheles mosquito density around a hydroelectric dam, while adolescents' involvement in work, to earn money and support their families or themselves, is studied in the second example. Sub-models, which result from omitting zero-inflation and/or overdispersion features, are also considered for comparison's purpose. Analysis of the two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects.

26 citations


Cites background or methods from "A combined overdispersed and margin..."

  • ...Hence, Iddi and Molenberghs [15] merged the concepts of the combined model of Molenberghs et al. [7] and the MMM of Heagerty [13] and proposed a marginalized combined model, so that the resulting estimates have a direct marginal interpretation, together with corresponding inferences....

    [...]

  • ...Based on [17] and [15], specifying a logit link for the marginal model and a probit link for the conditional model leads to computational advantages from the probit-normal relationship, with the marginal parameters still having the odds-ratio interpretation....

    [...]

  • ...In practice, both overdispersion and correlation can occur simultaneously, which led Molenberghs et al. [7] to formulate a flexible and unified modeling framework, which they termed the combined model, to handle a wide range of hierarchical data, including count, binary, and time-to-events....

    [...]

  • ...Hence, the connector functions, as shown in [15], are as follows....

    [...]

  • ...Third, zeros in excess to what can be expected based on the commonly used count distributions may be observed. aDepartment of Epidemiology and Biostatistics, Jimma University, Ethiopia bI-BioStat, CenStat, Universiteit Hasselt, B-3590 Diepenbeek, Belgium cI-BioStat, L-BioStat, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium *Correspondence to: Geert Molenberghs, I-BioStat, Universiteit Hasselt, Martelarenlaan 42, 3000 Hasselt, Belgium....

    [...]


Journal Article
Abstract: Associations between ultraviolet radiation (UVR) exposure and non-Hodgkin lymphoma (NHL) have been inconsistent, but few studies have examined these associations for specific subtypes or across race/ethnicities. We evaluated the relationship between ambient UVR exposure and subtype-specific NHL incidence for whites, Hispanics and blacks in the United States for years 2001-2010 (n = 187,778 cases). Incidence rate ratios (IRRs) and 95% confidence intervals (CIs) were calculated for UVR quintiles using Poisson regression. Incidence was lower for the highest UVR quintile for chronic/small lymphocytic/leukemia (CLL/SLL) (IRR = 0.87, 95% CI: 0.77-0.97), mantle cell (IRR = 0.82, 95% CI: 0.69-0.97), lymphoplasmacytic (IRR = 0.58, 95% CI: 0.42-0.80), mucosa-associated lymphoid tissue (MZLMALT) (IRR = 0.74, 95% CI: 0.60-0.90), follicular (FL) (IRR = 0.76, 95% CI: 0.68-0.86), diffuse large B-cell (IRR = 0.84, 95% CI: 0.76-0.94;), peripheral T-cell other (PTCL) (IRR = 0.76, 95% CI: 0.61-0.95) and PTCL not otherwise specified (PNOS) (IRR = 0.77, 95% CI: 0.61-0.98). Trends were significant for MZLMALT, FL, DLBCL, BNOS and PTCL, with FL and DLBCL still significant after Bonferroni correction. We found interaction by race/ethnicity for CLL/SLL, FL, Burkitt, PNOS and MF/SS, with CLL/SLL and FL still significant after Bonferroni correction. Some B-cell lymphomas (CLL/SLL, FL and Burkitt) suggested significant inverse relationships in whites and Hispanics, but not in blacks. Some T-cell lymphomas suggested the most reduced risk for the highest quintile of UVR among blacks (PNOS and MF/SS), though trends were not significant. These findings strengthen the case for an inverse association of UVR exposure, support modest heterogeneity between NHL subtypes and suggest some differences by race/ethnicity.

24 citations


Journal Article
TL;DR: A direct-marginalization approach using a reparameterized link function to model exposure and covariate effects directly on the truncated dependent variable mean is proposed and an alternative average-predicted-value, post-estimation approach which uses model-predictions for each person in a designated reference group under different exposure statuses to estimate covariate-adjusted overall exposure effects is discussed.
Abstract: The Tobit model, also known as a censored regression model to account for left- and/or right-censoring in the dependent variable, has been used in many areas of applications, including dental health, medical research and economics. The reported Tobit model coefficient allows estimation and inference of an exposure effect on the latent dependent variable. However, this model does not directly provide overall exposure effects estimation on the original outcome scale. We propose a direct-marginalization approach using a reparameterized link function to model exposure and covariate effects directly on the truncated dependent variable mean. We also discuss an alternative average-predicted-value, post-estimation approach which uses model-predicted values for each person in a designated reference group under different exposure statuses to estimate covariate-adjusted overall exposure effects. Simulation studies were conducted to show the unbiasedness and robustness properties for both approaches under various scenarios. Robustness appears to diminish when covariates with substantial effects are imbalanced between exposure groups; we outline an approach for model choice based on information criterion fit statistics. The methods are applied to the Genetic Epidemiology Network of Arteriopathy (GENOA) cohort study to assess associations between obesity and cognitive function in the non-Hispanic white participants.

15 citations


References

Book
01 Jan 1983
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

23,204 citations


Journal Article
Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

16,152 citations


Journal Article
Abstract: Categorical data analysis (catalog record: Central Library, Tehran University of Medical Sciences).

10,303 citations


Book
25 Aug 2003

10,033 citations


Journal Article
01 May 1972
Abstract: SUMMARY The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components). The implications of the approach in designing statistics courses are discussed.

8,264 citations