Journal ArticleDOI

A combined overdispersed and marginalized multilevel model

01 Jun 2012-Computational Statistics & Data Analysis (Elsevier Science Publishers B. V.)-Vol. 56, Iss: 6, pp 1944-1951
TL;DR: The proposed model is applied to two clinical studies and compared with the existing approach; explicitly allowing for an overdispersion random effect improves the model significantly.
About: This article was published in Computational Statistics & Data Analysis on 2012-06-01 and is currently open access. It has received 26 citations to date. The article focuses on the topics: Quasi-likelihood & Overdispersion.


Summary

  • Overdispersion and correlation are two features often encountered when modeling non-Gaussian dependent data, usually as a function of known covariates.
  • Methods that ignore these phenomena risk a biased assessment of covariate effects.
  • The beta-binomial and negative binomial models are well-known tools for overdispersed binary and count data, respectively.
  • Similarly, generalized estimating equations (GEE) and generalized linear mixed models (GLMM) are popular choices for analyzing correlated data.
  • A so-called combined model simultaneously acknowledges the presence of dependency and overdispersion by way of two separate sets of random effects.
  • A marginally specified logistic-normal model for longitudinal binary data, which combines the strengths of the marginal and hierarchical models, has been proposed previously.
  • These two are brought together to produce a marginalized longitudinal model that offers both marginally meaningful parameters and the ease of allowing for overdispersion and correlation.
  • Apart from model formulation, estimation methods are discussed.
  • The proposed model is applied to two clinical studies and compared to the existing approach.
  • It turns out that explicitly allowing for an overdispersion random effect significantly improves the model.
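
The idea of two separate sets of random effects can be illustrated with a small simulation for count data. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a log link, a single normal random intercept b per subject (inducing correlation), and a gamma-distributed multiplicative effect theta with unit mean per observation (inducing overdispersion). The function names, parameter values, and the Poisson sampler are illustrative choices. Under these assumptions the mean is phi * exp(eta + sigma^2 / 2) and the variance is mean + mean^2 * ((var_theta + phi^2) / phi^2 * exp(sigma^2) - 1), where phi and var_theta are the mean and variance of theta.

```python
import math
import random
from statistics import fmean, pvariance


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def theoretical_moments(eta, sigma, shape):
    """Mean and variance when theta ~ Gamma(shape, scale=1/shape) and b ~ N(0, sigma^2)."""
    phi = 1.0                # E[theta] under the unit-mean assumption
    var_theta = 1.0 / shape  # Var[theta] = shape * scale^2
    mean = phi * math.exp(eta + 0.5 * sigma**2)
    var = mean + mean**2 * ((var_theta + phi**2) / phi**2 * math.exp(sigma**2) - 1.0)
    return mean, var


def simulate(n_subjects=4000, n_obs=5, eta=0.5, sigma=0.5, shape=2.0, seed=1):
    """Draw counts Y_ij | theta_ij, b_i ~ Poisson(theta_ij * exp(eta + b_i))."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sigma)  # shared within a subject: induces correlation
        for _ in range(n_obs):
            theta = rng.gammavariate(shape, 1.0 / shape)  # per observation: overdispersion
            counts.append(poisson_draw(rng, theta * math.exp(eta + b)))
    return counts


mean_theory, var_theory = theoretical_moments(eta=0.5, sigma=0.5, shape=2.0)
ys = simulate()
print("theoretical mean, variance:", mean_theory, var_theory)
print("empirical mean, variance:  ", fmean(ys), pvariance(ys))
```

In this sketch the variance exceeds the mean, which a plain Poisson model cannot reproduce. Letting the gamma variance shrink toward zero (a large `shape`) recovers an ordinary Poisson-normal GLMM.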


Citations
Journal ArticleDOI
TL;DR: A marginalized ZIP model approach for independent responses to model the population mean count directly is developed, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios.
Abstract: The zero-inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non-susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing-based safer sex counseling intervention, designed to reduce unprotected sexual act counts.
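
The distinction between latent-class and overall (marginal) effects in a ZIP model can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the numbers for the two exposure groups are hypothetical, and it relies on the standard fact that a ZIP variable with excess-zero probability psi and Poisson mean mu has overall mean (1 - psi) * mu.

```python
import math


def zip_pmf(y, psi, mu):
    """ZIP probability mass: point mass at zero mixed with a Poisson(mu) count."""
    poisson = math.exp(-mu) * mu**y / math.factorial(y)
    return psi * (y == 0) + (1.0 - psi) * poisson


def zip_overall_mean(psi, mu):
    """Mean of the whole mixture population, not just the susceptible class."""
    return (1.0 - psi) * mu


# Hypothetical groups: exposure changes both the excess-zero share and the Poisson mean.
unexposed = {"psi": 0.2, "mu": 3.0}
exposed = {"psi": 0.5, "mu": 2.4}

latent_ratio = exposed["mu"] / unexposed["mu"]
overall_ratio = zip_overall_mean(**exposed) / zip_overall_mean(**unexposed)
print("rate ratio among the susceptible:", latent_ratio)
print("ratio of overall means:          ", overall_ratio)

# Sanity check of the mean formula by summing y * pmf over a long enough range.
numeric_mean = sum(y * zip_pmf(y, **unexposed) for y in range(80))
print("numeric vs closed-form mean:", numeric_mean, zip_overall_mean(**unexposed))
```

The two ratios differ, which is why coefficients on the latent Poisson mean do not directly quantify the exposure effect in the overall population. A marginalized formulation places the regression on the overall mean instead.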

66 citations


Cites methods from "A combined overdispersed and margin..."

  • ...Iddi S, Molenberghs G....


  • ...By combining overdispersion, random effects, and marginalized model methods, Iddi and Molenberghs [8] obtain population-averaged interpretations for discrete outcomes....


Journal ArticleDOI
TL;DR: Analysis of two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects.
Abstract: Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject-specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, and hence, the theoretical mean-variance relationship is violated, leading to overdispersion. This is usually allowed for based on a hierarchical approach, combining a Poisson model with gamma distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero-inflated model. A zero-inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model separately handles zero observations and positive counts, where then a truncated-at-zero count distribution is used for the non-zero state. In practice, however, all these three features can appear simultaneously. Hence, a modeling framework that incorporates all three is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, will naturally have a subject-specific interpretation. However, adopting their purposefully modified marginalized versions leads to a direct marginal or population-averaged interpretation for parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero-inflated model for correlated and overdispersed count data with excess zero observations and then illustrate these further with two case studies. 
The first dataset focuses on the Anopheles mosquito density around a hydroelectric dam, while adolescents' involvement in work, to earn money and support their families or themselves, is studied in the second example. Sub-models, which result from omitting zero-inflation and/or overdispersion features, are also considered for comparison's purpose. Analysis of the two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects.

29 citations


Cites background or methods from "A combined overdispersed and margin..."

  • ...Hence, Iddi and Molenberghs [15] merged the concepts of the combined model of Molenberghs et al. [7] and the MMM of Heagerty [13] and proposed a marginalized combined model, so that the resulting estimates have a direct marginal interpretation, together with corresponding inferences....


  • ...Based on [17] and [15], specifying a logit link for the marginal model and a probit link for the conditional model leads to computational advantages from the probit-normal relationship, with the marginal parameters still having the odds-ratio interpretation....


  • ...In practice, both overdispersion and correlation can occur simultaneously, which led Molenberghs et al. [7] to formulate a flexible and unified modeling framework, which they termed the combined model, to handle a wide range of hierarchical data, including count, binary, and time-to-events....


  • ...Hence, the connector functions, as shown in [15], are as follows....


  • ...Third, zeros in excess to what can be expected based on the commonly used count distributions may be observed....

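
The computational advantage of pairing a marginal logit link with a conditional probit link can be sketched in the simplest setting. The example assumes a single normal random intercept b ~ N(0, sigma^2) and no overdispersion random effect, so it is not the connector function of the cited papers. It uses the probit-normal identity E[Phi(Delta + b)] = Phi(Delta / sqrt(1 + sigma^2)), which gives the closed form Delta = sqrt(1 + sigma^2) * Phi^{-1}(expit(eta)) for a marginal linear predictor eta. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def connector(eta_marginal, sigma):
    """Delta such that E_b[Phi(Delta + b)] = expit(eta_marginal) for b ~ N(0, sigma^2)."""
    return math.sqrt(1.0 + sigma**2) * STD_NORMAL.inv_cdf(expit(eta_marginal))


def marginal_mean(delta, sigma, n_grid=4001, half_width=8.0):
    """Integrate Phi(delta + sigma * z) against the standard normal density (trapezoid rule)."""
    step = 2.0 * half_width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -half_width + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * STD_NORMAL.cdf(delta + sigma * z) * STD_NORMAL.pdf(z)
    return total * step


eta, sigma = 0.7, 1.5
delta = connector(eta, sigma)
print("target marginal probability:", expit(eta))
print("recovered by integration:   ", marginal_mean(delta, sigma))
```

Because the connector is available in closed form here, no numerical root-finding is needed to link the conditional and marginal models. The numerical integration only serves as a check.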

Journal ArticleDOI
TL;DR: It is shown that the proposed extension of the Poisson-normal GLMM strongly outperforms the classical GLMM, and means, variances, and joint probabilities can be expressed in closed form, allowing for exact intra-sequence correlation expressions.
Abstract: Vangeneugden et al. [15] derived approximate correlation functions for longitudinal sequences of general data type, Gaussian and non-Gaussian, based on generalized linear mixed-effects models (GLMM). Their focus was on binary sequences, as well as on a combination of binary and Gaussian sequences. Here, we focus on the specific case of repeated count data, important in two respects. First, we employ the model proposed by Molenberghs et al. [13], which generalizes at the same time the Poisson-normal GLMM and the conventional overdispersion models, in particular the negative-binomial model. The model flexibly accommodates data hierarchies, intra-sequence correlation, and overdispersion. Second, means, variances, and joint probabilities can be expressed in closed form, allowing for exact intra-sequence correlation expressions. Next to the general situation, some important special cases such as exchangeable clustered outcomes are considered, producing insightful expressions. The closed-form expressions are co...

29 citations

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the relationship between ambient UVR exposure and subtype-specific non-Hodgkin lymphoma (NHL) incidence for whites, Hispanics and blacks in the United States for years 2001-2010.
Abstract: Associations between ultraviolet radiation (UVR) exposure and non-Hodgkin lymphoma (NHL) have been inconsistent, but few studies have examined these associations for specific subtypes or across race/ethnicities. We evaluated the relationship between ambient UVR exposure and subtype-specific NHL incidence for whites, Hispanics and blacks in the United States for years 2001-2010 (n = 187,778 cases). Incidence rate ratios (IRRs) and 95% confidence intervals (CIs) were calculated for UVR quintiles using Poisson regression. Incidence was lower for the highest UVR quintile for chronic/small lymphocytic/leukemia (CLL/SLL) (IRR = 0.87, 95% CI: 0.77-0.97), mantle cell (IRR = 0.82, 95% CI: 0.69-0.97), lymphoplasmacytic (IRR = 0.58, 95% CI: 0.42-0.80), mucosa-associated lymphoid tissue (MZLMALT) (IRR = 0.74, 95% CI: 0.60-0.90), follicular (FL) (IRR = 0.76, 95% CI: 0.68-0.86), diffuse large B-cell (DLBCL) (IRR = 0.84, 95% CI: 0.76-0.94), peripheral T-cell other (PTCL) (IRR = 0.76, 95% CI: 0.61-0.95) and PTCL not otherwise specified (PNOS) (IRR = 0.77, 95% CI: 0.61-0.98). Trends were significant for MZLMALT, FL, DLBCL, BNOS and PTCL, with FL and DLBCL still significant after Bonferroni correction. We found interaction by race/ethnicity for CLL/SLL, FL, Burkitt, PNOS and MF/SS, with CLL/SLL and FL still significant after Bonferroni correction. Some B-cell lymphomas (CLL/SLL, FL and Burkitt) suggested significant inverse relationships in whites and Hispanics, but not in blacks. Some T-cell lymphomas suggested the most reduced risk for the highest quintile of UVR among blacks (PNOS and MF/SS), though trends were not significant. These findings strengthen the case for an inverse association of UVR exposure, support modest heterogeneity between NHL subtypes and suggest some differences by race/ethnicity.

26 citations

Journal ArticleDOI
TL;DR: A direct-marginalization approach using a reparameterized link function to model exposure and covariate effects directly on the truncated dependent variable mean is proposed and an alternative average-predicted-value, post-estimation approach which uses model-predictions for each person in a designated reference group under different exposure statuses to estimate covariate-adjusted overall exposure effects is discussed.
Abstract: The Tobit model, also known as a censored regression model to account for left- and/or right-censoring in the dependent variable, has been used in many areas of applications, including dental health, medical research and economics. The reported Tobit model coefficient allows estimation and inference of an exposure effect on the latent dependent variable. However, this model does not directly provide overall exposure effects estimation on the original outcome scale. We propose a direct-marginalization approach using a reparameterized link function to model exposure and covariate effects directly on the truncated dependent variable mean. We also discuss an alternative average-predicted-value, post-estimation approach which uses model-predicted values for each person in a designated reference group under different exposure statuses to estimate covariate-adjusted overall exposure effects. Simulation studies were conducted to show the unbiasedness and robustness properties for both approaches under various scenarios. Robustness appears to diminish when covariates with substantial effects are imbalanced between exposure groups; we outline an approach for model choice based on information criterion fit statistics. The methods are applied to the Genetic Epidemiology Network of Arteriopathy (GENOA) cohort study to assess associations between obesity and cognitive function in the non-Hispanic white participants.

17 citations

References
Journal ArticleDOI
TL;DR: In this article, it is shown that different formulations for the overdispersion mechanism can lead to different variance functions, which can be placed within a general family, and estimation methods are discussed, including maximum likelihood, moment methods, extended quasi-likelihood, pseudo-likelihood and non-parametric maximum likelihood.

463 citations

Journal ArticleDOI
TL;DR: Topiramate is a highly efficacious and generally well tolerated new AED and incremental efficacy in the add-on setting is not observed at topiramate dosages above 600 mg/day; however, higher doses may prove beneficial to individual patients who tolerate them.
Abstract: We conducted a multicenter, double-blind, randomized, parallel, placebo-controlled trial in 190 patients to evaluate the safety and efficacy of three dosages of topiramate (600, 800, and 1,000 mg/day) as adjunctive therapy for patients with refractory partial epilepsy. During an 18-week double-blind treatment period, median percent reductions from baseline in average monthly seizure rates were 1% for placebo, 41% for topiramate 600 mg/day and topiramate 800 mg/day, and 38% for topiramate 1,000 mg/day. There was a 50% or greater reduction from baseline in seizure frequency in 9% of patients in the placebo group and in 44% for topiramate 600 mg/day, 40% for topiramate 800 mg/day, and 38% for topiramate 1,000 mg/day. No placebo patients were improved by 75 to 100% in seizure frequency, whereas 20% of the topiramate patients were improved to this degree. All intent-to-treat drug-placebo comparisons including seizure reduction, percent responders, and investigator and patient global evaluations significantly (p ≤ 0.02) favored topiramate. Treatment-emergent adverse events consisted mainly of neurologic symptoms commonly observed during antiepileptic drug (AED) therapy. Sixteen percent of patients on topiramate discontinued therapy due to adverse events. Results of this study indicate that topiramate is a highly efficacious and generally well tolerated new AED. When large groups of patients are compared, incremental efficacy in the add-on setting is not observed at topiramate dosages above 600 mg/day; however, higher doses may prove beneficial to individual patients who tolerate them.

340 citations

Journal ArticleDOI
TL;DR: In this article, a likelihood-based method for analysing correlated binary responses based on a multivariate model is discussed, which is related to the pseudo-maximum likelihood approach suggested recently by Zhao & Prentice (1990).
Abstract: In this paper, we discuss a likelihood-based method for analysing correlated binary responses based on a multivariate model. It is related to the pseudo-maximum likelihood approach suggested recently by Zhao & Prentice (1990). Their parameterization results in a simple pairwise model, in which the association between responses is modelled in terms of correlations, while the present paper uses conditional log odds-ratios. With this approach, higher-order associations can be incorporated in a natural way. One important advantage of this parameterization is that the maximum likelihood estimates of the marginal mean parameters are robust to misspecification of the time dependence. We describe an iterative two-stage procedure for obtaining the maximum likelihood estimates. Two examples are presented to illustrate this methodology.

332 citations


"A combined overdispersed and margin..." refers methods in this paper

  • ...It is therefore sensible to turn to the concepts laid out in Heagerty (1999) and Heagerty and Zeger (2000), who proposed a so-called marginally specified logistic-normal model for longitudinal binary data, combining the strength of GEE and GLMM....


  • ...This model, called the marginalized multilevel model (MMM), also enjoys the advantages of utilizing likelihood-based estimation, such as the availability of expressions for the full probability distribution of the response (Fitzmaurice and Laird, 1993; Molenberghs and Lesaffre, 1994), and produces valid inferences when data are missing at random (MAR), among others....


Journal ArticleDOI
TL;DR: It is suggested that nearly 1.2 million people in the UK have a fungal nail infection and the majority had not sought medical advice, although over 80% stated that they would do so if they were aware that their nail disorder was of fungal origin.
Abstract: A computer omnibus survey to determine the prevalence of onychomycosis in the United Kingdom was carried out in the early part of 1990. A total population of 9332 adults, aged 16 years and over, was interviewed face-to-face, and a questionnaire completed, which consisted of questions and photographs of various nail dystrophies, including onychomycosis. The results in the population surveyed revealed a prevalence of dermatophyte nail infection of 2.8% in men and 2.6% in women. In the group aged 16-34 years, the prevalence rate was 1.3%; this increased to 2.4% in the group aged 35-50 years, and to 4.7% in those aged 55 years or over. Of those found to have onychomycosis, 27% had sought advice from a chiropodist and less than 12% had consulted a specialist. These results suggest that nearly 1.2 million people in the UK have a fungal nail infection and the majority had not sought medical advice, although over 80% stated that they would do so if they were aware that their nail disorder was of fungal origin. A similar proportion would wish to be treated if an effective treatment was available.

328 citations

Journal ArticleDOI
TL;DR: In this paper, an extension of the bivariate model suggested by Dale is proposed for the analysis of dependent ordinal categorical data, which is constructed by first generalizing the bivariate Plackett distribution to any dimensions.
Abstract: An extension of the bivariate model suggested by Dale is proposed for the analysis of dependent ordinal categorical data. The so-called multivariate Dale model is constructed by first generalizing the bivariate Plackett distribution to any dimensions. Because the approach is likelihood based, it satisfies properties that are not fulfilled by other popular methods, such as the generalized estimating equations approach. The proposed method models both the marginal and the association structure in a flexible way. The attractiveness of the multivariate Dale model is illustrated in three key examples, covering areas such as crossover trials, longitudinal studies with patients dropping out from the study, and discriminant analysis applications. The differences and similarities with the generalized estimating approach are highlighted.

286 citations