
Showing papers on "Random effects model" published in 2020


Journal ArticleDOI
TL;DR: This paper studies statistical inference in the increasingly popular two-sample summary-data Mendelian randomization design and finds strong evidence of both systematic and idiosyncratic pleiotropy in MR, echoing recent discoveries in statistical genetics.
Abstract: Mendelian randomization (MR) is a method of exploiting genetic variation to unbiasedly estimate a causal effect in the presence of unmeasured confounding. MR is widely used in epidemiology and other related areas of population science. In this paper, we study statistical inference in the increasingly popular two-sample summary-data MR design. We show that a linear model for the observed associations approximately holds in a wide variety of settings when all the genetic variants satisfy the exclusion restriction assumption, or in genetic terms, when there is no pleiotropy. In this scenario, we derive a maximum profile likelihood estimator with provable consistency and asymptotic normality. However, through analyzing real datasets, we find strong evidence of both systematic and idiosyncratic pleiotropy in MR, echoing the omnigenic model of complex traits recently proposed in genetics. We model the systematic pleiotropy with a random effects model, where no genetic variant satisfies the exclusion restriction condition exactly. In this case, we propose a consistent and asymptotically normal estimator by adjusting the profile score. We then tackle the idiosyncratic pleiotropy by robustifying the adjusted profile score. We demonstrate the robustness and efficiency of the proposed methods using several simulated and real datasets.
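In the notation assumed here (not quoted from the paper), the modelling setup can be summarised as follows: for each genetic variant j, the estimated variant-exposure and variant-outcome associations satisfy

$$ \hat{\gamma}_j \sim N(\gamma_j, \sigma_{X_j}^2), \qquad \hat{\Gamma}_j \sim N(\beta \gamma_j + \alpha_j, \sigma_{Y_j}^2), $$

with $\alpha_j = 0$ for all $j$ when there is no pleiotropy, $\alpha_j \sim N(0, \tau^2)$ under the random effects model for systematic pleiotropy, and a few outlying $\alpha_j$ representing the idiosyncratic pleiotropy that the robustified adjusted profile score is designed to down-weight.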

289 citations


Journal ArticleDOI
TL;DR: By providing coded examples using integrated nested Laplace approximations and Template Model Builder for Bayesian and frequentist analysis via the R packages R-INLA and glmmTMB, the authors aim to make efficient estimation of RSFs and SSFs with random effects accessible to anyone in the field.
Abstract: Popular frameworks for studying habitat selection include resource-selection functions (RSFs) and step-selection functions (SSFs), estimated using logistic and conditional logistic regression, respectively. Both frameworks compare environmental covariates associated with locations animals visit with environmental covariates at a set of locations assumed available to the animals. Conceptually, slopes that vary by individual, that is, random coefficient models, could be used to accommodate inter-individual heterogeneity with either approach. While fitting such models for RSFs is possible with standard software for generalized linear mixed-effects models (GLMMs), straightforward and efficient one-step procedures for fitting SSFs with random coefficients are currently lacking. To close this gap, we take advantage of the fact that the conditional logistic regression model (i.e. the SSF) is likelihood-equivalent to a Poisson model with stratum-specific fixed intercepts. By interpreting the intercepts as a random effect with a large (fixed) variance, inference for random-slope models becomes feasible with standard Bayesian techniques, or with frequentist methods that allow one to fix the variance of a random effect. We compare this approach to other commonly applied alternatives, including models without random slopes and mixed conditional regression models fit using a two-step algorithm. Using data from mountain goats (Oreamnos americanus) and Eurasian otters (Lutra lutra), we illustrate that our models lead to valid and feasible inference. In addition, we conduct a simulation study to compare different estimation approaches for SSFs and to demonstrate the importance of including individual-specific slopes when estimating individual- and population-level habitat-selection parameters. By providing coded examples using integrated nested Laplace approximations (INLA) and Template Model Builder (TMB) for Bayesian and frequentist analysis via the R packages R-INLA and glmmTMB, we hope to make efficient estimation of RSFs and SSFs with random effects accessible to anyone in the field. SSFs with individual-specific coefficients are particularly attractive since they can provide insights into movement and habitat-selection processes at fine spatial and temporal scales, but these models had previously been very challenging to fit.
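A minimal sketch of the fixed-variance trick the abstract describes, using glmmTMB (variable names are placeholders, and the exact call should be checked against the paper's supplementary code): the stratum-specific intercepts enter as a random effect whose variance is fixed at a large value rather than estimated.

```r
library(glmmTMB)

# Placeholder data: 'case' is the used/available indicator, 'elev' a covariate,
# 'step_id' the stratum (matched step) and 'id' the individual.
fit <- glmmTMB(case ~ elev + (1 | step_id) + (0 + elev | id),
               family = poisson, data = ssf_dat,
               map   = list(theta = factor(c(NA, 1))),   # do not estimate the first variance component
               start = list(theta = c(log(1e3), 0)))     # fix stratum SD at 1000 (variance 10^6)
summary(fit)
```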

147 citations


Journal ArticleDOI
TL;DR: This work focuses on 2 concerns, namely: (a) the concern about random effects versus fixed effects, which is central in the (micro)econometrics/sociology literature; and (b) the concern about grand mean versus group (or person) mean centering, which is central in the multilevel literature associated with disciplines like psychology and educational sciences.
Abstract: In many disciplines researchers use longitudinal panel data to investigate the potentially causal relationship between 2 variables. However, the conventions and concerns vary widely across disciplines. Here we focus on 2 concerns, that is: (a) the concern about random effects versus fixed effects, which is central in the (micro)econometrics/sociology literature; and (b) the concern about grand mean versus group (or person) mean centering, which is central in the multilevel literature associated with disciplines like psychology and educational sciences. We show that these 2 concerns are actually addressing the same underlying issue. We discuss diverse modeling methods based on either multilevel regression modeling with the data in long format, or structural equation modeling with the data in wide format, and compare these approaches with simulated data. We extend the multilevel model with random slopes and discuss the consequences of this. Subsequently, we provide guidelines on how to choose between the diverse modeling options. We illustrate the use of these guidelines with an empirical example based on intensive longitudinal data, in which we consider both a time-varying and a time-invariant covariate. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
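One way to see why the two concerns coincide is the standard within-between formulation (notation assumed here, not quoted from the paper): group-mean centering a time-varying predictor separates its within- and between-person effects,

$$ y_{it} = \beta_0 + \beta_W (x_{it} - \bar{x}_i) + \beta_B \bar{x}_i + u_i + e_{it}, $$

where $u_i$ is a random person effect. The within coefficient $\beta_W$ reproduces the fixed effects estimator, while constraining $\beta_W = \beta_B$ recovers the conventional random effects model, so the centering choice and the fixed-versus-random choice address the same confounding of within- and between-person variance.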

108 citations


Journal ArticleDOI
TL;DR: This tutorial illustrates how frailties induce selection of healthier individuals among survivors, and shows how shared frailty models can be used to model positively dependent survival outcomes in clustered data.
Abstract: The hazard function plays a central role in survival analysis. In a homogeneous population, the distribution of the time to event, described by the hazard, is the same for each individual. Heterogeneity in the distributions can be accounted for by including covariates in a model for the hazard, for instance a proportional hazards model. In this model, individuals with the same value of the covariates will have the same distribution. It is natural to think that not all covariates that are thought to influence the distribution of the survival outcome are included in the model. This implies that there is unobserved heterogeneity; individuals with the same value of the covariates may have different distributions. One way of accounting for this unobserved heterogeneity is to include random effects in the model. In the context of hazard models for time to event outcomes, such random effects are called frailties, and the resulting models are called frailty models. In this tutorial, we study frailty models for survival outcomes. We illustrate how frailties induce selection of healthier individuals among survivors, and show how shared frailties can be used to model positively dependent survival outcomes in clustered data. The Laplace transform of the frailty distribution plays a central role in relating the hazards, conditional on the frailty, to hazards and survival functions observed in a population. Available software, mainly in R, will be discussed, and the use of frailty models is illustrated in two different applications, one on center effects and the other on recurrent events.
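The central relation mentioned in the abstract can be written compactly in standard frailty notation (assumed here): with frailty $Z$, conditional hazard $h(t \mid Z) = Z\, h_0(t)\, e^{x^\top\beta}$ and cumulative baseline hazard $H_0(t)$, the marginal (population) survival function is

$$ S(t) = E\left[\exp\{-Z H_0(t)\, e^{x^\top\beta}\}\right] = \mathcal{L}\left(H_0(t)\, e^{x^\top\beta}\right), $$

where $\mathcal{L}$ is the Laplace transform of the frailty distribution; differentiating gives the marginal hazard, which shows how the selective survival of low-frailty individuals distorts the hazard observed in the population.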

99 citations


Journal ArticleDOI
TL;DR: The general and specific tools used in this study indicated moderate and poor QOL, respectively, and it is necessary to carry out periodic QOL measurements using appropriate tools as part of the general care of CHF patients.
Abstract: Despite various individual studies on the quality of life (QOL) in patients with CHF, a comprehensive study has not yet been conducted; therefore, this study aims to assess the QOL of CHF patients. In the present systematic review and meta-analysis, the PubMed, Scopus, and Web of Science databases were searched from January 1, 2000, to December 31, 2018, using QOL and heart failure as keywords. The searches, screenings, quality assessments, and data extractions were conducted separately by two researchers. A total of 70 studies including 25,180 participants entered the final stage. The mean QOL score on disease-specific instruments, pooled with a random effects model across 40 studies of 12,520 patients, was 44.1 (95% confidence interval (CI) 40.6, 47.5; I2 = 99.3%). Moreover, by geographical region, heart failure patients in the Americas had higher scores. In the 14 studies in which the generic SF-36 survey was implemented, the average physical component score (PCS) and mental component score (MCS) were 33.3 (95% CI 31.9, 34.7; I2 = 88.0%) and 50.6 (95% CI 43.8, 57.4; I2 = 99.3%), respectively. The general and specific tools used in this study indicated moderate and poor QOL, respectively. Therefore, it is necessary to carry out periodic QOL measurements using appropriate tools as part of the general care of CHF patients.

77 citations


Posted ContentDOI
26 Jul 2020-bioRxiv
TL;DR: partR2 is introduced, an R package that quantifies part R2 for fixed-effect predictors based on (generalized) linear mixed-effects model fits and implements parametric bootstrapping to quantify confidence intervals for each estimate.
Abstract: The coefficient of determination R2 quantifies the amount of variance explained by regression coefficients in a linear model. It can be seen as the fixed-effects complement to the repeatability R (intra-class correlation) for the variance explained by random effects and thus as a tool for variance decomposition. The R2 of a model can be further partitioned into the variance explained by a particular predictor or a combination of predictors using semi-partial (part) R2 and structure coefficients, but this is rarely done due to a lack of software implementing these statistics. Here, we introduce partR2, an R package that quantifies part R2 for fixed-effect predictors based on (generalized) linear mixed-effects model fits. The package iteratively removes predictors of interest and monitors the change in R2 as a measure of the amount of variance explained uniquely by a particular predictor or a set of predictors. partR2 also estimates structure coefficients as the correlation between a predictor and fitted values, which provide an estimate of the total contribution of a fixed effect to the overall prediction, independent of other predictors. Structure coefficients are converted to the total variance explained by a predictor, termed ‘inclusive’ R2, as the square of the structure coefficients times total R2. Furthermore, the package reports beta weights (standardized regression coefficients). Finally, partR2 implements parametric bootstrapping to quantify confidence intervals for each estimate. We illustrate the use of partR2 with real example datasets for Gaussian and binomial GLMMs and discuss interactions, which pose a specific challenge for partitioning the explained variance among predictors.
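A brief usage sketch of the workflow the abstract describes (dataset and variable names are placeholders; argument names follow my reading of the package and should be checked against ?partR2):

```r
library(lme4)
library(partR2)

# Placeholder model: a Gaussian mixed model with two fixed-effect predictors
mod <- lmer(mass ~ temperature + rainfall + (1 | population), data = dat)

# Part (semi-partial) R2, structure coefficients and beta weights,
# with parametric-bootstrap confidence intervals
res <- partR2(mod, partvars = c("temperature", "rainfall"), nboot = 100)
summary(res)
```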

70 citations


Journal ArticleDOI
TL;DR: Results showed that four- or five-level or cross-classified random effects models are not often used, although they might account better for the meta-analytic data structure of the analyzed datasets. It was also found that the simulation studies done on multilevel meta-analysis with multiple random factors could have used more realistic simulation conditions.
Abstract: In meta-analysis, study participants are nested within studies, leading to a multilevel data structure. The traditional random effects model can be considered as a model with a random study effect, but additional random effects can be added in order to account for dependent effects sizes within or across studies. The goal of this systematic review is three-fold. First, we will describe how multilevel models with multiple random effects (i.e., hierarchical three-, four-, five-level models and cross-classified random effects models) are applied in meta-analysis. Second, we will illustrate how in some specific three-level meta-analyses, a more sophisticated model could have been used to deal with additional dependencies in the data. Third and last, we will describe the distribution of the characteristics of multilevel meta-analyses (e.g., distribution of the number of outcomes across studies or which dependencies are typically modeled) so that future simulation studies can simulate more realistic conditions. Results showed that four- or five-level or cross-classified random effects models are not often used although they might account better for the meta-analytic data structure of the analyzed datasets. Also, we found that the simulation studies done on multilevel meta-analysis with multiple random factors could have used more realistic simulation factor conditions. The implications of these results are discussed, and further suggestions are given.
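For context, a hedged sketch of the three-level structure the review discusses, here expressed with the metafor package (which the abstract does not name; identifiers are placeholders): effect sizes are nested within studies, with a random effect at each level.

```r
library(metafor)

# yi: effect sizes, vi: sampling variances, study and es_id: identifiers (placeholders)
fit3 <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)
summary(fit3)

# a cross-classified alternative: effects nested in studies and crossed with outcome type
fitcc <- rma.mv(yi, vi, random = list(~ 1 | study, ~ 1 | outcome), data = dat)
```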

63 citations


Journal ArticleDOI
TL;DR: A better fit is achieved than a random effects NMA, uncertainty is substantially reduced by explaining within‐ and between‐study variation, and estimates are more interpretable.
Abstract: Standard network meta-analysis (NMA) and indirect comparisons combine aggregate data from multiple studies on treatments of interest, assuming that any effect modifiers are balanced across populations. Population adjustment methods relax this assumption using individual patient data from one or more studies. However, current matching-adjusted indirect comparison and simulated treatment comparison methods are limited to pairwise indirect comparisons and cannot predict into a specified target population. Existing meta-regression approaches incur aggregation bias. We propose a new method extending the standard NMA framework. An individual level regression model is defined, and aggregate data are fitted by integrating over the covariate distribution to form the likelihood. Motivated by the complexity of the closed form integration, we propose a general numerical approach using quasi-Monte-Carlo integration. Covariate correlation structures are accounted for by using copulas. Crucially for decision making, comparisons may be provided in any target population with a given covariate distribution. We illustrate the method with a network of plaque psoriasis treatments. Estimated population-average treatment effects are similar across study populations, as differences in the distributions of effect modifiers are small. A better fit is achieved than a random effects NMA, uncertainty is substantially reduced by explaining within- and between-study variation, and estimates are more interpretable.
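The core construction can be summarised as follows (notation assumed here): with an individual-level linear predictor $\eta_{jk}(x)$ for treatment $k$ in study $j$ and inverse link $g^{-1}$, the aggregate-level mean outcome is obtained by integrating over the study's covariate distribution $f_j(x)$,

$$ \bar{p}_{jk} = \int g^{-1}\{\eta_{jk}(x)\}\, f_j(x)\, dx \;\approx\; \frac{1}{N}\sum_{i=1}^{N} g^{-1}\{\eta_{jk}(\tilde{x}_i)\}, $$

where the $\tilde{x}_i$ are quasi-Monte-Carlo points drawn from $f_j$ (with correlations between covariates induced via a copula). This average enters the aggregate-data likelihood, and predictions for any target population follow by integrating over that population's covariate distribution instead.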

54 citations


Journal ArticleDOI
12 Aug 2020-PeerJ
TL;DR: The intention is to reveal potential perils and pitfalls in mixed model estimation so that researchers can use these powerful approaches with greater awareness and confidence.
Abstract: Biological systems, at all scales of organisation from nucleic acids to ecosystems, are inherently complex and variable. Biologists therefore use statistical analyses to detect signal among this systemic noise. Statistical models infer trends, find functional relationships and detect differences that exist among groups or are caused by experimental manipulations. They also use statistical relationships to help predict uncertain futures. All branches of the biological sciences now embrace the possibilities of mixed-effects modelling and its flexible toolkit for partitioning noise and signal. The mixed-effects model is not, however, a panacea for poor experimental design, and should be used with caution when inferring or deducing the importance of both fixed and random effects. Here we describe a selection of the perils and pitfalls that are widespread in the biological literature, but can be avoided by careful reflection, modelling and model-checking. We focus on situations where incautious modelling risks exposure to these pitfalls and the drawing of incorrect conclusions. Our stance is that statements of significance, information content or credibility all have their place in biological research, as long as these statements are cautious and well-informed by checks on the validity of assumptions. Our intention is to reveal potential perils and pitfalls in mixed model estimation so that researchers can use these powerful approaches with greater awareness and confidence. Our examples are ecological, but translate easily to all branches of biology.

54 citations



Posted ContentDOI
29 Jan 2020-bioRxiv
TL;DR: LimeTr, as discussed by the authors, is an open-source Python package that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the associated marginal likelihood.
Abstract: Mixed effects (ME) models inform a vast array of problems in the physical and social sciences, and are pervasive in meta-analysis. We consider ME models where the random effects component is linear. We then develop an efficient approach for a broad problem class that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the associated marginal likelihood. The software accompanying this paper is disseminated as an open-source Python package called LimeTr. LimeTr is able to recover results more accurately in the presence of outliers compared to available packages for both standard longitudinal analysis and meta-analysis, and is also more computationally efficient than competing robust alternatives. Supplementary materials that reproduce the simulations, as well as run LimeTr and third-party code, are available online. We also present analyses of global health data, where we use advanced functionality of LimeTr, including constraints to impose monotonicity and concavity for dose-response relationships. Nonlinear observation models allow new analyses in place of classic approximations, such as log-linear models. Robust extensions in all analyses ensure that spurious data points do not drive our understanding of either mean relationships or between-study heterogeneity.

Journal ArticleDOI
TL;DR: This work investigates the problem more closely, provides some guidance on prior specification in the normal-normal hierarchical model, and suggests the use of weakly informative priors for the heterogeneity parameter.
Abstract: The normal-normal hierarchical model (NNHM) constitutes a simple and widely used framework for meta-analysis. In the common case of only few studies contributing to the meta-analysis, standard approaches to inference tend to perform poorly, and Bayesian meta-analysis has been suggested as a potential solution. The Bayesian approach, however, requires the sensible specification of prior distributions. While non-informative priors are commonly used for the overall mean effect, the use of weakly informative priors has been suggested for the heterogeneity parameter, in particular in the setting of (very) few studies. To date, however, a consensus on how to generally specify a weakly informative heterogeneity prior is lacking. Here we investigate the problem more closely and provide some guidance on prior specification.
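As one concrete illustration of a weakly informative heterogeneity prior of the kind discussed here (the bayesmeta package is used for illustration only and is not prescribed by the abstract; argument names should be checked against its documentation):

```r
library(bayesmeta)

# y: study-level effect estimates, sigma: their standard errors (placeholders)
fit <- bayesmeta(y = y, sigma = sigma,
                 mu.prior.mean = 0, mu.prior.sd = 4,                   # vague prior for the mean effect
                 tau.prior = function(t) dhalfnormal(t, scale = 0.5))  # weakly informative heterogeneity prior
fit$summary
```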

Journal ArticleDOI
TL;DR: This article proposed a penalized maximum likelihood fixed effects (PML-FE) estimator, which retains the complete sample by providing finite estimates of the fixed effects for each unit and explored the small sample performance of PML-FE versus common alternatives via Monte Carlo simulations, evaluating the accuracy of both parameter and effects estimates.
Abstract: Most agree that models of binary time-series-cross-sectional data in political science often possess unobserved unit-level heterogeneity. Despite this, there is no clear consensus on how best to account for these potential unit effects, with many of the issues confronted seemingly misunderstood. For example, one oft-discussed concern with rare events data is the elimination of no-event units from the sample when estimating fixed effects models. Many argue that this is a reason to eschew fixed effects in favor of pooled or random effects models. We revisit this issue and clarify that the main concern with fixed effects models of rare events data is not inaccurate or inefficient coefficient estimation, but instead biased marginal effects. In short, only evaluating event-experiencing units gives an inaccurate estimate of the baseline risk, yielding inaccurate (often inflated) estimates of predictor effects. As a solution, we propose a penalized maximum likelihood fixed effects (PML-FE) estimator, which retains the complete sample by providing finite estimates of the fixed effects for each unit. We explore the small sample performance of PML-FE versus common alternatives via Monte Carlo simulations, evaluating the accuracy of both parameter and effects estimates. Finally, we illustrate our method with a model of civil war onset.
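A rough sketch of the general idea behind PML-FE, i.e. a Firth-type penalized logit with unit dummies, using the logistf package (an illustrative approximation, not the authors' estimator; data and variable names are placeholders):

```r
library(logistf)

# y: binary event indicator, x1, x2: predictors, unit: panel unit identifier (placeholders)
fit <- logistf(y ~ x1 + x2 + factor(unit), data = panel_dat)
summary(fit)
```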

Journal ArticleDOI
TL;DR: Using carefully selected, data-driven transformations can improve small area estimation; this paper proposes to tackle the potential lack of validity of the model assumptions by using data-driven scaled transformations as opposed to ad hoc chosen transformations.
Abstract: Small area models typically depend on the validity of model assumptions. For example, a commonly used version of the Empirical Best Predictor relies on the Gaussian assumptions of the error terms of the linear mixed model, a feature rarely observed in applications with real data. The present paper proposes to tackle the potential lack of validity of the model assumptions by using data-driven scaled transformations as opposed to ad-hoc chosen transformations. Different types of transformations are explored, the estimation of the transformation parameters is studied in detail under a linear mixed model and transformations are used in small area prediction of linear and non-linear parameters. The use of scaled transformations is crucial as it allows for fitting the linear mixed model with standard software and hence it simplifies the work of the data analyst. Mean squared error estimation that accounts for the uncertainty due to the estimation of the transformation parameters is explored using parametric and semi-parametric (wild) bootstrap. The proposed methods are illustrated using real survey and census data for estimating income deprivation parameters for municipalities in the Mexican state of Guerrero. Extensive simulation studies and the results from the application show that using carefully selected, data driven transformations can improve small area estimation.
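For concreteness, one family of data-driven transformations of the kind studied here is the Box-Cox family (a standard example, assumed rather than quoted from the paper): the dependent variable is transformed as

$$ y^{(\lambda)} = \frac{y^{\lambda}-1}{\lambda} \ \text{ for } \lambda \neq 0, \qquad y^{(0)} = \log y, $$

with $\lambda$ estimated from the data, e.g. by maximising the profile likelihood of the linear mixed model; as the abstract notes, the scaling of the transformation is what allows the transformed model to be fitted with standard mixed-model software.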

Journal ArticleDOI
TL;DR: In this article, the first generalized height-diameter model for Romania was developed, using three stand predictors as measures of stand vertical structure, density and competition; the best calibration design increased prediction accuracy by 50 cm compared to the fixed effects prediction.

Journal ArticleDOI
TL;DR: In this article, weakly informative priors (WIPs) are used for the treatment effect parameter of a Bayesian meta-analysis model, which may also be seen as a form of penalization.
Abstract: Meta-analyses of clinical trials targeting rare events face particular challenges when the data lack adequate numbers of events for all treatment arms. Especially when the number of studies is low, standard random-effects meta-analysis methods can lead to serious distortions because of such data sparsity. To overcome this, we suggest the use of weakly informative priors (WIPs) for the treatment effect parameter of a Bayesian meta-analysis model, which may also be seen as a form of penalization. As a data model, we use a binomial-normal hierarchical model (BNHM) that does not require continuity corrections in case of zero counts in one or both arms. We suggest a normal prior for the log-odds ratio with mean 0 and standard deviation 2.82, which is motivated (a) as a symmetric prior centered around unity and constraining the odds ratio within a range from 1/250 to 250 with 95% probability and (b) as consistent with empirically observed effect estimates from a set of 37 773 meta-analyses from the Cochrane Database of Systematic Reviews. In a simulation study with rare events and few studies, our BNHM with a WIP outperformed a Bayesian method without a WIP and a maximum likelihood estimator in terms of smaller bias and shorter interval estimates with similar coverage. Furthermore, the methods are illustrated by a systematic review in immunosuppression of rare safety events following pediatric transplantation. A publicly available R package, MetaStan, is developed to automate a Bayesian implementation of meta-analysis models using WIPs.
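The stated motivation for the prior scale can be verified directly: a normal prior with mean 0 and standard deviation 2.82 on the log odds ratio places 95% of its mass roughly between odds ratios of 1/250 and 250.

```r
# 95% central interval of the odds ratio implied by log(OR) ~ N(0, 2.82^2)
exp(qnorm(c(0.025, 0.975), mean = 0, sd = 2.82))
# approximately 0.004 and 252, i.e. roughly 1/250 to 250
```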

Journal ArticleDOI
TL;DR: Results of the evaluation of two popular frequentist methods and a Bayesian approach using two different prior specifications are presented; multivariate meta-regression or Bayesian estimation using a half-normal prior scaled to 0.5 seems promising with respect to the evaluated performance measures in network meta-analysis of sparse networks.
Abstract: The performance of statistical methods is often evaluated by means of simulation studies. In case of network meta-analysis of binary data, however, simulations are not currently available for many practically relevant settings. We perform a simulation study for sparse networks of trials under between-trial heterogeneity and including multi-arm trials. Results of the evaluation of two popular frequentist methods and a Bayesian approach using two different prior specifications are presented. Methods are evaluated using coverage, width of intervals, bias, and root mean squared error (RMSE). In addition, deviations from the theoretical surface under the cumulative rankings (SUCRAs) or P-scores of the treatments are evaluated. Under low heterogeneity and when a large number of trials informs the contrasts, all methods perform well with respect to the evaluated performance measures. Coverage is observed to be generally higher for the Bayesian than the frequentist methods. The width of credible intervals is larger than those of confidence intervals and is increasing when using a flatter prior for between-trial heterogeneity. Bias was generally small, but increased with heterogeneity, especially in netmeta. In some scenarios, the direction of bias differed between frequentist and Bayesian methods. The RMSE was comparable between methods but larger in indirectly than in directly estimated treatment effects. The deviation of the SUCRAs or P-scores from their theoretical values was mostly comparable over the methods but differed depending on the heterogeneity and the geometry of the investigated network. Multivariate meta-regression or Bayesian estimation using a half-normal prior scaled to 0.5 seems to be promising with respect to the evaluated performance measures in network meta-analysis of sparse networks.

Journal ArticleDOI
TL;DR: This study provides a framework for engineers and researchers to identify spatiotemporal patterns of crashes and explore the factors affecting pedestrian-injury severities, especially in existing crash-prone areas.

Journal ArticleDOI
TL;DR: A Bayesian spatiotemporal random effects (pure) model of relative dengue disease risk estimated by integrated nested Laplace approximation is presented and it is found that every district has a different temporal pattern, indicating that district characteristics influence the temporal variation across space.
Abstract: Dengue disease has serious health and socio-economic consequences. Mapping its occurrence at a fine spatiotemporal scale is a crucial element in the preparation of an early warning system for the prevention and control of dengue and other viral diseases. This paper presents a Bayesian spatiotemporal random effects (pure) model of relative dengue disease risk estimated by integrated nested Laplace approximation. Continuous isopleth mapping based on inverse distance weighting is applied to visualize the disease’s geographical evolution. The model is applied to data for 30 districts in the city of Bandung, Indonesia, for the period January 2009 to December 2016. We compared the Poisson and the negative binomial distributions for the number of dengue cases, both combined with a model which included structured and unstructured spatial and temporal random effects and their interactions. Using several Bayesian and classical model performance criteria and stepwise backward selection, we chose the negative binomial distribution and the temporal model with spatiotemporal interaction for forecasting. The estimation results show that the relative risk decreased generally from 2014. However, it consistently increased in the north-western districts because of environmental and socio-economic conditions. We also found that every district has a different temporal pattern, indicating that district characteristics influence the temporal variation across space.
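A hedged sketch of this type of spatiotemporal random-effects (pure) model in R-INLA syntax (the index variables, adjacency graph and expected counts are placeholders; the paper's exact specification may differ):

```r
library(INLA)

formula <- cases ~ 1 +
  f(district,   model = "bym", graph = adj_graph) +  # structured + unstructured spatial effects
  f(month,      model = "rw1") +                     # structured temporal effect
  f(month_iid,  model = "iid") +                     # unstructured temporal effect
  f(space_time, model = "iid")                       # space-time interaction

fit <- inla(formula, family = "nbinomial", data = dengue,
            E = expected, control.compute = list(dic = TRUE, waic = TRUE))
summary(fit)
```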

Journal ArticleDOI
TL;DR: A genome-based restricted maximum likelihood method, CORE GREML, is described that estimates the covariance between random effects, a key parameter to estimate, especially when partitioning phenotypic variance by multi-omics layers.
Abstract: As a key variance partitioning tool, linear mixed models (LMMs) using genome-based restricted maximum likelihood (GREML) allow both fixed and random effects. Classic LMMs assume independence between random effects, which can be violated, causing bias. Here we introduce a generalized GREML, named CORE GREML, that explicitly estimates the covariance between random effects. Using extensive simulations, we show that CORE GREML outperforms the conventional GREML, providing variance and covariance estimates free from bias due to correlated random effects. Applying CORE GREML to UK Biobank data, we find, for example, that the transcriptome, imputed using genotype data, explains a significant proportion of phenotypic variance for height (0.15, p-value = 1.5e-283), and that these transcriptomic effects correlate with the genomic effects (genome-transcriptome correlation = 0.35, p-value = 1.2e-14). We conclude that the covariance between random effects is a key parameter for estimation, especially when partitioning phenotypic variance by multi-omics layers. Linear mixed models have bias due to the assumed independence between random effects. Here, the authors describe a genome-based restricted maximum likelihood, CORE GREML, which estimates covariance between random effects. Application to UK Biobank data highlights this as an important parameter for multi-omics analyses of phenotypic variance.

Journal ArticleDOI
TL;DR: In this paper, the issue of spatial confounding between the spatial random effect and the fixed effects in regression analyses has been identified as a concern in the statistical literature, and multiple authors have studied it.
Abstract: The issue of spatial confounding between the spatial random effect and the fixed effects in regression analyses has been identified as a concern in the statistical literature. Multiple authors have...

Journal ArticleDOI
TL;DR: A comprehensive review of Bayesian univariate and multivariate joint models has been undertaken, finding that joint modelling has proved beneficial in producing more accurate dynamic predictions; however, there is a lack of sufficient tools to validate these predictions.
Abstract: In clinical research, there is an increasing interest in joint modelling of longitudinal and time-to-event data, since it reduces bias in parameter estimation and increases the efficiency of statistical inference. Inference and prediction from frequentist approaches to joint models have been extensively reviewed, and due to the recent popularity of data-driven Bayesian approaches, a review of current Bayesian estimation of joint models is useful for drawing recommendations for future research. We have undertaken a comprehensive review of Bayesian univariate and multivariate joint models. We focused on the type of outcomes, model assumptions, association structure, estimation algorithm, dynamic prediction and software implementation. A total of 89 articles have been identified, consisting of 75 methodological and 14 applied articles. The most common approach to modelling the longitudinal and time-to-event outcomes jointly included linear mixed effect models with proportional hazards. A random effect association structure was generally used for linking the two sub-models. Markov Chain Monte Carlo (MCMC) algorithms were commonly used (93% of articles) to estimate the model parameters. Only six articles were primarily focused on dynamic predictions for longitudinal or event-time outcomes. Methodologies for a wide variety of data types have been proposed; however, the research is limited if the association between the two outcomes changes over time, and there is also a lack of methods to determine the association structure in the absence of clinical background knowledge. Joint modelling has proved beneficial in producing more accurate dynamic predictions; however, there is a lack of sufficient tools to validate the predictions.

Journal ArticleDOI
TL;DR: A simulation study was conducted to compare estimator performance; it demonstrates that the IVhet and quality effects estimators, though biased, have the lowest mean squared error.
Abstract: Studies included in meta-analysis can produce results that depart from the true population parameter of interest due to systematic and/or random errors. Synthesis of these results in meta-analysis aims to generate an estimate closer to the true population parameter by minimizing these errors across studies. The inverse variance heterogeneity (IVhet), quality effects and random effects models of meta-analysis all attempt to do this, but there remains controversy around the estimator that best achieves this goal of reducing error. In an attempt to answer this question, a simulation study was conducted to compare estimator performance. Five thousand iterations at 10 different levels of heterogeneity were run, with each iteration generating one meta-analysis. The results demonstrate that the IVhet and quality effects estimators, though biased, have the lowest mean squared error. These estimators also achieved a coverage probability at or above the nominal level (95%), whereas the coverage probability under the random effects estimator significantly declined (<80%) as heterogeneity increased despite a similar confidence interval width. Based on our findings, we would recommend the use of the IVhet and quality effects models and a discontinuation of traditional random effects models currently in use for meta-analysis.
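For reference, the IVhet estimator keeps fixed-effect (inverse variance) weights for the point estimate but inflates the variance for between-study heterogeneity; as commonly presented (stated here as an assumption, not quoted from the paper),

$$ \hat{\theta}_{\mathrm{IVhet}} = \sum_i w_i \hat{\theta}_i, \quad w_i = \frac{1/v_i}{\sum_j 1/v_j}, \qquad \widehat{\operatorname{Var}}(\hat{\theta}_{\mathrm{IVhet}}) = \sum_i w_i^2\,(v_i + \hat{\tau}^2), $$

so, unlike the random effects weights $1/(v_i + \hat{\tau}^2)$, the weights do not depend on the heterogeneity estimate, which is consistent with the coverage behaviour reported above.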

Posted Content
TL;DR: In this article, the authors use dynamic panel data models to generate density forecasts for daily Covid-19 infections for a panel of countries/regions, assuming that the growth rate of active infections can be represented by autoregressive fluctuations around a downward sloping deterministic trend function with a break.
Abstract: We use dynamic panel data models to generate density forecasts for daily Covid-19 infections for a panel of countries/regions. At the core of our model is a specification that assumes that the growth rate of active infections can be represented by autoregressive fluctuations around a downward sloping deterministic trend function with a break. Our fully Bayesian approach allows us to flexibly estimate the cross-sectional distribution of heterogeneous coefficients and then implicitly use this distribution as prior to construct Bayes forecasts for the individual time series. According to our model, there is a lot of uncertainty about the evolution of infection rates, due to parameter uncertainty and the realization of future shocks. We find that over a one-week horizon the empirical coverage frequency of our interval forecasts is close to the nominal credible level. Weekly forecasts from our model are published at https://laurayuliu.com/covid19-panel-forecast/.

Journal ArticleDOI
TL;DR: In this paper, the problem of forecasting a collection of short time series using cross-sectional information in panel data was considered and point predictors using Tweedie's formula for the posterior mean of heterogeneous coefficients under a correlated random effects distribution were constructed.
Abstract: This paper considers the problem of forecasting a collection of short time series using cross‐sectional information in panel data. We construct point predictors using Tweedie's formula for the posterior mean of heterogeneous coefficients under a correlated random effects distribution. This formula utilizes cross‐sectional information to transform the unit‐specific (quasi) maximum likelihood estimator into an approximation of the posterior mean under a prior distribution that equals the population distribution of the random coefficients. We show that the risk of a predictor based on a nonparametric kernel estimate of the Tweedie correction is asymptotically equivalent to the risk of a predictor that treats the correlated random effects distribution as known (ratio optimality). Our empirical Bayes predictor performs well compared to various competitors in a Monte Carlo study. In an empirical application, we use the predictor to forecast revenues for a large panel of bank holding companies and compare forecasts that condition on actual and severely adverse macroeconomic conditions.
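Tweedie's formula, which underlies the predictor described above, has a simple form (standard statement, notation assumed here): if $\hat{\theta}_i \mid \theta_i \sim N(\theta_i, \sigma^2)$ and $p$ denotes the marginal density of $\hat{\theta}_i$ induced by the population distribution of the coefficients, then

$$ E[\theta_i \mid \hat{\theta}_i] = \hat{\theta}_i + \sigma^2 \, \frac{d}{d\hat{\theta}_i} \log p(\hat{\theta}_i), $$

so the posterior-mean correction only requires an estimate of the marginal density of the unit-specific estimates, which is why a nonparametric kernel estimate of the correction can attain ratio optimality.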

Journal ArticleDOI
TL;DR: First-order fixed effects estimates are the reference point for interpreting random effects as intersectional effects in MAIHDA analyses; the random effects alone do not provide meaningful estimates of intersectional advantage or disadvantage and should be combined with the fixed effects estimates to be meaningful.

Journal ArticleDOI
TL;DR: ClusterBootstrap, an R package for the analysis of hierarchical data using generalized linear models with the cluster bootstrap (GLMCB), is introduced; the GLMCB is shown to be a promising alternative to mixed models, and the ClusterBootstrap package an easy-to-use R implementation of the technique.
Abstract: In the analysis of clustered or hierarchical data, a variety of statistical techniques can be applied. Most of these techniques have assumptions that are crucial to the validity of their outcome. Mixed models rely on the correct specification of the random effects structure. Generalized estimating equations are most efficient when the working correlation form is chosen correctly and are not feasible when the within-subject variable is non-factorial. Assumptions and limitations of another common approach, ANOVA for repeated measurements, are even more worrisome: listwise deletion when data are missing, the sphericity assumption, inability to model an unevenly spaced time variable and time-varying covariates, and the limitation to normally distributed dependent variables. This paper introduces ClusterBootstrap, an R package for the analysis of hierarchical data using generalized linear models with the cluster bootstrap (GLMCB). Being a bootstrap method, the technique is relatively assumption-free, and it has already been shown to be comparable, if not superior, to GEE in its performance. The paper has three goals. First, GLMCB will be introduced. Second, there will be an empirical example, using the ClusterBootstrap package for a Gaussian and a dichotomous dependent variable. Third, GLMCB will be compared to mixed models in a Monte Carlo experiment. Although GLMCB can be applied to a multitude of hierarchical data forms, this paper discusses it in the context of the analysis of repeated measurements or longitudinal data. It will become clear that the GLMCB is a promising alternative to mixed models and the ClusterBootstrap package an easy-to-use R implementation of the technique.
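To make the resampling scheme concrete, here is a minimal hand-rolled cluster bootstrap for a GLM (illustrative only; the ClusterBootstrap package wraps this logic with more features, and the data and variable names below are placeholders):

```r
set.seed(1)
B   <- 2000
ids <- unique(dat$subject)                                   # cluster identifiers

boot_coefs <- replicate(B, {
  resampled <- sample(ids, length(ids), replace = TRUE)      # resample whole clusters
  bdat <- do.call(rbind, lapply(resampled, function(i) dat[dat$subject == i, ]))
  coef(glm(y ~ time + treatment, family = binomial, data = bdat))
})

# percentile confidence intervals for each coefficient
apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975))
```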

Journal ArticleDOI
TL;DR: In this paper, random effects (RE) models, which have been widely used to study the contextual effects of structures such as neighborhoods or schools, are considered as applied to age-period-cohort (APC) analysis.
Abstract: Random effects (RE) models have been widely used to study the contextual effects of structures such as neighborhoods or schools. The RE approach has recently been applied to age-period-cohort (APC)...

Journal ArticleDOI
TL;DR: In this paper, covariance pattern mixture models (CPMMs) are proposed to address the convergence problems of GMMs; they circumvent the computational difficulties that can plague GMMs without sacrificing the ability to answer the types of questions commonly asked in empirical studies.
Abstract: Growth mixture models (GMMs) are prevalent for modeling unknown population heterogeneity via distinct latent classes. However, GMMs are riddled with convergence issues, often requiring researchers to atheoretically alter the model with cross-class constraints simply to obtain convergence. We discuss how within-class random effects in GMMs exacerbate convergence issues, even though these random effects rarely help answer typical research questions. That is, latent classes provide a discretization of continuous random effects, so including additional random effects within latent classes can unnecessarily complicate the model. These random effects are commonly included in order to properly specify the marginal covariance; however, random effects are inefficient for patterning a covariance matrix, resulting in estimation issues. Such a goal can be achieved more simply through covariance pattern models, which we extend to the mixture model context in this article (covariance pattern mixture models, or CPMMs). We provide evidence from theory, simulation, and an empirical example showing that employing CPMMs (even if they are misspecified) instead of GMMs can circumvent the computational difficulties that can plague GMMs, without sacrificing the ability to answer the types of questions commonly asked in empirical studies. Our results show the advantages of CPMMs with respect to improved class enumeration and less biased class-specific growth trajectories, in addition to their vastly improved convergence rates. The results also show that constraining the covariance parameters across classes in order to bypass convergence issues with GMMs leads to poor results. An extensive software appendix is included to assist researchers in running CPMMs in Mplus.

Journal ArticleDOI
TL;DR: Compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes, it was shown that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation.
Abstract: Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes with either covariates or both covariates and outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods vary according to whether the incomplete covariate has fixed or random effects and whether there is missingnesss in the outcome variable. We showed that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation. We illustrate our findings with the analysis of LSAC data.