
Showing papers on "Mixed model published in 2022"


Journal ArticleDOI
01 Jan 2022-Neuron
TL;DR: The most widely used methods, such as the t test and ANOVA, do not take data dependence into account and are thus often misused. The authors introduce linear and generalized mixed-effects models that account for data dependence and provide clear instruction on how to recognize when they are needed and how to apply them.

104 citations


Journal ArticleDOI
TL;DR: In this article, a three-variance-component mixed model was integrated with the multi-locus random-SNP-effect mixed linear model (mrMLM) method to establish a new methodological framework, 3VMRMLM, that detects all types of loci and estimates their effects.

32 citations


Journal ArticleDOI
TL;DR: This article analyzed the consequences of treating a grouping variable with 2-8 levels as a fixed or random effect in correctly specified and alternative models (under- or overparametrized models).
Abstract: Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for the non-independence. Many questions around their practical applications are solved but one is still debated: Should we treat a grouping variable with a low number of levels as a random or fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown if this affects statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2-8 levels as a fixed or random effect in correctly specified and alternative models (under- or overparametrized models). We calculated type I error rates and statistical power for all model specifications and quantified the influences of study design on these quantities. We found no influence of model choice on type I error rate and power on the population-level effect (slope) for random intercept-only models. However, with varying intercepts and slopes in the data-generating process, using a random slope and intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the difference between them strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance to obtain correct type I error rates. We encourage starting with a mixed-effects model independent of the number of levels in the grouping variable and switching to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informed choices about study design and data analysis and make ecological inference with mixed-effects models more robust for a small number of levels.

10 citations


Journal ArticleDOI
TL;DR: In this article, a nonlinear mixed-effects crown width (CW) model for natural spruce-fir-broadleaf mixed forest in northeast China was developed to quantify the effects of stand structure and of intra- and inter-specific competition, especially in mixed-species forest.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used data from the Spanish National Forest Inventory to fit new mixed-effects basal area increment (BAI) models for 29 two-species compositions in Spain.

8 citations


Journal ArticleDOI
TL;DR: In this article, the authors used the Richards M1a generalized mixed-effects model to estimate chestnut tree height from tree diameter and stand-level variables.

7 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used Google Street View cars, equipped with high-quality AQ instruments, and measured the concentration of NO2 on every street in Amsterdam and Copenhagen on average seven times over the course of 9 and 16 months, respectively.
Abstract: High-resolution air quality (AQ) maps based on street-by-street measurements have become possible through large-scale mobile measurement campaigns. Such campaigns have produced data-only maps and have been used to produce empirical models [i.e., land use regression (LUR) models]. Assuming that all road segments are measured, we developed a mixed model framework that predicts concentrations by an LUR model, while allowing road segments to deviate from the LUR prediction based on between-segment variation as a random effect. We used Google Street View cars, equipped with high-quality AQ instruments, and measured the concentration of NO2 on every street in Amsterdam (n = 46,664) and Copenhagen (n = 28,499) on average seven times over the course of 9 and 16 months, respectively. We compared the data-only mapping, LUR, and mixed model estimates with measurements from passive samplers (n = 82) and predictions from dispersion models in the same time window as mobile monitoring. In Amsterdam, mixed model estimates correlated rs (Spearman correlation) = 0.85 with external measurements, whereas the data-only approach and LUR model estimates correlated rs = 0.74 and 0.75, respectively. Mixed model estimates also correlated more strongly (rs = 0.65) with the deterministic model predictions than the data-only (rs = 0.50) and LUR model (rs = 0.61) estimates. In Copenhagen, mixed model estimates correlated rs = 0.51 with external model predictions compared to rs = 0.45 and rs = 0.50 for data-only and LUR model, respectively. Correlation increased for 97 locations (rs = 0.65) with more detailed traffic information. This means that the mixed model approach is able to combine the strength of data-only mapping (to show hyperlocal variation) and LUR models by shrinking uncertain concentrations toward the model output.

6 citations
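
The shrinkage described in the abstract above can be illustrated with a random-intercept model. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each road segment's observations equal the LUR prediction plus a segment-level random effect plus measurement noise, and that both variance components are already known (in practice they would be estimated from the data). The function name `shrink_toward_model` and all parameter names are hypothetical.

```python
def shrink_toward_model(segment_obs, model_pred, var_between, var_within):
    """Return a shrunken concentration estimate for one road segment.

    Assumed model (hypothetical notation):
        y_ij = model_pred_i + b_i + e_ij
        b_i  ~ N(0, var_between)   segment-level random effect
        e_ij ~ N(0, var_within)    noise of a single drive-by measurement
    """
    n = len(segment_obs)
    if n == 0:
        # No measurements: fall back to the model prediction alone.
        return model_pred
    obs_mean = sum(segment_obs) / n
    # Weight on the data grows with the number of repeated measurements.
    weight = var_between / (var_between + var_within / n)
    return model_pred + weight * (obs_mean - model_pred)


if __name__ == "__main__":
    # One noisy drive-by is pulled halfway toward the model output.
    print(shrink_toward_model([30.0], 20.0, var_between=4.0, var_within=4.0))
    # Four consistent drive-bys are trusted more.
    print(shrink_toward_model([30.0] * 4, 20.0, var_between=4.0, var_within=4.0))
```

With few or noisy measurements the weight is small and the estimate stays near the model prediction; with many measurements it approaches the segment's own mean, which is the behaviour the abstract attributes to the mixed model.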


Journal ArticleDOI
TL;DR: In this paper, mixed models were used to understand whether the level of competition affected the intensity of women's rugby league match play, and the authors determined that if a repeated-measures analysis of variance (ANOVA) were used for the statistical analysis in the present study, at least 48.7% of the data would have been omitted to meet ANOVA assumptions.
Abstract: PURPOSE: Sport-science research consistently contains repeated measures and imbalanced data sets. This study calls for further adoption of mixed models when analyzing longitudinal sport-science data sets. Mixed models were used to understand whether the level of competition affected the intensity of women's rugby league match play. METHODS: A total of 472 observations were used to compare the mean speed of female rugby league athletes recorded during club-, state-, and international-level competition. As athletes featured in all 3 levels of competition and there were multiple matches within each competition (i.e., repeated measures), the authors demonstrated that mixed models are the appropriate statistical approach for these data. RESULTS: The authors determined that if a repeated-measures analysis of variance (ANOVA) were used for the statistical analysis in the present study, at least 48.7% of the data would have been omitted to meet ANOVA assumptions. Using a mixed model, the authors determined that mean speed recorded during Trans-Tasman Test matches was 73.4 m·min⁻¹, while the mean speeds for National Rugby League Women and State of Origin matches were 77.6 and 81.6 m·min⁻¹, respectively. Random effects of team, athlete, and match all accounted for variations in mean speed, which otherwise could have concealed the main effects of position and level of competition had less flexible ANOVAs been used. CONCLUSION: These data clearly demonstrate the appropriateness of applying mixed models to typical data sets acquired in the professional sport setting. Mixed models should be more readily used within sport science, especially in observational, longitudinal data sets such as movement pattern analyses.

5 citations
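
One way data can be lost under repeated-measures ANOVA is complete-case filtering. The sketch below is a simplification and not the authors' procedure: it assumes that only athletes observed at every level of competition can be retained, and it counts the share of observations that would be dropped. The function name `complete_case_loss` and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def complete_case_loss(observations, required_levels):
    """Fraction of observations dropped by a complete-case filter.

    observations    : list of (athlete_id, competition_level) tuples
    required_levels : set of levels an athlete must appear in to be kept
    """
    levels_by_athlete = defaultdict(set)
    for athlete, level in observations:
        levels_by_athlete[athlete].add(level)

    # Athletes with at least one observation at every required level.
    complete = {
        athlete
        for athlete, levels in levels_by_athlete.items()
        if required_levels <= levels
    }
    kept = sum(1 for athlete, _ in observations if athlete in complete)
    return 1 - kept / len(observations)


if __name__ == "__main__":
    # Hypothetical toy records: only athlete "A" played at all three levels.
    toy = [
        ("A", "club"), ("A", "state"), ("A", "international"),
        ("B", "club"), ("B", "club"),
        ("C", "state"),
    ]
    print(complete_case_loss(toy, {"club", "state", "international"}))
```

A mixed model needs no such filter, because athletes with partial records still contribute to the estimates through the random effects.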



Journal ArticleDOI
TL;DR: In this article, a class of nonlinear mixed-effects models called progression models for repeated measures (PMRMs) is introduced, which, based on a continuous-time extension of the categorical-time parametrization of MMRMs, enables estimation of novel types of treatment effects, including measures of slowing or delay of the time progression of disease.
Abstract: Mixed models for repeated measures (MMRMs) are ubiquitous when analyzing outcomes of clinical trials. However, the linearity of the fixed‐effect structure in these models largely restricts their use to estimating treatment effects that are defined as linear combinations of effects on the outcome scale. In some situations, alternative quantifications of treatment effects may be more appropriate. In progressive diseases, for example, one may want to estimate if a drug has cumulative effects resulting in increasing efficacy over time or whether it slows the time progression of disease. This article introduces a class of nonlinear mixed‐effects models called progression models for repeated measures (PMRMs) that, based on a continuous‐time extension of the categorical‐time parametrization of MMRMs, enables estimation of novel types of treatment effects, including measures of slowing or delay of the time progression of disease. Compared to conventional estimates of treatment effects where the unit matches that of the outcome scale (eg, 2 points benefit on a cognitive scale), the time‐based treatment effects can offer better interpretability and clinical meaningfulness (eg, 6 months delay in progression of cognitive decline). The PMRM class includes conventionally used MMRMs and related models for longitudinal data analysis, as well as variants of previously proposed disease progression models as special cases. The potential of the PMRM framework is illustrated using both simulated and historical data from clinical trials in Alzheimer's disease with different types of artificially simulated treatment effects. Compared to conventional models it is shown that PMRMs can offer substantially increased power to detect disease‐modifying treatment effects where the benefit is increasing with treatment duration.

3 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of generalized estimating equations (GEE) and generalized linear mixed models (GLMM) in neuroscience research and conclude that GEE and GLMM may provide more reliable results when compared to rmANOVA and rmMANOVA.
Abstract: In neuroscience research, longitudinal data are often analyzed using ANOVA and MANOVA for repeated measures (rmANOVA/rmMANOVA). However, these analyses have special requirements: the variances of the differences between all possible pairs of within-subject conditions (i.e., levels of the independent variable) must be equal. They are also limited to fixed repeated time intervals and are sensitive to missing data. In contrast, other models, such as the Generalized Estimating Equations (GEE) and the Generalized Linear Mixed Models (GLMM), suggest another way to think about the data and the studied phenomenon. Instead of forcing the data into the ANOVA assumptions, it is possible to design a flexible/personalized model according to the nature of the dependent variable. We discuss some advantages of GEE and GLMM as alternatives to rmANOVA and rmMANOVA in neuroscience research, including the possibility of using different distributions for the parameters of the dependent variable, a better approach for different time length points, and better adjustment to missing data. We illustrate these advantages by showing a comparison between rmANOVA and GEE in a real example and providing the data and tutorial code to reproduce these analyses in R. We conclude that GEE and GLMM may provide more reliable results when compared to rmANOVA and rmMANOVA in neuroscience research, especially in small sample sizes with unbalanced longitudinal designs with or without missing data.

Journal ArticleDOI
TL;DR: In this paper, a three-stage ensemble model was applied to estimate daily mean air temperature from satellite-based land surface temperature (Ts) over Sweden during 2001-2019, at a high spatial resolution of 1 × 1 km².

Journal ArticleDOI
TL;DR: In this article, the linear mixed effects (LME) model was used for longitudinal studies to examine possible causal factors associated with human health and disease, and the fit and stability of different parameterizations of ANOVA and LME models were compared through simulation.
Abstract: Longitudinal studies are commonly used to examine possible causal factors associated with human health and disease. However, the statistical models, such as two-way ANOVA, often applied in these studies do not appropriately model the experimental design, resulting in biased and imprecise results. Here, we describe the linear mixed effects (LME) model and how to use it for longitudinal studies. We re-analyze a dataset published by Blanton et al. in 2016 that modeled growth trajectories in mice after microbiome implantation from nourished or malnourished children. We compare the fit and stability of different parameterizations of ANOVA and LME models; most models found that the nourished versus malnourished growth trajectories differed significantly. We show through simulation that the results from the two-way ANOVA and LME models are not always consistent. Incorrectly modeling correlated data can result in increased rates of false positives or false negatives, supporting the need to model correlated data correctly. We provide an interactive Shiny App to enable accessible and appropriate analysis of longitudinal data using LME models.

Journal ArticleDOI
TL;DR: In this paper, a latent variable mixed-effects location scale model is developed that combines a longitudinal common factor model and a mixed-effects location scale model to characterize within- and between-person variation in a common factor.
Abstract: A mixed-effects location scale model allows researchers to study within- and between-person variation in repeated measures. Key components of the model include separate variance models to study predictors of the within-person variance, as well as predictors of the between-person variance of a random effect, such as a random intercept. In this paper, a latent variable mixed-effects location scale model is developed that combines a longitudinal common factor model and a mixed-effects location scale model to characterize within- and between-person variation in a common factor. The model is illustrated using daily reports of positive affect and daily stressors for a large sample of adult women.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the impact of misspecification of the random effects structure of the model and found that validity is less than 1.0 (anti-conservative) in almost all situations investigated with the exception of case 1 with two sequences.
Abstract: Stepped wedge cluster randomized trials are often analysed using linear mixed effects models that may include random effects for cluster, time and/or treatment. We investigate the impact of misspecification of the random effects structure of the model. Specifically, we considered two cases of misspecification of the random effects in a cross-sectional stepped wedge cluster randomized trials model – fit a linear mixed effects model with random time effects but the true model includes random treatment effects (case 1) or fit a linear mixed effects model with random treatment effect but the true model includes random time effects (case 2) – and derived the variance of the estimated treatment effect under misspecification. We defined two measures of the effect of misspecification: validity and efficiency. Validity is the ratio of the model-based variance of the treatment effect from the mis-specified model divided by the true variance of the treatment effect from the mis-specified model (based on a sandwich estimate of the variance). Efficiency is the ratio of the model-based variance of the treatment effect from the correctly specified model divided by the true variance of the treatment effect from the mis-specified model. We found that validity is less than 1.0 (anti-conservative) in almost all situations investigated with the exception of case 1 with two sequences, when validity could be greater than 1.0. Efficiency is less than 1 in all cases and depends on the intracluster correlation coefficient, the relative magnitude of the variance of the misclassified variance component, and the number of sequences. In general, there is no universal recommendation as to the most robust approach except for the case of a classic stepped wedge cluster randomized trial with only 2 sequences, where fitting a random time model is less likely to lead to anti-conservative inference compared with fitting a random intervention model.
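
The two measures defined verbally in the abstract above can be written compactly. The notation below is an assumption introduced for illustration and is not the authors' own: $\hat\theta_{\text{mis}}$ and $\hat\theta_{\text{cor}}$ denote the treatment-effect estimators from the misspecified and correctly specified models, $\operatorname{Var}_{\text{model}}$ is the model-based variance, and $\operatorname{Var}_{\text{true}}$ is the true (sandwich-based) variance.

```latex
\[
\text{validity} =
  \frac{\operatorname{Var}_{\text{model}}\bigl(\hat\theta_{\text{mis}}\bigr)}
       {\operatorname{Var}_{\text{true}}\bigl(\hat\theta_{\text{mis}}\bigr)},
\qquad
\text{efficiency} =
  \frac{\operatorname{Var}_{\text{model}}\bigl(\hat\theta_{\text{cor}}\bigr)}
       {\operatorname{Var}_{\text{true}}\bigl(\hat\theta_{\text{mis}}\bigr)}.
\]
```

Read this way, validity below 1 means the misspecified model reports a variance smaller than the true one, so its confidence intervals are too narrow, which is why the abstract labels that case anti-conservative.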

Journal ArticleDOI
TL;DR: In this article, generalized estimating equations (GEE) and generalized linear mixed models (GLMM) were proposed as alternatives to the traditional analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) for repeated measures (rmANOVA/rmMANOVA).
Abstract: In neuroscience research, longitudinal data are often analysed using analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) for repeated measures (rmANOVA/rmMANOVA). However, these analyses have special requirements: The variances of the differences between all possible pairs of within‐subject conditions (i.e., levels of the independent variable) must be equal. They are also limited to fixed repeated time intervals and are sensitive to missing data. In contrast, other models, such as the generalized estimating equations (GEE) and the generalized linear mixed models (GLMM), suggest another way to think about the data and the studied phenomenon. Instead of forcing the data into the ANOVA assumptions, it is possible to design a flexible/personalized model according to the nature of the dependent variable. We discuss some advantages of GEE and GLMM as alternatives to rmANOVA and rmMANOVA in neuroscience research, including the possibility of using different distributions for the parameters of the dependent variable, a better approach for different time length points, and better adjustment to missing data. We illustrate these advantages by showing a comparison between rmANOVA and GEE in a real example and providing the data and tutorial code to reproduce these analyses in R. We conclude that GEE and GLMM may provide more reliable results when compared to rmANOVA and rmMANOVA in neuroscience research, especially in small sample sizes with unbalanced longitudinal designs with or without missing data.

Journal ArticleDOI
27 Apr 2022-PeerJ
TL;DR: In this paper, the authors introduced tree species as a random effect to develop nonlinear mixed-effects CW models for individual trees in multi-species secondary forests, accounting for the effects of competition.
Abstract: Crown width (CW) is an important tree variable and is often used as a covariate predictor in forest growth models. The precise measurement and prediction of CW is therefore critical for forest management. In this study, we introduced tree species as a random effect to develop nonlinear mixed-effects CW models for individual trees in multi-species secondary forests, accounting for the effects of competition. We identified a simple power function for the basic CW model. In addition to diameter at breast height (DBH), other significant predictor variables including height to crown base (HCB), tree height (TH), and competition indices (CI) were selected for the mixed-effects CW model. The sum of relative DBH (SRD) was identified as the optimal distance-independent CI and as a covariate predictor for spatially non-explicit CW models, whereas the sum of the Hegyi index for fixed number competitors (SHGN) was the optimal distance-dependent CI for spatially explicit CW models, with significant linear correlation (R² = 0.943, P < 0.001). Both spatially non-explicit and spatially explicit mixed-effects CW models were developed for the studied secondary forests. We found that these models can describe more than 50% of the variation in CW without significant residual trends. Spatially explicit models exhibited a significantly larger effect on CW than spatially non-explicit ones; however, spatially explicit models are computationally complex and difficult and can be replaced by corresponding spatially non-explicit models due to the small differences in the fit statistics. The models we present may be useful for forestry inventory practices and have the potential to aid the evaluation and management of secondary forests in the region.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a statistical model to analyze dyadic interactions between pigs in social interaction data, and performed posterior predictive checks of the model through different validation strategies: stratified 5-fold random cross-validation, block-by-social-group cross-validation, and block-by-focal-animals validation.

Journal ArticleDOI
TL;DR: In this paper, the authors propose to calculate unexplained variations conditional on individual random and/or fixed effects so as to keep individual heterogeneity brought by available predictors, which can be defined for a generalized linear mixed model using a distance measured along its variance function, accounting for its heteroscedasticity.
Abstract: The coefficient of determination is well defined for linear models and its extension is long wanted for mixed-effects models in agricultural, biological, and ecological research. We revisit its extension to define measures for proportions of variation explained by the whole model, fixed effects only, and random effects only. We propose to calculate unexplained variations conditional on individual random and/or fixed effects so as to keep individual heterogeneity brought by available predictors. While these measures were naturally defined for linear mixed models, they can be defined for a generalized linear mixed model using a distance measured along its variance function, accounting for its heteroscedasticity. We demonstrate the promising performance and utility of our proposed methods via simulation studies as well as applications to real data sets in agricultural and ecological studies.

Journal ArticleDOI
TL;DR: In this paper, the authors extended Rights and Sterba's framework of R² measures for multilevel models, which is based on model-implied variances, to MELS models.
Abstract: Ecological momentary assessment and other modern data collection technologies facilitate research on both within‐subject and between‐subject variability of health outcomes and behaviors. For such intensively measured longitudinal data, Hedeker et al extended the usual two‐level mixed‐effects model to a two‐level mixed‐effects location scale (MELS) model to accommodate covariates' influence as well as random subject effects on both mean (location) and variability (scale) of the outcome. However, there is a lack of existing standardized effect size measures for the MELS model. To fill this gap, our study extends Rights and Sterba's framework of R² measures for multilevel models, which is based on model‐implied variances, to MELS models. Our proposed framework applies to two different specifications of the random location effects, namely, through covariate‐influenced random intercepts and through random intercepts combined with random slopes of observation‐level covariates. We also provide an R function, R2MELS, that outputs summary tables and visualization for values of our R² measures. This framework is validated through a simulation study, and data from a health behaviors study and a depression study are used as examples to demonstrate this framework. These R² measures can help researchers provide greater interpretation of their findings using MELS models.

Journal ArticleDOI
23 May 2022-Forests
TL;DR: In this article, a methodology that established site index modeling of larch plantations with site types as a random effect in northern China was proposed, and the best model (M8) was selected (R² = 0.5773) as the base model.
Abstract: As the dominant height of the stand at the baseline age, the site index is an important index to evaluate site quality. However, due to the variability of environmental factors, the growth process of the dominant height of the same tree species was variable in different regions, which influenced the estimation results of the site index. In this study, a methodology that established site index modeling of larch plantations with site types as a random effect in northern China was proposed. Based on 394 sample plots, nine common base models were developed, and the best model (M8) was selected (R² = 0.5773) as the base model. Moreover, elevation, aspect, and slope position were identified through the random forest method as the main site factors influencing stand dominant height. Then, the three site factors and their combinations (site types) were selected as random effects and simulated by the nonlinear mixed-effects model based on the model M8. The R² values rose from 0.5773 to 0.8678, and the model with combinations (94 kinds) of three site factors had the best performance (R² = 0.8678). Considering the model accuracy and practical application, the 94 combinations were divided by hierarchical clustering into three groupings of site types (with 3, 5, and 8 groups). Furthermore, a mixed-effects model considering the random effects of these three groupings was established. All three groupings of site types achieved a better fit (3 groups R² = 0.8333, 5 groups R² = 0.8616, 8 groups R² = 0.8683) and a better predictive performance (3 groups R² = 0.8157, 5 groups R² = 0.8464, 8 groups R² = 0.8479 for 20 percent of plots randomly selected per group in the calibration procedure) using the leave-one-out cross-validation approach. Therefore, the grouping of site types into 5 groups had better applicability for estimating forest productivity at the regional level and for management plan design.

Journal ArticleDOI
TL;DR: In this paper, a method for fitting random regression is outlined in a multi-environment situation, using underlying cubic smoothing splines to model the mean trend over time, which is illustrated on six wheat experiments, using data on grain-filling over thermal time.
Abstract: Context: In order to identify best crop genotypes for recommendation to breeders, and ultimately for use in breeding, evaluation is usually conducted in field trials across a range of environments, known as multi-environment trials. Increasingly, many breeding traits are measured over time, for example with high-throughput phenotyping at different growth stages in annual crops or repeated harvests in perennial crops. Aims: This study aims to provide an efficient, accurate approach for modelling genotype response over time and across environments, accounting for non-genetic sources of variation such as spatial and temporal correlation. Methods: Because the aim is genotype selection, genetic effects are fitted as random effects, and so the approach is based on random regression, in which linear or non-linear models are used to model genotype responses. A method for fitting random regression is outlined in a multi-environment situation, using underlying cubic smoothing splines to model the mean trend over time. This approach is illustrated on six wheat experiments, using data on grain-filling over thermal time. Key results: The method correlates genetic effects over time and environments, providing predicted genotype responses while incorporating spatial and temporal correlation between observations. Conclusions: The approach provides robust genotype predictions by accounting for temporal and spatial effects simultaneously under various situations including those in which trials have different measurement times or where genotypes within trials are not measured at the same times. The approach facilitates investigation into genotype by environment interaction (G × E) both within and across environments. Implications: The models presented have potential to increase accuracy of predictions over measurement times and trials, provide predictions at times other than those observed, and give a greater understanding of G × E interaction, hence improving genotype selection across environments for repeated-measures traits.

Journal ArticleDOI
TL;DR: A peptide‐based linear mixed models tool—PBLMM, a standalone desktop application for differential expression analysis of proteomics data and a Python package that allows streamlined data analysis workflows implementing the PBLMM algorithm are provided.
Abstract: Here, we present a peptide‐based linear mixed models tool—PBLMM, a standalone desktop application for differential expression analysis of proteomics data. We also provide a Python package that allows streamlined data analysis workflows implementing the PBLMM algorithm. PBLMM is easy to use without scripting experience and calculates differential expression by peptide‐based linear mixed regression models. We show that peptide‐based models outperform classical methods of statistical inference of differentially expressed proteins. In addition, PBLMM exhibits superior statistical power in situations of low effect size and/or low sample size. Taken together our tool provides an easy‐to‐use, high‐statistical‐power method to infer differentially expressed proteins from proteomics data.


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a nonlinear mixed-effects model based on quasi-linear modeling, which can provide a more flexible fit than the generalized linear mixed model when there is a nonlinear relation between fixed and random effects.
Abstract: The generalized linear mixed model (GLMM) is one of the most common methods in the analysis of longitudinal and clustered data in biological sciences. However, issues of model complexity and misspecification can occur when applying the GLMM. To address these issues, we extend the standard GLMM to a nonlinear mixed-effects model based on quasi-linear modeling. An estimation algorithm for the proposed model is provided by extending the penalized quasi-likelihood and the restricted maximum likelihood which are known in the GLMM inference. Also, the conditional AIC is formulated for the proposed model. The proposed model should provide a more flexible fit than the GLMM when there is a nonlinear relation between fixed and random effects. Otherwise, the proposed model is reduced to the GLMM. The performance of the proposed model under model misspecification is evaluated in several simulation studies. In the analysis of respiratory illness data from a randomized controlled trial, we observe the proposed model can capture heterogeneity; that is, it can detect a patient subgroup with specific clinical character in which the treatment is effective.

Journal ArticleDOI
TL;DR: GAMs relax the linearity assumption of rm-ANOVA and LMEMs and allow the data to determine the fit of the model while also permitting incomplete observations and different correlation structures.
Abstract: In biomedical research, the outcome of longitudinal studies has been traditionally analyzed using the repeated measures analysis of variance (rm-ANOVA) or, more recently, linear mixed models (LMEMs). Although LMEMs are less restrictive than rm-ANOVA as they can work with unbalanced data and non-constant correlation between observations, both methodologies assume a linear trend in the measured response. It is common in biomedical research that the true trend response is nonlinear, and in these cases the linearity assumption of rm-ANOVA and LMEMs can lead to biased estimates and unreliable inference. In contrast, GAMs relax the linearity assumption of rm-ANOVA and LMEMs and allow the data to determine the fit of the model while also permitting incomplete observations and different correlation structures. Therefore, GAMs present an excellent choice to analyze longitudinal data with non-linear trends in the context of biomedical research. This paper summarizes the limitations of rm-ANOVA and LMEMs and uses simulated data to visually show how both methods produce biased estimates when used on data with non-linear trends. We present the basic theory of GAMs and, using reported trends of oxygen saturation in tumors, we simulate example longitudinal data (2 treatment groups, 10 subjects per group, 5 repeated measures for each group) to demonstrate their implementation in R. We also show that GAMs are able to produce estimates with non-linear trends even when incomplete observations exist (with 40% of the simulated observations missing). To make this work reproducible, the code and data used in this paper are available at: https://github.com/aimundo/GAMs-biomedical-research.

Journal ArticleDOI
TL;DR: In this article, the authors decompose the variation of the genotype × environment interaction using fixed multivariate models and assess genetic variation using mixed models, in order to estimate and predict the genetic value of soybean (Glycine max) genotypes in the state of Rio Grande do Sul, Brazil.
Abstract: The objective of this work was to decompose the variations of the genotype × environment interaction through fixed multivariate models, as well as to understand the genetic variations through mixed models, for the estimation and prediction of the genetic value of soybean (Glycine max) genotypes in the state of Rio Grande do Sul, Brazil. Tests were carried out during the 2016/2017, 2017/2018, and 2018/2019 crop seasons in different municipalities in six regions of the state, using the additive main effects and multiplicative interaction (AMMI) and genotype main effects + genotype-by-environment interaction (GGE) models. The genotypes were also evaluated using an index that allows weighting between mean performance and stability (WAASBY) and by the restricted maximum likelihood (REML) and the best linear unbiased prediction (BLUP) models. The experimental design used was randomized complete blocks (18 environments × 12 genotypes), with three replicates. The best performing genotypes in favorable environments are 'BMX Valente RR', 'BMX Alvo RR', 'NS 5959 IPRO', 'DM 5958RSF IPRO', and 'BMX Ativa RR'. The favorable environments are the 2017/2018 season in the municipality of Bagé and the 2016/2017 season in the municipalities of São Luiz Gonzaga and Cachoeira do Sul, where higher grain yields were obtained. The genotypes that show excellent performance in unfavorable environments are cultivars BMX Ativa RR, DM 5958RSF IPRO, NS 5959 IPRO, and TMG 7262 RR. The 2016/2017 season is considered unfavorable in the municipalities of São Luiz Gonzaga and Cachoeira do Sul. The AMMI, GGE, and WAASBY or BLUP models for genotype selection must be used simultaneously.

Journal ArticleDOI
08 Sep 2022-Test
TL;DR: In this article, empirical stochastic processes constructed from appropriately ordered and standardized residuals from the model are used to test whether the design matrices of the fitted LMM are correctly specified.
Abstract: Linear mixed effects models (LMMs) are a popular and powerful tool for analysing grouped or repeated observations for numeric outcomes. LMMs consist of a fixed and a random component, which are specified in the model through their respective design matrices. Verifying the correct specification of the two design matrices is important since mis-specifying them can affect the validity and efficiency of the analysis. We show how to use empirical stochastic processes constructed from appropriately ordered and standardized residuals from the model to test whether the design matrices of the fitted LMM are correctly specified. We define two different processes: one can be used to test whether both design matrices are correctly specified, and the other can be used only to test whether the fixed effects design matrix is correctly specified. The proposed empirical stochastic processes are smoothed versions of cumulative sum processes, which have a nice graphical representation in which model mis-specification can easily be observed. The amount of smoothing can be adjusted, which facilitates visual inspection and can potentially increase the power of the tests. We propose a computationally efficient procedure for estimating p-values in which refitting of the LMM is not necessary. Its validity is shown by using theoretical results and a large Monte Carlo simulation study. The proposed methodology could be used with LMMs with multilevel or crossed random effects.
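
The idea of a cumulative sum process built from ordered, standardized residuals can be sketched in Python as follows. This is a deliberately simplified illustration, not the paper's procedure: it uses ordinary least squares rather than an LMM, applies no smoothing, and computes no p-value. The helper `cusum_process`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data with a quadratic mean (assumed, for illustration only).
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + 2.0 * x**2 + rng.normal(0.0, 0.5, size=n)


def cusum_process(x, y, degree):
    """Scaled cumulative sum of standardized residuals, ordered by x.

    Fits a polynomial of the given degree by least squares, standardizes
    the residuals, sorts them by the covariate, and returns the running
    sum divided by sqrt(n).
    """
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    resid_std = resid / resid.std(ddof=degree + 1)
    order = np.argsort(x)
    return np.cumsum(resid_std[order]) / np.sqrt(len(x))


# Degree 1 omits the quadratic term (mis-specified mean structure);
# degree 2 matches the data-generating model.
process_wrong = cusum_process(x, y, degree=1)
process_right = cusum_process(x, y, degree=2)

print("max |cusum|, mis-specified model:", np.abs(process_wrong).max())
print("max |cusum|, correct model:      ", np.abs(process_right).max())
```

Under the mis-specified fit, residuals of the same sign cluster along the ordering variable, so the running sum drifts far from zero; under the correct fit it fluctuates near zero. Plotting either array against the sorted covariate gives the kind of graphical check the abstract describes.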

Journal ArticleDOI
TL;DR: In this article, a generalized additive mixed model (GAMM) is proposed to estimate an unconflated multilevel interaction without assuming a prespecified form of the interaction.
Abstract: A cluster randomized controlled trial (C-RCT) is common in educational intervention studies. Multilevel modelling (MLM) is a dominant analytic method to evaluate treatment effects in a C-RCT. In most MLM applications intended to detect an interaction effect, a single interaction effect (called a conflated effect) is considered instead of level-specific interaction effects in a multilevel design (called unconflated multilevel interaction effects), and the linear interaction effect is modelled. In this paper we present a generalized additive mixed model (GAMM) that allows an unconflated multilevel interaction to be estimated without assuming a prespecified form of the interaction. R code is provided to estimate the model parameters using maximum likelihood estimation and to visualize the nonlinear treatment-by-covariate interaction. The usefulness of the model is illustrated using instructional intervention data from a C-RCT. Results of simulation studies showed that the GAMM outperformed an alternative approach to recover an unconflated logistic multilevel interaction. In addition, the parameter recovery of the GAMM was relatively satisfactory in multilevel designs found in educational intervention studies, except when the number of clusters, cluster sizes, and intraclass correlations were small. When modelling a linear multilevel treatment-by-covariate interaction in the presence of a nonlinear effect, biased estimates (such as overestimated standard errors and overestimated random effect variances) and incorrect predictions of the unconflated multilevel interaction were found.

Journal ArticleDOI
01 Jan 2022
TL;DR: In this article, the authors consider simultaneous optimal prediction and estimation problems in the context of linear random-effects models and derive analytical formulas for the best linear unbiased predictors (BLUPs) of all unknown parameters in a pair of seemingly unrelated linear random-effects models by solving a constrained quadratic matrix optimization problem in the Löwner sense.
Abstract: This paper considers simultaneous optimal prediction and estimation problems in the context of linear random-effects models. Assume a pair of seemingly unrelated linear random-effects models (SULREMs) with the random-effects and the error terms correlated. Our aim is to find analytical formulas for calculating best linear unbiased predictors (BLUPs) of all unknown parameters in the two models by means of solving a constrained quadratic matrix optimization problem in the Löwner sense. We also present a variety of theoretical and statistical properties of the BLUPs under the two models.
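
For readers unfamiliar with BLUPs, the following Python sketch computes them in a much simpler setting than the paper's: a single balanced random-intercept model with uncorrelated random effects and errors and known variance components. It solves Henderson's mixed model equations and checks the result against the direct generalized least squares formulas. The data, group sizes, and variance values are assumptions for illustration; nothing here reproduces the SULREM formulas.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical balanced design: 6 groups, 4 observations per group.
n_groups, n_per = 6, 4
sigma_u2, sigma_e2 = 2.0, 1.0  # variance components, assumed known

group = np.repeat(np.arange(n_groups), n_per)
X = np.ones((n_groups * n_per, 1))   # fixed effects: intercept only
Z = np.eye(n_groups)[group]          # random-intercept indicator matrix

u_true = rng.normal(0.0, np.sqrt(sigma_u2), size=n_groups)
y = 5.0 + Z @ u_true + rng.normal(0.0, np.sqrt(sigma_e2), size=len(group))

# Henderson's mixed model equations for y = X b + Z u + e.
ratio = sigma_e2 / sigma_u2
lhs = np.block([
    [X.T @ X, X.T @ Z],
    [Z.T @ X, Z.T @ Z + ratio * np.eye(n_groups)],
])
rhs = np.concatenate([X.T @ y, Z.T @ y])
solution = np.linalg.solve(lhs, rhs)
beta_hat, u_blup = solution[:1], solution[1:]

# Direct route: GLS estimate of b, then BLUP u = sigma_u2 * Z' V^{-1} (y - X b).
V = sigma_u2 * Z @ Z.T + sigma_e2 * np.eye(len(y))
V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
u_direct = sigma_u2 * Z.T @ V_inv @ (y - X @ beta_gls)

# In this balanced case the BLUP is a shrunken group-mean deviation.
group_means = (Z.T @ y) / n_per
shrink = n_per * sigma_u2 / (n_per * sigma_u2 + sigma_e2)
u_shrunk = shrink * (group_means - beta_hat[0])

print("BLUPs from mixed model equations:", np.round(u_blup, 3))
```

The three routes agree numerically, which shows the BLUP as a group-mean deviation pulled toward zero by a factor that depends on the ratio of the variance components.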