
Showing papers on "Mixed model published in 2019"


Journal ArticleDOI
TL;DR: This paper presents a special capability of Sisvar for dealing with fixed effect models with several restrictions in the randomization procedure, which lead to models with fixed treatment effects but several random error terms.

Abstract: This paper presents a special capability of Sisvar for dealing with fixed effect models with several restrictions in the randomization procedure. These restrictions lead to models with fixed treatment effects but several random error terms. One way to deal with models of this kind is to perform a mixed model analysis, treating only the error effects in the model as random and allowing different covariance structures for the error terms. Another way is to perform an analysis of variance with several error terms. When the data are balanced, both kinds of analysis can be done using Sisvar. The software provides an exact $F$ test for the fixed effects and allows the user to apply multiple comparison procedures or regression analysis to the levels of the fixed effect factors, whether they are single effects, interaction effects, or hierarchical effects. Sisvar is an interesting statistical computing system for use with balanced agricultural and industrial data sets.
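The idea of "several random errors" can be illustrated with a toy split-plot simulation: the whole-plot factor must be tested against the whole-plot error, not the subplot residual, and for balanced data averaging over subplots reduces that test to a one-way ANOVA on whole-plot means. This is an illustrative sketch with made-up parameter values, not Sisvar's implementation.

```python
import numpy as np

# Hypothetical balanced split-plot design:
# y_ijk = alpha_i (fixed, whole-plot factor A) + w_ij (whole-plot error)
#         + e_ijk (subplot error)
rng = np.random.default_rng(5)
a, r, b = 3, 8, 4                     # A-levels, whole plots per level, subplots
alpha = np.array([0.0, 1.0, 2.0])     # fixed whole-plot treatment effects
w = rng.normal(0.0, 1.0, (a, r))      # whole-plot random errors
e = rng.normal(0.0, 0.5, (a, r, b))   # subplot random errors
y = alpha[:, None, None] + w[:, :, None] + e

# Whole-plot means absorb the subplot errors, so the exact F test for A
# compares between-A variation to variation among whole plots within A.
m = y.mean(axis=2)                                        # shape (a, r)
ss_a = r * ((m.mean(axis=1) - m.mean()) ** 2).sum()       # between A-levels
ss_wp = ((m - m.mean(axis=1, keepdims=True)) ** 2).sum()  # whole-plot error
f_stat = (ss_a / (a - 1)) / (ss_wp / (a * (r - 1)))       # df = (a-1, a(r-1))
print(round(f_stat, 2))
```

With a nonzero treatment effect, the F statistic should exceed its null expectation of about 1.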

398 citations


Book
05 Sep 2019
TL;DR: This chapter discusses Mixed Effects Models with Missing Covariates, Joint Modeling for Longitudinal Data and Survival Data, and Bayesian Joint Models of Longitudinal and Survival data.
Abstract: Table of contents:
Introduction: Introduction; Longitudinal Data and Clustered Data; Some Examples; Regression Models; Mixed Effects Models; Complex or Incomplete Data; Software; Outline and Notation
Mixed Effects Models: Introduction; Linear Mixed Effects (LME) Models; Nonlinear Mixed Effects (NLME) Models; Generalized Linear Mixed Models (GLMMs); Nonparametric and Semiparametric Mixed Effects Models; Computational Strategies; Further Topics; Software
Missing Data, Measurement Errors, and Outliers: Introduction; Missing Data Mechanisms and Ignorability; General Methods for Missing Data; EM Algorithms; Multiple Imputation; General Methods for Measurement Errors; General Methods for Outliers; Software
Mixed Effects Models with Missing Data: Introduction; Mixed Effects Models with Missing Covariates; Approximate Methods; Mixed Effects Models with Missing Responses; Multiple Imputation Methods; Computational Strategies; Examples
Mixed Effects Models with Covariate Measurement Errors: Introduction; Measurement Error Models and Methods; Two-Step Methods and Regression Calibration Methods; Likelihood Methods; Approximate Methods; Measurement Error and Missing Data
Mixed Effects Models with Censoring: Introduction; Mixed Effects Models with Censored Responses; Mixed Effects Models with Censoring and Measurement Errors; Mixed Effects Models with Censoring and Missing Data; Appendix
Survival Mixed Effects (Frailty) Models: Introduction; Survival Models; Frailty Models; Survival and Frailty Models with Missing Covariates; Frailty Models with Measurement Errors
Joint Modeling Longitudinal and Survival Data: Introduction; Joint Modeling for Longitudinal Data and Survival Data; Two-Step Methods; Joint Likelihood Inference; Joint Models with Incomplete Data; Joint Modeling of Several Longitudinal Processes
Robust Mixed Effects Models: Introduction; Robust Methods; Mixed Effects Models with Robust Distributions; M-Estimators for Mixed Effects Models; Robust Inference for Mixed Effects Models with Incomplete Data
Generalized Estimating Equations (GEEs): Introduction; Marginal Models; Estimating Equations with Incomplete Data; Discussion
Bayesian Mixed Effects Models: Introduction; Bayesian Methods; Bayesian Mixed Effects Models; Bayesian Mixed Models with Missing Data; Bayesian Models with Covariate Measurement Errors; Bayesian Joint Models of Longitudinal and Survival Data
Appendix (Background Materials): Likelihood Methods; The Gibbs Sampler and MCMC Methods; Rejection Sampling and Importance Sampling Methods; Numerical Integration and the Gauss-Hermite Quadrature Method; Optimization Methods and the Newton-Raphson Algorithm; Bootstrap Methods; Matrix Algebra and Vector Differential Calculus
References; Index

244 citations


Book ChapterDOI
28 Oct 2019
TL;DR: In this article, the authors describe a class of statistical models able to account for most of the cases of nonindependence typically encountered in psychological experiments: linear mixed-effects models, or mixed models for short.

Abstract: This chapter describes a class of statistical models able to account for most of the cases of nonindependence that are typically encountered in psychological experiments: linear mixed-effects models, or mixed models for short. It introduces the concepts underlying mixed models and shows how they allow accounting for different types of nonindependence that can occur in psychological data. The chapter discusses how to set up a mixed model and how to perform statistical inference with one. The most important concept for understanding how to estimate and interpret mixed models is the distinction between fixed and random effects. One important characteristic of mixed models is that they allow random effects for multiple, possibly independent, grouping factors. Mixed models are a modern class of statistical models that extend regular regression models by including random-effects parameters to account for dependencies among related data points.
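The fixed/random distinction the chapter describes can be seen in a minimal simulation: a hypothetical balanced one-way design with a fixed grand mean and a random intercept per grouping unit, where the classical ANOVA (method-of-moments) estimators recover the two variance components. All names and numbers below are illustrative assumptions, not from the chapter.

```python
import numpy as np

# Hypothetical random-intercept model: y_ij = mu + a_i + e_ij
rng = np.random.default_rng(0)
n_groups, n_per = 200, 50
mu = 10.0                     # fixed effect (grand mean)
sigma_a, sigma_e = 1.0, 1.0   # random-intercept SD and residual SD

a = rng.normal(0.0, sigma_a, n_groups)                        # random effects
y = mu + a[:, None] + rng.normal(0.0, sigma_e, (n_groups, n_per))

# Classical ANOVA (method-of-moments) variance-component estimators
group_means = y.mean(axis=1)
msb = n_per * group_means.var(ddof=1)   # between-group mean square
msw = y.var(axis=1, ddof=1).mean()      # within-group mean square
sigma_e2_hat = msw                      # estimates residual variance
sigma_a2_hat = (msb - msw) / n_per      # estimates random-intercept variance
print(round(sigma_a2_hat, 2), round(sigma_e2_hat, 2))
```

Both estimates should land close to the true value of 1.0, illustrating how the model separates between-group from within-group variability.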

211 citations


Journal ArticleDOI
TL;DR: It is shown how the one-stage method for meta-analysis of non-linear curves is particularly suited for dose–response meta-analyses of aggregated data, where the complexity of the research question is better addressed by including all the studies.

Abstract: The standard two-stage approach for estimating non-linear dose-response curves based on aggregated data typically excludes studies with fewer than three exposure groups. We develop the one-stage method as a linear mixed model and present the main aspects of the methodology, including model specification, estimation, testing, prediction, goodness-of-fit, model comparison, and quantification of between-studies heterogeneity. Using both fictitious and real data from a published meta-analysis, we illustrate the main features of the proposed methodology and compare it to a traditional two-stage analysis. In a one-stage approach, the pooled curve and estimates of the between-studies heterogeneity are based on the whole set of studies without any exclusion, so even complex curves (splines, spike at zero exposure) defined by several parameters can be estimated. We show how the one-stage method facilitates several applications, in particular quantification of heterogeneity over the exposure range, prediction of marginal and conditional curves, and comparison of alternative models. The one-stage method for meta-analysis of non-linear curves is implemented in the dosresmeta R package. It is particularly suited for dose-response meta-analyses of aggregated data where the complexity of the research question is better addressed by including all the studies.

173 citations


Journal ArticleDOI
TL;DR: This paper proposes the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework, and shows that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness.
Abstract: With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.

88 citations


Journal ArticleDOI
TL;DR: A new approach to account for heteroscedasticity and covariance among observations present in residual error or induced by random effects is proposed and is universally applicable for arbitrary variance-covariance structures including spatial models and repeated measures.
Abstract: Extensions of linear models are very commonly used in the analysis of biological data. Whereas goodness of fit measures such as the coefficient of determination (R2 ) or the adjusted R2 are well established for linear models, it is not obvious how such measures should be defined for generalized linear and mixed models. There are by now several proposals but no consensus has yet emerged as to the best unified approach in these settings. In particular, it is an open question how to best account for heteroscedasticity and for covariance among observations present in residual error or induced by random effects. This paper proposes a new approach that addresses this issue and is universally applicable for arbitrary variance-covariance structures including spatial models and repeated measures. It is exemplified using three biological examples.

63 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the best model for drought forecasting and determined which differences, if any, were present in model performance using the standardised precipitation index (SPI); in addition, the most effective combination of the SPI with its respective timescale and lead time was investigated.

Abstract: Quality and reliable drought prediction is essential for mitigation strategies and planning in disaster-stricken regions globally. Prediction models such as empirical or data-driven models play a fundamental role in forecasting drought. However, selecting a suitable prediction model remains a challenge because of the lack of succinct information available on model performance. Therefore, this review evaluated the best model for drought forecasting and determined which differences, if any, were present in model performance using the standardised precipitation index (SPI). In addition, the most effective combination of the SPI with its respective timescale and lead time was investigated. The effectiveness of data-driven models was analysed using meta-regression analysis, applying a linear mixed model to the coefficient of determination and the root mean square error of the validated model results. Wavelet-transformed neural networks had superior performance, with the highest correlation and minimum error. Models with data preprocessing to eliminate non-stationarity performed substantially better than the regular artificial neural network (ANN) model. Additionally, the best timescales for calculating the SPI were 24 and 12 months, and a lead time of 1–3 months provided the most accurate forecasts. Studies from China and Sicily had the most variation based on geographical location as a random effect, while studies from India rendered consistent results overall. Variation in the results can be attributed to geographical differences, seasonal influence, incorporation of climate indices, and author bias. In conclusion, this review recommends the wavelet-based ANN (WANN) model for forecasting drought indices.

54 citations


Journal ArticleDOI
TL;DR: The results indicate the usefulness of FFNNs, which are found to be statistically significantly superior for modelling particulate matter spatial variability and are therefore selected for forecasting exceedances of the air quality limits set by European Union and World Health Organization air quality standards.

37 citations


Journal ArticleDOI
TL;DR: Variance components estimation and mixed model analysis are central themes in statistics, with applications in numerous scientific disciplines.
Abstract: Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticia...

33 citations


Journal ArticleDOI
TL;DR: A distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, is introduced to handle diverse types of traits, such as Gaussian, Binomial or Poisson traits, and is found to be robustly powerful while correctly controlling type I error rates.
Abstract: Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota on diverse host traits of interest (e.g., health/disease status, medical intervention, behavioral/environmental factors). Such study designs are useful to properly control for potential confounders or the sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, or subjects with repeated measures) tend to be correlated, so that statistical methods based on the independence assumption cannot be used. For correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, the correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. Similar to prior distance-based kernel association tests (e.g., the microbiome regression-based kernel association test), the use of ecological distances gives cSKAT high power. However, cSKAT is limited to handling Gaussian traits [e.g., body mass index (BMI)] and a single chosen distance measure at a time. The power of cSKAT differs considerably depending on which distance measure is used, yet choosing an optimal distance measure is challenging because of the unknown nature of the true association. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, so as to avoid the need to choose the optimal distance measure.
Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.

30 citations


Journal ArticleDOI
TL;DR: This paper proposes to reformulate the two-sided prediction interval (PI) to be generalizable under a wide variety of designs (one random factor, nested and crossed designs for multiple random factors, and balanced or unbalanced designs).

Abstract: The literature on Prediction Intervals (PI) and Tolerance Intervals (TI) in linear mixed models is usually developed for specific designs, which is a major limitation to their use. This paper proposes to reformulate the two-sided PI to be generalizable under a wide variety of designs (one random factor, nested and crossed designs for multiple random factors, and balanced or unbalanced designs). This new methodology is based on the Hessian matrix, namely, the inverse of the (observed) Fisher information matrix, and is built with a cell mean model. The degrees of freedom for the total variance are calculated with the generalized Satterthwaite method and compared to the Kenward-Roger degrees of freedom for fixed effects. Construction of two-sided TIs is also detailed with one random factor, and with two nested and two crossed random variables. An extensive simulation study is carried out to compare the widths and coverage probabilities of Confidence Intervals (CI), PIs, and TIs to their nominal levels. It shows excellent coverage whatever the design and the sample size. Finally, these CIs, PIs, and TIs are applied to two real data sets: one from an orthopedic surgery study (intralesional resection risk) and the other from an assay validation study during vaccine development.
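As a rough illustration of the kind of interval the paper generalizes, the sketch below builds a two-sided 95% prediction interval for a new observation from a new group under a simple one-random-factor model, using a normal quantile in place of a Satterthwaite-t for simplicity. This is a simplified stand-in for the paper's Hessian-based methodology; all quantities are simulated assumptions.

```python
import numpy as np

# Hypothetical one-random-factor model: y_ij = mu + a_i + e_ij
rng = np.random.default_rng(1)
n_groups, n_per = 100, 10
mu, sigma_a, sigma_e = 5.0, 2.0, 1.0

a = rng.normal(0, sigma_a, n_groups)
y = mu + a[:, None] + rng.normal(0, sigma_e, (n_groups, n_per))

group_means = y.mean(axis=1)
mu_hat = group_means.mean()
msw = y.var(axis=1, ddof=1).mean()                 # residual variance estimate
gm_var = group_means.var(ddof=1)
sigma_a2_hat = max(gm_var - msw / n_per, 0.0)      # between-group variance

# Prediction variance: new random effect + new residual + uncertainty in mu_hat
pred_var = sigma_a2_hat + msw + gm_var / n_groups
lo = mu_hat - 1.96 * np.sqrt(pred_var)
hi = mu_hat + 1.96 * np.sqrt(pred_var)

# Empirical coverage on fresh observations from new groups
new = mu + rng.normal(0, sigma_a, 5000) + rng.normal(0, sigma_e, 5000)
coverage = np.mean((new >= lo) & (new <= hi))
print(round(coverage, 3))
```

The empirical coverage should sit near the nominal 95% level; the paper's contribution is making such intervals valid across far more general (including unbalanced, multi-factor) designs.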

Journal ArticleDOI
TL;DR: The results obtained by the statistical indicators and the graphical analysis show the great potential of AI techniques and mixed models in the estimation of volume and biomass of individual trees in Brazilian savanna vegetation.

Journal ArticleDOI
01 Apr 2019
TL;DR: This paper proposes Bayesian spatial variable selection for Pareto regression based on Bradley et al. and Hu et al., and shows that the two Bayesian model assessment criteria used for variable selection (CPO and DIC) have analytic connections with conditional AIC under the linear mixed model setting.
Abstract: Generalized linear models are routinely used in many environment statistics problems such as earthquake magnitudes prediction. Hu et al. proposed Pareto regression with spatial random effects for earthquake magnitudes. In this paper, we propose Bayesian spatial variable selection for Pareto regression based on Bradley et al. and Hu et al. to tackle variable selection issue in generalized linear regression models with spatial random effects. A Bayesian hierarchical latent multivariate log gamma model framework is applied to account for spatial random effects to capture spatial dependence. We use two Bayesian model assessment criteria for variable selection including Conditional Predictive Ordinate (CPO) and Deviance Information Criterion (DIC). Furthermore, we show that these two Bayesian criteria have analytic connections with conditional AIC under the linear mixed model setting. We examine empirical performance of the proposed method via a simulation study and further demonstrate the applicability of the proposed method in an analysis of the earthquake data obtained from the United States Geological Survey (USGS).

Journal ArticleDOI
TL;DR: This work applies a novel approach for the flexible modeling of complex exposure‐lag‐response associations in time‐to‐event data to analyze the association of both the timing and the amount of artificial nutrition with the short term survival of critically ill patients.
Abstract: We propose a novel approach for the flexible modeling of complex exposure-lag-response associations in time-to-event data, where multiple past exposures within a defined time window are cumulatively associated with the hazard. Our method allows for the estimation of a wide variety of effects, including potentially smooth and smoothly time-varying effects as well as cumulative effects with leads and lags, taking advantage of the inference methods that have recently been developed for generalized additive mixed models. We apply our method to data from a large observational study of intensive care patients in order to analyze the association of both the timing and the amount of artificial nutrition with the short term survival of critically ill patients. We evaluate the properties of the proposed method by performing extensive simulation studies and provide a systematic comparison with related approaches.

Journal ArticleDOI
TL;DR: This simulation study looks at the effects of misspecifying an LMM for SCED count data simulated according to a generalized linear mixed model (GLMM), and compares the performance of a misspecified LMM and of a GLMM in terms of goodness of fit, fixed effect parameter recovery, type I error rate, and power.
Abstract: When (meta-)analyzing single-case experimental design (SCED) studies by means of hierarchical or multilevel modeling, applied researchers almost exclusively rely on the linear mixed model (LMM). This type of model assumes that the residuals are normally distributed. However, very often SCED studies consider outcomes of a discrete rather than a continuous nature, like counts, percentages or rates. In those cases the normality assumption does not hold. The LMM can be extended into a generalized linear mixed model (GLMM), which can account for the discrete nature of SCED count data. In this simulation study, we look at the effects of misspecifying an LMM for SCED count data simulated according to a GLMM. We compare the performance of a misspecified LMM and of a GLMM in terms of goodness of fit, fixed effect parameter recovery, type I error rate, and power. Because the LMM and the GLMM do not estimate identical fixed effects, we provide a transformation to compare the fixed effect parameter recovery. The results show that, compared to the GLMM, the LMM has worse performance in terms of goodness of fit and power. Performance in terms of fixed effect parameter recovery is equally good for both models, and in terms of type I error rate the LMM performs better than the GLMM. Finally, we provide some guidelines for applied researchers about aspects to consider when using an LMM for analyzing SCED count data.
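The mean-variance mismatch that makes a Gaussian LMM a poor fit for count outcomes can be seen directly by simulation: counts generated from a hypothetical Poisson model with a per-case random intercept are marginally overdispersed, violating the constant-variance, normal-residual assumption. The setup below is an illustrative assumption, not the simulation design of the study.

```python
import numpy as np

# Hypothetical SCED-like count data from a Poisson GLMM:
# log(rate_i) = 1.0 + b_i, with a random intercept b_i per case
rng = np.random.default_rng(2)
n_cases, n_obs = 50, 40
b = rng.normal(0.0, 0.5, n_cases)        # random intercepts (log scale)
rate = np.exp(1.0 + b)[:, None]          # case-specific Poisson rates
counts = rng.poisson(np.broadcast_to(rate, (n_cases, n_obs)))

# Marginally, mixing over random intercepts inflates the variance above the
# mean, so a homoscedastic Gaussian LMM is misspecified for these counts.
m, v = counts.mean(), counts.var(ddof=1)
print(round(m, 2), round(v, 2))
```

The printed variance exceeds the mean, which is the overdispersion a GLMM with a log link accommodates and a plain LMM does not.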

Journal ArticleDOI
TL;DR: A likelihood ratio test procedure is studied, to test that the variances of any subset of therandom effects are equal to zero in nonlinear mixed effects model, and it is highlighted that the limiting distribution depends strongly on the presence of correlations between the random effects.


Journal ArticleDOI
TL;DR: A novel two-stage approach to estimate and map disease risk in the presence of local discontinuities and clusters is proposed in both spatial and spatio-temporal domains, and the methodology is applied to two important public health problems in Spain.
Abstract: Disease risk maps for areal unit data are often estimated from Poisson mixed models with local spatial smoothing, for example by incorporating random effects with a conditional autoregressive prior distribution. However, one of the limitations is that local discontinuities in the spatial pattern are not usually modelled, leading to over-smoothing of the risk maps and a masking of clusters of hot/coldspot areas. In this paper, we propose a novel two-stage approach to estimate and map disease risk in the presence of such local discontinuities and clusters. We propose approaches in both spatial and spatio-temporal domains, where for the latter the clusters can either be fixed or allowed to vary over time. In the first stage, we apply an agglomerative hierarchical clustering algorithm to training data to provide sets of potential clusters, and in the second stage, a two-level spatial or spatio-temporal model is applied to each potential cluster configuration. The superiority of the proposed approach with regard to a previous proposal is shown by simulation, and the methodology is applied to two important public health problems in Spain, namely stomach cancer mortality across Spain and brain cancer incidence in the Navarre and Basque Country regions of Spain.

Journal ArticleDOI
TL;DR: In this paper, a two-stage method and a joint model were proposed to reduce the bias due to regression dilution, where the mixed effects submodel and time-to-event submodel were simultaneously fitted using shared random effects.
Abstract: The association between visit-to-visit systolic blood pressure variability and cardiovascular events has recently received a lot of attention in the cardiovascular literature. However, blood pressure variability is usually estimated on a person-by-person basis and is therefore subject to considerable measurement error. We demonstrate that hazard ratios estimated using this approach are subject to bias due to regression dilution, and we propose alternative methods to reduce this bias: a two-stage method and a joint model. For the two-stage method, in stage one, repeated measurements are modelled using a mixed effects model with a random component on the residual standard deviation (SD); the mixed effects model is used to estimate the blood pressure SD for each individual, which, in stage two, is used as a covariate in a time-to-event model. For the joint model, the mixed effects submodel and time-to-event submodel are fitted simultaneously using shared random effects. We illustrate the methods using data from the Atherosclerosis Risk in Communities study.
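The regression-dilution bias that motivates the paper's two-stage and joint models can be demonstrated with a toy simulation: when each person's visit-to-visit SD is estimated from only a few visits, the slope on the noisy SD estimate is attenuated toward zero relative to the slope on the true SD. All names and parameter values below are hypothetical, and a plain linear outcome stands in for the paper's time-to-event model.

```python
import numpy as np

# Hypothetical per-person blood-pressure variability study
rng = np.random.default_rng(3)
n, visits = 5000, 5
true_sd = rng.uniform(5.0, 15.0, n)               # per-person visit-to-visit SD
bp = rng.normal(130.0, true_sd[:, None], (n, visits))
obs_sd = bp.std(axis=1, ddof=1)                   # noisy person-level estimate

beta = 0.1                                        # true association with outcome
y = beta * true_sd + rng.normal(0.0, 1.0, n)

def slope(x, resp):
    # Simple OLS slope of resp on x
    xc = x - x.mean()
    return (xc * resp).sum() / (xc * xc).sum()

# The slope on the noisy SD is attenuated relative to the slope on the true SD
print(round(slope(true_sd, y), 3), round(slope(obs_sd, y), 3))
```

The second slope is noticeably smaller than the first, which is exactly the bias the stage-one mixed model (borrowing strength across people to estimate each SD) is designed to reduce.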

Journal ArticleDOI
TL;DR: A Bayesian approach for analyzing high-dimensional multinomial data that are referenced over space and time by assuming a logit link to a latent spatio-temporal mixed effects model, which allows for covariances that are nonstationary in both space and time, asymmetric, and parsimonious.

Abstract: We introduce a Bayesian approach for analyzing high-dimensional multinomial data that are referenced over space and time. In particular, the proportions associated with multinomial data are assumed to have a logit link to a latent spatio-temporal mixed effects model. This strategy allows for covariances that are nonstationary in both space and time, asymmetric, and parsimonious. We also introduce the use of the conditional multivariate logit-beta distribution into the dependent multinomial data setting, which leads to conjugate full-conditional distributions for use in a collapsed Gibbs sampler. We refer to this model as the multinomial spatio-temporal mixed effects model (MN-STM). Additionally, we provide methodological developments including: the derivation of the associated full-conditional distributions, a relationship with a latent Gaussian process model, and the stability of the non-stationary vector autoregressive model. We illustrate the MN-STM through simulations and through a demonstration with public-use quarterly workforce indicators data from the longitudinal employer household dynamics program of the US Census Bureau.

Journal ArticleDOI
01 Jun 2019-Test
TL;DR: In this article, the generalized beta of the second kind (GB2) distribution is proposed for modeling skewed response variables; its four parameters, two of which control the shape of each tail, make it far more flexible than the normal distribution.
Abstract: Models with random (or mixed) effects are commonly used for panel data, in microarrays, small area estimation, and many other applications. When the variable of interest is continuous, normality is commonly assumed, either in the original scale or after some transformation. However, the normal distribution is not always well suited for modeling data on certain variables, such as those found in Econometrics, which often show skewness even at the log scale. Finding the correct transformation to achieve normality is not straightforward since the true distribution is not known in practice. As an alternative, we propose a much more flexible distribution called the generalized beta of the second kind (GB2). The GB2 distribution has four parameters, two of which control the shape of each tail, making it flexible enough to accommodate different forms of skewness. Based on a multivariate extension of the GB2 distribution, we propose a new model with random effects designed for skewed response variables that extends the usual log-normal-nested error model. Under this new model, we find empirical best predictors of linear and nonlinear characteristics, including poverty indicators, in small areas. Simulation studies illustrate the good properties, in terms of bias and efficiency, of the estimators based on the proposed multivariate GB2 model. Results from an application to poverty mapping in Spanish provinces also indicate efficiency gains with respect to the conventional log-normal-nested error model used for poverty mapping.

Journal ArticleDOI
TL;DR: The proposed models represent missingness via a latent trait corresponding to the students' "ability" to respond to the prompting device; this latent trait was a significant predictor of both positive affect and negative affect outcomes.

Abstract: Latent trait shared-parameter mixed models are developed for ecological momentary assessment (EMA) data containing missing values, where data are collected in an intermittent manner. In such studies, data are often missing due to unanswered prompts. Using item response theory models, a latent trait is used to represent the missing prompts and is modeled jointly with a mixed model for bivariate longitudinal outcomes. Both one- and two-parameter latent trait shared-parameter mixed models are presented. These new models offer a unique way to analyze missing EMA data with many response patterns. Here, the proposed models represent missingness via a latent trait that corresponds to the students' "ability" to respond to the prompting device. Data containing more than 10,300 observations from an EMA study involving high school students' positive and negative affects are presented. The latent trait representing missingness was a significant predictor of both positive affect and negative affect outcomes. The models are compared to a missing at random mixed model. A simulation study indicates that the proposed models can provide lower bias and increased efficiency compared to the standard missing at random approach commonly used with intermittent missing longitudinal data.

Posted Content
TL;DR: In this paper, a multivariate Poisson log-normal mixed model and a logistic linear mixed model were proposed for single-cell data and the authors used Hamiltonian Monte Carlo to provide Bayesian uncertainty quantification.
Abstract: Mass cytometry technology enables the simultaneous measurement of over 40 proteins on single cells. This has helped immunologists to increase their understanding of the heterogeneity, complexity, and lineage relationships of white blood cells. Current statistical methods often collapse the rich single-cell data into summary statistics before proceeding with downstream analysis, discarding the information in these multivariate datasets. In this article, our aim is to exhibit the use of statistical analyses on the raw, uncompressed data, thus improving replicability and exposing multivariate patterns and their associated uncertainty profiles. We show that multivariate generative models are a valid alternative to univariate hypothesis testing. We propose two models: a multivariate Poisson log-normal mixed model and a logistic linear mixed model. We show that these models are complementary and that either model can account for different confounders. We use Hamiltonian Monte Carlo to provide Bayesian uncertainty quantification. Our models, applied to a recent pregnancy study, successfully reproduce key findings while quantifying increased overall protein-to-protein correlations between the first and third trimesters.

Journal ArticleDOI
TL;DR: The R2 statistic proposed in this paper measures the proportion of generalized variance explained by fixed effects in the linear mixed model and can be used to conduct covariance model selection in a manner similar to information criteria.
Abstract: The linear mixed model, sometimes referred to as the multi-level model, is one of the most widely used tools for analyses involving clustered data. Various definitions of R2 have been proposed for the linear mixed model, but several limitations prevail. Presently, there is no R2 statistic for the linear mixed model that simultaneously accommodates an interpretation based on variance partitioning, quantifies uncertainty and produces confidence limits, and can be used to conduct covariance model selection in a manner similar to information criteria. In this article, we introduce such an R2 statistic. The proposed R2 measures the proportion of generalized variance explained by fixed effects in the linear mixed model. Simulated and real longitudinal data are used to illustrate the statistical properties of the proposed R2 and its capacity to be applied to covariance model selection.
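To make the variance-partitioning idea concrete, here is a minimal sketch of a related statistic, the Nakagawa–Schielzeth-style marginal and conditional R2 for a random-intercept model, computed with simple moment-based plug-ins rather than the generalized-variance R2 this paper proposes or a full likelihood fit. All simulation settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_grp, n_per = 50, 20
beta0, beta1, su, se = 1.0, 2.0, 1.0, 1.5   # hypothetical true values

g = np.repeat(np.arange(n_grp), n_per)
x = rng.normal(size=n_grp * n_per)
u = rng.normal(0, su, n_grp)                 # random intercepts
y = beta0 + beta1 * x + u[g] + rng.normal(0, se, g.size)

# Moment-based plug-ins: OLS slope, then ANOVA on residuals for su^2, se^2
b1 = np.cov(x, y)[0, 1] / x.var()
r = y - (y.mean() + b1 * (x - x.mean()))
grp_means = np.array([r[g == i].mean() for i in range(n_grp)])
se2_hat = np.mean([r[g == i].var(ddof=1) for i in range(n_grp)])
su2_hat = max(grp_means.var(ddof=1) - se2_hat / n_per, 0.0)

# Variance partition: fixed / (fixed + random intercept + residual)
var_fixed = (b1 ** 2) * x.var()
r2_marginal = var_fixed / (var_fixed + su2_hat + se2_hat)
r2_conditional = (var_fixed + su2_hat) / (var_fixed + su2_hat + se2_hat)
print(f"marginal R2 ~ {r2_marginal:.2f}, conditional R2 ~ {r2_conditional:.2f}")
```

The gap between the marginal and conditional versions quantifies how much of the explained variation is attributable to the clustering itself, which is the interpretation gap the paper's single generalized-variance R2 is designed to close.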

Journal ArticleDOI
TL;DR: In this paper, the large-eddy simulation (LES) of isothermal turbulent channel flows is studied and the performance of the subgrid-scale models is assessed using the same finite difference numerical method and physical configuration.
Abstract: This paper studies the large-eddy simulation (LES) of isothermal turbulent channel flows. We investigate zero-equation algebraic models without wall function or wall model: functional models, structural models, and mixed models. In addition to models from the literature, new models are proposed and their relevance is examined. Dynamic versions of each type of model are also analyzed. The performance of the subgrid-scale models is assessed using the same finite difference numerical method and physical configuration. The friction Reynolds number of the simulations is 180. Three different mesh resolutions are used. The predictions of large-eddy simulations are compared to those of a direct numerical simulation filtered at the resolution of the LES meshes. The results are more accurate than those of a simulation without a subgrid-scale model. The predictions of functional eddy-viscosity models can be improved using constant-parameter or dynamic tensorial methods.
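As a concrete instance of the functional eddy-viscosity class discussed above, the classic Smagorinsky model sets nu_t = (Cs * Delta)^2 * |S|, with |S| the strain-rate magnitude. The sketch below evaluates it on a synthetic divergence-free 2-D field rather than actual channel-flow LES data, and is not the paper's finite-difference solver.

```python
import numpy as np

# Synthetic smooth 2-D velocity field standing in for resolved LES data
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.sin(X) * np.cos(Y)
v = -np.cos(X) * np.sin(Y)           # Taylor-Green-like, divergence-free

# Strain-rate tensor from central differences on the periodic grid
h = x[1] - x[0]
dudx = (np.roll(u, -1, 0) - np.roll(u, 1, 0)) / (2 * h)
dudy = (np.roll(u, -1, 1) - np.roll(u, 1, 1)) / (2 * h)
dvdx = (np.roll(v, -1, 0) - np.roll(v, 1, 0)) / (2 * h)
dvdy = (np.roll(v, -1, 1) - np.roll(v, 1, 1)) / (2 * h)
S11, S22, S12 = dudx, dvdy, 0.5 * (dudy + dvdx)
S_mag = np.sqrt(2 * (S11**2 + S22**2 + 2 * S12**2))

Cs = 0.17                            # commonly cited Smagorinsky constant
delta = h                            # filter width tied to the mesh spacing
nu_t = (Cs * delta) ** 2 * S_mag     # Smagorinsky eddy viscosity field
print(f"max eddy viscosity: {nu_t.max():.3e}")
```

Dynamic versions of such models replace the fixed Cs with a locally computed parameter, which is one of the refinements the paper evaluates.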

Journal ArticleDOI
TL;DR: It is shown that, despite their great popularity in environmental monitoring and pollution assessment, LMMs are statistically inferior to GMRFs for measuring PM in the Northeastern USA.

Journal ArticleDOI
TL;DR: In this paper, four area-level Poisson mixed models with time effects are proposed and a parametric bootstrap algorithm is given to measure the accuracy of the plug-in predictor of fire number under the temporal models.
Abstract: Wildfires are considered one of the main causes of forest destruction. In recent years, the number of forest fires and the burned area in Mediterranean regions have increased. This problem particularly affects Galicia (north-west of Spain). Conventional models of the number of forest fires in small areas can have high prediction error. For this reason, four area-level Poisson mixed models with time effects are proposed. The first two models contain independent time effects, whereas the random effects of the other models are distributed according to an autoregressive process AR(1). A parametric bootstrap algorithm is given to measure the accuracy of the plug-in predictor of fire number under the temporal models. A significant prediction improvement is observed when using Poisson regression models with random time effects. Analysis of historical data finds significant meteorological and socioeconomic variables explaining the number of forest fires by area and reveals the presence of a temporal correlation structure captured by the area-level Poisson mixed model with AR(1) time effects.
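The data-generating side of an area-level Poisson mixed model with AR(1) time effects can be sketched directly: each area gets a stationary AR(1) series of random effects on the log scale, and counts are Poisson given the linear predictor. All parameter values are hypothetical and the covariate is a stand-in, not the study's meteorological or socioeconomic variables.

```python
import numpy as np

rng = np.random.default_rng(4)
n_area, n_time = 30, 10
rho, sigma_u = 0.6, 0.4             # hypothetical AR(1) parameters

# Stationary AR(1) time effects for each area
u = np.zeros((n_area, n_time))
u[:, 0] = rng.normal(0, sigma_u, n_area)
for t in range(1, n_time):
    innov = rng.normal(0, sigma_u * np.sqrt(1 - rho**2), n_area)
    u[:, t] = rho * u[:, t - 1] + innov

x = rng.normal(size=(n_area, n_time))   # stand-in standardized covariate
eta = 1.0 + 0.3 * x + u                 # linear predictor on the log scale
fires = rng.poisson(np.exp(eta))        # simulated fire counts by area and year

# Pooled lag-1 autocorrelation of the time effects should sit near rho
ac = np.corrcoef(u[:, :-1].ravel(), u[:, 1:].ravel())[0, 1]
print(f"mean count: {fires.mean():.1f}, lag-1 autocorr of time effects: {ac:.2f}")
```

Resampling from exactly this kind of generator, with estimated parameters plugged in, is the idea behind the parametric bootstrap used to measure the accuracy of the plug-in predictor.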

Journal ArticleDOI
TL;DR: A novel method is presented for the estimation of variance parameters in generalised linear mixed models where multiple quadratic penalties act on the same regression coefficients, and penalised splines for locally adaptive smoothness and for hierarchical curve data are discussed.
Abstract: We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in Harville (J Am Stat Assoc 72(358):320–338, 1977)’s work, but it is able to deal with models that have a precision matrix for the random effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (separation of overlapping precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented.
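The "ratio of a sum of squares to an effective-degrees-of-freedom quantity" update has a well-known single-penalty ancestor: the Schall/Harville-type iteration for a ridge (random-coefficient) model. The sketch below shows that simpler case as an illustration of the flavour of the updates; it is not the SOP method itself, which handles multiple overlapping precision matrices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 30
X = rng.normal(size=(n, p))
b_true = rng.normal(0, 0.5, p)               # random coefficients, variance 0.25
y = X @ b_true + rng.normal(0, 1.0, n)       # residual variance 1.0

sig_e2, sig_b2 = 1.0, 1.0                    # starting values
for _ in range(50):                          # successive approximations
    lam = sig_e2 / sig_b2                    # ridge penalty implied by the ratio
    A = X.T @ X + lam * np.eye(p)
    b = np.linalg.solve(A, X.T @ y)
    edf = np.trace(np.linalg.solve(A, X.T @ X))   # effective degrees of freedom
    # Updates: a sum of squares over an edf-type quantity, as in SOP
    sig_b2 = float(b @ b) / edf
    sig_e2 = float((y - X @ b) @ (y - X @ b)) / (n - edf)

print(f"sigma_b^2 ~ {sig_b2:.2f}, sigma_e^2 ~ {sig_e2:.2f}")
```

Both updates are strictly positive whenever b and the residuals are nonzero; the paper's contribution is the analogous separation when several penalties overlap on the same coefficients, where positivity needs the stated sufficient and necessary conditions.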

Journal ArticleDOI
TL;DR: This article introduces several Bayesian MTC models based on baseline treatment contrasts and evaluates the practical advantages of these models for producing yield ratio estimates, revealing that the model with the lowest deviance information criterion (DIC) includes both study random effects and study-specific residual variances.
Abstract: The mixed treatment comparison (MTC) method has been proposed to combine results across trials comparing several treatments. MTC allows coherent judgments on which of the treatments is the most effective. It produces estimates of the relative effects of each treatment compared with every other treatment by pooling direct and indirect evidence. In this article, we explore how this methodological framework can be used to rank a large number of agricultural crop species from yield data collected in field experiments. Our approach is illustrated in a meta-analysis of yield data obtained in 67 field studies for 36 different bioenergy crop species. The considered dataset defines a network of comparisons of crop species. We introduce several Bayesian MTC models based on baseline treatment contrasts and evaluate the practical advantages of these models to produce yield ratio estimates. We explore the consistency of some estimates by node-splitting and compare our results to those obtained with a classical two-way linear mixed model. Results reveal that the model showing the lowest deviance information criterion (DIC) includes both study random effects and study-specific residual variances. But all the tested models including study random effects lead to similar yield ratio estimates. The proposed Bayesian framework allows an in-depth analysis of the uncertainty in the species ranking.
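The baseline-contrast parameterization at the heart of MTC can be illustrated with a tiny fixed-effect version: each trial contributes a (log) yield ratio between two crops, and pooled contrasts against a baseline crop are recovered by weighted least squares over the whole network. This toy example uses three made-up trials and omits the study random effects and Bayesian machinery of the article.

```python
import numpy as np

# Hypothetical network: log yield ratios from trials comparing pairs of crops
# (crop a, crop b, observed log(y_b / y_a), standard error)
trials = [
    (0, 1, 0.20, 0.05),   # A vs B (direct)
    (0, 2, 0.50, 0.08),   # A vs C (direct)
    (1, 2, 0.28, 0.06),   # B vs C (direct; indirect via A gives 0.50 - 0.20 = 0.30)
]

n_trt = 3
# Design in baseline contrasts d_1, d_2 (d_0 = 0 for baseline crop A)
X = np.zeros((len(trials), n_trt - 1))
y = np.zeros(len(trials))
w = np.zeros(len(trials))
for k, (a, b, lr, se) in enumerate(trials):
    if a > 0:
        X[k, a - 1] -= 1.0
    if b > 0:
        X[k, b - 1] += 1.0
    y[k], w[k] = lr, 1.0 / se**2

W = np.diag(w)
d = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # pooled contrasts vs baseline
print("log ratios vs baseline:", np.round(d, 3))
print("yield ratios vs baseline:", np.round(np.exp(d), 3))
```

The small tension between the direct B-vs-C estimate (0.28) and the indirect one through A (0.30) is exactly the kind of inconsistency that node-splitting checks formalize.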

Journal ArticleDOI
TL;DR: The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant impact on the microbiome composition only in infected subjects who remained infected.
Abstract: Clustered overdispersed multivariate count data are challenging to model due to the presence of correlation within and between samples. Typically, the first source of correlation needs to be addressed but its quantification is of less interest. Here, we focus on the correlation between time points. In addition, the effects of covariates on the multivariate counts distribution need to be assessed. To fulfill these requirements, a regression model based on the Dirichlet-multinomial distribution for association between covariates and the categorical counts is extended by using random effects to deal with the additional clustering. This model is the Dirichlet-multinomial mixed regression model. Alternatively, a negative binomial regression mixed model can be deployed where the corresponding likelihood is conditioned on the total count. It appears that these two approaches are equivalent when the total count is fixed and independent of the random effects. We consider both subject-specific and categorical-specific random effects. However, the latter has a larger computational burden when the number of categories increases. Our work is motivated by microbiome data sets obtained by sequencing of the amplicon of the bacterial 16S rRNA gene. These data have a compositional structure and are typically overdispersed. The microbiome data set is from an epidemiological study carried out in a helminth-endemic area in Indonesia. The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant effect on the microbiome composition only in infected subjects who remained infected.