
Showing papers on "Mixed model published in 2017"


Journal ArticleDOI
TL;DR: The lmerTest package extends the 'lmerMod' class of the lme4 package, overloading the anova and summary functions to provide p values for tests of fixed effects, and implements Satterthwaite's method for approximating degrees of freedom for the t and F tests.
Abstract: One of the most frequent questions from users of the mixed model function lmer of the lme4 package has been: how can I get p values for the F and t tests for objects returned by lmer? The lmerTest package extends the 'lmerMod' class of the lme4 package, overloading the anova and summary functions to provide p values for tests of fixed effects. We have implemented Satterthwaite's method for approximating degrees of freedom for the t and F tests, as well as the construction of Type I-III ANOVA tables. Furthermore, the summary and the anova table may also be obtained using the Kenward-Roger approximation for denominator degrees of freedom (based on the KRmodcomp function from the pbkrtest package). The package also provides other convenient mixed model analysis tools, such as a step method that performs backward elimination of non-significant effects (both random and fixed), calculation of population means, and multiple comparison tests together with plotting facilities.
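A minimal sketch of the workflow described above, on simulated data (the data frame and variable names here are invented for illustration):

```r
library(lmerTest)  # loading lmerTest makes lmer() return an 'lmerModLmerTest' object

set.seed(1)
d <- data.frame(
  y     = rnorm(60),
  x     = factor(rep(c("a", "b", "c"), times = 20)),
  group = factor(rep(1:10, each = 6))
)

fit <- lmer(y ~ x + (1 | group), data = d)

summary(fit)                        # t tests with Satterthwaite df and p values
anova(fit)                          # Type III ANOVA table, Satterthwaite df
anova(fit, ddf = "Kenward-Roger")   # Kenward-Roger df via the pbkrtest package
step(fit)                           # backward elimination of random, then fixed effects
```

Note that `step()` here dispatches to lmerTest's method for fitted mixed models, not the base-R stepwise-selection function.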

12,305 citations


Book
30 May 2017
TL;DR: A book-length introduction to linear models, generalized linear models, and generalized additive models, covering their theory and geometry as well as their practical use in R, including mixed models and GAMMs via the mgcv package.
Abstract: Contents: LINEAR MODELS: A simple linear model; Linear models in general; The theory of linear models; The geometry of linear modelling; Practical linear models; Practical modelling with factors; General linear model specification in R; Further linear modelling theory; Exercises. GENERALIZED LINEAR MODELS: The theory of GLMs; Geometry of GLMs; GLMs with R; Likelihood; Exercises. INTRODUCING GAMS: Introduction; Univariate smooth functions; Additive models; Generalized additive models; Summary; Exercises. SOME GAM THEORY: Smoothing bases; Setting up GAMs as penalized GLMs; Justifying P-IRLS; Degrees of freedom and residual variance estimation; Smoothing parameter estimation criteria; Numerical GCV/UBRE: performance iteration; Numerical GCV/UBRE optimization by outer iteration; Distributional results; Confidence interval performance; Further GAM theory; Other approaches to GAMs; Exercises. GAMs IN PRACTICE: mgcv: Cherry trees again; Brain imaging example; Air pollution in Chicago example; Mackerel egg survey example; Portuguese larks example; Other packages; Exercises. MIXED MODELS AND GAMMs: Mixed models for balanced data; Linear mixed models in general; Linear mixed models in R; Generalized linear mixed models; GLMMs with R; Generalized additive mixed models; GAMMs with R; Exercises. APPENDICES: A, Some matrix algebra; B, Solutions to exercises. Bibliography. Index.
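The book's progression from GAMs to generalized additive mixed models can be sketched with the mgcv package (simulated data; the grouping variable is invented for illustration):

```r
library(mgcv)

set.seed(2)
dat <- gamSim(eg = 1, n = 200)   # simulated example data shipped with mgcv

# A generalized additive model: smooth functions of several covariates
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
summary(b)

# A generalized additive mixed model: a smooth plus a random intercept,
# fitted via gamm(), which calls nlme::lme under the hood
dat$g <- factor(rep(1:20, each = 10))
bm <- gamm(y ~ s(x1), random = list(g = ~ 1), data = dat)
summary(bm$gam)   # the smooth (GAM) part
summary(bm$lme)   # the mixed-model part
```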

8,393 citations


Journal ArticleDOI
TL;DR: The glmmTMB package fits many types of GLMMs and extensions, including models with continuously distributed responses, but here the authors focus on count responses; among packages that fit zero-inflated mixed models, its ability to estimate the Conway-Maxwell-Poisson distribution parameterized by the mean is unique.
Abstract: Count data can be analyzed using generalized linear mixed models when observations are correlated in ways that require random effects. However, count data are often zero-inflated, containing more zeros than would be expected from the typical error distributions. We present a new package, glmmTMB, and compare it to other R packages that fit zero-inflated mixed models. The glmmTMB package fits many types of GLMMs and extensions, including models with continuously distributed responses, but here we focus on count responses. glmmTMB is faster than glmmADMB, MCMCglmm, and brms, and more flexible than INLA and mgcv for zero-inflated modeling. One unique feature of glmmTMB (among packages that fit zero-inflated mixed models) is its ability to estimate the Conway-Maxwell-Poisson distribution parameterized by the mean. Overall, its most appealing features for new users may be the combination of speed, flexibility, and its interface's similarity to lme4.
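A sketch of a zero-inflated count GLMM in glmmTMB, on simulated data (the site/count setup is invented for illustration):

```r
library(glmmTMB)

set.seed(3)
d <- data.frame(site = factor(rep(1:15, each = 10)), x = rnorm(150))
d$count <- rpois(150, exp(0.5 + 0.3 * d$x))
d$count[rbinom(150, 1, 0.2) == 1] <- 0   # inject excess zeros

# Zero-inflated Poisson GLMM: ziformula = ~ 1 adds a constant
# zero-inflation probability on the logit scale
fit <- glmmTMB(count ~ x + (1 | site), ziformula = ~ 1,
               family = poisson, data = d)
summary(fit)
```

Using `family = compois()` instead would fit the mean-parameterized Conway-Maxwell-Poisson distribution highlighted in the abstract.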

4,497 citations


Journal ArticleDOI
TL;DR: It is recommended that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
Abstract: Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross-validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross-validation, often noted by modellers, is dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provide ample opportunity for overfitting with non-causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross-validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non-random and blocked cross-validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross-validation is nearly universally more appropriate than random cross-validation if the goal is predicting to new data or predictor space, or for selecting causal predictors.
We recommend that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
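A base-R sketch of the core point (the data-generating process and all names are invented): when a non-causal block-level predictor is available, random cross-validation can reward overfitting that blocked cross-validation exposes.

```r
set.seed(4)
n_block <- 10
g <- rep(1:n_block, each = 20)       # block membership (e.g. sites or years)
u <- rnorm(n_block)[g]               # latent block effect
z <- rnorm(n_block)[g]               # non-causal block-level predictor
x <- rnorm(200)
y <- 1 + 2 * x + u + rnorm(200)      # z plays no role in the truth
d <- data.frame(y, x, z, g)

cv_rmse <- function(folds) {
  errs <- sapply(unique(folds), function(k) {
    fit <- lm(y ~ x + z, data = d[folds != k, ])
    mean((d$y[folds == k] - predict(fit, d[folds == k, ]))^2)
  })
  sqrt(mean(errs))
}

random_folds <- sample(rep(1:10, length.out = 200))  # ignores block structure
block_folds  <- g                                    # leave-one-block-out

cv_rmse(random_folds)   # typically optimistic: z can absorb block effects
                        # shared between training and test rows
cv_rmse(block_folds)    # honest estimate for predicting to unseen blocks
```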

998 citations


Journal ArticleDOI
TL;DR: This paper constitutes a companion paper to the R package lcmm by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.
Abstract: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event outcome that may be left-truncated and right-censored and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the stability of the parameters and the likelihood, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package, introducing each family of models, the estimation technique, some implementation details, and examples based on a dataset on cognitive aging.
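A sketch of the hlme workflow on simulated longitudinal data (subject IDs, variable names, and the two-class structure are invented; `B = m1` is a common idiom that supplies the one-class fit as initial values for the multi-class model):

```r
library(lcmm)

set.seed(5)
n  <- 50
d  <- data.frame(ID = rep(1:n, each = 5), Time = rep(0:4, times = n))
cl <- rep(rbinom(n, 1, 0.5), each = 5)                 # latent class labels
d$Y <- 10 + ifelse(cl == 1, 1.5, -0.5) * d$Time + rnorm(nrow(d))

# ng = 1: an ordinary linear mixed model
m1 <- hlme(Y ~ Time, random = ~ Time, subject = "ID", ng = 1, data = d)

# ng = 2: a latent class mixed model; mixture = ~ Time lets the slope
# differ between classes, and B = m1 provides starting values
m2 <- hlme(Y ~ Time, random = ~ Time, subject = "ID", ng = 2,
           mixture = ~ Time, data = d, B = m1)

summarytable(m1, m2)   # log-likelihood, BIC, and class proportions
postprob(m2)           # posterior classification of subjects
```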

470 citations


Journal ArticleDOI
TL;DR: A fast multi‐locus random‐SNP‐effect EMMA (FASTmrEMMA) model for GWAS, built on random single nucleotide polymorphism (SNP) effects and a new algorithm that whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one.
Abstract: The mixed linear model has been widely used in genome-wide association studies (GWAS), but its application to multi-locus GWAS analysis has not been explored and assessed. Here, we implemented a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) model for GWAS. The model is built on random single nucleotide polymorphism (SNP) effects and a new algorithm. This algorithm whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one. The model first chooses all putative quantitative trait nucleotides (QTNs) with P-values ≤ 0.005 and then includes them in a multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA is more powerful in QTN detection and model fit, has less bias in QTN effect estimation and requires less running time than existing single- and multi-locus methods, such as empirical Bayes, settlement of mixed linear model under progressively exclusive relationship (SUPER), efficient mixed model association (EMMA), compressed MLM (CMLM) and enriched CMLM (ECMLM). FASTmrEMMA provides an alternative for multi-locus GWAS.

201 citations


Journal ArticleDOI
TL;DR: MQS is based on the method of moments and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework.
Abstract: Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while remaining computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.

118 citations


Journal ArticleDOI
TL;DR: The results demonstrate that each method leads to unbiased treatment effect estimates; based on precision of estimates, 95% coverage probability, and power, ANCOVA modeling of either the change score or the post-treatment score as the outcome proves to be the most effective.
Abstract: Often repeated measures data are summarized into pre- and post-treatment measurements. Various methods exist in the literature for estimating and testing the treatment effect, including ANOVA, analysis of covariance (ANCOVA), and linear mixed modeling (LMM). Under the first two methods, the outcome can be modeled either as the post-treatment measurement (ANOVA-POST or ANCOVA-POST) or as a change score between pre and post measurements (ANOVA-CHANGE or ANCOVA-CHANGE). In LMM, the outcome is modeled as a vector of responses with or without the Kenward-Roger adjustment. We consider five methods common in the literature, and discuss them in terms of supporting simulations and theoretical derivations of variance. Consistent with existing literature, our results demonstrate that each method leads to unbiased treatment effect estimates; based on precision of estimates, 95% coverage probability, and power, ANCOVA modeling of either the change score or the post-treatment score as the outcome proves to be the most effective. We further demonstrate each method on a real data example to exemplify comparisons in a real clinical context.
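The five estimators compared above can be sketched in R on simulated pre/post data (all names and the true effect size of 5 are invented for illustration):

```r
set.seed(6)
n     <- 100
treat <- factor(rep(0:1, each = n / 2))
pre   <- rnorm(n, 50, 10)
post  <- 0.8 * pre + 5 * (treat == 1) + rnorm(n, 0, 5)   # true effect = 5
d     <- data.frame(id = 1:n, treat, pre, post, change = post - pre)

coef(lm(post   ~ treat,       data = d))["treat1"]  # ANOVA-POST
coef(lm(change ~ treat,       data = d))["treat1"]  # ANOVA-CHANGE
coef(lm(post   ~ pre + treat, data = d))["treat1"]  # ANCOVA-POST
coef(lm(change ~ pre + treat, data = d))["treat1"]  # ANCOVA-CHANGE

# LMM: stack the two measurements and read the effect off the
# time-by-treatment interaction
long <- reshape(d, direction = "long", varying = c("pre", "post"),
                v.names = "y", times = c(0, 1), timevar = "time", idvar = "id")
library(nlme)
fit <- lme(y ~ time * treat, random = ~ 1 | id, data = long)
fixef(fit)["time:treat1"]
```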

103 citations


Journal ArticleDOI
TL;DR: In this article, the authors use least squares-based linear models (LMs) together with restricted maximum likelihood-based mixed models (MMs) for the analysis of hierarchical data.
Abstract: Aims: The aim of this guide is to provide practical help for ecologists who analyze data from biodiversity–ecosystem functioning experiments. Our approach differs from others in the use of least squares-based linear models (LMs) together with restricted maximum likelihood-based mixed models (MMs) for the analysis of hierarchical data. An original data set containing diameter and height of young trees grown in monocultures, 2- or 4-species mixtures under ambient light or shade is used as an example. Methods: Starting with a simple LM, basic features of model fitting and the subsequent analysis of variance (ANOVA) for significance tests are summarized. From this, more complex models are developed. We use the statistical software R for model fitting and to demonstrate similarities and complementarities between LMs and MMs. The formation of contrasts and the use of error (LMs) or random-effects (MMs) terms to account for hierarchical data structure in ANOVAs are explained. Important Findings: Data from biodiversity experiments can be analyzed at the level of entire plant communities (plots) and plant individuals. The basic explanatory term is species composition, which can be divided into contrasts in many ways depending on specific biological hypotheses. Typically, these contrasts code for aspects of species richness or the presence of particular species. For significance tests in ANOVAs, contrast terms generally are compared with remaining variation of the explanatory terms from which they have been ‘carved out’. Once a final model has been selected, parameters (e.g. means or slopes for fixed-effects terms and variance components for error or random-effects terms) can be estimated to indicate the direction and size of effects.
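The correspondence between error terms in LMs and random-effects terms in MMs described above can be sketched in R (the plot layout and richness levels are invented for illustration):

```r
library(lme4)

set.seed(7)
d <- expand.grid(plot = factor(1:24), tree = 1:5)
d$richness <- factor(rep(c(1, 2, 4), each = 8))[d$plot]   # species richness per plot
plot_eff   <- rnorm(24, sd = 0.5)[d$plot]
d$height   <- 2 + 0.3 * as.numeric(d$richness) + plot_eff +
              rnorm(nrow(d), sd = 0.3)

# LM route: aov() with an Error() stratum, so richness is tested
# against between-plot variation
summary(aov(height ~ richness + Error(plot), data = d))

# MM route: the same structure expressed as a random intercept for plot
fit <- lmer(height ~ richness + (1 | plot), data = d)
anova(fit)
```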

82 citations


Journal ArticleDOI
TL;DR: A flexible and user-friendly spatial method called SpATS performed comparably to more elaborate and trial-specific spatial models in a series of sorghum breeding trials, and should be considered as an efficient and easy-to-use alternative for routine analyses of plant breeding trials.
Abstract: A flexible and user-friendly spatial method called SpATS performed comparably to more elaborate and trial-specific spatial models in a series of sorghum breeding trials. Adjustment for spatial trends in plant breeding field trials is essential for efficient evaluation and selection of genotypes. Current mixed model methods of spatial analysis are based on a multi-step modelling process where global and local trends are fitted after trying several candidate spatial models. This paper reports the application of a novel spatial method that accounts for all types of continuous field variation in a single modelling step by fitting a smooth surface. The method uses two-dimensional P-splines with anisotropic smoothing formulated in the mixed model framework, referred to as SpATS model. We applied this methodology to a series of large and partially replicated sorghum breeding trials. The new model was assessed in comparison with the more elaborate standard spatial models that use autoregressive correlation of residuals. The improvements in precision and the predictions of genotypic values produced by the SpATS model were equivalent to those obtained using the best fitting standard spatial models for each trial. One advantage of the approach with SpATS is that all patterns of spatial trend and genetic effects were modelled simultaneously by fitting a single model. Furthermore, we used a flexible model to adequately adjust for field trends. This strategy reduces potential parameter identification problems and simplifies the model selection process. Therefore, the new method should be considered as an efficient and easy-to-use alternative for routine analyses of plant breeding trials.
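A hedged sketch of a SpATS call on an invented field layout (the trait, genotype labels, and grid dimensions are made up; argument names should be checked against the package help):

```r
library(SpATS)

set.seed(8)
field <- expand.grid(col = 1:10, row = 1:20)
field$geno  <- factor(sample(paste0("G", 1:50), nrow(field), replace = TRUE))
trend       <- outer(1:10, 1:20, function(c, r) 0.05 * c + 0.03 * r)
field$yield <- 10 + rnorm(50)[field$geno] + as.vector(trend) + rnorm(nrow(field))

# One-step spatial adjustment: a smooth anisotropic 2D P-spline surface
# plus random genotype effects
fit <- SpATS(response = "yield", genotype = "geno",
             genotype.as.random = TRUE,
             spatial = ~ PSANOVA(col, row, nseg = c(10, 20)),
             data = field)
plot(fit)   # fitted spatial trend, residuals, and genotype predictions
```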

81 citations


Journal ArticleDOI
TL;DR: A Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence is presented and a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution is developed.
Abstract: Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independence. Previous count-based methods rely on simple hierarchical Poisson models (e.g. negative binomial) to model independent over-dispersion, but do not account for sample non-independence due to relatedness, population structure and/or hidden confounders. Here, we present a Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence. We also develop a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution. With simulations, we show that our method properly controls type I error and is generally more powerful than other widely used approaches, except in small samples (n < 15) with other unfavorable properties (e.g. small effect sizes). We also apply our method to three real datasets that contain related individuals, population stratification or hidden confounders. Our results show that our method increases power in all three datasets compared to other approaches, though the power gain is smallest in the smallest sample (n = 6). Our method is implemented in MACAU, freely available at www.xzlab.org/software.html.

Journal ArticleDOI
TL;DR: A system of multiphase non-linear mixed-effects models is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and the risk factors within each phase.
Abstract: In the medical sciences, we often encounter longitudinal temporal relationships that are non-linear in nature. The influence of risk factors may also change across longitudinal follow-up. A system of multiphase non-linear mixed-effects models is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and the risk factors within each phase. Application of this model is illustrated using spirometry data after lung transplantation, using readily available statistical software. This application illustrates the usefulness of our flexible model when dealing with complex non-linear patterns and time-varying coefficients.

Journal ArticleDOI
TL;DR: Mixed-effects models account for the intra-class correlation in Sholl analysis of data sampled from multiple neurons per animal, leading to correct inference where treating the measurements as independent would not.

Journal ArticleDOI
TL;DR: This study develops landscape-directed simulations and tests a series of replicates that emulate independent empirical datasets of two species with different life history characteristics, helping to establish methods for using linear mixed models to identify the features underlying patterns of dispersal across a variety of landscapes.
Abstract: Dispersal can impact population dynamics and geographic variation, and thus, genetic approaches that can establish which landscape factors influence population connectivity have ecological and evolutionary importance. Mixed models that account for the error structure of pairwise datasets are increasingly used to compare models relating genetic differentiation to pairwise measures of landscape resistance. A model selection framework based on information criteria metrics or explained variance may help disentangle the ecological and landscape factors influencing genetic structure, yet there is currently no consensus on the best protocols. Here, we develop landscape-directed simulations and test a series of replicates that emulate independent empirical datasets of two species with different life history characteristics (greater sage-grouse; eastern foxsnake). We determined that in our simulated scenarios, AIC and BIC were the best model selection indices and that marginal R2 values were biased toward more complex models. The model coefficients for landscape variables generally reflected the underlying dispersal model with confidence intervals that did not overlap with zero across the entire model set. When we controlled for geographic distance, variables not in the underlying dispersal models (i.e., nontrue) typically overlapped zero. Our study helps establish methods for using linear mixed models to identify the features underlying patterns of dispersal across a variety of landscapes.

Journal ArticleDOI
TL;DR: In the SWTs simulated here, mixed‐effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within‐cluster comparisons.
Abstract: Many stepped wedge trials (SWTs) are analysed by using a mixed-effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common-to-all or varied-between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within-cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within-cluster comparisons in the standard model. In the SWTs simulated here, mixed-effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within-cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
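The recommendation above, adding a random time-period effect to the standard SWT model, can be sketched with lme4 on simulated data (cluster counts, switch pattern, and effect sizes are invented):

```r
library(lme4)

set.seed(9)
d <- expand.grid(cluster = factor(1:12), period = factor(1:2), subj = 1:20)
# one group of clusters switches to intervention in period 1, the rest in period 2
d$treat <- as.integer(as.integer(d$cluster) <= 4 | d$period == "2")
u  <- rnorm(12, sd = 0.5)[d$cluster]            # cluster effect
cp <- interaction(d$cluster, d$period)
up <- rnorm(nlevels(cp), sd = 0.5)[cp]          # cluster-by-period effect
d$y <- 0.3 * d$treat + u + up + rnorm(nrow(d))

# "standard model": random intercept only
standard <- lmer(y ~ treat + period + (1 | cluster), data = d)

# recommended: additionally a random cluster-by-period effect
robust <- lmer(y ~ treat + period + (1 | cluster) + (1 | cluster:period),
               data = d)
fixef(standard)["treat"]
fixef(robust)["treat"]
```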

Journal ArticleDOI
TL;DR: It was concluded that INLA-SPDE had the potential to map the spatial distribution of environmental variables along with their posterior marginal distributions for environmental management and some drawbacks were identified, including artefacts of model response due to the use of triangle meshes and a longer computational time when dealing with non-Gaussian likelihood families.

Journal Article
TL;DR: A generic Bayesian mixed-effects model to estimate the temporal progression of a biological phenomenon from observations obtained at multiple time points for a group of individuals and shows that the estimated spatiotemporal transformations effectively put into correspondence significant events in the progression of individuals.
Abstract: We propose a generic Bayesian mixed-effects model to estimate the temporal progression of a biological phenomenon from observations obtained at multiple time points for a group of individuals. The progression is modeled by continuous trajectories in the space of measurements. Individual trajectories of progression result from spatiotemporal transformations of an average trajectory. These transformations make it possible to quantify the changes in direction and pace at which the trajectories are followed. The framework of Riemannian geometry allows the model to be used with any kind of measurements with smooth constraints. A stochastic version of the Expectation-Maximization algorithm is used to produce maximum a posteriori estimates of the parameters. We evaluate our method using series of neuropsychological test scores from patients with mild cognitive impairments later diagnosed with Alzheimer's disease, and simulated evolutions of symmetric positive definite matrices. The data-driven model of the impairment of cognitive functions shows the variability in the ordering and timing of the decline of these functions in the population. We also show that the estimated spatiotemporal transformations effectively put into correspondence significant events in the progression of individuals.

Journal ArticleDOI
TL;DR: The results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates.
Abstract: Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta-analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between-studies variance on the statistical performance of the Q B(P) and Q B(S) tests for subgroup analyses assuming a mixed-effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between-studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates.
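The pooled-versus-separate contrast above can be sketched with the metafor package (the effect sizes, subgroup labels, and variances are simulated for illustration):

```r
library(metafor)

set.seed(10)
k   <- 24
grp <- factor(rep(c("A", "B"), each = k / 2))
vi  <- runif(k, 0.01, 0.1)                      # within-study variances
yi  <- ifelse(grp == "A", 0.2, 0.5) + rnorm(k, sd = sqrt(0.05 + vi))

# Pooled residual tau^2: one mixed-effects meta-regression with the
# moderator; the QM test plays the role of the Q_B(P) subgroup test
pooled <- rma(yi, vi, mods = ~ grp)
pooled

# Separate tau^2 per subgroup: fit each category on its own
sepA <- rma(yi[grp == "A"], vi[grp == "A"])
sepB <- rma(yi[grp == "B"], vi[grp == "B"])
c(A = sepA$tau2, B = sepB$tau2)
```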

Journal ArticleDOI
TL;DR: Simulations demonstrate regularized PQL outperforms several currently employed methods for joint selection even if the cluster size is small compared to the number of clusters, while also offering dramatic reductions in computation time.
Abstract: The application of generalized linear mixed models presents some major challenges for both estimation, due to the intractable marginal likelihood, and model selection, as we usually want to jointly select over both fixed and random effects. We propose to overcome these challenges by combining penalized quasi-likelihood (PQL) estimation with sparsity inducing penalties on the fixed and random coefficients. The resulting approach, referred to as regularized PQL, is a computationally efficient method for performing joint selection in mixed models. A key aspect of regularized PQL involves the use of a group based penalty for the random effects: sparsity is induced such that all the coefficients for a random effect are shrunk to zero simultaneously, which in turn leads to the random effect being removed from the model. Despite being a quasi-likelihood approach, we show that regularized PQL is selection consistent, that is, it asymptotically selects the true set of fixed and random effects, in the setti...

Journal ArticleDOI
TL;DR: In this paper, a quantile parametric mixed regression model for bounded response variables is presented, based on the distribution introduced by [27]; a Bayesian approach is adopted for inference using Markov Chain Monte Carlo (MCMC) methods.
Abstract: Bounded response variables are common in many applications where the responses are percentages, proportions, or rates. New regression models have been proposed recently to model the relationship among one or more covariates and the conditional mean of a response variable based on the beta distribution or a mixture of beta distributions. However, when we are interested in knowing how covariates impact different levels of the response variable, quantile regression models play an important role. A new quantile parametric mixed regression model for bounded response variables is presented by considering the distribution introduced by [27]. A Bayesian approach is adopted for inference using Markov Chain Monte Carlo (MCMC) methods. Model comparison criteria are also discussed. The inferential methods can be easily programmed and then easily used for data modeling. Results from a simulation study are reported showing the good performance of the proposed inferential methods. Furthermore, results from data analyses using regression models with fixed and mixed effects are given. Specifically, we show that the quantile parametric model proposed here is an alternative and complementary modeling tool for bounded response variables such as the poverty index in Brazilian municipalities, which is linked to the Gini coefficient and the human development index.

Journal ArticleDOI
TL;DR: A novel diagnostic test based on the so-called gradient function proposed by Verbeke and Molenberghs (2013) is introduced to assess the random-effects distribution and can be used to check the adequacy of any distribution for random effects in a wide class of mixed models.
Abstract: It is traditionally assumed that the random effects in mixed models follow a multivariate normal distribution, making likelihood-based inferences more feasible theoretically and computationally. However, this assumption does not necessarily hold in practice which may lead to biased and unreliable results. We introduce a novel diagnostic test based on the so-called gradient function proposed by Verbeke and Molenberghs (2013) to assess the random-effects distribution. We establish asymptotic properties of our test and show that, under a correctly specified model, the proposed test statistic converges to a weighted sum of independent chi-squared random variables each with one degree of freedom. The weights, which are eigenvalues of a square matrix, can be easily calculated. We also develop a parametric bootstrap algorithm for small samples. Our strategy can be used to check the adequacy of any distribution for random effects in a wide class of mixed models, including linear mixed models, generalized linear mixed models, and non-linear mixed models, with univariate as well as multivariate random effects. Both asymptotic and bootstrap proposals are evaluated via simulations and a real data analysis of a randomized multicenter study on toenail dermatophyte onychomycosis.

Journal ArticleDOI
TL;DR: The developed methodology is applied to estimate the proportion of people under the poverty line by counties and sex in Galicia (a region in the north-west of Spain).

Journal ArticleDOI
TL;DR: The aim is to demonstrate the value of using mixture models to describe variation in individual life‐history tactics within a population, and to promote the use of these models by ecologists and evolutionary ecologists.
Abstract: Mixed models are now well-established methods in ecology and evolution because they allow accounting for and quantifying within- and between-individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi-modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life-history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life-history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long-term studies of large mammals to illustrate the potential of using mixture models for assessing within-population variation in life-history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. 
We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life-history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users.
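The paper's models are fit to longitudinal life-history data, but the core idea of selecting the number of latent clusters with an information criterion can be sketched with a minimal univariate Gaussian mixture fit by EM. Everything below (data, starting values, the EM loop) is an illustrative stand-in, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two latent clusters of individual effects (e.g. two life-history tactics).
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

def fit_gmm(y, k, n_iter=200):
    """Minimal EM for a univariate Gaussian mixture; returns (loglik, BIC)."""
    n = y.size
    mu = np.quantile(y, np.linspace(0.1, 0.9, k))    # spread-out starting means
    sigma = np.full(k, y.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        pi, mu = nk / n, (resp * y[:, None]).sum(axis=0) / nk
        sigma = np.maximum(np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-6)
    loglik = np.log(dens.sum(axis=1)).sum()
    n_params = 3 * k - 1                             # k means, k sds, k-1 free weights
    return loglik, n_params * np.log(n) - 2 * loglik

# BIC should prefer the true two-cluster structure over a single Gaussian.
bics = {k: fit_gmm(y, k)[1] for k in (1, 2, 3)}
print(bics)
```

The same logic extends to mixtures of regressions: only the component densities change, while the E-step/M-step alternation and the information-criterion comparison stay the same.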

Journal ArticleDOI
TL;DR: This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations and shows how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects.
Abstract: This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd.
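The linear dependency that breaks identification in the fixed-effects specification is easy to demonstrate: with linearly coded effects, cohort = period - age, so the design matrix is rank-deficient. A small numpy illustration with made-up ages and periods:

```python
import numpy as np

# Linearly coded age, period, and cohort for a synthetic panel.
age = np.repeat(np.arange(20, 60), 10)
period = np.tile(np.arange(2000, 2010), 40)
cohort = period - age                      # exact linear dependency

# Fixed-effects design matrix with intercept: one column is redundant.
X = np.column_stack([np.ones(age.size), age, period, cohort])
print(np.linalg.matrix_rank(X))            # 3, not 4: the model is not identified
```

Any single additional constraint on the coefficients restores full rank, which is why the estimates depend so heavily on which constraint the researcher chooses.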

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper generalizes the non-linear mixed effects model to the regime where the response variable is manifold-valued, i.e., f: R^d → M, derives the underlying model and estimation schemes, and demonstrates the immediate benefits such a model can provide for both group-level and individual-level analysis on longitudinal brain imaging data.
Abstract: Statistical machine learning models that operate on manifold-valued data are being extensively studied in vision, motivated by applications in activity recognition, feature tracking and medical imaging. While non-parametric methods have been relatively well studied in the literature, efficient formulations for parametric models (which may offer benefits in small sample size regimes) have only emerged recently. So far, manifold-valued regression models (such as geodesic regression) are restricted to the analysis of cross-sectional data, i.e., the so-called fixed effects in statistics. But in most longitudinal analysis (e.g., when a participant provides multiple measurements, over time) the application of fixed effects models is problematic. In an effort to answer this need, this paper generalizes the non-linear mixed effects model to the regime where the response variable is manifold-valued, i.e., f: R^d → M. We derive the underlying model and estimation schemes and demonstrate the immediate benefits such a model can provide — both for group level and individual level analysis — on longitudinal brain imaging data. The direct consequence of our results is that longitudinal analysis of manifold-valued measurements (especially, the symmetric positive definite manifold) can be conducted in a computationally tractable manner.

Journal ArticleDOI
27 Jul 2017-Forests
TL;DR: The stem wood, branches, stem bark, needles, roots and total biomass models for larch were developed at the regional level, using a general allometric equation, a dummy variable model, a mixed effects model, and a Bayesian hierarchical model to select the most effective method for predicting large-scale forest biomass.
Abstract: With the development of national-scale forest biomass monitoring work, accurate estimation of forest biomass on a large scale is becoming an important research topic in forestry. In this study, the stem wood, branch, stem bark, needle, root, and total biomass models for larch were developed at the regional level, using a general allometric equation, a dummy variable model, a mixed effects model, and a Bayesian hierarchical model, to select the most effective method for predicting large-scale forest biomass. Results showed that the total biomass of trees with the same diameter gradually decreased from southern to northern regions in China, except in Hebei province. We found that the stem wood, branch, stem bark, needle, root, and total biomass model relationships were statistically significant (p-values < 0.01) for the general allometric equation, linear mixed model, dummy variable model, and Bayesian hierarchical model, but the linear mixed, dummy variable, and Bayesian hierarchical models showed better performance than the general allometric equation. An F-test also showed significant differences between the models. The average R2 values of the linear mixed model, dummy variable model, and Bayesian hierarchical model were higher than those of the general allometric equation by 0.007, 0.018, 0.015, 0.004, 0.09, and 0.117 for the total tree, root, stem wood, stem bark, branch, and needle models, respectively. However, there were no significant differences between the linear mixed model, dummy variable model, and Bayesian hierarchical model. When the number of categories was increased, the linear mixed model and Bayesian hierarchical model were more flexible and applicable than the dummy variable model for the construction of regional biomass models.
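The general allometric equation mentioned above, B = a·D^b, is typically linearized as ln B = ln a + b·ln D and fit by least squares. A minimal sketch on simulated diameters and biomass (the parameter values are invented, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic diameters (cm) and biomass from B = a * D^b with multiplicative error.
a_true, b_true = 0.12, 2.4
D = rng.uniform(5, 40, 200)
B = a_true * D ** b_true * np.exp(rng.normal(0, 0.1, 200))

# Log-log linearization: ln B = ln a + b * ln D, fit by ordinary least squares.
b_hat, log_a_hat = np.polyfit(np.log(D), np.log(B), 1)
a_hat = np.exp(log_a_hat)
print(a_hat, b_hat)
```

The mixed-model and hierarchical variants compared in the paper extend exactly this equation by letting a and b (or their logs) vary by region.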

Journal ArticleDOI
TL;DR: In this paper, a mixed generalized Akaike information criterion xGAIC is introduced and validated, derived from a quasi-log-likelihood that focuses on the random effect and the variability between the areas, and from a generalized degree-of-freedom measure, as a model complexity penalty, which is calculated by the bootstrap.
Abstract: A mixed generalized Akaike information criterion xGAIC is introduced and validated. It is derived from a quasi-log-likelihood that focuses on the random effect and the variability between the areas, and from a generalized degree-of-freedom measure, as a model complexity penalty, which is calculated by the bootstrap. To study the performance of xGAIC, we consider three popular mixed models in small area inference: a Fay–Herriot model, a monotone model and a penalized spline model. A simulation study shows the good performance of xGAIC. Moreover, we show its relevance in practice, with two real applications: the estimation of employed people by economic activity and the prevalence of smokers in Galician counties. In the second case, where it is unclear which explanatory variables should be included in the model, the problem of selection between these explanatory variables is solved simultaneously with the problem of the specification of the functional form between the linear, monotone or spline options.
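The generalized degrees-of-freedom penalty in xGAIC is computed by the bootstrap; a closely related Monte Carlo perturbation scheme (in the spirit of Ye's generalized degrees of freedom) can be sketched for a simple linear smoother, where the exact answer is the trace of the hat matrix. The smoother and all settings below are illustrative assumptions, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p, lam, tau, T = 60, 5, 2.0, 0.5, 400
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def fit(y_):
    """Ridge-type linear smoother standing in for a fitted mixed model."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_)

# Monte Carlo generalized degrees of freedom: perturb y with noise delta and
# measure how much the fitted values co-move with the perturbations.
base = fit(y)
deltas = rng.normal(0, tau, size=(T, n))
fits = np.array([fit(y + d) for d in deltas])
gdf = np.mean(deltas * (fits - base)) * n / tau ** 2

# For this linear smoother the exact value is the trace of the hat matrix.
exact = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))
print(gdf, exact)
```

The appeal of the perturbation/bootstrap route is that it needs only the fitting procedure as a black box, so the same recipe applies to monotone or spline fits where no closed-form hat matrix exists.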

Journal ArticleDOI
TL;DR: A penalized likelihood method is proposed for ridge estimation of fixed and random effects in the context of Henderson's mixed model equations in the linear mixed model.
Abstract: This paper is concerned with the ridge estimation of fixed and random effects in the context of Henderson's mixed model equations in the linear mixed model. For this purpose, a penalized likelihood method is proposed. A linear combination of the ridge estimators of the fixed and random effects is compared to a linear combination of the best linear unbiased estimators under the mean-square error (MSE) matrix criterion. Additionally, for choosing the biasing parameter, a method based on the MSE of the ridge estimator is given. A real data analysis is provided to illustrate the theoretical results, and a simulation study is conducted to characterize the performance of the ridge and best linear unbiased estimator approaches in the linear mixed model.
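Henderson's mixed model equations with a ridge term added to the fixed-effect block can be sketched directly in numpy. This is a minimal illustration under homoscedastic errors and i.i.d. random intercepts; the penalty constant k and the variance ratio are invented, and this is not the paper's exact penalized-likelihood estimator:

```python
import numpy as np

rng = np.random.default_rng(5)

# Small simulated design: fixed effects X, random group intercepts Z.
n, p, q = 90, 3, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
groups = rng.integers(0, q, n)
Z = np.eye(q)[groups]
beta, u = np.array([1.0, 2.0, -1.0]), rng.normal(0, 0.8, q)
y = X @ beta + Z @ u + rng.normal(0, 1.0, n)

# Henderson's equations with a ridge term k*I on the fixed-effect block;
# lam = sigma_e^2 / sigma_u^2 is the usual random-effect shrinkage factor.
k, lam = 0.5, 1.0 / 0.64
lhs = np.block([[X.T @ X + k * np.eye(p), X.T @ Z],
                [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(lhs, rhs)
beta_hat, u_hat = sol[:p], sol[p:]
print(beta_hat, u_hat)
```

Setting k = 0 recovers the standard mixed model equations, so the ridge version is a one-line change to the coefficient matrix.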

Journal ArticleDOI
TL;DR: The structural assumption made here is that there are clusters of units that share the same effects; it is shown how clusters can be identified by tailored regularized estimators, which compete well with random effects estimates even when the random effects model is the data-generating model.
Abstract: Although each statistical unit on which measurements are taken is unique, typically there is not enough information available to fully account for its uniqueness. Therefore, heterogeneity among units has to be limited by structural assumptions. One classical approach is to use random effects models, which assume that heterogeneity can be described by distributional assumptions. However, inference may depend on the assumed mixing distribution, and it is assumed that the random effects and the observed covariates are independent. An alternative considered here is fixed effects models, which let each unit have its own parameter. They are quite flexible but suffer from the large number of parameters. The structural assumption made here is that there are clusters of units that share the same effects. It is shown how clusters can be identified by tailored regularized estimators. Moreover, it is shown that the regularized estimates compete well with estimates for the random effects model, even if the latter is the data generating model. They dominate if clusters are present.
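The paper identifies clusters with tailored regularized estimators; as a crude stand-in for that machinery, one can estimate each unit's own effect and then fuse effects whose estimates are close (here, by splitting at the largest gap). A toy numpy sketch, with simulated units falling into two latent clusters:

```python
import numpy as np

rng = np.random.default_rng(9)

# 30 units, each with 20 observations; unit effects fall into 2 latent clusters.
effects = np.repeat([0.0, 2.5], 15)
y = effects[:, None] + rng.normal(0, 0.5, (30, 20))

# Crude fusion heuristic: estimate each unit's fixed effect by its mean,
# then split the sorted estimates at the largest gap.
unit_means = y.mean(axis=1)
order = np.argsort(unit_means)
gaps = np.diff(unit_means[order])
split = np.argmax(gaps) + 1
clusters = np.zeros(30, dtype=int)
clusters[order[split:]] = 1
print(np.bincount(clusters))
```

A fused-penalty estimator performs this grouping and the effect estimation jointly, but the payoff is the same: far fewer free parameters than one per unit, without assuming a mixing distribution.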

Journal ArticleDOI
TL;DR: The authors proposed a regularization method that can deal with large numbers of candidate generalized linear mixed models (GLMMs) while preserving a hierarchical structure in the effects that needs to be taken into account when performing variable selection.
Abstract: In many applications of generalized linear mixed models (GLMMs), there is a hierarchical structure in the effects that needs to be taken into account when performing variable selection. A prime example of this is when fitting mixed models to longitudinal data, where it is usual for covariates to be included as only fixed effects or as composite (fixed and random) effects. In this article, we propose the first regularization method that can deal with large numbers of candidate GLMMs while preserving this hierarchical structure: CREPE (Composite Random Effects PEnalty) for joint selection in mixed models. CREPE induces sparsity in a hierarchical manner, as the fixed effect for a covariate is shrunk to zero only if the corresponding random effect is or has already been shrunk to zero. In the setting where the number of fixed effects grows at a slower rate than the number of clusters, we show that CREPE is selection consistent for both fixed and random effects, and attains the oracle property. Simulations show that CREPE outperforms some currently available penalized methods for mixed models.