
Showing papers on "Mixed model published in 2015"


Journal ArticleDOI
TL;DR: BOLT-LMM is presented, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes.
Abstract: Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to nine quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
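The O(MN) iteration cost comes from never forming the N×N relationship matrix: the mixed-model linear solves can be carried out with conjugate gradients, where each multiplication by K = XXᵀ/M is computed as two O(MN) matrix-vector products. The following is a minimal sketch of that trick on made-up genotype data, not BOLT-LMM's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 500                        # samples, SNPs (toy sizes)
X = rng.standard_normal((N, M))        # standardized genotypes (made-up data)

def matvec(v, h2=0.5):
    # Apply (h2 * K + (1 - h2) * I) v with K = X X^T / M held implicitly:
    # two O(MN) products instead of an O(N^2) product with a dense GRM.
    return h2 * (X @ (X.T @ v)) / M + (1.0 - h2) * v

def conjugate_gradient(b, tol=1e-8, max_iter=500):
    # Standard CG for the symmetric positive-definite system matvec(x) = b.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

y = rng.standard_normal(N)
x_hat = conjugate_gradient(y)
residual = np.linalg.norm(matvec(x_hat) - y)   # should be near zero
```

Each CG iteration costs O(MN), and typically only tens of iterations are needed, which is how the quadratic dependence on sample size is avoided.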

1,232 citations




Journal ArticleDOI
TL;DR: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory, including mixed models and latent class mixed models for Gaussian longitudinal outcomes.
Abstract: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can possibly be left-truncated, right-censored, and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.

229 citations


Journal ArticleDOI
TL;DR: An extensive framework for additive regression models for correlated functional responses is proposed, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data.
Abstract: We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.

210 citations


Journal ArticleDOI
01 Feb 2015-Genetics
TL;DR: Mixed models at the individual plant or plot level produced more realistic heritability estimates, and for simulated traits standard errors were up to 13 times smaller, and genomic prediction was improved by using these mixed models, with up to a 49% increase in accuracy.
Abstract: Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data we investigate the feasibility of both approaches and how these affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
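The marker-based estimates discussed here rest on a genetic relatedness matrix computed from markers. Below is a toy sketch of one common construction, K = ZZᵀ/M with column-standardized genotypes, on simulated data (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_snp = 100, 1000
p = rng.uniform(0.1, 0.9, size=n_snp)                 # allele frequencies
# biallelic genotypes coded 0/1/2 (made-up, unrelated individuals)
G = rng.binomial(2, p, size=(n_ind, n_snp)).astype(float)

# column-standardize: subtract the mean 2p, divide by the sd sqrt(2p(1-p))
Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
K = Z @ Z.T / n_snp                                   # genomic relationship matrix
mean_diag = float(np.mean(np.diag(K)))                # close to 1 for unrelateds
```

A mixed model with random-effect covariance σ²g·K then yields a narrow-sense heritability estimate h² = σ²g/(σ²g + σ²e); the paper's point is that fitting such models at the individual plant or plot level, rather than on genotype means, sharpens those estimates.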

160 citations


Journal ArticleDOI
01 Apr 2015-Geoderma
TL;DR: In this article, a case study from Southern Brazil is presented to map clay content (CLAY), organic carbon content (SOC), and effective cation exchange capacity (ECEC) of the topsoil for a ~ 2000 ha area located on the edge of the plateau of the Parana Sedimentary Basin.

100 citations


Journal ArticleDOI
TL;DR: An approach for automated mixed ANOVA/ANCOVA modeling is introduced, together with the authors' open-source R package lmerTest, which can perform automated complex mixed-effects modeling.

93 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized heterogeneous data model (GHDM) is proposed to jointly handle mixed types of dependent variables by representing the covariance relationships among them through a reduced number of latent factors.
Abstract: This paper formulates a generalized heterogeneous data model (GHDM) that jointly handles mixed types of dependent variables—including multiple nominal outcomes, multiple ordinal variables, and multiple count variables, as well as multiple continuous variables—by representing the covariance relationships among them through a reduced number of latent factors. Sufficiency conditions for identification of the GHDM parameters are presented. The maximum approximate composite marginal likelihood (MACML) method is proposed to estimate this jointly mixed model system. This estimation method provides computational time advantages since the dimensionality of integration in the likelihood function is independent of the number of latent factors. The study undertakes a simulation experiment within the virtual context of integrating residential location choice and travel behavior to evaluate the ability of the MACML approach to recover parameters. The simulation results show that the MACML approach effectively recovers underlying parameters, and also that ignoring the multi-dimensional nature of the relationship among mixed types of dependent variables can lead not only to inconsistent parameter estimation, but also have important implications for policy analysis.

81 citations


Journal ArticleDOI
TL;DR: This study illustrates that the linear mixed model is the preferred method to investigate risk factors associated with renal function trajectories in studies where patients may drop out during the study period because of initiation of renal replacement therapy.
Abstract: Background. The most commonly used methods to investigate risk factors associated with renal function trajectory over time include linear regression on individual glomerular filtration rate (GFR) slopes, linear mixed models and generalized estimating equations (GEEs). The objective of this study was to explain the principles of these three methods and to discuss their advantages and limitations, in particular when renal function trajectories are not completely observable due to dropout. Methods. We generated data from a hypothetical cohort of 200 patients with chronic kidney disease at inclusion and seven subsequent annual measurements of GFR. The data were generated such that both baseline level and slope of GFR over time were associated with baseline albuminuria status. In a second version of the dataset, we assumed that patients systematically dropped out after a GFR measurement of <15 mL/min/1.73 m2. Each dataset was analysed with the three methods. Results. The estimated effects of baseline albuminuria status on GFR slope were similar among the three methods when no patient dropped out. When 32.7% dropped out, standard GEE provided biased estimates of the mean GFR slope in normo-, micro- and macroalbuminuric patients. Linear regression on individual slopes and linear mixed models provided slope estimates of the same magnitude, likely because most patients had at least three GFR measurements. However, the linear mixed model was the only method to provide effect estimates on both slope and baseline level of GFR unaffected by dropout. Conclusion. This study illustrates that the linear mixed model is the preferred method to investigate risk factors associated with renal function trajectories in studies where patients may drop out during the study period because of initiation of renal replacement therapy.
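The first of the three methods, ordinary least squares on each patient's own GFR series, takes only a few lines of code. The data below are invented; the second patient drops out once GFR falls below 15 mL/min/1.73 m2, which is the kind of informative dropout the paper studies:

```python
# Per-patient OLS slope: the simplest of the three compared approaches.
def individual_slope(times, values):
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# Two hypothetical patients: yearly GFR measurements (mL/min/1.73 m2);
# patient "B" drops out after GFR falls below 15 (informative dropout).
patients = {
    "A": ([0, 1, 2, 3, 4, 5, 6, 7], [60, 58, 57, 55, 54, 52, 51, 49]),
    "B": ([0, 1, 2, 3], [40, 30, 22, 14]),
}
slopes = {pid: individual_slope(t, v) for pid, (t, v) in patients.items()}
```

Averaging such slopes is unbiased only when dropout is unrelated to the trajectory; a linear mixed model borrows strength across patients and, under a missing-at-random mechanism, yields dropout-robust estimates of both baseline level and slope.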

79 citations


Journal ArticleDOI
TL;DR: This work proposes a general function‐on‐function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing.
Abstract: Functional data, observed over a grid, are frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images.

74 citations


Journal ArticleDOI
TL;DR: The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)—for variance components estimation—to deal with non-standard structures of the covariance matrix of the random effects.
Abstract: A new computational algorithm for estimating the smoothing parameters of a multidimensional penalized spline generalized linear model with anisotropic penalty is presented. This new proposal is based on the mixed model representation of a multidimensional P-spline, in which the smoothing parameter for each covariate is expressed in terms of variance components. On the basis of penalized quasi-likelihood methods, closed-form expressions for the estimates of the variance components are obtained. This formulation leads to an efficient implementation that considerably reduces the computational burden. The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)--for variance components estimation--to deal with non-standard structures of the covariance matrix of the random effects. The practical performance of the proposed algorithm is evaluated by means of simulations, and comparisons with alternative methods are made on the basis of the mean square error criterion and the computing time. Finally, we illustrate our proposal with the analysis of two real datasets: a two dimensional example of historical records of monthly precipitation data in USA and a three dimensional one of mortality data from respiratory disease according to the age at death, the year of death and the month of death.
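For the simplest case of a single variance component, Schall's iteration alternates a penalized (ridge) fit with moment-type updates of the variance components, and the smoothing parameter is recovered as λ = σ²e/σ²u. Here is a minimal sketch on a simulated random-intercept model (the paper generalizes this idea to anisotropic multidimensional P-spline penalties):

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, per_group = 30, 10
n = n_groups * per_group
Z = np.kron(np.eye(n_groups), np.ones((per_group, 1)))  # random-intercept design
u_true = rng.normal(0.0, 2.0, n_groups)                 # true sigma_u = 2
y = Z @ u_true + rng.normal(0.0, 1.0, n)                # true sigma_e = 1

lam = 1.0                                               # initial smoothing parameter
for _ in range(100):
    A = Z.T @ Z + lam * np.eye(n_groups)
    u = np.linalg.solve(A, Z.T @ y)                     # penalized (ridge) fit
    ed = np.trace(np.linalg.solve(A, Z.T @ Z))          # effective dimension
    sig2_u = (u @ u) / ed                               # variance-component updates
    sig2_e = ((y - Z @ u) @ (y - Z @ u)) / (n - ed)
    lam_new = sig2_e / sig2_u                           # Schall-type update
    if abs(lam_new - lam) < 1e-10 * lam:
        lam = lam_new
        break
    lam = lam_new
```

Every update has a closed form, which is what makes this family of algorithms cheap compared with generic numerical REML optimization of the smoothing parameters.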

Journal ArticleDOI
TL;DR: In this paper, estimators of small area labour force indicators, such as totals of employed and unemployed people and unemployment rates, are derived from four multinomial logit mixed models, including a model with correlated time and area random effects.
Abstract: Summary The aim of the paper is the estimation of small area labour force indicators like totals of employed and unemployed people and unemployment rates. Small area estimators of these quantities are derived from four multinomial logit mixed models, including a model with correlated time and area random effects. Mean-squared errors are used to measure the accuracy of the estimators proposed and they are estimated by analytic and bootstrap methods. The methodology introduced is applied to real data from the Spanish Labour Force Survey of Galicia.

Posted ContentDOI
18 Sep 2015-bioRxiv
TL;DR: An algorithm for genetic analysis of complex traits using genome-wide SNPs in a linear mixed model framework that could be more than 1000 times faster than current standard REML software based on the mixed model equation.
Abstract: We have developed an algorithm for genetic analysis of complex traits using genome-wide SNPs in a linear mixed model framework. Compared to current standard REML software, our method could be more than 1000 times faster. The advantage is largest when there is only a single genetic covariance structure. The method is particularly useful for multivariate analysis, including random regression models for studying reaction norms. We applied our proposed method to publicly available mice and human data and discuss advantages and limitations.

Journal ArticleDOI
TL;DR: Stochastic differential mixed effects models are useful tools for identifying incomplete or inaccurate model dynamics and for reducing potential bias in parameter estimates due to such model deficiencies.
Abstract: Inclusion of stochastic differential equations in mixed effects models provides means to quantify and distinguish three sources of variability in data. In addition to the two commonly encountered sources, measurement error and interindividual variability, we also consider uncertainty in the dynamical model itself. To this end, we extend the ordinary differential equation setting used in nonlinear mixed effects models to include stochastic differential equations. The approximate population likelihood is derived using the first-order conditional estimation with interaction method and extended Kalman filtering. To illustrate the application of the stochastic differential mixed effects model, two pharmacokinetic models are considered. First, we use a stochastic one-compartmental model with first-order input and nonlinear elimination to generate synthetic data in a simulated study. We show that by using the proposed method, the three sources of variability can be successfully separated. If the stochastic part is neglected, the parameter estimates become biased, and the measurement error variance is significantly overestimated. Second, we consider an extension to a stochastic pharmacokinetic model in a preclinical study of nicotinic acid kinetics in obese Zucker rats. The parameter estimates are compared between a deterministic and a stochastic NiAc disposition model, respectively. Discrepancies between model predictions and observations, previously described as measurement noise only, are now separated into a comparatively lower level of measurement noise and a significant uncertainty in model dynamics. These examples demonstrate that stochastic differential mixed effects models are useful tools for identifying incomplete or inaccurate model dynamics and for reducing potential bias in parameter estimates due to such model deficiencies.

Journal ArticleDOI
TL;DR: In this paper, a copula mixed model is proposed for bivariate meta-analysis of diagnostic test accuracy studies, which includes the generalized linear mixed model as a special case and can also operate on the original scale of sensitivity and specificity.
Abstract: Diagnostic test accuracy studies typically report the number of true positives, false positives, true negatives and false negatives. There usually exists a negative association between the number of true positives and true negatives, because studies that adopt less stringent criterion for declaring a test positive invoke higher sensitivities and lower specificities. A generalized linear mixed model (GLMM) is currently recommended to synthesize diagnostic test accuracy studies. We propose a copula mixed model for bivariate meta-analysis of diagnostic test accuracy studies. Our general model includes the GLMM as a special case and can also operate on the original scale of sensitivity and specificity. Summary receiver operating characteristic curves are deduced for the proposed model through quantile regression techniques and different characterizations of the bivariate random effects distribution. Our general methodology is demonstrated with an extensive simulation study and illustrated by re-analysing the data of two published meta-analyses. Our study suggests that there can be an improvement on GLMM in fit to data and makes the argument for moving to copula random effects models. Our modelling framework is implemented in the package CopulaREMADA within the open source statistical environment R.
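The negative sensitivity/specificity association that motivates bivariate models is easy to see from the GLMM's own data-generating process: study-specific logit-sensitivity and logit-specificity drawn from a correlated bivariate normal. All numbers below are invented; the copula model in the paper generalizes exactly this normal random-effects assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
n_studies = 2000
mu = np.array([1.5, 2.0])                 # mean logit sensitivity / specificity
rho, s1, s2 = -0.6, 0.8, 0.7              # negative between-study correlation
cov = np.array([[s1**2, rho * s1 * s2],
                [rho * s1 * s2, s2**2]])
re = rng.multivariate_normal(mu, cov, size=n_studies)
sens = 1.0 / (1.0 + np.exp(-re[:, 0]))    # study-specific sensitivity
spec = 1.0 / (1.0 + np.exp(-re[:, 1]))    # study-specific specificity
corr = float(np.corrcoef(sens, spec)[0, 1])   # negative, as the abstract describes
```

Replacing the bivariate normal with other copulas changes how the tails of the two random effects co-move, which is the extra flexibility the proposed model buys.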

Journal ArticleDOI
01 Apr 2015-Genomics
TL;DR: It is suggested that the mixed model methodology was useful to reduce spurious genetic associations produced by population stratification in GWAS, even with a high degree of admixture.


Journal ArticleDOI
TL;DR: In this paper, the parsimonious taper function proposed by Riemer et al. was fitted for radiata pine (Pinus radiata D. Don) stems in Spain by using a nonlinear mixed modelling approach.
Abstract: The parsimonious taper function proposed by Riemer et al. (1995. Allg. Forst.- Jagdztg. 166(7): 144–147) was fitted for radiata pine (Pinus radiata D. Don) stems in Spain by using a nonlinear mixed modelling approach. Eight candidate models (all possible expansion combinations of the three fixed parameters with random effects) were assessed, and the mixed model with three random effects performed the best according to the goodness-of-fit statistics. An evaluation data set was used to assess the performance of these models in predicting stem diameter along the bole, as well as total stem volume. Four prediction approaches were compared: one subject (tree) specific (SS) and three population specific (ordinary least squares (OLS), mean (M), and population averaged (PA)). The SS responses for a tree were estimated from a prior stem diameter measurement available for that tree, whereas OLS, M, and PA were obtained from the fixed-effects model, from the fixed parameters of mixed-effects models, and by computing mean predictions from the mixed-effects models over the distribution of random effects, respectively. Prediction errors were greater for the M and PA responses than for the OLS response, and therefore, from the prediction point of view, the use of the mixed-effects models is not recommended when an additional stem diameter measurement is not available. The mixed model with three random effects was also selected as the best model for SS estimations. Measurement of an additional stem diameter at a relative tree height of approximately 0.5 provided the best calibrations for stem diameters along the bole and total stem volume predictions. The SS approach increased the flexibility and efficiency of the selected mixed-effects model for localized predictions and thus improved the overall predictive capacity of the base model.

Journal ArticleDOI
TL;DR: Results revealed the applicability of the PRESS statistic to evaluate the performance of stable genotypes in the biplot, and mixed models can confidently be used to evaluate stability in plant breeding programs, even with highly unbalanced data.
Abstract: This study aimed to analyze the robustness of mixed models for the study of genotype-environment interactions (G x E). Simulated unbalancing of real data was used to determine if the method could predict missing genotypes and select stable genotypes. Data from multi-environment trials containing 55 maize hybrids, collected during the 2005-2006 harvest season, were used in this study. Analyses were performed in two steps: the variance components were estimated by restricted maximum likelihood, using the expectation-maximization (EM) algorithm, and factor analysis (FA) was used to calculate the factor scores and relative position of each genotype in the biplot. Random unbalancing of the data was performed by removing 10, 30, and 50% of the plots; the scores were then re-estimated using the FA model. It was observed that 10, 30, and 50% unbalancing exhibited mean correlation values of 0.7, 0.6, and 0.56, respectively. Overall, the genotypes classified as stable in the biplot had smaller prediction error sum of squares (PRESS) value and prediction amplitude of ellipses. Therefore, our results revealed the applicability of the PRESS statistic to evaluate the performance of stable genotypes in the biplot. This result was confirmed by the sizes of the prediction ellipses, which were smaller for the stable genotypes. Therefore, mixed models can confidently be used to evaluate stability in plant breeding programs, even with highly unbalanced data.
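The PRESS statistic referred to here is the sum of squared leave-one-out prediction errors. For a linear fit it has a closed form through the hat matrix, PRESS = Σᵢ (eᵢ / (1 − hᵢᵢ))², so no refitting is needed. The sketch below checks that identity against brute-force leave-one-out on made-up data (a generic linear model, not the paper's factor-analytic mixed model):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # intercept + slope design
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # made-up response

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix
e = y - X @ beta                              # ordinary residuals
press = float(np.sum((e / (1.0 - np.diag(H))) ** 2))

# brute-force leave-one-out for comparison
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo += float((y[i] - X[i] @ b) ** 2)
```

Smaller PRESS means better out-of-sample prediction, which is why the paper uses it to flag stable genotypes under heavy unbalancing.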

Journal ArticleDOI
TL;DR: In this article, a class of Beta mixed models is adopted for the analysis of real problems with grouped data structures; hierarchical, repeated-measures and longitudinal structures typically induce extra variability and/or dependence, which can be accommodated by the inclusion of random effects.
Abstract: Beta regression is a suitable choice for modelling continuous response variables taking values on the unit interval. Data structures such as hierarchical, repeated measures and longitudinal typically induce extra variability and/or dependence and can be accounted for by the inclusion of random effects. Statistical inference then typically requires numerical methods, possibly combined with sampling algorithms. A class of Beta mixed models is adopted for the analysis of two real problems with grouped data structures. We focus on likelihood inference and describe the implemented algorithms. The first is a study on the life quality index of industry workers with data collected according to a hierarchical sampling scheme. The second is a study assessing the impact of hydroelectric power plants upon measures of water quality indexes up, downstream and at the reservoirs of the dammed rivers, with a nested and longitudinal data structure. Results from different algorithms are reported for comparison in...

Journal ArticleDOI
TL;DR: In this paper, the authors propose to estimate the heritability in high-dimensional sparse linear mixed models, where the random effects can be sparse, that is, may contain null components whose proportion and positions are unknown.
Abstract: Motivated by applications in genetic fields, we propose to estimate the heritability in high-dimensional sparse linear mixed models. The heritability determines how the variance is shared between the different random components of a linear mixed model. The main novelty of our approach is to consider that the random effects can be sparse, that is, may contain null components, but we do not know either their proportion or their positions. The estimator that we consider is strongly inspired by the one proposed by Pirinen, Donnelly and Spencer (2013), and is based on a maximum likelihood approach. We also study the theoretical properties of our estimator, namely we establish that our estimator of the heritability is $\sqrt{n}$-consistent when both the number of observations $n$ and the number of random effects $N$ tend to infinity under mild assumptions. We also prove that our estimator of the heritability satisfies a central limit theorem which gives as a byproduct a confidence interval for the heritability. Some Monte-Carlo experiments are also conducted in order to show the finite sample performances of our estimator.
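The heritability in question is the fraction h² in a covariance of the form σ²(h²K + (1 − h²)I). Ignoring sparsity, a dense toy version of the maximum-likelihood idea profiles out σ² after an eigendecomposition of K and scans h² over a grid; this is a simplified sketch, not the authors' estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 600
Z = rng.standard_normal((n, m))
K = Z @ Z.T / m                                  # toy relatedness matrix
h2_true = 0.6
L = np.linalg.cholesky(h2_true * K + (1.0 - h2_true) * np.eye(n))
y = L @ rng.standard_normal(n)                   # y ~ N(0, h2*K + (1-h2)*I)

d, U = np.linalg.eigh(K)                         # K = U diag(d) U^T
yt = U.T @ y

def neg2_profile_loglik(h2):
    v = h2 * d + (1.0 - h2)                      # eigenvalues of h2*K + (1-h2)*I
    s2 = np.mean(yt ** 2 / v)                    # scale profiled out analytically
    return np.sum(np.log(v)) + n * np.log(s2)

grid = np.linspace(0.01, 0.99, 99)
h2_hat = float(grid[np.argmin([neg2_profile_loglik(h) for h in grid])])
```

After rotating by the eigenvectors of K the covariance is diagonal, so each likelihood evaluation is O(n) and the one-dimensional scan is cheap; the paper's contribution is handling the case where the random-effect vector itself is sparse.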

Journal ArticleDOI
TL;DR: This work presents a model for the case when the measurements are replicated, discusses its fitting, and explains how to evaluate similarity of measurement methods and agreement between them, which are two common goals of data analysis, under this model.
Abstract: Measurement error models offer a flexible framework for modeling data collected in studies comparing methods of quantitative measurement. These models generally make two simplifying assumptions: (i) the measurements are homoscedastic, and (ii) the unobservable true values of the methods are linearly related. One or both of these assumptions may be violated in practice. In particular, error variabilities of the methods may depend on the magnitude of measurement, or the true values may be nonlinearly related. Data with these features call for a heteroscedastic measurement error model that allows nonlinear relationships in the true values. We present such a model for the case when the measurements are replicated, discuss its fitting, and explain how to evaluate similarity of measurement methods and agreement between them, which are two common goals of data analysis, under this model. Model fitting involves dealing with lack of a closed form for the likelihood function. We consider estimation methods that approximate either the likelihood or the model to yield approximate maximum likelihood estimates. The fitting methods are evaluated in a simulation study. The proposed methodology is used to analyze a cholesterol dataset.

Journal ArticleDOI
TL;DR: It is shown that in the context of phylogenetic mixed models, part of the G-structure can be moved into the R-structure and integrated out deterministically, and that a GLMM with such an assumption is equivalent to the model proposed by Felsenstein.
Abstract: Summary Integrating out the random effects in generalised linear mixed models (GLMM) cannot be done analytically unless the response is Gaussian. Many stochastic, deterministic or hybrid algorithms have been developed to perform the integration. With categorical data and probit link (aka the threshold model), the random effect structure can be partitioned into a part that can be easily integrated deterministically (the R-structure) and a part that cannot (the G-structure). We show that in the context of phylogenetic mixed models, part of the G-structure (the phylogenetic effects at the tips) can be moved into the R-structure and integrated out deterministically. This result follows directly from the concept of the reduced animal model from quantitative genetics (Journal of Animal Science, 51, 1980, 1277) and its implications for discrete data (Genetics Selection Evolution, 42, 2010, 1). Although the conditional distribution of the phylogenetic variance is no longer in standard form, it does provide a stable and efficient 2-block MCMC algorithm for situations when the phylogenetic heritability is assumed to be one. We show that a GLMM with such an assumption is equivalent to the model proposed by Felsenstein (American Naturalist, 179, 2005, 145). Extensions to multivariate models are straightforward and a 3-block algorithm can be constructed when there is only a single categorical trait but multiple Gaussian traits. With ≥2 categorical traits, an additional non-Gibbs update is required for the correlation (sub)matrix. An implementation of these algorithms is distributed in the r package MCMCglmm and is up to several orders of magnitude faster than published alternatives.

Journal ArticleDOI
TL;DR: An individual-tree mixed model with direct additive genetic effects and both genetic and environmental competition effects is extended by incorporating a two-dimensional smoothing surface to account for complex patterns of environmental heterogeneity (the competition + spatial model, CSM).
Abstract: Negative correlation caused by competition among individuals and positive spatial correlation due to environmental heterogeneity may lead to biases in estimating genetic parameters and predicting breeding values (BVs) from forest genetic trials. Former models dealing with competition and environmental heterogeneity did not account for the additive relationships among trees or for the full spatial covariance. This paper extends an individual-tree mixed model with direct additive genetic effects and both genetic and environmental competition effects by incorporating a two-dimensional smoothing surface to account for complex patterns of environmental heterogeneity (the competition + spatial model, CSM). We illustrate the proposed model using simulated and real data from a loblolly pine progeny trial. The CSM was compared with three reduced individual-tree mixed models using the real dataset, while the simulations compared CSM estimates against the true parameters only. Dispersion parameters were estimated using Bayesian techniques via Gibbs sampling. Simulation results showed that the CSM yielded posterior mean estimates of variance components with slight or negligible biases in the studied scenarios, except for the permanent environment variance. The worst performance of the simulated CSM occurred under a scenario with weak competition effects and small-scale environmental heterogeneity. When analyzing real data, the CSM yielded a lower value of the deviance information criterion than the reduced models. Moreover, although correlations between predicted BVs calculated from the CSM and from a standard model with block effects and direct genetic effects only were high, the ranking among the top 5 % individuals differed, indicating that the two models would select quite different genotypes for the next cycle of breeding.
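The smoothing-surface idea can be sketched with a tensor product of one-dimensional truncated-line spline bases and a simple ridge penalty (an illustrative stand-in for the paper's two-dimensional surface; the trend, grid and penalty values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

def basis_1d(x, knots):
    # truncated-line basis: [1, x, (x - k1)_+, (x - k2)_+, ...]
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
    return np.column_stack(cols)

n = 30                                         # 30 x 30 grid of tree positions
r, c = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
r, c = r.ravel(), c.ravel()
truth = np.sin(2 * np.pi * r) * np.cos(np.pi * c)  # smooth environmental trend
y = truth + rng.normal(0, 0.5, r.size)

knots = np.linspace(0.1, 0.9, 8)
Br, Bc = basis_1d(r, knots), basis_1d(c, knots)
# row-wise tensor product: every product of a row-basis and a column-basis column
X = np.einsum("ij,ik->ijk", Br, Bc).reshape(r.size, -1)

lam = 1.0                                      # ridge penalty strength
P = np.eye(X.shape[1])
P[0, 0] = 0.0                                  # leave the intercept unpenalized
beta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
fit = X @ beta

corr = np.corrcoef(fit, truth)[0, 1]           # agreement with the true surface
```

In the full CSM this surface enters a mixed model alongside the genetic and competition effects, with the penalty acting as a random-effect variance.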

Book
02 Oct 2015
TL;DR: A textbook covering classical and advanced biostatistical models, from inference, survival analysis and mixed models for longitudinal data through multistate models, joint models for longitudinal and time-to-event data, and the dynamic approach to causality.
Abstract: Contents:
Introduction: General presentation of the book; Organization of the book; Notation; Presentation of examples.
Classical Biostatistical Models.
Inference: Generalities on inference: the concept of model; Likelihood and applications; Other types of likelihoods and estimation methods; Model choice; Optimization algorithms.
Survival Analysis: Introduction; Event, origin, and functions of interest; Observation patterns: censoring and truncation; Estimation of the survival function; The proportional hazards model; Accelerated failure time model; Counting processes approach; Additive hazards models; Degradation models.
Models for Longitudinal Data: Linear mixed models; Generalized linear mixed models; Non-linear mixed models; Marginal models and generalized estimating equations (GEE); Incomplete longitudinal data; Modeling strategies.
Advanced Biostatistical Models.
Extensions of Mixed Models: Mixed models for curvilinear outcomes; Mixed models for multivariate longitudinal data; Latent class mixed models.
Advanced Survival Models: Relative survival; Competing risks models; Frailty models; Extension of frailty models; Cure models.
Multistate Models: Introduction; Multistate processes; Multistate models: generalities; Observation schemes; Statistical inference for multistate models observed in continuous time; Inference for multistate models from interval-censored data; Complex functions of parameters: individualized hazards, sojourn times; Approach by counting processes; Other approaches.
Joint Models for Longitudinal and Time-to-Event Data: Introduction; Models with shared random effects; Latent class joint model; Latent classes versus shared random effects; The joint model as prognostic model; Extension of joint models.
The Dynamic Approach to Causality: Introduction; Local independence, direct and indirect influence; Causal influences; The dynamic approach to causal reasoning in ageing studies; Mechanistic models; The issue of dynamic treatment regimes.
Appendix: Software.
Index.

Journal ArticleDOI
TL;DR: In this paper, the asymptotic behaviour of power variations of a linear combination of a Wiener process and an independent fractional Brownian motion is studied; the results are applied to construct consistent parameter estimators and approximate confidence intervals in mixed models.
Abstract: In this paper we study the asymptotic behaviour of power variations of a linear combination of a Wiener process and an independent fractional Brownian motion. These results are applied to construct consistent parameter estimators and approximate confidence intervals in mixed models.
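The core consistency result can be checked numerically: for H > 1/2 the fractional Brownian component contributes nothing to the quadratic variation in the limit, so the realized quadratic variation recovers the Wiener coefficient. A sketch of this second-order special case (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# X = a*W + b*B^H with H > 1/2: the fBm part has vanishing quadratic variation,
# so realized QV / T consistently estimates a^2.
n, T, H = 1000, 1.0, 0.75
a, b = 1.5, 1.0
t = np.linspace(T / n, T, n)

# fractional Brownian motion via Cholesky factorization of its covariance
s, u = np.meshgrid(t, t, indexing="ij")
cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
fbm = np.linalg.cholesky(cov + 1e-10 * np.eye(n)) @ rng.normal(size=n)

W = np.cumsum(rng.normal(0.0, np.sqrt(T / n), n))      # Wiener process on the grid
X = a * W + b * fbm

qv = np.sum(np.diff(np.concatenate(([0.0], X))) ** 2)  # realized quadratic variation
a_hat = np.sqrt(qv / T)                                # estimator of a
```

Higher-order power variations play the analogous role for estimating the fBm parameters; the quadratic case above isolates the Wiener coefficient.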

Journal ArticleDOI
TL;DR: This work proposes a fully likelihood-based two-part marginal model that satisfies this need by using the bridge distribution for the random effect in the binary part of an underlying two-part mixed model; its maximum likelihood estimation can be routinely implemented via standard statistical software such as the SAS NLMIXED procedure.
Abstract: Two-part models are an attractive approach for analysing longitudinal semicontinuous data consisting of a mixture of true zeros and continuously distributed positive values. When the population-averaged (marginal) covariate effects are of interest, two-part models that provide straightforward interpretation of the marginal effects are desirable. Presently, the only available approaches for fitting two-part marginal models to longitudinal semicontinuous data are computationally difficult to implement. There is therefore a need for two-part marginal models that can be easily implemented in practice. We propose a fully likelihood-based two-part marginal model that satisfies this need by using the bridge distribution for the random effect in the binary part of an underlying two-part mixed model; its maximum likelihood estimation can be routinely implemented via standard statistical software such as the SAS NLMIXED procedure. We illustrate the usage of this new model by investigating the marginal effects of pre-specified genetic markers on physical functioning, as measured by the Health Assessment Questionnaire, in a cohort of psoriatic arthritis patients from the University of Toronto Psoriatic Arthritis Clinic. An added benefit of our proposed marginal model, compared with a two-part mixed model, is the robustness of regression parameter estimation under departures from the true random effects structure. This is demonstrated through simulation.
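The appeal of the bridge distribution is that marginalizing the random intercept keeps the link function and merely attenuates the coefficients. That property is easiest to verify in its probit analogue, where the normal distribution plays the same role and the attenuation factor is available in closed form (values below are illustrative; this is the analogue, not the paper's bridge/logit case):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Probit analogue of the bridge property:
#   conditional  P(Y=1 | b) = Phi(eta + b),  b ~ N(0, s2)
#   => marginal  P(Y=1)     = Phi(eta / sqrt(1 + s2))  (same link, attenuated eta)
eta, s2 = 0.8, 2.0
b = rng.normal(0.0, np.sqrt(s2), 400_000)
p_mc = norm.cdf(eta + b).mean()                # Monte Carlo marginal probability
p_closed = norm.cdf(eta / np.sqrt(1.0 + s2))   # closed-form attenuated probit
```

The bridge distribution is constructed so that the logit link enjoys exactly this same closure: marginal logits equal the conditional linear predictor scaled by a known attenuation constant.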

Journal ArticleDOI
TL;DR: A model is developed to assess the effects of a treatment when the data are functional with 3 levels (subjects, weeks and days in the authors' application) and possibly incomplete, with 3-level mean structure effects stratified by treatment, plus subject random effects including a general subject effect and nested effects for the 3 levels.
Abstract: Motivated by data recording the effects of an exercise intervention on subjects' physical activity over time, we develop a model to assess the effects of a treatment when the data are functional with 3 levels (subjects, weeks and days in our application) and possibly incomplete. The model has 3-level mean structure effects, all stratified by treatment, and subject random effects that include a general subject effect and nested effects for the 3 levels. The mean and random structures are specified as smooth curves measured at various time points. The association structure of the 3-level data is induced through the random curves, which are summarized using a few important principal components. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed effects model framework for model fitting, prediction and inference. We develop an algorithm to fit the model iteratively with the Expectation/Conditional Maximization Either (ECME) version of the EM algorithm and eigenvalue decompositions. Selection of the number of principal components and handling of incomplete data are incorporated into the algorithm. The performance of a Wald-type hypothesis test is also discussed. The method is applied to the physical activity data and evaluated empirically by a simulation study.
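The principal-component step can be sketched in a single-level setting (a simplified illustration, not the authors' three-level ECME algorithm): estimate the covariance across time points from simulated curves and recover the component functions by eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(5)

T = np.linspace(0, 1, 50)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * T)     # true component functions
phi2 = np.sqrt(2) * np.cos(2 * np.pi * T)     # (orthogonal on the grid)
n = 300
scores = rng.normal(0, [2.0, 0.7], (n, 2))    # PC scores, decreasing variance
Y = scores @ np.vstack([phi1, phi2]) + rng.normal(0, 0.2, (n, 50))

C = np.cov(Y, rowvar=False)                   # pointwise covariance estimate
vals, vecs = np.linalg.eigh(C)                # eigenvalues in ascending order
pc1 = vecs[:, -1]                             # leading estimated component

# alignment with the true first component (sign and scale are arbitrary)
align = abs(pc1 @ phi1) / (np.linalg.norm(pc1) * np.linalg.norm(phi1))
```

In the paper this step is combined with penalized-spline smoothing of the mean and covariance, and repeated at each of the three levels of the functional hierarchy.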

Journal ArticleDOI
TL;DR: In this paper, the authors used generalized additive mixed models (GAMM) with and without random effects to identify a relationship between abundance in the catch and oceanographic conditions, and the results demonstrate that GAMM are a useful tool to combine geo-referenced catch data with oceanographic variables and that the use of a mixed model approach with spatial and temporal random effects is an effective way to depict the dynamics of marine species.
Abstract: Anchovy, Engraulis encrasicolus, forms the basis of Italian small pelagic fisheries in the Adriatic Sea. The strong dependence of this stock on environmental factors and the consequent high variability make the dynamics of this species particularly complicated to model. Weekly geo-referenced catch data of anchovy obtained by means of a Fishery Observing System (FOS) from 2005 to 2011 were referred to a 0.2 × 0.2 degree grid (about 20 km²) and associated with the environmental parameters calculated by a Regional Ocean Modelling System, AdriaROMS. Generalized Additive Mixed Models (GAMM) with and without random effects were used to identify a relationship between abundance in the catch and oceanographic conditions. The outcomes of models with no random effects, with random vessel effects, and with random vessel and week-of-the-year effects were examined. The GAMM incorporating random vessel and week-of-the-year effects was selected as the best model on the basis of the Akaike information criterion (AIC). This model indicated that catches (abundance) of anchovy in the Adriatic Sea correlate well with low temperatures, salinity fronts and sea surface height, and allowed the identification of areas where high concentrations of this species are most likely to occur. The results of this study demonstrate that GAMM are a useful tool to combine geo-referenced catch data with oceanographic variables and that the use of a mixed-model approach with spatial and temporal random effects is an effective way to depict the dynamics of marine species.
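The AIC-based selection step can be sketched with ordinary Gaussian linear models standing in for the GAMMs (covariates and effect sizes are invented): the candidate that includes the genuine extra effect should attain the lower AIC.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic data with a real "week" effect hidden in the response
n = 400
week = rng.integers(0, 10, n)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * np.sin(week) + rng.normal(0, 0.5, n)

def aic(X, y):
    """Gaussian AIC: n*log(RSS/n) + 2k, k counting coefficients + error variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1
    return len(y) * np.log(rss / len(y)) + 2 * k

X0 = np.column_stack([np.ones(n), x])                 # without the week effect
X1 = np.column_stack([np.ones(n), x, np.sin(week)])   # with the week effect
aic0, aic1 = aic(X0, y), aic(X1, y)
```

For mixed models the same comparison applies, with the effective number of parameters adjusted for the random-effect structure.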

Journal ArticleDOI
TL;DR: In this article, the authors used local abundance counts from the Rhône river restoration monitoring programme to quantify how sampling strategies and population characteristics influence statistical power for detecting restoration effects.
Abstract: 1. Assessing how populations respond to ecological restoration is particularly difficult because their abundance results from many sources of variation. In addition, abundance estimates depend on sampling efforts that are limited by financial or practical constraints.
2. We used local abundance counts from the Rhône river restoration monitoring programme to quantify how sampling strategies and population characteristics influenced statistical power for detecting restoration effects.
3. We first fitted observed changes in abundance of 13 fish taxa and 35 invertebrate taxa collected in microhabitats of four restored reaches of the Rhône river over 15 years, using a generalised linear mixed model. The model accounted for a restoration effect, random temporal variation between field surveys and spatial variation within surveys (i.e. microhabitat variation in abundance was assumed to follow a negative binomial distribution). We then used numerical simulations to calculate the statistical power (i.e. the probability of detecting a true change) and the type I error (the probability of detecting a non-existent change) associated with various hypotheses of restoration effect size, mean abundance and temporal and spatial variation.
4. Model fits revealed that accounting for temporal variation is needed to reduce type I error associated with the effect of restoration. Significant abundance changes were observed for 27 of 104 (26%) of the taxa-reach combinations.
5. When assuming temporal variation and population characteristics typical of our data sets, power simulations showed that the probability of detecting a moderate change (50–200%) in abundance was <38% in all tests. The average probability of detecting large changes (500–1000%) was 61%. In these conditions, power was increased by low spatial variation and high sampling effort. Large numbers of surveys (e.g. 16 instead of 4) increased the power by 20 points if surveys were balanced before and after restoration.
6. Because our simulations covered a wide variety of population characteristics and sampling strategies, they can be used a priori to determine which sampling strategy is best adapted for detecting restoration effects from repeated abundance counts.
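The power-simulation recipe above can be sketched as follows, with a rank test standing in for the fitted GLMM and invented parameter values: simulate negative binomial abundance counts before and after restoration, and estimate power as the rejection rate across simulated datasets.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)

def sim_power(effect, mean_before=5.0, disp=1.0, n=40, nsim=400, alpha=0.05):
    """Fraction of simulations in which a before/after change is detected."""
    def draw(mu):
        # NumPy's negative_binomial(r, p) has mean r*(1-p)/p
        p = disp / (disp + mu)
        return rng.negative_binomial(disp, p, n)
    rejections = 0
    for _ in range(nsim):
        before, after = draw(mean_before), draw(mean_before * effect)
        rejections += mannwhitneyu(before, after).pvalue < alpha
    return rejections / nsim

power_null = sim_power(effect=1.0)   # no change: rejection rate near alpha
power_big = sim_power(effect=5.0)    # a 500% change: high power expected
```

Sweeping `effect`, `n` and `disp` reproduces the kind of power surface the authors use a priori to choose a sampling strategy.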