
Showing papers on "Mixed model published in 2015"


Journal ArticleDOI
TL;DR: BOLT-LMM is presented, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes.
Abstract: Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to nine quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
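The O(MN) iteration cost comes from never forming the N×N relationship matrix: the mixed-model linear solves can be carried out with conjugate gradients, where each multiplication by K = XXᵀ/M is computed as two O(MN) matrix-vector products. The following is a minimal sketch of that trick on made-up genotype data, not BOLT-LMM's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 500                        # samples, SNPs (toy sizes)
X = rng.standard_normal((N, M))        # standardized genotypes (made-up data)

def matvec(v, h2=0.5):
    # Apply (h2 * K + (1 - h2) * I) v with K = X X^T / M held implicitly:
    # two O(MN) products instead of an O(N^2) product with a dense GRM.
    return h2 * (X @ (X.T @ v)) / M + (1.0 - h2) * v

def conjugate_gradient(b, tol=1e-8, max_iter=500):
    # Standard CG for the symmetric positive-definite system matvec(x) = b.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

y = rng.standard_normal(N)
x_hat = conjugate_gradient(y)
residual = np.linalg.norm(matvec(x_hat) - y)   # should be near zero
```

Each CG iteration costs O(MN), and typically only tens of iterations are needed, which is how the quadratic dependence on sample size is avoided.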

1,232 citations




Journal ArticleDOI
TL;DR: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory, including mixed models and latent class mixed models for Gaussian longitudinal outcomes.
Abstract: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can possibly be left-truncated, right-censored, and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.

229 citations


Journal ArticleDOI
TL;DR: An extensive framework for additive regression models for correlated functional responses is proposed, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data.
Abstract: We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.

210 citations


Journal ArticleDOI
01 Feb 2015-Genetics
TL;DR: Mixed models at the individual plant or plot level produced more realistic heritability estimates, and for simulated traits standard errors were up to 13 times smaller, and genomic prediction was improved by using these mixed models, with up to a 49% increase in accuracy.
Abstract: Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data we investigate the feasibility of both approaches and how these affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
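The marker-based estimates discussed here rest on a genetic relatedness matrix computed from markers. Below is a toy sketch of one common construction, K = ZZᵀ/M with column-standardized genotypes, on simulated data (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_snp = 100, 1000
p = rng.uniform(0.1, 0.9, size=n_snp)                 # allele frequencies
# biallelic genotypes coded 0/1/2 (made-up, unrelated individuals)
G = rng.binomial(2, p, size=(n_ind, n_snp)).astype(float)

# column-standardize: subtract the mean 2p, divide by the sd sqrt(2p(1-p))
Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
K = Z @ Z.T / n_snp                                   # genomic relationship matrix
mean_diag = float(np.mean(np.diag(K)))                # close to 1 for unrelateds
```

A mixed model with random-effect covariance σ²g·K then yields a narrow-sense heritability estimate h² = σ²g/(σ²g + σ²e); the paper's point is that fitting such models at the individual plant or plot level, rather than on genotype means, sharpens those estimates.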

160 citations


Journal ArticleDOI
01 Apr 2015-Geoderma
TL;DR: In this article, a case study from Southern Brazil is presented to map clay content (CLAY), organic carbon content (SOC), and effective cation exchange capacity (ECEC) of the topsoil for a ~ 2000 ha area located on the edge of the plateau of the Parana Sedimentary Basin.

100 citations


Journal ArticleDOI
TL;DR: An approach for automated mixed ANOVA/ANCOVA modeling is introduced, together with the authors' open-source R package lmerTest, which can perform automated complex mixed-effects modeling.

93 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized heterogeneous data model (GHDM) is proposed to jointly handle mixed types of dependent variables by representing the covariance relationships among them through a reduced number of latent factors.
Abstract: This paper formulates a generalized heterogeneous data model (GHDM) that jointly handles mixed types of dependent variables—including multiple nominal outcomes, multiple ordinal variables, and multiple count variables, as well as multiple continuous variables—by representing the covariance relationships among them through a reduced number of latent factors. Sufficiency conditions for identification of the GHDM parameters are presented. The maximum approximate composite marginal likelihood (MACML) method is proposed to estimate this jointly mixed model system. This estimation method provides computational time advantages since the dimensionality of integration in the likelihood function is independent of the number of latent factors. The study undertakes a simulation experiment within the virtual context of integrating residential location choice and travel behavior to evaluate the ability of the MACML approach to recover parameters. The simulation results show that the MACML approach effectively recovers underlying parameters, and also that ignoring the multi-dimensional nature of the relationship among mixed types of dependent variables can lead not only to inconsistent parameter estimation, but also have important implications for policy analysis.

81 citations


Journal ArticleDOI
TL;DR: This study illustrates that the linear mixed model is the preferred method to investigate risk factors associated with renal function trajectories in studies where patients may drop out during the study period because of initiation of renal replacement therapy.
Abstract: Background. The most commonly used methods to investigate risk factors associated with renal function trajectory over time include linear regression on individual glomerular filtration rate (GFR) slopes, linear mixed models and generalized estimating equations (GEEs). The objective of this study was to explain the principles of these three methods and to discuss their advantages and limitations, in particular when renal function trajectories are not completely observable due to dropout. Methods. We generated data from a hypothetical cohort of 200 patients with chronic kidney disease at inclusion and seven subsequent annual measurements of GFR. The data were generated such that both baseline level and slope of GFR over time were associated with baseline albuminuria status. In a second version of the dataset, we assumed that patients systematically dropped out after a GFR measurement of <15 mL/min/1.73 m2. Each dataset was analysed with the three methods. Results. The estimated effects of baseline albuminuria status on GFR slope were similar among the three methods when no patient dropped out. When 32.7% dropped out, standard GEE provided biased estimates of the mean GFR slope in normo-, micro- and macroalbuminuric patients. Linear regression on individual slopes and linear mixed models provided slope estimates of the same magnitude, likely because most patients had at least three GFR measurements. However, the linear mixed model was the only method to provide effect estimates on both slope and baseline level of GFR unaffected by dropout. Conclusion. This study illustrates that the linear mixed model is the preferred method to investigate risk factors associated with renal function trajectories in studies where patients may drop out during the study period because of initiation of renal replacement therapy.
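The first of the three methods, ordinary least squares on each patient's own GFR series, takes only a few lines of code. The data below are invented; the second patient drops out once GFR falls below 15 mL/min/1.73 m2, which is the kind of informative dropout the paper studies:

```python
# Per-patient OLS slope: the simplest of the three compared approaches.
def individual_slope(times, values):
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# Two hypothetical patients: yearly GFR measurements (mL/min/1.73 m2);
# patient "B" drops out after GFR falls below 15 (informative dropout).
patients = {
    "A": ([0, 1, 2, 3, 4, 5, 6, 7], [60, 58, 57, 55, 54, 52, 51, 49]),
    "B": ([0, 1, 2, 3], [40, 30, 22, 14]),
}
slopes = {pid: individual_slope(t, v) for pid, (t, v) in patients.items()}
```

Averaging such slopes is unbiased only when dropout is unrelated to the trajectory; a linear mixed model borrows strength across patients and, under a missing-at-random mechanism, yields dropout-robust estimates of both baseline level and slope.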

79 citations


Journal ArticleDOI
TL;DR: This work proposes a general function‐on‐function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing.
Abstract: Functional data, observed over a grid, are frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images.

74 citations


Journal ArticleDOI
TL;DR: The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)—for variance components estimation—to deal with non-standard structures of the covariance matrix of the random effects.
Abstract: A new computational algorithm for estimating the smoothing parameters of a multidimensional penalized spline generalized linear model with anisotropic penalty is presented. This new proposal is based on the mixed model representation of a multidimensional P-spline, in which the smoothing parameter for each covariate is expressed in terms of variance components. On the basis of penalized quasi-likelihood methods, closed-form expressions for the estimates of the variance components are obtained. This formulation leads to an efficient implementation that considerably reduces the computational burden. The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)--for variance components estimation--to deal with non-standard structures of the covariance matrix of the random effects. The practical performance of the proposed algorithm is evaluated by means of simulations, and comparisons with alternative methods are made on the basis of the mean square error criterion and the computing time. Finally, we illustrate our proposal with the analysis of two real datasets: a two dimensional example of historical records of monthly precipitation data in USA and a three dimensional one of mortality data from respiratory disease according to the age at death, the year of death and the month of death.
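For the simplest case of a single variance component, Schall's iteration alternates a penalized (ridge) fit with moment-type updates of the variance components, and the smoothing parameter is recovered as λ = σ²e/σ²u. Here is a minimal sketch on a simulated random-intercept model (the paper generalizes this idea to anisotropic multidimensional P-spline penalties):

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, per_group = 30, 10
n = n_groups * per_group
Z = np.kron(np.eye(n_groups), np.ones((per_group, 1)))  # random-intercept design
u_true = rng.normal(0.0, 2.0, n_groups)                 # true sigma_u = 2
y = Z @ u_true + rng.normal(0.0, 1.0, n)                # true sigma_e = 1

lam = 1.0                                               # initial smoothing parameter
for _ in range(100):
    A = Z.T @ Z + lam * np.eye(n_groups)
    u = np.linalg.solve(A, Z.T @ y)                     # penalized (ridge) fit
    ed = np.trace(np.linalg.solve(A, Z.T @ Z))          # effective dimension
    sig2_u = (u @ u) / ed                               # variance-component updates
    sig2_e = ((y - Z @ u) @ (y - Z @ u)) / (n - ed)
    lam_new = sig2_e / sig2_u                           # Schall-type update
    if abs(lam_new - lam) < 1e-10 * lam:
        lam = lam_new
        break
    lam = lam_new
```

Every update has a closed form, which is what makes this family of algorithms cheap compared with generic numerical REML optimization of the smoothing parameters.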

Journal ArticleDOI
TL;DR: In this paper, estimators of small area labour force indicators, such as totals of employed and unemployed people and unemployment rates, are derived from four multinomial logit mixed models, including a model with correlated time and area random effects.
Abstract: Summary The aim of the paper is the estimation of small area labour force indicators like totals of employed and unemployed people and unemployment rates. Small area estimators of these quantities are derived from four multinomial logit mixed models, including a model with correlated time and area random effects. Mean-squared errors are used to measure the accuracy of the estimators proposed and they are estimated by analytic and bootstrap methods. The methodology introduced is applied to real data from the Spanish Labour Force Survey of Galicia.

Posted ContentDOI
18 Sep 2015-bioRxiv
TL;DR: An algorithm for genetic analysis of complex traits using genome-wide SNPs in a linear mixed model framework that could be more than 1000 times faster than current standard REML software based on the mixed model equation.
Abstract: We have developed an algorithm for genetic analysis of complex traits using genome-wide SNPs in a linear mixed model framework. Compared to current standard REML software, our method could be more than 1000 times faster. The advantage is largest when there is only a single genetic covariance structure. The method is particularly useful for multivariate analysis, including random regression models for studying reaction norms. We applied our proposed method to publicly available mice and human data and discuss advantages and limitations.

Journal ArticleDOI
TL;DR: Stochastic differential mixed effects models are useful tools for identifying incomplete or inaccurate model dynamics and for reducing potential bias in parameter estimates due to such model deficiencies.
Abstract: Inclusion of stochastic differential equations in mixed effects models provides means to quantify and distinguish three sources of variability in data. In addition to the two commonly encountered sources, measurement error and interindividual variability, we also consider uncertainty in the dynamical model itself. To this end, we extend the ordinary differential equation setting used in nonlinear mixed effects models to include stochastic differential equations. The approximate population likelihood is derived using the first-order conditional estimation with interaction method and extended Kalman filtering. To illustrate the application of the stochastic differential mixed effects model, two pharmacokinetic models are considered. First, we use a stochastic one-compartmental model with first-order input and nonlinear elimination to generate synthetic data in a simulated study. We show that by using the proposed method, the three sources of variability can be successfully separated. If the stochastic part is neglected, the parameter estimates become biased, and the measurement error variance is significantly overestimated. Second, we consider an extension to a stochastic pharmacokinetic model in a preclinical study of nicotinic acid kinetics in obese Zucker rats. The parameter estimates are compared between a deterministic and a stochastic NiAc disposition model, respectively. Discrepancies between model predictions and observations, previously described as measurement noise only, are now separated into a comparatively lower level of measurement noise and a significant uncertainty in model dynamics. These examples demonstrate that stochastic differential mixed effects models are useful tools for identifying incomplete or inaccurate model dynamics and for reducing potential bias in parameter estimates due to such model deficiencies.

Journal ArticleDOI
TL;DR: In this paper, a copula mixed model is proposed for bivariate meta-analysis of diagnostic test accuracy studies, which includes the generalized linear mixed model as a special case and can also operate on the original scale of sensitivity and specificity.
Abstract: Diagnostic test accuracy studies typically report the number of true positives, false positives, true negatives and false negatives. There usually exists a negative association between the number of true positives and true negatives, because studies that adopt less stringent criterion for declaring a test positive invoke higher sensitivities and lower specificities. A generalized linear mixed model (GLMM) is currently recommended to synthesize diagnostic test accuracy studies. We propose a copula mixed model for bivariate meta-analysis of diagnostic test accuracy studies. Our general model includes the GLMM as a special case and can also operate on the original scale of sensitivity and specificity. Summary receiver operating characteristic curves are deduced for the proposed model through quantile regression techniques and different characterizations of the bivariate random effects distribution. Our general methodology is demonstrated with an extensive simulation study and illustrated by re-analysing the data of two published meta-analyses. Our study suggests that there can be an improvement on GLMM in fit to data and makes the argument for moving to copula random effects models. Our modelling framework is implemented in the package CopulaREMADA within the open source statistical environment R.
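The negative sensitivity/specificity association that motivates bivariate models is easy to see from the GLMM's own data-generating process: study-specific logit-sensitivity and logit-specificity drawn from a correlated bivariate normal. All numbers below are invented; the copula model in the paper generalizes exactly this normal random-effects assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
n_studies = 2000
mu = np.array([1.5, 2.0])                 # mean logit sensitivity / specificity
rho, s1, s2 = -0.6, 0.8, 0.7              # negative between-study correlation
cov = np.array([[s1**2, rho * s1 * s2],
                [rho * s1 * s2, s2**2]])
re = rng.multivariate_normal(mu, cov, size=n_studies)
sens = 1.0 / (1.0 + np.exp(-re[:, 0]))    # study-specific sensitivity
spec = 1.0 / (1.0 + np.exp(-re[:, 1]))    # study-specific specificity
corr = float(np.corrcoef(sens, spec)[0, 1])   # negative, as the abstract describes
```

Replacing the bivariate normal with other copulas changes how the tails of the two random effects co-move, which is the extra flexibility the proposed model buys.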

Journal ArticleDOI
01 Apr 2015-Genomics
TL;DR: It is suggested that the mixed model methodology was useful to reduce spurious genetic associations produced by population stratification in GWAS, even with a high degree of admixture.


Journal ArticleDOI
TL;DR: In this paper, the parsimonious taper function proposed by Riemer et al. was fitted for radiata pine (Pinus radiata D. Don) stems in Spain by using a nonlinear mixed modelling approach.
Abstract: The parsimonious taper function proposed by Riemer et al. (1995. Allg. Forst.- Jagdztg. 166(7): 144–147) was fitted for radiata pine (Pinus radiata D. Don) stems in Spain by using a nonlinear mixed modelling approach. Eight candidate models (all possible expansion combinations of the three fixed parameters with random effects) were assessed, and the mixed model with three random effects performed the best according to the goodness-of-fit statistics. An evaluation data set was used to assess the performance of these models in predicting stem diameter along the bole, as well as total stem volume. Four prediction approaches were compared: one subject (tree) specific (SS) and three population specific (ordinary least squares (OLS), mean (M), and population averaged (PA)). The SS responses for a tree were estimated from a prior stem diameter measurement available for that tree, whereas OLS, M, and PA were obtained from the fixed-effects model, from the fixed parameters of mixed-effects models, and by computing mean predictions from the mixed-effects models over the distribution of random effects, respectively. Prediction errors were greater for the M and PA responses than for the OLS response, and therefore, from the prediction point of view, the use of the mixed-effects models is not recommended when an additional stem diameter measurement is not available. The mixed model with three random effects was also selected as the best model for SS estimations. Measurement of an additional stem diameter at a relative tree height of approximately 0.5 provided the best calibrations for stem diameters along the bole and total stem volume predictions. The SS approach increased the flexibility and efficiency of the selected mixed-effects model for localized predictions and thus improved the overall predictive capacity of the base model.

Journal ArticleDOI
TL;DR: Results revealed the applicability of the PRESS statistic to evaluate the performance of stable genotypes in the biplot, and mixed models can confidently be used to evaluate stability in plant breeding programs, even with highly unbalanced data.
Abstract: This study aimed to analyze the robustness of mixed models for the study of genotype-environment interactions (G x E). Simulated unbalancing of real data was used to determine if the method could predict missing genotypes and select stable genotypes. Data from multi-environment trials containing 55 maize hybrids, collected during the 2005-2006 harvest season, were used in this study. Analyses were performed in two steps: the variance components were estimated by restricted maximum likelihood, using the expectation-maximization (EM) algorithm, and factor analysis (FA) was used to calculate the factor scores and relative position of each genotype in the biplot. Random unbalancing of the data was performed by removing 10, 30, and 50% of the plots; the scores were then re-estimated using the FA model. It was observed that 10, 30, and 50% unbalancing exhibited mean correlation values of 0.7, 0.6, and 0.56, respectively. Overall, the genotypes classified as stable in the biplot had smaller prediction error sum of squares (PRESS) value and prediction amplitude of ellipses. Therefore, our results revealed the applicability of the PRESS statistic to evaluate the performance of stable genotypes in the biplot. This result was confirmed by the sizes of the prediction ellipses, which were smaller for the stable genotypes. Therefore, mixed models can confidently be used to evaluate stability in plant breeding programs, even with highly unbalanced data.
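The PRESS statistic referred to here is the sum of squared leave-one-out prediction errors. For a linear fit it has a closed form through the hat matrix, PRESS = Σᵢ (eᵢ / (1 − hᵢᵢ))², so no refitting is needed. The sketch below checks that identity against brute-force leave-one-out on made-up data (a generic linear model, not the paper's factor-analytic mixed model):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # intercept + slope design
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # made-up response

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix
e = y - X @ beta                              # ordinary residuals
press = float(np.sum((e / (1.0 - np.diag(H))) ** 2))

# brute-force leave-one-out for comparison
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo += float((y[i] - X[i] @ b) ** 2)
```

Smaller PRESS means better out-of-sample prediction, which is why the paper uses it to flag stable genotypes under heavy unbalancing.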

Journal ArticleDOI
TL;DR: In this article, a class of Beta mixed models is adopted for the analysis of real problems with grouped data structures; hierarchical, repeated-measures and longitudinal structures typically induce extra variability and/or dependence, which can be accommodated by the inclusion of random effects.
Abstract: Beta regression is a suitable choice for modelling continuous response variables taking values on the unit interval. Data structures such as hierarchical, repeated measures and longitudinal typically induce extra variability and/or dependence and can be accounted for by the inclusion of random effects. Statistical inference then typically requires numerical methods, possibly combined with sampling algorithms. A class of Beta mixed models is adopted for the analysis of two real problems with grouped data structures. We focus on likelihood inference and describe the implemented algorithms. The first is a study on the life quality index of industry workers with data collected according to a hierarchical sampling scheme. The second is a study assessing the impact of hydroelectric power plants upon measures of water quality indexes up, downstream and at the reservoirs of the dammed rivers, with a nested and longitudinal data structure. Results from different algorithms are reported for comparison in...

Journal ArticleDOI
TL;DR: In this paper, the authors propose to estimate the heritability in high-dimensional sparse linear mixed models, where the random effects can be sparse, that is, may contain null components whose proportion and positions are unknown.
Abstract: Motivated by applications in genetic fields, we propose to estimate the heritability in high-dimensional sparse linear mixed models. The heritability determines how the variance is shared between the different random components of a linear mixed model. The main novelty of our approach is to consider that the random effects can be sparse, that is, may contain null components, but we do not know either their proportion or their positions. The estimator that we consider is strongly inspired by the one proposed by Pirinen, Donnelly and Spencer (2013), and is based on a maximum likelihood approach. We also study the theoretical properties of our estimator, namely we establish that our estimator of the heritability is $\sqrt{n}$-consistent when both the number of observations $n$ and the number of random effects $N$ tend to infinity under mild assumptions. We also prove that our estimator of the heritability satisfies a central limit theorem which gives as a byproduct a confidence interval for the heritability. Some Monte-Carlo experiments are also conducted in order to show the finite sample performances of our estimator.
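The heritability in question is the fraction h² in a covariance of the form σ²(h²K + (1 − h²)I). Ignoring sparsity, a dense toy version of the maximum-likelihood idea profiles out σ² after an eigendecomposition of K and scans h² over a grid; this is a simplified sketch, not the authors' estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 600
Z = rng.standard_normal((n, m))
K = Z @ Z.T / m                                  # toy relatedness matrix
h2_true = 0.6
L = np.linalg.cholesky(h2_true * K + (1.0 - h2_true) * np.eye(n))
y = L @ rng.standard_normal(n)                   # y ~ N(0, h2*K + (1-h2)*I)

d, U = np.linalg.eigh(K)                         # K = U diag(d) U^T
yt = U.T @ y

def neg2_profile_loglik(h2):
    v = h2 * d + (1.0 - h2)                      # eigenvalues of h2*K + (1-h2)*I
    s2 = np.mean(yt ** 2 / v)                    # scale profiled out analytically
    return np.sum(np.log(v)) + n * np.log(s2)

grid = np.linspace(0.01, 0.99, 99)
h2_hat = float(grid[np.argmin([neg2_profile_loglik(h) for h in grid])])
```

After rotating by the eigenvectors of K the covariance is diagonal, so each likelihood evaluation is O(n) and the one-dimensional scan is cheap; the paper's contribution is handling the case where the random-effect vector itself is sparse.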

Journal ArticleDOI
TL;DR: This work presents a model for the case when the measurements are replicated, discusses its fitting, and explains how to evaluate similarity of measurement methods and agreement between them, which are two common goals of data analysis, under this model.
Abstract: Measurement error models offer a flexible framework for modeling data collected in studies comparing methods of quantitative measurement. These models generally make two simplifying assumptions: (i) the measurements are homoscedastic, and (ii) the unobservable true values of the methods are linearly related. One or both of these assumptions may be violated in practice. In particular, error variabilities of the methods may depend on the magnitude of measurement, or the true values may be nonlinearly related. Data with these features call for a heteroscedastic measurement error model that allows nonlinear relationships in the true values. We present such a model for the case when the measurements are replicated, discuss its fitting, and explain how to evaluate similarity of measurement methods and agreement between them, which are two common goals of data analysis, under this model. Model fitting involves dealing with lack of a closed form for the likelihood function. We consider estimation methods that approximate either the likelihood or the model to yield approximate maximum likelihood estimates. The fitting methods are evaluated in a simulation study. The proposed methodology is used to analyze a cholesterol dataset.

Journal ArticleDOI
TL;DR: It is shown that in the context of phylogenetic mixed models, part of the G-structure can be moved into the R-structure and integrated out deterministically, and that a GLMM with such an assumption is equivalent to the model proposed by Felsenstein.
Abstract: Summary Integrating out the random effects in generalised linear mixed models (GLMM) cannot be done analytically unless the response is Gaussian. Many stochastic, deterministic or hybrid algorithms have been developed to perform the integration. With categorical data and probit link (aka the threshold model), the random effect structure can be partitioned into a part that can be easily integrated deterministically (the R-structure) and a part that cannot (the G-structure). We show that in the context of phylogenetic mixed models, part of the G-structure (the phylogenetic effects at the tips) can be moved into the R-structure and integrated out deterministically. This result follows directly from the concept of the reduced animal model from quantitative genetics (Journal of Animal Science, 51, 1980, 1277) and its implications for discrete data (Genetics Selection Evolution, 42, 2010, 1). Although the conditional distribution of the phylogenetic variance is no longer in standard form, it does provide a stable and efficient 2-block MCMC algorithm for situations when the phylogenetic heritability is assumed to be one. We show that a GLMM with such an assumption is equivalent to the model proposed by Felsenstein (American Naturalist, 179, 2005, 145). Extensions to multivariate models are straightforward and a 3-block algorithm can be constructed when there is only a single categorical trait but multiple Gaussian traits. With ≥2 categorical traits, an additional non-Gibbs update is required for the correlation (sub)matrix. An implementation of these algorithms is distributed in the r package MCMCglmm and is up to several orders of magnitude faster than published alternatives.

Journal ArticleDOI
TL;DR: An individual-tree mixed model with direct additive genetic effects and both genetic and environmental competition effects is extended by incorporating a two-dimensional smoothing surface to account for complex patterns of environmental heterogeneity (the competition + spatial model, CSM).
Abstract: Negative correlation caused by competition among individuals and positive spatial correlation due to environmental heterogeneity may lead to biases in estimating genetic parameters and predicting breeding values (BVs) from forest genetic trials. Former models dealing with competition and environmental heterogeneity did not account for the additive relationships among trees or for the full spatial covariance. This paper extends an individual-tree mixed model with direct additive genetic effects and both genetic and environmental competition effects by incorporating a two-dimensional smoothing surface to account for complex patterns of environmental heterogeneity (the competition + spatial model, CSM). We illustrate the proposed model using simulated and real data from a loblolly pine progeny trial. The CSM was compared with three reduced individual-tree mixed models using the real dataset, while the simulations compared CSM estimates against the true parameters only. Dispersion parameters were estimated using Bayesian techniques via Gibbs sampling. Simulation results showed that the CSM yielded posterior mean estimates of variance components with slight or negligible biases in the studied scenarios, except for the permanent environment variance. The worst performance of the simulated CSM occurred under a scenario with weak competition effects and small-scale environmental heterogeneity. When analyzing real data, the CSM yielded a lower value of the deviance information criterion than the reduced models. Moreover, although correlations between predicted BVs calculated from the CSM and from a standard model with block effects and direct genetic effects only were high, the ranking among the top 5 % individuals differed, indicating that the two models would select quite different genotypes for the next cycle of breeding.
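The smoothing-surface idea can be sketched with a tensor product of one-dimensional truncated-line spline bases and a simple ridge penalty (an illustrative stand-in for the paper's two-dimensional surface; the trend, grid and penalty values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

def basis_1d(x, knots):
    # truncated-line basis: [1, x, (x - k1)_+, (x - k2)_+, ...]
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
    return np.column_stack(cols)

n = 30                                         # 30 x 30 grid of tree positions
r, c = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
r, c = r.ravel(), c.ravel()
truth = np.sin(2 * np.pi * r) * np.cos(np.pi * c)  # smooth environmental trend
y = truth + rng.normal(0, 0.5, r.size)

knots = np.linspace(0.1, 0.9, 8)
Br, Bc = basis_1d(r, knots), basis_1d(c, knots)
# row-wise tensor product: every product of a row-basis and a column-basis column
X = np.einsum("ij,ik->ijk", Br, Bc).reshape(r.size, -1)

lam = 1.0                                      # ridge penalty strength
P = np.eye(X.shape[1])
P[0, 0] = 0.0                                  # leave the intercept unpenalized
beta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
fit = X @ beta

corr = np.corrcoef(fit, truth)[0, 1]           # agreement with the true surface
```

In the full CSM this surface enters a mixed model alongside the genetic and competition effects, with the penalty acting as a random-effect variance.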

Book
02 Oct 2015
TL;DR: A textbook covering classical and advanced biostatistical models, from inference, survival analysis and mixed models for longitudinal data through multistate models, joint models for longitudinal and time-to-event data, and the dynamic approach to causality.
Abstract: Contents:
Introduction: General presentation of the book; Organization of the book; Notation; Presentation of examples.
Classical Biostatistical Models.
Inference: Generalities on inference: the concept of model; Likelihood and applications; Other types of likelihoods and estimation methods; Model choice; Optimization algorithms.
Survival Analysis: Introduction; Event, origin, and functions of interest; Observation patterns: censoring and truncation; Estimation of the survival function; The proportional hazards model; Accelerated failure time model; Counting processes approach; Additive hazards models; Degradation models.
Models for Longitudinal Data: Linear mixed models; Generalized linear mixed models; Non-linear mixed models; Marginal models and generalized estimating equations (GEE); Incomplete longitudinal data; Modeling strategies.
Advanced Biostatistical Models.
Extensions of Mixed Models: Mixed models for curvilinear outcomes; Mixed models for multivariate longitudinal data; Latent class mixed models.
Advanced Survival Models: Relative survival; Competing risks models; Frailty models; Extension of frailty models; Cure models.
Multistate Models: Introduction; Multistate processes; Multistate models: generalities; Observation schemes; Statistical inference for multistate models observed in continuous time; Inference for multistate models from interval-censored data; Complex functions of parameters: individualized hazards, sojourn times; Approach by counting processes; Other approaches.
Joint Models for Longitudinal and Time-to-Event Data: Introduction; Models with shared random effects; Latent class joint model; Latent classes versus shared random effects; The joint model as prognostic model; Extension of joint models.
The Dynamic Approach to Causality: Introduction; Local independence, direct and indirect influence; Causal influences; The dynamic approach to causal reasoning in ageing studies; Mechanistic models; The issue of dynamic treatment regimes.
Appendix: Software.
Index.

Journal ArticleDOI
TL;DR: In this paper, the asymptotic behaviour of power variations of a linear combination of a Wiener process and an independent fractional Brownian motion is studied; the results are applied to construct consistent parameter estimators and approximate confidence intervals in mixed models.
Abstract: In this paper we study the asymptotic behaviour of power variations of a linear combination of a Wiener process and an independent fractional Brownian motion. These results are applied to construct consistent parameter estimators and approximate confidence intervals in mixed models.
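The core consistency result can be checked numerically: for H > 1/2 the fractional Brownian component contributes nothing to the quadratic variation in the limit, so the realized quadratic variation recovers the Wiener coefficient. A sketch of this second-order special case (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# X = a*W + b*B^H with H > 1/2: the fBm part has vanishing quadratic variation,
# so realized QV / T consistently estimates a^2.
n, T, H = 1000, 1.0, 0.75
a, b = 1.5, 1.0
t = np.linspace(T / n, T, n)

# fractional Brownian motion via Cholesky factorization of its covariance
s, u = np.meshgrid(t, t, indexing="ij")
cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
fbm = np.linalg.cholesky(cov + 1e-10 * np.eye(n)) @ rng.normal(size=n)

W = np.cumsum(rng.normal(0.0, np.sqrt(T / n), n))      # Wiener process on the grid
X = a * W + b * fbm

qv = np.sum(np.diff(np.concatenate(([0.0], X))) ** 2)  # realized quadratic variation
a_hat = np.sqrt(qv / T)                                # estimator of a
```

Higher-order power variations play the analogous role for estimating the fBm parameters; the quadratic case above isolates the Wiener coefficient.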

Journal ArticleDOI
TL;DR: This work proposes a fully likelihood-based two-part marginal model that satisfies this need by using the bridge distribution for the random effect in the binary part of an underlying two-part mixed model; its maximum likelihood estimation can be routinely implemented via standard statistical software such as the SAS NLMIXED procedure.
Abstract: Two-part models are an attractive approach for analysing longitudinal semicontinuous data consisting of a mixture of true zeros and continuously distributed positive values. When the population-averaged (marginal) covariate effects are of interest, two-part models that provide straightforward interpretation of the marginal effects are desirable. Presently, the only available approaches for fitting two-part marginal models to longitudinal semicontinuous data are computationally difficult to implement. There is therefore a need for two-part marginal models that can be easily implemented in practice. We propose a fully likelihood-based two-part marginal model that satisfies this need by using the bridge distribution for the random effect in the binary part of an underlying two-part mixed model; its maximum likelihood estimation can be routinely implemented via standard statistical software such as the SAS NLMIXED procedure. We illustrate the usage of this new model by investigating the marginal effects of pre-specified genetic markers on physical functioning, as measured by the Health Assessment Questionnaire, in a cohort of psoriatic arthritis patients from the University of Toronto Psoriatic Arthritis Clinic. An added benefit of our proposed marginal model, compared with a two-part mixed model, is the robustness of regression parameter estimation under departures from the true random effects structure. This is demonstrated through simulation.
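The appeal of the bridge distribution is that marginalizing the random intercept keeps the link function and merely attenuates the coefficients. That property is easiest to verify in its probit analogue, where the normal distribution plays the same role and the attenuation factor is available in closed form (values below are illustrative; this is the analogue, not the paper's bridge/logit case):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Probit analogue of the bridge property:
#   conditional  P(Y=1 | b) = Phi(eta + b),  b ~ N(0, s2)
#   => marginal  P(Y=1)     = Phi(eta / sqrt(1 + s2))  (same link, attenuated eta)
eta, s2 = 0.8, 2.0
b = rng.normal(0.0, np.sqrt(s2), 400_000)
p_mc = norm.cdf(eta + b).mean()                # Monte Carlo marginal probability
p_closed = norm.cdf(eta / np.sqrt(1.0 + s2))   # closed-form attenuated probit
```

The bridge distribution is constructed so that the logit link enjoys exactly this same closure: marginal logits equal the conditional linear predictor scaled by a known attenuation constant.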

Journal ArticleDOI
TL;DR: A model is developed to assess the effects of a treatment when the data are functional with 3 levels (subjects, weeks and days in the authors' application) and possibly incomplete, with 3-level mean structure effects stratified by treatment, plus subject random effects including a general subject effect and nested effects for the 3 levels.
Abstract: Motivated by data recording the effects of an exercise intervention on subjects' physical activity over time, we develop a model to assess the effects of a treatment when the data are functional with 3 levels (subjects, weeks and days in our application) and possibly incomplete. The model has 3-level mean structure effects, all stratified by treatment, and subject random effects that include a general subject effect and nested effects for the 3 levels. The mean and random structures are specified as smooth curves measured at various time points. The association structure of the 3-level data is induced through the random curves, which are summarized using a few important principal components. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed effects model framework for model fitting, prediction and inference. We develop an algorithm to fit the model iteratively with the Expectation/Conditional Maximization Either (ECME) version of the EM algorithm and eigenvalue decompositions. Selection of the number of principal components and handling of incomplete data are incorporated into the algorithm. The performance of a Wald-type hypothesis test is also discussed. The method is applied to the physical activity data and evaluated empirically by a simulation study.
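The principal-component step can be sketched in a single-level setting (a simplified illustration, not the authors' three-level ECME algorithm): estimate the covariance across time points from simulated curves and recover the component functions by eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(5)

T = np.linspace(0, 1, 50)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * T)     # true component functions
phi2 = np.sqrt(2) * np.cos(2 * np.pi * T)     # (orthogonal on the grid)
n = 300
scores = rng.normal(0, [2.0, 0.7], (n, 2))    # PC scores, decreasing variance
Y = scores @ np.vstack([phi1, phi2]) + rng.normal(0, 0.2, (n, 50))

C = np.cov(Y, rowvar=False)                   # pointwise covariance estimate
vals, vecs = np.linalg.eigh(C)                # eigenvalues in ascending order
pc1 = vecs[:, -1]                             # leading estimated component

# alignment with the true first component (sign and scale are arbitrary)
align = abs(pc1 @ phi1) / (np.linalg.norm(pc1) * np.linalg.norm(phi1))
```

In the paper this step is combined with penalized-spline smoothing of the mean and covariance, and repeated at each of the three levels of the functional hierarchy.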

Journal ArticleDOI
TL;DR: In this paper, the authors used generalized additive mixed models (GAMM) with and without random effects to identify a relationship between abundance in the catch and oceanographic conditions, and the results demonstrate that GAMM are a useful tool to combine geo-referenced catch data with oceanographic variables and that the use of a mixed model approach with spatial and temporal random effects is an effective way to depict the dynamics of marine species.
Abstract: Anchovy, Engraulis encrasicolus, forms the basis of Italian small pelagic fisheries in the Adriatic Sea. The strong dependence of this stock on environmental factors and the consequent high variability make the dynamics of this species particularly complicated to model. Weekly geo-referenced catch data of anchovy obtained by means of a Fishery Observing System (FOS) from 2005 to 2011 were referred to a 0.2 × 0.2 degree grid (about 20 km²) and associated with the environmental parameters calculated by a Regional Ocean Modelling System, AdriaROMS. Generalized Additive Mixed Models (GAMM) with and without random effects were used to identify a relationship between abundance in the catch and oceanographic conditions. The outcomes of models with no random effects, with random vessel effects, and with random vessel and week-of-the-year effects were examined. The GAMM incorporating random vessel and week-of-the-year effects was selected as the best model on the basis of the Akaike information criterion (AIC). This model indicated that catches (abundance) of anchovy in the Adriatic Sea correlate well with low temperatures, salinity fronts and sea surface height, and allowed the identification of areas where high concentrations of this species are most likely to occur. The results of this study demonstrate that GAMM are a useful tool to combine geo-referenced catch data with oceanographic variables and that the use of a mixed-model approach with spatial and temporal random effects is an effective way to depict the dynamics of marine species.
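The AIC-based selection step can be sketched with ordinary Gaussian linear models standing in for the GAMMs (covariates and effect sizes are invented): the candidate that includes the genuine extra effect should attain the lower AIC.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic data with a real "week" effect hidden in the response
n = 400
week = rng.integers(0, 10, n)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * np.sin(week) + rng.normal(0, 0.5, n)

def aic(X, y):
    """Gaussian AIC: n*log(RSS/n) + 2k, k counting coefficients + error variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1
    return len(y) * np.log(rss / len(y)) + 2 * k

X0 = np.column_stack([np.ones(n), x])                 # without the week effect
X1 = np.column_stack([np.ones(n), x, np.sin(week)])   # with the week effect
aic0, aic1 = aic(X0, y), aic(X1, y)
```

For mixed models the same comparison applies, with the effective number of parameters adjusted for the random-effect structure.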

Journal ArticleDOI
TL;DR: In this article, the authors used local abundance counts from the Rhône river restoration monitoring programme to quantify how sampling strategies and population characteristics influence statistical power for detecting restoration effects.
Abstract: 1. Assessing how populations respond to ecological restoration is particularly difficult because their abundance results from many sources of variation. In addition, abundance estimates depend on sampling efforts that are limited by financial or practical constraints.
2. We used local abundance counts from the Rhône river restoration monitoring programme to quantify how sampling strategies and population characteristics influenced statistical power for detecting restoration effects.
3. We first fitted observed changes in abundance of 13 fish taxa and 35 invertebrate taxa collected in microhabitats of four restored reaches of the Rhône river over 15 years, using a generalised linear mixed model. The model accounted for a restoration effect, random temporal variation between field surveys and spatial variation within surveys (i.e. microhabitat variation in abundance was assumed to follow a negative binomial distribution). We then used numerical simulations to calculate the statistical power (i.e. the probability of detecting a true change) and the type I error (the probability of detecting a non-existent change) associated with various hypotheses of restoration effect size, mean abundance and temporal and spatial variation.
4. Model fits revealed that accounting for temporal variation is needed to reduce type I error associated with the effect of restoration. Significant abundance changes were observed for 27 of 104 (26%) of the taxa-reach combinations.
5. When assuming temporal variation and population characteristics typical of our data sets, power simulations showed that the probability of detecting a moderate change (50–200%) in abundance was <38% in all tests. The average probability of detecting large changes (500–1000%) was 61%. In these conditions, power was increased by low spatial variation and high sampling effort. Large numbers of surveys (e.g. 16 instead of 4) increased the power by 20 points if surveys were balanced before and after restoration.
6. Because our simulations covered a wide variety of population characteristics and sampling strategies, they can be used a priori to determine which sampling strategy is best adapted for detecting restoration effects from repeated abundance counts.
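The power-simulation recipe above can be sketched as follows, with a rank test standing in for the fitted GLMM and invented parameter values: simulate negative binomial abundance counts before and after restoration, and estimate power as the rejection rate across simulated datasets.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)

def sim_power(effect, mean_before=5.0, disp=1.0, n=40, nsim=400, alpha=0.05):
    """Fraction of simulations in which a before/after change is detected."""
    def draw(mu):
        # NumPy's negative_binomial(r, p) has mean r*(1-p)/p
        p = disp / (disp + mu)
        return rng.negative_binomial(disp, p, n)
    rejections = 0
    for _ in range(nsim):
        before, after = draw(mean_before), draw(mean_before * effect)
        rejections += mannwhitneyu(before, after).pvalue < alpha
    return rejections / nsim

power_null = sim_power(effect=1.0)   # no change: rejection rate near alpha
power_big = sim_power(effect=5.0)    # a 500% change: high power expected
```

Sweeping `effect`, `n` and `disp` reproduces the kind of power surface the authors use a priori to choose a sampling strategy.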