
Showing papers on "Mixed model" published in 2008


Journal ArticleDOI
TL;DR: This paper identifies several serious problems with the widespread use of ANOVAs for the analysis of categorical outcome variables, and introduces ordinary logit models (i.e., logistic regression), which are well suited to analyzing categorical data and offer many advantages over ANOVA.

2,895 citations
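
A minimal sketch of the paper's recommendation, fitting an ordinary logit model to a binary outcome with Python's statsmodels. All data and variable names ("condition", "freq", "correct") are simulated and hypothetical, not taken from the paper:

```python
# Fit an ordinary logit model (logistic regression) to a binary outcome,
# instead of running an ANOVA on proportions. Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "condition": rng.choice(["control", "treatment"], size=n),
    "freq": rng.normal(size=n),          # a continuous covariate
})
logit_p = -0.5 + 1.2 * (df["condition"] == "treatment") + 0.8 * df["freq"]
df["correct"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Coefficients are on the log-odds scale.
fit = smf.logit("correct ~ condition + freq", data=df).fit()
print(fit.summary())
```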


Journal ArticleDOI
TL;DR: Recent developments in the application of BLUP in plant breeding and variety testing are reviewed, including the use of pedigree information to model and exploit genetic correlation among relatives and the use of flexible variance–covariance structures for genotype-by-environment interaction.
Abstract: Best linear unbiased prediction (BLUP) is a standard method for estimating random effects of a mixed model. This method was originally developed in animal breeding for estimation of breeding values and is now widely used in many areas of research. It does not, however, seem to have gained the same popularity in plant breeding and variety testing as it has in animal breeding. In plants, application of mixed models with random genetic effects has up until recently been mainly restricted to the estimation of genetic and non-genetic components of variance, whereas estimation of genotypic values is mostly based on a model with fixed effects. This paper reviews recent developments in the application of BLUP in plant breeding and variety testing. These include the use of pedigree information to model and exploit genetic correlation among relatives and the use of flexible variance-covariance structures for genotype-by-environment interaction. We demonstrate that BLUP has good predictive accuracy compared to other procedures. While pedigree information is often included via the so-called numerator relationship matrix (A), we stress that it is frequently straightforward to exploit the same information by a simple mixed model without explicit reference to the A-matrix.

578 citations
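
A minimal sketch of BLUP for random genotype effects from a one-way linear mixed model, using statsmodels on simulated data. The column names and variance settings are illustrative assumptions, not the paper's:

```python
# Extract BLUPs of random genotype effects from a random-intercept model.
# BLUPs are shrunken toward zero relative to the raw genotype means.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
genotypes = [f"g{i}" for i in range(30)]
true_u = dict(zip(genotypes, rng.normal(0.0, 1.0, size=30)))  # genetic effects
rows = [{"genotype": g, "yld": 10 + true_u[g] + rng.normal(0.0, 2.0)}
        for g in genotypes for _ in range(5)]                 # 5 reps each
df = pd.DataFrame(rows)

# Random-intercept model: yld = mu + u_genotype + error.
result = smf.mixedlm("yld ~ 1", df, groups=df["genotype"]).fit()

# result.random_effects maps each genotype to its predicted (BLUP) effect.
blups = {g: re.iloc[0] for g, re in result.random_effects.items()}
print({g: round(b, 2) for g, b in list(blups.items())[:5]})
```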


Journal ArticleDOI
TL;DR: A recently developed approximation to the distribution of the restricted likelihood ratio test (RLRT) statistic is identified as a rapid, powerful and reliable alternative to computationally intensive parametric bootstrap procedures.

280 citations


Journal ArticleDOI
TL;DR: The article describes how covariates can influence the mood variances and extends the standard mixed model by adding a subject-level random effect to the within-subject variance specification, allowing subjects to influence both the mean (location) and the variability (square of the scale) of their mood responses.
Abstract: For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using ecological momentary assessment (EMA), up to 30 or 40 observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within and between subjects. In this article, we focus on an adolescent smoking study using EMA where interest is on characterizing changes in mood variation. We describe how covariates can influence the mood variances, and also extend the standard mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.

209 citations
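
A minimal simulation sketch of the mixed-effects location scale idea: each subject receives correlated random location and scale effects, and the within-subject error variance is modeled on the log scale. All parameter values are illustrative:

```python
# Simulate a mixed-effects location scale model: correlated random
# location (v_i) and scale (w_i) effects per subject, with the
# within-subject error variance modeled on the log scale.
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_obs = 100, 30
beta0, tau0 = 5.0, 0.5                 # fixed intercepts: mean and log-variance
cov = np.array([[1.0, 0.3],            # var(v), cov(v, w)
                [0.3, 0.4]])           # cov(v, w), var(w)
v_w = rng.multivariate_normal([0.0, 0.0], cov, size=n_subj)

y = np.empty((n_subj, n_obs))
for i in range(n_subj):
    v_i, w_i = v_w[i]
    sd_i = np.exp(0.5 * (tau0 + w_i))  # subject-specific error SD
    y[i] = beta0 + v_i + rng.normal(0.0, sd_i, size=n_obs)

# Subjects now differ in both their mean level and their variability.
print("SD of subject means:", y.mean(axis=1).std().round(3))
print("range of within-subject SDs:",
      y.std(axis=1).min().round(3), "to", y.std(axis=1).max().round(3))
```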


Journal ArticleDOI
TL;DR: In this article, profile monitoring is used when the product or process quality is best represented by a function at each time period, and then the estimated parameters are monitored over time to determine if there have been changes.
Abstract: Profile monitoring is used when the product or process quality is best represented by a function at each time period. Often some parametric method is used, and then the estimated parameters are monitored over time to determine if there have been changes.

193 citations
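
A minimal sketch of parametric profile monitoring on simulated data: a straight-line profile is fitted at each period and the estimated slope is tracked against baseline control limits. The 3-sigma limits are a generic illustrative choice, not the article's scheme:

```python
# Fit a linear profile per time period and monitor the estimated slope
# with simple 3-sigma limits computed from an in-control baseline phase.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 20)               # fixed design points per profile
slopes = []
for t in range(50):
    drift = 0.0 if t < 40 else 0.8      # process change after period 40
    y = 1.0 + (2.0 + drift) * x + rng.normal(0, 0.1, size=x.size)
    slope, intercept = np.polyfit(x, y, 1)
    slopes.append(slope)

slopes = np.array(slopes)
baseline = slopes[:30]                   # phase I: assumed in control
center, sd = baseline.mean(), baseline.std(ddof=1)
out = np.where(np.abs(slopes - center) > 3 * sd)[0]
print("periods signaling a change:", out)
```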


Journal ArticleDOI
TL;DR: In this paper, a small area estimation approach that combines small area random effects with a smooth, non-parametrically specified trend is proposed, where penalized splines are used as the representation for the nonparametric trend and the resulting model is readily fitted by using existing model fitting approaches such as restricted maximum likelihood.
Abstract: The paper proposes a small area estimation approach that combines small area random effects with a smooth, non-parametrically specified trend. By using penalized splines as the representation for the non-parametric trend, it is possible to express the non-parametric small area estimation problem as a mixed effect model regression. The resulting model is readily fitted by using existing model fitting approaches such as restricted maximum likelihood. We present theoretical results on the prediction mean-squared error of the estimator proposed and on likelihood ratio tests for random effects, and we propose a simple non-parametric bootstrap approach for model inference and estimation of the small area prediction mean-squared error. The applicability of the method is demonstrated on a survey of lakes in north-eastern USA.

179 citations
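
A minimal sketch of the penalized-spline-as-mixed-model representation: a truncated-line basis plays the role of the random effects, and penalized least squares with a hand-picked smoothing parameter stands in for the REML fit the paper uses. Basis size and lambda are illustrative assumptions:

```python
# Penalized spline via a truncated-line basis. In the mixed-model view,
# the Z columns are random effects and lambda = sigma_e^2 / sigma_u^2;
# REML would estimate lambda, here it is fixed by hand.
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

knots = np.linspace(0, 1, 22)[1:-1]              # 20 interior knots
X = np.column_stack([np.ones(n), x])             # fixed-effects part
Z = np.maximum(x[:, None] - knots[None, :], 0.0) # truncated lines ("random")
C = np.hstack([X, Z])

lam = 1.0
P = np.diag([0.0, 0.0] + [lam] * Z.shape[1])     # penalize only the Z block
coef = np.linalg.solve(C.T @ C + P, C.T @ y)
fitted = C @ coef
print("residual SD:", np.std(y - fitted).round(3))
```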


Journal ArticleDOI
TL;DR: It is shown that kernel machine estimation of the model components can be formulated using a logistic mixed model, and hence can proceed within a mixed model framework using standard statistical software.
Abstract: Background: Growing interest in biological pathways has called for new statistical methods for modeling and testing a genetic pathway effect on a health outcome. The fact that genes within a pathway tend to interact with each other and relate to the outcome in a complicated way makes nonparametric methods more desirable. The kernel machine method provides a convenient, powerful and unified method for multi-dimensional parametric and nonparametric modeling of the pathway effect. Results: In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates the disease risk to covariates parametrically, and to genes within a genetic pathway parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among the genes within the same pathway and a complicated relationship of the genetic pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model. Estimation hence can proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models and its connection with generalized linear mixed models is discussed. Conclusion: Logistic kernel machine regression and its extension generalized kernel machine regression provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and attractive performance suggest promising, wide-ranging applications in bioinformatics and other biomedical areas.

171 citations
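
A minimal sketch of the kernel machine idea for a binary outcome: the pathway effect is represented through a Gaussian kernel matrix and estimated by penalized logistic likelihood, which echoes (but does not reproduce) the paper's mixed-model formulation. Data, kernel bandwidth and penalty are made-up assumptions:

```python
# Kernel machine logistic regression via penalized likelihood: the
# pathway effect is h = K @ alpha with a Gaussian kernel K; the penalty
# alpha' K alpha parallels treating h as a random effect with covariance
# proportional to K. The penalty lam is fixed by hand here.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)
n, p = 150, 5
Z = rng.normal(size=(n, p))                    # "gene" measurements
f_true = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] * Z[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-f_true)))

K = np.exp(-cdist(Z, Z, "sqeuclidean") / p)    # Gaussian kernel matrix
lam = 1.0

def fun_and_grad(alpha):
    f = K @ alpha
    prob = 1 / (1 + np.exp(-f))
    nll = -np.sum(y * f - np.log1p(np.exp(f))) + 0.5 * lam * alpha @ K @ alpha
    grad = -K @ (y - prob) + lam * K @ alpha
    return nll, grad

res = minimize(fun_and_grad, np.zeros(n), jac=True, method="L-BFGS-B")
f_hat = K @ res.x
print("in-sample accuracy:", np.mean((f_hat > 0) == (y == 1)).round(3))
```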


Journal ArticleDOI
TL;DR: This joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout and is an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk.
Abstract: In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel (Prentice et al., 1978, Biometrics 34, 541-554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease.

158 citations


Journal ArticleDOI
TL;DR: The present article proposes a model to integrate fixed-, random-, and mixed-effects meta-analyses into the SEM framework, and shows how the SEM-based meta-analysis can be used to handle missing covariates, to quantify the heterogeneity of effect sizes, and to address the heterogeneityof effect sizes with mixture models.
Abstract: Meta-analysis and structural equation modeling (SEM) are two important statistical methods in the behavioral, social, and medical sciences. They are generally treated as two unrelated topics in the literature. The present article proposes a model to integrate fixed-, random-, and mixed-effects meta-analyses into the SEM framework. By applying an appropriate transformation on the data, studies in a meta-analysis can be analyzed as subjects in a structural equation model. This article also highlights some practical benefits of using the SEM approach to conduct a meta-analysis. Specifically, the SEM-based meta-analysis can be used to handle missing covariates, to quantify the heterogeneity of effect sizes, and to address the heterogeneity of effect sizes with mixture models. Examples are used to illustrate the equivalence between the conventional meta-analysis and the SEM-based meta-analysis. Future directions on and issues related to the SEM-based meta-analysis are discussed.

153 citations
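
For context, a minimal sketch of the conventional random-effects meta-analysis that the SEM approach is shown to reproduce. This is the DerSimonian-Laird side only, not the SEM transformation itself, and the effect sizes are invented:

```python
# Conventional random-effects meta-analysis: DerSimonian-Laird estimate
# of the between-study variance tau^2, then an inverse-variance weighted
# pooled effect.
import numpy as np

yi = np.array([0.30, 0.12, 0.45, 0.22, 0.60])   # study effect sizes
vi = np.array([0.02, 0.03, 0.05, 0.01, 0.04])   # within-study variances

w_fixed = 1 / vi
y_fixed = np.sum(w_fixed * yi) / np.sum(w_fixed)
Q = np.sum(w_fixed * (yi - y_fixed) ** 2)        # heterogeneity statistic
k = len(yi)
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (k - 1)) / c)               # DerSimonian-Laird tau^2

w_rand = 1 / (vi + tau2)
y_rand = np.sum(w_rand * yi) / np.sum(w_rand)
se_rand = np.sqrt(1 / np.sum(w_rand))
print(f"tau^2 = {tau2:.4f}, pooled effect = {y_rand:.3f} (SE {se_rand:.3f})")
```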


Journal ArticleDOI
TL;DR: A modelling framework is proposed to study the relationship between two paired longitudinally observed variables, using penalized splines to model the mean curves and the principal component curves and casting the proposed model into a mixed-effects model framework for model fitting, prediction and inference.
Abstract: We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates.

145 citations


Journal ArticleDOI
TL;DR: This work develops functional principal components analysis for this situation and demonstrates the prediction of individual trajectories from sparse observations; the method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model.
Abstract: In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type, or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a connection to functional data analysis. The functional methods proposed are non-parametric and computationally straightforward as they do not involve a likelihood. We develop functional principal components analysis for this situation and demonstrate the prediction of individual trajectories from sparse observations. This method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model. These scores can then be used for further statistical analysis, such as inference, regression, discriminant analysis or clustering. We illustrate these non-parametric methods with longitudinal data on primary biliary cirrhosis and show in simulations that they are competitive in comparisons with generalized estimating equations and generalized linear mixed models.
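
A minimal sketch of the functional principal components core on densely observed Gaussian curves; the paper's latent Gaussian process machinery for sparse, non-Gaussian observations is considerably more involved:

```python
# FPCA on a dense grid: eigendecompose the sample covariance of centered
# curves; the leading eigenvectors are the principal component functions
# and the projections are the FPC scores.
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 50)
n = 80
# Two-component truth: random scores on a sine and a cosine shape.
scores = rng.normal(size=(n, 2)) * np.array([1.0, 0.4])
curves = (scores[:, :1] * np.sin(2 * np.pi * t)
          + scores[:, 1:] * np.cos(2 * np.pi * t)
          + rng.normal(0, 0.1, size=(n, t.size)))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
cov = centered.T @ centered / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

pc_scores = centered @ eigvecs[:, :2]            # estimated FPC scores
explained = eigvals[:2] / eigvals.sum()
print("variance explained by first two FPCs:", np.round(explained, 3))
```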

Journal ArticleDOI
TL;DR: In this article, the authors provide a transparent, robust, and computationally feasible statistical platform for restricted likelihood ratio testing (RLRT) for zero variance components in linear mixed models.
Abstract: The goal of our article is to provide a transparent, robust, and computationally feasible statistical platform for restricted likelihood ratio testing (RLRT) for zero variance components in linear mixed models. This problem is nonstandard because under the null hypothesis the parameter is on the boundary of the parameter space. Our proposed approach is different from the asymptotic results of Stram and Lee, who assumed that the outcome vector can be partitioned into many independent subvectors. Thus, our methodology applies to a wider class of mixed models, which includes models with a moderate number of clusters or nonparametric smoothing components. We propose two approximations to the finite sample null distribution of the RLRT statistic. Both approximations converge weakly to the asymptotic distribution obtained by Stram and Lee when their assumptions hold. When their assumptions do not hold, we show in extensive simulation studies that both approximations outperform the Stram and Lee approximation and...
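
A minimal sketch of the baseline this paper improves upon: comparing an observed RLRT statistic for a zero variance component against the classical 50:50 chi-square mixture. The paper's finite-sample approximations (and the parametric bootstrap) replace this p-value computation; the statistic value below is hypothetical:

```python
# P-value for an RLRT statistic under the classical asymptotic null,
# the 0.5*chi2(0) + 0.5*chi2(1) mixture. The paper shows this mixture
# can be a poor finite-sample approximation and refines it.
from scipy.stats import chi2

def mixture_pvalue(rlrt_stat):
    """P-value under the 0.5*chi2_0 + 0.5*chi2_1 null mixture."""
    if rlrt_stat <= 0:
        return 1.0
    return 0.5 * chi2.sf(rlrt_stat, df=1)

print(mixture_pvalue(3.2))   # e.g. a hypothetical observed RLRT of 3.2
```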

Journal ArticleDOI
TL;DR: The recently developed Bayesian wavelet‐based functional mixed model methodology is applied to analyze MALDI‐TOF mass spectrometry proteomic data to identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level.
Abstract: In this article, we apply the recently developed Bayesian wavelet-based functional mixed model methodology to analyze MALDI-TOF mass spectrometry proteomic data. By modeling mass spectra as functions, this approach avoids reliance on peak detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of peaks in the spectra. For example, this provides a straightforward way to account for systematic block and batch effects that characterize these data. From the model output, we identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level. We apply this method to two cancer studies.

Journal ArticleDOI
TL;DR: Calculations are done for a variety of mixed models to show patterns in the asymptotic bias of the estimator obtained by maximizing the Laplace approximation to the log-likelihood.

Journal ArticleDOI
TL;DR: Convenient parameterizations requiring few random effects are proposed, which allow such models to be estimated using widely available software for linear mixed models (continuous phenotypes) or generalized linear mixed Models (categorical phenotypes).
Abstract: Biometrical genetic modeling of twin or other family data can be used to decompose the variance of an observed response or 'phenotype' into genetic and environmental components. Convenient parameterizations requiring few random effects are proposed, which allow such models to be estimated using widely available software for linear mixed models (continuous phenotypes) or generalized linear mixed models (categorical phenotypes). We illustrate the proposed approach by modeling family data on the continuous phenotype birth weight and twin data on the dichotomous phenotype depression. The example data sets and commands for Stata and R/S-PLUS are available at the Biometrics website.

01 Jan 2008
TL;DR: In this paper, a semiparametric extension of multi-level regression models is introduced that includes mixed and fixed effects models as its two extreme cases, with the smoothing parameter serving as the switch between them.
Abstract: We introduce a semiparametric extension of multi-level regression models that includes mixed and fixed effects models as its two extreme cases. In some practical cases, one could consider the fixed effects model as an overparametrized model without modeling but just plugging in dummies. In other words, it suffers from "too many parameters but too little model". The mixed effects model tries to overcome this by using just random effects and therefore has "too few parameters but too much model", where "too much model" refers to the necessary assumptions made. We propose including a nonparametric term that allows the practitioner to shift the model smoothly between these extremes, depending on the data and underlying problem. The smoothing parameter serves as the switch. We show that in this way one can filter out possible dependency between covariates and random effects. We further provide consistent bootstrap procedures for possible inference and to analyze prediction power. The positive implications of using this model are highlighted in particular for small area statistics and econometrics. This is underlined by simulation studies and a real data application.

Journal ArticleDOI
TL;DR: A simple and robust Bayesian parametric approach is presented that relaxes the assumption of normality of random effects and error terms by using a multivariate skew-elliptical distribution, which includes the skew-t, skew-normal, Student-t, and normal distributions as special cases and provides flexibility in capturing a broad range of non-normal and asymmetric behavior.

Journal ArticleDOI
TL;DR: A new, artificial intelligence (AI)‐based approach is suggested for improving the accuracy of traffic predictions through suitably combining the forecasts derived from a set of individual predictors, which employs a fuzzy rule‐based system (FRBS), which is augmented with an appropriate metaheuristic technique to automate the tuning of the system parameters within an online adaptive rolling horizon framework.
Abstract: This paper looks at the problem of accuracy of short-term traffic flow forecasting in the complex case of urban signalized arterial networks. A new, artificial intelligence-based approach is offered for improving accuracy of traffic predictions through suitably combining forecasts derived from a set of individual predictors. This approach employs a fuzzy rule-based system (FRBS), which is augmented with an appropriate metaheuristic (direct search) technique to automate the tuning of the system parameters within an online adaptive rolling horizon framework. The proposed hybrid FRBS is used to nonlinearly combine traffic flow forecasts resulting from an online adaptive Kalman filter and an artificial neural network model. Empirical results obtained from the model's implementation into an actual urban signalized arterial show the ability of the proposed approach to considerably outperform the given individual traffic predictors.
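
A minimal sketch of the combine-forecasts idea on simulated series, with weights that adapt to recent performance; the paper's fuzzy rule-based combiner with metaheuristic tuning is far richer than this simple inverse-error weighting:

```python
# Combine two traffic-flow predictors with weights proportional to the
# inverse of each predictor's rolling mean absolute error.
import numpy as np

rng = np.random.default_rng(7)
T = 200
truth = 50 + 10 * np.sin(np.arange(T) / 10)
pred_a = truth + rng.normal(0, 3, T)     # e.g. a Kalman-filter predictor
pred_b = truth + rng.normal(0, 5, T)     # e.g. a neural-network predictor

window = 10
combined = np.empty(T)
combined[:window] = (pred_a[:window] + pred_b[:window]) / 2
for t in range(window, T):
    err_a = np.mean(np.abs(pred_a[t - window:t] - truth[t - window:t]))
    err_b = np.mean(np.abs(pred_b[t - window:t] - truth[t - window:t]))
    w_a = (1 / err_a) / (1 / err_a + 1 / err_b)
    combined[t] = w_a * pred_a[t] + (1 - w_a) * pred_b[t]

for name, p in [("A", pred_a), ("B", pred_b), ("combined", combined)]:
    print(name, "MAE:", np.mean(np.abs(p - truth)).round(3))
```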

Journal ArticleDOI
TL;DR: An efficient hybrid ECME-NR algorithm for the computation of maximum-likelihood estimates of parameters is presented and a score test statistic for testing the existence of skewness preference among random effects is developed.
Abstract: This paper extends the classical linear mixed model by considering a multivariate skew-normal assumption for the distribution of random effects. We present an efficient hybrid ECME-NR algorithm for the computation of maximum-likelihood estimates of parameters. A score test statistic for testing the existence of skewness preference among random effects is developed. The technique for the prediction of future responses under this model is also investigated. The methodology is illustrated through an application to Framingham cholesterol data and a simulation study.

Journal ArticleDOI
TL;DR: This work proposes a parametric bootstrap approach to estimate the entire distribution of a suitably centered and scaled EBLUP, and shows the superiority of this method over existing techniques of constructing prediction intervals in linear mixed models.
Abstract: Empirical best linear unbiased prediction (EBLUP) method uses a linear mixed model in combining information from different sources of information. This method is particularly useful in small area problems. The variability of an EBLUP is traditionally measured by the mean squared prediction error (MSPE), and interval estimates are generally constructed using estimates of the MSPE. Such methods have shortcomings like under-coverage or over-coverage, excessive length and lack of interpretability. We propose a parametric bootstrap approach to estimate the entire distribution of a suitably centered and scaled EBLUP. The bootstrap histogram is highly accurate, and differs from the true EBLUP distribution by only $O(d^3n^{-3/2})$, where $d$ is the number of parameters and $n$ the number of observations. This result is used to obtain highly accurate prediction intervals. Simulation results demonstrate the superiority of this method over existing techniques of constructing prediction intervals in linear mixed models.
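
A minimal sketch of a parametric-bootstrap prediction interval for an EBLUP in a simple area-level model with known sampling variances. The moment estimator and the interval construction are simplified stand-ins for the paper's calibrated procedure:

```python
# Parametric bootstrap for an EBLUP prediction interval in the model
# y_i = mu + u_i + e_i with u_i ~ N(0, A) and known sampling variances D_i.
import numpy as np

rng = np.random.default_rng(8)
m = 30
D = rng.uniform(0.5, 2.0, m)                  # known sampling variances
A_true, mu_true = 1.0, 5.0
y = mu_true + rng.normal(0, np.sqrt(A_true), m) + rng.normal(0, np.sqrt(D))

def fit_and_predict(y, D):
    A_hat = max(0.0, np.var(y, ddof=1) - D.mean())   # crude moment estimator
    w = 1 / (A_hat + D)
    mu_hat = np.sum(w * y) / np.sum(w)
    gamma = A_hat / (A_hat + D)
    return A_hat, mu_hat, gamma * y + (1 - gamma) * mu_hat  # EBLUPs

A_hat, mu_hat, eblup = fit_and_predict(y, D)

B, i = 1000, 0                                 # interval for area i
diffs = np.empty(B)
for b in range(B):
    theta_star = mu_hat + rng.normal(0, np.sqrt(A_hat), m)
    y_star = theta_star + rng.normal(0, np.sqrt(D))
    _, _, eblup_star = fit_and_predict(y_star, D)
    diffs[b] = eblup_star[i] - theta_star[i]   # bootstrap prediction error

lo, hi = np.quantile(diffs, [0.025, 0.975])
print(f"95% PI for theta_{i}: [{eblup[i] - hi:.3f}, {eblup[i] - lo:.3f}]")
```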

Journal ArticleDOI
TL;DR: It is shown that R^2 statistics that involve the residuals are unable to adequately discriminate between the correct model and one from which important fixed-effect covariates are omitted if the computation of the predicted values for the residuals includes the random effects.
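
A minimal sketch of the distinction at issue, on simulated data: predicted values computed from the fixed effects alone versus with the estimated random effects (BLUPs) added, and the resulting R^2-type statistics when an important covariate is deliberately omitted:

```python
# Compare R^2 from fixed-effects-only predictions versus predictions that
# include the BLUPs, for a mixed model that omits an important covariate.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n_grp, n_per = 40, 10
g_idx = np.repeat(np.arange(n_grp), n_per)
x = rng.normal(size=n_grp * n_per)
u = rng.normal(0.0, 1.0, n_grp)[g_idx]        # random intercepts
y = 1.0 + 0.8 * x + u + rng.normal(0.0, 0.5, g_idx.size)
df = pd.DataFrame({"y": y, "x": x, "g": [f"s{i}" for i in g_idx]})

res = smf.mixedlm("y ~ 1", df, groups=df["g"]).fit()  # x omitted on purpose
fixed_pred = np.full(len(df), res.fe_params.iloc[0])
re = res.random_effects
cond_pred = fixed_pred + np.array([re[lab].iloc[0] for lab in df["g"]])

def r2(pred):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("R^2, fixed effects only:", round(r2(fixed_pred), 3))
print("R^2, including BLUPs:   ", round(r2(cond_pred), 3))
```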

Journal ArticleDOI
TL;DR: This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models, and gives sufficient conditions for consistency of fence and its variations, a desirable property for a good model selection procedure.
Abstract: Many model search strategies involve trading off model fit with model complexity in a penalized goodness of fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied, but these do not naturally extend to nonstandard situations such as mixed effects models, where simple definition of the sample size is not meaningful. This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. In addition, we propose two variations of the fence. The first is a stepwise procedure to handle situations of many predictors; the second is an adaptive approach for choosing a tuning constant. We give sufficient conditions for consistency of fence and its variations, a desirable property for a good model selection procedure. The methods are illustrated through simulation studies and real data analysis.
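
A minimal sketch of the fence idea in the simplest possible setting, ordinary regression with the residual sum of squares as the lack-of-fit measure. The scale estimate and tuning constant below are crude hand-picked stand-ins for the paper's constructions (the adaptive fence would choose the constant by bootstrap):

```python
# Fence method sketch: keep models whose lack-of-fit Q(M) is within
# Q(M_full) + c * sd_hat, then pick the simplest model inside the fence.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
n, p = 120, 4
X = rng.normal(size=(n, p))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0.0, 1.0, n)

def rss(cols):
    Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    return float(np.sum((y - Xm @ beta) ** 2))

q_full = rss(range(p))
sigma2_hat = q_full / (n - p - 1)                 # error variance estimate
sd_hat = sigma2_hat * np.sqrt(2.0 * (n - p - 1))  # rough scale of Q
c = 1.0                                           # tuning constant (by hand)

candidates = [m for r in range(p + 1) for m in combinations(range(p), r)]
inside = [m for m in candidates if rss(m) <= q_full + c * sd_hat]
best = min(inside, key=lambda m: (len(m), rss(m)))  # simplest in the fence
print("selected predictors:", best)
```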


Journal ArticleDOI
TL;DR: In this paper, a diameter increment model is developed and evaluated for individual trees of ponderosa pine throughout the species' range in the United States using a multilevel linear mixed model.

Journal ArticleDOI
TL;DR: It is important that generalized mixed models be available which relax the normality assumption; the authors propose replacing the normal distribution with a mixture of Gaussian distributions specified on a grid, whereby only the weights of the mixture components are estimated using a penalized approach that ensures a smooth distribution for the random effects.

Journal ArticleDOI
TL;DR: Weighted GEE (WGEE) has been proposed as an elegant way to ensure validity under data that are missing at random (MAR); the results provide striking evidence that MI-GEE (multiple imputation followed by GEE) is both less biased and more accurate for the small to moderate sample sizes which typically arise in clinical trials.

Journal ArticleDOI
TL;DR: It is concluded that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.
Abstract: Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called “heterotachy.” Here we examine the accuracy of 2 phylogenetic methods for incorporating heterotachy, the mixed branch length model—which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree—and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.

Journal ArticleDOI
TL;DR: Two bootstrap-corrected variants of the Akaike information criterion are proposed for small-sample mixed model selection; simulation results suggest that both criteria provide effective tools for choosing a mixed model with an appropriate mean and covariance structure.

Journal ArticleDOI
TL;DR: This paper proposes a practical computational method to obtain the maximum likelihood estimates (MLE) for mixed models with non-normal random effects by simply multiplying and dividing a standard normal density, which substantially reduces computational time.
Abstract: In this paper, we propose a practical computational method to obtain the maximum likelihood estimates (MLE) for mixed models with non-normal random effects. By simply multiplying and dividing a standard normal density, we reformulate the likelihood conditional on the non-normal random effects to that conditional on the normal random effects. The Gaussian quadrature technique, conveniently implemented in SAS Proc NLMIXED, can then be used to carry out the estimation process. Our method substantially reduces computational time, while yielding similar estimates to the probability integral transformation method (J. Comput. Graphical Stat. 2006; 15:39-57). Furthermore, our method can be applied to more general situations, e.g. finite mixture random effects or correlated random effects from Clayton copula. Simulations and applications are presented to illustrate our method.
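
A minimal sketch of the multiply-and-divide trick on a logistic random-intercept model with a non-normal (here t-distributed) random effect: the marginal likelihood integral is rewritten against the standard normal density so that Gauss-Hermite quadrature applies. Parameters and data are simulated, and this mimics rather than reproduces the SAS NLMIXED implementation described in the paper:

```python
# Marginal likelihood of a cluster with a NON-normal random intercept:
#   integral of prod_j p(y_j | b) * g(b) db
# is rewritten as an integral against the standard normal phi(b) by
# multiplying and dividing by phi(b), enabling Gauss-Hermite quadrature.
import numpy as np
from scipy.stats import norm, t as t_dist

rng = np.random.default_rng(11)
nodes, weights = np.polynomial.hermite.hermgauss(30)
b_grid = np.sqrt(2.0) * nodes                 # quadrature points for phi(b)

def cluster_loglik(y, beta0, df_t, scale_t):
    """log of the marginal cluster likelihood via the density ratio."""
    eta = beta0 + b_grid                      # linear predictor at each node
    p = 1 / (1 + np.exp(-eta))
    # likelihood of the cluster's binary responses at each node
    lik = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
    ratio = t_dist.pdf(b_grid, df_t, scale=scale_t) / norm.pdf(b_grid)
    return np.log(np.sum(weights * lik * ratio) / np.sqrt(np.pi))

# Simulate 50 clusters of 8 Bernoulli observations with t(3) intercepts:
beta0_true = 0.5
b = t_dist.rvs(3, scale=0.8, size=50, random_state=1)
ll = 0.0
for i in range(50):
    y_i = rng.binomial(1, 1 / (1 + np.exp(-(beta0_true + b[i]))), size=8)
    ll += cluster_loglik(y_i, beta0=0.5, df_t=3, scale_t=0.8)
print("marginal log-likelihood at the true parameters:", round(ll, 2))
```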