
Showing papers on "Mixed model" published in 2012


Journal ArticleDOI
TL;DR: This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods, and applies the resulting estimation method to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects.
Abstract: Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved) represent a similar type of situation. Methodologies that take this structure into account allow for the possibilities of systematic differences between objects that are not related to attributes and autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called "random effects" that are estimated from the data (population-level relationships are termed "fixed effects," together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations.
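As a rough illustration of the RE-EM idea, the sketch below alternates between a regression tree fitted to the response adjusted for the current random-effect predictions and a mixed-model refit that treats the tree's fitted values as an offset. This is a minimal sketch using rpart and lme4, not the authors' REEMtree implementation, and the variable names (price, duration, category, seller) are hypothetical.

```r
# Minimal sketch of the RE-EM alternation (not the authors' REEMtree implementation).
# Assumes a data frame `dat` with response `price`, covariates `duration` and `category`,
# and a grouping factor `seller`; all names are hypothetical.
library(rpart)
library(lme4)

fit_reem <- function(dat, n_iter = 10) {
  b <- setNames(rep(0, nlevels(dat$seller)), levels(dat$seller))  # current random intercepts
  for (i in seq_len(n_iter)) {
    # Step 1: regression tree on the response adjusted for the current random effects
    dat$y_adj <- dat$price - b[as.character(dat$seller)]
    tree <- rpart(y_adj ~ duration + category, data = dat, method = "anova")
    # Step 2: refit the random effects, treating the tree's fitted values as an offset
    dat$tree_fit <- predict(tree, dat)
    lmm <- lmer(price ~ 1 + (1 | seller), offset = tree_fit, data = dat)
    b <- setNames(ranef(lmm)$seller[, "(Intercept)"], rownames(ranef(lmm)$seller))
  }
  list(tree = tree, lmm = lmm, random_effects = b)
}
```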

184 citations


Journal ArticleDOI
TL;DR: The authors examined the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects, and estimation of random-effects variances.
Abstract: Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.
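A quick way to probe this kind of robustness is to simulate clustered binary data whose random intercepts come from a clearly non-normal distribution and then fit a standard logistic GLMM that assumes normality. A minimal sketch with lme4 follows; the parameter values and names are illustrative, not taken from the paper.

```r
# Illustrative robustness check: skewed (exponential) random intercepts, normal-assuming GLMM fit.
library(lme4)
set.seed(1)

n_clusters <- 200; n_per <- 10
id  <- factor(rep(seq_len(n_clusters), each = n_per))
x   <- rnorm(n_clusters * n_per)
b   <- rexp(n_clusters, rate = 1) - 1          # skewed cluster effects, mean zero
eta <- -0.5 + 1 * x + b[as.integer(id)]        # true covariate effect is 1
y   <- rbinom(length(eta), size = 1, prob = plogis(eta))

fit <- glmer(y ~ x + (1 | id), family = binomial)
fixef(fit)["x"]    # compare with the true slope of 1
```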

171 citations


Journal ArticleDOI
TL;DR: A new class of functional models in which smoothing splines are used to model fixed effects as well as random effects is introduced; these models inherit the flexibility of linear mixed effects models in handling complex designs and correlation structures.
Abstract: The functional mixed effects model (FMM) is a mixed effects modeling framework in which both the fixed effects and the random effects are modeled by nonparametric curves. The combination of mixed effects modeling and nonparametric smoothing enables FMMs to handle outcomes with complex profiles and at the same time to incorporate complex experimental designs and include covariates. Estimation and inference can be performed either using techniques from linear mixed effects models or using fully Bayesian approaches. As in functional data analysis, inference in FMMs is preliminary and needs to be further investigated. Several software packages have been developed to implement FMMs, although computational challenges do exist no matter which smoothing method is used. WIREs Comput Stat 2012, 4:527–534. doi: 10.1002/wics.1226
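One accessible way to fit a model in this spirit is mgcv, where a population-level smooth plays the role of the fixed-effect curve and a factor-smooth ("fs") term supplies subject-specific random curves. A minimal sketch under assumed variable names (y, time, subject), not tied to the packages reviewed in the article:

```r
# Sketch of a functional mixed effects model with mgcv:
# a shared smooth mean curve plus subject-specific smooth deviations.
library(mgcv)

fmm <- gam(y ~ s(time, k = 20) +                     # fixed-effect (population) curve
                s(time, subject, bs = "fs", k = 10), # random subject-specific curves
           data = dat, method = "REML")
summary(fmm)
plot(fmm, pages = 1)
```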

144 citations


Journal ArticleDOI
TL;DR: A fully efficient stage-wise method, which carries forward the full variance-covariance matrix of adjusted means from the individual environments to the analysis across the series of trials, and has close connections with meta-analysis, where environments correspond to centres and genotypes to medical treatments.
Abstract: Plant breeders and variety testing agencies routinely test candidate genotypes (crop varieties, lines, test hybrids) in multiple environments. Such multi-environment trials can be efficiently analysed by mixed models. A single-stage analysis models the entire observed data at the level of individual plots. This kind of analysis is usually considered as the gold standard. In practice, however, it is more convenient to use a two-stage approach, in which experiments are first analysed per environment, yielding adjusted means per genotype, which are then summarised across environments in the second stage. Stage-wise approaches suggested so far are approximate in that they cannot fully reproduce a single-stage analysis, except in very simple cases, because the variance-covariance matrix of adjusted means from individual environments needs to be approximated by a diagonal matrix. This paper proposes a fully efficient stage-wise method, which carries forward the full variance-covariance matrix of adjusted means from the individual environments to the analysis across the series of trials. Provided the variance components are known, this method can fully reproduce the results of a single-stage analysis. Computations are made efficient by a diagonalisation of the residual variance-covariance matrix, which necessitates a corresponding linear transformation of both the first-stage estimates (e.g. adjusted means and regression slopes for plot covariates) and the corresponding design matrices for fixed and random effects. We also exemplify the extension of the general approach to a three-stage analysis. The method is illustrated using two datasets, one real and the other simulated. The proposed approach has close connections with meta-analysis, where environments correspond to centres and genotypes to medical treatments. We therefore compare our theoretical results with recently published results from a meta-analysis.
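The flavor of a stage-wise analysis that carries forward the stage-one variance-covariance matrix can be sketched with standard tools: per-environment linear models yield adjusted genotype means and their covariance matrices, which are then combined in a second-stage model weighted by that full matrix (here via metafor::rma.mv, reflecting the meta-analysis connection). This is a simplified illustration with hypothetical column names (yield, geno, block, env), not the authors' fully efficient procedure.

```r
# Simplified two-stage sketch carrying forward the stage-1 variance-covariance matrix.
# Assumes a data frame `trials` with columns yield, geno, block and env (hypothetical).
library(metafor)
library(Matrix)

stage1 <- lapply(split(trials, trials$env), function(d) {
  m   <- lm(yield ~ 0 + geno + block, data = d)   # adjusted genotype means per environment
  idx <- grep("^geno", names(coef(m)))
  list(mean = coef(m)[idx], V = vcov(m)[idx, idx])
})

means <- unlist(lapply(stage1, `[[`, "mean"))
V     <- as.matrix(bdiag(lapply(stage1, `[[`, "V")))        # block-diagonal stage-1 vcov
dat2  <- data.frame(
  yield = means,
  geno  = unlist(lapply(stage1, function(s) sub("^geno", "", names(s$mean)))),
  env   = rep(names(stage1), times = sapply(stage1, function(s) length(s$mean)))
)

# Stage 2: combine genotype means across environments, weighting by the full stage-1 vcov
stage2 <- rma.mv(yield, V, mods = ~ 0 + geno, random = ~ 1 | env, data = dat2)
summary(stage2)
```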

138 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the tradeoff between sampling many individuals a few times versus sampling few individuals often, and showed that when all individuals experience the same conditions during a sampling event, sampling each individual only twice should be strictly avoided.
Abstract: Summary 1. Quantifying individual heterogeneity in plasticity is becoming common in studies of evolutionary ecology, climate change ecology and animal personality. Individual variation in reaction norms is typically quantified using random effects in a mixed modelling framework. However, little is known about what sampling effort and design provide sufficient accuracy, precision and power. 2. I developed ‘odprism’, an easy-to-use software package for the statistical language R, which can be used to investigate the accuracy, precision and power of random regression models for various types of data structures. Moreover, I conducted simulations to derive rules of thumb for four design decisions that biologists often face. 3. First, I investigated the trade-off between sampling many individuals a few times versus sampling few individuals often. Generally, at least 40 individuals should be sampled with a total sample size of at least 1000 to obtain accurate and precise estimates of individual variation in elevation and slopes of linear reaction norms and their correlation. In contrast to a previous recommendation, it is worthwhile to bias the ratio of the number of individuals to replicates towards sampling more individuals. 4. Second, I considered how the range of environmental conditions over which individuals are sampled affects the optimal sampling strategy. I show that when all individuals experience the same conditions during a sampling event, sampling each individual only twice should be strictly avoided. 5. Third, I examined the case where the number of replicates per individual is constrained by their lifespan, as is common when sampling annual traits in the wild. I show that for a given sampling effort, it is much easier to detect individual variation in reaction norms for long-lived than for short-lived species. 6. Fourth, I investigated the performance of random regression models when studying traits under selection. Reassuringly, directional viability selection barely caused any bias in estimates of variance components. 7. Random regression models are inherently data hungry, and a review of the literature shows that behavioural studies in particular have low sampling effort. Therefore, the software and rules of thumb I identified for designing reaction-norm studies should help researchers make more informed choices, which should improve the reliability and interpretation of plasticity studies.
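The kind of design assessment the package automates can also be done by brute-force simulation: generate individuals with known among-individual variation in intercepts and slopes, fit a random regression with lme4, and check how well the variance components are recovered for a given number of individuals and replicates. A minimal, self-contained sketch with arbitrary parameter values (not the paper's recommendations):

```r
# Toy design check for a random-regression (random intercept and slope) study.
library(lme4)
set.seed(42)

simulate_fit <- function(n_ind = 40, n_rep = 25, v_int = 0.5, v_slope = 0.2, v_res = 1) {
  id  <- factor(rep(seq_len(n_ind), each = n_rep))
  env <- rnorm(n_ind * n_rep)                       # environmental covariate
  u0  <- rnorm(n_ind, sd = sqrt(v_int))             # individual intercept deviations
  u1  <- rnorm(n_ind, sd = sqrt(v_slope))           # individual slope deviations
  y   <- (1 + u0[as.integer(id)]) + (0.5 + u1[as.integer(id)]) * env +
         rnorm(n_ind * n_rep, sd = sqrt(v_res))
  fit <- lmer(y ~ env + (env | id))
  as.data.frame(VarCorr(fit))                       # estimated variance components
}

simulate_fit()   # repeat many times to gauge accuracy, precision and power of a design
```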

121 citations


Journal ArticleDOI
TL;DR: A standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately is considered, which has been successfully applied to a large-scale association study of multiple sclerosis.
Abstract: Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants.
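The computational core of fast one-random-effect LMMs of this kind is that, after rotating the data by the eigenvectors of the relationship matrix, the covariance becomes diagonal and the likelihood can be profiled cheaply over a single variance ratio. The base-R sketch below illustrates that rotation-and-profiling idea generically; it is not the authors' algorithm or code.

```r
# One-random-effect LMM: y = X b + g + e, g ~ N(0, sg2 * K), e ~ N(0, se2 * I).
# After rotating by the eigenvectors of K, the covariance is diagonal in delta = se2/sg2.
fit_lmm_eigen <- function(y, X, K) {
  ed  <- eigen(K, symmetric = TRUE)
  U   <- ed$vectors; lam <- ed$values
  yr  <- crossprod(U, y); Xr <- crossprod(U, X)      # rotated data (done once)

  neg_loglik <- function(log_delta) {                # cheap to evaluate for any delta
    delta <- exp(log_delta)
    w     <- 1 / (lam + delta)
    b     <- solve(crossprod(Xr, w * Xr), crossprod(Xr, w * yr))
    r     <- yr - Xr %*% b
    n     <- length(y)
    sg2   <- sum(w * r^2) / n                        # profiled genetic variance
    0.5 * (n * log(sg2) + sum(log(lam + delta)) + n)
  }
  opt <- optimize(neg_loglik, interval = c(-10, 10))
  list(delta = exp(opt$minimum), neg_loglik = opt$objective)
}
```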

119 citations


Journal ArticleDOI
TL;DR: 3 approaches for analyzing longitudinal data: repeated measures analysis of variance, covariance pattern models, and growth curve models are compared and the utility of Akaike information criterion and Bayesian information criterion are indicated in the selection of a proper residual covariance structure are presented.
Abstract: With increasing popularity, growth curve modeling is more and more often considered as the 1st choice for analyzing longitudinal data. Although the growth curve approach is often a good choice, other modeling strategies may more directly answer questions of interest. It is common to see researchers fit growth curve models without considering alternative modeling strategies. In this article we compare 3 approaches for analyzing longitudinal data: repeated measures analysis of variance, covariance pattern models, and growth curve models. As all are members of the general linear mixed model family, they represent somewhat different assumptions about the way individuals change. These assumptions result in different patterns of covariation among the residuals around the fixed effects. In this article, we first indicate the kinds of data that are appropriately modeled by each and use real data examples to demonstrate possible problems associated with the blanket selection of the growth curve model. We then present a simulation that indicates the utility of Akaike information criterion and Bayesian information criterion in the selection of a proper residual covariance structure. The results cast doubt on the popular practice of automatically using growth curve modeling for longitudinal data without comparing the fit of different models. Finally, we provide some practical advice for assessing mean changes in the presence of correlated data.
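The three strategies map onto standard nlme calls, after which AIC and BIC can be compared directly. A minimal sketch of that model-comparison workflow, assuming hypothetical variables score, time and id:

```r
# Compare residual covariance structures for longitudinal data with nlme.
library(nlme)

m_cs  <- gls(score ~ time, data = dat,
             correlation = corCompSymm(form = ~ 1 | id))     # compound symmetry (RM-ANOVA-like)
m_ar1 <- gls(score ~ time, data = dat,
             correlation = corAR1(form = ~ 1 | id))           # covariance pattern model
m_gc  <- lme(score ~ time, random = ~ time | id, data = dat)  # growth curve (random slopes)

AIC(m_cs, m_ar1, m_gc)
BIC(m_cs, m_ar1, m_gc)
```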

118 citations


Journal ArticleDOI
TL;DR: Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Abstract: Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
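In R, a mixed beta regression of this kind can be fitted with glmmTMB and compared with a linear mixed model on likelihood-based fit. A minimal sketch, assuming a (0,1)-bounded outcome called utility and hypothetical covariates; this is not the authors' code:

```r
# Mixed beta regression vs. linear mixed model for a (0,1)-bounded outcome.
library(glmmTMB)
library(lme4)

m_beta <- glmmTMB(utility ~ time * group + (1 | id),
                  family = beta_family(link = "logit"), data = dat)  # utility strictly in (0,1)
m_lmm  <- lmer(utility ~ time * group + (1 | id), data = dat, REML = FALSE)

AIC(m_beta, m_lmm)     # likelihood-based comparison
fixef(m_beta)$cond     # conditional-model fixed effects on the logit scale
```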

81 citations


Journal ArticleDOI
TL;DR: Overall, combining information from related populations and increasing the number of genotypes improved predictive ability, but further allowing for population-specific marker effects made minor improvement.
Abstract: Using different populations in genomic selection raises the possibility of marker effects varying across populations. However, common models for genomic selection only account for the main marker effects, assuming that they are consistent across populations. We present an approach in which the main plus population-specific marker effects are simultaneously estimated in a single mixed model. Cross-validation is used to compare the predictive ability of this model to that of the ridge regression best linear unbiased prediction (RR-BLUP) method involving only either the main marker effects or the population-specific marker effects. We used a maize (Zea mays L.) data set with 312 genotypes derived from five biparental populations, which were genotyped with 39,339 markers. A combined analysis incorporating genotypes for all the populations and hence using a larger training set was better than separate analyses for each population. Modeling the main plus the population-specific marker effects simultaneously improved predictive ability only slightly compared with modeling only the main marker effects. The performance of the RR-BLUP method was comparable to that of two regularization methods, namely the ridge regression and the elastic net, and was more accurate than that of the least absolute shrinkage and selection operator (LASSO). Overall, combining information from related populations and increasing the number of genotypes improved predictive ability, but further allowing for population-specific marker effects made minor improvement.
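The main-plus-population-specific idea can be written as ridge regression with two blocks of marker effects, one shared and one interacted with population membership, each with its own shrinkage. A small base-R sketch of that construction (an illustration only; the paper works in a mixed-model formulation with cross-validated comparisons):

```r
# Ridge regression with main marker effects plus population-specific marker effects.
# Z: n x p marker matrix; pop: factor of population membership; y: centred phenotypes.
fit_main_plus_pop <- function(y, Z, pop, lambda_main = 10, lambda_pop = 50) {
  P   <- model.matrix(~ 0 + pop)                        # population indicator matrix
  Zp  <- do.call(cbind, lapply(seq_len(ncol(P)),        # population-specific copies of Z
                               function(k) Z * P[, k]))
  W   <- cbind(Z, Zp)
  pen <- diag(c(rep(lambda_main, ncol(Z)), rep(lambda_pop, ncol(Zp))))
  beta <- solve(crossprod(W) + pen, crossprod(W, y))    # penalised least squares
  list(main         = beta[seq_len(ncol(Z))],
       pop_specific = matrix(beta[-seq_len(ncol(Z))], ncol = ncol(P)))
}
```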

72 citations


Journal ArticleDOI
TL;DR: This work proposes an exact estimation procedure to obtain the maximum likelihood estimates of the fixed-effects and variance components, using a stochastic approximation of the EM algorithm, and compares the performance of the normal and the SMN models with two real data sets.
Abstract: Nonlinear mixed-effects models are very useful for analyzing repeated measures data and are used in a variety of applications. Normal distributions for random effects and residual errors are usually assumed, but such assumptions make inferences vulnerable to the presence of outliers. In this work, we introduce an extension of a normal nonlinear mixed-effects model considering a subclass of elliptical contoured distributions for both random effects and residual errors. This elliptical subclass, the scale mixtures of normal (SMN) distributions, includes heavy-tailed multivariate distributions, such as Student-t, the contaminated normal and slash, among others, and represents an interesting alternative for accommodating outliers while maintaining the elegance and simplicity of maximum likelihood theory. We propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed-effects and variance components, using a stochastic approximation of the EM algorithm. We compare the performance of the normal and the SMN models with two real data sets.
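The scale-mixture-of-normals construction underlying the model is simply that heavy-tailed errors such as the Student-t arise from a normal variate whose scale is itself random. A few lines illustrating that representation (a generic sketch, not the authors' SAEM estimation code):

```r
# Scale mixtures of normals: a Student-t error is a normal with a Gamma-mixed scale.
set.seed(1)
n <- 1e5; nu <- 4; sigma <- 2
w <- rgamma(n, shape = nu / 2, rate = nu / 2)   # mixing weights
e <- rnorm(n, mean = 0, sd = sigma / sqrt(w))   # SMN draw: heavy-tailed error
qqplot(e / sigma, rt(n, df = nu))               # agrees with a t distribution, nu df
abline(0, 1)
```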

68 citations


Journal ArticleDOI
TL;DR: In this article, a brief survey of some issues in the application of geostatistics in soil science is presented, where the authors show how the recasting of classical geostatic methods in the linear mixed model (LMM) framework has allowed the more effective integration of soil knowledge (classifications, covariates) with statistical spatial prediction of soil properties.
Abstract: In a brief survey of some issues in the application of geostatistics in soil science, it is shown how the recasting of classical geostatistical methods in the linear mixed model (LMM) framework has allowed the more effective integration of soil knowledge (classifications, covariates) with statistical spatial prediction of soil properties. The LMM framework has also allowed the development of models in which the spatial covariance need not be assumed to be stationary. Such models are generally more plausible than stationary ones from a pedological perspective, and when applied to soil data they have been found to give prediction error variances that better describe the uncertainty of predictions at validation sites. Finally, consideration is given to how scientific understanding of variable processes in the soil might be used to infer the likely statistical form of the observed soil variation.
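The recasting of geostatistical prediction as a linear mixed model is easy to see in code: a regression with spatially correlated residuals (here nlme::gls with an exponential correlation structure) is essentially kriging with external drift. A hedged sketch with hypothetical variables (clay, elevation, landuse, and coordinates x, y):

```r
# Geostatistics as a linear mixed model: spatially correlated residuals via nlme.
library(nlme)

m_sp <- gls(clay ~ elevation + landuse, data = soil,
            correlation = corExp(form = ~ x + y, nugget = TRUE))
summary(m_sp)                    # fixed effects plus estimated range and nugget
predict(m_sp, newdata = grid)    # prediction of the trend at new sites
```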

Journal ArticleDOI
TL;DR: Three types of probabilistic models are distinguished for time headway (TH) distribution in this paper: the single model, the combined model and the mixed model, and the two mixed models are shown to be statistically equivalent and provide the best fits in a wide range of TH samples.
Abstract: Three types of probabilistic models are distinguished for time headway (TH) distribution in this paper: the single model, the combined model and the mixed model. To challenge the flexibility of the models, a sample set is established based on different sampling methods according to different databases from the roadways in France. Particularly, the data from the RN118 national roadway are aggregated over 6 min and classified according to traffic flow and traffic occupancy. An estimation process is proposed for the existing estimation methods when calibrating combined and mixed models. As a result, the two mixed models, the gamma-based Semi-Poisson Model and the gamma-based Generalized Queuing Model (gamma-GQM), are shown to be statistically equivalent and to provide the best fits in a wide range of TH samples. The gamma-GQM without a location parameter is recommended for use in TH modeling. Besides, the Shifted Hyper Log-normal Model (HyperLNM) is examined for the first time and fits TH data very well in many cases. The statistical role of the location parameter in TH models is also discussed. Moreover, it is found that the Ratio between time Headway and Instantaneous Speed (RHIS) can be modeled well using the gamma-GQM.

Journal ArticleDOI
TL;DR: As discussed by the authors, treating predictors as fixed effects is often sufficient, but as research questions have become more sophisticated, coupled with rapid advances in computational abilities, the use of random effects in statistical modeling has become more commonplace.
Abstract: Traditional linear regression at the level taught in most introductory statistics courses involves the use of ‘fixed effects’ as predictors of a particular outcome. This treatment of the independent variable is often sufficient. However, as research questions have become more sophisticated, coupled with the rapid advancement in computational abilities, the use of random effects in statistical modeling has become more commonplace. Treating predictors in a model as a random effect allows for more general conclusions—a great example being the treatment of the studies that comprise a meta-analysis as random rather than fixed. In addition, utilization of random effects allows for more accurate representation of data that arise from complicated study designs, such as multilevel and longitudinal studies, which in turn allows for more accurate inference on the fixed effects that tend to be of primary interest. It is important to note the distinctions between fixed and random effects in the most general of settings, while also knowing the benefits and risks to their simultaneous use in specific yet common situations. WIREs Comput Stat 2012, 4:181–190. doi: 10.1002/wics.201

Journal ArticleDOI
TL;DR: The aim of this article is to increase the use of mixed models by giving a concise practical introduction and by giving clear directions for undertaking the analysis in the most popular statistical packages.
Abstract: Psychologists, psycholinguists, and other researchers using language stimuli have been struggling for more than 30 years with the problem of how to analyze experimental data that contain two crossed random effects (items and participants). The classical analysis of variance does not apply; alternatives have been proposed but have failed to catch on, and a statistically unsatisfactory procedure of using two approximations (known as F1 and F2) has become the standard. A simple and elegant solution using mixed model analysis has been available for 15 years, and recent improvements in statistical software have made mixed models analysis widely available. The aim of this article is to increase the use of mixed models by giving a concise practical introduction and by giving clear directions for undertaking the analysis in the most popular statistical packages. The article also introduces the djmixed add-on package for SPSS, which makes entering the models and reporting their results as straightforward as possible.
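The crossed-random-effects analysis the article advocates is a one-liner in lme4/lmerTest (shown here instead of the SPSS add-on the article introduces); the variable names are hypothetical:

```r
# Mixed model with crossed random effects for participants and items.
library(lmerTest)   # loads lme4 and adds tests for the fixed effects

m <- lmer(rt ~ condition + (1 | participant) + (1 | item), data = dat)
summary(m)
```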

Journal ArticleDOI
TL;DR: The concept of boosting is extended to generalized additive mixed models and an appropriate algorithm is presented that uses two different approaches for the fitting procedure of the variance components of the random effects.
Abstract: Objective: With the emergence of semi- and nonparametric regression, the generalized linear mixed model has been extended to account for additive predictors. However, available fitting methods fail in high dimensional settings where many explanatory variables are present. We extend the concept of boosting to generalized additive mixed models and present an appropriate algorithm that uses two different approaches for the fitting procedure of the variance components of the random effects. Methods: The main tool developed is likelihood-based componentwise boosting that enforces variable selection in generalized additive mixed models. In contrast to common procedures, they can be used in high-dimensional settings where many covariates are available and the form of the influence is unknown. The complexity of the resulting estimators is determined by information criteria. The performance of the methods is investigated in simulation studies for binary and Poisson responses with comparisons to alternative approaches, and it is applied to real-world clinical data. Results: Simulations show that the proposed methods are considerably more stable and more accurate in estimating the regression function than the conventional approach, especially when a large number of predictors is available. The methods also produce reasonable results in applications to real data sets, which is illustrated by the Multicenter AIDS Cohort Study. Conclusions: The boosting algorithm allows relevant predictors to be extracted in generalized additive mixed models. It works in high-dimensional settings and is very stable.
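A freely available implementation of likelihood-based componentwise boosting with smooth and random-effects base-learners is the mboost package; the sketch below is in that spirit (it is not the authors' algorithm, and the variable names are hypothetical):

```r
# Componentwise boosting for an additive model with a random-intercept base-learner.
library(mboost)

gb  <- gamboost(y ~ bbs(x1) + bbs(x2) + bbs(x3) + brandom(id),
                family = Poisson(), data = dat,
                control = boost_control(mstop = 500))
cvr <- cvrisk(gb)          # cross-validation to choose the number of boosting steps
gb_opt <- gb[mstop(cvr)]   # model at the selected stopping iteration
```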

Journal ArticleDOI
TL;DR: An understanding of mixed models and marginal models is provided via a thorough exploration of the methods that have been used historically in the biomedical literature to summarize and make inferences about this type of data.
Abstract: Background Researchers often describe the collection of repeated measurements on each individual in a study design. Advanced statistical methods, namely, mixed and marginal models, are the preferred analytic choices for analyzing this type of data. Objective The aim was to provide a conceptual understanding of these modeling techniques. Approach An understanding of mixed models and marginal models is provided via a thorough exploration of the methods that have been used historically in the biomedical literature to summarize and make inferences about this type of data. The limitations are discussed, as is work done on expanding the classic linear regression model to account for repeated measurements taken on an individual, leading to the broader mixed-model framework. Results A description is provided of a variety of common types of study designs and data structures that can be analyzed using a mixed model and a marginal model. Discussion This work provides an overview of advanced statistical modeling techniques used for analyzing the many types of correlated data collected in a research study.
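The subject-specific (mixed) versus population-averaged (marginal) distinction the article works through can be seen side by side with lme4 and geepack. A brief sketch under assumed variable names:

```r
# Mixed (subject-specific) vs. marginal (population-averaged) models for repeated measures.
library(lme4)
library(geepack)   # data assumed sorted by subject for the GEE fit

m_mixed    <- glmer(outcome ~ time + treatment + (1 | id),
                    family = binomial, data = dat)
m_marginal <- geeglm(outcome ~ time + treatment, id = id,
                     family = binomial, corstr = "exchangeable", data = dat)

fixef(m_mixed)     # conditional (subject-specific) effects
coef(m_marginal)   # marginal (population-averaged) effects
```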

Journal ArticleDOI
TL;DR: A simple way to choose the associated hyper-parameters is proposed that is valid for any generalized linear mixed model; particular attention is paid to the study of probit mixed models when some variables are linear combinations of others.

Journal ArticleDOI
TL;DR: Simulations with a two-compartment pharmacokinetic model verified that shrinkage has an inverse relationship with the ratio of interindividual variability to residual variability, and that sample size has very limited impact on shrinkage of the PK parameters of the two-compartment model.
Abstract: Shrinkage of empirical Bayes estimates (EBEs) of posterior individual parameters in mixed-effects models has been shown to obscure the apparent correlations among random effects and relationships between random effects and covariates. Empirical quantification equations have been widely used for population pharmacokinetic/pharmacodynamic models. The objectives of this manuscript were (1) to compare the empirical equations with theoretically derived equations, (2) to investigate and confirm the factors influencing shrinkage, and (3) to evaluate the impact of shrinkage on estimation errors of EBEs using Monte Carlo simulations. A mathematical derivation was first provided for the shrinkage in nonlinear mixed effects models. Using a linear mixed model, the simulation results demonstrated that the shrinkage estimated from the empirical equations matched those based on the theoretically derived equations. Simulations with a two-compartment pharmacokinetic model verified that shrinkage has an inverse relationship with the ratio of interindividual variability to residual variability. Fewer observations per subject were associated with a greater amount of shrinkage, consistent with findings from previous research. The influence of sampling times appeared to be larger when fewer PK samples were collected for each individual. As expected, sample size has very limited impact on shrinkage of the PK parameters of the two-compartment model. Assessment of estimation error suggested an average 1:1 relationship between shrinkage and median estimation error of EBEs.
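The dependence of shrinkage on the ratio of between-individual to residual variability and on the number of observations per subject is easy to reproduce in a linear random-intercept model, quantifying shrinkage as one minus the ratio of the EBE standard deviation to the estimated random-effect standard deviation. An illustrative sketch (a simplified linear case, not the pharmacokinetic models of the paper):

```r
# Shrinkage of empirical Bayes estimates (EBEs) in a linear random-intercept model.
library(lme4)
set.seed(7)

eta_shrinkage <- function(n_id = 100, n_obs = 3, omega = 1, sigma = 2) {
  id  <- factor(rep(seq_len(n_id), each = n_obs))
  b   <- rnorm(n_id, sd = omega)                   # true individual effects
  y   <- 10 + b[as.integer(id)] + rnorm(n_id * n_obs, sd = sigma)
  fit <- lmer(y ~ 1 + (1 | id))
  ebe <- ranef(fit)$id[, 1]
  1 - sd(ebe) / attr(VarCorr(fit)$id, "stddev")    # SD-based shrinkage estimate
}

eta_shrinkage(n_obs = 2)    # few observations per subject: high shrinkage
eta_shrinkage(n_obs = 20)   # more observations per subject: much less shrinkage
```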

Journal ArticleDOI
TL;DR: In this paper, a geographical weighted empirical best linear unbiased predictor (GWEBLUP) for a small area average is proposed, and an estimator of its conditional mean squared error is developed.

Journal ArticleDOI
TL;DR: A mixed model framework for censored longitudinal data in which the random effects are represented by the flexible seminonparametric density is developed, and simulations show that this approach can lead to a reduction in bias and an increase in efficiency relative to assuming Gaussian random effects.
Abstract: Mixed models are commonly used to represent longitudinal or repeated measures data. An additional complication arises when the response is censored, for example, due to limits of quantification of the assay used. While Gaussian random effects are routinely assumed, little work has characterized the consequences of misspecifying the random-effects distribution nor has a more flexible distribution been studied for censored longitudinal data. We show that, in general, maximum likelihood estimators will not be consistent when the random-effects density is misspecified, and the effect of misspecification is likely to be greatest when the true random-effects density deviates substantially from normality and the number of noncensored observations on each subject is small. We develop a mixed model framework for censored longitudinal data in which the random effects are represented by the flexible seminonparametric density and show how to obtain estimates in SAS procedure NLMIXED. Simulations show that this approach can lead to reduction in bias and increase in efficiency relative to assuming Gaussian random effects. The methods are demonstrated on data from a study of hepatitis C virus.

Journal ArticleDOI
TL;DR: This work proposes a modeling framework for predicting a binary event from longitudinal measurements where a shared random effect links the two processes together and shows that estimates of predictive accuracy under a Gaussian random effects distribution are robust to severe misspecification of this distribution.
Abstract: The use of longitudinal data for predicting a subsequent binary event is often the focus of diagnostic studies. This is particularly important in obstetrics, where ultrasound measurements taken during fetal development may be useful for predicting various poor pregnancy outcomes. We propose a modeling framework for predicting a binary event from longitudinal measurements where a shared random effect links the two processes together. Under a Gaussian random effects assumption, the approach is simple to implement with standard statistical software. Using asymptotic and simulation results, we show that estimates of predictive accuracy under a Gaussian random effects distribution are robust to severe misspecification of this distribution. However, under some circumstances, estimates of individual risk may be sensitive to severe random effects misspecification. We illustrate the methodology with data from a longitudinal fetal growth study.
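The shared random effect links a longitudinal submodel and a binary outcome submodel through the same subject-level terms. A crude but transparent approximation is a two-stage fit: estimate subject-level random effects from the longitudinal data, then use them as predictors of the binary event. The sketch below shows that simplification, not the joint model of the paper; the variable names are hypothetical.

```r
# Naive two-stage approximation to a shared random effects joint model.
library(lme4)

stage1 <- lmer(growth ~ gest_week + (gest_week | id), data = longi)
re <- ranef(stage1)$id
colnames(re) <- c("b_int", "b_slope")     # subject-level BLUPs from the longitudinal fit
re$id <- rownames(re)

outcome2 <- merge(outcome, re, by = "id") # one row per subject with its BLUPs
stage2   <- glm(poor_outcome ~ b_int + b_slope, family = binomial, data = outcome2)
summary(stage2)
```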

Journal ArticleDOI
09 Nov 2012 - PLOS ONE
TL;DR: This fastBayesA approach treats the variances of SNP effects as missing data and bases predictions on a joint posterior mode of effects, in contrast to the commonly used BayesA, which bases predictions on posterior means of effects.
Abstract: Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but the Markov Chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed “fastBayesA”) without MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects compared to the commonly used BayesA which bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. Method fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict GEBV as accurately as BayesA but with less computing effort per SNP than BayesA. Method fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers bring unreasonable computational burden or slow convergence to MCMC approaches.

Journal ArticleDOI
TL;DR: This article analyzed the application of the linear mixed model (LMM) to a mixed repeated measures design and showed that the degree of robustness increased in line with the amount of kurtosis, whereas the robustness of the Kenward–Roger (KR) correction was null as skewness increased.
Abstract: Using a Monte Carlo simulation and the Kenward–Roger (KR) correction for degrees of freedom, in this article we analyzed the application of the linear mixed model (LMM) to a mixed repeated measures design. The LMM was first used to select the covariance structure with three types of data distribution: normal, exponential, and log-normal. This showed that, with homogeneous between-groups covariance and when the distribution was normal, the covariance structure with the best fit was the unstructured population matrix. However, with heterogeneous between-groups covariance and when the pairing between covariance matrices and group sizes was null, the best fit was shown by the between-subjects heterogeneous unstructured population matrix, which was the case for all of the distributions analyzed. By contrast, with positive or negative pairings, the within-subjects and between-subjects heterogeneous first-order autoregressive structure produced the best fit. In the second stage of the study, the robustness of the LMM was tested. This showed that the KR method provided adequate control of Type I error rates for the time effect with normally distributed data. However, as skewness increased—as occurs, for example, in the log-normal distribution—the robustness of KR was null, especially when the assumption of sphericity was violated. As regards the influence of kurtosis, the analysis showed that the degree of robustness increased in line with the amount of kurtosis.
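In practice, the Kenward-Roger correction examined here is available through lmerTest (backed by pbkrtest) on top of lme4. A brief sketch for a mixed repeated measures design with hypothetical variables score, group, time and id:

```r
# Kenward-Roger degrees-of-freedom correction for a mixed repeated measures design.
library(lmerTest)   # needs the pbkrtest package for the Kenward-Roger method

m <- lmer(score ~ group * time + (1 | id), data = dat)
anova(m, ddf = "Kenward-Roger")   # F tests with Kenward-Roger corrected df
```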

Journal ArticleDOI
TL;DR: In this paper, a hierarchical Bayes (HB) method using a time series generalization of a widely used cross-sectional model in small-area estimation was used for the analysis of the U.S. state-level unemployment rate.

Journal ArticleDOI
TL;DR: It turns out that explicitly allowing for an overdispersion random effect significantly improves the model; the approach is applied to two clinical studies and compared to the existing approach.

Journal ArticleDOI
TL;DR: To investigate the potential of such models to increase the efficiency of low- and high-input field trials, 17 experiments with 70 sorghum genotypes conducted in Mali, West Africa, were analysed for grain yield using different mixed models, including models with autoregressive spatial correlation terms.
Abstract: Breeding sorghum for low-input conditions is hindered by soil heterogeneity. Spatial adjustment using mixed models can help account for this variation and increase precision of low-input field trials. Large small-scale spatial variation (CV 39.4 %) for plant available phosphorus was mapped in an intensely sampled low-input field. Spatial adjustments were shown to account for residual yield differences because of this and other growth factors. To investigate the potential of such models to increase the efficiency of low- and high-input field trials, 17 experiments with 70 sorghum genotypes conducted in Mali, West Africa, were analysed for grain yield using different mixed models including models with autoregressive spatial correlation terms. Spatial models (AR1, AR2) improved broad sense heritability estimates for grain yield, averaging gains of 10 and 6 % points relative to randomized complete block (RCB) and lattice models, respectively. The heritability estimate gains were even higher under low phosphorus conditions and in two-replicate analyses. No specific model was best for all environments. A single spatial model, AR1 × AR1, captured most of the gains for heritability and relative efficiency provided by the best model identified for each environment using Akaike's Information Criterion. Spatial modelling resulted in important changes in genotype ranking for grain yield. Thus, the use of spatial models was shown to have potentially important consequences for aiding effective sorghum selection in West Africa, particularly under low-input conditions and for trials with fewer replications. Thus, using spatial models can improve the resource allocation of a breeding program. Furthermore, our results show that good experimental design with optimal placement and orientation of blocks is essential for efficient statistical analysis with or without spatial adjustment.
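Fully separable AR1 x AR1 models of the kind compared in the paper are usually fitted with specialised software (e.g. asreml or SpATS); a reduced version of the idea, first-order autoregressive correlation among plot residuals along one field dimension, can be sketched with nlme. This is a simplified stand-in, not the models used in the study, and the variable names (yield, geno, rep, row, col) are hypothetical.

```r
# Simplified spatial adjustment: AR1 residual correlation along columns within rows.
library(nlme)

m_iid <- gls(yield ~ geno + rep, data = trial)
m_ar1 <- gls(yield ~ geno + rep, data = trial,
             correlation = corAR1(form = ~ col | row))   # col gives plot order within a row
anova(m_iid, m_ar1)   # likelihood-based comparison of the two residual models
```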

Journal ArticleDOI
TL;DR: A robust and unified framework for automatically selecting random effects and estimating covariance components in linear mixed models is proposed and extended to incorporate the selection of fixed effects as well.
Abstract: The selection of random effects in linear mixed models is an important yet challenging problem in practice. We propose a robust and unified framework for automatically selecting random effects and estimating covariance components in linear mixed models. A moment-based loss function is first constructed for estimating the covariance matrix of random effects. Two types of shrinkage penalties, a hard thresholding operator and a new sandwich-type soft-thresholding penalty, are then imposed for sparse estimation and random effects selection. Compared with existing approaches, the new procedure does not require any distributional assumption on the random effects and error terms. We establish the asymptotic properties of the resulting estimator in terms of its consistency in both random effects selection and variance component estimation. Optimization strategies are

Journal ArticleDOI
TL;DR: This work proposes a method for direct classified risk mapping based on a Poisson log-linear mixed model with a latent discrete Markov random field and uses a Monte Carlo version of the expectation-maximization algorithm to estimate parameters and determine risk classes.
Abstract: Risk mapping in epidemiology enables areas with a low or high risk of disease contamination to be localized and provides a measure of risk differences between these regions. Risk mapping models for pooled data currently used by epidemiologists focus on the estimated risk for each geographical unit. They are based on a Poisson log-linear mixed model with a latent intrinsic continuous hidden Markov random field (HMRF) generally corresponding to a Gaussian autoregressive spatial smoothing. Risk classification, which is necessary to draw clearly delimited risk zones (in which protection measures may be applied), generally must be performed separately. We propose a method for direct classified risk mapping based on a Poisson log-linear mixed model with a latent discrete HMRF. The discrete hidden field (HF) corresponds to the assignment of each spatial unit to a risk class. The risk values attached to the classes are parameters and are estimated. When mapping risk using HMRFs, the conditional distribution of the observed field is modeled with a Poisson rather than a Gaussian distribution as in image segmentation. Moreover, abrupt changes in risk levels are rare in disease maps. The spatial hidden model should favor smoothed out risks, but conventional discrete Markov random fields (e.g. the Potts model) do not impose this. We therefore propose new potential functions for the HF that take into account class ordering. We use a Monte Carlo version of the expectation-maximization algorithm to estimate parameters and determine risk classes. We illustrate the method's behavior on simulated and real data sets. Our method appears particularly well adapted to localize high-risk regions and estimate the corresponding risk levels.

Journal ArticleDOI
TL;DR: A semiparametric transformation model that can be fitted to a general nonlinear mixed model, including linear or nonlinear regression models, mixed effect models, factor analysis models, and other latent variable models as special cases is developed.
Abstract: In this paper, we aim to develop a semiparametric transformation model. Nonparametric transformation functions are modeled with Bayesian P-splines. The transformed variables can be fitted to a general nonlinear mixed model, including linear or nonlinear regression models, mixed effect models, factor analysis models, and other latent variable models as special cases. Markov chain Monte Carlo algorithms are implemented to estimate transformation functions and unknown quantities in the model. The performance of the developed methodology is demonstrated with a simulation study. Its application to a real study on polydrug use is presented.

Journal Article
TL;DR: A new hybrid prediction model comprising two single models, a nonparametric regression model and a BP neural network model, is proposed according to the periodicity and randomness properties of short-term traffic flow, with the nonparametric regression component used to capture the cyclical stability of traffic flow.
Abstract: A new hybrid prediction model comprising two single models, a nonparametric regression model and a BP neural network model, was proposed according to the periodicity and randomness properties of short-term traffic flow. Relevant historical traffic flow data were used in the nonparametric regression model so that the prediction result obtained from the database-matching procedure fully reflects the cyclical stability of traffic flow. A three-tier BP neural network model was used to capture the dynamic and nonlinear characteristics of traffic flow. A fuzzy control algorithm was adopted to obtain the weight coefficient of each model, and the new mixed model was constituted from the two single models according to the different weight coefficients. The prediction performance of the hybrid model was verified against 30 d of traffic flow data from a road section in Xi'an. Experimental results indicate that the average relative error of the mixed model is 126% and its maximum relative error is 353%, so the prediction accuracy of the mixed model is obviously higher than that of the two single models and can accurately reflect the real situation of traffic flow. (6 tabs, 5 figs, 16 refs)